Parallel execution of tasks. Is there a limit?

I have a workflow that has several tasks that can run concurrently.

S1 → S2 → C1/C2/C3 → S3 → C4/C5/C6 → S4
Cx = concurrent task
Sx = sequential task

I have noticed that when I execute this workflow, only two of the concurrent tasks are executed at the same time, although there does not seem to be a preference for which tasks are chosen; e.g. I can get only C1 & C2, or C1 & C3, or C2 & C3.
When one of these concurrent tasks finishes, another starts.

Is there a limit on the number of concurrent jobs that cwltool can start and monitor?

cwltool version: 3.1.20230601100705


Welcome @Konstantinos_Mavromm and thanks for sharing your issue,

The only limiting that cwltool --parallel does is based on the number of available CPU cores and the memory requested by your tools. Otherwise the limit comes purely from the structure of the workflow: any job whose inputs are available and which doesn't request more memory and CPU cores than are currently free should start executing.

P.S. Please consider upgrading cwltool; the latest version is 3.1.20240112164112

Thanks for your response.

I would like to elaborate a bit on your response.

Does cwltool check the number of available threads/cores and use that to determine how many concurrent tasks it can monitor? For example, imagine a workflow with 10 jobs that can run in parallel, each requesting 1 GB of RAM and 1 core, on an instance with 16 cores and 16 GB of RAM. Will cwltool run all 10 simultaneously (they fit on that instance based on their ramMin and coresMin requirements)? Or will it try to allocate cores and memory for itself for the purpose of monitoring, which would reduce the resources available to the tasks?

In my case I am running cwltool in a container that has 4 cores at its disposal, but the host has 128 in total. I was wondering whether that poses a limit on cwltool itself rather than on the actual tasks.
As a side note, I have tried cwl-tes (which internally uses cwltool) to submit jobs to a cluster, with identical results, although cwl-tes may be out of scope for this thread.

Yes, this is exactly what it does.

No, cwltool does not allocate memory or cores on behalf of the tools it executes.
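To make the admission rule concrete, here is a minimal, hypothetical sketch of resource-gated scheduling; the function and field names are illustrative only, not cwltool's actual implementation. It shows why, in the 10-job example above, all ten 1-core / 1 GB jobs would be admitted at once on a 16-core / 16 GB instance:

```python
def admit(pending, free_cores, free_ram_mb):
    """Start every pending job whose cores/RAM request fits the
    remaining free capacity (illustrative, not cwltool's real code)."""
    started = []
    for job in pending:
        if job["cores"] <= free_cores and job["ram_mb"] <= free_ram_mb:
            free_cores -= job["cores"]
            free_ram_mb -= job["ram_mb"]
            started.append(job["name"])
    return started

# 10 parallel-ready jobs, each asking for 1 core and 1 GB of RAM
# (i.e. coresMin: 1, ramMin: 1024 in their ResourceRequirement).
pending = [{"name": f"C{i}", "cores": 1, "ram_mb": 1024} for i in range(1, 11)]

print(admit(pending, free_cores=16, free_ram_mb=16 * 1024))
# → all 10 jobs start; nothing is reserved for the runner itself
```

With only 4 free cores, the same logic would admit just the first four jobs and start the rest as capacity is released, which matches the "two at a time" behaviour you'd expect if cwltool only sees a small CPU count.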

Okay, running cwltool inside a container is probably the cause of your issue: it is difficult for cwltool to know the utilization level of the host. Not to mention the fragility of launching containers from within a container and getting all the paths to work.

For submitting jobs to a cluster, check out toil-cwl-runner.