I was wondering how cwltool determines how many cores it has available to distribute to jobs. The only lead I could find was in cwltool/cwltool/executors.py, where I see a reference to psutil.cpu_count(), but I’m not sure whether that is what governs this.
The reason I am asking is that when I submit a Slurm job requesting 1 core to one of our nodes with 128 cores and run that psutil function there, it returns 128, which is undesirable.
So that’s my main question: if I execute cwltool on a node where Slurm has allocated fewer cores than the machine has, is cwltool aware of that, and is there any way to check how many cores cwltool thinks it can distribute?
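For illustration, a minimal check along these lines shows the mismatch inside the 1-core allocation (just a sketch; SLURM_CPUS_ON_NODE is the environment variable Slurm sets for the node allocation, your site’s setup may differ):

```python
import os
import psutil

# Quick check of what is visible inside a 1-core Slurm allocation
# (a sketch; SLURM_CPUS_ON_NODE is just one of the variables Slurm may set).
print("psutil.cpu_count():      ", psutil.cpu_count())               # logical CPUs on the node, e.g. 128
print("psutil.cpu_count(False): ", psutil.cpu_count(logical=False))  # physical cores
print("SLURM_CPUS_ON_NODE:      ", os.environ.get("SLURM_CPUS_ON_NODE"))  # what Slurm actually allocated
```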
Thanks for the suggestion. We are indeed looking into toil already. One reason for us wanting to limit this with cwltool, besides respecting Slurm’s allocation, is that some of the main steps are rather I/O heavy but don’t require very many cores. On systems with high core counts where we don’t use Slurm, this can overwhelm our disks, so a “pretend you have 32 cores” option would be useful (I’m also not sure whether toil can do that in, e.g., singleMachine mode?).
Is there perhaps another way to limit concurrent jobs aside from ad-hoc modification of resource requirements that I simply missed?
You are correct: it gets the core count from psutil.cpu_count(). A cgroups-aware version of that method would certainly be welcome.
The select_resource() method in executors.py determines the number of cores as min(coresMax, max_cores), where max_cores comes from psutil.cpu_count().
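In other words, the logic boils down to something like this (a simplified sketch, not the actual cwltool code; the names are illustrative only):

```python
import psutil

# Simplified sketch of the core-selection logic described above; not the
# actual cwltool implementation, and the names are illustrative only.
def select_cores(cores_max: int) -> int:
    max_cores = psutil.cpu_count()      # machine-wide logical CPU count, e.g. 128
    return min(cores_max, max_cores)    # the job gets at most what the machine reports
```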
But it sounds like what you really want is a way to limit how many I/O intensive processes are launched at once, because they only use 1 core but you don’t want 128 of them at once. CWL doesn’t currently have a good way to do that.
I’ve encountered a similar issue with workflows that download from an external resource, where sending too many requests overwhelms the resource (or you get throttled because you’re asking for too much at once), so I definitely see the need.
I have a couple of ideas for CWL features that would help here. One would be an option specifically for scattering that limits the number of parallel scatter steps. Another idea is generic resources that act like semaphores, where a workflow step can only start once it is able to acquire a resource.
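To illustrate the semaphore idea outside of CWL itself, here is a plain-Python sketch (the limit of 4 and the step function are made up, not an existing feature):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Plain-Python sketch of the "resource as semaphore" idea; not a CWL feature.
# At most MAX_IO_JOBS I/O-heavy steps run at once, however many workers exist.
MAX_IO_JOBS = 4                                        # made-up limit
io_slots = threading.BoundedSemaphore(MAX_IO_JOBS)

def io_heavy_step(i: int) -> int:
    with io_slots:                                     # block until an I/O "slot" is free
        time.sleep(1)                                  # stand-in for the real download/step
        return i

with ThreadPoolExecutor(max_workers=128) as pool:      # plenty of CPU-side parallelism...
    results = list(pool.map(io_heavy_step, range(16))) # ...but only 4 steps do I/O at a time
```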
There is psutil.Process().cpu_affinity(), which returns a list of the CPUs available to the process. I’m not sure it’s a completely clean solution, though: on my laptop it seems to ignore hyper-threads, and on other machines it respects things like taskset, but on this particular cluster it includes the hyper-threads regardless; for example, the list is six entries long when requesting 3 cores. I don’t know whether that is specific to this cluster.
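For reference, this is the kind of comparison I mean (just a quick check, not a proposed patch):

```python
import psutil

# Compare the machine-wide count with the CPUs this process may actually use
# (e.g. after taskset or under a Slurm cgroup); just a quick check, not a patch.
machine_cpus = psutil.cpu_count()                  # e.g. 128 on the big node
allowed = psutil.Process().cpu_affinity()          # list of CPU ids the process is bound to
print(f"cpu_count():    {machine_cpus}")
print(f"cpu_affinity(): {len(allowed)} CPUs -> {allowed}")
```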
But it sounds like what you really want is a way to limit how many I/O intensive processes are launched at once, because they only use 1 core but you don’t want 128 of them at once. CWL doesn’t currently have a good way to do that.
The processes can scale with cores to a certain extent, but running too many of them simultaneously slows things down more than the parallelism gains. So indeed, as you say, it’s more that we are looking for a way to limit the number of simultaneous jobs.
One would be an option specifically for scattering that limits the number of parallel scatter steps.
I like the sound of such an option! It could give easy, fine-grained control without interfering much with the default behaviour.