tate
February 4, 2024, 12:49am
@tetron thank you for your feedback on this. I’ve opened a PR as requested:
common-workflow-language:main ← AlexTate:on-error-abort (opened 12:41 AM, 04 Feb 2024 UTC)
### Summary
This pull request introduces the new choice `kill` for the `--on-error` parameter.
### Motivation
There currently isn't a way to have cwltool immediately stop parallel jobs when one of them fails. One might expect `--on-error stop` to accomplish this, but its help string is specific and accurate: "do not submit any more steps". Because a scatter or a subworkflow is treated as a single "step" within the parent workflow, cwltool is not wrong to wait for the remaining scatter jobs to finish under `--on-error stop`. However, individual scatter jobs sometimes take a long time to complete, so if one of them fails early, cwltool may wait a long time for the others to finish before terminating the workflow. With `--on-error kill`, all running jobs are promptly notified and terminate themselves as soon as one job fails.
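The fail-fast behaviour described above can be illustrated with a minimal sketch, assuming parallel jobs run in threads and periodically poll a shared `threading.Event`. This is not cwltool's actual implementation: `run_scatter_job`, `run_scatter`, `JobKilled`, and the step and timing values are hypothetical, and the "jobs" only sleep rather than launch processes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class JobKilled(Exception):
    """Raised when a job stops because another job failed (hypothetical)."""


def run_scatter_job(index: int, kill_switch: threading.Event, fail_index: int) -> str:
    """Simulate one long-running scatter job that polls a shared kill switch.

    Hypothetical stand-in for a real job: it only sleeps, it launches nothing.
    """
    for step in range(100):  # roughly 1 second of simulated work
        if kill_switch.is_set():
            # Another job failed: stop instead of running to completion.
            raise JobKilled(f"job {index} stopped early")
        if index == fail_index and step == 2:
            kill_switch.set()  # notify every other running job
            raise RuntimeError(f"job {index} failed")
        time.sleep(0.01)
    return "finished"


def run_scatter(n_jobs: int, fail_index: int) -> dict:
    """Run all jobs in parallel threads that share one Event; collect outcomes."""
    kill_switch = threading.Event()
    outcomes = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = {
            i: pool.submit(run_scatter_job, i, kill_switch, fail_index)
            for i in range(n_jobs)
        }
        for i, future in futures.items():
            try:
                outcomes[i] = future.result()
            except JobKilled:
                outcomes[i] = "killed"
            except RuntimeError:
                outcomes[i] = "failed"
    return outcomes


if __name__ == "__main__":
    print(run_scatter(4, fail_index=1))
```

Running it should report job 1 as failed and the other jobs as killed after a few tens of milliseconds, instead of waiting about a second for them to finish. The cooperative polling design means a job can only stop at a check point, so real jobs would need to check the flag (or be signalled) often enough for the shutdown to feel immediate.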
### Forum Post
https://cwl.discourse.group/t/how-to-fail-fast-during-parallel-scatter/868
Regarding shallow copies of RuntimeContext, I did find that `threading.Event` can’t be pickled (which makes sense), so as long as there are no plans to make cwltool multiprocess, we should be good.
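To illustrate both points, here is a small sketch using a hypothetical `ToyRuntimeContext` class in place of cwltool's real `RuntimeContext` (only an Event attribute is modeled, and the attribute name `kill_switch` is an assumption). A shallow copy shares the same Event object, so a signal set through one copy is visible through the other, while pickling the object fails because the Event wraps a lock. Passing such a context to worker processes would typically require pickling it, which is why a multiprocess design would run into trouble.

```python
import copy
import pickle
import threading


class ToyRuntimeContext:
    """Hypothetical stand-in for cwltool's RuntimeContext; only the kill switch is modeled."""

    def __init__(self) -> None:
        self.kill_switch = threading.Event()


def shares_event_after_shallow_copy() -> bool:
    parent = ToyRuntimeContext()
    child = copy.copy(parent)  # shallow copy: attributes are shared, not duplicated
    parent.kill_switch.set()
    # The child sees the signal because both copies point at the same Event object.
    return child.kill_switch is parent.kill_switch and child.kill_switch.is_set()


def context_is_picklable() -> bool:
    try:
        pickle.dumps(ToyRuntimeContext())
    except TypeError:
        # The Event wraps a Condition built on a lock, and locks cannot be pickled.
        return False
    return True


if __name__ == "__main__":
    print(shares_event_after_shallow_copy(), context_is_picklable())
```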