To aid development and debugging of our CWL workflows, I have been using a library that I built called pluto, located here;
Its key features mostly relate to the test suite classes and how they are used:
- PlutoTestCase class that provides a temporary directory to execute your CWL in, and a large number of convenience methods for validating common workflow outputs (example usage here)
- PlutoPreRunTestCase, which is the same as the above but designed to allow for running a CWL pipeline once and then making multiple test assertions in parallel off the cached result (example here)
- ODir classes to represent CWL output objects in condensed Python code (examples here). These also have methods to convert CWL output back into this same Python code, so you can run your CWL, save the JSON stdout, then pipe it back through this lib and get the Python code needed to recreate it as a fixture in your test cases (with many fewer lines of code than the raw JSON representation).
- Execution methods can be adjusted from the command line via environment variables, for example to execute with cwltool or Toil, run locally or with LSF on the HPC, print the raw cwltool / Toil commands used, prevent tmp dir deletion, etc.
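
To give a rough idea of the general pattern behind the first and last points, here is a minimal sketch. It is not pluto's actual code: the environment variable names `CWL_ENGINE` and `KEEP_TMP`, the `build_command` helper, and the `TmpDirTestCase` / `assertLineCount` names are all hypothetical, made up for illustration. It only assumes the standard library `unittest` module and the `--outdir` option of `cwltool` and `toil-cwl-runner`.

```python
import os
import shutil
import tempfile
import unittest

# Hypothetical environment variables (illustration only, not pluto's names).
ENGINE = os.environ.get("CWL_ENGINE", "cwltool")   # "cwltool" or "toil"
KEEP_TMP = os.environ.get("KEEP_TMP", "") != ""    # any non-empty value keeps tmp dirs


def build_command(cwl_file, input_json, outdir, engine=ENGINE):
    """Return the runner command (as an argument list) for the chosen engine."""
    if engine == "cwltool":
        return ["cwltool", "--outdir", outdir, cwl_file, input_json]
    if engine == "toil":
        return ["toil-cwl-runner", "--outdir", outdir, cwl_file, input_json]
    raise ValueError("unknown engine: {}".format(engine))


class TmpDirTestCase(unittest.TestCase):
    """Base class that gives every test its own temporary directory."""

    def setUp(self):
        self.tmpdir = tempfile.mkdtemp()
        if not KEEP_TMP:
            # Remove the directory after the test unless asked to keep it.
            self.addCleanup(shutil.rmtree, self.tmpdir, ignore_errors=True)

    def assertLineCount(self, path, expected):
        """Convenience assertion: the file has the expected number of lines."""
        with open(path) as handle:
            count = sum(1 for _ in handle)
        self.assertEqual(count, expected)


class ExampleTest(TmpDirTestCase):
    def test_engine_switch(self):
        cmd = build_command("workflow.cwl", "input.json", self.tmpdir, engine="toil")
        self.assertEqual(cmd[0], "toil-cwl-runner")

    def test_line_count(self):
        # Stand-in for a workflow output file written into the tmp dir.
        path = os.path.join(self.tmpdir, "output.txt")
        with open(path, "w") as handle:
            handle.write("a\nb\n")
        self.assertLineCount(path, 2)
```

In a real test the command list would be passed to something like `subprocess.run` and the assertions made on the files it writes; saved as a module, the sketch can be run with `python -m unittest`, and setting `KEEP_TMP=1` leaves the directories in place for inspection.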
A full working example of all this can be found here;
pluto is (currently) built on the base Python
unittest framework but I have recently shifted over to executing it with
All this has been pretty important in ensuring that our pipelines do not drift or break during development, and that things like important edge-cases that we have programmed the workflow to handle do not get lost over time with new changes being made.
Since my own implementation here has grown pretty large over time, I have been keeping an eye out for alternative frameworks that handle some of this out-of-the-box. In particular, I have been looking at pytest-workflow (see the pytest-workflow 1.6.0 documentation).
I like how pytest-workflow allows you to write your test cases as YAML, though I need to spend more time looking into it to determine how best to handle the dynamically-adjusted CLI settings that I end up using a lot. I also need to look into how I could hook in custom convenience methods for file validation.
I am assuming that the CWL dev team already has some sort of system they use internally for the same purposes, but I am not sure if it's available for use outside of the team's source repos. It would be great to figure out how everyone else is handling these things, to avoid too much duplication of work.
If pluto is useful to others, I am definitely interested in making it more generic and easier to use for others' use cases. I am also open to others using ideas and implementations from it in other frameworks.
Let me know what you guys think and what your preferred methods for handling CWL dev and testing look like.