Software & libraries to help with development and testing of CWL workflows

steve · December 13, 2022, 9:34pm

To aid development and debugging of our CWL workflows, I have been using a library I built called pluto, located here:

Its key features are mostly the test suite classes and the ways they can be used:

- PlutoTestCase class, which provides a temporary directory to execute your CWL in and a large number of convenience methods for validating common workflow outputs (example usage here)
- PlutoPreRunTestCase, which is the same as the above but designed for running a CWL pipeline once and then making multiple test assertions in parallel off the cached result (example here)
- OFile and ODir classes to represent CWL output objects in condensed Python code (examples here). They also have methods to convert CWL output back into this same Python code, so you can run your CWL, save the JSON stdout, then pipe it back through this lib and get the Python code needed to recreate it as a fixture in your test cases (with far fewer lines of code than the raw JSON representation).
- Execution methods can be adjusted from the command line via environment variables, for example to execute with cwltool or Toil, run locally or with LSF on the HPC, print the raw cwltool / Toil commands used, prevent tmp dir deletion, etc.
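To make the condensed-output idea concrete, here is a minimal sketch of how a CWL `File` output object could be round-tripped through a small Python class. This is not pluto's `OFile`; the `OutFile` class and its method names are hypothetical and written only for illustration. The only thing taken as given is the standard shape of a CWL `File` object (`class`, `basename`, `location`, `path`, `size`, `checksum`).

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutFile:
    """Hypothetical condensed stand-in for a CWL File output (NOT pluto's OFile)."""
    name: str
    size: Optional[int] = None
    checksum: Optional[str] = None  # CWL checksums look like "sha1$<hex digest>"

    def to_cwl(self, output_dir: str) -> dict:
        """Expand back into the verbose CWL File dict for a given output directory."""
        path = os.path.join(output_dir, self.name)
        obj = {
            "class": "File",
            "basename": self.name,
            "location": "file://" + path,
            "path": path,
        }
        if self.size is not None:
            obj["size"] = self.size
        if self.checksum is not None:
            obj["checksum"] = self.checksum
        return obj

    @classmethod
    def from_cwl(cls, obj: dict) -> "OutFile":
        """Condense a CWL File dict, dropping the run-specific directory."""
        return cls(
            name=obj["basename"],
            size=obj.get("size"),
            checksum=obj.get("checksum"),
        )


# One entry as it might appear in the JSON a CWL runner prints to stdout
raw = {
    "class": "File",
    "basename": "out.txt",
    "location": "file:///tmp/run1/out.txt",
    "path": "/tmp/run1/out.txt",
    "size": 12,
    "checksum": "sha1$abc",
}

condensed = OutFile.from_cwl(raw)
# repr() of a dataclass is valid Python source, so it can be pasted into a test as a fixture
fixture_source = repr(condensed)
```

Because the temporary directory is supplied only when expanding with `to_cwl`, the same fixture can be compared against runs that execute in different temp dirs.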

A full working example of all this can be found here:

https://github.com/mskcc/pluto-cwl/blob/71b8b5bcade424d818e0ff76e37240e628772416/tests/test_samples_fillout_index_batch_workflow_cwl.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Test case for the samples_fillout_index_batch_workflow cwl

example command:
$ CWL_ENGINE=Toil PRINT_COMMAND=T KEEP_TMP=T pytest -n 8 -s tests/test_samples_fillout_index_batch_workflow_cwl.py
"""
import os
import sys
from typing import Dict, Tuple
from datasets import (
    DATA_SETS,
)
from pluto import (
    CWLFile,
    PlutoTestCase,
    PlutoPreRunTestCase,
    OFile
)

This file has been truncated.

pluto is (currently) built on the base Python unittest framework but I have recently shifted over to executing it with pytest.

All this has been pretty important in ensuring that our pipelines do not drift or break during development, and that the important edge cases we have programmed the workflow to handle do not get lost as new changes are made.

Since my own implementation here has grown pretty large over time, I have been keeping an eye out for alternative frameworks that handle some of this out-of-the-box. In particular, I have been looking at pytest-workflow (pytest-workflow 1.6.0 documentation).

I like how pytest-workflow allows you to write your test cases as YAML, though I need to spend more time looking into it to determine how best to handle the dynamically-adjusted CLI settings that I end up using a lot. I also need to look into how I could hook in custom convenience methods for file validation.

I am assuming that the CWL dev team already has some sort of system they use internally for the same purposes, but I am not sure if it's available for use outside of the team’s source repos. It would be great to figure out how everyone else is handling these things, to avoid too much duplication of work.

If pluto is useful to others, I am definitely interested in making it more generic and easier for others to use. I am also open to others using ideas and implementations from it in other frameworks.

Let me know what you guys think and what your preferred methods for handling CWL dev and testing look like.

mrc · December 14, 2022, 12:19pm

This is great! Maybe you can make an informal presentation at an upcoming call (perhaps in January)?
https://www.commonwl.org/community/#when (Mondays at 16:00 UTC)

Or you could make a more formal presentation at the upcoming CWL Conference 2023; proposals for 5, 10, or 15 minute talks are due by January 16th: https://survey.bio-it.embl.de/313219?lang=en