A colleague was trying to import and use RNA-seq CWL pipelines from GitHub as an exercise in understanding non-SB created CWL content. She found issues in almost every pipeline she imported, some during import and some at run time. For example, in one case she discovered a subprocess with a required input that was not exposed. In another, she found a pipeline from a reputable center that was simply malformed (the YAML was invalid).
On the one hand, CWL is code like any other, so we can't be responsible for the quality of all the code out there, just as there is non-functional Python code on GitHub.
On the other hand, it is important for users to have a good out-of-the-box experience with CWL.
To this end, we should think about:
- Encouraging developers to adopt good development practices, such as CI, to raise the quality of their pipelines. We could offer them badges for passing, say, `cwltool --validate` or other tests.
- Offering users an online service to check the validity of a workflow on GitHub, perhaps by running `cwltool --validate` on CWL passed via URL.
- Raising awareness among research centers of the importance of the research code, including CWL, that they release publicly.
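
To make the URL-checking idea a bit more concrete, here is a minimal sketch of what the core of such a check might look like. It assumes `cwltool` is installed and on the PATH, that it can fetch the document from the URL it is given, and that a zero exit status means the document validated. The function names (`to_raw_url`, `build_command`, `validate`) and the example repository are hypothetical, made up for illustration only.

```python
import subprocess
from typing import Callable, List


def to_raw_url(url: str) -> str:
    """Turn a github.com 'blob' page URL into a raw-content URL.

    Any other URL is returned unchanged.
    """
    prefix = "https://github.com/"
    if url.startswith(prefix) and "/blob/" in url:
        owner_repo, path = url[len(prefix):].split("/blob/", 1)
        return f"https://raw.githubusercontent.com/{owner_repo}/{path}"
    return url


def build_command(url: str) -> List[str]:
    """Build the cwltool invocation for validating the CWL document at `url`."""
    return ["cwltool", "--validate", to_raw_url(url)]


def validate(url: str, run: Callable = subprocess.run) -> bool:
    """Return True if cwltool exits with status 0 for the document at `url`.

    `run` is injectable so the logic can be exercised without cwltool
    installed; by default it is subprocess.run.
    """
    result = run(build_command(url), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical repository and path, used only as an example.
    page = "https://github.com/example-org/example-repo/blob/main/workflows/rnaseq.cwl"
    print("valid" if validate(page) else "invalid")
```

A real service would also need timeouts, sandboxing, and some handling of documents that import other files, none of which is covered here. The same command could be run in a CI job to back a badge.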
Just wanted to restart this discussion.