Moving bio-cwl-tools forward

As discussed on a recent call, the bio-cwl-tools repository provides an opportunity to:

  1. Collect tools that are practically useful (for bioinformaticians) and
  2. Demonstrate best practice in writing CWL tools

Unfortunately the repository is not without problems. As of this writing, the oldest open pull request dates from April, and only @mrc and (more recently) I have engaged with PR authors. Engagement with the PRs also suggests that some of them are by authors relatively new to CWL and might require some effort to bring up to a “best practice” standard. These observations prompted this post, in which I would like to consider two questions:

  1. How to encourage contribution to the repository?
  2. How to ensure the quality of work being submitted?

Clearly the repository currently suffers from a lack of reviewers. Ideally members of the CWL community can be encouraged to review contributions, but this process is in turn hampered, perhaps, by the lack of clear roles in the CWL community. I also know that I have held back on reviewing pull requests to bio-cwl-tools as I did not consider myself familiar enough with “best practice” for writing CWL tools. There is something of a chicken-and-egg situation here: while I did not consider myself experienced enough to contribute (and thus deferred work to visible experts, in particular @tetron and @mrc), I was also not clear on how to gain the necessary experience.

The Turing Way Guide for Collaboration has a discussion on “personas and pathways” that might be useful in helping the CWL community clarify to itself how it imagines roles and contributions. While there is a CONTRIBUTING document, the role of a reviewer is not so clearly spelled out.

In parallel to the effort required to build a community of contributors and reviewers, care must be taken with the quality of contributions. Here again the CONTRIBUTING document offers some guidance, and the current CI system runs some validation tests. This ensures that all tools are valid CWL. Further test infrastructure might be required to check whether the tools actually work, but that would be a significant project to implement. Currently the tests provided are more akin to “example inputs”. I did consider, for a while, trying to harness the Dockstore testing infrastructure, but Dockstore is currently pinned to a very old version of cwltool and is not at all easy to use for testing CWL tools.
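
To make the gap between "valid" and "working" concrete, here is a minimal sketch of what a validation-only pass looks like. This is not the repository's actual CI configuration; the function names are hypothetical, and the only real interface assumed is the `--validate` flag of `cwltool`, which checks a document without executing it.

```python
import subprocess
from pathlib import Path


def find_cwl_files(root):
    """Return every *.cwl file under root, sorted for stable output."""
    return sorted(Path(root).rglob("*.cwl"))


def validation_command(cwl_file):
    """Build the cwltool command that checks a document without running it."""
    return ["cwltool", "--validate", str(cwl_file)]


def validate_all(root):
    """Validate each tool description and return the paths that failed.

    Assumes cwltool is installed and on PATH. A tool that passes here is
    only known to be valid CWL; nothing is executed, so this says nothing
    about whether the wrapped program produces correct output.
    """
    failures = []
    for cwl_file in find_cwl_files(root):
        result = subprocess.run(
            validation_command(cwl_file), capture_output=True, text=True
        )
        if result.returncode != 0:
            failures.append(cwl_file)
    return failures
```

Calling `validate_all(".")` from a checkout would list the tool descriptions that fail validation. Checking that a tool actually works would additionally need real input data and expected outputs for each tool, which is the significant project mentioned above.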


BTW, GitHub has some guidance on community engagement. To this end I have:

  1. Opened an issue on designing a pull request template
  2. Opened a pull request adding a link to the code of conduct.

Finally, to add to the hands needed to make bio-cwl-tools work: who can make changes to the bio-cwl-tools repository and how is that list updated? This is perhaps a question for the CommonWL leadership team?