Hi, CWL community! I’m writing to you because I would like to write a non-standard extension to CWL to represent the concept of a foreign workflow. So I would like to ask you some questions and for some advice.
I’m the main developer of WfExS-backend, a tool focused on preparing secure and reproducible workflow execution environments, which delegates the execution to existing workflow engines. One of its goals is to avoid, at almost any cost, becoming yet another workflow engine. The system currently supports both CWL and Nextflow, delegating the execution of each workflow to the corresponding engine in the staged working directory. But now we want to be able to model and run a set of causally related workflows, to cover more complex scenarios, like preparing input datasets for the main workflow or post-processing its results. In these scenarios, some outputs of one workflow feed others, and it might be necessary to iterate over the inputs, etc.
While thinking about how to tackle this, it occurred to us that we could reuse an existing workflow engine in some way. CWL came to mind when I remembered that almost any ontology can be used to add annotations to CWL, as in the workflows from SevenBridges that carry foreign annotations. After thinking about the best way to do it, I see three possible approaches:
Use hints on the tool concept (inspired by the way a DockerRequirement is described, for instance).
Use hints on the workflow concept, in the same way as for tools.
Define a foreign workflow concept, a sibling of the workflow concept, to represent workflows written in other languages.
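To make the first approach more concrete, here is a minimal Python sketch that builds, as a plain dictionary, a CWL tool description whose work is delegated to a foreign workflow through a namespaced hint, in the same spirit as DockerRequirement. The namespace URL, the `wfexs:ForeignWorkflowRequirement` hint and its `workflowLanguage`/`workflowLocation` fields are hypothetical names made up for illustration; they are not part of the CWL specification or of WfExS-backend.

```python
import json

# Hypothetical namespace for the extension; a real one would need a stable IRI.
WFEXS_NS = "https://example.org/wfexs-cwl-extension#"


def foreign_workflow_tool(language: str, location: str,
                          inputs: dict, outputs: dict) -> dict:
    """Build a CWL tool-like dict whose execution is delegated, through a
    hypothetical namespaced hint, to a workflow written in another language."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "$namespaces": {"wfexs": WFEXS_NS},
        "hints": [
            {
                # Hypothetical hint, modelled on how DockerRequirement is written.
                "class": "wfexs:ForeignWorkflowRequirement",
                "workflowLanguage": language,
                "workflowLocation": location,
            }
        ],
        # Only the abstract interface is declared in CWL terms;
        # the innards of the foreign workflow stay opaque.
        "inputs": inputs,
        "outputs": outputs,
    }


tool = foreign_workflow_tool(
    language="nextflow",
    location="https://example.org/pipelines/prepare-reference",  # placeholder
    inputs={"reference_fasta": "File"},
    outputs={"index_dir": "Directory"},
)
print(json.dumps(tool, indent=2))
```

Since a CWL runner that does not understand a hint is generally allowed to ignore it, this form would still need an executor that recognises the hint; the second and third approaches would move the same fields to the workflow level or into a new process class.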
We guess that “half” of the work is deciding how to represent this in CWL; the other half is writing and plugging an extension into an existing engine (for instance cwltool), or reusing existing code. I had a look at parts of the cwltool implementation, but it seems complicated to do it the right way.
What would you advise? How should I proceed?
Maybe we should explore your use case further before we start designing a new extension for cwltool or some other CWL runner.
I had a quick look at WfExS and, if I understand your description correctly, you want to cover a case where you have a workflow (written in CWL, for instance) that depends on other workflows written in CWL or Nextflow (I believe only these two are currently supported by WfExS)?
Are the workflows completely independent? Or do you expect steps in the workflows to share data somehow?
Have you looked at subworkflows/nested workflows in cwltool? Maybe we could test nested workflows that execute/orchestrate WfExS?
I think that if we do a few iterations detailing your use case, we may figure out either a way to add something to cwltool, or that it would be best to do it without cwltool or CWL.
WfExS-backend looks very interesting, and it seems to me to have some similarities with Sapporo Service. Both use GA4GH standards for interoperability and allow users to execute workflows written in different languages, and neither is a workflow engine. The Sapporo devs are also integrating it with Yevis, a workflow registry that provides a TRS URL (e.g.) for workflows. I guess that means WfExS should work with Yevis too? I will mention WfExS in our next APAC meeting this week.
Yes, but in a broader way. WfExS-backend is being designed to be modular, so supporting additional workflow languages or engines (like Snakemake, WDL or Galaxy), or additional software containerisation technologies, should not be difficult.
Complex analyses that are not currently automated end to end may depend on workflows written in different workflow languages (in the best case), and coordinating them requires some “glue” code or workflow. Another scenario is a workflow or tool that requires reference datasets to be pre-processed beforehand, in order to prepare the complex dataset layout it expects. In both cases, the glue workflow or setup is most of the time either done manually or through simple scripts, which can easily be forgotten or lost when their creator leaves her/his position.
So, as CWL was born to represent complex workflows, why not model the concept of an external workflow? An abstract representation of a subworkflow in a foreign workflow language would be very similar to the abstract representation of a CWL subworkflow. The only difference is that a CWL engine will not be able to validate the internals of the foreign subworkflow, as there is no guarantee that an abstract CWL representation can be derived from it.
Yes, I know about them; we have a couple of CWL workflows in our lab that use subworkflows (and a couple more that are “borrowed”). We do not want to explicitly add a step that calls WfExS-backend, because then the concept of a foreign workflow is lost. We want to separate the representation of the foreign subworkflow from the tooling used to run it (which should be “hookable”, in case more than one engine exists).
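The separation between how a foreign subworkflow is represented and which tooling runs it could be sketched in Python as a small registry of pluggable executors keyed by workflow language. `ExecutorRegistry`, the `Executor` signature and the stub engine are hypothetical names for illustration only; a real implementation would hand the staged inputs to WfExS-backend or directly to an engine instead of the stub used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical executor signature: receives the location of the foreign
# workflow and its staged inputs, returns the outputs of its CWL-level interface.
Executor = Callable[[str, Dict[str, str]], Dict[str, str]]


@dataclass
class ExecutorRegistry:
    """Hypothetical registry mapping a workflow language to the engines that
    can run it, kept in registration (preference) order."""
    executors: Dict[str, List[Tuple[str, Executor]]] = field(default_factory=dict)

    def register(self, language: str, engine: str, executor: Executor) -> None:
        self.executors.setdefault(language, []).append((engine, executor))

    def run(self, language: str, location: str, inputs: Dict[str, str],
            engine: Optional[str] = None) -> Dict[str, str]:
        candidates = self.executors.get(language, [])
        if engine is not None:
            # Honour an explicit engine choice, e.g. one coming from a hint.
            candidates = [c for c in candidates if c[0] == engine]
        if not candidates:
            raise LookupError(f"no executor registered for {language!r}")
        # Otherwise the first registered engine wins.
        _, executor = candidates[0]
        return executor(location, inputs)


def nextflow_stub(location: str, inputs: Dict[str, str]) -> Dict[str, str]:
    # Stand-in for a real engine invocation (e.g. through WfExS-backend).
    return {"index_dir": f"nextflow-run-of-{location}"}


registry = ExecutorRegistry()
registry.register("nextflow", "nextflow-stub", nextflow_stub)
outputs = registry.run("nextflow", "prepare-reference", {"reference_fasta": "ref.fa"})
print(outputs)  # {'index_dir': 'nextflow-run-of-prepare-reference'}
```

Keeping the engine choice in the registry, rather than in the workflow description, is what would let the same foreign-workflow declaration be run by different engines.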
It is worth testing. In the worst case, it will work with some tweaks.
BTW @brunokinoshita, we will look at the pull request related to the issues you uncovered around Python 3.9 as soon as possible. Due to several coincidences, we have to focus on several project milestones these days. And, obviously, all feedback is more than welcome!