Packaging a CWL workflow as a tool

I have a CWL workflow that I would like to use as a tool (and deploy within Galaxy). How should I package this tool? (BTW I am happy for the tool to run on a single node) Options that I have considered are:

conda: Make a conda build script that installs all the tool dependencies (SoftwareRequirements and cwltool itself), copies the CWL to $PREFIX/share and runs via a wrapper script that calls cwltool.

Dockstore: Make a container (perhaps based on the conda above) and package together with the workflow. The Dockstore documentation talks about wrapping CWL CommandLineTools, not workflows though.

WorkflowHub.EU: While one can download a RO Crate from this service it is not clear how to use this for executable tool or workflow distribution.

For a self-contained tool that will run inside of Galaxy, your conda approach is the way to go. Dockstore and WorkflowHub both require a platform with native CWL support. Hopefully Galaxy CWL support will land in mainline one day, but to run in Galaxy today I believe you will need a lightweight Galaxy tool wrapper to invoke your workflow. The Galaxy wrapper could consist of calling cwltool with your workflow, or you can make the workflow executable:

  • install cwlref-runner (this simply installs cwl-runner as an alias for cwltool)
  • put #!/usr/bin/env cwl-runner at the top of the workflow
  • set +x on the workflow file
  • now you can invoke the workflow directly
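
The steps above could be scripted, for example as part of a conda build script. This is only a minimal sketch: the function name `make_workflow_executable` is hypothetical, and it assumes that `cwl-runner` (from cwlref-runner) is on the PATH when the workflow is later invoked.

```python
import stat
from pathlib import Path

SHEBANG = "#!/usr/bin/env cwl-runner\n"


def make_workflow_executable(workflow_path):
    """Prepend the cwl-runner shebang (if no shebang is present) and set +x."""
    path = Path(workflow_path)
    text = path.read_text()
    # Only add the shebang once, so the function is safe to call repeatedly.
    if not text.startswith("#!"):
        path.write_text(SHEBANG + text)
    # Equivalent of `chmod +x`: add the execute bit for user, group and others.
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

After calling `make_workflow_executable("workflow.cwl")` (file name is a placeholder) the workflow can be started as `./workflow.cwl job.yml`.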

Workflow Hub does not require CWL - in fact Workflow Hub does not aim to do any execution itself, although where possible it will link to execution platforms.

For instance, in EOSC-Life we are trying to set up automated testing of workflows, where the Life Monitor will consume the RO-Crate and look for tests to execute. We have not yet formalized how those are specified, as they should work the same for non-CWL workflows - see discussion on using intermediate JSON for now.

Given an RO-Crate with many files you can find the “main” workflow by looking for the mainEntity property, e.g. from https://dev.workflowhub.eu/workflows/132/ro_crate?version=1 (a Galaxy workflow)

{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "1 - read pre-processing",
      "url": "https://github.com/galaxyproject/SARS-CoV-2/blob/e9cc477fc1a754eaa85d061fb28607eee3ba6d06/genomics/deploy/workflows/1-PreProcessing.ga",
      "mainEntity": {
        "@id": "1-PreProcessing.ga"
      },
      "hasPart": [
        {
          "@id": "1-PreProcessing.ga"
        },
        {
          "@id": "pp_wf.png"
        },
        {
          "@id": "all_covid_tools.yaml"
        },
        {
          "@id": "README.md"
        }
      ]
    },
    {
      "@id": "1-PreProcessing.ga",
      "@type": [
        "File",
        "SoftwareSourceCode",
        "Workflow"
      ],
      "programmingLanguage": {
        "@id": "#galaxy"
      }
    },
    {
      "@id": "#galaxy",
      "@type": "ComputerLanguage",
      "name": "Galaxy",
      "identifier": {
        "@id": "https://galaxyproject.org/"
      },
      "url": {
        "@id": "https://galaxyproject.org/"
      }
    }
  ]
}
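
The mainEntity lookup described above can be sketched in Python. This is an illustration rather than an official RO-Crate API: the function names are hypothetical, and the sketch assumes the root dataset has the `@id` `"./"` as in the excerpt, and that the metadata file is called `ro-crate-metadata.json`.

```python
import json


def load_crate(path="ro-crate-metadata.json"):
    """Load the RO-Crate metadata file (file name is an assumption)."""
    with open(path) as handle:
        return json.load(handle)


def find_main_workflow(crate):
    """Return the entity referenced by the root dataset's mainEntity, or None."""
    # Index all entities in the flattened @graph by their @id.
    entities = {e["@id"]: e for e in crate.get("@graph", []) if "@id" in e}
    root = entities.get("./")  # assumes the root dataset is "./"
    if root is None:
        return None
    main = root.get("mainEntity")
    if not isinstance(main, dict):
        return None
    return entities.get(main.get("@id"))
```

For the crate above, `find_main_workflow(load_crate())["@id"]` would give `1-PreProcessing.ga`.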

At the moment this is just a convention used when uploading from a GitHub repository. Note that the Workflow Hub UI, when given a single CWL file, will currently not recursively add its file dependencies (e.g. neighbouring *.cwl tools) to the RO-Crate, so native CWL registrations should primarily be done from GitHub.

CWL can also be used in Workflow Hub to provide the abstract workflow of a non-CWL workflow, using the CWL 1.2 Operation type - in that case it is a single sibling file to the native workflow.

Of course one additional challenge here is identifying the workflow languages programmatically - CWL luckily has permalinks like https://w3id.org/cwl/v1.1/
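
As a rough sketch of such a check, one could test whether a ComputerLanguage entity points at one of the CWL permalinks. This assumes that the crate records the permalink under `identifier` (as the `#galaxy` entity above does with its own URL); the function name is hypothetical.

```python
CWL_PREFIX = "https://w3id.org/cwl/"


def is_cwl_language(language_entity):
    """Guess whether a ComputerLanguage entity describes CWL (any version)."""
    identifier = language_entity.get("identifier")
    # The identifier may be a {"@id": ...} reference or a plain string.
    if isinstance(identifier, dict):
        identifier = identifier.get("@id")
    return isinstance(identifier, str) and identifier.startswith(CWL_PREFIX)
```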

Another is that you will need some set of inputs (especially for automated testing) - like a job file. With cwltool you can actually make job files executable as well:

curl-get-many-job.yml:

#!/usr/bin/env cwltool
cwl:tool: ../../tools/curl-get-many.cwl
urls: 
 - https://example.com/f1
 - https://example.org/f2
 - https://example.net/f3

However, I don’t think that cwl:tool trick works with other cwl-runner implementations, so usually you need to pair the workflow engine, the workflow definition and the workflow inputs, and potentially particular engine options. There is a dangerous possibility that this could also be described as an outer CWL tool that executes the CWL engine…!

(Much simpler: You could instead make an outer CWL workflow that just binds all the inputs)
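
If you do want to record the pairing of engine, workflow, inputs and engine options explicitly, a small data structure is enough. The sketch below is hypothetical (the `WorkflowRun` class is not part of any CWL tooling) and only assumes the usual `engine [options] workflow job` command-line order that cwltool uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowRun:
    """One concrete pairing of engine, workflow definition and inputs."""
    workflow: str
    job: str
    engine: str = "cwltool"
    engine_options: List[str] = field(default_factory=list)

    def argv(self):
        """Build the command line: engine, options, workflow, then job file."""
        return [self.engine, *self.engine_options, self.workflow, self.job]
```

For example, `WorkflowRun("../../tools/curl-get-many.cwl", "curl-get-many-job.yml").argv()` gives a list that can be passed to `subprocess.run`.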

For your run-CWL-from-Galaxy case you could also consider using cwltool --pack first to turn the workflow project into a single CWL file (which arguably is no longer human-editable) - otherwise you will need to preserve the original structure with its tool imports etc.
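
A minimal sketch of the packing step, assuming cwltool is installed and prints the packed document to standard output. The function name and the file names are placeholders, and the `run` parameter only exists so the call can be swapped out in tests.

```python
import subprocess


def pack_workflow(workflow_path, packed_path, run=subprocess.run):
    """Write the output of `cwltool --pack <workflow>` to packed_path."""
    result = run(
        ["cwltool", "--pack", workflow_path],
        check=True,           # raise if cwltool exits with an error
        capture_output=True,  # the packed document is read from stdout
        text=True,
    )
    with open(packed_path, "w") as handle:
        handle.write(result.stdout)
    return packed_path
```

Calling `pack_workflow("workflow.cwl", "workflow-packed.cwl")` would then leave a single file to ship with the Galaxy wrapper.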