Using schema.org attributes to define compatible workflow engines

alexiswl · October 2, 2023, 11:16pm

Hello,

I am writing a suite of workflows in our cwl-ica repository.

These workflows were written for ICAv1 (which is being deprecated). We are moving to ICAv2 and are also testing other CWL-compatible workflow engines such as Amazon Omics.

Some workflows will be compatible only with ICAv2, and I assume some in future will only be compatible with Omics etc.

Rather than create separate repositories for each workflow engine, I would like to know whether there is an official schema for specifying which workflow engines are appropriate for a given workflow.

# Extensions
$namespaces:
  s: https://schema.org/
$schemas:
  - https://schema.org/version/latest/schemaorg-current-http.rdf

# Metadata
s:compatibleWorkflowEngines:
  - cwltool.local
  - https://ica.illumina.com/ica/rest
  - omics
  - toil

For dragen-specific workflows, https://ica.illumina.com/ica/rest may be the only compatible workflow engine.

Our catalogue (see example) would then be able to denote which engines are compatible by scraping the workflow.
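
A minimal sketch of that scraping step, assuming the workflow file has already been loaded into a Python dict by a YAML parser. The function name `listed_engines` and the sample dicts are illustrative only and not part of any existing tool:

```python
from typing import Any


def listed_engines(doc: dict[str, Any],
                   key: str = "s:compatibleWorkflowEngines") -> list[str]:
    """Return the engine identifiers listed under `key`, or [] if the key is absent."""
    value = doc.get(key, [])
    # Accept a bare string as well as a list, since either is easy to write in YAML.
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


# The metadata from this post, as a YAML parser would load it.
workflow = {
    "class": "Workflow",
    "s:compatibleWorkflowEngines": [
        "cwltool.local",
        "https://ica.illumina.com/ica/rest",
        "omics",
        "toil",
    ],
}

# A DRAGEN-specific workflow that lists a single engine as a bare string.
dragen_workflow = {
    "class": "Workflow",
    "s:compatibleWorkflowEngines": "https://ica.illumina.com/ica/rest",
}

print(listed_engines(workflow))
print(listed_engines(dragen_workflow))
```

Passing the key as a parameter keeps the catalogue code unchanged if the metadata term is later renamed or moved to another namespace.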

I would not want anyone to come across our workflows on the likes of GitHub or Dockstore only to realise after testing that the workflow is not compatible with their setup.

Is there a best practice for documenting this?

Would be keen to hear any other solutions / workarounds people have come across as well.

tetron · December 7, 2023, 4:03pm

A few thoughts:

schema.org is a specific ontology, so you shouldn’t just make up terms. But you can make up your own prefix like this:

$namespaces:
  umccr: https://mdhs.unimelb.edu.au/cwl/

Instead of “compatible workflow engines”, I would recommend calling it something like “tested workflow engines”. I’d also recommend using full URLs that are meaningful to other people:

umccr:testedWorkflowEngines:
  - https://github.com/common-workflow-language/cwltool
  - https://ica.illumina.com/ica/rest
  - https://github.com/DataBiosphere/toil
  - omics  # IDK exactly which service this refers to, which is why full URLs are good

While it makes practical sense to list which workflow engines have been tested, CWL workflows ought to be portable unless (a) the engine doesn’t support a standard CWL feature used by the workflow or (b) the workflow has special non-standard requirements (e.g. FPGA support).

In the second case, you should be able to infer from the requirements section which workflows need that special support and you can compare that with a list of which engines support that special requirement.
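
A rough sketch of that comparison, assuming the workflow is available as a parsed dict with its tools embedded under `run`. The requirement class `ex:FpgaRequirement`, the engine names `engine-a` and `engine-b`, and the support table are all hypothetical placeholders, not statements about any real engine:

```python
from typing import Any


def _classes(section: Any) -> set[str]:
    """Requirement class names from either the list form or the map form."""
    if isinstance(section, dict):
        return set(section)
    return {entry["class"] for entry in section or []}


def required_classes(process: dict[str, Any]) -> set[str]:
    """Collect requirement classes from a process and any embedded step tools."""
    found = _classes(process.get("requirements"))
    steps = process.get("steps", [])
    step_list = steps.values() if isinstance(steps, dict) else steps
    for step in step_list:
        found |= _classes(step.get("requirements"))
        run = step.get("run")
        # Only embedded processes are inspected; a string is a file reference.
        if isinstance(run, dict):
            found |= required_classes(run)
    return found


def engines_supporting(needed: set[str], support: dict[str, set[str]]) -> list[str]:
    """Names of engines whose supported classes cover everything needed."""
    return sorted(name for name, classes in support.items() if needed <= classes)


workflow = {
    "class": "Workflow",
    "requirements": [{"class": "SubworkflowFeatureRequirement"}],
    "steps": {
        "align": {
            "run": {
                "class": "CommandLineTool",
                "requirements": {
                    "ex:FpgaRequirement": {},  # hypothetical non-standard requirement
                    "DockerRequirement": {"dockerPull": "example/aligner"},  # placeholder image
                },
            },
        },
        "report": {"run": "report.cwl"},  # referenced by path, so not inspected
    },
}

# Hypothetical support table; real data would have to come from each engine's documentation.
support = {
    "engine-a": {"DockerRequirement", "SubworkflowFeatureRequirement"},
    "engine-b": {"DockerRequirement", "SubworkflowFeatureRequirement", "ex:FpgaRequirement"},
}

needed = required_classes(workflow)
print(engines_supporting(needed, support))
```

Steps that reference a tool by file path are skipped here, so any requirement declared only inside such a file would be missed unless that file is loaded and embedded first.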

alexiswl · December 8, 2023, 6:04am

This isn’t always possible if the requirements are defined at the tool level.

The new namespace makes sense, though. Does https://mdhs.unimelb.edu.au/cwl/ have to be a valid site? If so, could it point to a file on GitHub?

alexiswl · December 8, 2023, 6:05am

Omics refers to Amazon Omics; see the announcement “Amazon Omics now supports Common Workflow Language”.

mrc · December 8, 2023, 8:42am

Stable identifiers for some of the engines mentioned:

1. cwltool is RRID:SCR_015528 a.k.a. https://identifiers.org/RRID/RRID:SCR_015528
2. Toil is RRID:SCR_024391 a.k.a. https://identifiers.org/RRID/RRID:SCR_024391
3. Arvados is RRID:SCR_002223 a.k.a. https://identifiers.org/RRID/RRID:SCR_002223
4. CWL-Airflow is RRID:SCR_017196 a.k.a. https://identifiers.org/RRID/RRID:SCR_017196
5. Galaxy is RRID:SCR_006281 a.k.a. https://identifiers.org/RRID/RRID:SCR_006281

tetron · December 8, 2023, 10:38pm

No, it doesn’t have to be real. However, it is nice if it does load a page that describes the item being identified.

Do we link to these RRIDs anywhere on commonwl.org? It feels like something that could be mentioned on the “Implementations | Common Workflow Language (CWL)” page.

Regrettably, this has a similar problem to the EDAM ontology (and many other ontologies): the numeric identifiers are extremely user-unfriendly. Maybe we should have a way to define URI aliases in CWL?

It’s possible something like this already works, though I haven’t tried it:

$namespaces:
  umccr: https://mdhs.unimelb.edu.au/cwl/
  cwltool_engine: https://identifiers.org/RRID/RRID:SCR_015528

umccr:testedWorkflowEngines:
  - "cwltool_engine:"
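
To make the intended expansion concrete, here is a small sketch of prefix resolution against `$namespaces`. It only illustrates what the alias is meant to expand to and does not confirm that cwltool or schema-salad resolves list values this way; the function name `expand` is hypothetical:

```python
def expand(term: str, namespaces: dict[str, str]) -> str:
    """Replace a known `prefix:` with its namespace URI; leave other terms unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return term


namespaces = {
    "umccr": "https://mdhs.unimelb.edu.au/cwl/",
    "cwltool_engine": "https://identifiers.org/RRID/RRID:SCR_015528",
}

# The alias with an empty local part expands to the bare RRID URL.
print(expand("cwltool_engine:", namespaces))
# A prefixed term gets its local name appended to the namespace URI.
print(expand("umccr:testedWorkflowEngines", namespaces))
# "https" is not a declared prefix, so full URLs pass through untouched.
print(expand("https://github.com/DataBiosphere/toil", namespaces))
```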
