How to specify that an input file can be one of several formats?

I am writing some CWL wrappers for bioinformatics tools, and I’ve run into a situation that I don’t know how to represent in CWL. Some tools accept input files in one of several formats. One common example is a tool that accepts a file of aligned reads in either BAM, SAM, or CRAM format. Is there a way to represent this faithfully in CWL?

I assume you have seen the file formats lesson in the user guide. The format field of an input parameter definition accepts an array of file formats, so you can list every format the tool accepts. If you also want index files in secondaryFiles, list the name pattern for each index type and mark each one as not required.

Something like:

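cwlVersion: v1.1
$namespaces:
  edam: http://edamontology.org/
inputs:
  reads:
    type: File
    format:                  # any one of these formats is accepted
      - edam:format_2573     # SAM
      - edam:format_2572     # BAM
      - edam:format_3462     # CRAM
    secondaryFiles:
      - pattern: .bai        # BAM index, appended to the file name
        required: false
      - pattern: .crai       # CRAM index, appended to the file name
        required: false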
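
On the job side, an input object for this parameter might look like the sketch below. The file name sample.bam is hypothetical, and the format is written as a full IRI on the assumption that BAM is format_2572 in EDAM; check the ontology for the identifiers you actually need.

```yaml
# Hypothetical job file for the "reads" input above
reads:
  class: File
  path: sample.bam                              # placeholder file name
  format: http://edamontology.org/format_2572   # BAM (EDAM identifier)
```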

If you are creating CWL descriptions for bioinformatics tools, be sure to check out https://github.com/common-workflow-library/bio-cwl-tools to save yourself some time. Contributions of new descriptions are also very welcome!

Thanks! This is what I was looking for.
