Set tool output format string from workflow inputs

We have a tool description wrapping a tool which can use multiple (defined) formats for both input and output files. We can configure the input format simply using:

inputs:
  input_path:
    label: Path to the input file
    type: File
    format:
    - edam:format_1476
    - edam:format_3816

and then defining, in the configuration YAML file, which of the two available file formats should be used:

input_path:
  class: File
  path: ligand.pdb
  format: https://edamontology.org/format_1476

For the output path we have tried to do something similar. We have created an input field for defining the output path (which is passed to the tool, so it can name the output file properly), again with two defined options for format:

inputs:
  output_info:
    label: Path and format for output
    type: string
    format:
    - edam:format_1476
    - edam:format_3816
    inputBinding:
      position: 2
      prefix: --output_path
    default: system.pdb

We then have tried to use this information to build the output arguments:

outputs:
  output_path:
    type: File
    outputBinding:
      glob: $(inputs.output_info)
    format: $(inputs.output_info.format)

But we can’t work out how to add a format field to a string object in the configuration YAML file.

Are we going about this in the correct manner? Can a string object have an associated format field, or do we need to pass the information in a different manner? We could read the input file format instead, but would like to keep the two formats independent. We could use a JavaScript expression to determine the format from the given file extension, but this feels back-to-front to me. Alternatively, is it possible to add an input argument to the tool descriptor which isn’t passed to the tool itself, but can be used by the tool descriptor for controlling the output settings?

Hello @douglowe !

format is only valid for type: File, and can’t be used with type: string.

If you want to restrict the possible output formats, then try an enum:

inputs:
  output_name:
    label: Name of the output file
    type: string
    inputBinding:
      position: 2
      prefix: --output_path
    default: system.pdb
  output_format:
   label: Format of the output file
   type: enum
   symbols:
    - edam:format_1476
    - edam:format_3816

and

outputs:
   output_pdb:
      type: File
      outputBinding:
         glob: $(inputs.output_name)
      format: $(inputs.output_format)

@mrc mr - that looks to be exactly what we need - thanks!

This example didn’t quite work, but it led me to a working example (with the help of examples such as cwltool issue #576, "Input of array of enum is not recognized by the cwltool command line input object feature"):

  output_path_format:
    label: Format for output file
    doc: |-
      Format for the output file
      Type: string
      Accepted values:
        - for pdb files:
          https://edamontology.org/format_1476
        - for mol2 files:
          https://edamontology.org/format_3816
      Default value: https://edamontology.org/format_1476 (for default system.pdb file)
    type:
      - string
      - type: array
        items:
          type: enum
          symbols:
            - edam:format_1476
            - edam:format_3816
    default: https://edamontology.org/format_1476

I’ve included a default value, but found that cwltool doesn’t seem to use the namespace when expanding it, so I have had to enter the full URI explicitly. The namespace is declared as:

$namespaces:
  edam: https://edamontology.org/

The code I posted above didn’t actually check the value of the string. I’ve tried again, and this code does work:

  output_path_format:
    label: Format for output file
    doc: |-
      Format for the output file
      Type: string
      Accepted values:
        - for pdb files:
          https://edamontology.org/format_1476
        - for mol2 files:
          https://edamontology.org/format_3816
      Default value: https://edamontology.org/format_1476 (for default system.pdb file)
    type: 
      - string
      - type: enum
        symbols:
        - edam:format_1476
        - edam:format_3816
    default: https://edamontology.org/format_1476

In the previous code I nested the enum values one layer too deep (inside an array type), so they were not used to check that the string is one of the allowed values defined in symbols. With this code the string is checked properly.

Is the string type really needed? Ideally it would be:

    type:
      type: enum
      symbols:
        - edam:format_1476
        - edam:format_3816

If I don’t specify string as the type then I get this error:

workflow.cwl:13:3: Source 'step2_babel_minimize_output_format' of type "string" is incompatible
workflow.cwl:53:7:   with sink 'output_path_format' of type {"type": "enum", "symbols":
                     ["https://edamontology.org/format_1476",
                     "https://edamontology.org/format_3816"]}

My configuration file contains this code:

step2_babel_minimize_output_format: https://edamontology.org/format_3816

And the workflow code for defining this input is:

inputs:
  step2_babel_minimize_output_format: string

Unless I can define this input as a symbol I think I have to stick with defining the type as string before then adding the enum list of acceptable values?

Ah, interesting! Your workflow input step2_babel_minimize_output_format could be set with a matching enum type, yes.

inputs:
  step2_babel_minimize_output_format:
    type: enum
    symbols:
       - edam:format_1476
       - edam:format_3816

Which would improve type checking of your workflow inputs, so I think it is worth the effort.

Ahh, yes - adding this code (with the extra type layer) does enable us to control the inputs at the workflow end of things, and it means that your suggested code for the tool descriptor works as expected:

inputs:
  step2_babel_minimize_output_format:
    type:
      type: enum
      symbols:
        - edam:format_1476
        - edam:format_3816
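
For anyone following along, here is a minimal sketch of how a workflow might wire that enum input through to the tool step. The file name babel_minimize.cwl, the tool output id output_path and the workflow output id minimized are assumptions for illustration; only the input names and enum symbols come from this thread, and I have not tested it against a particular cwltool version.

```yaml
cwlVersion: v1.0
class: Workflow

$namespaces:
  edam: https://edamontology.org/

inputs:
  # Same enum definition as the tool input, so source and sink types match.
  step2_babel_minimize_output_format:
    type:
      type: enum
      symbols:
        - edam:format_1476
        - edam:format_3816

outputs:
  minimized:                    # hypothetical workflow output id
    type: File
    outputSource: step2_babel_minimize/output_path

steps:
  step2_babel_minimize:
    run: babel_minimize.cwl     # hypothetical path to the tool descriptor
    in:
      output_path_format: step2_babel_minimize_output_format
    out: [output_path]          # assumed id of the tool's File output
```

Declaring the enum on both sides means a bad value should be rejected when the workflow inputs are validated, rather than only when the step runs.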

I made a tester and ran into some surprising cwltool behaviour: the format field isn’t getting the full EDAM URI, just the final part from the enum symbol. @douglowe, are you seeing something like "format": "https://edamontology.org/format_3816" in the logs when you run your workflow?

Or just "format": "format_3816"?

I think that you’ll need to do the following: format: https://edamontology.org/$(inputs.output_path_format)

I have to give the full URI in my input to the tool / workflow (for cwltool versions 3.0.2020… and 3.1.2021…) in order to get the correct format string when I try your tool descriptor.

In 3.0.2020… trying to use format_3816 or edam:format_3816 causes a run-time error (because these aren’t matched against the full URI). In 3.1.2021… I get format_3816 as you do.

Ahh, so perhaps we could require the user to provide only the format_3816 string, and then build the full URI in the format field that is returned.
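
A rough sketch of that idea as a complete tool descriptor, in case it helps. This is untested against any particular cwltool version, and several parts are assumptions: touch stands in for the real tool (so the --output_path prefix is dropped), the ids output_name, output_format and output_file are illustrative, and the enum symbols are written without the edam: prefix so that the user supplies only the short name.

```yaml
cwlVersion: v1.0
class: CommandLineTool

# `touch` is a stand-in for the real tool so the sketch is self-contained;
# it simply creates an empty file with the requested name.
baseCommand: touch

inputs:
  output_name:
    label: Name of the output file
    type: string
    default: system.pdb
    inputBinding:
      position: 1
  output_format:
    label: Format of the output file
    # No inputBinding here, so this value is not passed to the tool;
    # it is only used by the descriptor when describing the output.
    type:
      type: enum
      symbols:
        - format_1476   # PDB
        - format_3816   # Mol2
    default: format_1476

outputs:
  output_file:
    type: File
    outputBinding:
      glob: $(inputs.output_name)
    # Build the full EDAM URI from the short symbol supplied by the user.
    format: https://edamontology.org/$(inputs.output_format)
```

With a job file containing output_name: ligand.mol2 and output_format: format_3816, the intent is that the returned File object carries the full https://edamontology.org/format_3816 URI in its format field. Because output_format has no inputBinding, this also covers the original question about an input that steers the descriptor without being passed to the tool.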