We have a tool descriptor wrapping a tool that accepts several defined formats for both its input and output files. Configuring the input format is simple:
inputs:
  input_path:
    label: Path to the input file
    type: File
    format:
      - edam:format_1476
      - edam:format_3816
and then defining, in the configuration YAML file, which of the two available file formats should be used:
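A minimal sketch of what such a job file entry might look like, assuming a hypothetical input file named structure.pdb (the file name is illustrative, not taken from our actual setup). As far as I know, the format of a File in the job file is written as the full URI rather than with the edam: prefix:

```yaml
# Job (configuration) file sketch; structure.pdb is a hypothetical file name.
input_path:
  class: File
  path: structure.pdb
  # Full EDAM URI for the format listed as edam:format_1476 in the tool descriptor
  format: https://edamontology.org/format_1476
```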
For the output path we tried something similar. We created an input field that defines the output path (which is passed to the tool so that it can name the output file properly), again with two format options:
inputs:
  output_info:
    label: Path and format for output
    type: string
    format:
      - edam:format_1476
      - edam:format_3816
    inputBinding:
      position: 2
      prefix: --output_path
    default: system.pdb
We then tried to use this information to build the output arguments:
But we can’t work out how to add a format field to a string object in the configuration YAML file.
Are we going about this in the correct manner? Can a string object have an associated format field, or do we need to pass the information in a different way? We could read the input file format instead, but we would like to keep the two formats independent. We could use a JavaScript parser to determine the format from the given file extension, but this feels back-to-front to me. Alternatively, is it possible to add an input argument to the tool descriptor that isn’t passed to the tool itself, but can be used by the tool descriptor to control the output settings?
The above code I wrote didn’t check the value in the string. I’ve tried again, and this code does work:
output_path_format:
  label: Format for output file
  doc: |-
    Format for the output file
    Type: string
    Accepted values:
      - for pdb files:
        https://edamontology.org/format_1476
      - for mol2 files:
        https://edamontology.org/format_3816
    Default value: https://edamontology.org/format_1476 (for default system.pdb file)
  type:
    - string
    - type: enum
      symbols:
        - edam:format_1476
        - edam:format_3816
  default: https://edamontology.org/format_1476
In the previous code I embedded the enum values one layer too deep, which meant they were not used to check that the string is one of the allowed values (defined in symbols). This code does cause the string to be checked properly.
If I don’t specify string as the type then I get this error:
workflow.cwl:13:3: Source 'step2_babel_minimize_output_format' of type "string" is incompatible
workflow.cwl:53:7: with sink 'output_path_format' of type {"type": "enum", "symbols":
["https://edamontology.org/format_1476",
"https://edamontology.org/format_3816"]}
Unless I can define this input as a symbol, I think I have to stick with defining the type as string first and then adding the enum list of acceptable values?
Ahh, yes: adding this code (with the extra type layer) does enable us to control the inputs at the workflow end of things, and it means that your suggested code for the tool descriptor works as expected:
I made a tester and ran into some surprising cwltool behaviour: the format field isn’t getting the full EDAM URI, just the end part from the enum symbol. @douglowe, are you seeing something like "format": "https://edamontology.org/format_3816" in the logs when you run your workflow?
Or just "format": "format_3816"?
I think that you’ll need to do the following: format: https://edamontology.org/$(inputs.output_path_format)
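In context, the outputs section might look roughly like the sketch below. The output name output_file and the input name output_path used for the glob are assumptions for illustration; only output_path_format comes from the code earlier in the thread. Note that this only produces a valid URI if the input value arrives as the bare format_3816, not as the full URI:

```yaml
outputs:
  output_file:                      # hypothetical output name
    type: File
    # Prepend the EDAM base URI to the short symbol, e.g. format_3816
    format: https://edamontology.org/$(inputs.output_path_format)
    outputBinding:
      glob: $(inputs.output_path)   # assumes a string input called output_path
```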
For cwltool versions 3.0.2020… and 3.1.2021…, I have to give the full URI in my input to the tool / workflow in order to get the correct format string when I try your tool descriptor.
In 3.0.2020… trying to use format_3816 or edam:format_3816 causes a run-time error (because these aren’t matched against the full URI). In 3.1.2021… I get format_3816 as you do.
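For reference, a sketch of the kind of job file entries I mean, with the full URI written out. The input name output_path and the file name system.mol2 are assumptions for illustration:

```yaml
# Job file sketch: output format given as the full EDAM URI
output_path: system.mol2            # hypothetical output file name
output_path_format: https://edamontology.org/format_3816
```

With the full URI supplied like this, also prepending https://edamontology.org/ in the format expression would duplicate the prefix.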