I assume you provided a directory named output as input to this command line tool? If so, then InitialWorkDirRequirement behaved as expected: an empty directory named output was created in the working directory and later collected as tool output in the “split_samplesheets” array. That being said, I’m not sure you need InitialWorkDirRequirement at all for this wrapper.
The samplesheet-check.py script writes all of its output to the current directory and ignores the output subdirectory you staged. Can you pass this script a command line argument that specifies the output directory?
If the script does not accept a parameter for the output directory, then a CWL-based solution might look like:
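The tool definition originally posted here did not survive in this copy of the thread. What follows is a minimal sketch reconstructed from the explanation given further down (glob: . combined with outputEval to rename the collected directory), not the original definition. The way the script is invoked (python, with the samplesheet as the only positional argument) and the input name samplesheet are assumptions.

```yaml
cwlVersion: v1.0
class: CommandLineTool

requirements:
  # Needed because outputEval below uses a JavaScript expression
  InlineJavascriptRequirement: {}

# Assumption: the script is started via python and takes the samplesheet
# as its only positional argument
baseCommand: [python, samplesheet-check.py]

inputs:
  samplesheet:          # hypothetical input name
    type: File
    inputBinding:
      position: 1

outputs:
  split_samplesheets:
    type: Directory
    outputBinding:
      # Collect the whole working directory, which holds everything
      # the script wrote
      glob: .
      # The working directory has a runner-generated name, so rename
      # the collected Directory object to "output"
      outputEval: |
        ${
          self[0].basename = "output";
          return self[0];
        }
```

With this sketch the tool returns a single Directory object named output rather than an array of files.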
This definition produces two split_samplesheets in the current working directory.
I was wondering if there is a way to instead write the output to the directory of the input samplesheet, i.e. /data/bcl/ in this case?
ERROR Tool definition failed validation:
tools/sampleSheetCheck.cwl:3:1: Object `tools/sampleSheetCheck.cwl` is not valid because
tried `CommandLineTool` but
tools/sampleSheetCheck.cwl:32:1: the `outputs` field is not valid because
tools/sampleSheetCheck.cwl:36:3: item is invalid because
tools/sampleSheetCheck.cwl:37:5: * the `type` field is not valid because
- tried CommandOutputRecordSchema but
tools/sampleSheetCheck.cwl:38:7: * the `type` field is not valid
because
the value 'Directory' is not a valid
Record_symbol, expected 'record'
tools/sampleSheetCheck.cwl:39:7: * invalid field `items`, expected
one of: 'fields', 'type', 'label'
tools/sampleSheetCheck.cwl:37:5: - tried CommandOutputEnumSchema but
* missing required field `symbols`
tools/sampleSheetCheck.cwl:38:7: * the `type` field is not valid
because
the value 'Directory' is not a valid
Enum_symbol, expected 'enum'
tools/sampleSheetCheck.cwl:39:7: * invalid field `items`, expected
one of: 'symbols', 'type', 'label', 'outputBinding'
tools/sampleSheetCheck.cwl:37:5: - tried CommandOutputArraySchema but
tools/sampleSheetCheck.cwl:38:7: the `type` field is not valid
because
the value 'Directory' is not a valid
Array_symbol, expected 'array'
tools/sampleSheetCheck.cwl:42:5: * invalid field `outputEval`, expected one of:
'label', 'secondaryFiles', 'streamable', 'doc', 'id',
'outputBinding', 'format', 'type'
Also, please see the response to Peter for an update on the tool definition.
Regarding the outputBinding for split_samplesheet: glob: . instructs the CWL runner to collect the current working directory as output. This directory should contain all the data that the Python script has written. The working directory is created by your cwl-runner at runtime and is named with a random string of characters. This is why we use outputEval to change the basename property of the directory to a name of our choosing (in this case: “output”).
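The validation error quoted above appears to come from two placement problems: the `type` field held a nested mapping with `type: Directory` and `items`, and `outputEval` sat at the level of the output parameter instead of inside `outputBinding`. A sketch of the placement the validator accepts, assuming the output is meant to be a single Directory, could look like this (outputs section only):

```yaml
outputs:
  split_samplesheets:
    # A single directory is written as the plain type name.
    # An array of directories would instead be:
    #   type: {type: array, items: Directory}
    type: Directory
    outputBinding:
      glob: .
      # outputEval is a field of outputBinding, not of the output itself.
      # The expression requires InlineJavascriptRequirement in the tool.
      outputEval: '${ self[0].basename = "output"; return self[0]; }'
```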
What you cannot do is instruct the CommandLineTool to create output at a specific location; it is constrained to the temporary directory used at runtime. To control where the final output ends up, use the corresponding argument of your cwl-runner. For cwltool this would be --outdir myDirectory
Thank you so much Tom for the explanation.
It definitely makes better sense now.
Just one question: if the script produces files and directories (with more files inside) as output and I am interested in capturing everything from the tool output, would glob: . suffice for that scenario as well? Or does glob: "*" make more sense in this case? I am not entirely sure of the difference between the two.
The difference between glob: “*” and glob: “.” is that the first one returns an array with all the File and/or Directory objects from the output directory. The second one produces a single Directory object with all the File and/or Directory objects in its listing.
Another way of thinking about it: the contents of the “listing” of the Directory from glob: “.” are the same as the result of glob: “*”.
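To make the two shapes concrete, here is a small sketch that declares both outputs side by side in one tool. The output names as_array and as_directory, the input name, and the way the script is invoked are hypothetical and only serve the illustration.

```yaml
cwlVersion: v1.0
class: CommandLineTool

# Assumption: same hypothetical invocation as for the wrapper discussed above
baseCommand: [python, samplesheet-check.py]

inputs:
  samplesheet:          # hypothetical input name
    type: File
    inputBinding:
      position: 1

outputs:
  # glob: "*" matches the entries inside the working directory, so the
  # output is an array whose items may be File or Directory objects
  as_array:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "*"

  # glob: . matches the working directory itself, so the output is a
  # single Directory object that carries the entries in its listing
  as_directory:
    type: Directory
    outputBinding:
      glob: .
```

Which form to pick mostly depends on what the next step expects: an array of individual entries, or one directory to pass along as a whole.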