I need to pass the initial directory as input to a later step of the workflow, and that step needs to find all files inside its subdirectories. I’ve tried this:
Tool definition failed validation:
GermlineCNVCaller-scattered-workflow-0.cwl:41:11: Source 'files' of type {"items": {"type": "array", "items": "File"}, "type": "array"} is incompatible
GermlineCNVCaller-scattered-workflow-0.cwl:52:7: with sink 'my_file' of type {"type": "array", "items": "File"}
That is because the source is recognized as a nested array (an array of arrays of File), while the sink expects a flat array of File. How can I work around it?
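One possible workaround, sketched here under assumptions rather than taken from the original workflow, is to put a small ExpressionTool between the two steps that flattens the nested array. The input name `nested` and output name `flat` are hypothetical; only the types mirror the error message above.

```yaml
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  nested:            # hypothetical name; array of arrays of File, as in the error
    type:
      type: array
      items:
        type: array
        items: File
outputs:
  flat:              # hypothetical name; flat array that a File[] sink accepts
    type: File[]
expression: |
  ${
    // Concatenate each inner array onto one flat array.
    var flat = [];
    for (var i = 0; i < inputs.nested.length; i++) {
      flat = flat.concat(inputs.nested[i]);
    }
    return {"flat": flat};
  }
```

The step that currently produces `files` would feed `nested`, and the step with the `my_file` input would take `flat` as its source instead.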
Dear CWL community,
can someone kindly pick up on this?
I’m trying to achieve something very similar, but have so far failed to find a solution, likely due to my limited knowledge of CWL and JavaScript.
Just to rephrase: I’d like to take a directory as input (step 0), pass the subdirectories of that directory to step 1, which passes the files of the subdirectories to step 2, which runs a CommandLineTool on an array of files.
I’ve tried to upgrade the above example to CWL v1.2 and added the LoadListingRequirement to zeroth_step and first_step.
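For illustration, a minimal sketch of the "directory in, files out" part in CWL v1.2 could look like the following. It is not the example referred to above: the names `parent_dir`, `files` and `collect` are hypothetical, and it assumes that a flat list of every file under the directory is what the downstream step wants.

```yaml
cwlVersion: v1.2
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
  LoadListingRequirement:
    loadListing: deep_listing   # populate listing for nested directories too
inputs:
  parent_dir:                   # hypothetical input name
    type: Directory
outputs:
  files:                        # hypothetical output name
    type: File[]
expression: |
  ${
    // Walk the listing recursively and collect every File entry.
    function collect(dir, acc) {
      var entries = dir.listing || [];
      for (var i = 0; i < entries.length; i++) {
        if (entries[i].class === "File") {
          acc.push(entries[i]);
        } else if (entries[i].class === "Directory") {
          collect(entries[i], acc);
        }
      }
      return acc;
    }
    return {"files": collect(inputs.parent_dir, [])};
  }
```

To keep the per-subdirectory grouping instead, the same walk could return one inner array per subdirectory, with the output type changed to an array of File arrays.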
That is very helpful, @Brilator, thank you for the specifics.
Do you have to start with a directory? CWL was designed for specific inputs and, as you can see, was not really designed for teasing apart complex directories (though it is possible).
I recommend making your workflow use specific inputs and adding the directory parsing later (if you still need that).
For example, make the workflow for a single sample. Then add a scatter for multiple samples, switching some or all of the inputs to arrays. Then add the Directory parsing “step 0” if you really need it.
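A rough sketch of the second stage of that progression, scattering a single-sample tool over several samples, is shown below. Everything specific here is an assumption: `process_sample.cwl` is a hypothetical single-sample tool with a `File[]` input called `my_file` and a `File` output called `result`, and `sample_files` is a hypothetical workflow input holding one array of files per sample.

```yaml
cwlVersion: v1.2
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  sample_files:                 # hypothetical: one inner File array per sample
    type:
      type: array
      items:
        type: array
        items: File
outputs:
  results:
    type: File[]                # one result per scattered job
    outputSource: per_sample/result
steps:
  per_sample:
    run: process_sample.cwl     # hypothetical single-sample tool
    scatter: my_file            # each job receives one inner File array
    in:
      my_file: sample_files
    out: [result]
```

Because each scattered job receives one inner array, the tool itself only ever sees a flat `File[]`, which avoids the nested-array mismatch at that step.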
Ok, I see. Well, I don’t have to start with the parent directory. I was just hoping to cover the full analysis in CWL, working with the input data as-is.
The alternative would be to write bash for loops to go through the nested directories (which again I could wrap as a CWL workflow step).