Hello,
I am working on a tool definition that requires specifying an input parameter for the outputs location.
I have written the following tool definition, which works as expected when tested with cwltool:
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: umccr/alpine_pandas:1.0.1
requirements:
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.outputDir)
        writable: true
baseCommand: []
inputs:
  script:
    type: File
    inputBinding:
      position: 0
  samplesheet:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  metadata:
    type: File
    inputBinding:
      position: 2
      prefix: -t
  outputDir:
    type: Directory
    inputBinding:
      position: 4
      prefix: -o
outputs:
  splitSheets:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "*"
I am running into an issue when porting this to an environment that does not allow writing to input files or directories, so I am looking for a workaround. I am trying an alternative approach where outputDir is passed as a string, i.e.
.
.
  outputDir:
    type: string
    inputBinding:
      position: 4
      prefix: -o
outputs:
  splitSheets:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "$(inputs.outputDir)"
But that produces the following error:
("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:41:7: glob patterns must not start with '/'", {})
I have also tried specifying inputs.outputDir as a writable entry under InitialWorkDirRequirement and then specifying the glob as:
outputs:
  splitSheets:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "*"
This produces the following error:
OSError: [Errno 30] Read-only file system: '/Users'
Is there any other solution to this issue?
Any help will be appreciated.
Cheers,
Sehrish
Hello,
If I understood the interface of your tool correctly, you just have to pass the output directory name as a string and then glob that directory in the outputs. It would look like this:
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: umccr/alpine_pandas:1.0.1
# Pass the working directory name here to the command line
baseCommand: []
inputs:
  script:
    type: File
    inputBinding:
      position: 0
  samplesheet:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  metadata:
    type: File
    inputBinding:
      position: 2
      prefix: -t
  outputDir:
    type: string
    inputBinding:
      position: 4
      prefix: -o
outputs:
  splitSheets:
    type: Directory
    outputBinding:
      glob: "$(inputs.outputDir)"
Hi Kaushik,
Thanks for the response. I tried capturing output explicitly as a directory.
It still throws the glob error ("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:43:7: glob patterns must not start with '/'", {})
I think it does not like outputDir being specified as a string path that starts with /Users/....
Can the program take a relative path? Typically I enter a relative path, something like output_dir.
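The glob error above is consistent with glob patterns being evaluated relative to the job's output directory, so the string passed as outputDir needs to be a relative name. Below is a minimal Python sketch of a pre-flight check one might run before writing the job file. The helper name `check_output_dir_name` is an assumption made for this example; it is not part of cwltool.

```python
def check_output_dir_name(name: str) -> str:
    """Return `name` if it can double as a relative glob pattern.

    Hypothetical helper for illustration only; not a cwltool function.
    """
    if not name:
        raise ValueError("output directory name must not be empty")
    if name.startswith("/"):
        # Mirrors the error in this thread:
        # "glob patterns must not start with '/'"
        raise ValueError(f"use a relative name instead of {name!r}")
    return name


print(check_output_dir_name("output_dir"))
```

An absolute path such as one under /Users raises ValueError here, which surfaces the problem before the tool runs rather than at output collection.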
I was still getting the following error when specifying a relative path for outputDir.
("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:43:7: Did not find output file with glob pattern: '['cwl']'", {})
I had to update the requirements as follows to make it work (thanks to Michael F. for the hints):
InitialWorkDirRequirement:
  listing:
    - '$({class: "Directory", basename: inputs.outputdir, listing: []})'
Hope this makes sense to others as well.
Thank you.
Sorry, I missed this bit - what did you mean by adding the working dir name before the base command?
That was a reminder to pass the string to the command line, and that is done via the inputBinding.
@skanwal Glad your problem was solved.
For completeness, here is an example where the command line program creates a directory and the directory is globbed as an output
outdir.cwl
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: python:alpine3.11
baseCommand: [python]
arguments: [$(inputs.script.path), $(inputs.outdir)]
inputs:
  script: File
  outdir: string
outputs:
  mydir:
    type: Directory
    outputBinding:
      glob: "$(inputs.outdir)"
Where the input script is script.py
import pathlib
import sys
out = sys.argv[1]
pathlib.Path(out).mkdir()
This can be run as
cwltool outdir.cwl --script script.py --outdir hello
Hi Kaushik,
Thanks for the example, it makes sense.
I believe that, to capture the contents of the output directory, it is necessary to pass InitialWorkDirRequirement with an appropriate listing array?
To give some context for the solution: inside InitialWorkDirRequirement, the listing value can be an array<File | Directory | Dirent | string | Expression> | string | Expression. As the specification puts it:
May be an expression. If so, the expression return value must validate as {type: array, items: [File, Directory]}
Hence, through the expression, we can return a valid Directory object with these attributes:
class: "Directory" (duh)
basename: the name of the folder (no leading slashes or anything)
listing: an empty list ([]), but you could return anything here if you wanted the directory to contain something.
NB: wrap the expression in a string for it to be valid YAML.
This gives the tool:
class: CommandLineTool
baseCommand: ls
requirements:
  InitialWorkDirRequirement:
    listing:
      - '$({class: "Directory", basename: inputs.outputdir, listing: []})'
inputs:
  outputdir: string
outputs:
  outdir:
    type: Directory
    outputBinding:
      glob: $(inputs.outputdir)
  outls: stdout
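For readers more at home in Python than in JavaScript expressions, the object returned by the expression can be pictured as a plain dictionary. This is only an illustration of its shape: `empty_directory` is a hypothetical name chosen for this example, and nothing here is cwltool API.

```python
import json


def empty_directory(basename: str) -> dict:
    """Build the same structure as the JavaScript expression in the listing."""
    if "/" in basename:
        # The basename is a bare folder name, with no slashes.
        raise ValueError(f"basename must not contain '/': {basename!r}")
    return {"class": "Directory", "basename": basename, "listing": []}


print(json.dumps(empty_directory("output_dir")))
```

The empty `listing` is what makes this a new, empty directory staged into the working directory before the command runs.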
Hi Michael,
Thanks for adding to the discussion.
listing : empty list ( [] ), but you could return anything in here if you wanted the directory to have something in here.
Just another point, I was able to return files in the directory by using an empty list i.e. listing: [].
Hi @skanwal, the InitialWorkDirRequirement is to stage or create files/directories during the job setup phase. It is separate from the output gathering stage. The example I provided does not have an InitialWorkDirRequirement because I don’t want to stage any inputs, merely gather the output directory, which the tool is creating.
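To illustrate that point, here is a minimal sketch of a variant of script.py that also writes a file into the directory it creates. The function name `write_outputs` and the file name `result.txt` are assumptions made for this example. Run through the same outdir.cwl, with no InitialWorkDirRequirement, the gathered Directory would be expected to contain that file, because the glob matches the directory the tool itself created.

```python
import pathlib
import sys


def write_outputs(out: str) -> pathlib.Path:
    """Create the directory `out` and write one placeholder file into it."""
    out_dir = pathlib.Path(out)
    # exist_ok=True keeps the script usable if the directory already exists,
    # for example when it was staged by an InitialWorkDirRequirement entry.
    out_dir.mkdir(parents=True, exist_ok=True)
    result = out_dir / "result.txt"  # hypothetical output file name
    result.write_text("hello\n")
    return result


# Only act when exactly one argument (the directory name) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    write_outputs(sys.argv[1])
```

The `exist_ok=True` flag is a small design choice: the same script then works both with and without the directory being pre-created during job setup.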
Thanks a lot Kaushik. That makes absolute sense.
I am also facing the same error.
INFO [job varscan4.cwl] Max memory used: 230MiB
ERROR [job varscan4.cwl] Job error:
("Error collecting output for parameter 'vcf': varscan4.cwl:39:7: Did not find output file with glob pattern: ['output.vcf'].", {})
WARNING [job varscan4.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail
Here is the CWL script:
cwlVersion: v1.0
class: CommandLineTool
label: "VarScan Variant Calling"
requirements:
  - class: DockerRequirement
    dockerImageId: "kboltonlab/varscan2:1.1"
baseCommand: ["java", "-jar", "/opt/varscan/VarScan.v2.4.2.jar"]
arguments:
  - "pileup2cns"
  - "$(inputs.bam.path)"
  - "$(inputs.reference.path)"
  - "--output-vcf"
  - "1"
inputs:
  bam:
    type: File
    inputBinding:
      position: 3
  reference:
    type: File
    inputBinding:
      position: 2
    secondaryFiles: [.fai]
outputs:
  vcf:
    type: File
    outputBinding:
      glob: "output.vcf"
And this is the JSON file with the input parameters:
{
  "bam": {
    "class": "File",
    "path": "/home/ec2-user/healthomics_SI/varscan/SRRoutput.pileup",
    "secondaryFiles": [
      {
        "class": "File",
        "path": "/home/ec2-user/healthomics_SI/varscan/ref_with_svs.bai"
      }
    ]
  },
  "reference": {
    "class": "File",
    "path": "/ngs/reference/Homo_sapiens_assembly38.fasta",
    "secondaryFiles": [
      {"class": "File", "path": "/ngs/reference/Homo_sapiens_assembly38.fasta.fai"}
    ]
  },
  "strand_filter": 0,
  "min_coverage": 8,
  "min_var_freq": 0.1,
  "min_reads": 2,
  "p_value": 0.99,
  "sample_name": "save.vcf"
}