Hello,
I am working on a tool definition that requires an input parameter specifying the output location.
I have written the following tool definition, which works as expected when tested with cwltool.
    cwlVersion: v1.0
    class: CommandLineTool
    hints:
      DockerRequirement:
        dockerPull: umccr/alpine_pandas:1.0.1
    requirements:
      InitialWorkDirRequirement:
        listing:
          - entry: $(inputs.outputDir)
            writable: true
    baseCommand: []
    inputs:
      script:
        type: File
        inputBinding:
          position: 0
      samplesheet:
        type: File
        inputBinding:
          position: 1
          prefix: -s
      metadata:
        type: File
        inputBinding:
          position: 2
          prefix: -t
      outputDir:
        type: Directory
        inputBinding:
          position: 4
          prefix: -o
    outputs:
      splitSheets:
        type:
          type: array
          items: [File, Directory]
        outputBinding:
          glob: "*"
I am having an issue porting this to an environment that does not allow writing to input files or directories, so I am looking for a workaround. I am trying an alternative approach where outputDir is passed as a string, i.e.
    .
    .
      outputDir:
        type: string
        inputBinding:
          position: 4
          prefix: -o
    outputs:
      splitSheets:
        type:
          type: array
          items: [File, Directory]
        outputBinding:
          glob: "$(inputs.outputDir)"
But that produces the following error:
("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:41:7: glob patterns must not start with '/'", {})
I have also tried specifying inputs.outputDir as a writable entry under InitialWorkDirRequirement and then specifying glob as:
    outputs:
      splitSheets:
        type:
          type: array
          items: [File, Directory]
        outputBinding:
          glob: "*"
This produces the following error: OSError: [Errno 30] Read-only file system: '/Users'
Is there any other solution to this issue?
Any help will be appreciated.
Cheers,
Sehrish
Hello,
If I understood the interface of your tool correctly, you just have to pass the output directory name as a string input and then glob that directory in the output. So it will look like:
    cwlVersion: v1.0
    class: CommandLineTool
    hints:
      DockerRequirement:
        dockerPull: umccr/alpine_pandas:1.0.1
    # Pass the working directory name here to the command line
    baseCommand: []
    inputs:
      script:
        type: File
        inputBinding:
          position: 0
      samplesheet:
        type: File
        inputBinding:
          position: 1
          prefix: -s
      metadata:
        type: File
        inputBinding:
          position: 2
          prefix: -t
      outputDir:
        type: string
        inputBinding:
          position: 4
          prefix: -o
    outputs:
      splitSheets:
        type: Directory
        outputBinding:
          glob: "$(inputs.outputDir)"
Hi Kaushik,
Thanks for the response. I tried capturing the output explicitly as a directory.
It still throws the glob error: ("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:43:7: glob patterns must not start with '/'", {})
I think it does not like outputDir being specified as a string path that starts with /Users/....
Can the program take a relative path? Typically I enter a relative path, something like output_dir.
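The glob in outputBinding is resolved relative to the job's output directory, which is why an absolute path is rejected. As a rough illustration, here is a minimal Python sketch of a check one could run on the outputDir string before putting it in the job file. The helper name is_relative_output_name and the sample paths are made up for this example; nothing here is part of cwltool.

```python
from pathlib import PurePosixPath


def is_relative_output_name(name: str) -> bool:
    """Return True if `name` looks usable as a relative output directory.

    Hypothetical helper, not part of cwltool: it rejects empty names,
    absolute paths and any path that climbs out with '..'.
    """
    if not name:
        return False
    path = PurePosixPath(name)
    if path.is_absolute():
        return False
    return ".." not in path.parts


if __name__ == "__main__":
    # Sample values only; the absolute path is an invented example
    for candidate in ["output_dir", "/Users/me/output_dir", "../output_dir"]:
        print(candidate, is_relative_output_name(candidate))
```

A name that passes this check can still fail later if the tool never creates that directory, so it only guards against the "must not start with '/'" class of error.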
I was still getting the following error when specifying a relative path for outputDir:
("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:43:7: Did not find output file with glob pattern: '['cwl']'", {})
I had to update the requirements as follows to make it work (thanks to Michael F. for the hints):
    InitialWorkDirRequirement:
      listing:
        - '$({class: "Directory", basename: inputs.outputdir, listing: []})'
Hope this makes sense to others as well.
Thank you.
Sorry, I missed this bit - what did you mean by adding the working dir name before the base command?
That was a reminder to pass the string to the command line, and that is done via the inputBinding.
@skanwal Glad your problem was solved.
For completeness, here is an example where the command line program creates a directory and the directory is globbed as an output
outdir.cwl
    cwlVersion: v1.0
    class: CommandLineTool
    hints:
      DockerRequirement:
        dockerPull: python:alpine3.11
    baseCommand: [python]
    arguments: [$(inputs.script.path), $(inputs.outdir)]
    inputs:
      script: File
      outdir: string
    outputs:
      mydir:
        type: Directory
        outputBinding:
          glob: "$(inputs.outdir)"
Where the input script is script.py
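    import pathlib
    import sys

    # The output directory name arrives as the first command-line argument
    out = sys.argv[1]
    # Create the directory inside the current working directory
    pathlib.Path(out).mkdir()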
This can be run as
    cwltool outdir.cwl --script script.py --outdir hello
Hi Kaushik,
Thanks for the example, it makes sense.
I believe that to capture the contents of the output directory, it’s necessary to pass InitialWorkDirRequirement with an appropriate listing array?
I’ll give some context to the solution. Inside the InitialWorkDirRequirement, you return a listing value, which can be an array<File | Directory | Dirent | string | Expression> | string | Expression:
May be an expression. If so, the expression return value must validate as {type: array, items: [File, Directory]}
Hence through the expression we can return a valid Directory object with attributes:
- class: "Directory" (duh),
- basename: name of the folder (no leading slashes or anything),
- listing: empty list ([]), but you could return anything in here if you wanted the directory to have something in it.
NB: wrap the expression in a string for it to be valid YAML.
This gives the tool:
    cwlVersion: v1.0
    class: CommandLineTool
    baseCommand: ls
    requirements:
      # Needed so the JavaScript object expression in listing can be evaluated
      InlineJavascriptRequirement: {}
      InitialWorkDirRequirement:
        listing:
          - '$({class: "Directory", basename: inputs.outputdir, listing: []})'
    inputs:
      outputdir: string
    outputs:
      outdir:
        type: Directory
        outputBinding:
          glob: $(inputs.outputdir)
      outls: stdout
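To make the shape of the returned object concrete, here is a small Python sketch that builds the same structure the expression in the tool above evaluates to. The function directory_literal is hypothetical and only for illustration; CWL evaluates the expression in JavaScript, not Python, and the slash check simply mirrors the "no leading slashes" remark.

```python
import json


def directory_literal(basename: str) -> dict:
    """Build the Directory object described above as a plain dict.

    Hypothetical helper for illustration only; it is not a cwltool API.
    """
    if not basename or "/" in basename:
        raise ValueError("basename must be a plain folder name without slashes")
    # An empty listing means the staged directory starts out empty
    return {"class": "Directory", "basename": basename, "listing": []}


if __name__ == "__main__":
    # The listing field of InitialWorkDirRequirement expects an array of entries
    print(json.dumps([directory_literal("output_dir")], indent=2))
```

Printing the result as JSON shows the one-element array of Directory objects that the listing field is expected to validate against.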
Hi Michael,
Thanks for adding to the discussion.
> listing: empty list ([]), but you could return anything in here if you wanted the directory to have something in here.
Just another point: I was able to return files in the directory by using an empty list, i.e. listing: [].
Hi @skanwal, the InitialWorkDirRequirement is to stage or create files/directories during the job setup phase. It is separate from the output gathering stage. The example I provided does not have an InitialWorkDirRequirement because I don’t want to stage any inputs, merely gather the output directory, which the tool is creating.
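To illustrate the case where the tool itself creates the directory, here is a sketch of a variant of script.py that makes the output directory and writes one file into it, so nothing needs to be staged beforehand. The function name make_outputs and the file name result.txt are invented for this example.

```python
import pathlib
import sys


def make_outputs(out_name: str) -> pathlib.Path:
    """Create `out_name` under the current working directory and add one file.

    Hypothetical example: `result.txt` is a placeholder output file.
    """
    out_dir = pathlib.Path(out_name)
    # exist_ok avoids a failure if the directory already exists
    out_dir.mkdir(parents=True, exist_ok=True)
    result = out_dir / "result.txt"
    result.write_text("done\n")
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    # The directory name is expected as the only command-line argument
    print(make_outputs(sys.argv[1]))
```

Used with a tool like outdir.cwl, the Directory output globbed with $(inputs.outdir) should then contain result.txt, since the directory and its contents are created during the job rather than staged.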
Thanks a lot Kaushik. That makes absolute sense.
I am also facing the same error.
INFO [job varscan4.cwl] Max memory used: 230MiB
ERROR [job varscan4.cwl] Job error:
("Error collecting output for parameter 'vcf': varscan4.cwl:39:7: Did not find output file with glob pattern: ['output.vcf'].", {})
WARNING [job varscan4.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail
Here is the cwl script
    cwlVersion: v1.0
    class: CommandLineTool
    label: "VarScan Variant Calling"
    requirements:
      - class: DockerRequirement
        dockerImageId: "kboltonlab/varscan2:1.1"
    baseCommand: ["java", "-jar", "/opt/varscan/VarScan.v2.4.2.jar"]
    arguments:
      - "pileup2cns"
      - "$(inputs.bam.path)"
      - "$(inputs.reference.path)"
      - "--output-vcf"
      - "1"
    inputs:
      bam:
        type: File
        inputBinding:
          position: 3
      reference:
        type: File
        inputBinding:
          position: 2
        secondaryFiles: [.fai]
    outputs:
      vcf:
        type: File
        outputBinding:
          glob: "output.vcf"
and this is the json file with the input params
    {
      "bam": {
        "class": "File",
        "path": "/home/ec2-user/healthomics_SI/varscan/SRRoutput.pileup",
        "secondaryFiles": [
          {
            "class": "File",
            "path": "/home/ec2-user/healthomics_SI/varscan/ref_with_svs.bai"
          }
        ]
      },
      "reference": {
        "class": "File",
        "path": "/ngs/reference/Homo_sapiens_assembly38.fasta",
        "secondaryFiles": [
          {"class": "File", "path": "/ngs/reference/Homo_sapiens_assembly38.fasta.fai"},
        ]
      },
      "strand_filter": 0,
      "min_coverage": 8,
      "min_var_freq": 0.1,
      "min_reads": 2,
      "p_value": 0.99,
      "sample_name": "save.vcf"
    }