CWL - specifying Directory outputs as an input

skanwal · March 30, 2020, 10:45pm

Hello,

I am working on a tool definition that requires specifying an input parameter for the outputs location.
I have managed to write the following tool definition, that works as expected when testing with cwltool.

cwlVersion: v1.0
class: CommandLineTool

hints:
  DockerRequirement:
    dockerPull: umccr/alpine_pandas:1.0.1

requirements:
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.outputDir)
        writable: true

baseCommand: []

inputs:
  script:
    type: File
    inputBinding:
      position: 0
  samplesheet:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  metadata:
    type: File
    inputBinding:
      position: 2
      prefix: -t
  outputDir:
    type: Directory
    inputBinding:
      position: 4
      prefix: -o

outputs:
  splitSheets:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "*"

I run into a problem when porting this to an environment that does not allow writing to input files or directories, so I am looking for a workaround. One alternative I tried is passing outputDir as a string, i.e.

.
.
  outputDir:
    type: string
    inputBinding:
      position: 4
      prefix: -o

outputs:
  splitSheets:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "$(inputs.outputDir)"

But that produces the following error:

("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:41:7: glob patterns must not start with '/'", {})

I have also tried specifying inputs.outputDir as a writable entry under InitialWorkDirRequirement and then specifying glob as

outputs:
  splitSheets:
    type:
      type: array
      items: [File, Directory]
    outputBinding:
      glob: "*"

This produces the following error: OSError: [Errno 30] Read-only file system: '/Users'

Wondering if there could be any other solution to this issue?
Any help will be appreciated.

Cheers,
Sehrish

kaushik-work · March 31, 2020, 2:01pm

Hello,

If I understood the interface of your tool correctly, you just have to pass the input name as a string and then in the output glob that directory. So it will look like

cwlVersion: v1.0
class: CommandLineTool

hints:
  DockerRequirement:
    dockerPull: umccr/alpine_pandas:1.0.1


# Pass the working directory name here to the command line
baseCommand: []

inputs:
  script:
    type: File
    inputBinding:
      position: 0
  samplesheet:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  metadata:
    type: File
    inputBinding:
      position: 2
      prefix: -t
  outputDir:
    type: string
    inputBinding:
      position: 4
      prefix: -o

outputs:
  splitSheets:
    type: Directory
    outputBinding:
      glob: "$(inputs.outputDir)"

skanwal · March 31, 2020, 10:38pm

Hi Kaushik,

Thanks for the response. I tried capturing output explicitly as a directory.

It still throws the glob error ("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:43:7: glob patterns must not start with '/'", {})

I think it does not accept an outputDir string that is an absolute path starting with /Users/....
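A minimal sketch of that constraint, assuming (as the error message suggests) that the glob is resolved relative to the job's output directory and so cannot be an absolute path. The helper name to_relative_outdir and the example path are hypothetical, used only for illustration; nothing here is part of cwltool. It reduces an absolute path to a bare directory name that could be passed as the outputDir string instead.

```python
from pathlib import PurePosixPath


def to_relative_outdir(path: str) -> str:
    """Return the last path component, usable as a relative directory name.

    Hypothetical helper for illustration; not part of cwltool.
    """
    name = PurePosixPath(path).name
    # "/" and "." have an empty name; ".." would escape the working directory.
    if not name or name == "..":
        raise ValueError(f"cannot derive a directory name from {path!r}")
    return name


if __name__ == "__main__":
    # Made-up absolute path standing in for the one that triggered the error.
    print(to_relative_outdir("/Users/me/project/out"))
    # A name that is already relative is returned unchanged.
    print(to_relative_outdir("output_dir"))
```

The tool would then write into that directory inside its working directory, and the glob would match it there.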

kaushik-work · April 1, 2020, 2:31am

Can the program take a relative path? I typically pass a relative path, something like output_dir.

skanwal · April 1, 2020, 8:41am

I was still getting the following error when specifying a relative path for outputDir.

("Error collecting output for parameter 'splitSheets':\nsamplesheetPrep2.cwl:43:7: Did not find output file with glob pattern: '['cwl']'", {})

I had to update the requirements as follows to make it work - thanks to Michael F. for the hints:

InitialWorkDirRequirement:
    listing:
    - '$({class: "Directory", basename: inputs.outputdir, listing: []})'

Hope this makes sense to others as well.
Thank you.

skanwal · April 1, 2020, 8:56am

Sorry, I missed this bit - what did you mean by adding the working dir name before the base command?

kaushik-work · April 1, 2020, 11:10am

That was a reminder to pass the string to the command line, and that is done via the inputBinding.

kaushik-work · April 1, 2020, 11:38am

@skanwal Glad your problem was solved.

For completeness, here is an example where the command line program creates a directory and the directory is globbed as an output

outdir.cwl

cwlVersion: v1.0
class: CommandLineTool

hints:
  DockerRequirement:
    dockerPull: python:alpine3.11


baseCommand: [python]
arguments: [$(inputs.script.path), $(inputs.outdir)]

inputs:
  script: File
  outdir: string

outputs:
  mydir:
    type: Directory
    outputBinding:
      glob: "$(inputs.outdir)"

Where the input script is script.py

import pathlib
import sys

out = sys.argv[1]
pathlib.Path(out).mkdir()

This can be run as

cwltool outdir.cwl --script script.py --outdir hello
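The same inputs can also be supplied through a job file rather than command-line flags. The sketch below builds one in Python; the function name build_job and the file name job.json are assumptions made for illustration, and the File object uses the same class/path form as job files for other CWL tools.

```python
import json


def build_job(script_path: str, outdir: str) -> dict:
    """Build the input object for outdir.cwl (illustrative helper)."""
    return {
        # File inputs are objects with a class and a path.
        "script": {"class": "File", "path": script_path},
        # outdir is a plain string: the relative directory name to create.
        "outdir": outdir,
    }


if __name__ == "__main__":
    # Print the job as JSON; save the output as, e.g., job.json.
    print(json.dumps(build_job("script.py", "hello"), indent=2))
```

Saved as job.json, this would be run as cwltool outdir.cwl job.json.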

skanwal · April 2, 2020, 6:52am

Hi Kaushik,

Thanks for the example, it makes sense.

To capture the contents of the output directory, I believe it is necessary to pass InitialWorkDirRequirement with an appropriate listing array; is that right?

illusional · April 2, 2020, 6:56am

The CWL specification says of the listing field of InitialWorkDirRequirement: "May be an expression. If so, the expression return value must validate as {type: array, items: [File, Directory]}"

Hence, through the expression, we can return a valid Directory object with these attributes:

class: "Directory" (duh)
basename: name of the folder (no leading slashes or anything)
listing: empty list ([]), but you could return anything in here if you wanted the directory to have something in it.

NB: wrap the expression in a string for it to be valid YAML.

This gives the tool:

class: CommandLineTool
baseCommand: ls

requirements:
  InitialWorkDirRequirement:
    listing:
      - '$({class: "Directory", basename: inputs.outputdir, listing: []})'

inputs:
  outputdir: string

outputs:
  outdir:
    type: Directory
    outputBinding:
      glob: $(inputs.outputdir)
  outls: stdout
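As a rough illustration of what the expression in the listing evaluates to, the object can be sketched in Python. The function name directory_literal is hypothetical, and the slash check simply encodes the "no leading slashes" remark above; this is not how a CWL runner validates the object.

```python
def directory_literal(basename: str) -> dict:
    """Build the Directory object returned by the listing expression.

    Illustrative only; mirrors
    {class: "Directory", basename: inputs.outputdir, listing: []}.
    """
    if not basename or "/" in basename:
        raise ValueError("basename must be a plain name without slashes")
    # An empty listing asks for an empty directory to be created at job setup.
    return {"class": "Directory", "basename": basename, "listing": []}


if __name__ == "__main__":
    print(directory_literal("hello"))
```

Because the directory is created inside the working directory under that basename, the relative glob $(inputs.outputdir) can find it afterwards.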

skanwal · April 2, 2020, 10:03pm

Hi Michael,

Thanks for adding to the discussion.

"listing: empty list ([]), but you could return anything in here if you wanted the directory to have something in here."

Just another point: I was able to get the files in the directory returned while using an empty list, i.e. listing: [].

kaushik-work · April 3, 2020, 10:39am

Hi @skanwal, the InitialWorkDirRequirement is to stage or create files/directories during the job setup phase. It is separate from the output gathering stage. The example I provided does not have an InitialWorkDirRequirement because I don’t want to stage any inputs, merely gather the output directory, which the tool is creating.
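To make the distinction concrete, here is a sketch of a slightly extended script.py in which the tool itself creates the directory and writes a file into it, so nothing needs to be staged beforehand. The function name make_output_dir and the file name result.txt are made up for the example.

```python
import pathlib
import sys


def make_output_dir(out: str) -> pathlib.Path:
    """Create the output directory and write one file into it."""
    out_dir = pathlib.Path(out)
    # The tool creates the directory at run time, inside its working directory.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical result file, standing in for the tool's real outputs.
    (out_dir / "result.txt").write_text("created by the tool\n")
    return out_dir


if __name__ == "__main__" and len(sys.argv) > 1:
    make_output_dir(sys.argv[1])
```

Run through outdir.cwl with --outdir hello, the globbed Directory output should then be the hello directory the script created, with result.txt inside it.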

skanwal · April 5, 2020, 2:23am

Thanks a lot Kaushik. That makes absolute sense.

Jyotsana_Mehra · May 15, 2024, 6:55pm

I am also facing the same error.
INFO [job varscan4.cwl] Max memory used: 230MiB
ERROR [job varscan4.cwl] Job error:
("Error collecting output for parameter 'vcf': varscan4.cwl:39:7: Did not find output file with glob pattern: ['output.vcf'].", {})
WARNING [job varscan4.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail

Here is the CWL script:

cwlVersion: v1.0
class: CommandLineTool
label: "VarScan Variant Calling"

requirements:
  - class: DockerRequirement
    dockerImageId: "kboltonlab/varscan2:1.1"

baseCommand: ["java", "-jar", "/opt/varscan/VarScan.v2.4.2.jar"]

arguments:
  - "pileup2cns"
  - "$(inputs.bam.path)"
  - "$(inputs.reference.path)"
  - "--output-vcf"
  - "1"

inputs:
  bam:
    type: File
    inputBinding:
      position: 3
  reference:
    type: File
    inputBinding:
      position: 2
    secondaryFiles: [.fai]

outputs:
  vcf:
    type: File
    outputBinding:
      glob: "output.vcf"

And this is the JSON file with the input parameters:

{
  "bam": {
    "class": "File",
    "path": "/home/ec2-user/healthomics_SI/varscan/SRRoutput.pileup",
    "secondaryFiles": [
      {
        "class": "File",
        "path": "/home/ec2-user/healthomics_SI/varscan/ref_with_svs.bai"
      }
    ]
  },
  "reference": {
    "class": "File",
    "path": "/ngs/reference/Homo_sapiens_assembly38.fasta",
    "secondaryFiles": [
        {"class": "File", "path": "/ngs/reference/Homo_sapiens_assembly38.fasta.fai"}
    ]
  },
  "strand_filter": 0,
  "min_coverage": 8,
  "min_var_freq": 0.1,
  "min_reads": 2,
  "p_value": 0.99,
  "sample_name": "save.vcf"
}