CWL: pass an array of files with a repeated --input flag

I am trying to create a .cwl for the GATK FilterIntervals tool, which takes several files as input, each one passed with its own --input flag.
I know about CWL's itemSeparator, and here is how I tried to pass the argument in the .cwl:

  - id: input_read_counts
    type:
      - "null"
      - type: array
        items: File
    inputBinding:
      prefix: '--input'
      itemSeparator: ' --input '

but the command is rendered as:

gatk \
FilterIntervals \
--output \
hs37d5.preprocessed_300bp.filtered.interval_list \
--input \
'/var/lib/cwl/stgb9ec701d-59f1-4e6e-81e4-a31eb2531eb9/1.WGS.M.KO0004.hdf5 --input /var/lib/cwl/stg31ff1f67-85c6-43f2-a56b-66330038efff/2.WGS.M.KO0005.hdf5'

which GATK interprets as a single file, so the workflow fails:

A USER ERROR has occurred: Couldn’t read file
/var/lib/cwl/stgb9ec701d-59f1-4e6e-81e4-a31eb2531eb9/1.WGS.M.KO0004.hdf5
--input /var/lib/cwl/stg31ff1f67-85c6-43f2-a56b-66330038efff/2.WGS.M.KO0005.hdf5

How can I resolve this? I suspect the problem is the quotes added around the final value.

Thank you very much in advance for any help!


Here is the full tool definition:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: GATK FilterIntervals on docker images

hints:
  DockerRequirement:
    dockerPull: broadinstitute/gatk:latest

baseCommand: gatk
arguments: [ "FilterIntervals", "--output", "$(inputs.interval_list_file.nameroot).filtered.interval_list" ]

inputs:
  - id: annotated_intervals
    type: File?
    inputBinding:
      position: 1
      prefix: '--annotated-intervals'
  - id: blacklist_bed
    type: File
    inputBinding:
      position: 2
      prefix: '-XL'
  - id: interval_list_file
    type: File
    inputBinding:
      position: 3
      prefix: '-L'
  - id: interval_merging_rule
    type: string
    inputBinding:
      position: 4
      prefix: '--interval-merging-rule'
  - id: minimum_gc_content
    type: float?
    inputBinding:
      position: 5
      prefix: '--minimum-gc-content'
  - id: maximum_gc_content
    type: float?
    inputBinding:
      position: 6
      prefix: '--maximum-gc-content'
  - id: minimum_mappability
    type: float?
    inputBinding:
      position: 7
      prefix: '--minimum-mappability'
  - id: maximum_mappability
    type: float?
    inputBinding:
      position: 8
      prefix: '--maximum-mappability'
  - id: minimum_segmental_duplication_content
    type: float?
    inputBinding:
      position: 9
      prefix: '--minimum-segmental-duplication-content'
  - id: maximum_segmental_duplication_content
    type: float?
    inputBinding:
      position: 10
      prefix: '--maximum-segmental-duplication-content'
  - id: low_count_filter_count_threshold
    type: float?
    inputBinding:
      position: 11
      prefix: '--low-count-filter-count-threshold'
  - id: low_count_filter_percentage_of_samples
    type: float?
    inputBinding:
      position: 12
      prefix: '--low-count-filter-percentage-of-samples'
  - id: extreme_count_filter_minimum_percentile
    type: float?
    inputBinding:
      position: 13
      prefix: '--extreme-count-filter-minimum-percentile'
  - id: extreme_count_filter_maximum_percentile
    type: float?
    inputBinding:
      position: 14
      prefix: '--extreme-count-filter-maximum-percentile'
  - id: extreme_count_filter_percentage_of_samples
    type: float?
    inputBinding:
      position: 15
      prefix: '--extreme-count-filter-percentage-of-samples'
  - id: input_read_counts
    type:
      - "null"
      - type: array
        items: File
    inputBinding:
      prefix: '--input'
      itemSeparator: ' --input '

outputs:
  filtered_intervals:
    type: File
    outputBinding:
      glob: $(inputs.interval_list_file.nameroot).filtered.interval_list

And here is the inputs (job) file:

annotated_intervals:
  class: File
  path: /home/enrico/Dropbox/NY/app/GATK_CNV_germline/annotateIntervals/hs37d5.annotated_intervals.tsv
blacklist_bed:
  class: File
  path: /media/enrico/cells_WGS/gatk/gatk_SV/blacklists_GATK/CNV_and_centromere_blacklist.hg19.list
interval_list_file:
  class: File
  path: /home/enrico/Dropbox/NY/app/GATK_CNV_germline/preProcessIntervals/hs37d5.preprocessed_300bp.interval_list
interval_merging_rule: OVERLAPPING_ONLY
minimum_gc_content: 0.1
maximum_gc_content: 0.9
minimum_mappability: 0.9
maximum_mappability: 1.0
minimum_segmental_duplication_content: 0.0
maximum_segmental_duplication_content: 0.5
low_count_filter_count_threshold: 5
low_count_filter_percentage_of_samples: 90.0
extreme_count_filter_minimum_percentile: 1.0
extreme_count_filter_maximum_percentile: 99.0
extreme_count_filter_percentage_of_samples: 90.0
input_read_counts:
  - { class: File, path: /media/enrico/cells_WGS/columbia/pon_gatkSV/output_cureGN_QC_M/1.WGS.M.KO0004.hdf5 }
  - { class: File, path: /media/enrico/cells_WGS/columbia/pon_gatkSV/output_cureGN_QC_M/2.WGS.M.KO0005.hdf5 }

The solution is to move the inputBinding block inside the array type definition (at the same level as items), so that it is applied to every item. You only need the “prefix” field:

  - id: input_read_counts
    type:
      - "null"
      - type: array
        items: File
        inputBinding:
          prefix: '--input'

Don’t do this, but if you wanted to keep your prefix/separator pattern, you could add shellQuote: false to the input's inputBinding (this also requires ShellCommandRequirement). The joined string is then not quoted, so the shell splits it into separate arguments instead of treating it as one.
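A sketch of that discouraged variant for CWL v1.0: shellQuote only has an effect when ShellCommandRequirement is listed under requirements, so both are shown. Because the staged paths reach the shell unquoted, a path containing spaces or shell metacharacters would break, which is one reason the per-item binding is preferable.

```yaml
requirements:
  ShellCommandRequirement: {}

inputs:
  - id: input_read_counts
    type:
      - "null"
      - type: array
        items: File
    inputBinding:
      prefix: '--input'
      itemSeparator: ' --input '
      # The joined value is passed to the shell without quotes, so it is
      # word-split into: --input f1 --input f2
      shellQuote: false
```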
