Hi, I noticed that every time I try to run .cwl scripts that include GATK GenotypeGVCFs, the runner encounters an error related to how the previous step (GATK GenomicsDBImport, also called through .cwl) creates filenames in the GenomicsDB workspace:
Invalid filename: ‘8$1$146364022’ contains illegal characters
Inspecting the GenomicsDB workspace directory (the GenomicsDBImport output), the name of each chromosome directory is shown within quotes, which then seems to raise the error above:
enrico@godzilla:/media/kong/enrico/MCD/cwl-run-DIR$ ls -thal MCD_n15/
total 136K
drwxrwsr-x 3 enrico lab 4.0K Dec 5 10:47 ..
drwx------ 4 enrico lab 4.0K Dec 5 10:35 'X$1$155270560'
drwx------ 25 enrico lab 4.0K Dec 5 10:29 .
drwx------ 4 enrico lab 4.0K Dec 5 10:29 '22$1$51304566'
drwx------ 4 enrico lab 4.0K Dec 5 10:25 '21$1$48129895'
drwx------ 4 enrico lab 4.0K Dec 5 10:22 '20$1$63025520'
drwx------ 4 enrico lab 4.0K Dec 5 10:17 '19$1$59128983'
drwx------ 4 enrico lab 4.0K Dec 5 10:10 '18$1$78077248'
drwx------ 4 enrico lab 4.0K Dec 5 10:04 '17$1$81195210'
drwx------ 4 enrico lab 4.0K Dec 5 09:56 '16$1$90354753'
drwx------ 4 enrico lab 4.0K Dec 5 09:49 '15$1$102531392'
drwx------ 4 enrico lab 4.0K Dec 5 09:42 '14$1$107349540'
drwx------ 4 enrico lab 4.0K Dec 5 09:34 '13$1$115169878'
drwx------ 4 enrico lab 4.0K Dec 5 09:28 '12$1$133851895'
drwx------ 4 enrico lab 4.0K Dec 5 09:17 '11$1$135006516'
drwx------ 4 enrico lab 4.0K Dec 5 09:06 '10$1$135534747'
drwx------ 4 enrico lab 4.0K Dec 5 08:55 '9$1$141213431'
drwx------ 4 enrico lab 4.0K Dec 5 08:45 '8$1$146364022'
drwx------ 4 enrico lab 4.0K Dec 5 08:35 '7$1$159138663'
drwx------ 4 enrico lab 4.0K Dec 5 08:22 '6$1$171115067'
drwx------ 4 enrico lab 4.0K Dec 5 08:09 '5$1$180915260'
drwx------ 4 enrico lab 4.0K Dec 5 07:56 '4$1$191154276'
drwx------ 4 enrico lab 4.0K Dec 5 07:43 '3$1$198022430'
drwx------ 4 enrico lab 4.0K Dec 5 07:28 '2$1$243199373'
drwx------ 4 enrico lab 4.0K Dec 5 07:09 '1$1$249250621'
-rwx------ 1 enrico lab 8.4K Dec 5 06:49 vidmap.json
-rwx------ 1 enrico lab 18K Dec 5 06:49 vcfheader.vcf
-rwx------ 1 enrico lab 1.4K Dec 5 06:49 callset.json
-rwx------ 1 enrico lab 0 Dec 5 06:49 __tiledb_workspace.tdb
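One thing worth ruling out: recent GNU ls versions put quotes around names that contain shell-special characters such as $ when printing to a terminal, so the quotes in the listing may not be part of the names on disk. A minimal sketch to check this from Python, assuming the workspace path MCD_n15 from the listing above (the helper names raw_entries and has_literal_quotes are hypothetical, just for illustration):

```python
import os


def raw_entries(workspace):
    """Return the subdirectory names exactly as stored on disk, sorted."""
    return sorted(
        entry.name for entry in os.scandir(workspace) if entry.is_dir()
    )


def has_literal_quotes(name):
    """True only if a quote character is really part of the name."""
    return "'" in name or '"' in name


if __name__ == "__main__":
    workspace = "MCD_n15"  # assumption: workspace path from the listing above
    if os.path.isdir(workspace):
        for name in raw_entries(workspace):
            # Square brackets mark where the name starts and ends,
            # without adding any quote characters of our own.
            print(f"[{name}]",
                  "quotes in name:", has_literal_quotes(name),
                  "contains $:", "$" in name)
```

If the names print without quotes, the quotes come from ls only, and the character the runner objects to may instead be the $ in each name. That is a guess on my side, not something I have confirmed.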
This happens every single time I produce a GenomicsDBImport output on my Linux machine (Ubuntu 18.04.5).
Has anybody worked around this? I know I can call GATK directly, outside .cwl, but for pipeline purposes I’d like to be able to pass this DB through .cwl too.
Thank you very much in advance for any help! Below is my CWL script:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
label: gatk GenomicsDBImport on GATK docker images
hints:
  DockerRequirement:
    dockerPull: broadinstitute/gatk:latest
  ResourceRequirement:
    coresMin: $(inputs.GenomicsDBImport_coresMin)
    ramMin: $(inputs.GenomicsDBImport_ramMin)
requirements:
  InlineJavascriptRequirement: {}
baseCommand: gatk
arguments: [ "GenomicsDBImport" ]
inputs:
  - id: interval_list
    type: File
    inputBinding:
      position: 1
      prefix: '-L'
  - id: cohort_name
    type: string
    inputBinding:
      position: 2
      prefix: '--genomicsdb-workspace-path'
  - id: gvcf_files
    type:
      - type: array
        items: File
        inputBinding:
          position: 0
          prefix: '-V'
          separate: true
    secondaryFiles:
      - .tbi
outputs:
  GenomicsDBImport_directory:
    type: Directory
    outputBinding:
      glob: $(inputs.cohort_name)