Hi, I noticed that every time I try to run .cwl scripts that include GATK GenotypeGVCFs, the runner hits an error related to the directory names that the previous step (GATK GenomicsDBImport, also called through .cwl) creates in the GenomicsDB workspace:
Invalid filename: ‘8$1$146364022’ contains illegal characters
Looking into the GenomicsDB directory (the GenomicsDBImport output), the name of each chromosome directory is indeed listed within quotes, which seems to be what raises the error above:
enrico@godzilla:/media/kong/enrico/MCD/cwl-run-DIR$ ls -thal MCD_n15/
total 136K
drwxrwsr-x 3 enrico lab 4.0K Dec 5 10:47 ..
drwx------ 4 enrico lab 4.0K Dec 5 10:35 'X$1$155270560'
drwx------ 25 enrico lab 4.0K Dec 5 10:29 .
drwx------ 4 enrico lab 4.0K Dec 5 10:29 '22$1$51304566'
drwx------ 4 enrico lab 4.0K Dec 5 10:25 '21$1$48129895'
drwx------ 4 enrico lab 4.0K Dec 5 10:22 '20$1$63025520'
drwx------ 4 enrico lab 4.0K Dec 5 10:17 '19$1$59128983'
drwx------ 4 enrico lab 4.0K Dec 5 10:10 '18$1$78077248'
drwx------ 4 enrico lab 4.0K Dec 5 10:04 '17$1$81195210'
drwx------ 4 enrico lab 4.0K Dec 5 09:56 '16$1$90354753'
drwx------ 4 enrico lab 4.0K Dec 5 09:49 '15$1$102531392'
drwx------ 4 enrico lab 4.0K Dec 5 09:42 '14$1$107349540'
drwx------ 4 enrico lab 4.0K Dec 5 09:34 '13$1$115169878'
drwx------ 4 enrico lab 4.0K Dec 5 09:28 '12$1$133851895'
drwx------ 4 enrico lab 4.0K Dec 5 09:17 '11$1$135006516'
drwx------ 4 enrico lab 4.0K Dec 5 09:06 '10$1$135534747'
drwx------ 4 enrico lab 4.0K Dec 5 08:55 '9$1$141213431'
drwx------ 4 enrico lab 4.0K Dec 5 08:45 '8$1$146364022'
drwx------ 4 enrico lab 4.0K Dec 5 08:35 '7$1$159138663'
drwx------ 4 enrico lab 4.0K Dec 5 08:22 '6$1$171115067'
drwx------ 4 enrico lab 4.0K Dec 5 08:09 '5$1$180915260'
drwx------ 4 enrico lab 4.0K Dec 5 07:56 '4$1$191154276'
drwx------ 4 enrico lab 4.0K Dec 5 07:43 '3$1$198022430'
drwx------ 4 enrico lab 4.0K Dec 5 07:28 '2$1$243199373'
drwx------ 4 enrico lab 4.0K Dec 5 07:09 '1$1$249250621'
-rwx------ 1 enrico lab 8.4K Dec 5 06:49 vidmap.json
-rwx------ 1 enrico lab 18K Dec 5 06:49 vcfheader.vcf
-rwx------ 1 enrico lab 1.4K Dec 5 06:49 callset.json
-rwx------ 1 enrico lab 0 Dec 5 06:49 __tiledb_workspace.tdb
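One way to tell whether the quotes are really part of the directory names, or are only added by ls for display because the names contain `$`, is to read the names directly. Below is a minimal Python sketch of that check. The `inspect_names` helper and the `STRICT_NAME` pattern are hypothetical, written only for illustration; the pattern is an assumption about what a strict filename check might accept and is not copied from any runner.

```python
import os
import re
import tempfile

# Hypothetical strict pattern (an assumption, not taken from any CWL runner):
# letters, digits, dot, underscore, plus and hyphen only.
STRICT_NAME = re.compile(r"^[A-Za-z0-9._+-]+$")


def inspect_names(workspace):
    """Return (name, has_literal_quote, passes_strict_pattern) per entry."""
    report = []
    for name in sorted(os.listdir(workspace)):
        has_quote = "'" in name or '"' in name
        report.append((name, has_quote, bool(STRICT_NAME.match(name))))
    return report


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics one GenomicsDB entry.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "8$1$146364022"))
        open(os.path.join(tmp, "callset.json"), "w").close()
        for row in inspect_names(tmp):
            print(row)
```

Calling `inspect_names("MCD_n15")` on the real workspace would show, for each entry, whether a quote character is literally present and whether the `$` alone is enough to fail a strict check of this kind.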
This happens every single time I produce a GenomicsDBImport output on my Ubuntu 18.04.5 machine.
Has anybody found a workaround for this? I know I can call GATK outside .cwl, but for pipeline purposes I'd like to be able to pass this DB through .cwl too.
Thank you very much in advance for any help! My CWL script is below:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
label: gatk GenomicsDBImport on GATK docker images
hints:
  DockerRequirement:
    dockerPull: broadinstitute/gatk:latest
  ResourceRequirement:
    coresMin: $(inputs.GenomicsDBImport_coresMin)
    ramMin: $(inputs.GenomicsDBImport_ramMin)
requirements:
  InlineJavascriptRequirement: {}
baseCommand: gatk
arguments: [ "GenomicsDBImport" ]
inputs:
  - id: interval_list
    type: File
    inputBinding:
      position: 1
      prefix: '-L'
  - id: cohort_name
    type: string
    inputBinding:
      position: 2
      prefix: '--genomicsdb-workspace-path'
  - id: gvcf_files
    type:
      - type: array
        items: File
        inputBinding:
          position: 0
          prefix: '-V'
          separate: true
    secondaryFiles:
      - .tbi
outputs:
  GenomicsDBImport_directory:
    type: Directory
    outputBinding:
      glob: $(inputs.cohort_name)