Building CWL tool for FindBadGenomicKmers

Ambarish_Kumar · May 28, 2020, 10:39am

Sir,
I am getting an error while building a CWL tool for FindBadGenomicKmersSpark. The tool description is as follows.

#!/usr/bin/env cwl-runner

# This tool description was generated automatically by wdl2cwl ver. 0.2

class: CommandLineTool

cwlVersion: v1.0


requirements:

- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: quay.io/biocontainers/gatk4:4.1.6.0--py38_0
- class: InitialWorkDirRequirement
  listing: 
    - $(inputs.ReferenceGenome)



inputs:

- id: ReferenceGenome

  type: File

- id: ReferenceGenomeDict

  type: File

- id: sampleName

  type: string


outputs:

- id: kmers_to_ignore

  type: File

  outputBinding:

    glob: $(inputs.sampleName).txt



baseCommand: []

arguments:

- valueFrom: |-

    gatk FindBadGenomicKmersSpark -R $(inputs.ReferenceGenome.path) -O $(inputs.sampleName).txt

  shellQuote: false

Ambarish_Kumar · May 28, 2020, 10:42am

GATK asks for the reference genome dictionary file. I have set the tool requirements to give access to the directory containing the reference genome and its dictionary file, but it still does not work.

The error is as follows.

[May 28, 2020 10:00:10 AM GMT] org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBadGenomicKmersSpark done. Elapsed time: 0.22 minutes.
Runtime.totalMemory()=230686720
***********************************************************************

A USER ERROR has occurred: Fasta dict file for reference /ySLAXq/NormalizeFasta.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
20/05/28 10:00:10 INFO ShutdownHookManager: Shutdown hook called
20/05/28 10:00:10 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ec24ec5-7519-42d9-a6f7-316529086161
Using GATK jar /usr/local/share/gatk4-4.1.6.0-0/gatk-package-4.1.6.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/share/gatk4-4.1.6.0-0/gatk-package-4.1.6.0-local.jar FindBadGenomicKmersSpark -R /ySLAXq/NormalizeFasta.fasta -O test.txt
INFO [job FindBadGenomicKmers.cwl] Max memory used: 252MiB
ERROR [job FindBadGenomicKmers.cwl] Job error:
("Error collecting output for parameter 'kmers_to_ignore':\nFindBadGenomicKmers.cwl:44:5: Did not find output file with glob pattern: '['test.txt']'", {})
WARNING [job FindBadGenomicKmers.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail

kmhernan · February 26, 2021, 4:06am

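I think you also need to add the reference sequence dictionary to the InitialWorkDirRequirement listing. At the moment only the FASTA is staged there, so GATK cannot find the .dict file next to it.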
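
A minimal sketch of that change, reusing the input names from the tool description in the first post. It assumes the dictionary's file name matches the FASTA's, e.g. NormalizeFasta.dict for NormalizeFasta.fasta, since GATK looks for the dictionary beside the reference under that name. Only the requirements section is shown; the rest of the tool stays unchanged.

```yaml
requirements:
- class: ShellCommandRequirement
- class: InlineJavascriptRequirement
- class: DockerRequirement
  dockerPull: quay.io/biocontainers/gatk4:4.1.6.0--py38_0
- class: InitialWorkDirRequirement
  listing:
    # Stage the FASTA and its dictionary side by side in the working directory,
    # so that the path passed to -R has the .dict file next to it.
    - $(inputs.ReferenceGenome)
    - $(inputs.ReferenceGenomeDict)
```

The "Did not find output file with glob pattern" message in the log is most likely just a consequence of GATK exiting before it wrote test.txt, so it should go away once the dictionary is found. GATK tools generally also expect a .fai index beside the FASTA; the log above only reports the dictionary, so whether an extra input for the index is needed here is an assumption to verify. Declaring the companion files as secondaryFiles on ReferenceGenome is another common CWL pattern for this.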