CWL workflow using the NCBI SRA Toolkit fails

Here is the CWL file I have created:

#!/usr/bin/env cwl-runner

cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: DockerRequirement
  dockerPull: ncbi/sra-tools:latest
- class: InlineJavascriptRequirement
  expressionLib:
  - var get_ngc= function(){ if(inputs["ngc_file"]==null){ return " "; }else{ return
    "--ngc "+inputs["ngc_file"].path+" "; } }
- class: InitialWorkDirRequirement
  listing:
  - entry: "set -x\nPS4='[\\\\d \\\\t] '\n\n## pfetch \na=\\$(basename \\${1})\nid=\\\
      ${a/.id/}\nvdb-config --interactive\n\nprefetch $(get_ngc()) --max-size 150G\
      \ \\${id}\n\nsraID=\\$(ls -1|grep -v make_fastq.sh)\n\n## fastq-dump \nvdb-validate\
      \ \\${sraID}\nfasterq-dump \\${sraID}\n\n## check read1, read2 or not paired\n\
      for s in \\${sraID}\ndo\n  echo \\${s}\n  if [ \\${s} != \\${id} ]\n  then\n\
      \    [ -f \\${s}_1.fastq ] && cat \\${s}_1.fastq >> \\${id}_R1.fastq & \n  \
      \  p1=\\$!\n    [ -f \\${s}_2.fastq ] && cat \\${s}_2.fastq >> \\${id}_R2.fastq\
      \ & \n    p2=\\$!\n    [ -f \\${s}.fastq ] && cat \\${s}.fastq >> \\${id}.fastq\
      \ &\n    p3=\\$!\n    wait \\$p1 \\$p2 \\$p3\n  fi\ndone\n\n## compress fastq\n\
      for i in \\${id}*.fastq\ndo \n  echo \\${i}\n  pigz \\${i} & sleep 1\ndone\n\
      wait\n\n## remove techniqual reads if any  \n[ -f \\${id}_R1.fastq.gz ] && [\
      \ -f \\${id}_R2.fastq.gz ] && [ -f \\${id}.fastq.gz ] && rm \\${id}.fastq.gz\n\
      ## test deleting intermediate folder\necho \"Space before tmp folder delete\"\
      \necho \"Removing tmp folder \\${sraID} \"\nls *\nrm -rf \\${sraID}*\nls *\n\
      echo 'done'"
    entryname: make_fastq.sh
    writable: false
label: Make_fastq_modiffied
doc: vdb config modified
inputs:
  sra_ID:
    type:
    - type: array
      items: string
    - 'null'
    inputBinding:
      position: 1
  id_file:
    type:
    - File
    - 'null'
    inputBinding:
      position: 2
outputs:
  fastq1:
    type: File
    label: read1 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID).fastq.gz
      - $(inputs.sra_ID)_R1.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'.fastq.gz')
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R1.fastq.gz')
  fastq2:
    type:
    - File
    - 'null'
    label: read2 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID)_R2.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R2.fastq.gz')
baseCommand:
- sh
- make_fastq.sh

The run is now terminated and I get the error message below.

I’m unable to figure out the issue with the output file type that it reports:

EXECUTOR_ERROR
{"error_cause": "Task execution failed with \".ica/user/bpe1-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039-2uwf/0/stderr.kg85x.log\"", "executor": {"command": ["/app/cwl_launch 3.0.20201203173111 --strict-memory-limit --tmpdir-prefix='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/tmp' --outdir='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/out' --debug --timestamp --default-container='bash:5' --ilmn-output-json-path='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/.ica/user/output.json' --ilmn-stdouterr-dir-prefix='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/.ica/user' --tmp-outdir-prefix='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/tmp-out' --preserve-environment CLOUD_REGION --preserve-environment CLOUD_PROVIDER --preserve-environment ICA_REGION --ilmn-resume 'workflow.cwl' 'input.json'"], "env": {"BPE_CWL_FEATURE_READ_POD_LOGS_VIA_K8S_API": "false", "CES_EXEC_WORKFLOW_RUN_LAUNCH": "true", "CES_MAIN_WORKFLOW_FILENAME": "workflow.cwl", "K8S_NAMESPACE": "wf-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "K8S_PVC_NAME": "pvc-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "REMOTE_S3_PREFIX": "bpe-ces-test-data/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/", "SHARED_DIR_PREFIX": "/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data", "WES_PVC_MOUNT_PATH": "/ces", "WES_PVC_NAME": "pvc-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WES_STDOUTERR_DIR_PREFIX": "/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/logs/", "WES_WORKFLOW_NAME": "ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WES_WORKFLOW_RUN_ID": "ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WES_WORKFLOW_TASK_NAMESPACE": "wf-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WORKFLOW_RUN_ID": "ca0dbaf0-7377-4b9a-91ce-22d5f43e4039"}, "image": "NotUsed", "stderr": ".ica/user/bpe1-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039-2uwf/0/stderr.kg85x.log", "stdin": null, "stdout": ".ica/user/bpe1-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039-2uwf/0/stdout.kg85x.log", "workdir": 
"/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/"}, "stderr": "/app/cwl_runner/.venv/bin/cwl-runner 3.0.20201203173111
\u001b[32m[2023-07-28 10:26:36]\u001b[0m \u001b[1;30mINFO\u001b[0m /app/cwl_runner/.venv/bin/cwl-runner 3.0.20201203173111
\u001b[32m[2023-07-28 10:26:36]\u001b[0m \u001b[1;30mINFO\u001b[0m Resolved 'workflow.cwl' to 'file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/workflow.cwl'
input.json:1:333: Warning: Field `type` references unknown identifier `standard`, tried
 file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/input.json#standard
input.json:1:543: Warning: Field `type` references unknown identifier `standard`, tried
 file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/input.json#standard
URI prefix 'ilmn-tes' of 'ilmn-tes:resources' not recognized, are you missing a $namespaces section?
9baf3873-b77c-4273-9b8d-e40069f0cdbc.cwl:79:5: Warning: Field `type` references unknown identifier
 `standard`, tried
 file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/9baf3873-b77c-4273-9b8d-e40069f0cdbc.cwl#standard
\u001b[32m[2023-07-28 10:26:37]\u001b[0m \u001b[1;30mERROR\u001b[0m \u001b[31mTool definition failed validation:

workflow.cwl:39:7: Source 'fastq1' of type {\"type\": \"array\", \"items\": \"File\"} is incompatible
workflow.cwl:19:5: with sink 'Make_fastq_modiffied__fastq1' of type \"File\"
workflow.cwl:40:7: Source 'fastq2' of type {\"type\": \"array\", \"items\": [\"File\", \"null\"]} is
 incompatible
workflow.cwl:22:5: with sink 'Make_fastq_modiffied__fastq2' of type [\"null\", \"File\"]\u001b[0m
Traceback (most recent call last):
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/main.py\", line 989, in main
 tool = make_tool(uri, loadingContext)
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/load_tool.py\", line 457, in make_tool
 tool = loadingContext.construct_tool_object(processobj, loadingContext)
 File \"/app/cwl_runner/src/cwl_runner/cwl_runner.py\", line 166, in construct_tool_object
 return default_make_tool(toolpath_object, loadingContext)
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/workflow.py\", line 51, in default_make_tool
 return Workflow(toolpath_object, loadingContext)
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/workflow.py\", line 135, in __init__
 static_checker(
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/checker.py\", line 333, in static_checker
 raise ValidationException(all_exception_msg)
schema_salad.exceptions.ValidationException: 
workflow.cwl:39:7: Source 'fastq1' of type {\"type\": \"array\", \"items\": \"File\"} is incompatible
workflow.cwl:19:5: with sink 'Make_fastq_modiffied__fastq1' of type \"File\"
workflow.cwl:40:7: Source 'fastq2' of type {\"type\": \"array\", \"items\": [\"File\", \"null\"]} is
 incompatible
workflow.cwl:22:5: with sink 'Make_fastq_modiffied__fastq2' of type [\"null\", \"File\"]
", "stdout": ""} (type: Non 0 exit code within CES commands, retryable: false)

Any suggestions or help would be really appreciated.

Hi @kcmtest,

It looks like you’re running on ICAv2; I would love to touch base and work with you on getting this running there.

I’ve been working on some plugins for the ICAv2 CLI to make running CWL from the CLI / API less of a pain than the ICAv2 software currently makes it.

Firstly, please use the YAML block scalar indicator | on your InitialWorkDirRequirement.listing.entry attribute. This allows entry to be written as a multi-line string and is frankly much easier to read.

I’ve fixed it up for you below:

- class: InitialWorkDirRequirement
  listing:
  - entry: |
      #!/usr/bin/env sh
      
      set -x
      PS4='[\\d \\t] '
      
      ## pfetch
      a="\$(basename \${1})"
      id="\${a/.id/}"
      
      vdb-config --interactive
      
      prefetch $(get_ngc()) --max-size 150G "\${id}"
      
      sraID="\$(ls -1 | grep -v make_fastq.sh)"
      
      ## fastq-dump
      
      vdb-validate "\${sraID}"
      fasterq-dump "\${sraID}"
      
      ## check read1, read2 or not paired
      for s in \${sraID}; do
        echo "\${s}"
        if [ "\${s}" != "\${id}" ]; then
          [ -f \${s}_1.fastq ] && cat \${s}_1.fastq >> \${id}_R1.fastq & p1="\$!"
      
          [ -f \${s}_2.fastq ] && cat \${s}_2.fastq >> \${id}_R2.fastq & p2="\$!"
      
          [ -f \${s}.fastq ] && cat \${s}.fastq >> \${id}.fastq & p3="\$!"
      
          wait "\${p1}" "\${p2}" "\${p3}"
        fi
      
      done
      
      ## compress fastq
      for i in \${id}*.fastq; do
        echo "\${i}"
        pigz "\${i}" & sleep 1
      done;
      wait
      
      ## remove technical reads if any
      [ -f "\${id}_R1.fastq.gz" ] && \\
      [ -f "\${id}_R2.fastq.gz" ] && \\
      [ -f "\${id}.fastq.gz" ] && \\
      rm "\${id}.fastq.gz"
      
      ## test deleting intermediate folder
      echo "Space before tmp folder delete"
      
      echo "Removing tmp folder \${sraID}"
      ls *
      rm -rf "\${sraID}"*
      ls *
      echo 'done'

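As for the actual error, part of the log looks like a resource requirement problem: the warnings about the unknown identifier `standard` and the unrecognised 'ilmn-tes' URI prefix. Are you sure this is your actual workflow file? The file you posted is a CommandLineTool, whereas the validation error points at workflow.cwl.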
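
Separately from the resource hints, the validation message in the log reports a type mismatch: the source 'fastq1' is an array of File, while the workflow output 'Make_fastq_modiffied__fastq1' is declared as a single File (and likewise for 'fastq2'). That pattern typically shows up when the step producing the output is scattered. Since workflow.cwl was not posted, the following is only a sketch under that assumption; the step name, the `id_files` input and the `make_fastq.cwl` file name are all hypothetical.

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
inputs:
  id_files:                 # hypothetical input name
    type: File[]
outputs:
  Make_fastq_modiffied__fastq1:
    type: File[]            # array, because the producing step is scattered
    outputSource: make_fastq/fastq1
  Make_fastq_modiffied__fastq2:
    type:
      type: array
      items: ["null", File]   # each scattered job may or may not emit read 2
    outputSource: make_fastq/fastq2
steps:
  make_fastq:               # hypothetical step name
    run: make_fastq.cwl     # hypothetical file name for the tool posted above
    scatter: id_file
    in:
      id_file: id_files
    out: [fastq1, fastq2]
```

If the step is not meant to be scattered, the alternative would be to keep the workflow outputs as File and remove the scatter instead.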

For ICAv2, here is an example of the resource requirements we use:

hints:
    ResourceRequirement:
        ilmn-tes:resources:tier: standard
        ilmn-tes:resources:type: standard
        ilmn-tes:resources:size: medium

You will also need a $namespaces section so that CWL knows what ilmn-tes should point to, like so:

$namespaces:
    s: https://schema.org/
    ilmn-tes: https://platform.illumina.com/rdf/ica/
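
Put together, the top of a tool that uses these hints might look like the sketch below. The namespace URLs and hint keys are the ones shown above; the baseCommand and the empty inputs/outputs are placeholders only and would be replaced by the real tool definition.

```yaml
cwlVersion: v1.0
class: CommandLineTool

# Tells the CWL loader how to expand the prefixes used in this document
$namespaces:
    s: https://schema.org/
    ilmn-tes: https://platform.illumina.com/rdf/ica/

hints:
    ResourceRequirement:
        ilmn-tes:resources:tier: standard
        ilmn-tes:resources:type: standard
        ilmn-tes:resources:size: medium

# Placeholders: substitute the real command, inputs and outputs
baseCommand: [sh, make_fastq.sh]
inputs: {}
outputs: {}
```

With the $namespaces section present, the "URI prefix 'ilmn-tes' ... not recognized" message from the log should no longer appear for this file.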

We have a large set of ICA resources in the umccr/cwl-ica GitHub repository (a collection of cwl-ica workflows along with a user guide for the commands to use and a contributions guide). Note that the code in the GitHub repository is designed for ICAv1; please head to the Releases page for ICAv2 workflows (we manipulate a few of the attributes that are not forwards compatible from ICAv1).

Please let me know if any of this was useful.

Kind regards,
Alexis