CWL workflow using NCBI SRA Toolkit fails

kcmtest · July 28, 2023, 1:56pm

Here is the CWL file I have created:

#!/usr/bin/env cwl-runner

cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: DockerRequirement
  dockerPull: ncbi/sra-tools:latest
- class: InlineJavascriptRequirement
  expressionLib:
  - var get_ngc= function(){ if(inputs["ngc_file"]==null){ return " "; }else{ return
    "--ngc "+inputs["ngc_file"].path+" "; } }
- class: InitialWorkDirRequirement
  listing:
  - entry: "set -x\nPS4='[\\\\d \\\\t] '\n\n## pfetch \na=\\$(basename \\${1})\nid=\\\
      ${a/.id/}\nvdb-config --interactive\n\nprefetch $(get_ngc()) --max-size 150G\
      \ \\${id}\n\nsraID=\\$(ls -1|grep -v make_fastq.sh)\n\n## fastq-dump \nvdb-validate\
      \ \\${sraID}\nfasterq-dump \\${sraID}\n\n## check read1, read2 or not paired\n\
      for s in \\${sraID}\ndo\n  echo \\${s}\n  if [ \\${s} != \\${id} ]\n  then\n\
      \    [ -f \\${s}_1.fastq ] && cat \\${s}_1.fastq >> \\${id}_R1.fastq & \n  \
      \  p1=\\$!\n    [ -f \\${s}_2.fastq ] && cat \\${s}_2.fastq >> \\${id}_R2.fastq\
      \ & \n    p2=\\$!\n    [ -f \\${s}.fastq ] && cat \\${s}.fastq >> \\${id}.fastq\
      \ &\n    p3=\\$!\n    wait \\$p1 \\$p2 \\$p3\n  fi\ndone\n\n## compress fastq\n\
      for i in \\${id}*.fastq\ndo \n  echo \\${i}\n  pigz \\${i} & sleep 1\ndone\n\
      wait\n\n## remove techniqual reads if any  \n[ -f \\${id}_R1.fastq.gz ] && [\
      \ -f \\${id}_R2.fastq.gz ] && [ -f \\${id}.fastq.gz ] && rm \\${id}.fastq.gz\n\
      ## test deleting intermediate folder\necho \"Space before tmp folder delete\"\
      \necho \"Removing tmp folder \\${sraID} \"\nls *\nrm -rf \\${sraID}*\nls *\n\
      echo 'done'"
    entryname: make_fastq.sh
    writable: false
label: Make_fastq_modiffied
doc: vdb config modified
inputs:
  sra_ID:
    type:
    - type: array
      items: string
    - 'null'
    inputBinding:
      position: 1
  id_file:
    type:
    - File
    - 'null'
    inputBinding:
      position: 2
outputs:
  fastq1:
    type: File
    label: read1 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID).fastq.gz
      - $(inputs.sra_ID)_R1.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'.fastq.gz')
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R1.fastq.gz')
  fastq2:
    type:
    - File
    - 'null'
    label: read2 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID)_R2.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R2.fastq.gz')
baseCommand:
- sh
- make_fastq.sh

The run is now terminated with the error message below.

I’m unable to figure out the issue it reports with the output file types.

EXECUTOR_ERROR
{"error_cause": "Task execution failed with \".ica/user/bpe1-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039-2uwf/0/stderr.kg85x.log\"", "executor": {"command": ["/app/cwl_launch 3.0.20201203173111 --strict-memory-limit --tmpdir-prefix='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/tmp' --outdir='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/out' --debug --timestamp --default-container='bash:5' --ilmn-output-json-path='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/.ica/user/output.json' --ilmn-stdouterr-dir-prefix='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/.ica/user' --tmp-outdir-prefix='/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/tmp-out' --preserve-environment CLOUD_REGION --preserve-environment CLOUD_PROVIDER --preserve-environment ICA_REGION --ilmn-resume 'workflow.cwl' 'input.json'"], "env": {"BPE_CWL_FEATURE_READ_POD_LOGS_VIA_K8S_API": "false", "CES_EXEC_WORKFLOW_RUN_LAUNCH": "true", "CES_MAIN_WORKFLOW_FILENAME": "workflow.cwl", "K8S_NAMESPACE": "wf-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "K8S_PVC_NAME": "pvc-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "REMOTE_S3_PREFIX": "bpe-ces-test-data/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/", "SHARED_DIR_PREFIX": "/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data", "WES_PVC_MOUNT_PATH": "/ces", "WES_PVC_NAME": "pvc-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WES_STDOUTERR_DIR_PREFIX": "/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/logs/", "WES_WORKFLOW_NAME": "ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WES_WORKFLOW_RUN_ID": "ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WES_WORKFLOW_TASK_NAMESPACE": "wf-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039", "WORKFLOW_RUN_ID": "ca0dbaf0-7377-4b9a-91ce-22d5f43e4039"}, "image": "NotUsed", "stderr": ".ica/user/bpe1-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039-2uwf/0/stderr.kg85x.log", "stdin": null, "stdout": ".ica/user/bpe1-ca0dbaf0-7377-4b9a-91ce-22d5f43e4039-2uwf/0/stdout.kg85x.log", "workdir": 
"/ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/"}, "stderr": "/app/cwl_runner/.venv/bin/cwl-runner 3.0.20201203173111
\u001b[32m[2023-07-28 10:26:36]\u001b[0m \u001b[1;30mINFO\u001b[0m /app/cwl_runner/.venv/bin/cwl-runner 3.0.20201203173111
\u001b[32m[2023-07-28 10:26:36]\u001b[0m \u001b[1;30mINFO\u001b[0m Resolved 'workflow.cwl' to 'file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/workflow.cwl'
input.json:1:333: Warning: Field `type` references unknown identifier `standard`, tried
 file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/input.json#standard
input.json:1:543: Warning: Field `type` references unknown identifier `standard`, tried
 file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/input.json#standard
URI prefix 'ilmn-tes' of 'ilmn-tes:resources' not recognized, are you missing a $namespaces section?
9baf3873-b77c-4273-9b8d-e40069f0cdbc.cwl:79:5: Warning: Field `type` references unknown identifier
 `standard`, tried
 file:///ces/scheduler/run/ca0dbaf0-7377-4b9a-91ce-22d5f43e4039/data/9baf3873-b77c-4273-9b8d-e40069f0cdbc.cwl#standard
\u001b[32m[2023-07-28 10:26:37]\u001b[0m \u001b[1;30mERROR\u001b[0m \u001b[31mTool definition failed validation:

workflow.cwl:39:7: Source 'fastq1' of type {\"type\": \"array\", \"items\": \"File\"} is incompatible
workflow.cwl:19:5: with sink 'Make_fastq_modiffied__fastq1' of type \"File\"
workflow.cwl:40:7: Source 'fastq2' of type {\"type\": \"array\", \"items\": [\"File\", \"null\"]} is
 incompatible
workflow.cwl:22:5: with sink 'Make_fastq_modiffied__fastq2' of type [\"null\", \"File\"]\u001b[0m
Traceback (most recent call last):
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/main.py\", line 989, in main
 tool = make_tool(uri, loadingContext)
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/load_tool.py\", line 457, in make_tool
 tool = loadingContext.construct_tool_object(processobj, loadingContext)
 File \"/app/cwl_runner/src/cwl_runner/cwl_runner.py\", line 166, in construct_tool_object
 return default_make_tool(toolpath_object, loadingContext)
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/workflow.py\", line 51, in default_make_tool
 return Workflow(toolpath_object, loadingContext)
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/workflow.py\", line 135, in __init__
 static_checker(
 File \"/usr/local/lib/cwltool-3.0.20201203173111/cwltool/checker.py\", line 333, in static_checker
 raise ValidationException(all_exception_msg)
schema_salad.exceptions.ValidationException: 
workflow.cwl:39:7: Source 'fastq1' of type {\"type\": \"array\", \"items\": \"File\"} is incompatible
workflow.cwl:19:5: with sink 'Make_fastq_modiffied__fastq1' of type \"File\"
workflow.cwl:40:7: Source 'fastq2' of type {\"type\": \"array\", \"items\": [\"File\", \"null\"]} is
 incompatible
workflow.cwl:22:5: with sink 'Make_fastq_modiffied__fastq2' of type [\"null\", \"File\"]
", "stdout": ""} (type: Non 0 exit code within CES commands, retryable: false)

Any suggestions or help would be really appreciated.

alexiswl · September 20, 2023, 11:24pm

Hi @kcmtest,

It looks like you’re running on ICAv2. I would love to touch base and work with you on that.

I’ve been working on some plugins for the ICAv2 cli to make running CWL from the CLI / API less of a pain than is currently required by the ICAv2 software.

Firstly, please use the | (YAML literal block scalar) indicator on your InitialWorkDirRequirement.listing.entry attribute. This allows entry to be a multi-line string and, frankly, is much easier to read.

I’ve fixed it up for you below:

- class: InitialWorkDirRequirement
  listing:
  - entry: |
      #!/usr/bin/env sh
      
      set -x
      PS4='[\\d \\t] '
      
      ## pfetch
      a="\$(basename \${1})"
      id="\${a/.id/}"
      
      vdb-config --interactive
      
      prefetch $(get_ngc()) --max-size 150G "\${id}"
      
      sraID="\$(ls -1 | grep -v make_fastq.sh)"
      
      ## fastq-dump
      
      vdb-validate "\${sraID}"
      fasterq-dump "\${sraID}"
      
      ## check read1, read2 or not paired
      for s in \${sraID}; do
        echo "\${s}"
        if [ "\${s}" != "\${id}" ]; then
          [ -f \${s}_1.fastq ] && cat \${s}_1.fastq >> \${id}_R1.fastq & p1="\$!"
      
          [ -f \${s}_2.fastq ] && cat \${s}_2.fastq >> \${id}_R2.fastq & p2="\$!"
      
          [ -f \${s}.fastq ] && cat \${s}.fastq >> \${id}.fastq & p3="\$!"
      
          wait "\${p1}" "\${p2}" "\${p3}"
        fi
      
      done
      
      ## compress fastq
      for i in \${id}*.fastq; do
        echo "\${i}"
        pigz "\${i}" & sleep 1
      done;
      wait
      
      ## remove technical reads if any
      [ -f "\${id}_R1.fastq.gz" ] && \\
      [ -f "\${id}_R2.fastq.gz" ] && \\
      [ -f "\${id}.fastq.gz" ] && \\
      rm "\${id}.fastq.gz"
      
      ## test deleting intermediate folder
      echo "Space before tmp folder delete"
      
      echo "Removing tmp folder \${sraID}"
      ls *
      rm -rf "\${sraID}"*
      ls *
      echo 'done'

As for the actual error, this looks like a resource requirement error. Are you sure this is your actual workflow file?

For ICAv2, here is an example of the resource requirements we use:

hints:
    ResourceRequirement:
        ilmn-tes:resources:tier: standard
        ilmn-tes:resources:type: standard
        ilmn-tes:resources:size: medium

But you will also need a $namespaces section, so CWL knows what ilmn-tes should point to, like so:

$namespaces:
    s: https://schema.org/
    ilmn-tes: https://platform.illumina.com/rdf/ica/

We have a large set of ICA resources in the umccr/cwl-ica GitHub repository (a collection of cwl-ica workflows along with a user guide for the commands to use and a contributions guide). Note that the code in the repository is designed for ICAv1; please head to the Releases page for ICAv2 workflows (we manipulate a few of the attributes that are not forwards compatible from ICAv1).

Please let me know if any of this was useful.

Kind regards,
Alexis

kcmtest · October 4, 2023, 7:22pm

Wow, I got a response. I thought I would never get one.

kcmtest · October 4, 2023, 7:24pm

Do you have resources where I can check my CWL file before testing it in the tool? Most of the time I’m stuck on one indentation issue or another. If there is any tool that can check whether my CWL file structure is in the correct format, that would be really helpful.

mrc · October 4, 2023, 9:06pm

Try cwltool --validate

kcmtest · October 4, 2023, 11:22pm

cwlVersion: cwl:v1.0
This is the version I’m running.

alexiswl · October 5, 2023, 12:29am

You need to run

cwltool --validate /path/to/main.cwl

This will check that all of your indentation is correct.

I’m not sure that cwltool --validate will catch problems specific to your workflow engine (ICAv2), though.
I would try my recommendations above (with respect to hints and namespaces) and see if that works for you.

I’d also try the icav2-cli-plugins toolkit for debugging (particularly the get-analysis-step-logs command for debugging CWL workflows on ICAv2).

As for the bash script, I usually first run cwltool locally with --debug and --leave-tmpdir, that way, even if the tool fails I can use the logs to find what bash script was generated. I can then copy the contents into my IDE (usually PyCharm) and see if the syntax highlighting in the IDE raises any errors.
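
As a command-line complement to the IDE check described above, the generated script can also be parsed without being executed. This is a minimal sketch, assuming a POSIX sh is available; check_script_syntax is a hypothetical helper name, not part of cwltool or the icav2 tooling.

```shell
#!/bin/sh
# check_script_syntax SCRIPT
# Parses SCRIPT with 'sh -n', which reads commands but does not execute them.
# 'check_script_syntax' is a hypothetical helper name used for illustration.
check_script_syntax() {
  if sh -n "$1"; then
    echo "syntax OK: $1"
  else
    echo "syntax error in: $1" >&2
    return 1
  fi
}
```

After a run with cwltool --debug --leave-tmpdir, point the helper at the make_fastq.sh left behind in the temporary directory (the exact path varies per run). Note that sh -n only reports syntax errors; it will not notice a missing command or a value that CWL evaluated differently than intended.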

kcmtest · October 5, 2023, 7:04am

The above issue is fixed with your suggestion and some modifications. I have now observed that some GSM IDs fail on one run and pass on another in the make fastq step, although most of them pass the validate fastq step. To fix this, I wanted to incorporate a retry function in the shell script. It works in a plain shell environment without CWL, but it fails when I put it inside the CWL. Here is my CWL file. It passes the validate fastq step but fails at the make fastq step itself.


#!/usr/bin/env cwl-runner

cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: DockerRequirement
  dockerPull: kcm1400/fastq_ncbi_sra:v1
- class: InlineJavascriptRequirement
  expressionLib:
  - |
    var get_ngc = function () {
      if (inputs["ngc_file"] == null) {
        return " ";
      } else {
        return "--ngc " + inputs["ngc_file"].path + " ";
      }
    }
- class: InitialWorkDirRequirement
  listing:
  - entry: |
      set -x
      PS4='[\\d \\t] '

      ## pfetch
      a=$(basename \${1})
      id=\${a/.id/}
      #vdb-config --interactive

      max_retries=5
      retry_count=0
      retry_delay=60  # sec

      download_succeeded=false

      while [ \$retry_count -lt \$max_retries ] && [ "\$download_succeeded" = false ]; do
        prefetch \$(get_ngc) --max-size 420G \${id} && download_succeeded=true

        if [ "\$download_succeeded" = false ]; then
          echo "Download failed. Retrying in \$retry_delay sec.."
          sleep \$retry_delay
          retry_count=$((retry_count + 1))
        fi
      done

      if [ "\$download_succeeded" = false ]; then
        echo "Download failed after \$max_retries retries. Exiting..."
        exit 1
      fi

      sraID=\$(ls -1 | grep -v make_fastq.sh)

      ## fastq-dump
      vdb-validate \${sraID}
      fasterq-dump \${sraID}

      ## check read1, read2, or not paired
      for s in \${sraID}; do
        echo \${s}
        if [ \${s} != \${id} ]; then
          [ -f \${s}_1.fastq ] && cat \${s}_1.fastq >> \${id}_R1.fastq &
          p1=\$!
          [ -f \${s}_2.fastq ] && cat \${s}_2.fastq >> \${id}_R2.fastq &
          p2=\$!
          [ -f \${s}.fastq ] && cat \${s}.fastq >> \${id}.fastq &
          p3=\$!
          wait \$p1 \$p2 \$p3
        fi
      done

      ## compress fastq
      for i in \${id}*.fastq; do
        echo \${i}
        pigz \${i} & sleep 1
      done
      wait

      ## remove technical reads if any
      [ -f \${id}_R1.fastq.gz ] && [ -f \${id}_R2.fastq.gz ] && [ -f \${id}.fastq.gz ] && rm \${id}.fastq.gz

      ## test deleting intermediate folder
      echo "Space before tmp folder delete"
      echo "Removing tmp folder \${sraID}"
      ls *
      rm -rf \${sraID}*
      ls *
      echo 'done'
    entryname: make_fastq.sh
    writable: false
label: make_fastq_testing
doc: vdb config modified
inputs:
  sra_ID:
    type:
    - string
    - 'null'
    inputBinding:
      position: 1
  id_file:
    type:
    - File
    - 'null'
    inputBinding:
      position: 2
  ngc_file:
    type:
    - File
    - 'null'
    inputBinding:
      position: 3
outputs:
  fastq1:
    type: File
    label: read1 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID).fastq.gz
      - $(inputs.sra_ID)_R1.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'.fastq.gz')
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R1.fastq.gz')
  fastq2:
    type:
    - File
    - 'null'
    label: read2 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID)_R2.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R2.fastq.gz')
baseCommand:
- sh
- make_fastq.sh

alexiswl · October 5, 2023, 10:32am

Trying to run your code locally, I see a few issues, particularly with escaping the $ character.

We need to escape $ in both shell contexts: command substitution $() and variable substitution ${}.
This also includes arithmetic expansion with $(()).

If we do not escape $(), then the string inside the brackets will be evaluated as JavaScript by CWL.

a=$(basename \${1})

Should be

a=\$(basename \${1})

Since we want the generated shell script to contain

a=$(basename ${1})

Likewise,

retry_count=$((retry_count + 1))

Should instead be

retry_count=\$((retry_count + 1))

In contrast, get_ngc is a custom JavaScript function, so we DO want this to be evaluated by JavaScript before the shell script is written.

We also need to call our JavaScript functions for them to return a value, so $(get_ngc) should instead be $(get_ngc()).

So

prefetch \$(get_ngc) --max-size 420G \${id} && download_succeeded=true

becomes

prefetch $(get_ngc()) --max-size 420G \${id} && download_succeeded=true

This then gives me the following shell script (when specifying an ngc input):

set -x
PS4='[\d \t] '

## pfetch
a=$(basename ${1})
id=${a/.id/}
#vdb-config --interactive

max_retries=5
retry_count=0
retry_delay=60  # sec

download_succeeded=false

while [ $retry_count -lt $max_retries ] && [ "$download_succeeded" = false ]; do
  prefetch --ngc /var/lib/cwl/stg2329ea58-b1dc-457d-9971-f4c246b7f2a0/ngc_config  --max-size 420G ${id} && download_succeeded=true

  if [ "$download_succeeded" = false ]; then
    echo "Download failed. Retrying in $retry_delay sec.."
    sleep $retry_delay
    retry_count=$((retry_count + 1))
  fi
done

if [ "$download_succeeded" = false ]; then
  echo "Download failed after $max_retries retries. Exiting..."
  exit 1
fi

sraID=$(ls -1 | grep -v make_fastq.sh)

## fastq-dump
vdb-validate ${sraID}
fasterq-dump ${sraID}

## check read1, read2, or not paired
for s in ${sraID}; do
  echo ${s}
  if [ ${s} != ${id} ]; then
    [ -f ${s}_1.fastq ] && cat ${s}_1.fastq >> ${id}_R1.fastq &
    p1=$!
    [ -f ${s}_2.fastq ] && cat ${s}_2.fastq >> ${id}_R2.fastq &
    p2=$!
    [ -f ${s}.fastq ] && cat ${s}.fastq >> ${id}.fastq &
    p3=$!
    wait $p1 $p2 $p3
  fi
done

## compress fastq
for i in ${id}*.fastq; do
  echo ${i}
  pigz ${i} & sleep 1
done
wait

## remove technical reads if any
[ -f ${id}_R1.fastq.gz ] && [ -f ${id}_R2.fastq.gz ] && [ -f ${id}.fastq.gz ] && rm ${id}.fastq.gz

## test deleting intermediate folder
echo "Space before tmp folder delete"
echo "Removing tmp folder ${sraID}"
ls *
rm -rf ${sraID}*
ls *
echo 'done'
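
Since the retry logic is the new part, it may help to exercise it in plain sh before embedding it in the CWL entry, where every $ then needs its backslash. The following is a minimal sketch, not the script from this thread: retry is a hypothetical helper name, and the attempt count and delay are passed in as arguments.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
# 'retry' is a hypothetical helper name, not part of sra-tools or cwltool.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt of $max_attempts failed: $*" >&2
    # Do not sleep after the final attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

In plain shell this would be called as retry 5 60 prefetch --max-size 420G "$id". Once pasted into the CWL entry, the same escaping rules apply: $1, $((attempt + 1)) and the other shell expansions need a backslash, while $(get_ngc()) stays unescaped.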

kcmtest · October 5, 2023, 3:06pm

icav2 projectanalyses get-analysis-step-logs ed01dd3d-a821-47ff-9e5b-3c8d45fd8504 --step-name=build_fastq --stdout --output-path=output_file.txt
unknown flag: --step-name

Any suggestions?

alexiswl · October 5, 2023, 10:23pm

Yep, this isn’t invoking the icav2 plugins but the standard binary icav2 instead.

Please head to the icav2-cli-plugins installation page. Once the plugin has been installed correctly you should be able to use the get-analysis-step-logs subcommand.

kcmtest · October 6, 2023, 5:43am

#!/usr/bin/env cwl-runner

cwlVersion: cwl:v1.0
class: CommandLineTool
requirements:
- class: DockerRequirement
  dockerPull: kcm1400/fastq_ncbi_sra:v1
- class: InlineJavascriptRequirement
  expressionLib:
  - |
    var get_ngc = function () {
      if (inputs["ngc_file"] == null) {
        return " ";
      } else {
        return "--ngc " + inputs["ngc_file"].path + " ";
      }
    }
- class: InitialWorkDirRequirement
  listing:
  - entry: |
      #!/bin/sh

      set -x
      PS4='[\\d \\t] '

      ## Retry function
      download_with_retry() {
          local id="$1"
          local retries=3
          local retry_delay=10  # seconds
          local download_succeeded=false

          for attempt in $(seq "$retries"); do
              # Download with prefetch
              prefetch $get_ngc() --max-size 420G "$id" && download_succeeded=true

              if [ "$download_succeeded" = true ]; then
                  echo "$id downloaded successfully."
                  return 0
              else
                  echo "Download of $id failed. Retrying in $retry_delay seconds..."
                  sleep "$retry_delay"
              fi
          done

          echo "Download of $id failed after $retries retries. Exiting..."
          return 1
      }

      ## Process each GSM ID
      for id in "$@"; do
          download_with_retry "$id" || exit 1

          sraID=$(ls -1 | grep -v make_fastq.sh)

          ## fastq-dump
          vdb-validate "$sraID"
          fasterq-dump "$sraID"

          ## check read1, read2, or not paired
          for s in $sraID; do
              echo "$s"
              if [ "$s" != "$id" ]; then
                  [ -f "${s}_1.fastq" ] && cat "${s}_1.fastq" >> "${id}_R1.fastq" &
                  p1=$!
                  [ -f "${s}_2.fastq" ] && cat "${s}_2.fastq" >> "${id}_R2.fastq" &
                  p2=$!
                  [ -f "${s}.fastq" ] && cat "${s}.fastq" >> "${id}.fastq" &
                  p3=$!
                  wait "$p1" "$p2" "$p3"
              fi
          done

          ## compress fastq
          for i in "${id}"*.fastq; do
              echo "$i"
              pigz "$i" & sleep 1
          done
          wait

          ## remove technical reads if any
          [ -f "${id}_R1.fastq.gz" ] && [ -f "${id}_R2.fastq.gz" ] && [ -f "${id}.fastq.gz" ] && rm "${id}.fastq.gz"

          ## test deleting intermediate folder
          echo "Space before tmp folder delete"

          echo "Removing tmp folder $sraID"
          ls *
          rm -rf "$sraID"*
          ls *
          echo 'done'
      done
    entryname: make_fastq.sh
    writable: false
label: build_fastq
doc: vdb config modified
inputs:
  sra_ID:
    type:
    - string
    - 'null'
    inputBinding:
      position: 1
  id_file:
    type:
    - File
    - 'null'
    inputBinding:
      position: 2
  ngc_file:
    type:
    - File
    - 'null'
    inputBinding:
      position: 3
outputs:
  fastq1:
    type: File
    label: read1 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID).fastq.gz
      - $(inputs.sra_ID)_R1.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'.fastq.gz')
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R1.fastq.gz')
  fastq2:
    type:
    - File
    - 'null'
    label: read2 fastq.gz file
    outputBinding:
      glob:
      - $(inputs.sra_ID)_R2.fastq.gz
      - $((inputs["id_file"]===null) ?null:(inputs["id_file"].basename.replace(".id",""))+'_R2.fastq.gz')
baseCommand:
- sh
- make_fastq.sh

I ran cwltool --debug --leave-tmpdir build_fastq.cwl

and got something like this:

cwltool.errors.WorkflowException: Expression evaluation error:
Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var get_ngc = function () {
03   if (inputs["ngc_file"] == null) {
04     return " ";
05   } else {
06     return "--ngc " + inputs["ngc_file"].path + " ";
07   }
08 }
09
10 var inputs = {
11     "id_file": null,
12     "ngc_file": null,
13     "sra_ID": null
14 };
15 var self = null;
16 var runtime = {
17     "cores": 1,
18     "ram": 1024,
19     "tmpdirSize": 1024,
20     "outdirSize": 1024,
21     "tmpdir": "/tmp",
22     "outdir": "/sXpjTT"
23 };
24 (function(){return ((seq "$retries"));})()
stdout was: ''
stderr was: 'evalmachine.<anonymous>:24
(function(){return ((seq "$retries"));})()
                         ^^^^^^^^^^

SyntaxError: Unexpected string
    at new Script (node:vm:100:7)
    at createScript (node:vm:265:10)
    at Object.runInNewContext (node:vm:306:10)
    at Socket.<anonymous> ([eval]:11:57)
    at Socket.emit (node:events:513:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:285:11)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at Pipe.onStreamRead (node:internal/stream_base_commons:190:23)'

I would appreciate your help getting past these errors.

alexiswl · October 6, 2023, 5:56am

As pointed out above, you need to escape ‘$’ to distinguish between JavaScript notation and bash notation.

You are getting the error (function(){return ((seq "$retries"));})() because cwltool treats your code snippet $(seq "$retries") as a JavaScript expression rather than a shell command substitution. Use \$(seq "\$retries") instead.

Of the 45 ‘$’ in the bash script you’ve put under make_fastq.sh, all except for $get_ngc() need escaping.

In the case of $get_ngc(), you will need to wrap the JavaScript call in parentheses, so $get_ngc() becomes $(get_ngc()) instead.
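
Counting dollar signs by eye is error-prone, so a rough check like the one below may help. It is a sketch under assumptions: the entry text has been saved to a separate file, and find_unescaped_dollars is a hypothetical helper name. It flags every $ that is not preceded by a backslash, which follows the advice above; it does not understand CWL, so an intentional $(get_ngc()) will also be listed and should simply be ignored.

```shell
#!/bin/sh
# find_unescaped_dollars FILE
# Prints, with line numbers, every line of FILE that contains a '$'
# not preceded by a backslash.
# 'find_unescaped_dollars' is a hypothetical helper name.
# grep exits with status 1 when no such line is found.
find_unescaped_dollars() {
  grep -nE '(^|[^\\])\$' "$1"
}
```

Run against the script body above, this would list lines such as local id="$1" and for attempt in $(seq "$retries"), each of which still needs its backslashes.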