CWL workflow using NCBI SRA Toolkit fails

Trying to run your code locally, I see a few issues, particularly with how $ is escaped.

We need to escape $ in both shell contexts: command substitution $() and variable substitution ${}.
This also includes arithmetic expansion with $(()).

If we do not escape $(), then the string inside the parentheses will be evaluated as JavaScript by CWL instead of being passed through to the shell.

a=$(basename \${1})

Should be

a=\$(basename \${1})

Since we want the shell to be

a=$(basename ${1})
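As a quick check of what the unescaped line does once it reaches the shell, here is a minimal sketch, assuming bash and a hypothetical input path /data/SRR000001.id (the file name is made up for illustration and is not from your workflow):

```shell
# Assumes bash (the ${var/pattern/} substitution is not POSIX sh).
# Hypothetical positional argument standing in for the staged .id file.
set -- /data/SRR000001.id

a=$(basename "${1}")   # command substitution: drops the directory part
id=${a/.id/}           # pattern substitution: removes the first ".id"
echo "${id}"
```

Both $() and ${} here are meant for the shell, which is why each needs a backslash in front of the $ inside the CWL file.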

Likewise,

retry_count=$((retry_count + 1))

Should instead be

retry_count=\$((retry_count + 1))
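To see the arithmetic expansion doing its job in isolation, here is a rough sketch of the retry loop as the shell should receive it. The flaky_download function is a hypothetical stand-in for prefetch (it fails twice, then succeeds), and the sleep is left out so it runs instantly:

```shell
# Hypothetical stand-in for prefetch: fails on the first two calls.
attempts=0
flaky_download() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

max_retries=5
retry_count=0
download_succeeded=false

while [ "$retry_count" -lt "$max_retries" ] && [ "$download_succeeded" = false ]; do
  flaky_download && download_succeeded=true

  if [ "$download_succeeded" = false ]; then
    # Arithmetic expansion, evaluated by the shell on every failed attempt.
    retry_count=$((retry_count + 1))
  fi
done

echo "succeeded=${download_succeeded} retries=${retry_count}"
```

If CWL consumes the $(( )) before the script is written, the counter never increments as intended, so the loop cannot behave like this.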

In contrast, get_ngc is a custom JavaScript function, so we DO want it to be evaluated by JavaScript before the shell script is written.

We also need to actually call our JavaScript functions for them to return a value. Note that $(get_ngc) should instead be $(get_ngc()).

So

prefetch \$(get_ngc) --max-size 420G \${id} && download_succeeded=true

becomes

prefetch $(get_ngc()) --max-size 420G \${id} && download_succeeded=true

This then gives me the following shell script (when specifying an ngc input):

set -x
PS4='[\d \t] '

## pfetch
a=$(basename ${1})
id=${a/.id/}
#vdb-config --interactive

max_retries=5
retry_count=0
retry_delay=60  # sec

download_succeeded=false

while [ $retry_count -lt $max_retries ] && [ "$download_succeeded" = false ]; do
  prefetch --ngc /var/lib/cwl/stg2329ea58-b1dc-457d-9971-f4c246b7f2a0/ngc_config  --max-size 420G ${id} && download_succeeded=true

  if [ "$download_succeeded" = false ]; then
    echo "Download failed. Retrying in $retry_delay sec.."
    sleep $retry_delay
    retry_count=$((retry_count + 1))
  fi
done

if [ "$download_succeeded" = false ]; then
  echo "Download failed after $max_retries retries. Exiting..."
  exit 1
fi

sraID=$(ls -1 | grep -v make_fastq.sh)

## fastq-dump
vdb-validate ${sraID}
fasterq-dump ${sraID}

## check read1, read2, or not paired
for s in ${sraID}; do
  echo ${s}
  if [ ${s} != ${id} ]; then
    [ -f ${s}_1.fastq ] && cat ${s}_1.fastq >> ${id}_R1.fastq &
    p1=$!
    [ -f ${s}_2.fastq ] && cat ${s}_2.fastq >> ${id}_R2.fastq &
    p2=$!
    [ -f ${s}.fastq ] && cat ${s}.fastq >> ${id}.fastq &
    p3=$!
    wait $p1 $p2 $p3
  fi
done

## compress fastq
for i in ${id}*.fastq; do
  echo ${i}
  pigz ${i} & sleep 1
done
wait

## remove technical reads if any
[ -f ${id}_R1.fastq.gz ] && [ -f ${id}_R2.fastq.gz ] && [ -f ${id}.fastq.gz ] && rm ${id}.fastq.gz

## test deleting intermediate folder
echo "Space before tmp folder delete"
echo "Removing tmp folder ${sraID}"
ls *
rm -rf ${sraID}*
ls *
echo 'done'