How to stage secondary files for baseCommand?

Hi,

A command tool requires multiple scripts in the same directory (such as a.R, b.py, c.dat for run.sh). Could anyone please show me how to stage those dependency files for baseCommand? Thanks!

Hi @hubentu

There are a number of ways to do this. Here’s what I would do in roughly descending order of preference:

  1. Use secondaryFiles. Make your primary file (the script you will actually run) a default input and list the other dependencies as secondaryFiles. You can leave baseCommand blank and put the script in arguments. Here’s an example:
inputs:
  script:
    type: File
    default:
      class: File
      location: run.sh
      secondaryFiles:
        - class: File
          location: a.R
        - class: File
          location: b.py
        - class: File
          location: c.dat
arguments: [$(inputs.script)]
  2. Use a Directory input:
inputs:
  script:
    type: Directory
    default:
      class: Directory
      location: myscripts
arguments: [$(inputs.script.path)/run.sh]
  3. Use InitialWorkDirRequirement

This puts all the files in the working directory (which is also the output directory).

inputs:
  script:
    type: File
    default:
      class: File
      location: run.sh
  a:
    type: File
    default:
      class: File
      location: a.R
  b:
    type: File
    default:
      class: File
      location: b.py
  c:
    type: File
    default:
      class: File
      location: c.dat
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.script)
      - $(inputs.a)
      - $(inputs.b)
      - $(inputs.c)
arguments: [$(inputs.script)]

Note: for (1) and (2), the working directory will not be the same as the directory where the script is located, so you may need a little code to work out where the script was placed in order to find the other files it depends on. In shell you can use dirname $0.
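
As a rough sketch of the dirname $0 approach, something like the following could go at the top of run.sh. The script_dir helper is a hypothetical name introduced here for illustration, not part of CWL or of the original answer, and it assumes the script is invoked by its path rather than sourced:

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of the directory that
# contains the script path given as the first argument.
script_dir() {
    # Run cd in a subshell so the caller's working directory is unchanged.
    (cd -- "$(dirname -- "$1")" >/dev/null 2>&1 && pwd)
}

# Inside run.sh, "$0" is the path the script was invoked with.
DIR=$(script_dir "$0")
echo "dependencies are expected in: $DIR"
# The staged files can then be referenced relative to DIR, for example:
#   cat "$DIR/c.dat"
```

With this in place, the script no longer cares which directory the job runs in, only that a.R, b.py and c.dat were staged next to it.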

Hope this helps!

Thanks a lot for the explicit solutions. This is very helpful.

Hi @tetron,

Thanks again for the answers. I have tried the 3 solutions. Only InitialWorkDirRequirement works for me.
My test script run.sh:

#!/bin/sh
cat a.dat
  1. Use secondaryFiles:
cwlVersion: v1.0
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.src)
arguments:
- $(inputs.src)
inputs:
  src:
    type: File
    default:
      class: File
      location: run.sh 
      secondaryFiles:
        - class: File
          location: a.dat
outputs: []
stdout: out.txt

This one requires the InitialWorkDirRequirement to work. Otherwise, a.dat couldn’t be found.

  2. Use a directory input:
cwlVersion: v1.0
class: CommandLineTool
arguments:
- $(inputs.src.dirname)/$(inputs.src.basename)/run.sh
inputs:
  src:
    type: Directory
    default:
      class: Directory
      location: ./ 
outputs: []

The input directory is staged under a separate path rather than in the working directory, so a.dat could not be found in the working directory.

INFO [job run_dir.cwl] /tmp/vp3nic_i$ /tmp/tmpxfakvow_/stg49989a4b-73bd-4117-8d35-c979e24c97cc/test/run.sh
cat: a.dat: No such file or directory
  3. Use InitialWorkDirRequirement:
cwlVersion: v1.0
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.src)
      - $(inputs.a)
arguments:
- $(inputs.src)
inputs:
  src:
    type: File
    default:
      class: File
      location: run.sh 
  a:
    type: File
    default:
      class: File
      location: a.dat
outputs: []
stdout: out.txt

This one works perfectly.

Thanks!

Your sample script requires files to be in the current working directory, not in the same directory as the script. This is why some of @tetron’s solutions didn’t work for you.

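Here is an example script that looks for a.dat in the script's own directory rather than in the current working directory:

#!/bin/bash
# Resolve the directory containing this script (from https://stackoverflow.com/a/246128).
# BASH_SOURCE is a bash feature, so the shebang must be bash rather than sh.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cat "${DIR}/a.dat"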

In this case, all three solutions provided should work.
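
To see the difference without running a CWL engine, here is a small simulation, offered only as a sketch. The two temporary directories stand in for the staged input location and the job's working directory, and the file names cwd.sh and rel.sh are made up for this illustration:

```shell
#!/bin/sh
# Simulate staging: script and data live in one directory,
# while the job runs in a different working directory.
stage=$(mktemp -d)   # stands in for the staged input directory
work=$(mktemp -d)    # stands in for the job's working directory
echo "hello" > "$stage/a.dat"

# Variant 1: assumes a.dat is in the current working directory.
cat > "$stage/cwd.sh" <<'EOF'
#!/bin/sh
cat a.dat
EOF

# Variant 2: resolves a.dat relative to the script's own location.
cat > "$stage/rel.sh" <<'EOF'
#!/bin/sh
DIR=$(cd -- "$(dirname -- "$0")" >/dev/null 2>&1 && pwd)
cat "$DIR/a.dat"
EOF

# Run both variants from the separate working directory and record results.
cwd_status=0
cwd_out=$(cd "$work" && sh "$stage/cwd.sh" 2>/dev/null) || cwd_status=$?
rel_status=0
rel_out=$(cd "$work" && sh "$stage/rel.sh") || rel_status=$?

echo "cwd-relative:    status=$cwd_status output='$cwd_out'"
echo "script-relative: status=$rel_status output='$rel_out'"
rm -rf "$stage" "$work"
```

The first variant fails because a.dat is not in the working directory, which mirrors the "No such file or directory" error above; the second finds it beside the script.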

I’m glad you found a method that works for you!
