Passing reference or whole directory as input?

Good Afternoon,

Apologies for the newbie question. I am trying to set up CWL with my set of tools, so I am going through a kind of boot camp now.

One of my tools works by having a directory specified in a config file (a config file for the inner program) which will contain many files that the program will read. So for example the program will have a config file with:

[settings]
main_dir=/path/to/some/data/dir

I pass in this settings file itself through the usual CWL “class: File” method which works as expected. What I need to do is also pass in as input the data directory. So for example, I may have in the config file that is passed in:

[settings]
main_dir=data_dir

And then include an input in the CWL file like:
inputs:
  dataDir:
    type: Directory

(and in settings)
dataDir:
  class: Directory
  path: /actual/absolute/path/to/data/dir/with/data/files

This runs without error, but the actual data directory does not seem to be passed, or at least I think I am using “Directory” incorrectly. I have tried testing with the “tree” and “ls” commands, and only one temp file is shown, no directory. How can I make this directory location available to the program, either by copying it or by using a symlink? Is there a way to include a whole directory?

Thanks for reading and have an awesome day!

Hi Paul!

Let’s start with what happens when you run your tool in its current form: at runtime, your cwl-runner creates a temporary directory in which the CommandLineTool operates and writes its output. Symlinks to input Files/Directories are passed to the tool. If a config file specifies an absolute path, the tool will not be able to find it, because it does not have access to anything outside the temporary directory.
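To see where the runner actually stages the directory, a small throwaway tool can help. The following is only a sketch: the cwlVersion is an assumption, and it simply runs ls -l on whatever path the runner assigns to dataDir.

```yaml
# Throwaway debugging tool (a sketch, not part of the original setup).
cwlVersion: v1.2   # assumed; adjust to the version your runner supports
class: CommandLineTool
baseCommand: [ls, -l]
inputs:
  dataDir:
    type: Directory
    inputBinding:
      position: 1   # the staged path of dataDir is appended to "ls -l"
outputs: []
```

Running this with the same job file should list the data files at their staged location, which is generally not the tool's working directory.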

We can use an InitialWorkDirRequirement and relative paths to address the issue:

requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.dataDir)

inputs:
  settings:
    type: File
    inputBinding:
      [...omitted, because I don't know how the tool works...]
  dataDir:
    type: Directory

This allows us to pass dataDir as an input parameter to the CommandLineTool in the job file. The cwl-runner will place it in the temporary folder where the tool runs. You can then specify a relative path to the data directory (in this case ./nameOfDataDir) in the settings file.
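As a rough sketch of how the pieces fit together: the file names job.yml and settings.ini below are made up for illustration, and the data path is the one from the question. A directory listed in InitialWorkDirRequirement is staged under its own basename, which for that path is "files". This also assumes the program resolves relative paths against its working directory.

```yaml
# job.yml (hypothetical file name)
settings:
  class: File
  path: settings.ini   # hypothetical name for the tool's config file
dataDir:
  class: Directory
  path: /actual/absolute/path/to/data/dir/with/data/files

# settings.ini would then point at the staged copy by relative path:
#   [settings]
#   main_dir=./files
```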

I had to do a very similar thing for the necat assembler, maybe the workflow and tools can help you as an additional example: https://github.com/ttubb/autopore/blob/f5b2d90dc52acc1565c6ac42fd639a3f9e8b5fe4/cwl-tools/assembly/necatWorkflow.cwl

To make the whole workflow more elegant, you might consider writing a short script which creates the settings file at runtime according to the input provided in the cwl-job.

I hope this post is helpful. Feel free to ask follow-up questions! Figuring this scenario out for the first time gave me a headache.

Cheers
Tom

Welcome @Paul_Biociphers !

When you pass in files and directories as input, you don’t know where they will be staged to ahead of time. So either you can stage them under a known name in the working directory (as suggested by @ttubb) or you can create a configuration file with the correct paths on the fly:

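cwlVersion: v1.2  # assumed; adjust to the version your runner supports
class: CommandLineTool
inputs:
  dataDir: Directory
requirements:
  InitialWorkDirRequirement:
    listing:
      # Write config.ini into the working directory before the tool starts
      - entryname: config.ini
        entry: |
          [settings]
          main_dir=$(inputs.dataDir.path)
arguments: ["my_program", "config.ini"]
outputs: []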
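The two approaches can also be combined. The fragment below is a sketch under the assumption that the program accepts a relative main_dir: it stages the directory under the fixed name data_dir (the name used in the question's config) and writes a config file that refers to it.

```yaml
requirements:
  InitialWorkDirRequirement:
    listing:
      # Stage the input directory under a known name in the working directory
      - entryname: data_dir
        entry: $(inputs.dataDir)
      # Generate the config file so that it points at that known name
      - entryname: config.ini
        entry: |
          [settings]
          main_dir=data_dir
```

With this variant the config content no longer depends on where the runner happens to stage the input.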

Hope this helps!
