Apologies for the newbie question. I am trying to set up CWL with my set of tools so I am going through a kind of boot camp now.
One of my tools works by having a directory specified in a config file (a config file for the inner program) which will contain many files that the program will read. So for example the program will have a config file with:
[settings]
main_dir=/path/to/some/data/dir
I pass in this settings file itself through the usual CWL “class: File” method which works as expected. What I need to do is also pass in as input the data directory. So for example, I may have in the config file that is passed in:
[settings]
main_dir=data_dir
And then include an input in the CWL file like:
inputs:
  dataDir:
    type: Directory
(and in settings)
dataDir:
  class: Directory
  path: /actual/absolute/path/to/data/dir/with/data/files
This runs without error, but the actual data directory does not seem to be passed, or at least I think I am using “Directory” incorrectly. I have tried testing with “tree” and “ls” commands, and only one temp file is shown, no directory. How can I make this directory location available to the program, either by copying it or by using a symlink? Is there a way to include a whole directory?
Let’s start with what happens when you run your tool in its current form: at runtime, your cwl-runner creates a temporary directory in which the CommandLineTool operates and writes its output. Input Files and Directories are made available to the tool as symlinks. If a config file specifies an absolute path, the tool will not be able to find it, because it does not have access to anything outside the temporary directory.
We can use an InitialWorkDirRequirement and relative paths to address the issue:
requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.dataDir)
inputs:
  settings:
    type: File
    inputBinding:
      [...omitted, because i don't know how the tool works...]
  dataDir:
    type: Directory
This allows us to pass dataDir as an input parameter to the CommandLineTool in the job file. The cwl-runner will place it in the temporary folder where the tool runs, under the directory's own name. You can then specify a relative path to the data directory (in this case ./nameOfDataDir) in the settings file.
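If you would rather not depend on the directory's original name, the listing entry can also rename it while staging. The fragment below is a minimal sketch of that variant, not a tested tool definition: it uses the entry/entryname form of a listing item, and the name data_dir is simply taken from the settings file in the question.

```yaml
requirements:
  InitialWorkDirRequirement:
    listing:
      # Stage the input directory inside the working directory as "data_dir",
      # whatever it is called on the host.
      - entry: $(inputs.dataDir)
        entryname: data_dir

inputs:
  dataDir:
    type: Directory
```

With this in place, the settings file can keep main_dir=data_dir no matter which path the job file points at.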
To make the whole workflow more elegant, you might consider writing a short script which creates the settings file at runtime according to the input provided in the cwl-job.
I hope this post is helpful. Feel free to ask follow-up questions! Figuring this scenario out for the first time gave me a headache.
When you pass in files and directories as input, you don’t know where they will be staged to ahead of time. So either you can stage them under a known name in the working directory (as suggested by @ttubb) or you can create a configuration file with the correct paths on the fly: