Gathering files into folder for output inside Scatter

douglowe · November 19, 2020, 9:32am

I’m adapting a workflow to be usable inside a scatter construct. To do this, I gather all the output files into a directory that has a unique name for each workflow instance (because the files themselves have generic names), and return these directories to the user instead.

My working example of this tool is here:
https://github.com/douglowe/biobb_hpc_cwl_workflows/tree/main/test_step15
My cut-down running script is md_step15_only.cwl, and the tool it calls is md_gather.cwl.

I’ve tested this with the cwl-runner tool. The workflow does what I want if I pre-create the directories using mkdir (see script run_script_mkdir_working.sh), and also if I use the --copy-outputs flag (see script run_script_copy_working.sh). But if I don’t pre-create the directory and let cwltool do its own thing, the final directory contains links to the original files, not copies of them.

Is there a way for me to write this script so that the default behaviour is to return copies of the files, not links?

Also, this tool does not work with TOIL-CWL (see the run_script_toil_not_working.sh script and md_step15_only_toil.cwl, which has to use standard 1.2.0-rev5). I think this is because the second step in the process can’t access the directory created in the first step, but I’m having trouble getting useful information out of the log files. Can anyone advise on what might be going wrong, and how I might fix it?

douglowe · November 19, 2020, 7:01pm

I’ve found a solution to my problem (though not an answer to my question about the differences in behaviour between tools, which is less important) by adapting the dir7.cwl tool in this post.

My new tool is:

class: ExpressionTool
cwlVersion: v1.1
doc: |
  This javascript takes two inputs, a list of 
  files, and a project file. It will create a
  directory named after the project file, populate
  that directory with the files in the list, and
  return the directory.
requirements:
  InlineJavascriptRequirement: {}
inputs:
  external_files: File[]
  external_project_file: File
outputs:
  project_work_dir: Directory
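expression: |
  ${
  // Return a Directory literal: "basename" names the new directory after
  // the project file, and "listing" places the input files inside it.
  return {"project_work_dir":
      {"class": "Directory",
       "basename": inputs.external_project_file.basename,
       "listing": inputs.external_files}
  };
  }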

This now works as I want for the cwl-runner tool - a directory is created (named after the project file), and passed back with the files I require.

This also works in TOIL-CWL (I guess expression tools are more portable than command line tools), although I do need to use the --noMoveExports flag to have the outputs copied rather than linked.
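For anyone adapting this, here is a minimal sketch of how an ExpressionTool like the one above might be called inside a scatter. It assumes the tool is saved as gather_to_dir.cwl and that the workflow receives one project file and one list of output files per scatter job; those file and input names are hypothetical, not taken from the linked repository. In a real pipeline the file lists would more likely come from an upstream step than from a workflow input.

```yaml
cwlVersion: v1.1
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  # hypothetical input: one project file per scatter job
  project_files: File[]
  # hypothetical input: one list of output files per scatter job
  output_file_lists:
    type:
      type: array
      items:
        type: array
        items: File
outputs:
  # one directory per scatter job, each named after its project file
  project_dirs:
    type: Directory[]
    outputSource: gather/project_work_dir
steps:
  gather:
    # hypothetical filename for the ExpressionTool shown above
    run: gather_to_dir.cwl
    scatter: [external_files, external_project_file]
    scatterMethod: dotproduct
    in:
      external_files: output_file_lists
      external_project_file: project_files
    out: [project_work_dir]
```

With scatterMethod: dotproduct, the i-th project file is paired with the i-th file list, so the two input arrays must have the same length. Each job receives a plain File[] and a single File, and the scattered output is collected as a Directory[].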