How to create a UUID inside CWL?

Because input files with the same basename cause errors when they are staged into a directory with InitialWorkDirRequirement (the issue described here), I need a way to generate unique filenames that will not conflict with each other.

Ideally, I would just use a JavaScript expression inside my CWL to accomplish this, but I could not get any of these suggested methods for generating random strings to work (or they required too much custom code to reuse easily across many CWL files).

I did come up with one solution that uses Python’s uuid library:

requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: run.sh
        entry: |-
          set -euo pipefail
          output_file=\$(python3 -c "import uuid; print('_maf2bed_merged.' + str(uuid.uuid4()) + '.bed')")
          grep -v '#' "$1" | grep -v 'Hugo' | cut -f5-7 | sort -V -k1,1 -k2,2n > "\$output_file"

outputs:
  output_file:
    type: File
    outputBinding:
      glob: _maf2bed_merged.*.bed

This works, but it has limitations: it interferes with the input bindings for the other CLI args I might want to use, and it relies on Python being available, which might not be the case.

Is there a way to do something like this directly within CWL?

At the very least, I would like to avoid needing the InitialWorkDirRequirement, and I was hoping there was a way to use a shell expression, like this:

output_filename:
    valueFrom: ${
       python -c "import uuid; print( 'file.' + str(uuid.uuid4()) + '.txt' )"
    }

But I could not find a way to implement that either.
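
For the JavaScript route, one possibility is a small helper built on `Math.random()`. This is only a sketch under some assumptions: `pseudoUuid` is a hypothetical name, the result merely follows the version-4 UUID layout and is not cryptographically strong, and it sticks to ES5 syntax because CWL expressions are ECMAScript 5.1. It is also worth checking whether your runner evaluates an expression more than once, since each evaluation would produce a different name.

```javascript
// Hypothetical helper (not part of CWL): a version-4-style identifier
// built from Math.random(). ES5 syntax only.
function pseudoUuid() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function (c) {
    var r = Math.floor(Math.random() * 16);   // random hex digit, 0..15
    var v = c === "x" ? r : (r & 0x3) | 0x8;  // "y" is limited to 8, 9, a, b
    return v.toString(16);
  });
}

// Example: an output name in the same style as the Python version above.
var outputName = "_maf2bed_merged." + pseudoUuid() + ".bed";
```

In a CWL document, a function like this could be placed in `expressionLib` under `InlineJavascriptRequirement` and then called from an expression.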

Backing up a step: each input file will have a unique path (just not a unique name), so what’s the reason you want to put them all into the working directory rather than refer to them as regular input files?

Because that breaks Singularity; see the GitHub issue "Singularity exec stops working when command line args are too long" (Issue #5815, hpcng/singularity).

Something I had considered, instead of copying the files into the staging dir like this:

  InitialWorkDirRequirement:
    listing:
      - entryname: some_dir # <- put all the input files into a dir
        writable: true
        entry: "$({class: 'Directory', listing: inputs.input_files})"

was to copy the files’ entire directory tree into the dir, which would include the unique filepath component and so avoid name conflicts. However, I couldn’t figure out how to accomplish that.

Ok, some runners like Toil do have issues with large numbers of input files. The Toil developers were working on a fix, I just nudged them about it.

If you have an array of files, you could rename them sequentially like this:

cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  fn: File[]
outputs:
  renamed: File[]
expression: |-
  ${
    for (var i = 0; i < inputs.fn.length; i++) {
      inputs.fn[i].basename = i + "_" + inputs.fn[i].basename;
    }
    return {renamed: inputs.fn};
  }

By changing the “basename” you change the name the file will be staged to.
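
To see the effect outside a runner, the same loop can be run on plain objects that stand in for CWL `File` values. This is only an illustration: `prefixWithIndex` and the sample paths are made up, and copying each object is an addition here (the expression above edits `inputs.fn` in place).

```javascript
// Hypothetical stand-in for the ExpressionTool body; plain objects mimic CWL File values.
function prefixWithIndex(files) {
  var renamed = [];
  for (var i = 0; i < files.length; i++) {
    var copy = {};
    for (var key in files[i]) {
      copy[key] = files[i][key];               // shallow copy of each field
    }
    copy.basename = i + "_" + files[i].basename; // index prefix makes the name unique
    renamed.push(copy);
  }
  return renamed;
}

// Two made-up inputs with different paths but the same basename.
var sample = [
  { "class": "File", location: "file:///data/run1/sample.maf", basename: "sample.maf" },
  { "class": "File", location: "file:///data/run2/sample.maf", basename: "sample.maf" }
];
var result = prefixWithIndex(sample);
// result[0].basename is "0_sample.maf", result[1].basename is "1_sample.maf"
```

Because the index differs for every element, the new basenames cannot collide even when the originals are identical.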


Thanks, this looks helpful. But how do I use this within a CommandLineTool? For example, one like this:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [ "ls", "-la", "some_dir" ]
doc: "CWL to save a listing of the execution Directory for debugging"

requirements:
  InlineJavascriptRequirement: {}

  InitialWorkDirRequirement:
    listing:
      - entryname: some_dir
        writable: true
        entry: "$({class: 'Directory', listing: inputs.input_files})"

stdout: ls.txt

inputs:
  input_files:
    type: File[]

outputs:
  output_file:
    type: stdout

Try this:

  InitialWorkDirRequirement:
    listing:
      - entryname: some_dir
        writable: true
        entry: |-
            ${
              for (var i = 0; i < inputs.fn.length; i++) {
                inputs.input_files[i].basename = i + "_" + inputs.input_files[i].basename;
              }
              return {class: 'Directory', listing: inputs.input_files};
            }

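Cool, that works. I had to fix one typo (the loop condition used `inputs.fn.length` instead of `inputs.input_files.length`), and I moved the index to the end of the name: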

  InitialWorkDirRequirement:
    listing:
      - entryname: some_dir
        writable: true
        entry: |-
            ${
              for (var i = 0; i < inputs.input_files.length; i++) {
                inputs.input_files[i].basename = inputs.input_files[i].basename + "." + i;
              }
              return {class: 'Directory', listing: inputs.input_files};
            }

thanks so much
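
One side effect of appending `"." + i` is that the index becomes the last extension (`sample.maf` turns into `sample.maf.0`), which may matter for tools that look at file extensions. A possible variation, sketched here on plain objects, puts the index before the extension using the `nameroot` and `nameext` fields that CWL `File` objects carry. `indexBeforeExtension` is a hypothetical name, and the fallback for a missing `nameroot` is an assumption.

```javascript
// Hypothetical helper: build "<nameroot>.<i><nameext>" for a CWL-style File object.
// nameext, when present, already includes its leading dot (for example ".maf").
function indexBeforeExtension(file, i) {
  var hasRoot = file.nameroot !== undefined;
  var root = hasRoot ? file.nameroot : file.basename; // fall back to the full basename
  var ext = hasRoot && file.nameext ? file.nameext : "";
  return root + "." + i + ext;
}

// Made-up example object standing in for one element of inputs.input_files.
var example = { basename: "sample.maf", nameroot: "sample", nameext: ".maf" };
var newName = indexBeforeExtension(example, 0);
// newName is "sample.0.maf"
```

If the function were made available to expressions (for example through `expressionLib`), the loop body could become `inputs.input_files[i].basename = indexBeforeExtension(inputs.input_files[i], i);`.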
