Workflow fails when run without --cachedir

Hi,

can someone help me understand allocating resources to workflows?

I’m trying to run a fairly simple workflow, scattered over an array of inputs (i.e. samples). Each sample produces roughly 10 GB of temporary data, which I don’t want piling up across 100 samples. The workflow works fine with the --cachedir option. Without it, however, it fails in a simple concatenation step due to missing disk space (cat: write error: No space left on device).
Adding a ResourceRequirement with tmpdirMin: 1000000 did not help.

Also, I was wondering about ResourceRequirement in general.
I would expect a CommandLineTool to fail if no space at all is “allowed”, as in:

class: CommandLineTool
cwlVersion: v1.2
requirements:
  ResourceRequirement:
    tmpdirMin: 0
    tmpdirMax: 0
    outdirMin: 0
    outdirMax: 0
baseCommand: cat
inputs:
  - id: files
    type: File[]
    inputBinding:
      position: 1
outputs:
  concatenatedFasta:
    type: File
    streamable: true
    outputBinding:
      glob: cat.out
stdout: cat.out

Thanks!

Hello!

cwltool --cachedir currently keeps all intermediate files, even those that aren’t globbed up into other outputs.

You can use the CWL v1.1+ hint WorkReuse and set enableReuse: false; then cwltool will not cache that CommandLineTool.

https://www.commonwl.org/v1.1/CommandLineTool.html#WorkReuse
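For example, adding the hint to the CommandLineTool you posted would look something like this (a sketch; the rest of the tool description stays as you have it):

```yaml
class: CommandLineTool
cwlVersion: v1.2
# Tell the runner not to reuse (and thus not to cache) this step's results.
hints:
  WorkReuse:
    enableReuse: false
baseCommand: cat
# ... inputs/outputs as in your tool above ...
```

Since it is a hint rather than a requirement, runners that don’t implement WorkReuse will simply ignore it instead of failing.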

I think cwltool ignores resource requirements unless you are running in --parallel mode; and even then, I think cwltool ignores the disk space requirements.

Thanks! Right, it might be the outputs not being globbed.