Scatter workflow step n times

azzaea · December 16, 2019, 3:29am

Hello there!

Is there a way to scatter a workflow step a specific number of times, or to generate an array of a specific length for scattering? I’m not quite sure how to use expressions within scatter for this purpose.
In my workflow, I need logic like the sketch below. For now, I’m manually creating an array of consecutive integers and passing it in my YAML input file, but this is quite cumbersome.

Thank you very much in advance.
Azza

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  iter: int[]

# A Workflow must declare its outputs, even if there are none
outputs: []

steps:
  hostStep1:
    run: host.hostname.tool.cwl
    scatter: iteration
    in:
      iteration: iter
    out: [result]
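While the inputs stay as they are, the cumbersome part (writing the consecutive integers by hand) can at least be scripted. This is a minimal sketch assuming Node.js is available. `makeJob` is a hypothetical helper, and the input id `iter` comes from the workflow above. The script emits the job as JSON, which is also valid YAML, so it should work wherever the hand-written YAML input file did.

```javascript
// Hypothetical helper: build the job object that is currently written by hand,
// i.e. {"iter": [0, 1, ..., n - 1]} for the workflow input `iter: int[]`.
function makeJob(n) {
  var iter = [];
  for (var i = 0; i < n; i++) {
    iter.push(i);
  }
  return { iter: iter };
}

// Print a JSON job for n = 5; redirect stdout to a file to use it as the job input.
console.log(JSON.stringify(makeJob(5), null, 2));
```

For example, `node make_job.js > job.json` and then `cwl-runner <workflow>.cwl job.json`. The file names here are placeholders.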

ttubb · December 16, 2019, 12:27pm

I’m not entirely sure I understand what you are trying to do. Is the goal to run a step multiple times, but with identical input? If so, why? Is the outcome not deterministic?

Can you provide an example with some (abstract) input and what you want to happen with it?

azzaea · December 16, 2019, 1:12pm

Thank you for your response!

In a realistic application, I could imagine a range of numbers being used as file names, sample IDs, etc.
In my case, however, I’m evaluating a few workflow management systems (language expressiveness and engine capabilities). Scalability (and hence scattering) is naturally part of that. For this, I’d like a tool (it could be anything; hostname in my case) to run a specific number of times so that I can benchmark how a given engine behaves (where and when it breaks, how it dispatches jobs, etc.).

I hope this gives better context.

kaushik-work · December 17, 2019, 1:33pm

Hi,
I typically end up creating a two-step workflow: the first step produces a list, and the second step scatters over it.

class: Workflow
cwlVersion: v1.0
inputs:
  val: int

steps:

  # Produce the array [0 ... val - 1]
  step1:
    in:
      in1: val
    run: listify.cwl
    out: [out1]

  # Run foo.cwl once per element of step1/out1
  step2:
    in:
      in1: step1/out1
    scatter: [in1]
    run: foo.cwl
    out: [out1]

outputs:
  out1:
    type: string[]
    outputSource: step2/out1

requirements:
  ScatterFeatureRequirement: {}
  MultipleInputFeatureRequirement: {}

Where

listify.cwl

# This takes in a single int as input and 
# returns an array of ints [0 ... N - 1]

class: CommandLineTool
cwlVersion: v1.0
inputs:
  in1: int

baseCommand: ["echo"]

outputs: 
  out1:
    type: int[]
    outputBinding:
      outputEval: |-
        ${
          var out = [];
          for (var i = 0; i < inputs.in1; i++) {
            out.push(i);
          }
          return out;
        }

requirements:
  InlineJavascriptRequirement: {}

foo.cwl

# A simple tool that returns the string "foo <int>" as its output

class: CommandLineTool
cwlVersion: v1.0
inputs:
  in1: int
baseCommand: [echo]
outputs:
  out1:
    type: string
    outputBinding:
      outputEval: ${return "foo " + inputs.in1}

requirements:
  InlineJavascriptRequirement: {}
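To see what the two steps produce together, here is a rough JavaScript model, not engine code. `fooOutputEval` copies the `outputEval` from foo.cwl. The hypothetical `scatterStep` stands in for the runner, which launches one foo.cwl job per element of `step1/out1` and gathers the results into `step2/out1` in input order. A real runner may dispatch those jobs in parallel; this sketch only models the gathered result.

```javascript
// Same expression as foo.cwl's outputEval: build "foo <int>" from the input.
function fooOutputEval(inputs) {
  return "foo " + inputs.in1;
}

// Hypothetical stand-in for a scattered step: one job per array element,
// outputs collected into an array in the same order as the input array.
function scatterStep(values, outputEval) {
  return values.map(function (v) {
    return outputEval({ in1: v });
  });
}

// With step1 producing [0, 1, 2] (val = 3):
console.log(scatterStep([0, 1, 2], fooOutputEval)); // [ 'foo 0', 'foo 1', 'foo 2' ]
```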

mrc · May 26, 2020, 8:05am

Here’s listify.cwl as a pure ExpressionTool

cwlVersion: v1.0
class: ExpressionTool

inputs:
  in1: int

expression: |-
  ${
    var out = [];
    for (var i = 0; i < inputs.in1; i++) {
      out.push(i);
    }
    return {"out1": out};
  }

outputs:
  out1:
    type: int[]

requirements:
  InlineJavascriptRequirement: {}
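The main difference from the CommandLineTool version is the return value. An `outputEval` returns the output value itself, but an ExpressionTool's `expression` must return an object keyed by output id (`out1` here). Below is the same body as a plain JavaScript function. This is a small sketch that assumes `inputs` is passed in explicitly rather than provided by the engine, and `listifyExpression` is a hypothetical name.

```javascript
// Same logic as the ExpressionTool's expression; here `inputs` is a parameter,
// whereas a CWL runner would make it available to the expression directly.
function listifyExpression(inputs) {
  var out = [];
  for (var i = 0; i < inputs.in1; i++) {
    out.push(i);
  }
  // ExpressionTool: return an object whose keys match the declared outputs.
  return { out1: out };
}

console.log(JSON.stringify(listifyExpression({ in1: 3 }))); // {"out1":[0,1,2]}
```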


azzaea · June 2, 2020, 9:41am

Thank you, @mrc.

The advantage I see in using an ExpressionTool rather than a CommandLineTool here is potentially more clarity and less clutter. My understanding from the CWL v1.1 documentation is that pure ExpressionTools should be used sparingly (but this is probably just the general recommendation to minimize the use of JavaScript expressions anyway).

Are there any advantages/disadvantages of going one way vs the other?

Again, many thanks.
Azza

mrc · June 8, 2020, 6:38pm

I think you have summarized the reasons for one way or the other, yep!