I am familiar with the scatter/gather method available in CWL, but I am trying to figure out whether a “reduce” operation is available somewhere or could be implemented somehow. Here is an example of how “reduce” works in a language such as R:
```r
> a <- c(1,2,3,4,5)
> Reduce(function(x, y){print(sprintf("x=%s, y=%s", x, y)); return(x+y)}, a)
[1] "x=1, y=2"
[1] "x=3, y=3"
[1] "x=6, y=4"
[1] "x=10, y=5"
[1] 15
```
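For comparison, the same left fold in Python uses functools.reduce. This sketch just reproduces the R trace: x is the running accumulator and y is the next element of the list.

```python
from functools import reduce


def add_verbose(x, y):
    # Mirror the R callback: report the accumulator (x) and the next element (y)
    print(f"x={x}, y={y}")
    return x + y


a = [1, 2, 3, 4, 5]
total = reduce(add_verbose, a)  # left fold: ((((1 + 2) + 3) + 4) + 5)
print(total)
```

It prints x=1, y=2 / x=3, y=3 / x=6, y=4 / x=10, y=5 and then 15, matching the R output.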
A real-life example: I have, e.g., 1,000 .bed files and want to run something like bedtools intersect across all of them, intersecting 2 files at a time (or some other number of files), then using the output of each intersect as one of the inputs to the next iteration.
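To make the intended data flow concrete, here is a minimal in-memory sketch of that sequential reduce. It assumes each .bed file is modeled as a list of (chrom, start, end) tuples, and the helper intersect is hypothetical: it only roughly imitates the default behaviour of bedtools intersect -a A -b B (reporting the overlapping portion of each overlapping pair) and is not a replacement for the real tool.

```python
from functools import reduce
from typing import List, Tuple

# Hypothetical in-memory stand-in for one BED record: (chrom, start, end), 0-based half-open
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every overlapping (a, b) pair,
    roughly like `bedtools intersect -a A -b B` with default options."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if chrom_a == chrom_b and start < end:
                out.append((chrom_a, start, end))
    return sorted(out)


beds = [
    [("chr1", 0, 100), ("chr2", 50, 80)],
    [("chr1", 20, 120), ("chr2", 60, 90)],
    [("chr1", 10, 60)],
]

# Left fold: the output of each pairwise intersect becomes input "a" of the next one
result = reduce(intersect, beds)
print(result)
```

Here the first intersect yields [("chr1", 20, 100), ("chr2", 60, 80)], and intersecting that with the third file leaves [("chr1", 20, 60)]. Note that each step depends on the previous one, which is exactly what a plain scatter cannot express.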
The closest I have gotten so far is a wrapper around GNU parallel inside an InitialWorkDirRequirement, but this still requires passing all 1,000+ files into a single CWL step. It would be much nicer if I could implement the reduction in CWL itself.
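One possible way to avoid a single 1,000-input step, assuming the combine operation is associative and commutative (true for plain intersection of covered bases, but not necessarily for every bedtools option), is a tree-shaped reduce. Each round combines independent groups of up to k items, so in principle every round could be one scatter over groups. The helper tree_reduce below is hypothetical, and how to chain the rounds in a particular CWL version is not addressed here; toy integer sets stand in for .bed files.

```python
import operator
from functools import reduce
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(items: List[T], combine: Callable[[T, T], T], k: int = 2) -> Tuple[T, int]:
    """Reduce items in rounds of groups of up to k; return (result, number of rounds).

    Groups within a round do not depend on each other, so they could run in parallel.
    Only valid when `combine` is associative (and commutative, if order is irrelevant).
    """
    if not items:
        raise ValueError("tree_reduce() of empty list")
    if k < 2:
        raise ValueError("k must be at least 2")
    rounds = 0
    while len(items) > 1:
        groups = [items[i:i + k] for i in range(0, len(items), k)]
        # Each group is an independent job, e.g. one scattered step per group
        items = [reduce(combine, group) for group in groups]
        rounds += 1
    return items[0], rounds


# 1,000 toy "files": file i covers positions i..i+999; their common positions are {999}
files = [set(range(i, i + 1000)) for i in range(1000)]
result, rounds = tree_reduce(files, operator.and_, k=2)
print(result, rounds)
```

With k=2 the 1,000 inputs collapse in 10 rounds (1000, 500, 250, 125, 63, 32, 16, 8, 4, 2, 1); with k=10 it takes only 3 rounds, at the cost of more inputs per job.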