Collect input files by name from directories

Hi everyone,

I’m trying to collect input files for a workflow step from various directories into one directory (as required input for the next step).
The files should match a name pattern (i.e. be of a specific file type).

After trying to achieve this with ExpressionTools and/or by running bash directly inside the .cwl, I went back to writing a shell script that takes only a single Directory as input, and I am now trying to wrap that script in CWL:

collectFiles-oneDir.sh:

#!/bin/bash

# Search a single directory for files with the specified extensions and copy them to an output directory

# Read the input arguments as strings
inDir=$1
inFileExtensionsStr=$2
outDir=$3

# Convert the comma-separated extension string to an array
IFS=',' read -r -a inFileExtensionsArray <<< "$inFileExtensionsStr"

mkdir -p "$outDir"

for ext in "${inFileExtensionsArray[@]}"
do
    find "$inDir" -maxdepth 1 -type f -name "*.$ext" -exec cp {} "$outDir" \;
done
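
The script logic can be checked on its own, outside of CWL. Below is a minimal sketch, assuming made-up file names in a temporary directory; `collect_files` is a hypothetical function that just wraps the script body so the example is self-contained:

```shell
#!/bin/bash
set -euo pipefail

# Same logic as collectFiles-oneDir.sh, wrapped in a (hypothetical)
# function so that this example does not depend on the script file.
collect_files() {
    local inDir=$1 inFileExtensionsStr=$2 outDir=$3
    local -a exts
    local ext
    # Split the comma-separated extension string into an array
    IFS=',' read -r -a exts <<< "$inFileExtensionsStr"
    mkdir -p "$outDir"
    for ext in "${exts[@]}"; do
        find "$inDir" -maxdepth 1 -type f -name "*.$ext" -exec cp {} "$outDir" \;
    done
}

# Made-up test data in a throwaway directory (removed on exit)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/in/nested"
touch "$tmp/in/a.fasta" "$tmp/in/b.fa.gz" "$tmp/in/notes.txt" "$tmp/in/nested/c.fasta"

collect_files "$tmp/in" "fasta,fa.gz,pep" "$tmp/out"

# List what was collected, one name per line, sorted
collected=$(ls "$tmp/out" | sort)
printf '%s\n' "$collected"
```

With these made-up files, only a.fasta and b.fa.gz end up in the output directory: notes.txt does not match any pattern, and nested/c.fasta is skipped because of -maxdepth 1.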

collectFiles-oneDir.cwl:

#!/usr/bin/env cwl-runner

cwlVersion: v1.2
class: CommandLineTool

requirements:
  - class: InlineJavascriptRequirement
  - class: InitialWorkDirRequirement
    listing:
      - entry: "$({class: 'Directory', listing: []})"
        entryname: $(inputs.outDir)
        writable: true
      - entry: $(inputs.inDir)

baseCommand: bash

arguments:
  - valueFrom: $(inputs.inDir.basename)
    position: 1
  - valueFrom: $(inputs.inFileExtensions.join(","))
    position: 2
  - valueFrom: $(inputs.outDir)
    position: 3

inputs:
  script:
    type: File
    inputBinding:
      position: 0
  inDir:
    type: Directory
  inFileExtensions:
    type: string[]
  outDir:
    type: string

outputs:
  outdir:
    type: Directory
    outputBinding:
      glob: $(inputs.outDir)

collectFiles-oneDir.yml:

inDir:
  class: Directory
  path: "../../path/to/fasta-files/"
inFileExtensions: ["fasta","fasta.gz","pep","fa","fa.gz","faa","faa.gz"]
outDir: test
script:
  class: File
  path: collectFiles-oneDir.sh

From the discussion in ".cwl workflow find file in subdirectories of an input directory" (post #8 by Brilator), I understand that this is not a typical CWL job.

However, I’d really like to cover this step in CWL so that all inputs stay connected within their context.

Any help is highly appreciated. Thanks!