Hi everyone,
I’m trying to collect input files for a workflow step from various directories into a single directory (required as input for the next step).
The files should match a name pattern (i.e., be of a specific file type).
After trying to achieve this with ExpressionTools and/or by running bash directly inside the .cwl file, I went back to writing a shell script that takes only a single Directory as input, and I’m now trying to wrap it in CWL:
collectFiles-oneDir.sh:
#!/bin/bash
# Search a directory for files with the specified extensions and copy them to an output directory
# Read the input arguments as strings
inDir=$1
inFileExtensionsStr=$2
outDir=$3
# Convert the comma-separated extension string to an array
IFS=',' read -r -a inFileExtensionsArray <<< "$inFileExtensionsStr"
mkdir -p "$outDir"
for ext in "${inFileExtensionsArray[@]}"
do
  # Only regular files directly inside inDir (no recursion into subdirectories)
  find "$inDir" -maxdepth 1 -type f -name "*.$ext" -exec cp {} "$outDir" \;
done
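Since the end goal is to gather files from several directories, a multi-directory variant of the same logic could look roughly like this. It is only a sketch: the function name `collect_files` and its argument order (output directory, comma-separated extensions, then one or more input directories) are hypothetical choices, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper (sketch): copy files with the given extensions from
# several input directories into one output directory.
# Usage: collect_files OUT_DIR "ext1,ext2,..." IN_DIR [IN_DIR ...]
collect_files() {
  local outDir=$1
  local inFileExtensionsStr=$2
  shift 2  # the remaining arguments are the input directories

  # Split the comma-separated extension list into an array
  local -a inFileExtensionsArray
  IFS=',' read -r -a inFileExtensionsArray <<< "$inFileExtensionsStr"

  mkdir -p "$outDir" || return 1
  local inDir ext
  for inDir in "$@"; do
    for ext in "${inFileExtensionsArray[@]}"; do
      # Top-level regular files only, as in collectFiles-oneDir.sh
      find "$inDir" -maxdepth 1 -type f -name "*.$ext" -exec cp {} "$outDir" \;
    done
  done
}
```

To use it as a standalone script, append `collect_files "$@"` at the end. Files with the same name in different input directories overwrite each other in the output directory, because `cp` replaces existing files.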
collectFiles-oneDir.cwl:
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
requirements:
  - class: InlineJavascriptRequirement
  - class: InitialWorkDirRequirement
    listing:
      - entry: "$({class: 'Directory', listing: []})"
        entryname: $(inputs.outDir)
        writable: true
      - entry: $(inputs.inDir)
baseCommand: bash
arguments:
  - valueFrom: $(inputs.inDir.basename)
    position: 1
  - valueFrom: $(inputs.inFileExtensions.join(","))
    position: 2
  - valueFrom: $(inputs.outDir)
    position: 3
inputs:
  script:
    type: File
    inputBinding:
      position: 0
  inDir:
    type: Directory
  inFileExtensions:
    type: string[]
  outDir:
    type: string
outputs:
  outdir:
    type: Directory
    outputBinding:
      glob: $(runtime.outdir)/$(inputs.outDir)
collectFiles-oneDir.yml:
inDir:
  class: Directory
  path: "../../path/to/fasta-files/"
inFileExtensions: ["fasta","fasta.gz","pep","fa","fa.gz","faa","faa.gz"]
outDir: test
script:
  class: File
  path: collectFiles-oneDir.sh
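About the extension list: `find -name` matches its pattern against the whole file name, so `*.fasta` does not pick up a compressed `seq.fasta.gz`, which is why both `fasta` and `fasta.gz` have to be listed. A small throwaway illustration (the temporary directory and file names are made up):

```shell
#!/bin/bash
# Illustration only: -name compares the glob with the entire file name,
# so "*.fasta" does not match "seq.fasta.gz".
demo=$(mktemp -d)
touch "$demo/seq.fasta" "$demo/seq.fasta.gz"

echo "Pattern *.fasta:"
find "$demo" -maxdepth 1 -type f -name "*.fasta" -exec basename {} \;
echo "Pattern *.fasta.gz:"
find "$demo" -maxdepth 1 -type f -name "*.fasta.gz" -exec basename {} \;
```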
From the discussion in “.cwl workflow find file in subdirectories of an input directory” (post #8 by Brilator), I understand that this is not a typical CWL job.
However, I’d really like to cover this step in CWL so that all the inputs stay connected within their context.
Any help is highly appreciated. Thanks!