Apologies if this is a repeat. I found some much older questions on this, but the solutions appear to be workarounds for older schemas and the links are broken.
My question: I have a CSV input (a sample sheet) with pairs of FASTQ file paths in the form
sampleName,path/to/file1,path/to/file2
I want to extract path/to/file1 and path/to/file2 and output them as File types so that they can be consumed by a downstream task that requires File inputs.
What is the most idiomatic way to do this in CWL 1.2?
Hi Mark,
Does the CSV comprise multiple rows?
Generate the schema
File name: schema.yaml
type: record
name: fastq-pair
fields:
  sampleName:
    label: sample name
    doc: |
      The name of the sample
    type: string
  read1File:
    label: read 1
    doc: |
      The read 1 file object
    type: File
  read2File:
    label: read 2
    doc: |
      The read 2 file object
    type: File?
Tool
When building the SchemaDefRequirement, import the schema with the following:
requirements:
  InlineJavascriptRequirement: {}
  SchemaDefRequirement:
    types:
      - $import: schema.yaml
Then, in the outputs section, you will need to use ‘loadContents’ so that the expression can read the text of the CSV:
outputs:
  fastqPairObjects:
    label: fastq pair objects
    doc: |
      The fastq pair objects
    # Syntax is schema-path#schema-name
    # Added [] because we will be expecting an array of this type
    type: schema.yaml#fastq-pair[]
    outputBinding:
      glob: "samplesheet.csv"
      loadContents: true
      outputEval: |
        ${
          // self[0] is the globbed CSV; loadContents makes its text available
          var content_lines = self[0].contents.split("\n");
          var fastq_pair_objects = [];
          content_lines.forEach(function (line) {
            // Skip blank lines, e.g. the one after a trailing newline
            if (line.trim() === "") { return; }
            var columns = line.trim().split(",");
            fastq_pair_objects.push({
              // Keys must match the field names in schema.yaml
              "sampleName": columns[0],
              "read1File": {
                "class": "File",
                "path": columns[1]
              },
              // read2File is optional (File?) in the schema
              "read2File": columns[2] ? {
                "class": "File",
                "path": columns[2]
              } : null
            });
          });
          return fastq_pair_objects;
        }
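If you want to check the parsing logic before wiring it into the tool, here is a minimal sketch of the same idea as a standalone function you can run with Node.js. The function name parseSampleSheet and the sample paths are made up for illustration; the object keys are the field names from schema.yaml. It assumes a plain comma-separated sheet with no header row and no quoted fields.

```javascript
// Hypothetical helper (not part of CWL): same logic as an outputEval body,
// written in ES5 style so it could be pasted into a CWL expression.
function parseSampleSheet(contents) {
  return contents
    .split("\n")
    .map(function (line) { return line.trim(); })
    // Drop blank lines, such as the one left by a trailing newline
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      var cols = line.split(",");
      return {
        sampleName: cols[0],
        read1File: { "class": "File", path: cols[1] },
        // read2File is optional (File?) in the schema, so use null when absent
        read2File: cols[2] ? { "class": "File", path: cols[2] } : null
      };
    });
}

// Made-up sample sheet: one paired-end row, one single-end row
var sheet =
  "sampleA,/data/a_R1.fastq.gz,/data/a_R2.fastq.gz\n" +
  "sampleB,/data/b_R1.fastq.gz\n";

var pairs = parseSampleSheet(sheet);
console.log(JSON.stringify(pairs, null, 2));
```

Inside the tool, the body of the function would go in the ${ ... } block with self[0].contents in place of the contents argument.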
Hope that points you in the right direction!