Apologies for the basic question. I have pairs of GFF and genome files that I would like to process together, but the number of pairs will vary, so I’m trying to pass them in as an array of file pairs. The error I’m getting is “Field ‘source’ references unknown identifier”.
This is my yaml:
file_pairs:
  - gff:
      class: File
      path: /path/to/file
    genome:
      class: File
      path: /path/to/file
  - gff:
      class: File
      path: /path/to/file
    genome:
      class: File
      path: /path/to/file
This is my workflow:
cwlVersion: v1.2
class: Workflow
inputs:
  file_pairs:
    type:
      type: array
      items:
        type: record
        fields:
          gff: File
          genome: File
steps:
  reduce_isoforms:
    run: tools/isoform_tool.cwl
    in:
      in_gff: file_pairs.gff
    out: [out_reduced_gff]
  get_reduced_protein_sequences:
    run: tools/extract_sequences_tool.cwl
    in:
      in_reduced_gff: reduce_isoforms/out_reduced_gff
      in_genome: file_pairs.genome
    out: [out_reduced_protein_sequences]
The full error is:
agat_only_workflow.cwl:29:9: checking field 'in'
agat_only_workflow.cwl:30:13: checking object 'agat_only_workflow.cwl#reduce_isoforms/in_gff'
Field 'source' references unknown identifier
'file_pairs.gff', tried
file://workflow.cwl#file_pairs.gff
mrc, October 10, 2024, 8:52am
Hello @kvertacnik, and thank you for your question.
If you want to reference a field of a record, the syntax is currently slightly more complicated than what you tried: source has to name the whole input, and valueFrom then picks the field.
steps:
  reduce_isoforms:
    run: tools/isoform_tool.cwl
    in:
      in_gff:
        source: file_pairs
        valueFrom: $(self.gff)
    out: [out_reduced_gff]
  get_reduced_protein_sequences:
    run: tools/extract_sequences_tool.cwl
    in:
      in_reduced_gff: reduce_isoforms/out_reduced_gff
      in_genome:
        source: file_pairs
        valueFrom: $(self.genome)
    out: [out_reduced_protein_sequences]
Thank you for your feedback! I added your edits and the following to the workflow file:
requirements:
  - class: StepInputExpressionRequirement
  - class: InlineJavascriptRequirement
However, I am getting the following error:
ERROR [step reduce_isoforms] Cannot make job: Expression evaluation error:
Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var inputs = {
03 "in_gff": [
04 {
05 "gff": {
06 "class": "File",
07 "location": "file:///path/to/Amel_HAv3.1_genomic.gff",
08 "size": 152741106,
09 "basename": "Amel_HAv3.1_genomic.gff",
10 "nameroot": "Amel_HAv3.1_genomic",
11 "nameext": ".gff"
12 },
13 "genome": {
14 "class": "File",
15 "location": "file:///path/to/Amel_HAv3.1_genomic.fna",
16 "size": 228091137,
17 "basename": "Amel_HAv3.1_genomic.fna",
18 "nameroot": "Amel_HAv3.1_genomic",
19 "nameext": ".fna"
20 }
21 }
22 ]
23 };
24 var self = [
25 {
26 "gff": {
27 "class": "File",
28 "location": "file:///path/to/Amel_HAv3.1_genomic.gff",
29 "size": 152741106,
30 "basename": "Amel_HAv3.1_genomic.gff",
31 "nameroot": "Amel_HAv3.1_genomic",
32 "nameext": ".gff"
33 },
34 "genome": {
35 "class": "File",
36 "location": "file:///path/to/Amel_HAv3.1_genomic.fna",
37 "size": 228091137,
38 "basename": "Amel_HAv3.1_genomic.fna",
39 "nameroot": "Amel_HAv3.1_genomic",
40 "nameext": ".fna"
41 }
42 }
43 ];
44 var runtime = {
45 "tmpdir": null,
46 "outdir": null
47 };
48 (function(){return ((self.gff));})()
stdout was: 'undefined'
stderr was: ''
INFO [workflow ] completed permanentFail
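The script in the log shows why the expression fails: self is bound to the whole file_pairs array, and an array has no gff property, so self.gff evaluates to undefined. Below is a minimal sketch of two step inputs that would evaluate without scattering. It assumes, hypothetically, either that only the first pair is wanted or that a tool input accepts File[]; neither assumption comes from the thread, and in_all_gffs is an invented input name.

```yaml
# Fragment of a step's `in:` block, not a complete workflow.
# Needs StepInputExpressionRequirement; the second form also needs
# InlineJavascriptRequirement.
in:
  # Hypothetical: use the gff of the first pair only.
  in_gff:
    source: file_pairs
    valueFrom: $(self[0].gff)
  # Hypothetical input name: collect the gff of every pair into a File[].
  in_all_gffs:
    source: file_pairs
    valueFrom: $(self.map(function (pair) { return pair.gff; }))
```

Neither form runs the tool once per pair; that needs scatter.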
mrc, November 5, 2024, 2:41pm
Ah, to run a step once for every entry in an array, add scatter:
steps:
  reduce_isoforms:
    run: tools/isoform_tool.cwl
    in:
      in_gff:
        source: file_pairs
        valueFrom: $(self.gff)
    scatter: in_gff
    out: [out_reduced_gff]
  get_reduced_protein_sequences:
    run: tools/extract_sequences_tool.cwl
    in:
      in_reduced_gff: reduce_isoforms/out_reduced_gff
      in_genome:
        source: file_pairs
        valueFrom: $(self.genome)
    scatter: in_genome
    out: [out_reduced_protein_sequences]
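One thing to watch in the second step: once reduce_isoforms is scattered, out_reduced_gff becomes an array, so scattering over in_genome alone would hand the whole array of reduced GFFs to every job. Here is a sketch of the full workflow that pairs each reduced GFF with its genome by scattering over both inputs with dotproduct. The outputs section and its File[] type are assumptions, since the thread does not show the tool definitions, and ScatterFeatureRequirement is added because scatter requires it.

```yaml
cwlVersion: v1.2
class: Workflow

requirements:
  - class: ScatterFeatureRequirement         # needed for scatter
  - class: StepInputExpressionRequirement    # needed for valueFrom on step inputs
  - class: InlineJavascriptRequirement

inputs:
  file_pairs:
    type:
      type: array
      items:
        type: record
        fields:
          gff: File
          genome: File

outputs:
  # Assumption: the tool emits one File per job, so the scattered step yields File[].
  reduced_protein_sequences:
    type: File[]
    outputSource: get_reduced_protein_sequences/out_reduced_protein_sequences

steps:
  reduce_isoforms:
    run: tools/isoform_tool.cwl
    scatter: in_gff
    in:
      in_gff:
        source: file_pairs
        # Evaluated after scattering, so self is one record, not the array.
        valueFrom: $(self.gff)
    out: [out_reduced_gff]

  get_reduced_protein_sequences:
    run: tools/extract_sequences_tool.cwl
    # dotproduct walks both arrays in step: the i-th reduced GFF with the i-th genome.
    scatter: [in_reduced_gff, in_genome]
    scatterMethod: dotproduct
    in:
      in_reduced_gff: reduce_isoforms/out_reduced_gff
      in_genome:
        source: file_pairs
        valueFrom: $(self.genome)
    out: [out_reduced_protein_sequences]
```

dotproduct requires both scattered arrays to have the same length, which holds here because both derive from file_pairs.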