I’ll write out my thought process.
Picking up outputs
I can get the secondary files to be picked up with an expression on the secondaryFiles field on the output.
out:
  type: File
  outputBinding:
    glob: $(inputs.bam.basename)
  secondaryFiles: |
    ${
      function resolveSecondary(base, secPattern) {
        if (secPattern[0] == "^") {
          var spl = base.split(".");
          var endIndex = spl.length > 1 ? spl.length - 1 : 1;
          return resolveSecondary(spl.slice(undefined, endIndex).join("."), secPattern.slice(1));
        }
        return base + secPattern
      }
      return [{
        path: resolveSecondary(self.path, "^.bai"),
        basename: resolveSecondary(self.basename, ".bai"),
      }]
    }
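As a quick sanity check of what the caret handling produces, here is a Python port of the resolveSecondary helper. The port and the name resolve_secondary are my own illustration, not part of the workflow, and the file names are made up.

```python
def resolve_secondary(base: str, sec_pattern: str) -> str:
    """Python port of the resolveSecondary() JavaScript helper (illustrative only)."""
    if sec_pattern.startswith("^"):
        parts = base.split(".")
        # Drop the last extension, but never drop the only component.
        end_index = len(parts) - 1 if len(parts) > 1 else 1
        # Each leading caret strips one extension, then we recurse on the rest.
        return resolve_secondary(".".join(parts[:end_index]), sec_pattern[1:])
    return base + sec_pattern


print(resolve_secondary("file.bam", "^.bai"))      # file.bai
print(resolve_secondary("file.bam", ".bai"))       # file.bam.bai
print(resolve_secondary("file.tar.gz", "^^.idx"))  # file.idx
```

So in the output expression, path points at the index that actually exists on disk ("file.bai"), while basename carries the name the next step expects ("file.bam.bai").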
However, the expression doesn’t actually rename the file (except in the exported outputs of a completed workflow). It only references it, e.g.:
- TaskA generates the files “file.bam” + “file.bai”.
- TaskB then comes along wanting the file “file.bam.bai”.
- It fails with the error "Missing required secondary file '$name' from file object", even though the file seems to exist in the secondaryFiles array (depending on what it matches on, I guess).
Error:
cwltool.errors.WorkflowException: Missing required secondary file 'generated-7999e776-212c-11ea-a264-acde48001122.bam.bai' from file object: {
    "location": "file:///private/tmp/docker_tmpv8y7n43v/generated-7999e776-212c-11ea-a264-acde48001122.bam",
    "basename": "generated-7999e776-212c-11ea-a264-acde48001122.bam",
    "nameroot": "generated-7999e776-212c-11ea-a264-acde48001122",
    "nameext": ".bam",
    "class": "File",
    "checksum": "sha1$a1a12417d413ab4847188d6eee7f6e675450656d",
    "size": 2998029,
    "secondaryFiles": [
        {
            "basename": "generated-7999e776-212c-11ea-a264-acde48001122.bam.bai",
            "location": "file:///private/tmp/docker_tmpv8y7n43v/generated-7999e776-212c-11ea-a264-acde48001122.bai",
            "class": "File",
            "nameroot": "generated-7999e776-212c-11ea-a264-acde48001122.bam",
            "nameext": ".bai",
            "checksum": "sha1$e5e276cfbd0a1cfe7828ec744b02de3d8ee78f88",
            "size": 1472592,
            "http://commonwl.org/cwltool#generation": 0
        }
    ],
    "http://commonwl.org/cwltool#generation": 0
}
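The mismatch in that file object can be made explicit with a small Python check. This is only a sketch: I have not confirmed what cwltool compares against, and the helper name_from_location is hypothetical. It simply shows that the secondary file matches the required name by basename but not by the file name in its location, which would be consistent with the error if the lookup were based on location.

```python
import posixpath
from urllib.parse import urlparse

STEM = "generated-7999e776-212c-11ea-a264-acde48001122"
REQUIRED = STEM + ".bam.bai"  # the name quoted in the error message

# The secondary file entry from the error above, trimmed to the relevant fields.
secondary = {
    "basename": STEM + ".bam.bai",
    "location": "file:///private/tmp/docker_tmpv8y7n43v/" + STEM + ".bai",
}


def name_from_location(file_obj: dict) -> str:
    """Hypothetical helper: the last path component of a File's location URI."""
    return posixpath.basename(urlparse(file_obj["location"]).path)


print(secondary["basename"] == REQUIRED)          # True
print(name_from_location(secondary) == REQUIRED)  # False
```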
Picking up inputs
I’m also having trouble doing the same thing on the command input. I placed a very similar expression block in secondaryFiles on the input (using location instead of path):
type: File
secondaryFiles: |
  ${
    function resolveSecondary(base, secPattern) {
      if (secPattern[0] == "^") {
        var spl = base.split(".");
        var endIndex = spl.length > 1 ? spl.length - 1 : 1;
        return resolveSecondary(spl.slice(undefined, endIndex).join("."), secPattern.slice(1));
      }
      return base + secPattern
    }
    // return resolveSecondary(self.basename, "^.bai")
    return [{
      location: resolveSecondary(self.location, ".bai"),
      basename: resolveSecondary(self.basename, "^.bai"),
    }]
  }
But I get a CWLTool error:
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/cwltool/executors.py", line 169, in run_jobs
    for job in jobiter:
  File "/anaconda3/lib/python3.7/site-packages/cwltool/command_line_tool.py", line 430, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/anaconda3/lib/python3.7/site-packages/cwltool/process.py", line 747, in _init_job
    discover_secondaryFiles=getdefault(runtime_context.toplevel, False)))
  File "/anaconda3/lib/python3.7/site-packages/cwltool/builder.py", line 276, in bind_input
    bindings.extend(self.bind_input(f, datum[f["name"]], lead_pos=lead_pos, tail_pos=f["name"], discover_secondaryFiles=discover_secondaryFiles))
  File "/anaconda3/lib/python3.7/site-packages/cwltool/builder.py", line 332, in bind_input
    sf_location = datum["location"][0:datum["location"].rindex("/")+1]+sfname
TypeError: can only concatenate str (not "dict") to str
This seems to go against the spec which says:
The expression must return:
- a filename string relative to the path to the primary File,
- a File or Directory object with either path or location and basename fields set,
- or an array consisting of strings or File or Directory objects.
It is legal to reference an unchanged File or Directory object taken from input as a secondaryFile.
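To make the three permitted return forms concrete, here is a rough Python sketch of how an expression result could be normalized into a list of File-like objects. It is not cwltool code: normalize_secondary is a hypothetical helper, I am reading the spec wording as "(path or location) plus basename", and treating path and location interchangeably is a simplification.

```python
def normalize_secondary(result, primary_location):
    """Hypothetical helper: turn an expression result into a list of File-like dicts."""
    # A single string or object is treated like a one-element array.
    items = result if isinstance(result, list) else [result]
    # Directory part of the primary file's location, including the trailing slash.
    directory = primary_location[: primary_location.rindex("/") + 1]

    normalized = []
    for item in items:
        if isinstance(item, str):
            # A plain string is a file name relative to the primary file.
            normalized.append({"location": directory + item, "basename": item})
        elif isinstance(item, dict):
            # Assumption: either "location" or "path" is acceptable here.
            location = item.get("location") or item.get("path")
            if location is None or "basename" not in item:
                raise ValueError("File object needs path or location, and basename")
            normalized.append({"location": location, "basename": item["basename"]})
        else:
            raise TypeError("expected a string or a File/Directory object")
    return normalized


primary = "file:///tmp/file.bam"  # made-up location
print(normalize_secondary("file.bam.bai", primary))
print(normalize_secondary([{"location": "file:///tmp/file.bam.bai", "basename": "file.bai"}], primary))
```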
Potential solution
I made a small modification to CWLTool to pull the location + basename from this File object, i.e. in builder.py:
- Remove L332,
- Insert the following stub at L325 to get the correct basename (instead of assuming it’s a string).
if isinstance(sfname, string_types):
    sf_location = datum["location"][0:datum["location"].rindex("/")+1]+sfname
else:
    sf_location = sfname["location"]
    sfname = sfname["basename"]
And this seems to solve the problems:
- Returning a File object in the secondary expression
- Matching the correct basename for follow up steps
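For anyone who wants to try the idea outside of cwltool, here is a standalone Python sketch that reproduces the TypeError and shows the patched branch. The function names and file locations are made up for illustration, and it uses str where the patch uses string_types, so it assumes Python 3.

```python
def original_sf_location(datum, sfname):
    # Same expression as the failing line in the traceback: assumes sfname is a string.
    return datum["location"][0:datum["location"].rindex("/") + 1] + sfname


def patched_sf_location(datum, sfname):
    # Accept either a file name string or a File object with location + basename.
    if isinstance(sfname, str):
        sf_location = datum["location"][0:datum["location"].rindex("/") + 1] + sfname
    else:
        sf_location = sfname["location"]
        sfname = sfname["basename"]
    return sf_location, sfname


datum = {"location": "file:///tmp/file.bam"}  # made-up primary file
as_object = {"location": "file:///tmp/file.bam.bai", "basename": "file.bai"}

try:
    original_sf_location(datum, as_object)
except TypeError as err:
    print("original fails:", err)

print(patched_sf_location(datum, as_object))
print(patched_sf_location(datum, "file.bam.bai"))
```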
I’ll page @mr-c to see if this change might go against the spec.