Hi all,
This is more of a developer question than a user question. A full command is composed of a base command (baseCommand), the inputs, placed according to the positions in their inputBinding, and the arguments, which also have positions. I have been playing with cwl_utils to extract the baseCommand from a workflow step, but I found I need to write some logic to incorporate the inputs and arguments to obtain the full command.
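The ordering rule described above can be sketched in plain Python: start with baseCommand, then sort every bound input and argument by its position (CWL defaults a missing position to 0). The `Binding` class and `assemble_command` function are hypothetical simplifications, not cwl_utils APIs. Expressions, arrays and `separate: false` are ignored, and the CWL specification also defines tie-breaking rules (argument index or input name) that this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Binding:
    """Hypothetical, simplified stand-in for an inputBinding or arguments entry."""
    position: int
    value: Optional[str]  # None: optional input with no value
    prefix: Optional[str] = None


def assemble_command(base_command: Union[str, List[str]], bindings: List[Binding]) -> List[str]:
    """baseCommand first, then every binding ordered by its position."""
    argv = [base_command] if isinstance(base_command, str) else list(base_command)
    # sorted() is stable, so bindings with equal positions keep their list order
    for binding in sorted(bindings, key=lambda b: b.position):
        if binding.value is None:
            continue  # absent optional inputs add nothing
        if binding.prefix:
            argv.append(binding.prefix)
        argv.append(binding.value)
    return argv


# the process_file step from the example workflow in this thread
print(" ".join(assemble_command(["sh", "process.sh"], [
    Binding(position=2, value="input-file2"),
    Binding(position=1, value="input-file1"),
])))
# sh process.sh input-file1 input-file2
```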
To avoid duplicating work, is there a package or a function in cwl_utils that does this for a CommandLineTool process?
Thanks in advance,
Pedro
Hey Pedro, I’m not aware of any code to do that using the cwl_utils objects. But it would be really useful to have! I would even help review and improve any PR on this topic.
Thanks Michael. I’ll get a proof of concept working first and get back to you on this.
Regards,
Pedro
Hi Peter,
I’m interested to know how you got on with this.
Does this imply you have an input JSON as well? I’m curious how you’d handle optional inputs.
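For context on the optional-input question: per the CWL spec, an input whose value is missing or null falls back to its `default`; if there is still no value, its binding adds nothing to the command line, and a boolean adds only its prefix when true. The sketch below shows just those rules; the function name `render_optional` is hypothetical and it works on plain Python values rather than cwl_utils objects.

```python
from typing import Any, List, Optional


def render_optional(prefix: Optional[str], value: Any, default: Any = None) -> List[str]:
    """Render one bound input; a missing/null value falls back to the default, else is omitted."""
    if value is None:
        value = default
    if value is None:
        return []  # absent optional input: contributes nothing to the command line
    if isinstance(value, bool):
        # CWL booleans: true emits only the prefix, false emits nothing
        return [prefix] if (value and prefix) else []
    return ([prefix] if prefix else []) + [str(value)]


print(render_optional("-n", None))              # []
print(render_optional("-n", None, default=10))  # ['-n', '10']
```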
Hi Alexis and Michael,
So here is more or less how I’m trying to work this out. For the time being I’m ignoring expressions. For every step in my workflow I check if I have a workflow or a task (CommandLineTool) and act accordingly. The magic happens in the _process_task method.
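import networkx as nx
from cwl_utils import cite_extract
from cwl_utils.parser.cwl_v1_2 import CommandLineTool, Workflow
# enclosing function; name taken from the recursive call below
def _process_workflow(cwl_content: Workflow, wf_graph: nx.DiGraph) -> nx.DiGraph:
    for step in cwl_content.steps:
        step_element = cite_extract.get_process_from_step(step)
        if isinstance(step_element, CommandLineTool):
            _process_task(step, wf_graph)
        elif isinstance(step_element, Workflow):
            # recurse into sub-workflows
            _process_workflow(step_element, wf_graph)
    return wf_graph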
In _process_task, I extract the baseCommand:
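import os
import networkx as nx
from cwl_utils import cite_extract
from cwl_utils.parser.cwl_v1_2 import WorkflowStep

def _process_task(step: WorkflowStep, wf_graph: nx.DiGraph) -> nx.DiGraph:
    command_line_tool = cite_extract.get_process_from_step(step)
    # map tool input name (e.g. "ifile") -> source feeding it (e.g. "input-file1")
    inputs = {
        os.path.basename(step_inp.id.split("#")[1]): step_inp.source.split("#")[1]
        for step_inp in step.in_
    }
    if isinstance(command_line_tool.baseCommand, str):
        base_command = command_line_tool.baseCommand
    elif isinstance(command_line_tool.baseCommand, list):
        base_command = " ".join(command_line_tool.baseCommand)
    else:
        base_command = ""  # baseCommand is optional; arguments may supply the command
    # ... continues: append the arguments (built by the function below) and add a node to wf_graph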
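The ID parsing in `_process_task` assumes `step_inp.source` is a single string. In CWL, a step input's `source` may also be a list of IDs (with MultipleInputFeatureRequirement) or be absent (when the input only has a default or valueFrom), and `.split("#")` would then fail. A defensive sketch, with hypothetical helper names `short_name` and `source_names`, assuming IDs are URIs with a `#` fragment as in the code above:

```python
from typing import List, Union


def short_name(uri: str) -> str:
    """'file:///wf.cwl#process_file/ifile' -> 'ifile' (last segment of the fragment)."""
    return uri.split("#", 1)[-1].split("/")[-1]


def source_names(source: Union[str, List[str], None]) -> List[str]:
    """Normalise a step input's source (None, one ID, or a list) to fragment names."""
    if source is None:
        return []  # e.g. the step input only has a default or valueFrom
    sources = [source] if isinstance(source, str) else source
    return [s.split("#", 1)[-1] for s in sources]


print(short_name("file:///tmp/wf.cwl#process_file/ifile"))  # ifile
print(source_names(["file:///tmp/wf.cwl#input-file1", "file:///tmp/wf.cwl#process_file/ofile"]))
# ['input-file1', 'process_file/ofile']
```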
and then add the arguments, which I build with the following function:
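import os
from cwl_utils.parser.cwl_v1_2 import CommandLineBinding

def _build_command_arguments(command_line_tool, inputs: dict) -> list:
    # (position, rendered parameter) pairs; expressions in positions are ignored
    # _build_params: helper not shown here
    command_arguments = []
    for inp in command_line_tool.inputs:
        tool_input = os.path.basename(inp.id.split("#")[1])
        # only bound inputs reach the command line; inputs not wired in the step
        # are skipped (a simplification: tool-level defaults are not rendered)
        if isinstance(inp.inputBinding, CommandLineBinding) and tool_input in inputs:
            position = inp.inputBinding.position or 0
            command_arguments.append(
                (position, _build_params(inputs[tool_input], inp.inputBinding))
            )
    for argument in command_line_tool.arguments or []:
        # plain-string arguments have no position and default to 0
        position = getattr(argument, "position", None) or 0
        command_arguments.append((position, _build_params("", argument)))
    # order by position; sorted() is stable, so ties keep declaration order
    return [param for _, param in sorted(command_arguments, key=lambda p: p[0])]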
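`_build_params` itself is not shown in the thread. A hypothetical stand-in, assuming it only needs the literal `prefix`, `separate` and `valueFrom` fields of a binding (CWL expressions are not evaluated), might look like the sketch below. `SimpleNamespace` is used only to mimic a binding object in the usage line.

```python
from types import SimpleNamespace
from typing import Any, List


def build_params(value: str, binding: Any) -> List[str]:
    """Hypothetical stand-in for _build_params: render one binding as argv items.

    Only literal prefix/separate/valueFrom are handled; expressions are not evaluated.
    """
    if isinstance(binding, str):
        return [binding]  # `arguments` entries may be plain strings
    # for `arguments` entries, valueFrom supplies the value (the input value is "")
    text = binding.valueFrom if getattr(binding, "valueFrom", None) else value
    prefix = getattr(binding, "prefix", None)
    if not prefix:
        return [text]
    if getattr(binding, "separate", None) is False:
        return [prefix + text]  # separate: false glues prefix and value together
    return [prefix, text]  # separate defaults to true in CWL


# usage with a stand-in for a cwl_utils CommandLineBinding
print(build_params("input-file1", SimpleNamespace(prefix="-i", separate=False, valueFrom=None)))
# ['-iinput-file1']
```

Because this version returns a list of argv items, a caller would typically flatten the results (or use `extend`) before joining them into a command string.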
It’s probably not the most efficient solution, but it lets me build a full command with placeholders for the input and output files (I’m not really interested in the actual input values) and, from those, an abstract graph. I looked into cwltool to see if there was something I could invoke directly to get the full command, but to be honest I got a bit lost in it. This way I also don’t need the YAML file with the inputs, so I thought it was easier. So for the following example:
cwlVersion: v1.2
class: Workflow
label: An example tool demonstrating metadata.
doc: Note that this is an example and the metadata is not necessarily consistent.
requirements:
  SubworkflowFeatureRequirement: {}
  StepInputExpressionRequirement: {}
  ScatterFeatureRequirement: {}
  InlineJavascriptRequirement: {}
  ResourceRequirement:
    coresMin: 4
    ramMin: 3000
inputs:
  input-file1: File
  input-file2: File
steps:
  process_file:
    run:
      label: "A test label"
      class: CommandLineTool
      baseCommand: [sh, process.sh]
      requirements:
        InlineJavascriptRequirement: {}
        InitialWorkDirRequirement:
          listing:
            - entry: ""
              entryname: "output.metadata"
              writable: true
      stdout: output.txt
      inputs:
        ifile:
          type: File
          inputBinding:
            position: 1
        ifile2:
          type: File
          inputBinding:
            position: 2
      outputs:
        ofile:
          type: File
          format: edam:format_1964
          label: A text file that contains a line count
          outputBinding:
            glob: output.txt
          secondaryFiles:
            - pattern: ^.metadata
              required: true
    in:
      ifile: input-file1
      ifile2: input-file2
    out: [ofile]
  process_file_2:
    run:
      class: CommandLineTool
      baseCommand: cat
      requirements:
        InlineJavascriptRequirement: {}
        ResourceRequirement:
          coresMin: 2
          ramMin: 6000
        InitialWorkDirRequirement:
          listing:
            - entry: ""
              entryname: "output_2.metadata"
              writable: true
      stdout: output_2.txt
      inputs:
        ifile:
          type: File
          inputBinding:
            position: 1
      outputs:
        ofile:
          type: File
          format: edam:format_1964
          label: A text file that contains a line count
          outputBinding:
            glob: output_2.txt
          secondaryFiles:
            - pattern: ^.metadata
              required: true
    in:
      ifile: process_file/ofile
    out: [ofile]
outputs:
  output-file:
    type: File
    outputSource: process_file_2/ofile
$namespaces:
  s: https://schema.org/
  edam: http://edamontology.org/
I would obtain the following networkx graph (the nodes with their attributes, followed by the edges):
[
    (
        "process_file",
        {
            "description": "process_file",
            "command": "sh process.sh input-file1 input-file2",
            "inputs": ["input-file1", "input-file2"],
            "outputs": ["process_file/ofile"],
        },
    ),
    (
        "process_file_2",
        {
            "description": "process_file_2",
            "command": "cat process_file/ofile",
            "inputs": ["process_file/ofile"],
            "outputs": ["process_file_2/ofile"],
        },
    ),
],
[("process_file", "process_file_2")]
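The edge list above can be derived from the step input sources: a source of the form `<step>/<output>` points at another step's output, whereas a bare name is a workflow input. The `step_edges` function is a hypothetical sketch that assumes the sources have already been reduced to those short names (as `_process_task` does with its `inputs` mapping).

```python
from typing import Dict, List, Set, Tuple


def step_edges(step_sources: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Derive (upstream, downstream) step pairs from step input sources.

    step_sources maps a step name to the source names of its inputs, e.g.
    "process_file/ofile" (another step's output) or "input-file1" (workflow input).
    """
    edges: Set[Tuple[str, str]] = set()
    for step, sources in step_sources.items():
        for source in sources:
            if "/" in source:  # "<step>/<output>" refers to another step's output
                upstream = source.split("/", 1)[0]
                edges.add((upstream, step))
    return sorted(edges)


print(step_edges({
    "process_file": ["input-file1", "input-file2"],
    "process_file_2": ["process_file/ofile"],
}))
# [('process_file', 'process_file_2')]
```

The resulting pairs could then be passed to networkx's `DiGraph.add_edges_from`.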
Please let me know your thoughts, and whether there is something in cwltool I missed that achieves a similar result. Thanks!
Have a great day!