Access envDef of Workflow in CWLtool

HrishiDhondge · March 2, 2022, 11:15pm

Hello,
I would like to know whether there is any way to access an environment variable defined in the workflow from inside a tool/step of that workflow, without explicitly passing it as an argument.

The CWL tutorials only show how to define an environment variable and then access it by passing it as an argument/input.

But I would like to access it without explicitly passing the variable to a tool. That should be possible, since it is already in the environment, but I don't know how to access it.
Any help would be appreciated on this. Thank you in advance!

mrc · March 3, 2022, 11:39am

Do you have an EnvVarRequirement at the workflow level?

Then that EnvVarRequirement will automatically be present on all steps and sub-steps, unless they define their own EnvVarRequirement.

Accessing the actual value from CWL is not possible, except by writing a CommandLineTool that runs echo $MY_ENV_VAR and captures the output:

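cwlVersion: v1.0
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
  ShellCommandRequirement: {}   # needed for shellQuote: false

inputs: []

baseCommand: echo

arguments:
  # not a CWL expression: passed unquoted so the shell expands the variable
  - valueFrom: $MY_ENV_VAR
    shellQuote: false

stdout: result.txt

outputs:
  MY_ENV_VAR_value:
    type: string
    outputBinding:
      glob: result.txt
      loadContents: true
      # strip the trailing newline that echo appends
      outputEval: $(self[0].contents.replace(/\n$/, ''))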
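For completeness, here is a minimal sketch of the workflow side, assuming the tool above is saved as `print_env.cwl` (a hypothetical filename) and using a placeholder value for `MY_ENV_VAR`. The EnvVarRequirement is declared only on the Workflow, and the step inherits it because it defines no EnvVarRequirement of its own:

```yaml
cwlVersion: v1.0
class: Workflow

requirements:
  # Inherited by every step that does not define its own EnvVarRequirement.
  EnvVarRequirement:
    envDef:
      MY_ENV_VAR: "example-value"   # placeholder value

inputs: []

outputs:
  env_value:
    type: string
    outputSource: print_env/MY_ENV_VAR_value

steps:
  print_env:
    run: print_env.cwl   # hypothetical filename for the tool above
    in: []
    out: [MY_ENV_VAR_value]
```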

In general I would suggest not defining an EnvVarRequirement at the workflow level. Can you explain more about your goal in doing that?

HrishiDhondge · March 3, 2022, 11:42am

I have a variable that I need to pass to more than half of the steps/tools, so I thought it might be a better idea to define it as ENV_VAR and access it directly from tools.

mrc · March 3, 2022, 11:47am

Then that should be an explicit input for each step.
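A minimal sketch of that pattern: declare the shared value once as a workflow input and connect it to every step that needs it. The step tools `split_pdb.cwl` and `analyse.cwl` and their input names are hypothetical:

```yaml
cwlVersion: v1.0
class: Workflow

inputs:
  pdb_dir: Directory      # declared once, supplied once in the job file

outputs: []

steps:
  split:
    run: split_pdb.cwl    # hypothetical tool
    in:
      pdb_dir: pdb_dir    # the same workflow input...
    out: []
  analyse:
    run: analyse.cwl      # hypothetical tool
    in:
      pdb_dir: pdb_dir    # ...wired to a second step
    out: []
```

Users still provide `pdb_dir` only once, in the job file, so the repeated wiring lives inside the workflow rather than on the command line.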

In chat, you used the example of an output directory. That concept is not needed in CWL, as any CWL runner must manage your output directories for you.

If you need to know the current output directory, use $(runtime.outdir).

HrishiDhondge · March 3, 2022, 12:41pm

Thank you for the prompt answer.
I have an output directory along with a few other directories. I am working on splitting PDB files, so I have one directory for the complete PDB files (pdbDir) and another for the split PDB files (sepDir).
pdbDir has to be available in order to split the PDB files and write the output to sepDir.
Returning output can be done using the --outdir option, but pdbDir still has to be passed to several steps/tools in the workflow.
So I was wondering whether there is a simple way to inherit and access environment variables defined at the workflow level?

mrc · March 3, 2022, 1:26pm

Do any of your steps need multiple PDB files, all in the same directory? Then you should have an output of type: Directory. In CWL we do not pass file/directory paths around as strings, as that would break during distributed execution when there is no shared filesystem.

Or are you thinking more about presenting the final outputs? In that case, the CWL way would be to have individual type: File outputs from steps, possibly combining them into a directory later if truly needed.
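A sketch of a splitting step that returns both forms, assuming a hypothetical `split_pdb` program that takes the input files plus an --outdir option (like the one mentioned earlier in this thread) and writes `.pdb` files into that directory. The program name and its behaviour are assumptions:

```yaml
cwlVersion: v1.0
class: CommandLineTool

baseCommand: split_pdb    # hypothetical splitting program

inputs:
  pdb_files:
    type: File[]          # staged by the runner, no path strings needed
    inputBinding:
      position: 1

arguments:
  - prefix: --outdir
    valueFrom: sepDir     # relative to the runner-managed output directory

outputs:
  sep_dir:
    type: Directory       # the whole split directory as one value
    outputBinding:
      glob: sepDir
  chains:
    type: File[]          # or the individual split files
    outputBinding:
      glob: sepDir/*.pdb
```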

HrishiDhondge · March 3, 2022, 2:49pm

Hello again,

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - $(inputs.src)
inputs:
  src:
    type: File
    inputBinding:
      position: 1
      valueFrom: $(self.basename)

outputs:
  classfile:
    type: stdout

baseCommand:
  - echo
  - $(runtime.outdir)

This prints $(runtime.outdir) as a literal string instead of its value. And the code below fails validation, saying that the baseCommand field is not valid:
try.cwl:24:5: item is invalid because the value is not string

baseCommand:
  - echo
  - valueFrom: $(runtime.outdir)

In addition, I would like to pass the file as an argument/input only if it exists, because if the file does not exist I get an error like:
[Errno 2] No such file or directory

mrc · March 3, 2022, 7:35pm

baseCommand only accepts static strings. Once you have a dynamic element, you have to use arguments for it and for all remaining elements of the command line (including other static elements). Likewise, valueFrom is valid as part of an arguments element, but not in baseCommand.

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
      - $(inputs.src)

inputs:
  src:
    type: File
    inputBinding:
      position: 1
      valueFrom: $(self.basename)

outputs:
  classfile: stdout

baseCommand: echo

arguments:
  - $(runtime.outdir)
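Regarding the earlier question about passing a file only if it exists, one option is an optional `File?` input. When it is left out of the job file (null), its inputBinding contributes nothing to the command line. A sketch, where `merge_results` and `--previous` are hypothetical:

```yaml
cwlVersion: v1.0
class: CommandLineTool

baseCommand: merge_results   # hypothetical program

inputs:
  previous:
    type: File?              # optional: may be omitted from the job file
    inputBinding:
      prefix: --previous     # hypothetical flag; dropped entirely when previous is null

outputs:
  merged: stdout
```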

HrishiDhondge · March 4, 2022, 6:36am

My workflow consists of multiple iterations, and loops are not implemented in CWL yet, so I am writing the workflow for a single iteration and creating the parameter file at the end of each iteration, to be used by the next iteration.

For every iteration, I have to check for files in the Results directory: if a file already exists, read it and append data to the current iteration's file; if not, create an empty file (to avoid the File not found error).

That is why I was not staging files but accessing them through absolute paths. For the same reason, I wanted to define the environment variable at the workflow level, so I don't have to pass the same arguments to most steps, and users can run the workflow without much knowledge of CWL.

I don't want to duplicate the same code once per iteration (the solution I found in another post), because I want it to be dynamic: the number of iterations differs between use cases.

mrc · March 10, 2022, 8:59am

Thanks for the context.

Using path strings instead of File or Directory inputs/outputs will eventually cause you a big problem, as it violates a core element of the CWL data model. Distributed execution, caching, provenance recording, and other scenarios will likely break.

Maybe you can modify your loop handling logic (outside the CWL workflow) to merge results and produce the inputs for the next iteration instead of doing that at the end of your CWL workflow?

Then your CWL workflow can use proper File and/or Directory inputs and outputs.
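For example, an outer loop driver could write a fresh job file (input object) for each iteration, pointing at files produced by the previous run. A sketch with hypothetical input names and paths; relative paths are resolved relative to the job file itself:

```yaml
# iteration_2.yml: job file for the second iteration (names and paths are hypothetical)
params:
  class: File
  path: iteration_1/next_params.yml
previous_results:
  class: File
  path: iteration_1/results.txt
```

It could then be run with `cwltool workflow.cwl iteration_2.yml` (`workflow.cwl` being a placeholder name); the runner stages these files itself, so no absolute paths or environment variables are needed inside the workflow.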

We are actively discussing how to design a loop feature for CWL, so if you can share more details about your situation, that would be very helpful. A complete copy of your workflow and the loop logic, with example inputs and expected outputs, would be best; but if you can't share the data or code, maybe you can describe it well enough that we can build a mock version together.

HrishiDhondge · March 10, 2022, 4:04pm

For the moment, I cannot share many details, but here's a brief overview of the workflow I'm trying to build:

In the first iteration of the loop, I create several files whose names come from variables (strings used as file names) specified by the user, since the user might want to change the names.

I need some of these files in later iterations to keep track of everything, so I have to give the file names as strings for the first iteration and as Files for the next iterations.

I am not sure whether CWL can check if a file already exists and create an empty file if it does not. If this feature exists, it would most likely solve my problem of working with file names as strings.
Another workaround is to stage the whole results directory, but that would not be wise because the files are quite large.

mrc · March 10, 2022, 6:18pm

Thanks for the information @HrishiDhondge

Here is an example of either taking a File or creating a file from a string:

cwlVersion: v1.0
class: CommandLineTool

inputs:
  opt_file:
    type: [File, string]

requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing: |
      ${
        // a string: create an empty file with that name in the working directory
        if (typeof inputs.opt_file === 'string') {
          return [{"class": "File", "basename": inputs.opt_file, "contents": ""}];
        }
        // a File: stage the existing file as it is
        else { return [inputs.opt_file]; }
      }

baseCommand: [ls, -l]

arguments: [$(inputs.opt_file)]

outputs: []
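Usage sketch: the same tool accepts either form of `opt_file`. The two documents below are separate, hypothetical job files, shown together only for comparison; the first suits the first iteration (no file yet), the second later iterations:

```yaml
# job_first.yml (hypothetical): a plain string, so the tool creates an empty file
opt_file: results.txt
---
# job_next.yml (hypothetical): an existing File from the previous iteration
opt_file:
  class: File
  path: previous_run/results.txt
```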

HrishiDhondge · March 11, 2022, 7:11am

Thank you @mrc
I am almost at the final stage of the workflow, so I'll try to finish it, and hopefully I will be able to share the code and everything needed to build a test case for loops.

How can I set a default value for the File in this example? I am trying to access the file from its location and read it inside the tool.

mrc · March 12, 2022, 3:34pm

To add a default value to opt_file, try the example below using your desired default file name instead of my_default_file_name.txt

inputs:
 opt_file:
  type: [File, string, "null"]
  default: my_default_file_name.txt

Or you can provide a default value at the workflow or workflow step level, which will flow to this CommandLineTool. Your choice!
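A sketch of the workflow-step variant, assuming the tool above is saved as `opt_file_tool.cwl` (a hypothetical filename). A `default` on a step input is used when the step input has no source or its source is null:

```yaml
cwlVersion: v1.0
class: Workflow

inputs:
  opt_file: ["null", File, string]   # optional at the workflow level

outputs: []

steps:
  prepare:
    run: opt_file_tool.cwl           # hypothetical filename for the tool above
    in:
      opt_file:
        source: opt_file
        default: my_default_file_name.txt   # used when the workflow input is null
    out: []
```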

To get the path of the file, use $(inputs.opt_file), as in my example.

HrishiDhondge · March 12, 2022, 8:34pm

mrc:

InitialWorkDirRequirement:
   listing: |
     ${
       if (typeof inputs.opt_file === 'string') {
         return [{"class": "File", "basename": inputs.opt_file, "contents": "" }]; }
       else { return [ inputs.opt_file] ; }
      }

An additional comment: add writable: true in the if branch to create the file from the string with read and write access. Without the writable option, the file is created read-only.

InitialWorkDirRequirement:
   listing: |
     ${
       if (typeof inputs.opt_file === 'string') {
         return [{"class": "File", "basename": inputs.opt_file, "contents": "" , writable: true}]; }
       else { return [ inputs.opt_file] ; }
      }
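To hand the (possibly newly created) file on to the next iteration, the tool can also declare it as an output. A sketch of an `outputs` section for the tool above, assuming the file stays in the working directory under the same basename (the output name `updated_file` is hypothetical):

```yaml
outputs:
  updated_file:
    type: File
    outputBinding:
      # the staged file keeps its basename, whether it came from a string or a File
      glob: "$(typeof inputs.opt_file === 'string' ? inputs.opt_file : inputs.opt_file.basename)"
```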