Hello,
I would like to know whether there is any way to access an environment variable defined in a workflow from inside a tool/step of that workflow, without explicitly passing it as an argument.
The CWL tutorials only show how to define an environment variable and then access it by passing it as an argument/input.
But I would like to access it without explicitly passing the variable to a tool. That should be possible as it is there in the environment, but I don’t know how to access it.
Any help would be appreciated on this. Thank you in advance!
I have a variable that I need to pass to more than half of the steps/tools, so I thought it might be a better idea to define it as ENV_VAR and access it directly from tools.
Then that should be an explicit input for each step.
In chat, you used the example of an output directory. That concept is not needed for CWL as any CWL runner must manage your output directories for you.
If you need to know the current output directory, use $(runtime.outdir)
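As a rough sketch of what "an explicit input for each step" can look like: the value is declared once as a workflow input and wired into every step that needs it, so the user still supplies it only once. The names shared_value, step_one.cwl, step_two.cwl, previous and result are hypothetical placeholders, and the sketch assumes both tools declare matching inputs and a File output called result.

```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  shared_value: string        # hypothetical name; supplied once in the job file

steps:
  step_one:
    run: step_one.cwl         # hypothetical tool description
    in:
      shared_value: shared_value
    out: [result]
  step_two:
    run: step_two.cwl         # hypothetical tool description
    in:
      shared_value: shared_value
      previous: step_one/result
    out: [result]

outputs:
  final_result:
    type: File
    outputSource: step_two/result
```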
Thank you for the prompt answer.
I have an output directory along with a few more directories. I am working on splitting PDB files, so I have one directory for complete PDB files (pdbDir) and another directory for split PDB files (sepDir).
pdbDir has to be present in order to split the PDB files and write the output to sepDir.
Returning the output can be done using the --outdir option, but passing pdbDir to several steps/tools in the workflow is a must.
So I was wondering if there is a simple way to inherit and access environment variables defined at the workflow level?
Do any of your steps need multiple PDB files and for them to be in the same directory? Then you should have an output of type: Directory; in CWL we do not pass file/directory paths around as strings, as that would break during distributed execution when there is not a shared filesystem.
Or are you thinking more about presenting the final outputs? In that case, the CWL way would be to have individual type: File outputs from steps. Possibly combining them into a directory later, if truly needed.
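A minimal sketch of a tool that takes pdbDir as a Directory and returns sepDir as a Directory. Note that split_pdb and its --out flag are hypothetical stand-ins for the real splitting command; only the type: Directory handling is the point here.

```yaml
cwlVersion: v1.2
class: CommandLineTool

baseCommand: split_pdb        # hypothetical command, replace with the real splitter

inputs:
  pdbDir:
    type: Directory           # staged by the runner, no path string needed
    inputBinding:
      position: 1

arguments:
  - prefix: --out             # hypothetical flag of the hypothetical command
    valueFrom: sepDir         # directory name relative to the working directory

outputs:
  sepDir:
    type: Directory
    outputBinding:
      glob: sepDir            # collect the directory the command wrote
```

A later step can then declare its own input of type: Directory and receive sepDir directly, without any path being passed as a string.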
This prints $(runtime.outdir) as a literal string and not its value. And my code is reported as invalid:
the `baseCommand` field is not valid because
  tried array of <string> but
try.cwl:24:5: item is invalid because the value is not string
In addition to this, I would like to pass the file as an argument/input only if it exists, because if the file does not exist I get an error like
[Errno 2] No such file or directory
baseCommand only accepts static strings. Once you have a dynamic element, you have to use arguments for it and for all remaining elements of the command line (including other static elements). Likewise, valueFrom is valid as part of an arguments element, but not in baseCommand.
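A small sketch of that split, assuming echo as a stand-in command: the static part stays in baseCommand, while $(runtime.outdir) goes into arguments, where the runner evaluates it. The output name report and the file name outdir.txt are made up for the example.

```yaml
cwlVersion: v1.2
class: CommandLineTool

baseCommand: echo                    # static strings only

arguments:
  - "Output directory:"              # static elements may also live here
  - valueFrom: $(runtime.outdir)     # dynamic element, evaluated by the runner

inputs: []

outputs:
  report:
    type: stdout
stdout: outdir.txt
```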
My workflow consists of multiple iterations, and loops are not implemented in CWL yet. So I am writing the workflow for a single iteration and creating a parameter file at the end of each iteration, which is then used by the next iteration.
For every iteration, I have to check the files in the Results directory: if a file already exists, I read it and append the data to the current iteration's file; if not, I create an empty file (to avoid the "File not found" error).
That was the reason I was not staging files but accessing them through an absolute path. For the very same reason, I wanted to define the ENV_VAR at the workflow level, so that I don’t have to pass the same arguments most of the time and so that it is easier for users to run the workflow without much knowledge of CWL.
I am not repeating the same code once per iteration, which was the solution I found in one of the posts, because I want it to be dynamic: the number of iterations is different for different use cases.
Using path strings instead of File or Directory inputs/outputs will eventually cause you a big problem as it violates a core element of the CWL data model. Distributed execution, using caching, provenance recording, and other scenarios will likely break.
Maybe you can modify your loop handling logic (outside the CWL workflow) to merge results and produce the inputs for the next iteration instead of doing that at the end of your CWL workflow?
Then your CWL workflow can use proper File and/or Directory inputs and outputs.
We are actively discussing how to design a loop feature for CWL, so if you can share more details about your situation, that would be very helpful. A complete copy of your workflow and the loop logic with example inputs and expected outputs would be best; but if you can’t share the data or code, then maybe you can describe it well enough that we can build a mock version together.
For the moment, I cannot share many details, but here’s a brief overview of the workflow I’m trying to build:
I create several files in the first iteration of the loop, using variables (strings used as file names) specified by the user (the user might want to change the names).
And some of these files I need in the next iterations to keep track of everything. So now I have to give the filenames as strings for the first iteration and File for the next iterations.
I am not sure whether we can check if a file already exists using CWL and create an empty file if it does not exist. If this feature is already there, then it would most probably solve my problem of working around with filenames as strings.
Another workaround is to simply stage the whole result directory but that would not be a wise decision because file size is quite large.
Thank you @mrc
I am almost in the final stage of the workflow, so I’ll try to finish it and hopefully, I will be able to share the code and everything to build a test case for loops.
How may I implement the default value from this example for a File? I am trying to access the file from its location and read it inside the tool.
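One possible way to give a File input a default, as a sketch: the default is written as a File object rather than a bare string. The names params and params.txt are hypothetical, and cat only stands in for a tool that reads the file. As far as I know, a relative location in a default is resolved relative to the CWL document itself, so it is worth checking this on your runner.

```yaml
cwlVersion: v1.2
class: CommandLineTool

baseCommand: cat                # stand-in for a tool that reads the file

inputs:
  params:
    type: File
    default:
      class: File
      location: params.txt      # hypothetical file next to this CWL document
    inputBinding:
      position: 1

outputs:
  contents:
    type: stdout
stdout: contents.txt
```

If the job file provides params, that value is used; otherwise the default File is staged like any other input.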
Just one additional comment: add writable: true in the if statement to create a file from a string with read and write access. Without the writable option, the file is created with read access only.
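For readers following along, here is a sketch of how the writable option might be combined with an optional File input, so that an empty file is created when no previous file exists. This is an assumption about the setup, not the original code: the names previous, history and history.txt, and the sh command, are hypothetical. How an empty-string entry is handled should be checked on your runner.

```yaml
cwlVersion: v1.2
class: CommandLineTool

requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: history.txt    # hypothetical file name
        writable: true            # stage a read-write copy instead of read-only
        # Use the previous file if one was given, else start from empty contents.
        entry: '${ if (inputs.previous) { return inputs.previous; } return ""; }'

baseCommand: [sh, -c]
arguments:
  - 'echo "record from this iteration" >> history.txt'

inputs:
  previous: File?                 # optional: absent on the first iteration

outputs:
  history:
    type: File
    outputBinding:
      glob: history.txt
```

The history output of one run can then be passed as the previous input of the next run, which keeps everything as File objects instead of path strings.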