We are interested in the ontology functionality of CWL. We work in the image processing domain and would like CWL to validate the ontologies before executing a workflow.
I am under the impression that this exists in CWL. We don’t have ontologies in EDAM yet, so we were hoping that it could perform an exact match on the format field for now.
This is a pretty simple use case. You can execute cwltool --validate to perform a check.
I expect validation to fail because the format fields of the input and output directories do not match.
thresholding.cwl
cwlVersion: v1.0
id: "Thresholding plugin"
class: CommandLineTool
requirements:
  DockerRequirement:
    dockerPull: wipp/wipp-thresh-plugin:1.1.1
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.output)
        writable: true
baseCommand: [""]
inputs:
  input:
    type: Directory
    label: Input collection of ome.tiff files for the thresholding plugin.
    format: "string"
    inputBinding:
      prefix: --input
  output:
    type: string
    inputBinding:
      prefix: --outDir
outputs:
  thresholdOut:
    type: Directory
    label: Output collection of ome.tiff files for the thresholding plugin.
    format: "FAIL"
    outputBinding:
      glob: "$(inputs.output.basename)"
My understanding is that the ontology is used to validate that a file's format matches what was specified in the CWL file, but it is not easy to use that to validate a directory or its contents. For a File it is probably easy enough to verify that it is a FASTA or a TIFF file, but for a Directory I’m not sure whether it would check if it’s a symlink, or the permissions of the directory, or its contents, or how that would work hierarchically.
Would you be able to elaborate more on your use case for format + Directory here or in an issue on GitHub? Maybe it could be useful if others have a similar use case, or if a CWL dev has an idea on how to implement this.
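For reference, here is a minimal sketch of what an exact-match format check on a File input might look like. The `wipp` namespace IRI, the `OMETIFF` term, and the parameter name `image` are hypothetical placeholders, not an existing ontology. As far as I understand, the `format` field is only defined for File (or File array) parameters, and the comparison against the input file's own `format` happens when the job is run rather than during `--validate`.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Hypothetical namespace; swap in a real ontology IRI once one exists.
$namespaces:
  wipp: https://example.org/wipp/formats#
baseCommand: echo
inputs:
  image:
    type: File
    # Expands to https://example.org/wipp/formats#OMETIFF
    format: wipp:OMETIFF
    inputBinding:
      position: 1
outputs: []
```

A matching input object would then declare the same IRI on the file itself (the path is made up):

```yaml
image:
  class: File
  path: sample.ome.tif
  # Must match the tool's expected format for the check to pass.
  format: https://example.org/wipp/formats#OMETIFF
```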
A CWL Directory itself has a name and it can contain other directories in addition to files; those directories can also contain files and other directories and so on.
A File array doesn’t have its own name, and cannot contain a Directory.
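To make the difference concrete, here is a sketch of an input object that passes the same kind of data both ways. The parameter names `image_dir` and `image_files` and all paths are made up for illustration.

```yaml
# Input object (job file) sketch; names and paths are hypothetical.
image_dir:
  class: Directory
  basename: images            # a Directory carries its own name
  listing:
    - class: File
      path: /data/a.ome.tif
    - class: Directory        # directories can nest
      basename: tiles
      listing:
        - class: File
          path: /data/tiles/t0.ome.tif

image_files:                  # a File array has no name of its own
  - class: File
    path: /data/a.ome.tif
  - class: File
    path: /data/b.ome.tif
```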
Functionally, I have found that the biggest difference is that once you put your Files into a Directory, you lose the ability to refer to them individually downstream. If there is a way to do this, it would be great to know. As such, it has been a lot easier to use either File arrays for collections of files (where order does not matter), or arrays of record types where each record has some kind of label field plus a File field in order to identify the individual files. If you want your files in a specific directory, then I think it is best to implement that as the very last step in your workflow, unless you know that you will never need those files as inputs again later in the workflow.
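The record approach described above could be sketched roughly like this. The names `images`, `wanted`, `selected`, and `labeled_image` are hypothetical, and the sketch assumes exactly one record carries the requested label.

```yaml
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  images:
    type:
      type: array
      items:
        type: record
        name: labeled_image
        fields:
          label: string       # identifies the file downstream
          file: File
  wanted: string              # label of the file to pick out
outputs:
  selected: File
expression: |
  ${
    // Keep only the records whose label matches the requested one.
    var hits = inputs.images.filter(function (r) {
      return r.label === inputs.wanted;
    });
    // Assumes a match exists; an unmatched label would fail here.
    return {"selected": hits[0].file};
  }
```

A step like this can sit between two tools in a workflow, so later steps receive a single File picked by label instead of relying on array order.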