cwl_utils Parser: Hints vs Requirements

Hello,

I’m implementing a CWL parser for the LANL BEE workflow engine that uses the cwl_utils Python package to parse and verify CWL v1.2 files, and I’m having some trouble with CWL hints and requirements.

I noticed that the data structure returned for hints by the CWL v1.2 parser is a list of ordereddict objects while for requirements it is a list of distinct objects for each requirement class (e.g. DockerRequirement object, NetworkAccess object, etc). I was wondering if I could get some insight into why this implementation decision was made so as to guide our own parser design decisions.

Specifically, I was wondering why hints and requirements use different data structures and how this influences the way cwltool/cwl-runner treats the two in its implementation.

Thanks,
Steven Anaya

Welcome @sanaya !

Because hints can contain extensions that are not part of the specification, they require extra parsing by the implementation-agnostic cwl_utils.

Here is an example of how to do that extra parsing: https://github.com/common-workflow-language/cwl-utils/blob/8dee31d95177a47c1a85bd7883b0f09e51804c36/cwl_utils/cite_extract.py#L35
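To illustrate the idea without reproducing the linked code, here is a minimal, stdlib-only sketch. It assumes requirements arrive as typed objects and hints arrive as plain dicts carrying a `class` key, as described above. `DockerRequirement` here is a hypothetical stand-in dataclass, not the cwl_utils class, and `docker_requirements` is a made-up helper name.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Mapping, Optional, Sequence


@dataclass
class DockerRequirement:
    """Hypothetical stand-in for a typed requirement object (not the cwl_utils class)."""
    docker_pull: Optional[str] = None


def docker_requirements(
    requirements: Optional[Sequence[Any]],
    hints: Optional[Sequence[Any]],
) -> Iterator[DockerRequirement]:
    """Yield every DockerRequirement found in requirements and hints."""
    # Requirements are assumed to be typed already, so an isinstance check is enough.
    for req in requirements or []:
        if isinstance(req, DockerRequirement):
            yield req
    # Hints may be typed objects or raw dicts; raw dicts need the extra parsing step.
    for hint in hints or []:
        if isinstance(hint, DockerRequirement):
            yield hint
        elif isinstance(hint, Mapping) and hint.get("class") == "DockerRequirement":
            yield DockerRequirement(docker_pull=hint.get("dockerPull"))
        # Anything else (e.g. a vendor extension) is left alone.


requirements = [DockerRequirement("python:3.11")]
hints = [
    {"class": "DockerRequirement", "dockerPull": "ubuntu:22.04"},
    {"class": "ex:VendorHint", "flavor": "big"},  # unknown extension, skipped
]
found = list(docker_requirements(requirements, hints))
```

In a real parser the dict branch would call the library's own loader for that class rather than building the object by hand; the hand-built version is only there to keep the sketch self-contained.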

We might improve this situation as part of CWL v1.2.1: implementing "Definition of `hints` is inconsistent" (Issue #896 in common-workflow-language/common-workflow-language on GitHub) would enable cwl_utils, cwljava, and future generated CWL parsers to automatically detect known hints.

The CWL reference runner cwltool does not currently use cwl_utils as it predates that package. We hope to someday refactor cwltool to build on top of cwl_utils.

In cwltool the data structures are rather messy and untyped. Once the document has been validated to contain no requirements that cwltool doesn’t support, cwltool just does lookups for specific extensions; in the hints field it simply ignores any extension hint that it doesn’t know about.
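As a rough sketch of that behaviour (not cwltool's actual code): the function below works on untyped dicts, rejects any requirement outside a supported set, and silently drops unknown hints. `SUPPORTED`, `UnsupportedRequirementError`, and `effective_settings` are hypothetical names, and letting a requirement override a hint of the same class is my reading of the CWL specification rather than something taken from cwltool.

```python
from typing import Any, Dict

# Hypothetical set of classes this imaginary runner understands.
SUPPORTED = {"DockerRequirement", "NetworkAccess", "ResourceRequirement"}


class UnsupportedRequirementError(Exception):
    """Hypothetical error raised when a requirement cannot be honoured."""


def effective_settings(process: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return the requirement/hint entries the runner will act on, keyed by class."""
    settings: Dict[str, Dict[str, Any]] = {}
    # Hints are optional: keep the ones we understand, ignore the rest.
    for hint in process.get("hints", []):
        if hint.get("class") in SUPPORTED:
            settings[hint["class"]] = hint
    # Requirements are mandatory: an unknown one is an error.
    for req in process.get("requirements", []):
        if req.get("class") not in SUPPORTED:
            raise UnsupportedRequirementError(req.get("class"))
        settings[req["class"]] = req  # overrides a hint of the same class
    return settings


tool = {
    "requirements": [{"class": "NetworkAccess", "networkAccess": True}],
    "hints": [
        {"class": "ex:VendorHint", "flavor": "big"},  # unknown, ignored
        {"class": "DockerRequirement", "dockerPull": "ubuntu:22.04"},
    ],
}
settings = effective_settings(tool)
```

Moving the `ex:VendorHint` entry from `hints` to `requirements` would make the same call raise instead of succeed, which is the practical difference between the two fields.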

As a reminder: cwl-runner is the generic command-line name for any CWL implementation. If cwl-runner is installed, it might point to cwltool, toil-cwl-runner, arvados-cwl-runner, etc.

I didn’t know this about hints. Is this so that custom CWL parsers based on cwl_utils can implement these extensions specially or are they simply meant to add additional contextual or platform-dependent requirements? What role do hints typically have in a real workflow, as opposed to requirements?