Creating config JSON file from a record

Hi, I am trying to formalize the JSON configuration that the biobb command line tools take as input. At the moment these are given as strings, which is OK, but it means the values are always hardcoded and can’t come from the workflow:

step1_pdb_config: '{"pdb_code" : "1aki"}'
step4_editconf_config: '{"box_type": "cubic","distance_to_molecule": 1.0}'
step6_gppion_config: '{"mdp": {"type":"minimization"}}'
step7_genion_config: '{"neutral": "True"}'
step8_gppmin_config: '{"mdp": {"type":"minimization", "nsteps":"5000", "emtol":"500"}}'
step11_gppnvt_config: '{"mdp": {"type":"nvt", "nsteps":"5000", "dt":0.002, "define":"-DPOSRES"}}'

From a CWL point of view it is also a black box for users which fields this JSON expects, unless they look up the biobb documentation, e.g. the properties of grompp.

In theory I thought this would be a great example of using records, especially as these properties have an underlying JSON schema from which we could then auto-generate the CWL record declarations.
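To make the auto-generation idea concrete, here is a minimal Python sketch. It assumes a simplified JSON Schema (only `properties`, `required`, `enum`, `description` and primitive or object types); the function name `schema_to_record` and the example schema are made up for illustration and are not the real biobb schema.

```python
import json

# Assumed mapping from JSON Schema primitive types to CWL types.
JSON_TO_CWL = {"string": "string", "integer": "int", "number": "float", "boolean": "boolean"}

def schema_to_record(name, schema):
    """Convert a simplified JSON Schema object into a CWL record declaration."""
    fields = {}
    required = set(schema.get("required", []))
    for key, prop in schema.get("properties", {}).items():
        if prop.get("type") == "object":
            # Nested objects become nested records, named after the property.
            cwl_type = schema_to_record(key, prop)
        elif "enum" in prop:
            cwl_type = {"type": "enum", "symbols": list(prop["enum"])}
        else:
            # Raises KeyError for types this sketch does not cover (e.g. arrays).
            cwl_type = JSON_TO_CWL[prop["type"]]
        if key not in required:
            # Optional properties become a union with "null".
            cwl_type = ["null", cwl_type]
        field = {"type": cwl_type}
        if "description" in prop:
            field["doc"] = prop["description"]
        fields[key] = field
    return {"type": "record", "name": name, "fields": fields}

# Hypothetical schema excerpt for illustration only.
example_schema = {
    "type": "object",
    "properties": {
        "maxwarn": {"type": "integer", "description": "Maximum number of allowed warnings."},
        "restart": {"type": "boolean"},
        "type": {"enum": ["minimization", "nvt", "npt", "free", "index"]},
        "mdp": {"type": "object", "properties": {"define": {"type": "string"}}},
    },
}

record = schema_to_record("GromppConfig", example_schema)
print(json.dumps(record, indent=2))
```

The resulting dictionary could be dumped as YAML or JSON into a file for `$import`. Schema features outside the assumptions above would need extra handling.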

However I don’t want to commit to this JSON being embedded in the job file, so I want to retain the option of it being passed as a file, similar to --config with the biobb command line.

So here is the idea (using echo instead of grompp for now)

#!/usr/bin/env cwl-runner
cwlVersion: v1.1
class: CommandLineTool

baseCommand: echo

hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0

requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: "grompp-config.json"
        entry: $(inputs.config_rec || {}) # empty JSON by default

inputs:
  config_rec:
    doc: ""
    type:
      - "null"
      - $import: grompp-config.cwl # GromppConfig record
    inputBinding:
      #prefix: --config
      # "grompp-config.json" from InitialWorkDirRequirement
      valueFrom: "grompp-config.json"
      # This does not actually work because the inner "default" fields 
      # are not allowed, but cwltool fills in lots of nulls for the optionals:
  config:
    type: 
      - "null"
      - string
      - File
    inputBinding:
      prefix: --config

outputs:
  concatination:
    type: stdout

Thus there can be either a --config to the CWL that works the same as for the existing command line, or a config_rec key (probably from the YAML job file) that has nested YAML elements without any tricky '["escaping"]'. (I could not make these two config inputs mutually exclusive as in http://www.commonwl.org/user_guide/11-records/ without adding yet another level of nesting, which I thought was too cumbersome.)
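For illustration, this is roughly how a tool could accept both forms of a --config value, an inline JSON string or a path to a JSON file. It is only a sketch: `load_config` is a hypothetical helper, not biobb's actual loader, and the "is it an existing file?" check is an assumed heuristic.

```python
import json
import os
import tempfile

def load_config(config):
    """Return a dict from either a path to a JSON file or an inline JSON string."""
    if config is None:
        return {}
    if os.path.isfile(config):
        with open(config) as handle:
            return json.load(handle)
    # Not an existing file, so treat the value as inline JSON.
    return json.loads(config)

# Inline JSON string, as in the current job file.
inline = load_config('{"box_type": "cubic", "distance_to_molecule": 1.0}')

# The same call with a file path, e.g. the file staged by InitialWorkDirRequirement.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "grompp-config.json")
    with open(path, "w") as handle:
        json.dump({"maxwarn": 100}, handle)
    from_file = load_config(path)

print(inline, from_file)
```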

I don’t know of a way to make the InitialWorkDirRequirement file creation optional based on whether the input is specified or not; is there another way? It’s not a big problem, as the empty file is simply ignored otherwise; I added || {} so that it is at least valid JSON.

Is this stringifying of the $(inputs.config_rec) record to JSON expected to work across CWL implementations? I found no documentation for this behaviour, but it is exactly what I need. Using $("" + inputs.config_rec) did not work, as it gave [object Object].

Now then the problem was how to make the record definition. I managed to get as far as this:

#cwlVersion: v1.0
type: record
name: GromppConfig
doc: "JSON configuration for invoking Grompp building block"
fields:
    input_mdp_path:
        type: string?
        doc: "Path of the input MDP file."
    mdp:
        doc: "MDP options specification. (Used if *input_mdp_path* is null)"
        s:url: "http://manual.gromacs.org/2020-current/user-guide/mdp-options.html"
        type:
            type: record
            name: MDPOptions
            fields:
                include:
                    type: string?
                    doc: "directories to include in your topology. Format: -I/home/john/mylib -I../otherlib"
                define:
                    type: string?
                    doc: |-
                      defines to pass to the preprocessor, default is no defines. 
                      You can use any defines to control options in your 
                      customized topology files. Options that act on 
                      existing top file mechanisms include:
                
                      -DFLEXIBLE will use flexible water instead of rigid 
                      water into your topology, this can be useful for normal mode analysis.
                
                      -DPOSRES will trigger the inclusion of posre.itp into 
                      your topology, used for implementing position restraints.
                integrator:
                    type: 
                        type: enum
                        name: GrompIntegrator
                        symbols: [md, md-vv, md-vv-avek, sd, bd, steep, cg, l-bfgs, nm, tpi, tpic, mimic]
                        doc: |-
                            Despite the name, this list includes algorithms that are not 
                            actually integrators over time. integrator=steep and all 
                            entries following it are in this category
                #.. 
            ## TODO: Document all of the fields of mdp file 
            ## http://manual.gromacs.org/2020-current/user-guide/mdp-options.html
            ## but as they are expected by biobb in JSON variant:
    type:
        doc: "Default options for the mdp file. Valid values: minimization, nvt, npt, free, index"
        #default: "minimization"
        type:
          - "null"
          - type: enum
            symbols: [minimization, nvt, npt, free, index]
    output_mdp_path:
        type: string?
        #default: "grompp.mdp"
        doc: "Path of the output MDP file."
    output_top_path:
        type: string?
        #default: "grompp.top"
        doc: "Path the output topology TOP file."
    maxwarn:
        type: int?
        #default: 10
        doc: "Maximum number of allowed warnings."
    gmx_path:
        type: string?
        #default: "gmx"
        doc: "Path to the GROMACS executable binary"
    remove_tmp:
        type: boolean?
        #default: true
        doc: "[WF property] Remove temporal files."
    restart:
        type: boolean?
        #default: false
        doc: "[WF property] Do not execute if output files exist."
    container_path:
        type: string?
        doc: "Path to the binary executable of your container."
    container_image:
        type: string?
        #default: "gromacs/gromacs:latest"
        doc: "Container Image identifier to execute gromacs from"
    container_volume_path:
        type: string?        
        #default: "/data"
        doc: "Path to an internal directory in the container."
    container_working_dir:
        type: string?
        doc: "Path to the internal CWD in the container."
    container_user_id:
        type: string?
        doc: "User number id to be mapped inside the container."
    container_shell_path:
        type: string?
        #default: "/bin/bash"
        doc: "Path to the binary executable of the container shell."

s:url: "https://biobb-md.readthedocs.io/en/latest/gromacs.html#module-gromacs.grompp"

$namespaces:
  s: http://schema.org/

$schemas:
- http://schema.org/version/latest/schema.rdf

However as you see I had to comment out default as it was not allowed by cwltool 3.0.20200706173533.

Strangely doc was not permitted, although it is listed on https://www.commonwl.org/v1.1/Workflow.html#InputRecordSchema so I used label instead - another cwltool bug?

The sad thing is that the optional types in here (string?, boolean? etc.) do not work as I hoped, because every field I leave out gets filled in with null.

Running with a partial grompptest.yaml:

config_rec:
    input_mdp_path: "hello"
    mdp: 
      integrator: steep
    type: nvt
    output_mdp_path: soup
    maxwarn: 100
    restart: true

I then get an output JSON that includes all the other keys that are optional, not just the keys I gave above:

{"container_image": null, "container_path": null, "container_shell_path": null, "container_user_id": null, "container_volume_path": null, "container_working_dir": null, "gmx_path": null, "input_mdp_path": "hello", "maxwarn": 100, "mdp": {"define": null, "include": null, "integrator": "steep"}, "output_mdp_path": "soup", "output_top_path": null, "remove_tmp": null, "restart": true, "type": "nvt"}

Now this is a problem, because as you can see in the commented-out default lines, lots of these keys have other default values, and an explicit null will really confuse the underlying Python tool that is called.
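A small Python sketch of why the padded nulls matter. The function `read_maxwarn_and_gmx` is a hypothetical stand-in for how a tool might read its properties (the real biobb code may differ); the fallback values 10 and "gmx" are the ones from the commented-out defaults above.

```python
import json

# Hypothetical stand-in for a tool reading its properties with fallbacks.
def read_maxwarn_and_gmx(properties):
    return properties.get("maxwarn", 10), properties.get("gmx_path", "gmx")

# What I wrote in the job file versus what ends up in the generated JSON.
sparse = json.loads('{"maxwarn": 100}')
padded = json.loads('{"maxwarn": 100, "gmx_path": null}')

print(read_maxwarn_and_gmx(sparse))  # (100, 'gmx'): the missing key falls back to the default
print(read_maxwarn_and_gmx(padded))  # (100, None): the explicit null wins over the default
```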

The second problem is that I can’t allow arbitrary nested JSON for my inner MDPOptions. These options are very extensive upstream and change often, which means the CWL could quickly become out of date.

Now I seem to be required to specify all of them, but several of them are of a type that is impossible or very difficult to express in Avro Schemas.

How can I do wildcard fields within a record?

According to https://www.commonwl.org/v1.1/Workflow.html#InputRecordSchema, fields is optional, but as raised in cwltool #608, cwltool requires it.

Is it too ambitious? Is there a better way of using the existing JSON schemas from CWL for a record?

These files are currently in our cwl-records branch:

Tested with cwltool in this conda environment:

(test5) stain@biggie:~/src/biobb_example_workflow$ cwltool --version

/home/stain/miniconda3/envs/test5/bin/cwltool 3.0.20200706173533
(test5) stain@biggie:~/src/biobb_example_workflow$ 
(test5) stain@biggie:~/src/biobb_example_workflow$ conda list -n test5
# packages in environment at /home/stain/miniconda3/envs/test5:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
bagit                     1.7.0                      py_0    conda-forge
brotlipy                  0.7.0           py38h1e0a361_1000    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
cachecontrol              0.11.7                     py_0    conda-forge
cairo                     1.16.0            h3fc0475_1005    conda-forge
certifi                   2020.6.20        py38h32f6830_0    conda-forge
cffi                      1.14.0           py38hd463f26_0    conda-forge
chardet                   3.0.4           py38h32f6830_1006    conda-forge
coloredlogs               14.0             py38h32f6830_1    conda-forge
cryptography              2.9.2            py38h766eaa4_0    conda-forge
cwltool                   3.0.20200706173533  py38h32f6830_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
expat                     2.2.9                he1b5a44_2    conda-forge
fontconfig                2.13.1            h1056068_1002    conda-forge
freetype                  2.10.2               he06d7ca_0    conda-forge
fribidi                   1.0.9                h516909a_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.65.0               h6f030ca_0    conda-forge
graphite2                 1.3.13            he1b5a44_1001    conda-forge
graphviz                  2.42.3               h0511662_0    conda-forge
harfbuzz                  2.4.0                hee91db6_5    conda-forge
html5lib                  1.1                pyh9f0ad1d_0    conda-forge
humanfriendly             8.2              py38h32f6830_0    conda-forge
icu                       67.1                 he1b5a44_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
isodate                   0.6.0                      py_1    conda-forge
jpeg                      9d                   h516909a_0    conda-forge
keepalive                 0.5                        py_1    conda-forge
ld_impl_linux-64          2.34                 h53a641e_0    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgomp                   9.2.0                h24d8f2e_2    conda-forge
libiconv                  1.15              h516909a_1006    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libtiff                   4.1.0                hc7e4089_6    conda-forge
libtool                   2.4.6             h14c3975_1002    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxml2                   2.9.10               h72b56ed_1    conda-forge
libxslt                   1.1.33               h572872d_1    conda-forge
lockfile                  0.12.2                     py_1    conda-forge
lxml                      4.5.1            py38hbb43d70_0    conda-forge
lz4-c                     1.9.2                he1b5a44_1    conda-forge
mistune                   0.8.4           py38h1e0a361_1001    conda-forge
mypy_extensions           0.4.3            py38h32f6830_1    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
networkx                  2.4                        py_1    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
pango                     1.42.4               h7062337_4    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
pip                       20.0.2                     py_2    conda-forge
pixman                    0.38.0            h516909a_1003    conda-forge
prov                      1.5.1                      py_1    conda-forge
psutil                    5.7.0            py38h1e0a361_1    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pydotplus                 2.0.2              pyhd1c1de3_3    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pysocks                   1.7.1            py38h32f6830_1    conda-forge
python                    3.8.2           h8356626_5_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
rdflib                    4.2.2                 py38_1000    conda-forge
rdflib-jsonld             0.5.0            py38h32f6830_0    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.24.0             pyh9f0ad1d_0    conda-forge
ruamel.yaml               0.16.5           py38h516909a_1    conda-forge
ruamel.yaml.clib          0.2.0            py38h1e0a361_1    conda-forge
schema-salad              7.0.20200612160654            py_1    conda-forge
setuptools                46.1.3           py38h32f6830_0    conda-forge
shellescape               3.4.1                      py_1    bioconda
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sparqlwrapper             1.8.5           py38h32f6830_1003    conda-forge
sqlite                    3.30.1               hcee41ef_0    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
typing_extensions         3.7.4.2                    py_0    conda-forge
urllib3                   1.25.9                     py_0    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
xorg-libice               1.0.10               h516909a_0    conda-forge
xorg-libsm                1.2.3             h84519dc_1000    conda-forge
xorg-libx11               1.6.9                h516909a_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxpm               3.5.13               h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-libxt                1.1.5             h516909a_1003    conda-forge
xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
xorg-xproto               7.0.31            h14c3975_1007    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstd                      1.4.4                h6597ccf_3    conda-forge

This was discussed in the CWL meeting on 2020-07-28. I think the outcome was that the record’s JSON is an “internal CWL format”; however, saving it to a file from InitialWorkDirRequirement is permitted by the spec. (REF?)

The recommendation then was to handle this CWL-specific output on the tool side, which we are at liberty to do here as the BioBB tools are themselves wrappers.
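One way the tool side could handle it, as a minimal sketch: drop every key whose value is null before applying the tool's own defaults. `drop_nulls` is a hypothetical helper, not an existing biobb function, and it assumes that null never carries meaning in these configs.

```python
import json

def drop_nulls(value):
    """Recursively remove dictionary keys whose value is None."""
    if isinstance(value, dict):
        return {key: drop_nulls(item) for key, item in value.items() if item is not None}
    if isinstance(value, list):
        # None entries inside lists are kept, so list positions do not shift.
        return [drop_nulls(item) for item in value]
    return value

# Shortened version of the JSON that cwltool wrote for the partial job file.
cwl_json = (
    '{"container_image": null, "gmx_path": null, "input_mdp_path": "hello", '
    '"maxwarn": 100, "mdp": {"define": null, "include": null, "integrator": "steep"}, '
    '"restart": true, "type": "nvt"}'
)

cleaned = drop_nulls(json.loads(cwl_json))
print(json.dumps(cleaned, sort_keys=True))
# {"input_mdp_path": "hello", "maxwarn": 100, "mdp": {"integrator": "steep"}, "restart": true, "type": "nvt"}
```

The trade-off is that a deliberately given null can no longer be told apart from an omitted key.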

I did not get any response as to why “default” is not working in fields under a record; it would be good to have some comments on that.

Default is missing from records both at the object level and the field level. Would you like to add this to the next CWL version?