Cannot get `cwltool` to run with `--podman` when step has `DockerRequirement`

Hi all :wave:, and thanks for providing this forum.

I am new to workflows and pretty new to containers, so please excuse any newbie blunders.

I am learning CWL via this Carpentries Incubator lesson.

Because I want to run my workflow on an HPC system later, I’m using toil-cwl-runner instead of the tutorial’s cwltool; toil-cwl-runner uses cwltool under the hood, though.

Also, because it is a requirement on our systems, we use podman instead of Docker, and I was delighted to see that cwltool offers a `--podman` flag.

However, I cannot get the first single-step workflow example from the tutorial to run with `--podman`, using either cwltool or toil.

Below is an example error: a `cwltool.errors.WorkflowException` complaining that Docker is missing. I suspect the issue is related to the workflow’s single step having a `DockerRequirement`.

This may also just be because I don’t understand enough about podman and Docker and got something wrong in the setup.

Anyhow, the workflow calls just one step with these hints:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/fastqc:0.11.9--hdfd78af_1
  SoftwareRequirement:
    packages:
      fastqc:
        specs: [ "http://identifiers.org/biotools/fastqc" ]
        version: [ "0.11.9--hdfd78af_1", "0.11.9" ]

Using podman, I have manually pulled the container images listed in the tutorial setup as well as the one from the `DockerRequirement`:

$> podman images -a
REPOSITORY                      TAG                  IMAGE ID      CREATED        SIZE
quay.io/biocontainers/cutadapt  3.7--py39hbf8eff0_1  1a64f7e399a3  8 months ago   174 MB
quay.io/biocontainers/samtools  1.14--hb421002_0     33de2337c097  13 months ago  74.8 MB
quay.io/biocontainers/fastqc    0.11.9--hdfd78af_1   cc8b303fee58  20 months ago  602 MB
quay.io/biocontainers/fastqc    0.11.5--hdfd78af_5   9cd78d3e9edf  20 months ago  601 MB
quay.io/biocontainers/star      2.7.5c--0            8993688148d9  2 years ago    17.2 MB
quay.io/biocontainers/subread   1.5.0p3--0           e49b037a2f50  6 years ago    44.5 MB

However, when running the workflow, I get a log full of errors like the one below, complaining that Docker isn’t available:

[2022-11-17T14:52:17+0100] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2022-11-17T14:52:17+0100] [MainThread] [I] [toil] Running Toil version 5.7.1-b5cae9634820d76cb6c13b2a6312895122017d54 on host OBFUSCATED.
[2022-11-17T14:52:17+0100] [MainThread] [I] [toil.worker] Working on job 'CWLJob' rna_seq_workflow_1.cwl.quality_control.fastqc_2_podman.cwl kind-CWLJob/instance-iycjn0s8 v4
[2022-11-17T14:52:18+0100] [MainThread] [I] [toil.worker] Loaded body Job('CWLJob' rna_seq_workflow_1.cwl.quality_control.fastqc_2_podman.cwl kind-CWLJob/instance-iycjn0s8 v4) from description 'CWLJob' rna_seq_workflow_1.cwl.quality_control.fastqc_2_podman.cwl kind-CWLJob/instance-iycjn0s8 v4
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[2022-11-17T14:52:18+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2022-11-17T14:52:18+0100] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-ecac9c91ada040f4bc9fe528902e71b6/GSM461177_2_subsampled.fastqsanger' to path '/tmp/76b9115df48f50138cbbb189eed16e24/9f36/f2df/tmpu1c2stxi.tmp'
Traceback (most recent call last):
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/job.py", line 805, in run
    self.get_from_requirements(
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/docker.py", line 232, in get_from_requirements
    if self.get_image(
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/docker.py", line 120, in get_image
    subprocess.check_output(  # nosec
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'images', '--no-trunc', '--all']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/toil/worker.py", line 407, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/toil/job.py", line 2406, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/toil/job.py", line 2324, in _run
    return self.run(fileStore)
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/toil/cwl/cwltoil.py", line 2209, in run
    output, status = ToilSingleJobExecutor().execute(
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/executors.py", line 143, in execute
    self.run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/toil/cwl/cwltoil.py", line 930, in run_jobs
    return super().run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/executors.py", line 251, in run_jobs
    job.run(runtime_context)
  File "/home/OBFUSCATED/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/job.py", line 856, in run
    raise WorkflowException(
cwltool.errors.WorkflowException: Docker is not available for this tool, try --no-container to disable Docker, or install a user space Docker replacement like uDocker with --user-space-docker-cmd.: Command '['docker', 'images', '--no-trunc', '--all']' returned non-zero exit status 1.
[2022-11-17T14:52:18+0100] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host OBFUSCATED

Has anyone experienced something similar and found a solution? I’m currently testing on WSL2 with Ubuntu 20.04.
Thanks!

Okay, so it seems that cwltool uses podman to run the containers, but the images still need to be “supplied” (pulled and listed) by dockerd?
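One way to test that reading of the traceback is to run the same image-listing command that fails in the log (`docker images --no-trunc --all`, copied from the traceback) with each engine and see which one succeeds. A minimal Python sketch; the helper name `check_engine` is hypothetical and not part of cwltool or toil.

```python
import shutil
import subprocess


def check_engine(engine: str) -> str:
    """Run '<engine> images --no-trunc --all' and summarize the outcome.

    Hypothetical diagnostic helper: the argument list mirrors the command
    in the cwltool traceback; only the executable name changes.
    """
    if shutil.which(engine) is None:
        return f"{engine}: not found on PATH"
    result = subprocess.run(
        [engine, "images", "--no-trunc", "--all"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Errors such as "Cannot connect to the Docker daemon ..." go to stderr.
        return f"{engine}: exit status {result.returncode}: {result.stderr.strip()}"
    return f"{engine}: OK"


if __name__ == "__main__":
    for engine in ("docker", "podman"):
        print(check_engine(engine))
```

On a podman-only machine like the one described here, the expectation would be a failure for `docker` and `OK` for `podman`, which matches the behaviour in the log.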

I think this is a toil-cwl-runner or cwltool error; I’ll look into it.


Thanks. FWIW, `--singularity` works fine in the same setup, and I assume `--podman` support could be implemented in a similar way.

I can confirm this is a toil-cwl-runner bug, as `cwltool --podman` works with that workflow. Could you open an issue with them?


Perhaps this is a local problem, but I cannot get this to work:

(env) $> cwltool --podman --cachedir cache rna_seq_workflow_1.cwl workflow_input_1.yml
INFO /home/stephan/src/novice-tutorial-exercises/env/bin/cwltool 3.1.20220628170238
INFO Resolved 'rna_seq_workflow_1.cwl' to 'file:///home/stephan/src/novice-tutorial-exercises/rna_seq_workflow_1.cwl'
INFO [workflow ] start
INFO [workflow ] starting step quality_control
INFO [step quality_control] start
INFO [job quality_control] Output of job will be cached in /home/stephan/src/novice-tutorial-exercises/cache/4d0f5a0f4fd376cf78c09e99970cdca8
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
ERROR Workflow error, try again with --debug for more information:
Docker is not available for this tool, try --no-container to disable Docker, or install a user space Docker replacement like uDocker with --user-space-docker-cmd.: Command '['docker', 'images', '--no-trunc', '--all']' returned non-zero exit status 1.

The actual error seems to stem from cwltool, as shown in this part of the `--debug` stack trace:

DEBUG Docker error
Traceback (most recent call last):
  File "/home/stephan/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/job.py", line 805, in run
    self.get_from_requirements(
  File "/home/stephan/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/docker.py", line 232, in get_from_requirements
    if self.get_image(
  File "/home/stephan/src/novice-tutorial-exercises/env/lib/python3.10/site-packages/cwltool/docker.py", line 120, in get_image
    subprocess.check_output(  # nosec
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'images', '--no-trunc', '--all']' returned non-zero exit status 1.

If I manually change line 121 of `cwltool/docker.py` to `["podman", "images", "--no-trunc", "--all"]`, it works like a charm.
I think all `get_image` would need is another one of those `if runtimeContext.podman` checks. Would you accept a pull request for this?
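A sketch of the kind of check proposed above, assuming a boolean `use_podman` parameter standing in for `runtimeContext.podman`. The function name `images_command` is hypothetical and this is not cwltool’s actual code; it only illustrates choosing the executable once and reusing the rest of the argument list from the traceback.

```python
from typing import List


def images_command(use_podman: bool) -> List[str]:
    """Build the image-listing command for the selected engine.

    Hypothetical helper: 'use_podman' stands in for the runner's podman
    flag (runtimeContext.podman in the discussion above).
    """
    engine = "podman" if use_podman else "docker"
    # Same arguments that cwltool passes to docker in the traceback.
    return [engine, "images", "--no-trunc", "--all"]


print(images_command(True))   # ['podman', 'images', '--no-trunc', '--all']
print(images_command(False))  # ['docker', 'images', '--no-trunc', '--all']
```

Passing the resulting list to `subprocess.check_output` reproduces the manual edit when `use_podman` is true, while leaving the Docker path unchanged.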

Because podman holds a different fastqc image from the Setup step,

quay.io/biocontainers/fastqc    0.11.5--hdfd78af_5   9cd78d3e9edf  20 months ago  601 MB

the step tries to find and pull `quay.io/biocontainers/fastqc:0.11.9--hdfd78af_1` through docker, which fails because the Docker daemon isn’t running.
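To check whether the image referenced in `dockerPull` is already present locally, the `REPOSITORY` and `TAG` columns of a `podman images` listing can be compared against the reference. A minimal sketch, assuming the default table output shown earlier in this thread (a header row, then whitespace-separated columns) and a reference that includes an explicit tag; `image_present` is a hypothetical helper and not necessarily how cwltool performs its lookup.

```python
def image_present(listing: str, reference: str) -> bool:
    """Return True if 'repository:tag' appears in an 'images' table.

    Assumes the default table layout: a header line, then one image per
    line with REPOSITORY and TAG as the first two whitespace-separated
    columns. Assumes the reference includes an explicit tag.
    """
    # Split on the last ':' so a registry port (host:5000/...) stays
    # in the repository part.
    repository, _, tag = reference.rpartition(":")
    for line in listing.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) >= 2 and columns[0] == repository and columns[1] == tag:
            return True
    return False


# The single fastqc image described above, in the 'podman images' layout.
LISTING = """\
REPOSITORY                      TAG                  IMAGE ID      CREATED        SIZE
quay.io/biocontainers/fastqc    0.11.5--hdfd78af_5   9cd78d3e9edf  20 months ago  601 MB
"""

print(image_present(LISTING, "quay.io/biocontainers/fastqc:0.11.9--hdfd78af_1"))  # False
print(image_present(LISTING, "quay.io/biocontainers/fastqc:0.11.5--hdfd78af_5"))  # True
```

The mismatch between the two tags is what sends the runner off to pull the `0.11.9--hdfd78af_1` image.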

The workflow I’m trying to run is this, with the single step below (cloned from the lesson repository):

cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly: File

steps:
  quality_control:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly
    out: [html_file]

outputs:
  quality_report:
    type: File
    outputSource: quality_control/html_file

The step’s tool description, `bio-cwl-tools/fastqc/fastqc_2.cwl`:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/fastqc:0.11.9--hdfd78af_1
  SoftwareRequirement:
    packages:
      fastqc:
        specs: [ "http://identifiers.org/biotools/fastqc" ]
        version: [ "0.11.9--hdfd78af_1", "0.11.9" ]

inputs:

  reads_file:
    type: File
    inputBinding:
      position: 50
    doc: |
      Input bam,sam,bam_mapped,sam_mapped or fastq file

  format_enum:
    type:
      - "null"
      - type: enum
        name: "format"
        symbols: ['bam','sam','bam_mapped','sam_mapped','fastq']
    inputBinding:
      position: 6
      prefix: '--format'
    doc: |
      Bypasses the normal sequence file format detection and
      forces the program to use the specified format.  Valid
      formats are bam,sam,bam_mapped,sam_mapped and fastq

  threads:
    type: int?
    inputBinding:
      position: 7
      prefix: '--threads'
    doc: |
      Specifies the number of files which can be processed
      simultaneously.  Each thread will be allocated 250MB of
      memory so you shouldn't run more threads than your
      available memory will cope with, and not more than
      6 threads on a 32 bit machine

  contaminants:
    type: File?
    inputBinding:
      position: 8
      prefix: '--contaminants'
    doc: |
      Specifies a non-default file which contains the list of
      contaminants to screen overrepresented sequences against.
      The file must contain sets of named contaminants in the
      form name[tab]sequence.  Lines prefixed with a hash will
      be ignored.

  adapters:
    type: File?
    inputBinding:
      position: 9
      prefix: '--adapters'
    doc: |
      Specifies a non-default file which contains the list of
      adapter sequences which will be explicitly searched against
      the library. The file must contain sets of named adapters
      in the form name[tab]sequence.  Lines prefixed with a hash
      will be ignored.

  limits:
    type: File?
    inputBinding:
      position: 10
      prefix: '--limits'
    doc: |
      Specifies a non-default file which contains a set of criteria
      which will be used to determine the warn/error limits for the
      various modules.  This file can also be used to selectively
      remove some modules from the output all together.  The format
      needs to mirror the default limits.txt file found in the
      Configuration folder.

  kmers:
    type: int?
    inputBinding:
      position: 11
      prefix: '--kmers'
    doc: |
      Specifies the length of Kmer to look for in the Kmer content
      module. Specified Kmer length must be between 2 and 10. Default
      length is 7 if not specified.

  casava:
    type: boolean?
    inputBinding:
      position: 13
      prefix: '--casava'
    doc: |
      Files come from raw casava output. Files in the same sample
      group (differing only by the group number) will be analysed
      as a set rather than individually. Sequences with the filter
      flag set in the header will be excluded from the analysis.
      Files must have the same names given to them by casava
      (including being gzipped and ending with .gz) otherwise they
      won't be grouped together correctly.

  nofilter:
    type: boolean?
    inputBinding:
      position: 14
      prefix: '--nofilter'
    doc: |
      If running with --casava then don't remove reads flagged by
      casava as poor quality when performing the QC analysis.

  hide_group:
    type: boolean?
    inputBinding:
      position: 15
      prefix: '--nogroup'
    doc: |
      Disable grouping of bases for reads >50bp. All reports will
      show data for every base in the read.  WARNING: Using this
      option will cause fastqc to crash and burn if you use it on
      really long reads, and your plots may end up a ridiculous size.
      You have been warned!

outputs:

  zipped_file:
    type: File
    outputBinding:
      glob: '*.zip'
  html_file:
    type: File
    outputBinding:
      glob: '*.html'
  summary_file:
    type: File
    outputBinding:
      glob: "*/summary.txt"

baseCommand: [fastqc, --extract, --outdir, .]

$namespaces:
  s: http://schema.org/

$schemas:
- https://github.com/schemaorg/schemaorg/raw/main/data/releases/11.01/schemaorg-current-http.rdf

s:name: "fastqc_2"
s:license: http://www.apache.org/licenses/LICENSE-2.0

s:creator:
- class: s:Organization
  s:legalName: "Cincinnati Children's Hospital Medical Center"
  s:location:
  - class: s:PostalAddress
    s:addressCountry: "USA"
    s:addressLocality: "Cincinnati"
    s:addressRegion: "OH"
    s:postalCode: "45229"
    s:streetAddress: "3333 Burnet Ave"
    s:telephone: "+1(513)636-4200"
  s:logo: "https://www.cincinnatichildrens.org/-/media/cincinnati%20childrens/global%20shared/childrens-logo-new.png"
  s:department:
  - class: s:Organization
    s:legalName: "Allergy and Immunology"
    s:department:
    - class: s:Organization
      s:legalName: "Barski Research Lab"
      s:member:
      - class: s:Person
        s:name: Michael Kotliar
        s:email: mailto:misha.kotliar@gmail.com
        s:sameAs:
        - id: http://orcid.org/0000-0002-6486-3898

doc: |
  Tool runs FastQC from Babraham Bioinformatics