Working offline with Singularity

I am trying to get a workflow running on an HPC cluster, using this Docker container via Singularity: quay.io/biocontainers/biobb_md:0.1.5--py_0

Unfortunately, the HPC compute nodes do not have internet access, so when Toil's CWL runner tries to pull the image from the registry (quay.io) inside a task, it fails. Running in single-machine mode on the login node works fine, so I think my scripts are okay.

I've tried setting the environment variable CWL_SINGULARITY_CACHE, but it does not help. I think it should stop cwltool from trying to download the image: line 174 of cwltool/singularity.py reads "if (force_pull or not found) and pull_image", so found should be true and force_pull shouldn't be, right? The command cwltool tries to run at this point does include the cache path, but I get this error:

Command '['singularity', 'pull', '--force', '--name', '/work/ta004/ta004/lowe/.singularity/cache/oci-tmp/1bf5ca8ce3c83e8b55ed203e8d64545c3153988349953dc08ca6abac1df8608f/quay.io_biocontainers_biobb_md:0.1.5--py_0.sif', 'docker://quay.io/biocontainers/biobb_md:0.1.5--py_0']' returned non-zero exit status 255.
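The file name in that pull command suggests how the cached image is named: the dockerPull reference with every "/" replaced by "_" and ".sif" appended. The Python sketch below checks whether such a file already exists in CWL_SINGULARITY_CACHE before jobs are sent to the compute nodes. The naming rule is inferred from the error message, not taken from cwltool's source, and the helper names are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def expected_sif_name(docker_pull: str) -> str:
    """Guess the cache file name for a dockerPull reference.

    Assumption: mirrors the name in the error message, i.e. every "/"
    becomes "_" and ".sif" is appended.
    """
    return docker_pull.replace("/", "_") + ".sif"


def find_cached_image(docker_pull: str, cache_dir: str | None = None) -> Path | None:
    """Return the cached .sif path if it exists, else None.

    Falls back to the CWL_SINGULARITY_CACHE environment variable
    when no cache_dir is given.
    """
    cache_dir = cache_dir or os.environ.get("CWL_SINGULARITY_CACHE")
    if not cache_dir:
        return None
    candidate = Path(cache_dir) / expected_sif_name(docker_pull)
    return candidate if candidate.is_file() else None
```

If find_cached_image returns None on the login node, the compute nodes will most likely fall back to pulling, which cannot work without internet access.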

Any suggestions on what I might be able to do to get cwltool to stop trying to pull the image?

Just a quick drive-by on a Friday evening: have you tried https://github.com/common-workflow-language/cwl-utils/blob/dc3998ffcbb7cd68bc9861ef5c2d307fe26cb949/cwl_utils/docker_extract.py to pre-cache the Singularity container images?

That is exactly the tool I was looking for! Pulling the container image with docker_extract.py -s into the directory defined by CWL_SINGULARITY_CACHE lets cwltool find the image, so the pull is skipped. Thanks!
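docker_extract.py is the tool that solved it here. For illustration only, here is a rough Python sketch of the same idea: pull each image once on a machine with internet access, writing it under the file name cwltool appears to look for. It reuses the singularity pull command from the error message, minus --force. The helper names and the naming rule ("/" to "_", plus ".sif") are assumptions inferred from that message, not docker_extract.py's actual implementation.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def sif_name(docker_pull: str) -> str:
    # Assumed naming convention: "/" -> "_" plus ".sif",
    # matching the file name in the error message.
    return docker_pull.replace("/", "_") + ".sif"


def build_pull_command(docker_pull: str, cache_dir: str) -> list[str]:
    """Same shape as the pull in the error message, minus --force."""
    target = Path(cache_dir) / sif_name(docker_pull)
    return ["singularity", "pull", "--name", str(target), f"docker://{docker_pull}"]


def precache(images: list[str], cache_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Pull each image into cache_dir; run where internet is available (e.g. the login node)."""
    commands = []
    for image in images:
        cmd = build_pull_command(image, cache_dir)
        commands.append(cmd)
        if dry_run:
            continue  # only report what would be run
        Path(cache_dir).mkdir(parents=True, exist_ok=True)
        if not Path(cmd[3]).is_file():  # skip images already in the cache
            subprocess.run(cmd, check=True)
    return commands
```

With dry_run=True it only returns the commands. On the login node you would call it with dry_run=False, then point CWL_SINGULARITY_CACHE at the same directory for the jobs.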


Glad to hear it! I just added a direct reference to this script on the CWL homepage. Hopefully the next person can find it more easily. Thanks for asking and for the reminder!