Working offline with Singularity

I am trying to get a workflow running on an HPC cluster, using this Docker container via Singularity: quay.io/biocontainers/biobb_md:0.1.5--py_0

Unfortunately the HPC compute nodes do not have internet access, so when toil-cwl-runner tries to pull the container image from the remote registry inside a task, it fails (running in single-machine mode on the login node works fine, so I think my scripts are okay).

I've tried setting the environment variable CWL_SINGULARITY_CACHE, but this does not help, even though I think it should stop cwltool from trying to download the image (see line 174 of cwltool/singularity.py: if (force_pull or not found) and pull_image; found should be true, and force_pull shouldn't be true?). The command that cwltool tries to run at this point does include the cache path, but I get this error:

Command '['singularity', 'pull', '--force', '--name', '/work/ta004/ta004/lowe/.singularity/cache/oci-tmp/1bf5ca8ce3c83e8b55ed203e8d64545c3153988349953dc08ca6abac1df8608f/quay.io_biocontainers_biobb_md:0.1.5--py_0.sif', 'docker://quay.io/biocontainers/biobb_md:0.1.5--py_0']' returned non-zero exit status 255.

Any suggestions on what I might be able to do to get cwltool to stop trying to pull the image?

Just a quick drive-by on a Friday evening: have you tried https://github.com/common-workflow-language/cwl-utils/blob/dc3998ffcbb7cd68bc9861ef5c2d307fe26cb949/cwl_utils/docker_extract.py to pre-cache the Singularity container images?

That is the exact tool I was looking for! Pulling the container images with docker_extract.py -s into the directory defined by CWL_SINGULARITY_CACHE allows the images to be found, and the pull step is avoided. Thanks :slight_smile:
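For anyone finding this later, the invocation was roughly as follows (the cache path is illustrative, and the argument order may differ between cwl-utils versions, so check docker_extract.py --help):

# run on the login node, which has internet access; this pre-pulls every
# container referenced by the workflow as a .sif into the cwltool cache dir
export CWL_SINGULARITY_CACHE=/path/to/containers
python docker_extract.py -s "$CWL_SINGULARITY_CACHE" workflow.cwl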

Glad to hear it! I just added a direct reference to this script on the CWL homepage. Hopefully the next person can find it more easily. Thanks for asking and for the reminder!

@mrc Isn’t there some way to just point to a Singularity image file on disk directly?

CWL uses Docker format software containers, if software containers are used at all. So we can’t point to a Singularity format container from within a CWL description. We are happy for Singularity users to use Singularity as their Docker-format-aware software container engine, for sure!

I guess I must be missing something, because when I run

cwltool --singularity workflow.cwl input.json

and I have a DockerRequirement of ubuntu:latest, the file ubuntu:latest.sif gets created in my local directory, and the execution commands used in the workflow point back to that exact .sif file on disk. The .sif container may have been created by Singularity pulling from Docker Hub, but I could have just as easily (actually, more easily in many cases) created the .sif file manually myself from a Singularity recipe and used it with the CWL workflow instead. Right?
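For example, something like this (the definition file name is hypothetical, and this assumes cwltool will pick up an existing file named after the Docker image instead of pulling):

# build the image locally from a Singularity definition file, naming it
# after the Docker image requested by the workflow's DockerRequirement
singularity build "ubuntu:latest.sif" ubuntu.def

# cwltool then finds ubuntu:latest.sif on disk and skips the pull
cwltool --singularity workflow.cwl input.json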

Perhaps, but then you run the risk of your local Singularity image not corresponding to the referenced Docker-format container in some way that matters; and this could cause a problem for you and/or your collaborators in the future.

The CWL standard is built on Docker-format containers for interoperability. If you operate outside the CWL standard then that is on you :slight_smile:

Can the CWL standard be expanded to include native Singularity support, including support for specifying a direct path to an image file on disk to use?

The reliance on Docker, and thus on Docker Hub, causes a lot of overhead for us, especially after their "free tier" was changed to include rate limits on pulls. That essentially forces us to get a paid account (or our pipelines simply break), find some alternative to Docker Hub, host our own Docker registry, or some combination of these.

If we could just tell the CWL "use Singularity container /path/to/containers/my_tools.sif" it would be a lot simpler.

Yes, CWL could more explicitly support Singularity. I’d like CWL to rely more on actual container standards (which didn’t exist 5 years ago) and less on the Docker ecosystem. However, changing the spec is a really complicated topic.

What if there was a cwltool feature that let you provide an explicit map of Docker image names to .sif files?

CWL specifies the Docker image format as the interchange form. No one is required to use any particular container registry; Docker Hub is not part of the CWL standard. For example, the bio-cwl-tools community repo uses quay.io-hosted containers built by biocontainers.pro.
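Pulling one of those quay.io-hosted images with Singularity, for instance, involves no Docker Hub account at all (the exact image tag here is illustrative):

# fetch a biocontainers image directly from quay.io as a .sif
singularity pull docker://quay.io/biocontainers/bedtools:2.30.0--hc088bd4_0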

Right, we looked into quay.io; it looks promising, but who knows how long it will remain usable for free before they also introduce paid-account requirements just like Docker Hub did.

The real issue is being forced either to rely on a third-party service or to run our own, just to use containers that likely already exist on our server filesystem. “Find a Docker container registry solution” becomes a prerequisite for using CWL at all in production. Singularity image files give you the option to avoid this if you want, but they’re not usable with CWL :sob:

What if there was a cwltool feature that let you provide an explicit map of Docker image names to .sif files? No registry involved (except the mapping file).

That sounds fine to me. In most of my cases, the listings look something like this (pseudo-code):

container_dir='/path/to/containers'
bedtools_container="${container_dir}/bedtools.sif"
bwa_container="${container_dir}/bwa.sif"
gatk_container="${container_dir}/gatk.sif"

This is very effective at avoiding registry issues: you just build the container locally on your computer, rsync the .sif file over to the server at the pre-determined location, and your pipeline looks there each time it runs one of the containers.
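In practice that looks something like this (names and paths are illustrative):

# build locally from a definition file, then copy the image to the
# pre-determined location on the server
singularity build bwa.sif bwa.def
rsync -av bwa.sif user@server:/path/to/containers/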

@steve If you’re using the CWL reference runner (cwltool) or toil-cwl-runner then I recommend adding SoftwareRequirements and using beta support for dependency resolvers to connect those to the desired Singularity containers.

Maybe a general purpose plugin could be developed to leverage Bulker here? @nsheff

Thanks, but do you have an example of what the code for this would look like? It’s not clear to me how to use this in order to use Singularity containers that already exist on the filesystem.

For an arbitrary mapping, I would suggest asking at https://help.galaxyproject.org/ since we use their code.

Try setting CWL_SINGULARITY_CACHE to the location of your images, if they are from a previous cwltool run or the result of the cwl-utils docker_extract script (https://github.com/common-workflow-language/cwl-utils).

Thanks, I just found this env var; however, I am not clear on what the directory is supposed to look like. For example, SINGULARITY_CACHEDIR has an automatically created directory structure managed by Singularity. Is CWL_SINGULARITY_CACHE supposed to be the same dir as SINGULARITY_CACHEDIR, or can it just be a bare directory of .sif files like this?

/path/to/containers
├── container1.sif
├── container2.sif

So it seems to be working better after I configured it like this:

export SINGULARITY_CACHEDIR=/path/to/singularity_cache
export SINGULARITY_TMPDIR=$SINGULARITY_CACHEDIR/tmp
export SINGULARITY_PULLDIR=$SINGULARITY_CACHEDIR/pull
export CWL_SINGULARITY_CACHE=$SINGULARITY_PULLDIR

and CWL_SINGULARITY_CACHE is indeed just a bare directory with .sif files in it.

I also had to make sure all of the directories in question existed before trying to run the pipelines.

I'm not sure whether all of these env vars are actually needed, but it seems to be working now.
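For reference, pre-populating the pull directory looked roughly like this (the image is just the one from this thread; cwltool seems to look for a file named after the Docker image, with the / characters replaced by _):

# create the directories first, then pre-pull on a machine with internet access
mkdir -p "$SINGULARITY_CACHEDIR" "$SINGULARITY_TMPDIR" "$SINGULARITY_PULLDIR"
singularity pull --name "$SINGULARITY_PULLDIR/quay.io_biocontainers_biobb_md:0.1.5--py_0.sif" \
    docker://quay.io/biocontainers/biobb_md:0.1.5--py_0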

It's also worth noting that some of the source code for discovering the Singularity containers appears to be located here, and it helped shed some light on the behavior of cwltool / Toil during execution.
