Isolated conda environments for steps in cwltool

davidjsherman · June 8, 2021, 11:35am

What would we need to do (or change) to build an isolated Conda environment for each workflow step?

We have tools whose SoftwarePackage declarations have conflicting dependencies that cannot be resolved in a single Conda environment. With --beta-conda-dependencies and --no-container, cwltool creates a directory ./cwltool_deps that is shared by all workflow steps. The location can be overridden in a dependency resolver file, but it always seems to be global to the workflow. We need a separate environment for each step.

I think that the backend galaxy.tools.deps.conda package will build an isolated environment if cache_path is empty, and I guess that we could pin that in the Conda entry of a custom dependency resolver file. But I don’t see how to pass that value through to the resolver.
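The per-step isolation asked for here can be pictured as giving each distinct package set its own environment prefix, so steps with conflicting requirements never share an environment while steps with identical requirements can reuse one. The sketch below is plain Python and is not part of cwltool or galaxy-tool-util; `package_specs`, `env_prefix_for`, `conda_create_command`, the `envs` base directory and the `{name: version}` input format are all hypothetical. It only builds a `conda create --prefix` command and does not run it.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def package_specs(packages: dict[str, str]) -> list[str]:
    """Turn {name: version} into sorted conda specs like 'samtools=1.9' (hypothetical helper)."""
    return sorted(f"{name}={version}" for name, version in packages.items())


def env_prefix_for(packages: dict[str, str], base_dir: str = "envs") -> Path:
    """Map a step's package set to its own environment directory (hypothetical helper).

    Identical package sets (in any order) hash to the same prefix and can share
    an environment; different sets get different prefixes (barring a 64-bit
    hash collision).
    """
    digest = hashlib.sha256("\n".join(package_specs(packages)).encode("utf-8"))
    return Path(base_dir) / digest.hexdigest()[:16]


def conda_create_command(packages: dict[str, str], base_dir: str = "envs") -> list[str]:
    """Build, but do not run, a `conda create` call for the step's own prefix."""
    prefix = env_prefix_for(packages, base_dir)
    return ["conda", "create", "--yes", "--prefix", str(prefix), *package_specs(packages)]


if __name__ == "__main__":
    # Illustrative package sets for two steps; the versions are made up.
    step_a = {"medaka": "1.4.3"}
    step_b = {"samtools": "1.9", "python": "3.8"}
    for packages in (step_a, step_b):
        print(" ".join(conda_create_command(packages)))
```

Hashing the sorted spec list, rather than using the step name, is a deliberate choice in this sketch: two steps that declare the same SoftwarePackage set would not pay for two installs.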

This might also provide the second mechanism suggested in cwltool issue #1331 on GitHub (common-workflow-language/cwltool), "workflow with parallel steps and --beta-conda-dependencies failed due to many conda install at the same time".
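Separate prefixes alone do not stop two parallel steps that need the same package set from installing into the same place at once, which is the failure in that issue. A common remedy, sketched here as an assumption rather than anything cwltool currently does, is to serialise installs per prefix with a file lock and record completion with a marker file. It is POSIX-only (`fcntl.flock`); `ensure_env`, the sibling `.lock` and `.complete` files, and the injected `install` callable are hypothetical.

```python
import fcntl
import tempfile
from pathlib import Path
from typing import Callable


def ensure_env(prefix: Path, install: Callable[[Path], None]) -> bool:
    """Build the environment at `prefix` at most once, even when steps run in parallel.

    POSIX only. Returns True if this call performed the install.
    """
    prefix.parent.mkdir(parents=True, exist_ok=True)
    lock_path = prefix.with_name(prefix.name + ".lock")
    done_marker = prefix.with_name(prefix.name + ".complete")
    with open(lock_path, "w") as lock_file:
        # Exclusive lock: a second step blocks here until the first one finishes.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if done_marker.exists():
                return False  # another step already built this environment
            install(prefix)  # e.g. run `conda create --yes --prefix <prefix> ...`
            done_marker.touch()  # only written after install() returns normally
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    def fake_install(prefix: Path) -> None:
        # Stand-in for a real conda call so the sketch runs without conda.
        prefix.mkdir()

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "envs" / "medaka"
        print(ensure_env(target, fake_install))  # first caller installs
        print(ensure_env(target, fake_install))  # later callers reuse it
```

Keeping the marker outside the prefix avoids writing into a directory that conda manages, and checking it only while holding the lock means a step that waited on another step's install simply reuses the result.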

I understand, of course, that we could use a user-space container, but we would specifically prefer to leverage our existing Conda setup rather than build a bespoke container image each time.

jjkoehorst · January 5, 2022, 3:02pm

I am very curious about this as well. Did you manage to find a solution?

jjkoehorst · January 5, 2022, 4:06pm

A current workaround that we are testing is:

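baseCommand: ["bash", "script.sh"]

requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: script.sh
        entry: |-
          #!/bin/bash
          # conda.sh defines the `conda` shell function for this non-interactive
          # script, so `conda init bash` (which edits ~/.bashrc) is not needed.
          source /root/miniconda/etc/profile.d/conda.sh
          # Switch to the step-specific environment.
          conda activate /unlock/infrastructure/conda/medaka
          # Quote "$@" so arguments containing spaces are passed through intact.
          medaka_consensus "$@"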
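If many steps need a wrapper like this, each pointing at its own environment, the script text could be generated rather than copied by hand and then pasted under each tool's `entry: |-`. This is a hedged Python sketch; `wrapper_script` is hypothetical, it assumes a Miniconda layout where `etc/profile.d/conda.sh` exists under the install root, and its use of `exec` (so the tool replaces the wrapper shell) is a design choice that is not part of the workaround above.

```python
import shlex


def wrapper_script(conda_root: str, env_prefix: str, command: str) -> str:
    """Return bash text for one step's InitialWorkDirRequirement `entry` (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        # conda.sh defines the `conda` shell function without touching ~/.bashrc.
        f"source {shlex.quote(conda_root + '/etc/profile.d/conda.sh')}",
        f"conda activate {shlex.quote(env_prefix)}",
        # exec replaces the shell; "$@" forwards the arguments CWL passes unchanged.
        f'exec {shlex.quote(command)} "$@"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Paths and command taken from the workaround; other steps would pass their own.
    print(wrapper_script("/root/miniconda", "/unlock/infrastructure/conda/medaka", "medaka_consensus"))
```

`shlex.quote` leaves ordinary paths untouched and only adds quotes when a path contains spaces or shell metacharacters.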

steve · March 1, 2022, 3:14am

You can install your Conda stack inside a Docker or Singularity container.

You only need to build the container once, then push it to your container registry so the tool can pull it when it runs.