Cwltool: resolve conda dependencies with local mirror

Dear CWL community,

we are looking for a way to run CWL workflows with conda dependencies on an HPC.
The HPC has no internet access, but local mirrors of bioconda and conda-forge are available, aliases of which can be specified in .condarc.
One can manually install a conda package (e.g. fastqc) on the HPC with conda install -c http://<address/local/mirror>/bioconda --override-channels fastqc.
Now I would like to avoid installing each tool manually and keep the CWL document as portable as possible.
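For reference, the manual install above can be scripted. This is a minimal Python sketch, not anything cwltool provides: `conda_install_cmd` is a hypothetical helper and `http://mirror.example` is a placeholder for the real mirror address. It only builds the argument list; pass the result to `subprocess.run` to actually install.

```python
def conda_install_cmd(package, channel_urls, conda_exe="conda"):
    """Build a `conda install` command restricted to the given channels."""
    cmd = [conda_exe, "install", "--yes", "--override-channels"]
    for url in channel_urls:
        # One -c flag per channel, in priority order.
        cmd += ["-c", url]
    cmd.append(package)
    return cmd


if __name__ == "__main__":
    # Placeholder URL; substitute the address of the local mirror.
    cmd = conda_install_cmd("fastqc", ["http://mirror.example/bioconda"])
    print(" ".join(cmd))
```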

The following minimal example works well on my own machine, with cwltool --beta-conda-dependencies workflow.cwl.

workflow.cwl:

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool

hints:
  SoftwareRequirement:
    packages:
      fastqc:
        version: [ "0.12.1" ]
        specs: [ "https://anaconda.org/bioconda/fastqc" ]

baseCommand: ["which", "fastqc"]

inputs: []

outputs: []

However, this won’t work on the HPC as it tries to download from https://anaconda.org/.

Would it be possible to specify aliases to the local conda mirrors via the --beta-dependency-resolvers-configuration flag and a dependency-resolvers-conf.yml similar to the examples described in the cwltool docs (https://cwltool.readthedocs.io/en/latest/#leveraging-softwarerequirements-beta)?

Thanks in advance for any help or suggestions in alternative directions.

Hello @Brilator! It isn’t exactly what you asked for, but did you try running cwltool --beta-conda-dependencies --beta-dependencies-directory path/to/local/conda_env pointing to a local conda environment that already has the needed programs installed?

@nsoranzo suggests

I think it should be enough to add conda_ensure_channels: ["channel1_URL", "channel2_NAME", ...] to the app_config dict in cwltool/software_requirements.py:DependenciesConfiguration.build_job_script().

Thanks for your quick responses!
This would also help as a first step to ship the data and analysis as a whole.

Unfortunately, I could not get it to run with the conda environment that the above command created on macOS (the different OS might also be an issue?).
I’ve tried

cwltool --beta-conda-dependencies --beta-dependencies-directory ./cwltool_deps/_conda workflow.cwl
cwltool --beta-conda-dependencies --beta-dependencies-directory ./cwltool_deps/_conda/envs/ workflow.cwl
cwltool --beta-conda-dependencies --beta-dependencies-directory ./cwltool_deps/_conda/envs/__fastqc@0.12.1 workflow.cwl

but cwltool keeps trying to pull:

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ae2eaaf0bc0>, 'Connection to github.com timed out. (connect timeout=30)'))
WARNING Conda installation requested and failed.
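The third path above follows a `__<name>@<version>` naming under `_conda/envs`. Here is a small sketch for checking whether such an environment is present before a run. It assumes that layout holds in general and that the dependencies directory is the parent of `_conda`; both assumptions come only from the paths in this post, not from the cwltool docs. `expected_env_dir` is a hypothetical helper.

```python
from pathlib import Path


def expected_env_dir(deps_dir, name, version):
    """Return the env directory for a package, assuming the
    <deps_dir>/_conda/envs/__<name>@<version> layout seen above."""
    return Path(deps_dir) / "_conda" / "envs" / f"__{name}@{version}"


if __name__ == "__main__":
    env = expected_env_dir("./cwltool_deps", "fastqc", "0.12.1")
    print(env, "exists" if env.is_dir() else "missing")
```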

Hi,

I edited the app_config in cwltool/software_requirements.py to:

app_config = {
    "conda_auto_install": True,
    "conda_auto_init": True,
    "debug": builder.debug,
    "conda_ensure_channels": [
        "http://<address/local/mirror>/main",
        "http://<address/local/mirror>/bioconda", 
        "http://<address/local/mirror>/conda-forge",
    ]
}

and managed to install from source with pip install '.[deps]'.

Unfortunately, this did not do the trick.

The related error message

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ace5bc988e0>, 'Connection to github.com timed out. (connect timeout=30)'))
WARNING Conda installation requested and failed.
ERROR Failed to install conda

Setting the conda_auto_* options to False obviously prevents the installation request.

I also tried adding the path to a local conda executable (the planemo docs suggest this might be an option: Commands — Planemo 0.75.26 documentation).
I also added the full URLs of the local conda channels.

So my app_config now looks like this.

app_config = {
    "conda_auto_install": False,
    "conda_auto_init": False,
    "conda_exec": "/software/conda/mambaforge/24.3.0/condabin/conda",
    "debug": builder.debug,
    "conda_ensure_channels": [
        "http://<address/local/mirror>/conda-forge/linux-64",
        "http://<address/local/mirror>/conda-forge/noarch",
        "http://<address/local/mirror>/main/linux-64",
        "http://<address/local/mirror>/main/noarch",
        "http://<address/local/mirror>/bioconda/linux-64",
        "http://<address/local/mirror>/bioconda/noarch",
    ],
}

It still won’t install package dependencies.
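To keep the two URL variants apart, here is a small sketch that builds the channel list from one base URL. The manual conda install -c command earlier in this thread used the channel root (without linux-64 or noarch), so the sketch produces that form. `MIRROR` is a placeholder and `mirror_channels` is a hypothetical helper; whether cwltool honours the resulting list is exactly what is unresolved here.

```python
MIRROR = "http://mirror.example"  # placeholder for the local mirror address


def mirror_channels(base_url, names):
    """Return channel root URLs (no platform subdirectory) under base_url."""
    base = base_url.rstrip("/")
    return [f"{base}/{name}" for name in names]


if __name__ == "__main__":
    # Channel names as used in this thread.
    for url in mirror_channels(MIRROR, ["conda-forge", "bioconda", "main"]):
        print(url)
```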

@Brilator What’s the error (if any) with the latest app_config?

Hi @nsoranzo,

basically none related to the conda dependency.

INFO cwltool 0.1.dev4672+g8f00f73
INFO Resolved 'workflow.cwl' to '<...>/workflow.cwl'
INFO [job workflow.cwl] /var/tmp/pbs.13533619.hpc-batch/lpchmx7o$ which \
    fastqc
which: no fastqc in <printing my $PATH>
WARNING [job workflow.cwl] exited with status: 1
WARNING [job workflow.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail

By the way, should I be worried about the dev version?
cwltool 0.1.dev4672+g8f00f73

I should mention that I have since removed this spec.

It should be "conda_auto_install": True (source):

conda_auto_install: Conda dependency resolution […] will attempt to install requested but missing packages.

But you still have the SoftwareRequirement, yes?

Yes, exactly. The rest is unchanged.

Not sure if this helps, but I ran cwltool conda-info.cwl to see which conda channels are being used.

conda-info.cwl:

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool

baseCommand: ["conda", "info"]

inputs: []

outputs: []

returning:

     active environment : None
       user config file : /var/tmp/pbs.13538082.hpc-batch/zkge913q/.condarc
 populated config files : /software/conda/mambaforge/24.3.0/.condarc
          conda version : 24.3.0
    conda-build version : not installed
         python version : 3.10.14.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=skylake_avx512
                          __conda=24.3.0=0
                          __glibc=2.17=0
                          __linux=3.10.0=0
                          __unix=0=0
       base environment : /software/conda/mambaforge/24.3.0  (read only)
      conda av data dir : /software/conda/mambaforge/24.3.0/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /software/conda/mambaforge/24.3.0/pkgs
                          /var/tmp/pbs.13538082.hpc-batch/zkge913q/.conda/pkgs
       envs directories : /var/tmp/pbs.13538082.hpc-batch/zkge913q/.conda/envs
                          /software/conda/mambaforge/24.3.0/envs
               platform : linux-64
             user-agent : conda/24.3.0 requests/2.31.0 CPython/3.10.14 Linux/3.10.0-1160.49.1.el7.x86_64 centos/7.9.2009 glibc/2.17 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
                UID:GID : 78374:511
             netrc file : None
           offline mode : False

So it includes neither the channels added to app_config nor those from ~/.condarc.
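To make this check repeatable, the reported channel URLs can be compared against the mirror prefix. This is a minimal sketch, assuming `conda info --json` reports the URLs under a `channels` key (worth verifying on your conda version). `MIRROR` is a placeholder, and `non_mirror_channels` and `current_channels` are hypothetical helpers. Note also that the output above lists the user config file under the job's temporary directory, which suggests HOME inside the job is not the login home, so a ~/.condarc there would not be picked up.

```python
import json
import subprocess

MIRROR = "http://mirror.example"  # placeholder for the local mirror address


def non_mirror_channels(channel_urls, mirror_prefix):
    """Return the channel URLs that do not start with the mirror prefix."""
    return [u for u in channel_urls if not u.startswith(mirror_prefix)]


def current_channels(conda_exe="conda"):
    """Read channel URLs from `conda info --json` (assumes a 'channels' key)."""
    out = subprocess.run(
        [conda_exe, "info", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("channels", [])


if __name__ == "__main__":
    # URLs copied from the conda info output above.
    reported = [
        "https://conda.anaconda.org/conda-forge/linux-64",
        "https://conda.anaconda.org/conda-forge/noarch",
    ]
    for url in non_mirror_channels(reported, MIRROR):
        print("not served by mirror:", url)
```

Inside a job, `current_channels()` could replace the hard-coded list, provided conda is on the PATH there.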