I am creating a CWL workflow and writing the steps it should run. The first step uses a Docker container to install some Python libraries. I then want the other CWL files to use the libraries installed within that container, but I can't work out how to do that.
This is what install_docker_test.cwl looks like:
However, the script run inside download_files.cwl does not use the Docker container, even though the install_libraries step is listed before it. How can I fix this and make sure the Python script uses the Docker container instead of the generic Python installed in WSL?
The install_requirements and download_files steps are isolated from one another, so installing into the container like that won’t work – the download_files step starts in a fresh container environment so any changes made by install_requirements are wiped away.
Perhaps you could explain a little more what install_requirements needs to do? I suspect what you want to do is put that in a Dockerfile to build your image.
The install_requirements step triggers install_docker_test.cwl to run, which requires the Docker image to exist. I have a Dockerfile in which I install the Python libraries that I need in the other steps (here: download_files). Here is an example of the Dockerfile:
FROM python:3.8
WORKDIR /base
COPY example/ /base/example/
RUN pip install /base/example/custom_lib
COPY requirements.txt /base/example/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
python -m pip install -r /base/example/requirements.txt
RUN pip install torch
ENV PYTHONUNBUFFERED=1
COPY . .
The reason I’ve moved the library installation into Docker is that an installation step in the workflow (a separate CWL file whose command installs the Python libraries) would use too much memory: Max memory used: 213MiB
But I suspect you have the wrong mental model of the workflow. You can’t install files in step 1 and then find them there in step 2, unless you specify the files as an output of step 1 and connect that output as an input to step 2.
The reason is that step 1 and step 2 are running in different containers, and could be running on entirely different machines.
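To make that concrete, here is a minimal sketch of a workflow in which one step's output is wired to the next step's input. The tool files (download_files.cwl, process_files.cwl) and the port names (script, downloaded, data, result) are assumptions for illustration and are not taken from your workflow.

```yaml
# Hypothetical workflow: all file and port names below are placeholders.
cwlVersion: v1.2
class: Workflow

inputs:
  script: File            # the Python script that the first step runs

outputs:
  result:
    type: File
    outputSource: process_files/result

steps:
  download_files:
    run: download_files.cwl
    in:
      script: script
    out: [downloaded]     # files must be declared as outputs to leave the step

  process_files:
    run: process_files.cwl
    in:
      # Explicitly connect step 1's output to step 2's input;
      # nothing else from step 1's container is carried over.
      data: download_files/downloaded
    out: [result]
```

Only what is listed under out and connected under in travels between the steps. Anything else written inside the first step's container is discarded when that step finishes.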
You want to build your docker image with all of its software dependencies at the command line (or a separate shell script) and then launch the workflow. CWL has limited support for specifying how to build Docker images.
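As a rough sketch of that approach, each tool that needs the libraries can declare the prebuilt image through a DockerRequirement. The image tag example-libs:latest, the script input, and the file names in the comments are assumptions for illustration. I'm also assuming the image already exists locally when the workflow is launched.

```yaml
# download_files.cwl (hypothetical sketch)
#
# Build the image once, before launching the workflow, for example:
#   docker build -t example-libs:latest .
# (the RUN --mount line in the Dockerfile needs BuildKit enabled)
# Then run the workflow, for example:
#   cwltool workflow.cwl job.yml
cwlVersion: v1.2
class: CommandLineTool

requirements:
  DockerRequirement:
    # Name of the locally built image; assumed tag, adjust to your own.
    dockerImageId: example-libs:latest

baseCommand: python

inputs:
  script:
    type: File            # the Python script to run inside the container
    inputBinding:
      position: 1

outputs: []
```

Every step that declares this requirement runs in a fresh container started from the same image, so the libraries installed by the Dockerfile are available in each of them without a separate install step in the workflow.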