Tool 'activate'

Hello, CWL community!

Question, in short: Is there a clean way to use cwltool or some other runner to go through execution environment setup steps, without executing the actual base command?

Some context. What I would like to achieve is starting an interactive shell in the execution environment that the tool’s command would be run in, analogous to a venv’s “activate”. Conceptually, I’m thinking of a feature where the runner skips the input/output validation and binding, still prepares the workdir and salient requirements, and runs bash or similar in interactive mode.

To offer some more background, our team has been relying on containers to provide ‘easy’ access to pretty gnarly software tools. As we’re moving to rely more on CWL, we have also started to move some of the environment setup out of the Dockerfile and into the CWL tool description, relying on the cwltool runner to do some of the prep (e.g. InitialWorkDirRequirement to instruct the runner to do the mount/bind). That also means that the previous functionality of ‘activating’ a tool by simply attaching the user to the underlying tool container image no longer works, as various setup logic has been moved into the CWL runner.

cwltool has overrides for requirements and targets for workflow steps, but I don’t see a way to make that work for what I’m trying to achieve. Am I missing something obvious? Is there some easy way to make this work beyond: a) maintaining a parallel cwl tool description (not ideal) or b) adding the feature to cwltool ourselves?

Thanks!

Welcome @yvdriess !

This is a great idea, and not one that I’ve heard of before!

Unless someone else knows of an external tool to do this, one would need to

  1. enhance cwltool.job._job_popen() to not redirect stdin / stdout / stderr
  2. insert bash or other shell instead of the baseCommand and arguments and other inputBindings in cwltool.process.Process._init_job().

This new feature would combine well with --single-step or --single-process for selecting which step to enter the interactive environment for.

Thanks to a search alert for CWL keywords, I just found this hack by Daniel Loos for debugging a CWL CommandLineTool or Workflow step from within the software container without having to change the CWL runner. https://gist.github.com/danlooo/56e4e04c4a27ec6649baf4e5d44d5eea

I haven’t tested it, but it looks reasonable. Adapt the inputs to match your step. Swap the DockerRequirement for your own container (one that has at least Python installed).

(reposting my comment here)

Essentially:

  • Generate a new document based on the given CommandLineTool: add a script to the InitialWorkDirRequirement entries and replace the baseCommand with the invocation of said script. (For my use case: remove the Input and Output entries too.)

  • The tool is run through cwltool as usual.

  • The script being run by the baseCommand has two purposes:

    • communicate the execution environment’s details (viz. running containerID), and

    • simply hang; its purpose is to keep its execution environment running.

  • From outside the cwltool process you connect to the execution environment (viz. docker exec -it containerID bash) and do your thing.

  • Stop the hung script.
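
For illustration, the document-generation step could be sketched roughly as below. This is not the code from the gist, just a minimal sketch under some assumptions: the tool description is already loaded as a plain Python dict (e.g. via a YAML loader), the InitialWorkDirRequirement listing is a plain list rather than an expression, and the names make_activate_tool, script_name and script_body are made up for this example.

```python
import copy


def _requirements_as_list(reqs):
    """CWL allows requirements as a list or as a map keyed by class name."""
    if isinstance(reqs, dict):
        return [{**(body or {}), "class": name} for name, body in reqs.items()]
    return list(reqs or [])


def make_activate_tool(tool, script_name, script_body):
    """Return a modified copy of a CommandLineTool dict (hypothetical helper).

    The copy stages `script_body` as `script_name` in the working directory
    and runs it instead of the original baseCommand.
    """
    doc = copy.deepcopy(tool)  # leave the caller's document untouched
    reqs = _requirements_as_list(doc.get("requirements"))

    iwd = next(
        (r for r in reqs if r.get("class") == "InitialWorkDirRequirement"), None
    )
    if iwd is None:
        iwd = {"class": "InitialWorkDirRequirement", "listing": []}
        reqs.append(iwd)

    listing = iwd.setdefault("listing", [])
    if not isinstance(listing, list):
        # Assumption: only plain list listings are handled in this sketch.
        raise ValueError("listing is an expression; cannot append a script entry")

    # Caveat: "$(" or "${" inside script_body would be read as a CWL expression.
    listing.append({"entryname": script_name, "entry": script_body})

    doc["requirements"] = reqs
    doc["baseCommand"] = ["bash", script_name]
    doc.pop("arguments", None)
    # Dropping inputs/outputs as described above; existing listing entries
    # that reference $(inputs...) would need handling separately.
    doc["inputs"] = []
    doc["outputs"] = []
    return doc
```

The result can be dumped back to YAML or JSON and handed to cwltool like any other tool description.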

The coordination mechanism between the hanging script and the ‘outside’ is a local file (inside the container), since stdout/stderr are redirected by cwltool. I assume what would also work in my case is to SIGSTOP/SIGCONT the script process, unless that gets caught by cwltool’s container runner.
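
To make the file-based coordination concrete, here is a rough sketch of what such a hanging script could look like in Python. It is an illustration, not the script from the gist. The function name hang_until_released and the two file paths are hypothetical, and reporting the hostname relies on the assumption that Docker's default (hostname equal to the short container ID) has not been overridden by the runner.

```python
import os
import socket
import time
from pathlib import Path


def hang_until_released(info_path, release_path, poll_seconds=1.0, timeout=None):
    """Write environment details to info_path, then wait for release_path.

    Returns True once release_path exists, or False if `timeout` seconds
    pass first (timeout=None waits indefinitely).
    """
    # Assumption: inside a Docker container the hostname defaults to the
    # short container ID, which is what the 'outside' needs to connect.
    Path(info_path).write_text(
        f"hostname={socket.gethostname()}\n"
        f"pid={os.getpid()}\n"
        f"cwd={os.getcwd()}\n"
    )

    deadline = None if timeout is None else time.monotonic() + timeout
    while not Path(release_path).exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

Inside the container this would be called with two paths in the working directory, for example hang_until_released("env.info", "release.flag"). From the outside, read env.info to find the container, do your thing, then create release.flag to let the script (and the cwltool job) finish.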