How to use CUDARequirement?

marveso · January 4, 2024, 1:51pm

Hello guys, I’d appreciate it if you could help me assign a GPU to a Docker container within CWL. I’ve been searching for how to do this for hours, but I couldn’t manage to make it work…

I’m trying to use the cwltool:CUDARequirement extension in my CWL specification, but when I execute the workflow I get the following error, which points at the CUDARequirement:

Field 'class' contains undefined reference to 
'http://commonwl.org/cwltool#CUDARequirement'

This is the implementation of said requirement:

requirements:
  cwltool:CUDARequirement:
    cudaVersionMin: "11.4"
    cudaComputeCapability: "3.0"
    cudaDeviceCountMin: 1
    cudaDeviceCountMax: 8

And here’s the specified namespace:

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"

I wonder whether that URL is still valid, because if you go to http://commonwl.org/cwltool#CUDARequirement you will notice that it doesn’t actually lead anywhere.
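On the URL question: as far as I understand it, a `$namespaces` entry is only an identifier prefix, so it does not have to resolve to a web page. The snippet below is a simplified illustration of prefix expansion (the `expand` helper is hypothetical and is not how cwltool or schema-salad implement it). The expanded string is the same one that appears in the error message, which suggests the namespace was read correctly and the runner simply did not know the class.

```python
# Simplified illustration of how a prefixed name is expanded with $namespaces.
# `expand` is a hypothetical helper written for this example only.
NAMESPACES = {"cwltool": "http://commonwl.org/cwltool#"}


def expand(name, namespaces):
    """Replace a known 'prefix:' with its namespace IRI; leave other names alone."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return name


print(expand("cwltool:CUDARequirement", NAMESPACES))
# prints: http://commonwl.org/cwltool#CUDARequirement
```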

For reproducibility purposes, here’s a simple example of a CWL CommandLineTool specification (example.cwl) that uses cwltool:CUDARequirement and throws the same error.

cwlVersion: v1.2
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing: 
    - $(inputs.scriptFile)
  DockerRequirement:
    dockerPull: docker.io/pytorch/pytorch
  NetworkAccess:
    networkAccess: true
  cwltool:CUDARequirement:
    cudaComputeCapability: '3.0'
    cudaDeviceCountMax: 8
    cudaDeviceCountMin: 1
    cudaVersionMin: '11.4'

stdout: output.txt

inputs:
  scriptFile:
    type: File
    default:
      class: File
      path: script.py
    inputBinding:
      position: 1

outputs:
  stdout_output:
    type: stdout

baseCommand: "python"

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"

The contents of script.py:

import torch

if __name__ == "__main__":
    print("CUDA AVAILABLE:", torch.cuda.is_available())

I use cwltool as my CWL runner. In other words, to execute this specification, I run the following command: cwltool example.cwl.

If you know of any other way to access and use the host GPU devices in a Docker container from CWL, feel free to share it.

Thank you in advance for your assistance.

mrc · January 4, 2024, 6:43pm

Welcome @marveso; which CWL runner do you use? For example, the CWL reference runner (cwltool) requires the --enable-ext flag to use extensions.

marveso · January 8, 2024, 9:20am

Hey, thanks for the help. I didn’t know that flag was necessary, although it wasn’t the only thing I had to fix.

Luckily, I’ve finally found out what the issue was. Besides cwltool, I had also installed REANA in my environment, which downgraded cwltool to a 3.1.2021xxxx version that apparently doesn’t support this extension, hence the errors. After upgrading cwltool to a current version, it worked.

In other words, I had to:

- upgrade cwltool to the current version (3.1.20231207110929 at the time of writing)
- use the --enable-ext flag when executing the workflow
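The two steps above can be sketched as a small pre-flight check. This is only a sketch under assumptions: the helper names are hypothetical, the version comparison assumes purely numeric dotted versions, and 3.1.20231207110929 is used as the threshold only because it is the version reported as working in this thread, not because it is the first release with the extension.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported as working in this thread. Assumption: used here as a
# lower bound for illustration; the first release with the extension may be older.
KNOWN_GOOD = "3.1.20231207110929"


def parse_version(text):
    """Turn a numeric dotted version like '3.1.20231207110929' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, minimum=KNOWN_GOOD):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def build_command(cwl_file, *job_args):
    """Build the cwltool command line with extensions enabled."""
    return ["cwltool", "--enable-ext", cwl_file, *job_args]


def installed_cwltool_version():
    """Return the installed cwltool version string, or None if it is not installed."""
    try:
        return version("cwltool")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_cwltool_version()
    if found is None:
        print("cwltool is not installed in this environment")
    else:
        try:
            status = "ok" if is_at_least(found) else "older than the known-good version"
        except ValueError:
            # Non-numeric versions (for example development builds) are not compared.
            status = "could not be compared"
        print(f"cwltool {found}: {status}")
    print("run with:", " ".join(build_command("example.cwl")))
```

If another package in the same environment pins an older cwltool, as REANA did here, a separate virtual environment for cwltool may be the simplest way to keep the newer version.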