CWL YAML file with location URLs for directories

I am trying to use a URL that points to a directory in a YAML job file, as in the following example:

raw_data_dir:
    class: Directory
    location: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw/

This, however, throws the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw'
ERROR Workflow error, try again with --debug for more information:
[Errno 2] No such file or directory: 'http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw'

So I assume directories are not (yet) supported?

Did you try with path: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw/ ?

There isn’t an HTTP method to list the contents of a directory, alas. FTP has one, though FTP servers are becoming less common, and cwltool does not support FTP by default because the CWL specs don’t require FTP support. For cwltool, we could copy in the FTP code I wrote for cwl-tes and enable it only via an additional command-line flag.
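To illustrate why plain HTTP is awkward here: the closest thing to a listing is the HTML index page that some servers generate, and its markup is not standardized. The following is only a rough sketch of scraping such a page, assuming an Apache-style index; the `LinkCollector` class, the `list_index_page` function, the sample HTML, and the example.org URL are all made up for illustration and are not part of cwltool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_index_page(html, base_url):
    """Return absolute URLs for the relative links found in an index page."""
    parser = LinkCollector()
    parser.feed(html)
    entries = []
    for href in parser.hrefs:
        # Skip column-sort links, fragments, absolute paths and parent links;
        # only plain relative names are treated as directory entries.
        if href.startswith(("?", "#", "/", "..")) or "://" in href:
            continue
        entries.append(urljoin(base_url, href))
    return entries


# Hypothetical index page, loosely modelled on an Apache autoindex.
SAMPLE = """<html><body>
<a href="?C=N;O=D">Name</a>
<a href="/unlock/cwl/test_data/proteomics/">Parent Directory</a>
<a href="sample1.raw">sample1.raw</a>
<a href="subdir/">subdir/</a>
</body></html>"""

BASE = "http://example.org/raw/"
entries = list_index_page(SAMPLE, BASE)
```

Every server formats these pages differently (or disables them), so the filter above is a heuristic and would need adjusting per server.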

> cwltool --outdir OUT --provenance PROV proteomics/maxquant.yaml
INFO /Users/jasperk/mambaforge/bin/cwltool 3.1.20220802125926
INFO [cwltool] /Users/jasperk/mambaforge/bin/cwltool --outdir OUT --provenance PROV proteomics/maxquant.yaml
INFO Resolved 'proteomics/maxquant.yaml' to 'file:///Volumes/Git/m-unlock/cwl/tests/proteomics/maxquant.yaml'
../tools/maxquant/maxquant.cwl:39:3: object id `../tools/maxquant/maxquant.cwl#mqpar` previously defined
../tools/maxquant/maxquant.cwl:39:3: object id `../tools/maxquant/maxquant.cwl#mqpar` previously defined
INFO [provenance] Adding to RO http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/mqpar.xml
ERROR Got workflow error
Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwltool/executors.py", line 251, in run_jobs
    job.run(runtime_context)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwltool/job.py", line 867, in run
    (runtime, cidfile) = self.create_runtime(env, runtimeContext)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwltool/docker.py", line 362, in create_runtime
    self.add_volumes(
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwltool/job.py", line 749, in add_volumes
    self.add_writable_directory_volume(
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwltool/docker.py", line 320, in add_writable_directory_volume
    shutil.copytree(volume.resolved, host_outdir_tgt)
  File "/Users/jasperk/mambaforge/lib/python3.10/shutil.py", line 557, in copytree
    with os.scandir(src) as itr:
FileNotFoundError: [Errno 2] No such file or directory: 'http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw'
ERROR Workflow error, try again with --debug for more information:
[Errno 2] No such file or directory: 'http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw'
INFO [provenance] Finalizing Research Object
INFO [provenance] Research Object saved to /Volumes/Git/m-unlock/cwl/tests/PROV
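The traceback above ends in `shutil.copytree` being called with the URL as its source. A small sketch of that failure in isolation follows; the `try_copy` helper is hypothetical, and no network access is involved because the URL string is simply treated as a local path.

```python
import os
import shutil
import tempfile

# The same string that appears in the error message above.
url = "http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw"


def try_copy(src):
    """Try a local directory copy of src; return the exception class name, or None."""
    with tempfile.TemporaryDirectory() as tmp:
        try:
            shutil.copytree(src, os.path.join(tmp, "copy"))
        except OSError as exc:
            return type(exc).__name__
    return None


# On a POSIX system this is expected to be "FileNotFoundError",
# since no local directory has that name.
result = try_copy(url)
```

So the URL is never fetched at this point; the directory location reaches a purely local copy step unchanged.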

Using this job file:

cwl:tool: ../../tools/maxquant/maxquant.cwl
fasta:
    class: Directory
    path: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/fasta/
raw_data_dir:
    class: Directory
    path: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw/
mqpar:
    class: File
    location: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/mqpar.xml

You can fully test it using:

cwl:tool: https://gitlab.com/m-unlock/cwl/-/raw/dev/tools/maxquant/maxquant.cwl
fasta:
    class: Directory
    path: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/fasta/
raw_data_dir:
    class: Directory
    path: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/raw/
mqpar:
    class: File
    location: http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/mqpar.xml

Yeah, I don’t know how I would implement downloading an HTTP directory, as there is no standard format or structure for HTTP directory listings.
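One possible workaround, if the file names are known in advance, is to describe the directory explicitly instead of pointing at it: the CWL spec allows a Directory object that has a `listing` of File objects and no `location`. Below is a minimal sketch that builds such a job object and serializes it as JSON, which is also valid YAML. The `directory_literal` helper and the `sample1.raw` / `sample2.raw` names are hypothetical, and I have not verified that this runs cleanly with this particular workflow.

```python
import json
from urllib.parse import urljoin


def directory_literal(base_url, basename, file_names):
    """Build a CWL Directory literal whose files each carry their own HTTP location."""
    return {
        "class": "Directory",
        "basename": basename,
        "listing": [
            {"class": "File", "location": urljoin(base_url, name)}
            for name in file_names
        ],
    }


BASE = "http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/"

job = {
    # Hypothetical file names; replace them with the files actually on the server.
    "raw_data_dir": directory_literal(
        BASE + "raw/", "raw", ["sample1.raw", "sample2.raw"]
    ),
    "mqpar": {"class": "File", "location": BASE + "mqpar.xml"},
}

# JSON is a subset of YAML, so this text can be saved and used as a job file.
job_text = json.dumps(job, indent=2)
```

The trade-off is that the listing has to be maintained by hand (or generated by some other means), since nothing here discovers the directory contents.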

Does your webserver support WebDAV?

curl -i --request PROPFIND  http://download.systemsbiology.nl/unlock/cwl/test_data/proteomics/fasta/
HTTP/1.1 405 Method Not Allowed
Date: Tue, 23 May 2023 14:30:31 GMT
Server: Apache/2.4.29 (Ubuntu)
Allow: GET,POST,OPTIONS,HEAD
Content-Length: 317
Content-Type: text/html; charset=iso-8859-1

Seems not.

Anyhow, WebDAV does have a directory listing feature, so you could enable that on your web server (but make sure to block write access!).

We could then implement WebDAV directory listings in cwltool behind a flag (as WebDAV is not yet part of the CWL standards).
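For a sense of what a WebDAV listing would involve, here is a minimal sketch, not cwltool code: a client sends a `PROPFIND` request with a `Depth: 1` header and parses the XML body of the 207 Multi-Status response. The `propfind_request` and `parse_multistatus` functions and the sample response are assumptions made up for illustration; the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace}name"; WebDAV uses "DAV:".
DAV = "{DAV:}"


def propfind_request(url):
    """Build (but do not send) a PROPFIND request for the direct children of url."""
    return urllib.request.Request(url, method="PROPFIND", headers={"Depth": "1"})


def parse_multistatus(xml_text):
    """Return (href, is_collection) for each entry in a Multi-Status body."""
    root = ET.fromstring(xml_text)
    entries = []
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href")
        # A <collection/> element inside resourcetype marks a directory.
        is_collection = response.find(".//" + DAV + "collection") is not None
        entries.append((href, is_collection))
    return entries


# Hypothetical response body with one directory and one file.
SAMPLE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/raw/</D:href>
    <D:propstat><D:prop>
      <D:resourcetype><D:collection/></D:resourcetype>
    </D:prop></D:propstat>
  </D:response>
  <D:response>
    <D:href>/raw/sample1.raw</D:href>
    <D:propstat><D:prop><D:resourcetype/></D:prop></D:propstat>
  </D:response>
</D:multistatus>"""

request = propfind_request("http://example.org/raw/")
entries = parse_multistatus(SAMPLE)
```

A real implementation would also need to handle authentication, recursion into subdirectories, and servers that reject `PROPFIND`, as the 405 response above shows.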