We’re using cwl-tes to run CWL workflows via a GA4GH TES instance, and we find that cwltool, by default, seems to download all input and output files to local storage. While that obviously makes sense when cwltool runs workflows locally, it seems redundant and costly when workflows are run in a distributed way, as in our scenario.
So I’m wondering: Is there perhaps a flag to disable the download behavior? And if not, what are the chances of having such a feature merged if we file a PR for it? In the latter case, I would be grateful for a quick pointer and, if applicable, some comments on possible caveats.
In cwltool the file download happens in PathMapper. You probably want cwl-tes to provide a custom subclass of PathMapper that preserves those http URLs until you need them.
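To illustrate the idea, here is a minimal standalone sketch of a mapping step that leaves remote URLs untouched and only stages local files. It does not use cwltool's actual PathMapper API: `Entry`, `REMOTE_SCHEMES`, and `map_inputs` are hypothetical names made up for this example, and the entry fields are only loosely modelled on what a path mapper tracks. A real subclass would apply the same check inside whichever PathMapper method handles each file.

```python
import os
from collections import namedtuple
from urllib.parse import urlparse

# Hypothetical record for one mapped file (not cwltool's own type).
Entry = namedtuple("Entry", ["resolved", "target", "type", "staged"])

# Assumption: these are the schemes the TES backend can fetch by itself.
REMOTE_SCHEMES = ("http", "https")


def map_inputs(files, stagedir):
    """Map CWL File objects to targets without downloading remote ones."""
    mapping = {}
    for obj in files:
        location = obj["location"]
        if urlparse(location).scheme in REMOTE_SCHEMES:
            # Keep the URL as-is: nothing is downloaded or staged locally.
            mapping[location] = Entry(location, location, "File", False)
        else:
            # Local files are still staged under stagedir.
            target = os.path.join(stagedir, os.path.basename(location))
            mapping[location] = Entry(location, target, "File", True)
    return mapping
```

With this, an input such as `https://example.org/data/reads.fq` (a placeholder URL) maps to itself, so the URL can be handed to the TES task directly, while a local path is mapped into the staging directory as before.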
If you do that, you might also need to provide an FsAccess that understands http, so that cwltool can fetch file metadata such as size and type, and read file contents for use in scripts.
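As a rough sketch of what such an http-aware file access layer could look like, the class below answers size queries with a HEAD request and reads only the first bytes of a remote file with a Range request. It is a standalone illustration, not a drop-in replacement: `HttpAwareFsAccess`, `read_head`, and the injectable `urlopen` parameter are hypothetical names, and a real implementation would subclass cwltool's own file access class and override its methods instead. It also assumes the server reports `Content-Length` and serves the content over plain http(s) without authentication.

```python
import os
import urllib.request
from urllib.parse import urlparse


class HttpAwareFsAccess:
    """Hypothetical file access helper that handles http(s) URLs."""

    def __init__(self, urlopen=urllib.request.urlopen):
        # The opener is injectable so it can be swapped out in tests.
        self._urlopen = urlopen

    @staticmethod
    def _is_remote(path):
        return urlparse(path).scheme in ("http", "https")

    def size(self, path):
        """Return the file size in bytes without fetching the body."""
        if not self._is_remote(path):
            return os.path.getsize(path)
        request = urllib.request.Request(path, method="HEAD")
        with self._urlopen(request) as response:
            length = response.headers.get("Content-Length")
        if length is None:
            raise OSError(f"server did not report a size for {path}")
        return int(length)

    def read_head(self, path, limit=64 * 1024):
        """Read at most `limit` bytes from the start of the file."""
        if not self._is_remote(path):
            with open(path, "rb") as handle:
                return handle.read(limit)
        # Ask for the first `limit` bytes only; servers that ignore Range
        # send the whole body, so the read is capped as well.
        headers = {"Range": f"bytes=0-{limit - 1}"}
        request = urllib.request.Request(path, headers=headers)
        with self._urlopen(request) as response:
            return response.read(limit)
```

The 64 KiB default mirrors the cap that CWL places on `loadContents`, so reading file contents for expressions never requires fetching a whole remote file.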