Hi everyone,
I’m looking for some clarity on issues I frequently encounter when working with large-scale processing workflows. Here’s the usual scenario:
I have a tool that processes one file at a time, and the workflow scatters it across an array of thousands of input files. A second tool then performs a joint analysis, taking the complete array of output files from the scatter step as input.
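For concreteness, here is a minimal shell sketch of the pattern I mean (`per_file_tool`, `joint_tool`, and the paths are placeholders, not my actual tools):

```bash
# Scatter: run the per-file tool once for each of thousands of inputs.
for f in inputs/*.dat; do
  per_file_tool "$f" > "outputs/$(basename "$f" .dat).out"
done

# Gather: the joint-analysis tool needs the complete set of outputs.
joint_tool outputs/*.out
```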
I’m running into two main problems:
- When the tool doesn’t accept a manifest file: how can I avoid the “argument list too long” error when passing all the file paths directly to the tool (e.g., `tool_name file1 file2 ...` with thousands of files)? See the first sketch below.
- When the tool does accept a manifest file: even then, I sometimes hit the “argument list too long” error at the Docker level, because the command ends up with thousands of `-v` bind mounts. See the second sketch below.
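To make the first failure concrete, this is roughly the shape of the command (`tool_name` and the paths are placeholders):

```bash
# The joint step passes every scattered output as a positional argument.
# With thousands of files the expanded command line exceeds the kernel's
# ARG_MAX limit and the exec fails with "argument list too long".
tool_name /data/shards/*/result.txt

# The ARG_MAX limit on the host, for reference:
getconf ARG_MAX
```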
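And the second failure looks roughly like this (host paths, `my_image`, and `tool_name` are again placeholders):

```bash
# Build the docker run command with one -v bind mount per scattered output.
# With thousands of mounts, the assembled docker command line itself can
# exceed ARG_MAX, even though the tool inside only reads the manifest.
mounts=()
for f in /data/shards/*/result.txt; do
  mounts+=( -v "$f:/inputs/$(basename "$(dirname "$f")")-result.txt" )
done
docker run --rm "${mounts[@]}" \
  -v /data/manifest.txt:/inputs/manifest.txt \
  my_image tool_name --manifest /inputs/manifest.txt
```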
Are there recommended strategies or workarounds for these scenarios? Any advice or best practices would be much appreciated.
Thanks in advance!