CWL is absolutely designed for large batch analysis. However, be aware that multiple platforms can run CWL, and cwltool is intentionally a small-scale, single-node runner. So the limitations of cwltool are not necessarily limitations of CWL in general. If you need to scale to dozens, hundreds, or thousands of compute nodes, you may want to use another CWL platform:
I don’t quite understand how these are different, but what you probably want to do is have your main workflow produce a directory as an output, and have the final output be an array of directories.
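As a rough sketch of that pattern (not your actual workflow): the tool below stands in for your main workflow and emits one Directory per input, and the outer workflow scatters over the inputs so the final output is an array of directories. The names `analyze`, `samples`, `per_sample` and `all_results`, and the `wc -l` command, are placeholders I made up for illustration.

```yaml
cwlVersion: v1.2
$graph:
  # Placeholder for your main workflow: writes its results into a
  # directory named after the input file and returns that directory.
  - id: analyze
    class: CommandLineTool
    baseCommand: [sh, -c]
    arguments:
      # Inside the script, $0 is the input path and $1 is the directory name.
      - 'mkdir "$1" && wc -l "$0" > "$1/count.txt"'
      - $(inputs.sample.path)
      - $(inputs.sample.nameroot)
    inputs:
      sample: File
    outputs:
      results:
        type: Directory
        outputBinding:
          glob: $(inputs.sample.nameroot)

  # Outer workflow: runs the step once per input file and collects
  # the per-sample directories into an array.
  - id: main
    class: Workflow
    requirements:
      ScatterFeatureRequirement: {}
    inputs:
      samples: File[]
    steps:
      per_sample:
        run: "#analyze"
        scatter: sample
        in:
          sample: samples
        out: [results]
    outputs:
      all_results:
        type: Directory[]
        outputSource: per_sample/results
```

Naming each directory after its input keeps the collected directories from colliding when the runner gathers them into one output location. If your main workflow is itself a Workflow, the same scatter step should work with `run:` pointing at it instead of the placeholder tool.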
This might be helpful, specifically Lessons 5 and 6, which discuss running on multiple data sets and collecting the results into directories:
cwltool has a --cachedir option that records intermediate results in a directory you specify, so later runs can reuse them instead of recomputing those steps.
To clean up when you’re done, you can just delete the cache directory. Other runners manage intermediate results in their own ways.