How to configure secondaryFiles to recognize two different patterns for the same file

We have a lot of legacy systems that output files in a naming format like this:

Sample1.bam
Sample1.bai

And we have some other systems that name the same files like this:

Sample1.bam
Sample1.bam.bai

I need to accommodate both naming schemes. I need the CWL input for a .bam file to require either a .bai secondary file or a .bam.bai file. One of the two must be present (but not both).

As per the docs, I had been using this as the input schema for my CWL:

https://www.commonwl.org/v1.2/Workflow.html#WorkflowInputParameter


type: File
secondaryFiles:
  - .bai
  - ^.bai

However, I now realize that this does not work: it requires both Sample1.bai and Sample1.bam.bai to exist.

What I really want is this:

secondaryFiles:
        - ^.bai|.bai

However, this does not work; instead it looks for a file named something like Sample1.bai|.bai.

Is there a way to make this work? It is not possible to “fix” all the .bai files on disk, and being forced to re-index every .bam file in every workflow just because some .bam files have misnamed indexes is becoming ridiculous. Thanks.

Interesting! I guess making both patterns optional isn’t what you want?
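
For reference, making both patterns optional might look roughly like this in CWL v1.1 or later, where each secondaryFiles entry can carry a `required` field. The input name `bam` is just a placeholder. Note that this is looser than what you asked for: it accepts a BAM with either index, both, or none at all.

```yaml
inputs:
  bam:                     # hypothetical input name
    type: File
    secondaryFiles:
      - pattern: .bai      # Sample1.bam -> Sample1.bam.bai
        required: false
      - pattern: ^.bai     # Sample1.bam -> Sample1.bai
        required: false
```

Because neither pattern is required, the tool itself would still have to cope with the index being absent or under either name.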

Maybe an ExpressionTool that takes two separate inputs (BAM + BAI) and outputs a single File with a secondaryFile using your preferred form (.bai or ^.bai)?
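
An untested sketch of that idea, to show the shape rather than a finished tool. The names `bam`, `bai`, and `indexed_bam` are made up, and it assumes your runner stages files under their `basename`, so that changing `basename` presents the index under the `.bam.bai` name without touching anything on disk.

```yaml
cwlVersion: v1.2
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  bam: File                # hypothetical input names
  bai: File                # index under either naming scheme
outputs:
  indexed_bam:
    type: File
    secondaryFiles:
      - .bai               # normalized form: Sample1.bam.bai
expression: |
  ${
    var bam = inputs.bam;
    var bai = inputs.bai;
    // Assumption: downstream staging uses basename, so this acts as a rename.
    bai.basename = bam.basename + ".bai";
    // Attach the index to the BAM as its secondary file.
    bam.secondaryFiles = [bai];
    return {"indexed_bam": bam};
  }
```

Downstream steps could then declare only the `.bai` pattern on their BAM input, and the legacy naming would be handled once at the start of the workflow.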


^.bai is “replace the extension with .bai”.