Preserving /tmp directory structure

Hello again, and please bear with a newbie.

I’m using a step to git clone a Git repository into a specific directory:

cwlVersion: v1.0
class: CommandLineTool
doc: Git cloner
requirements:
  DockerRequirement:
    dockerPull: alpine/git:latest
  InlineJavascriptRequirement: {}
baseCommand: [git, clone]
inputs:
  url:
    doc: URL of repository to clone
    type: string
    inputBinding:
      position: 1
  output_dir:
    doc: Name of directory to clone into
    type: string
    default: null
    inputBinding:
      position: 2
outputs:
  cloned:
    type: Directory
    outputBinding:
      glob: |
        ${
          if (inputs.output_dir == null){
            return inputs.url.split('/').slice(-1)[0].slice(0, -4);
          } else {
            return inputs.output_dir;
          }           
        }
  stderr: stderr
stderr: git_clone-stderr.log

Judging by the cached output (--cachedir CACHE), this seems to work fine in the /tmp directory. For example, when the job YML sets output_dir: /data/git/repo, the step runs git clone <REPO URL> /data/git/repo and cwltool creates .CACHE/<JOB ID>/data/git/repo.

However, when the workflow is run from, say, /home/stephan/workflow, I only get /home/stephan/workflow/repo. What I want is /home/stephan/workflow/data/git/repo instead. How can I do this?

(Sidenote, not sure if it matters, I’m using --singularity with apptainer 1.1.3.)

Btw, it seems this behaviour (v1.0 succeeds, v1.2 fails) bit me again, and the above step only runs with v1.0. Will edit original post.

EDIT: Using cwl-upgrader fixes this, see this solution


To complete the picture, here is the MRE workflow using this step:

cwlVersion: v1.2
class: Workflow

inputs:
  git_url: string
  git_outdir: string

steps:
  download.extract_urls_git:
    run: tools/generic/git_clone.cwl
    in:
      url: git_url
      output_dir: git_outdir
    out:
      - cloned

outputs:
  git_dir:
    type: Directory
    outputSource: download.extract_urls_git/cloned

And pseudo inputs:

git_url: https://github.com/<user>/repository.git
git_outdir: data/raw/repository

If you simplify the outputBinding to be just glob: * then you will get the entire directory structure, and not just the named (sub)-directory you have now.

In CWL, Directories have a name and a listing of files and sub-directories; we don’t preserve the name of the parent directories.
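To illustrate the point above, here is a rough sketch of how the cloned output from the original tool might be represented as a CWL Directory object. The field names come from the CWL Directory schema; the location is taken from the question, and the exact set of fields cwltool prints is an assumption.

```yaml
# Illustrative sketch only, not actual cwltool output.
cloned:
  class: Directory
  # Only the final path component survives as the Directory's name;
  # the parent path data/git/ is not part of the object.
  basename: repo
  location: file:///home/stephan/workflow/repo
  # A listing (if present) holds the directory's own entries,
  # e.g. the .git sub-directory created by git clone.
```

Because the object only carries basename plus contents, cwltool places it directly in the output directory as repo.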


So, in the git tool, like this?

outputs:
  cloned:
    type: Directory
    outputBinding:
      glob: *

This gives me

Tool definition file:///home/stephan/src/tools/generic/git_shallow_clone.cwl failed validation:
tools/generic/git_shallow_clone.cwl:31:13:   while scanning an alias
tools/generic/git_shallow_clone.cwl:31:14:     expected alphabetic or numeric character, but found
                                               '\n'

Will try and debug this.

Ah, should be:

outputs:
  cloned:
    type: Directory
    outputBinding:
      glob: "*"  # in quotes

And also, the input for the directory name should (obviously :blush:) end with a slash:

git_url: https://github.com/<user>/repository.git
git_outdir: data/raw/repository/  # end with a slash!
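As a possible variation on the glob above, here is a minimal sketch that captures only the top-level directory of the clone path rather than everything in the output directory. It assumes output_dir is always set and is a relative path (such as data/raw/repository/), and that InlineJavascriptRequirement is declared as in the original tool. It has not been tested against this workflow.

```yaml
outputs:
  cloned:
    type: Directory
    outputBinding:
      # Assumption: output_dir is relative, e.g. data/raw/repository/
      # The expression evaluates to the first path component ("data"),
      # so the returned Directory is named data and contains raw/repository.
      glob: "$(inputs.output_dir.split('/')[0])"
```

With an absolute path such as /data/git/repo the first component would be an empty string, so this sketch would need adjusting for that case.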