Failed to stage secondary files #442

Open
Shenglai opened this issue Apr 17, 2018 · 1 comment

@Shenglai

Hi all,

I have a workflow that downloads its input files from S3 by UUID (for example, example.gz and example.gz.tbi are each downloaded separately, by their own UUIDs). In later steps, these pre-downloaded files should be staged together as a primary file with its secondary files.

My intention is to use InitialWorkDirRequirement to avoid unnecessary copying of the input files.

Here is my CWL:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0

requirements:
  - class: DockerRequirement
    dockerPull: alpine
  - class: InlineJavascriptRequirement
  - class: InitialWorkDirRequirement
    listing: |
      ${
           var ret = [{"entryname": inputs.parent_file.basename, "entry": inputs.parent_file}];
           for( var i = 0; i < inputs.children.length; i++ ) {
               ret.push({"entryname": inputs.children[i].basename, "entry": inputs.children[i]});
           };
           return ret
       }

class: CommandLineTool

inputs:
  parent_file:
    type: File

  children:
    type: File[]

outputs:
  output:
    type: File
    outputBinding:
      glob: $(inputs.parent_file.basename)
    secondaryFiles: |
      ${
         var ret = [];
         var locbase = self.location.substr(0, self.location.lastIndexOf('/'))
         for( var i = 0; i < inputs.children.length; i++ ) {
           ret.push({"class": "File", "location": locbase + '/' + inputs.children[i].basename});
         }
         return ret
       }

baseCommand: "true"

The output from cwltool engine is:

[job make_secondary.cwl] completed success
{
    "output": {
        "checksum": "sha1$318c739ad52530f8913cc71c2ade57f75b5c4079", 
        "basename": "a", 
        "location": "file:///mnt/benchmark/tmp/a", 
        "secondaryFiles": [
            {
                "checksum": "sha1$49a9cd3ef8381da2b841001fc4f9bba9b9e1fbed", 
                "basename": "b", 
                "location": "file:///mnt/benchmark/tmp/b", 
                "path": "/mnt/benchmark/tmp/b", 
                "class": "File", 
                "size": 10
            }
        ], 
        "path": "/mnt/benchmark/tmp/a", 
        "class": "File", 
        "size": 10
    }
}
Final process status is success

However, the latest Rabix (1.0.5) fails with:

[2018-04-17 20:10:39.425] [INFO] Job root has started
[2018-04-17 20:10:39.594] [INFO] Pulling docker image alpine:latest
[2018-04-17 20:10:40.279] [INFO] Running command line: true
[2018-04-17 20:10:42.311] [ERROR] Failed to execute status command for root. Could not collect outputs.
org.rabix.executor.ExecutorException: Could not collect outputs.
        at org.rabix.executor.handler.impl.JobHandlerImpl.postprocess(JobHandlerImpl.java:318) ~[rabix-cli.jar:na]
        at org.rabix.executor.execution.command.StatusCommand.run(StatusCommand.java:52) ~[rabix-cli.jar:na]
        at org.rabix.executor.execution.JobHandlerCommand.run(JobHandlerCommand.java:51) [rabix-cli.jar:na]
        at org.rabix.executor.execution.JobHandlerRunnable.run(JobHandlerRunnable.java:58) [rabix-cli.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_141]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]
Caused by: org.rabix.bindings.BindingException: org.rabix.bindings.cwl.service.CWLGlobException: Failed to extract outputs.
        at org.rabix.bindings.cwl.CWLProcessor.postprocess(CWLProcessor.java:150) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.postprocess(CWLProcessor.java:156) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLBindings.postprocess(CWLBindings.java:89) ~[rabix-cli.jar:na]
        at org.rabix.executor.handler.impl.JobHandlerImpl.postprocess(JobHandlerImpl.java:290) ~[rabix-cli.jar:na]
        ... 6 common frames omitted
Caused by: org.rabix.bindings.cwl.service.CWLGlobException: Failed to extract outputs.
        at org.rabix.bindings.cwl.CWLProcessor.globFiles(CWLProcessor.java:388) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.collectOutput(CWLProcessor.java:312) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.collectOutputs(CWLProcessor.java:174) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.postprocess(CWLProcessor.java:146) ~[rabix-cli.jar:na]
        ... 9 common frames omitted
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.String
        at org.rabix.bindings.cwl.CWLProcessor.getSecondaryFiles(CWLProcessor.java:459) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.formFileValue(CWLProcessor.java:400) ~[rabix-cli.jar:na]
        at org.rabix.bindings.cwl.CWLProcessor.globFiles(CWLProcessor.java:386) ~[rabix-cli.jar:na]
        ... 12 common frames omitted
[2018-04-17 20:10:42.311] [INFO] Failed to execute status command for root. Could not collect outputs.
Failed to execute status command for root. Could not collect outputs.
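
The innermost exception (`java.util.HashMap cannot be cast to java.lang.String`, raised in `getSecondaryFiles`) suggests that Rabix expected each entry returned by the `secondaryFiles` expression to be a string, whereas the expression returns File objects. As a minimal sketch, the expression body can be run outside any CWL engine to inspect the shape of its result. The `primary` and `mockInputs` values are mock data modelled on the cwltool output above, and `secondaryFilesExpr` is a hypothetical wrapper name, not part of either engine.

```javascript
// Body of the output's secondaryFiles expression, wrapped in a plain function
// so it can be run with Node.js. In CWL, `self` is the primary output File.
function secondaryFilesExpr(self, inputs) {
  var ret = [];
  var locbase = self.location.substr(0, self.location.lastIndexOf('/'));
  for (var i = 0; i < inputs.children.length; i++) {
    ret.push({"class": "File", "location": locbase + '/' + inputs.children[i].basename});
  }
  return ret;
}

// Mock values (assumptions) modelled on the cwltool run shown above.
var primary = {"class": "File", "basename": "a", "location": "file:///mnt/benchmark/tmp/a"};
var mockInputs = {"children": [{"class": "File", "basename": "b"}]};

var result = secondaryFilesExpr(primary, mockInputs);
console.log(JSON.stringify(result)); // [{"class":"File","location":"file:///mnt/benchmark/tmp/b"}]
console.log(typeof result[0]);       // object
```

Each element is an object carrying only `class` and `location`, which an engine would plausibly deserialize as a map rather than a string.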

Has this already been noticed, and is there a workaround for my case? Thank you very much in advance.

@kinow

kinow commented Jun 13, 2022

The example above fails for me with the latest cwltool.

kinow@ranma:/tmp/bunny-1.0.6$ mkdir /tmp/cwl
kinow@ranma:/tmp/bunny-1.0.6$ touch /tmp/cwl/a
kinow@ranma:/tmp/bunny-1.0.6$ touch /tmp/cwl/b
(venv) kinow@ranma:~/Development/python/workspace/cwl-v1.2$ cwltool /tmp/make_secondary.cwl --parent_file /tmp/cwl/a --children /tmp/cwl/b
INFO /home/kinow/Development/python/workspace/cwl-v1.2/venv/bin/cwltool 3.1.20220502060230
INFO Resolved '/tmp/make_secondary.cwl' to 'file:///tmp/make_secondary.cwl'
INFO [job make_secondary.cwl] /tmp/fk6y5rac$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/fk6y5rac,target=/jKWVxj \
    --mount=type=bind,source=/tmp/v6sssram,target=/tmp \
    --mount=type=bind,source=/tmp/cwl/a,target=/jKWVxj/a,readonly \
    --mount=type=bind,source=/tmp/cwl/b,target=/jKWVxj/b,readonly \
    --workdir=/jKWVxj \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/tg7_zeg5/20220613143313-613169.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/jKWVxj \
    alpine \
    true
INFO [job make_secondary.cwl] Max memory used: 0MiB
ERROR [job make_secondary.cwl] Job error:
("Error collecting output for parameter 'output': ../../../../../../tmp/make_secondary.cwl:33:5: 'path'", {})
WARNING [job make_secondary.cwl] completed permanentFail
{}
WARNING Final process status is permanentFail

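The problem appears to be in the secondaryFiles expression for the output: the error names 'path', and the File objects that expression returns set only class and location.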
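
One possible workaround, not verified against either engine, is to have the expression return plain filename strings. As I read the CWL v1.0 specification, a `secondaryFiles` expression may return a filename string relative to the primary file, a File or Directory object, or an array of these, so basenames should be acceptable when the children are staged next to the parent file. The sketch below wraps such an expression body in a hypothetical function, `secondaryFileNames`, with mock inputs so it can be run under Node.js.

```javascript
// Hypothetical alternative body for the output's secondaryFiles expression:
// return basenames (strings) instead of File objects.
function secondaryFileNames(self, inputs) {
  return inputs.children.map(function (child) {
    return child.basename;
  });
}

// Mock inputs (assumptions) matching the run above: one child named "b".
var exampleInputs = {"children": [{"class": "File", "basename": "b"}]};
var names = secondaryFileNames({"class": "File", "basename": "a"}, exampleInputs);
console.log(JSON.stringify(names)); // prints ["b"]
```

Inside the tool description, the same logic would replace the body of the existing `${ ... }` block under `secondaryFiles`. Whether Rabix or cwltool then collects the outputs correctly remains to be tested.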
