Support path in input URI #71

Closed
wosmanitx opened this issue Mar 16, 2023 · 22 comments · Fixed by #178

@wosmanitx

such as:

quilt+s3://interline-quiltdemo#package=WDR5/EXP22000894@ed6ebf851478cf665ed435e0d718e78c9d519fd461717f0669c9538527f095f7&path=cf_out%2FO43353-432-523__O43353-432-523_relaxed_rank_1_model_2.pdb

@drernie changed the title from "Support Path in input URI" to "Download only path specified in input URI" on Mar 16, 2023
@drernie
Member

drernie commented Mar 16, 2023

Okay, the awkward truth is this "always" works because Quilt automatically downloads the entire package ahead of time.

Let me rename this to: "Download only path specified in input URI", as the goal is to NOT download more than necessary. I believe the desired result is: "only download files that match this path prefix".
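
To make the "path" part of the request concrete, here is a minimal sketch of how such a URI could be taken apart. It is not the nf-quilt parser (that one is Groovy); `parse_quilt_uri` is a hypothetical helper, and it assumes the fragment can be read like a query string with `package` and `path` keys, as in the URIs quoted in this thread.

```python
from urllib.parse import urlsplit, parse_qs


def parse_quilt_uri(uri):
    """Split a quilt+s3 URI into bucket, package, hash, and path.

    Hypothetical helper for illustration; not part of nf-quilt.
    """
    parts = urlsplit(uri)
    if parts.scheme != "quilt+s3":
        raise ValueError(f"not a quilt+s3 URI: {uri}")
    # Assumption: the fragment is shaped like "package=...&path=...".
    # parse_qs also percent-decodes values, so "%2f" becomes "/".
    fragment = parse_qs(parts.fragment, keep_blank_values=True)
    package = fragment.get("package", [""])[0]
    # An optional "@<hash>" suffix pins a specific package revision.
    name, _, top_hash = package.partition("@")
    path = fragment.get("path", [""])[0]
    return {
        "bucket": parts.netloc,
        "package": name,
        "hash": top_hash or None,
        "path": path or None,
    }


info = parse_quilt_uri(
    "quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md"
)
print(info["bucket"], info["package"], info["path"])
```

One caveat of this shortcut: `parse_qs` also turns "+" into a space, which may or may not be what a real parser should do.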

@drernie
Member

drernie commented Mar 16, 2023

Compare:

  • File: quilt+s3://allencell#package=aics/average_morphed_cell_dataset_1@&path=img_path%2F01fc13e7f1fc03ee3a71aa4182db4ac680bf4a2465952d291836ae1a77521377_Invdividual_SM2_4d_NUP153.tif
  • Folder: quilt+s3://allencell#package=aics/average_morphed_cell_dataset_1@&path=cell_seg_path%2F

Interesting. Folder URIs always have a trailing slash, but nothing has a leading slash.

We could be smart: match exactly if the path has no trailing slash, and match by prefix if it does.
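
A minimal sketch of that rule, assuming the package contents are available as a flat list of logical keys. `select_keys` and the sample keys are made up for illustration; this is not how nf-quilt implements it.

```python
def select_keys(keys, path):
    """Return the logical keys selected by a URI path (hypothetical rule).

    A trailing slash means "folder": match every key under that prefix.
    No trailing slash means "file": match that key exactly.
    """
    if path.endswith("/"):
        return [key for key in keys if key.startswith(path)]
    return [key for key in keys if key == path]


# Made-up package listing, shaped like the URIs above (no leading slash).
keys = [
    "README.md",
    "cell_seg_path/a.tif",
    "cell_seg_path/b.tif",
    "cell_seg_path_extra/c.tif",
]

print(select_keys(keys, "cell_seg_path/"))  # the two files in the folder
print(select_keys(keys, "README.md"))       # exactly one file
print(select_keys(keys, "cell_seg_path"))   # no slash, no exact match: empty
```

Requiring the trailing slash for prefix matching keeps "cell_seg_path/" from accidentally selecting "cell_seg_path_extra/".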

@drernie
Member

drernie commented Mar 16, 2023

I could pair this with #72 to create a package with only a single file, as a test.

@drernie
Member

drernie commented Mar 17, 2023

Similarly, a path in an output package URI makes it easy to store the results of a run in its own subdirectory.

@drernie
Member

drernie commented May 22, 2023

I wonder if I was wrong, and this actually never works correctly. Need to test...

@drernie changed the title from "Download only path specified in input URI" to "Support path specified in input URI" on May 22, 2023
@drernie
Member

drernie commented May 22, 2023

May-22 17:05:38.562 [main] DEBUG nextflow.quilt.nio.QuiltPath - Creating QuiltPath: interline-proteomics-analysis?Application=Enceladus&Author=Bianca&Comments=This+a+Package+imported+via+nextflow+quilt+plugin&Date=2023-03-16&Group=Bioinformatics&Program=SLC15A4#package=EDL%2fMSigDB_v7-5@cb598c1eb34c51559050a145efbedec696910caf074bd35cee180f33663d6946&path=raw%2fc2.cp.reactome.v7.5.symbols.gmt
May-22 17:05:38.564 [main] DEBUG nextflow.quilt.nio.QuiltFileSystem - QuiltFileSystem.getPath`[./]: []
May-22 17:05:38.564 [main] DEBUG nextflow.quilt.jep.QuiltParser - forURI[quilt+s3] for quilt+s3://./
May-22 17:05:38.565 [main] DEBUG nextflow.quilt.nio.QuiltPath - Creating QuiltPath: .
May-22 17:05:38.575 [main] DEBUG n.quilt.nio.QuiltFileSystemProvider - <A>BasicFileAttributes QuiltFileSystemProvider.readAttributes()
May-22 17:05:38.576 [main] DEBUG nextflow.quilt.nio.QuiltFileSystem - QuiltFileAttributes QuiltFileSystem.readAttributes(.)
May-22 17:05:38.576 [main] DEBUG nextflow.quilt.nio.QuiltPath - isAbsolute[null]
May-22 17:05:38.578 [main] DEBUG nextflow.Session - Session aborted -- Cause: Cannot invoke "nextflow.quilt.jep.QuiltPackage.packageDest()" because the return value of "nextflow.quilt.nio.QuiltPath.pkg()" is null
May-22 17:05:38.606 [main] ERROR nextflow.cli.Launcher - @unknown
java.lang.NullPointerException: Cannot invoke "nextflow.quilt.jep.QuiltPackage.packageDest()" because the return value of "nextflow.quilt.nio.QuiltPath.pkg()" is null
	at nextflow.quilt.nio.QuiltPath.localPath(QuiltPath.groovy:75)

@drernie self-assigned this on May 22, 2023
@drernie
Member

drernie commented Jul 25, 2023

@wosmanitx Is this actually (still) a problem, or is it currently working for you?

@wosmanitx
Author

resolved

@drernie
Member

drernie commented Jan 19, 2024

Re-opening. Have a reproducible failure.

@drernie reopened this on Jan 19, 2024
@drernie
Member

drernie commented Jan 24, 2024

Sigh. Unit test success. Integration test fails. Is CHECK_INPUT doing something new?
Tried running an older version, but it failed:

N E X T F L O W  ~  version 23.04.3
ERROR ~ Unable to parse config file: '/Users/ernest/GitHub/nf-quilt/nextflow.config'

  Compile failed for sources FixedSetSources[name='/groovy/script/Script775389F485D1318E2BBF21EE907E77EB/_nf_config_30789506']. Cause: BUG! exception in phase 'semantic analysis' in source unit '/groovy/script/Script775389F485D1318E2BBF21EE907E77EB/_nf_config_30789506' Unsupported class file major version 65
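
A note on reading that error: the "major version" is the JVM class-file version, which has gone up by one per Java release since Java 1.2 (46). A quick sketch of the arithmetic, with a hypothetical helper name:

```python
def java_release_for_class_major(major):
    """Map a class-file major version to the Java release that emits it.

    Java 1.1 class files use major version 45, and each later release
    adds one (Java 8 is 52, Java 17 is 61), so release = major - 44.
    """
    if major < 45:
        raise ValueError(f"unexpected class-file major version: {major}")
    return major - 44


print(java_release_for_class_major(65))  # 21
```

So version 65 means Java 21 class files. That suggests the older Nextflow was started on a Java 21 JDK that its bundled Groovy could not read, and that retrying under an older JDK may be worth a try. Which JDKs 23.04.3 supports is not stated in this thread.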

@drernie
Member

drernie commented Jan 24, 2024

Jan-24 14:10:30.916 [Actor Thread 7] DEBUG nextflow.quilt.nio.QuiltFileSystem - No attributes yet for: /var/folders/tz/8q322ht10qzf9pswh01zv6880000gp/T/QuiltPackage11603948986612361183/QuiltPackage.quilt_example_examples_smart_report/README.md
Jan-24 14:10:30.918 [Actor Thread 7] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md -- Cause: java.nio.file.NoSuchFileException: quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md
Jan-24 14:10:30.922 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file s3://quilt-example/examples/smart-report/README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-a752a1d5-2cf0-4fe4-8e45-537b2649578b/ba/4ea9cf52fa961a34bf0f9a2941ec06/README.md
Jan-24 14:10:30.922 [FileTransfer-2] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-a752a1d5-2cf0-4fe4-8e45-537b2649578b/75/c2862e9e5eafee01370edef3769628/quilt-example#package=examples%2fsmart-report&path=README.md
Jan-24 14:10:30.924 [FileTransfer-2] DEBUG nextflow.quilt.nio.QuiltFileSystem - No attributes yet for: /var/folders/tz/8q322ht10qzf9pswh01zv6880000gp/T/QuiltPackage11603948986612361183/QuiltPackage.quilt_example_examples_smart_report/README.md
Jan-24 14:10:30.929 [Actor Thread 7] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=CHECK_INPUT (3); work-dir=null
  error [nextflow.exception.ProcessStageException]: Can't stage file quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md -- file does not exist

@drernie
Member

drernie commented Jan 24, 2024

Ah! Maybe this is because I am not always auto-loading the package.
I do that explicitly in the unit test, after all.

UPDATE: yes, that file is now downloaded before the "cp" -- but the "cp" still fails.

@drernie
Member

drernie commented Jan 26, 2024

Okay, this is weird. Is the filename just escaped wrongly?

work_dir % ls -a
...
.command.sh
.exitcode
quilt-example#package=examples%2fhurdat2&path=README.md
work_dir % cat .command.sh
#!/bin/bash -ue
cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/
work_dir % sh .command.sh
cp: quilt-example#package=examples%2fhurdat2&path=README.md: No such file or directory
work_dir % 

Or is something more subtle happening?
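
One way to sanity-check the escaping question without a shell is to tokenize the command with Python's `shlex`, which approximates POSIX word splitting (it is not bash itself, so treat this as a rough check only):

```python
import shlex

# The command line exactly as it appears in .command.sh above.
command = r"cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/"

# The file name exactly as "ls -a" printed it.
listed_name = "quilt-example#package=examples%2fhurdat2&path=README.md"

tokens = shlex.split(command)
print(tokens[1] == listed_name)  # True
```

Under this approximation the escaped argument decodes to the same name that "ls" shows, so the "\&" escape by itself does not look wrong. The check says nothing about whether the directory entry is readable (for example, whether it is a link whose target exists).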

@drernie
Member

drernie commented Jan 26, 2024

Okay, the structural issue is that Nextflow implicitly (and understandably) assumes that the part after the last "/" is the filename. But our URI ends in a complex fragment, which is NOT the simple 'README.md' we expect.

So we need to supplement (since we can't replace):

cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/

With

cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/README.md

Will this work in general? Heck if I know, but it is worth a shot...

@drernie
Member

drernie commented Jan 26, 2024

Nope. The problem is that the filename assumption is deeply hardcoded in Nextflow, and it copies those files all over the place. :-(

That implies we can try running the code and ask for "quilt-example#package=examples%2fhurdat2&path=README.md" but boy is that ugly. Still should check if it works, though...

@drernie
Member

drernie commented Jan 26, 2024

Doh. So, the real issue is simply that:
path 'quilt+s3://quilt-example#package=examples/hurdat2&path=README.md'
sets $input to quilt-example#package=examples/hurdat2&path=README.md
which you can fix via:

    if [ "$input" != "README.md" ]; then
        cp -f $input README.md
    fi

Of course it would be nice to avoid that, but I'm not sure how easy it is to munge path. Will look...

@drernie
Member

drernie commented Jan 26, 2024

Ah. This must be a "filename" method on QuiltPath that is doing something naive (and different than we did in Python). Let me see if I can isolate that...
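
To illustrate the difference between the two behaviours (in Python rather than the plugin's Groovy), here is a rough sketch. Both function names are hypothetical, and neither is the actual QuiltPath code: the first takes everything after the last "/", the second prefers the `path` value from the fragment when there is one.

```python
from posixpath import basename
from urllib.parse import parse_qs


def naive_file_name(uri):
    """Everything after the last "/", which is what a generic path would do."""
    return basename(uri)


def path_aware_file_name(uri):
    """Prefer the last segment of the fragment's path=... value, if present."""
    _, _, fragment = uri.partition("#")
    # Assumption: the fragment is shaped like "package=...&path=...".
    path = parse_qs(fragment).get("path", [""])[0]
    if path:
        # Drop a trailing slash so folder paths yield the folder name.
        return basename(path.rstrip("/"))
    return naive_file_name(uri)


uri = "quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md"
print(naive_file_name(uri))       # quilt-example#package=examples%2fhurdat2&path=README.md
print(path_aware_file_name(uri))  # README.md
```

The naive result matches the odd file name staged in the work directory earlier in this thread. The path-aware result is the name the process script actually expects.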

@drernie
Member

drernie commented Jan 29, 2024

Released 0.7.7, so "make path-input" now passes. At least for me:

Jan-29 15:05:36.133 [FileTransfer-2] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-f7373f44-164a-4c11-aaea-a6ac94dbdd44/0d/34908bc4a4b5ad963327d73c8f3625/README.md
Jan-29 15:05:36.133 [FileTransfer-2] INFO  nextflow.quilt.jep.QuiltPackage - installing examples/hurdat2 from quilt-example...

But not for the customer. Odd.

@drernie changed the title from "Support path specified in input URI" to "Support path in input URI" on Jan 29, 2024
@drernie
Member

drernie commented Jan 29, 2024

Weird. It looks like it is installing, but it is not completing and/or returning an error.
And anyway, the customer does not even start installing, as far as I can tell, so this may be a totally different issue...

Jan-29 15:45:50.736 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://nf-core-gallery#package=core%2fhic&path=README_NF_QUILT.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-f8e57909-5165-465d-a4ce-94253b04243d/a1/d272420394c9647df596cb984fcc3a/README_NF_QUILT.md
Jan-29 15:45:50.736 [FileTransfer-1] INFO  nextflow.quilt.jep.QuiltPackage - installing core/hic from nf-core-gallery...
Jan-29 15:45:50.824 [FileTransfer-2] DEBUG n.cloud.aws.nio.S3FileSystemProvider - S3 download file from=s3://nf-core-gallery/nf-core/hic/README_NF_QUILT.md to=/Users/ernest/GitHub/nf-quilt/work/stage-f8e57909-5165-465d-a4ce-94253b04243d/79/4b7b3d2efb6e7985ac38d01c1014d6/README_NF_QUILT.md
Jan-29 15:45:50.824 [FileTransfer-2] DEBUG nextflow.cloud.aws.nio.S3Client - Creating S3 transfer manager pool - chunk-size=104857600; max-treads=10;
Jan-29 15:45:51.173 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-29 15:45:51.174 [Task submitter] INFO  nextflow.Session - [7e/3ddda6] Submitted process > CHECK_INPUT (1)
Jan-29 15:45:51.212 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: CHECK_INPUT (1); status: COMPLETED; exit: 0; error: -; workDir: /Users/ernest/GitHub/nf-quilt/work/7e/3ddda636fab9cf700d189a697d4fc3]
Jan-29 15:45:51.610 [FileTransfer-1] ERROR nextflow.quilt.jep.QuiltPackage - failed to install core/hic
Jan-29 15:45:51.616 [Actor Thread 5] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=CHECK_INPUT (2); work-dir=null
  error [nextflow.exception.ProcessStageException]: Can't stage file quilt+s3://nf-core-gallery#package=core%2fhic&path=README_NF_QUILT.md -- file does not exist
Jan-29 15:45:51.626 [Actor Thread 5] ERROR nextflow.processor.TaskProcessor - Error executing process > 'CHECK_INPUT (2)'

@drernie linked a pull request on Jan 30, 2024 that will close this issue
@drernie
Member

drernie commented Jan 30, 2024

Current Hypothesis: TransferAware is a new feature, not supported in 23.10, so nf-quilt does not auto-install the package.
Will force install in 0.7.9

@drernie
Member

drernie commented Jan 30, 2024

NOTE: seems to install in Tower, but errors out with (hopefully irrelevant):

Jan-30 04:24:05.336 [Actor Thread 3] DEBUG i.s.wave.plugin.config.WaveConfig - Wave strategy not specified - using default: [container, dockerfile, conda, spack]

@drernie
Member

drernie commented Jan 30, 2024

Works!
