overriding rpath filename when downloading #51

Open
grlloyd opened this issue Feb 22, 2024 · 3 comments

Comments

@grlloyd

grlloyd commented Feb 22, 2024

Web resources are currently downloaded to rpath, which is constructed by combining a unique id (if requested) with the file name extracted from the URL. However, some URLs don't include a file name, e.g.:

src = 'https://pubchem.ncbi.nlm.nih.gov/sdq/sdqagent.cgi?infmt=json&outfmt=csv&query={%22download%22:%22*%22,%22collection%22:%22pathway%22,%22order%22:[%22relevancescore,desc%22],%22start%22:1,%22limit%22:10000000,%22downloadfilename%22:%22PubChem_pathway_text_Reactome%22,%22where%22:{%22ands%22:[{%22*%22:%22Reactome%22},{%22source%22:%22Reactome%22}]}}'

In this case the URL contains JSON, so I think the download fails because the file name generated for rpath isn't valid. More generally, any URL that returns a file but doesn't end in a file name could produce an unwieldy file name in the cache folder.
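To illustrate the failure mode, here is a minimal Python sketch. It assumes, without checking BiocFileCache's actual R code, that the cache name is simply everything after the last "/" of the raw URL, optionally prefixed with the unique id; `naive_cache_name` and `invalid_chars` are hypothetical helpers. With a PubChem-style query URL, the resulting name keeps the whole query string, including characters that Windows rejects in file names.

```python
# Characters that Windows does not allow in file names.
WINDOWS_INVALID = set('<>:"/\\|?*')


def naive_cache_name(url, unique_id=None):
    """Hypothetical rpath-style name: the text after the last '/' of the URL,
    optionally prefixed with a unique id (an assumption, not BiocFileCache's code)."""
    name = url.rsplit("/", 1)[-1]
    return f"{unique_id}_{name}" if unique_id else name


def invalid_chars(name):
    """Return the set of Windows-invalid characters that appear in name."""
    return {c for c in name if c in WINDOWS_INVALID}


# Shortened version of the PubChem URL from the issue.
url = ("https://pubchem.ncbi.nlm.nih.gov/sdq/sdqagent.cgi"
       "?infmt=json&outfmt=csv&query={%22download%22:%22*%22}")

name = naive_cache_name(url, unique_id="1a2b")
print(name)
print(sorted(invalid_chars(name)))  # ['*', ':', '?']
```

Because the query string contains no "/", the whole of it ends up in the name, which is why a URL with no trailing file name produces such an awkward cache entry.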

I tried to work around this by using bfcupdate to change rpath before downloading, but that fails because bfcupdate changes the rtype to "local".

One option would be to add an argument to bfcadd that lets the user override the default file name used for rpath, e.g. rpath_filename = "new_filename.xyz", and construct rpath from that instead of extracting it from the URL.
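The proposed override could behave roughly as in this Python sketch. `build_rpath` and its `rpath_filename` parameter are hypothetical (the parameter mirrors the suggestion above and is not an existing bfcadd argument), and the fallback assumes the URL-derived name is the text after the last "/".

```python
import os


def build_rpath(cache_dir, url, unique_id=None, rpath_filename=None):
    """Hypothetical sketch of the proposed behaviour.

    If rpath_filename is supplied it is used as the file name (basename only,
    so a user-supplied value cannot escape cache_dir); otherwise the name falls
    back to the text after the last '/' of the URL. A unique id, if given, is
    prefixed as '<id>_<name>'.
    """
    if rpath_filename:
        name = os.path.basename(rpath_filename)
    else:
        name = url.rsplit("/", 1)[-1]
    if unique_id:
        name = f"{unique_id}_{name}"
    return os.path.join(cache_dir, name)


url = "https://pubchem.ncbi.nlm.nih.gov/sdq/sdqagent.cgi?infmt=json&outfmt=csv"
print(build_rpath("cache", url, unique_id="1a2b",
                  rpath_filename="PubChem_pathway_text_Reactome.csv"))
```

Keeping the URL-derived name as the default means existing cache entries would be unaffected; only callers that pass the new argument would see different rpaths.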

Alternatively, you could try to extract the intended file name from the httr::GET response, if there is one.
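When a server intends a particular file name, it usually sends it in a Content-Disposition header. The Python sketch below parses that header with the standard library's MIME parser; `filename_from_content_disposition` is a hypothetical helper, and whether PubChem's sdqagent.cgi actually sends this header for the URL above is an unverified assumption. In R, the corresponding header would come from the httr::GET response (for example via httr::headers()).

```python
import os
from email.message import Message


def filename_from_content_disposition(header):
    """Return the file name from a Content-Disposition header value, or None.

    Uses the standard library's MIME header parser, which handles quoted
    parameter values. os.path.basename strips any directory components a
    server might include, so the name cannot point outside the cache folder.
    """
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename()
    return os.path.basename(name) if name else None


print(filename_from_content_disposition(
    'attachment; filename="PubChem_pathway_text_Reactome.csv"'))
```

If the header is absent or has no filename parameter, the helper returns None, so a caller could fall back to the URL-derived name.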

Is there a workaround for this that doesn't need an update to BiocFileCache?

@lshep
Contributor

lshep commented Feb 26, 2024

I don't think there is a workaround for this right now. We didn't think of this situation when we originally designed its behavior, so we would have to update the BiocFileCache code.

@grlloyd
Author

grlloyd commented Feb 28, 2024

I've found that using a service like TinyURL to shorten the URL seems to be a viable workaround in my case.

@lshep
Contributor

lshep commented Feb 28, 2024

Good to know. We are still deciding on the appropriate way to adjust the code, and will look to make changes to BiocFileCache shortly.
