Investigate globus file size storage #3231

Closed
peetucket opened this issue Jul 11, 2023 · 6 comments
Labels
2023 Summer 2023 workcycle

Comments

@peetucket
Member

peetucket commented Jul 11, 2023

Three questions to look at:

  • Do all items deposited through Globus have no file sizes in the byte_size column of the attached files table? Or is it just more recent Globus deposits?
  • What happens with "mixed deposits" (i.e. start with Globus, then add files via upload in V2)? Do some files have file sizes but not others (assume yes)?
  • Verify the reason we do not store file sizes for Globus files (assumption is performance was slow, but do some archaeology to verify this).

The answers to these questions will determine whether or not we are able to add file sizes to Globus objects. They will also help us with #3230.

@peetucket
Member Author

peetucket commented Jul 12, 2023

Since its first commit, app/jobs/fetch_globus_job.rb has created attached files with a byte_size of 0:

4aff429#diff-c757f9bcc510ecd347cc838b2a41e331e84e68b98642b04d20350f9769bd4ffeR29-R39

Not clear from the code why though.

@peetucket peetucket removed their assignment Jul 13, 2023
@peetucket
Member Author

Check Globus API to see if we can get file sizes along with the file names?

@lwrubel
Contributor

lwrubel commented Jul 14, 2023

We are using the API to get the filesizes along with the filenames when using the list_files public method.

https://github.com/sul-dlss/globus_client/blob/5de26db0e52cb7b089a3aec48b2403409fdf82af/lib/globus_client/endpoint.rb#L125
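As a rough illustration of what using the sizes could look like, here is a minimal sketch that maps listing entries to attached-file attributes and keeps the size rather than hard-coding 0. The `FileInfo` struct, the `attached_file_attributes` helper, the `prefix:` argument, and the sample paths are all assumptions made for this example; they are stand-ins and not the actual globus_client or application code.

```ruby
# Stand-in for the entries a file listing returns. The real client's
# return type may differ (assumption for illustration).
FileInfo = Struct.new(:name, :size)

# Hypothetical helper: build attribute hashes for attached files,
# carrying over the size reported by the listing instead of 0.
def attached_file_attributes(file_infos, prefix:)
  file_infos.map do |info|
    {
      # Store the path relative to the deposit directory.
      path: info.name.delete_prefix(prefix),
      byte_size: info.size
    }
  end
end

# Made-up sample listing.
listing = [
  FileInfo.new('/uploads/work1/data.csv', 2048),
  FileInfo.new('/uploads/work1/images/plot.png', 51_200)
]

attrs = attached_file_attributes(listing, prefix: '/uploads/work1/')
attrs.each { |a| puts "#{a[:path]}: #{a[:byte_size]} bytes" }
```

If the listing entries already expose a name and a size, the change in the job is probably small: pass the size through wherever the attached file is built.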

@lwrubel
Contributor

lwrubel commented Jul 14, 2023

  1. Do all items deposited through Globus have no file sizes in the byte_size column of the attached files table? Or is it just more recent Globus deposits?

@edsu looked at this and noted that the globus_client get_filenames method is used for creating AttachedFiles. It only returns filenames: https://stanfordlib.slack.com/archives/C69UZCJ8M/p1689082395205739?thread_ts=1689028505.499099&cid=C69UZCJ8M However, it is built on an underlying method that does get the file sizes from the API, so there is another method in the client we could be using (see previous comment).

  2. What happens with "mixed deposits" (i.e. start with Globus, then add files via upload in V2)? Do some files have file sizes but not others (assume yes)?

@edsu and I looked at a "mixed deposit" and saw that only files uploaded via the browser have filesizes. https://stanfordlib.slack.com/archives/C69UZCJ8M/p1689100415282479?thread_ts=1689098275.906289&cid=C69UZCJ8M
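For anyone who wants to repeat this check on another deposit, a minimal sketch of the idea follows. It assumes that a byte_size of 0 (or nil) means no size was recorded, as the earlier comment about fetch_globus_job suggests. The `AttachedFile` struct, the `split_by_recorded_size` helper, and the sample paths are hypothetical stand-ins, not the application's model.

```ruby
# Hypothetical stand-in for an attached file record.
AttachedFile = Struct.new(:path, :byte_size)

# Split files into those with a recorded size and those without.
# A nil byte_size is treated the same as 0 (nil.to_i is 0).
def split_by_recorded_size(files)
  files.partition { |f| f.byte_size.to_i.positive? }
end

# Made-up sample of a mixed deposit.
files = [
  AttachedFile.new('globus/data.csv', 0),
  AttachedFile.new('globus/readme.txt', 0),
  AttachedFile.new('browser/notes.pdf', 10_240)
]

with_size, without_size = split_by_recorded_size(files)
puts "with size: #{with_size.map(&:path).join(', ')}"
puts "without size: #{without_size.map(&:path).join(', ')}"
```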

@edsu
Contributor

edsu commented Jul 14, 2023

Thanks for leaving those notes about what we found @lwrubel. I think we can close this and move #3230 into ready?

@peetucket
Member Author

Answered in other tickets. We have access to file sizes for all Globus files and will make use of them now.
