Create a cloud space for all the data #407

Closed
jwestw opened this issue Mar 17, 2023 · 3 comments · Fixed by #415

@jwestw
Contributor

jwestw commented Mar 17, 2023

Probably a Google bucket with read-only access

Deliverables:

  • Create a Google bucket
  • Change the data source to the bucket (mount the bucket as a drive)
  • Test that the data loads
@jwestw jwestw added the Data label Mar 17, 2023
@jwestw jwestw self-assigned this Mar 31, 2023
@james-westwood james-westwood linked a pull request Apr 14, 2023 that will close this issue
19 tasks
@jwestw jwestw mentioned this issue Apr 14, 2023
19 tasks
@jwestw
Contributor Author

jwestw commented Jul 7, 2023

See https://github.com/ONSdigital/research-and-development/blob/develop/src/pipeline.py for inspiration on how to make imports conditional on a config setting that indicates the choice of environment. We may want to do the same for imports, but the main thing to focus on is creating paths (local) vs. signed URLs (cloud).
Then, hopefully, most of our existing file-read functions will accept the signed URL as a "path", just as pandas' read functions (e.g. read_csv) accept a URL in place of a file path.

We want to avoid lots of if/else statements throughout the code if possible.
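
One way to keep the if/else in a single place is to choose a "locator" function once, from config, and have the rest of the code call it. The following is only a rough sketch under assumptions: `StorageConfig`, `make_locator` and `make_gcs_signer` are hypothetical names that do not come from this repo, and the config shape is invented for illustration. The cloud branch assumes the google-cloud-storage client and credentials that are able to sign URLs.

```python
from dataclasses import dataclass
from datetime import timedelta
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical config shape: one setting selects the environment."""
    environment: str          # "local" or "cloud"
    local_root: str = "data"  # folder used when environment == "local"
    bucket_name: str = ""     # bucket used when environment == "cloud"


def make_gcs_signer(bucket_name: str, minutes: int = 15) -> Callable[[str], str]:
    """Return a function that maps a blob name to a time-limited signed URL."""
    # Conditional import: the GCP client is only needed in the cloud environment.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)

    def sign(blob_name: str) -> str:
        return bucket.blob(blob_name).generate_signed_url(
            version="v4", expiration=timedelta(minutes=minutes), method="GET"
        )

    return sign


def make_locator(
    config: StorageConfig, signer: Optional[Callable[[str], str]] = None
) -> Callable[[str], str]:
    """Pick the path-building strategy once, so callers need no if/else."""
    if config.environment == "local":
        root = Path(config.local_root)
        return lambda relative: str(root / relative)
    if config.environment == "cloud":
        # A signer can be injected (e.g. in tests) instead of calling GCP.
        return signer or make_gcs_signer(config.bucket_name)
    raise ValueError(f"Unknown environment: {config.environment!r}")
```

Callers would then do something like `locate = make_locator(config)` once at start-up and `pd.read_csv(locate("stops.csv"))` wherever a file is read (the file name here is only an example). Injecting the signer keeps the sketch testable without network access.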

@jwestw
Contributor Author

jwestw commented Jul 7, 2023

We may want to write to the bucket. If we do, we need two things:

  • A function in the GcpBucket class that writes to the bucket
  • Permission for the service account to write (this may already exist)
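
A write method could look roughly like the sketch below. The interface of the real GcpBucket class is not shown in this thread, so `GcpBucketSketch`, `from_name` and `write_text` are hypothetical names used only for illustration. It assumes the google-cloud-storage client, and that the service account holds a role that allows object creation (for example Storage Object Creator).

```python
class GcpBucketSketch:
    """Hypothetical stand-in for the GcpBucket class; not the repo's actual code."""

    def __init__(self, bucket):
        # `bucket` is expected to behave like google.cloud.storage.Bucket.
        self._bucket = bucket

    @classmethod
    def from_name(cls, bucket_name: str) -> "GcpBucketSketch":
        # Imported here so the module still loads where the GCP client is absent.
        from google.cloud import storage

        return cls(storage.Client().bucket(bucket_name))

    def write_text(
        self, blob_name: str, text: str, content_type: str = "text/csv"
    ) -> str:
        """Upload `text` to `blob_name` and return its gs:// URI."""
        blob = self._bucket.blob(blob_name)
        blob.upload_from_string(text, content_type=content_type)
        return f"gs://{self._bucket.name}/{blob_name}"
```

Usage might be `GcpBucketSketch.from_name("some-bucket").write_text("outputs/summary.csv", df.to_csv(index=False))`, where the bucket and blob names are placeholders. If the service account lacks write permission, the upload call is where the error would surface.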

@paigeh-fsa
Collaborator

Going to download files and host on OneDrive.
