This is a guide to automating the loading of datastore data into BigQuery from a Python App Engine app. Steps covered:
- Back up Google App Engine NDB datastore entities to Google Cloud Storage.
- Refresh BigQuery tables using the latest backup files in Cloud Storage.
- Delete old backup entities and files.
Read the official documentation to schedule automated backups of datastore models into a Cloud Storage bucket. Refer to cron.yaml for an example.
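As a rough sketch, a scheduled backup entry using the legacy Datastore Admin handler might look like the one below. The kind, bucket name, and schedule are placeholders; substitute your own.

```yaml
cron:
- description: daily datastore backup to Cloud Storage
  # Placeholder kind and bucket; repeat the kind parameter for each model to back up.
  url: /_ah/datastore_admin/backup.create?name=DailyBackup&kind=MyModel&filesystem=gs&gs_bucket_name=my-backup-bucket
  schedule: every 24 hours
  target: ah-builtin-python-bundle
```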
If you haven't done so already, enable BigQuery in your project and create a dataset. The dataset will serve as the namespace for tables materialized from datastore backups.
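The dataset can be created in the BigQuery web UI. If you would rather create it programmatically, here is a minimal sketch using the BigQuery v2 REST API through google-api-python-client; the project and dataset IDs are placeholders.

```python
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

PROJECT_ID = 'my-project'         # placeholder: your project ID
DATASET_ID = 'datastore_backups'  # placeholder: your dataset name

# Build a BigQuery client with the application default credentials.
credentials = GoogleCredentials.get_application_default()
bigquery = discovery.build('bigquery', 'v2', credentials=credentials)

# Create the dataset that will hold tables loaded from backups.
bigquery.datasets().insert(
    projectId=PROJECT_ID,
    body={'datasetReference': {'projectId': PROJECT_ID,
                               'datasetId': DATASET_ID}},
).execute()
```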
- Make sure your appengine_config.py is set up with a vendor directory (see the official documentation; a minimal example follows this list).
- Install the Cloud Storage client library into the vendor directory:

  ```
  pip install GoogleAppEngineCloudStorageClient -t lib
  ```
- In the permissions settings for your project on console.cloud.google.com, make sure the <PROJECT>@appspot.gserviceaccount.com service account has permissions for BigQuery.
- On the Cloud Storage page, click the "three dots" icon and grant the service account access to the bucket where the backups will be saved.
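As referenced above, a minimal appengine_config.py that registers the lib vendor directory looks like this (the directory name matches the -t lib flag in the pip command):

```python
# appengine_config.py
from google.appengine.ext import vendor

# Add any libraries installed in the "lib" folder.
vendor.add('lib')
```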
load_bigquery.py contains the request handler to be run via cron.yaml. It materializes BigQuery tables from the most recent datastore backups, and it deletes backups older than a certain age.
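A cron entry wiring up that handler might look like the sketch below; the URL path and schedule are hypothetical, so use whatever route your app actually maps to the handler in load_bigquery.py.

```yaml
cron:
- description: refresh BigQuery tables from the latest datastore backups
  # Hypothetical path; match the route your app maps to load_bigquery.py.
  url: /cron/load_bigquery
  schedule: every 24 hours
```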
Configure the three variables at the top of bigquery_lib.py for your project.
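The names below are illustrative guesses rather than the actual identifiers in bigquery_lib.py; configuration along these lines typically covers the backup bucket, the target dataset, and the retention window.

```python
# Illustrative sketch only: the real variable names are at the top of bigquery_lib.py.
BACKUP_BUCKET = 'my-backup-bucket'   # Cloud Storage bucket holding datastore backups
DATASET_ID = 'datastore_backups'     # BigQuery dataset to materialize tables into
MAX_BACKUP_AGE_DAYS = 14             # backups older than this are deleted
```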
Released under the MIT License; see LICENSE.