Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BQ limits #567

Merged
merged 6 commits into from
Jan 25, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 55 additions & 36 deletions storage/byodb.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,52 +3,71 @@ title: Bring Your Own Database (BYODB)
permalink: /storage/byodb/
---

In certain cases you are able to use your own Snowflake account to host the data from Keboola Connection. Currently, we only support this for the Snowflake backend.
In some instances, you can use your own Snowflake/BigQuery account to host data from Keboola. Currently, this option is only supported for the Snowflake and BigQuery backends.

## Snowflake
For the integration to work, your Snowflake account must be accessible from a subset of [our IP addresses](/components/ip-addresses/).

For the integration to work, your Snowflake account needs to be accessible from a subset of [our IP addresses](/components/ip-addresses/).
***Note:** [SCIM](https://docs.snowflake.com/en/user-guide/scim.html) authentication (AAD, Okta) is not supported.*

We don't support [SCIM](https://docs.snowflake.com/en/user-guide/scim.html) authentication (AAD, Okta).
### Accessing Data in Keboola from Outside
To access Keboola-managed data, you should use only the provided roles and avoid granting permissions on Keboola-managed resources.

### Accessing Data in Keboola Connection from Outside
#### Read-only access to project storage
If your project has the [read-only input mapping](/transformations/mappings/#read-only-input-mapping) feature enabled, then a role `KEBOOLA_$PROJECTID_RO` is provided for each project.
This role grants read-only access to all the schemas and tables in the project. You can assign this role to any of your roles or users to allow access to the project's storage.

Because Keboola Connection manages access to the data, you need to only use the provided roles and not grant any grants on Keboola-managed resources.
#### Read-only access without granting roles in Snowflake
To access data without granting roles in Snowflake, use the [transformation workspaces](/transformations/workspace/) feature. The user created for each workspace is automatically assigned the role mentioned above.
This method is effective even if you *do not* use your own Snowflake account.

#### Read-only access to project storage
### Dynamic Backends Size for BYODB Snowflake
For dynamic backends, you must have one Snowflake warehouse for each backend size. The sizes of these warehouses (small, medium, and large in Keboola) are independent and
can vary (XSmall, Small, and XLarge). Setting up the warehouses with an [aggressive AUTO_SUSPEND value](https://docs.snowflake.com/en/user-guide/warehouses-considerations.html#automating-warehouse-suspension),
possibly as low as [1 second](https://docs.snowflake.com/en/sql-reference/sql/alter-warehouse.html), is recommended.

### Rules for Accessing Snowflake Objects Created by Keboola
* Do not modify grants of any resource prefixed with `KEBOOLA_` (`SAPI_` for `https://connection.keboola.com` stack), unless explicitly approved by Keboola on a case-by-case basis.
* Do not modify the root role created when registering the backend.
* Do not modify the warehouse created when registering the backend.
* Do not modify the user created when registering the backend.
* Do not modify the project databases (`KEBOOLA_$PROJECTID`).
* Particularly, do not change ownership of the database or grant future permissions on objects in the database.
* Do not modify the schemas in a project database.
* Particularly, do not change ownership of the schemas or grant future permissions on objects in those schemas.
* Do not modify the tables in schemas.
* Do not grant future permissions on tables.
* If the dynamic backends feature is enabled, do not change the backends created when enabling the feature.
* If the read-only input mapping feature is enabled, the role name is `KEBOOLA_$PROJECTID_RO`.
* You can grant this role to any of your roles or users. They will get read-only access to the data stored in a project's storage.
* Granting new tables and schemas and even linked buckets is handled automatically by Storage API.

### Schema of Objects in Snowflake Account Used as Keboola Backend

If the project has the [read-only input mapping](/transformations/mappings/#read-only-input-mapping) feature enabled, then you have a role `KEBOOLA_$PROJECTID_RO` for each project. This role has read-only access to all the schemas and tables in the project. You can grant this role to any of your own roles or users to give them access to the project's storage.
{: .image-popup}
![Snowflake account schema](/storage/byodb/schema.png)

##### Read-only access without granting roles in snowflake
## BigQuery Limitations
Google Cloud Computing services, including BigQuery, have introduced several limitations on their services, impacting Keboola, especially for customers using their own BigQuery or
Google Cloud Storage as storage backends. These limitations require careful management.

You can use the in-built functionality of [transformation workspaces](/transformations/workspace/). The user created for the workspace has the abovementioned role granted automatically. This workflow works even if you *don't* use your own Snowflake account.
Some limits are flexible (soft limits) and may be adjusted based on your contract with Google.

### Dynamic Backends Size for BYODB Snowflake
### Number of Projects
When you reach **20** projects, a ticket must be submitted to Google Support. They will then decide whether to increase the project limit or deny the request.

The dynamic backends feature requires you to have one Snowflake warehouse for each backend size. The actual sizes of warehouses are independent of the representation in Keboola (small, medium, and large). You can have XSmall, Small and XLarge. We recommend that the warehouses are set up with an [aggressive AUTO_SUSPEND value](https://docs.snowflake.com/en/user-guide/warehouses-considerations.html#automating-warehouse-suspension), even as low as [1 second](https://docs.snowflake.com/en/sql-reference/sql/alter-warehouse.html).

### Rules of Accessing Snowflake Objects Created by Keboola Connection

* do not modify grants of any resource that is prefixed by `KEBOOLA_` prefix (`SAPI_` for `https://connection.keboola.com` stack) unless explicitly approved on case-by-case basis by Keboola
* do not modify the root role created when registering the backend
* do not modify the warehouse created when registering the backend
* do not modify the user created when registering the backend
* do not modify the project databases (`KEBOOLA_$PROJECTID`)
* especially, do not change ownership of the database
* * especially, do not grant future grants on objects in the database
* do not modify the schemas in a project database
* especially, do not change ownership of the schemas
* especially, do not grant future grants on objects in those schemas
* do not modify the tables in schemas
* do not grant future grants on tables
* if you have the dynamic backends feature enabled
* do not change the backends created when enabling the feature
* if you have the read-only input mapping feature enabled
* the role name is `KEBOOLA_$PROJECTID_RO`
* you can grant this role to any of your roles or users – they will get read-only access to the data stored in a project's storage
* granting new tables and schemas and even linked buckets is handled automatically by Storage API

### Schema of Objects in Snowflake Account Used as Keboola Connection Backend
Keboola generates one GCP project per Keboola project, plus one main project for oversight, allowing up to 19 Keboola projects (+1 for management).

### Number of Service Accounts
Service accounts serve various purposes. Each project contains at least one service user.

Each project typically has at least one service user, limiting the number of workspaces/sandboxes to 99 at a time. [Workspaces](https://help.keboola.com/transformations/workspace/),
essential for tasks like transformations in Keboola, are created and then deleted post-use, impacting the service account limit only during the transformation job runtime.

By default, you can have up to **100** service accounts per project to control resource access. You may request an increase in this quota if needed. For more information, see
[Create service accounts | IAM Documentation | Google Cloud](https://cloud.google.com/iam/docs/service-accounts-create#:~:text=By%20default%2C%20each%20project%20can,a%20quota%20increase%20if%20necessary.).

----

Additional BigQuery backend limits can be found here: [Quotas and Limits | BigQuery | Google Cloud](https://cloud.google.com/bigquery/quotas#api_request_quotas)

{: .image-popup}
![Snowflake account schema](schema.png)
Loading