Commit 1e31752: Cloud SQL example
AmebaBrain committed Jan 29, 2024 (parent 9f87e14)
docs/clouds/gcp/cloud_sql.md: 129 additions, 3 deletions
The most traditional one is a relational database.
In GCP you can use the [Cloud SQL](https://cloud.google.com/sql) service for that.

We will create PostgreSQL and MySQL databases in Cloud SQL.
Afterwards, we will connect and join tables from them in Datero.

!!! info
    For a full-fledged example, please refer to the [Tutorial](../../tutorial.md).
Assume you created a Postgres instance and it has been assigned some private IP address.
During instance creation you must specify a password for the `postgres` user.
For simplicity, we used `postgres` as the password.

![Postgres private IP](../../images/clouds/gcp/cloud_sql_postgres_ip.jpg){ loading=lazy; align=right }

To connect to it we can leverage the VM named `instance` that we created in the [previous section](./vm_instance.md).
It was spun up in the same subnet that we picked for the private connection setup with Cloud SQL.
```sql
postgres=> select * from finance.departments;
(3 rows)
```

### Datero 2 Postgres connection
Now we can connect to the same instance from Datero.
All we need to do is create a Postgres server entry and specify the private IP address of our Cloud SQL instance.
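
Under the hood Datero runs a Postgres database, so a server entry is conceptually similar to a foreign server definition. A hand-written sketch in plain SQL using `postgres_fdw` could look like the following. This is only an illustration, not Datero's actual internals: the IP `10.12.96.3` and the credentials are placeholders for your own Cloud SQL instance values.

```sql
-- Sketch: a Postgres "server entry" expressed as plain postgres_fdw objects.
-- Host, dbname, user, and password below are hypothetical placeholders.
create extension if not exists postgres_fdw;

create server cloud_sql_postgres
    foreign data wrapper postgres_fdw
    options (host '10.12.96.3', port '5432', dbname 'postgres');

create user mapping for current_user
    server cloud_sql_postgres
    options (user 'postgres', password 'postgres');

-- Expose the remote finance schema locally
create schema finance;
import foreign schema finance
    from server cloud_sql_postgres
    into finance;
```

After this, `finance.departments` can be queried as if it were a local table.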

And finally, query the `departments` table.
![Query departments table](../../images/clouds/gcp/cloud_sql_postgres_query.jpg){ loading=lazy }
<figcaption>Query departments table</figcaption>
</figure>


## MySQL
The same procedure repeats for the MySQL instance.
We create a Cloud SQL instance for MySQL and assign a private IP address to it.

During instance creation you must specify a password for the `root` user.
For simplicity, we used `root` as the password.

![MySQL private IP](../../images/clouds/gcp/cloud_sql_mysql_ip.jpg){ loading=lazy; align=right }

To connect to it we will use the same VM named `instance` as for the Postgres Cloud SQL instance above.

To connect to the instance, we have to use the `mysql` client on the VM.
To avoid installing `mysql` directly on the VM, we can use the `mysql` Docker image.
We can run it in interactive mode and connect to the instance from there.

The following snippet does the following:

- runs the `mysql` Docker image in interactive mode, with automatic removal of the container after exit
- instead of starting a database server, it runs just a `bash` shell
- checks the `mysql` utility version
- connects to the instance by its private IP `10.12.96.6` via the `mysql` client
- the connection is made as the `root` user with the `root` password that we specified during Cloud SQL instance creation

```sh
instance:~$ docker run --rm -it mysql bash

bash-4.4# mysql --version
mysql Ver 8.3.0 for Linux on x86_64 (MySQL Community Server - GPL)

bash-4.4# mysql -h 10.12.96.6 -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3255
Server version: 8.0.31-google (Google)
```

We used the latest `mysql:latest` image, which is `mysql` version `8.3.0`.
It has a `mysql` client of the same version installed.
At the same time, our Cloud SQL instance is running `mysql` version `8.0.31-google`.


Having connected to the instance, we create the `hr` schema and a `users` table in it.
```sql
mysql> create schema hr;
Query OK, 1 row affected (0.02 sec)

mysql> use hr;
Database changed

mysql> create table users(id int, name text, department_id int);
Query OK, 0 rows affected (0.04 sec)

mysql> insert into users values (1, 'John', 1), (2, 'Mary', 2), (3, 'Peter', 2), (4, 'Scott', 3);
Query OK, 4 rows affected (0.02 sec)
Records: 4 Duplicates: 0 Warnings: 0

mysql> select * from users;
+------+-------+---------------+
| id | name | department_id |
+------+-------+---------------+
| 1 | John | 1 |
| 2 | Mary | 2 |
| 3 | Peter | 2 |
| 4 | Scott | 3 |
+------+-------+---------------+
4 rows in set (0.01 sec)
```

### Datero 2 MySQL connection
Now we can connect to the same instance from Datero.
All we need to do is create a MySQL server entry and specify the private IP address of our Cloud SQL instance.
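
As with Postgres, the MySQL server entry can be sketched as foreign data wrapper objects. Whether Datero uses `mysql_fdw` internally is an assumption made here for illustration; the IP `10.12.96.6` matches the instance above, and the credentials are the ones we chose during instance creation.

```sql
-- Sketch: a MySQL "server entry" expressed with mysql_fdw (illustrative only).
create extension if not exists mysql_fdw;

create server cloud_sql_mysql
    foreign data wrapper mysql_fdw
    options (host '10.12.96.6', port '3306');

create user mapping for current_user
    server cloud_sql_mysql
    options (username 'root', password 'root');

-- Expose the remote hr schema locally
create schema hr;
import foreign schema hr
    from server cloud_sql_mysql
    into hr;
```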

!!! info
    Please see the [Overview](../../overview.md#connectors) for how to create a server entry and import a schema.
    For a full-fledged example, please refer to the [Tutorial](../../tutorial.md).

<figure markdown>
![MySQL datasource](../../images/clouds/gcp/cloud_sql_mysql_server.jpg){ loading=lazy }
<figcaption>MySQL datasource</figcaption>
</figure>

Once the server entry is created, we can import the `hr` schema.
<figure markdown>
![MySQL import schema](../../images/clouds/gcp/cloud_sql_mysql_import_schema.jpg){ loading=lazy }
<figcaption>MySQL import schema</figcaption>
</figure>


## Join datasources
Now it's time to use Datero for its intended purpose.
Join tables from different datasources within a single `SELECT` statement!
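
Such a cross-source query could look like the following. This assumes the Postgres schema was imported as `finance` and the MySQL one as `hr`, and that `departments` has `id` and `name` columns; the exact column list of `departments` is not shown above, so treat those names as placeholders.

```sql
-- Hypothetical cross-source join: finance.departments lives in Cloud SQL
-- Postgres, hr.users in Cloud SQL MySQL, yet both look like local tables.
select d.name as department, u.name as employee
from finance.departments d
join hr.users u on u.department_id = d.id
order by department, employee;
```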

<figure markdown>
![Join datasources](../../images/clouds/gcp/cloud_sql_join_query.jpg){ loading=lazy }
<figcaption>Join datasources</figcaption>
</figure>


## Summary
Let's step aside and have a look at what we've got.
We can analyze data from two different databases of different vendors
as if they were located in the same database.
And we have _full flavoured SQL_ to do that.
And it's not just the `SELECT` statement.
Depending on the connector, you can also change data in a source database.

With Datero you are not locked into just the web application.
Under the hood you have a fully functional Postgres database.
This means that you can connect to Datero programmatically with any of the variety of drivers/SDKs that Postgres supports.

You have to set up your connections in Datero only once.
Afterwards, just connect to Datero and query your distributed data.
And there is no need to write any ETL for this!

The Datero architecture allows using it as an intermediate ETL node.
You don't have to connect to numerous datasources using different drivers
and, possibly, even different programming languages,
or write the data retrieval, synchronization, and processing logic yourself.

You just connect to Datero, write your logic in SQL and get the result.

If you need multi-step processing, you can store intermediate results in Datero itself
and then query them in the next step.
This works because, as mentioned earlier, it's a full-fledged Postgres database under the hood.
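
Staging an intermediate result can be as simple as a `CREATE TABLE ... AS` statement inside Datero. The table and column names below are hypothetical, and `departments` is again assumed to have `id` and `name` columns.

```sql
-- Step 1: persist an intermediate result inside Datero itself
-- (names are illustrative placeholders).
create table staging_department_headcount as
select d.name as department, count(*) as headcount
from finance.departments d
join hr.users u on u.department_id = d.id
group by d.name;

-- Step 2: query it later like any local Postgres table
select * from staging_department_headcount order by headcount desc;
```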