Looking at Snowflake built-in paritioning scheme, "micro-partition". In relation to the cluster key construct. Here I load 5 different tables definitions, the variations been on the cluster key defined. I then use system functions to investigate the impact of the partitions and cluster keys.
Dataset: US Credit Card Transactions on Kaggle
Steps taken:
- Configure Snowflake account and database entities.
- Create staging and table database entities.
- Run PUT command to stage datasets.
- Run COPY INTO command to load data.
- Evaluate partitioning and clusters i.e. using system functions.
Table | Partitions | Avg_Depth | Avg_Overlap |
---|---|---|---|
v1 | n/a | n/a | n/a |
v2 | 18 | 1.0 | 0.0 |
v3 | 18 | 2.0 | 1.8889 |
v4 | 20 | 2.0 | 1.9 |
v5 | 19 | 19.0 | 18.0 |