Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow auto liquid clustering #935

Merged
merged 8 commits into from
Feb 10, 2025

Conversation

ShaneMazur
Copy link
Contributor

@ShaneMazur ShaneMazur commented Feb 6, 2025

Resolves #932

Description

Adds config auto_liquid_cluster to allow for tables to be CLUSTER BY AUTO on creation or update (i.e. incremental run)

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@benc-db
Copy link
Collaborator

benc-db commented Feb 6, 2025

Can you do me one small favor: do you know if Databricks has any documentation that covers this? I'm struggling to find it...I'm fairly confident that if the E2E tests pass, this is valid Databricks SQL, but would be nice to have.

@benc-db
Copy link
Collaborator

benc-db commented Feb 6, 2025

Can you do me one small favor: do you know if Databricks has any documentation that covers this? I'm struggling to find it...I'm fairly confident that if the E2E tests pass, this is valid Databricks SQL, but would be nice to have.

Part of why I'm interested in the documentation is to see if we are using the most discoverable name for the config.

@benc-db
Copy link
Collaborator

benc-db commented Feb 6, 2025

Are you set up with hatch in your dev environment? If so, please run hatch run code-quality to fix the linting issue. If you don't, I can do it.

@ShaneMazur
Copy link
Contributor Author

Can you do me one small favor: do you know if Databricks has any documentation that covers this? I'm struggling to find it...I'm fairly confident that if the E2E tests pass, this is valid Databricks SQL, but would be nice to have.

Part of why I'm interested in the documentation is to see if we are using the most discoverable name for the config.

There are these but nothing in the Databricks official documentation but it is mentioned in this video from last year's summit

There was also a PDF provided to us by the PM but not sure if that has changed and probably not a good idea to share it here anyway. Let me know and I can share it if necessary.

@ShaneMazur
Copy link
Contributor Author

Are you set up with hatch in your dev environment? If so, please run hatch run code-quality to fix the linting issue. If you don't, I can do it.

Done ✅

@benc-db
Copy link
Collaborator

benc-db commented Feb 6, 2025

No worries, it just means that this will likely be a 'secret' feature of the adapter until the Databricks feature is documented. When you set auto cluster by, do you still need to run optimize manually? Going to pull and push to run the E2E tests.

@ShaneMazur
Copy link
Contributor Author

ShaneMazur commented Feb 6, 2025

No worries, it just means that this will likely be a 'secret' feature of the adapter until the Databricks feature is documented. When you set auto cluster by, do you still need to run optimize manually? Going to pull and push to run the E2E tests.

Yes - CLUSTER BY AUTO is just for automatic key selection. Triggering the clustering automatically is still handled by predictive optimization.

Enabling it for your Databricks workspace is hidden in the account UI under Settings > Feature enablement:

Screenshot 2025-02-06 at 4 41 35 PM

@benc-db
Copy link
Collaborator

benc-db commented Feb 6, 2025

Do you have an environment where you could write/run an E2E test? Unit test looks good, but would like to see a test similar to https://github.com/databricks/dbt-databricks/blob/main/tests/functional/adapter/liquid_clustering/test_liquid_clustering.py
to validate that the inserted code doesn't break in DBSQL.

Copy link
Collaborator

@benc-db benc-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a 'functional' aka E2E test similar to existing liquid clustering tests. Since liquid clustering is an optimization thing, we can't really validate the outcomes, but I just want a regression test to ensure we notice if they change syntax for auto cluster by before final release.

@ShaneMazur
Copy link
Contributor Author

Please add a 'functional' aka E2E test similar to existing liquid clustering tests. Since liquid clustering is an optimization thing, we can't really validate the outcomes, but I just want a regression test to ensure we notice if they change syntax for auto cluster by before final release.

Added that and included auto liquid clustering in the optimize logic since it is still valid to run optimize on liquid clustering with auto selected keys

@benc-db
Copy link
Collaborator

benc-db commented Feb 7, 2025

Added that and included auto liquid clustering in the optimize logic since it is still valid to run optimize on liquid clustering with auto selected keys

Looks good, let me just run tests. If everything is green, I'll merge it in. Thanks!

@benc-db benc-db merged commit f6a58ee into databricks:main Feb 10, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow liquid clustering CLUSTER BY AUTO option
2 participants