feat: Experimental unity catalog client #20798
Conversation
Codecov Report
Attention: patch coverage is below the project coverage.

@@            Coverage Diff             @@
##             main   #20798      +/-   ##
==========================================
- Coverage   79.78%   79.62%   -0.17%
==========================================
  Files        1561     1568       +7
  Lines      222015   222669     +654
  Branches     2533     2543      +10
==========================================
+ Hits       177135   177295     +160
- Misses      44296    44790     +494
  Partials      584      584
Ah, we actually also have a PR open for this: delta-io/delta-rs#3078 - we could have shared components of the client.
let args = ScanArgsParquet {
    schema,
    allow_missing_columns: matches!(data_source_format, DataSourceFormat::Delta),
Plainly reading Delta parquet files is not a safe operation; you have to check the protocol versions to see whether you are allowed to read them.
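For context, here is a minimal sketch of such a protocol check on the Python side, assuming the deltalake package; the table location and the reader version the caller supports are placeholders, not code from this PR:

```python
# Sketch: inspect a Delta table's protocol before reading its parquet files directly.
# Assumes the deltalake package; the table URI and supported reader version are placeholders.
from deltalake import DeltaTable

table = DeltaTable("s3://my-bucket/my-delta-table")  # hypothetical location
protocol = table.protocol()

# A plain parquet reader should only consume tables whose reader requirements
# it actually implements (e.g. no deletion vectors, no column mapping).
SUPPORTED_MIN_READER_VERSION = 1  # placeholder for what the reader supports
if protocol.min_reader_version > SUPPORTED_MIN_READER_VERSION:
    raise ValueError(
        f"Table requires reader version {protocol.min_reader_version}; "
        "reading its parquet files directly may produce incorrect results."
    )
```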
Hello, thanks for the review!
This branch is only hit if data_source_format=PARQUET - do protocol version checks still apply in that case? For data_source_format=DELTA, I am using the Python-side scan_delta.
In that case it should be fine! :)
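For readers unfamiliar with that path, a minimal sketch of scanning via the Python-side scan_delta is below (the table location and column name are placeholders); delegating to scan_delta lets the underlying Delta implementation resolve the table metadata rather than reading the parquet files directly:

```python
# Minimal sketch: scan a Delta table via the Python-side scan_delta.
# The table location and column name are placeholders.
import polars as pl

lf = pl.scan_delta("s3://my-bucket/my-delta-table")
df = lf.select("some_column").collect()  # hypothetical column name
```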
(Force-pushed from 41a2d92 to dbec722.)
y'all are amazing!!!! been waiting for this for over a year!
@nameexhaustion it's awesome to see integration with Unity Catalog, thanks for that! Can you tell me if there are plans to fix the issue I came across? I'm using Delta Live Tables; they are managed tables and they don't expose a storage location. Are there plans, and is this even possible, to scan DLT with polars? Thanks!
@pustoladxc I had the same issue but got around it by defining the region using the storage_options param passed to scan_table (ex. storage_options={"AWS_REGION": "us-east-2"}).
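For readers following along, a hedged sketch of that workaround is shown below; the workspace URL, token, and table identifiers are placeholders, and since the client API is experimental, the exact constructor and scan_table signature may differ:

```python
# Sketch of the workaround described above: pass the AWS region through
# storage_options when scanning a table via the Unity Catalog client.
# Workspace URL, token, and table identifiers are placeholders.
import polars as pl

catalog = pl.Catalog(
    "https://my-workspace.cloud.databricks.com",  # hypothetical workspace URL
    bearer_token="<token>",
)

lf = catalog.scan_table(
    "my_catalog",
    "my_schema",
    "my_table",
    storage_options={"AWS_REGION": "us-east-2"},
)
```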
@nrccua-timr appreciate your answer, unfortunately this does not work for me. I'm on Azure and there are no "region" nor "location" options available here. Tried all storage-related options but none helped. I remember that when writing to an external table (with a storage location available), providing storage_options was enough. So you say that for AWS users, reading from a Delta Live Table works after providing storage_options?
@pustoladxc There were a few other issues. For example, our databricks delta live tables were created about two years ago and weren't compatible with polars. However, when I tried the scan_table function on a newly created table it works (granted, I needed to disable DeletionVectors because they aren't supported by deltalake, which is what polars uses as the backend to interact with unity catalog). Basically, the polars team has some more work to do for a smoother experience... but this implementation, as a beginning, is greatly appreciated!
Introduces an experimental unity catalog client. Note that the API is unstable and subject to change.
The initial version in this PR supports:
- Creating a LazyFrame for the following data_source_formats: DELTA and PARQUET
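As a rough, non-authoritative sketch of how the client might be used end to end (the workspace URL, token, and identifiers below are placeholders, and since the API is unstable the method names and signatures may change):

```python
# Hedged sketch of the experimental Unity Catalog client. All identifiers
# are placeholders and the API surface is subject to change.
import polars as pl

catalog = pl.Catalog(
    "https://my-workspace.cloud.databricks.com",  # hypothetical workspace URL
    bearer_token="<token>",
)

# Browse the catalog hierarchy.
print(catalog.list_catalogs())
print(catalog.list_namespaces("my_catalog"))
print(catalog.list_tables("my_catalog", "my_schema"))

# Lazily scan a table (DELTA or PARQUET data_source_format) as a LazyFrame.
lf = catalog.scan_table("my_catalog", "my_schema", "my_table")
print(lf.head(5).collect())
```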