pymongoarrow replaces existing codecs #262
Comments
Thanks for reporting. I'm curious, how did you discover this? Specifically, which type codecs are you configuring (Decimal -> Decimal128, something else)? I'm wondering whether those converters should be enabled by default or whether we should instead inherit the current TypeRegistry.
I created https://jira.mongodb.org/browse/INTPYTHON-497, which will require a change to
My use case is to read Parquet and write to MongoDB. This is what happens:
Yes, I believe it makes sense to have the Decimal → Decimal128 conversion as the default. Ideally, PyMongoArrow should manage all type conversions from Parquet to BSON. Do you have any plans to open-source the Parquet-to-BSON conversion code (and vice versa) used in Atlas? It's written in Go, right?
Hi @norrbom, I'd consider it a bug for any Parquet types to not be handled by PyMongoArrow. I opened https://jira.mongodb.org/browse/INTPYTHON-520 to track the bug. I can't speak to Atlas Data Federation; that's a whole different part of MongoDB 😄.
In #278 we're adding support for the PyArrow Decimal128 type.
pymongoarrow replaces existing codecs
The api.write() method replaces any existing codecs.
The collection TypeRegistry is replaced with a new instance, effectively removing any existing custom codecs.
https://github.com/mongodb-labs/mongo-arrow/blob/1.6.0/bindings/python/pymongoarrow/api.py#L419
A related issue is that pymongoarrow uses pyarrow.Table.to_pylist() to convert a pyarrow.Table to Python objects as an intermediate step before converting to raw BSON.
PyArrow converts Arrow decimal128 values to Python decimal.Decimal objects. However, PyMongo's BSON encoder cannot encode decimal.Decimal by default, so a custom codec is needed.
https://github.com/mongodb-labs/mongo-arrow/blob/1.6.0/bindings/python/pymongoarrow/api.py#L385