-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pickle Support for python IcechunkStore [EAR-1326] #134
Conversation
EAR-1326 add `__eq__`, `__setstate__` , `__getstate__` for PyIcechunkStore
raising NotImplementedError for now and/or skipping zarr integration tests |
@@ -50,6 +51,7 @@ impl From<&PyStoreConfig> for RepositoryConfig { | |||
version: None, | |||
inline_chunk_threshold_bytes: config.inline_chunk_threshold_bytes, | |||
unsafe_overwrite_refs: config.unsafe_overwrite_refs, | |||
change_set_bytes: None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ChangeSet
doesnt implement Eq
so we only keep the bytes here
#[pymethods] | ||
impl PyIcechunkStore { | ||
fn as_bytes(&self) -> PyResult<Cow<[u8]>> { | ||
let consolidated = self.rt.block_on(self.as_consolidated())?; | ||
let serialized = serde_json::to_vec(&consolidated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using serde_json
because rmp_serde
was not working and no idea why. serde just works. Can optimize later if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the Python side, this looks good.
I won't attempt to review the Rust side. Will leave that to @paraseba.
One question: are there scenarios in which we would want to discard the change set when serializing?
return False | ||
self_bytes = self._store.as_bytes() | ||
other_bytes = value._store.as_bytes() | ||
return self_bytes == other_bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests only cover the equals True case. You should test the equals False path (different store, non store class).
bytes: Cow<[u8]>, | ||
read_only: bool, | ||
) -> PyResult<PyIcechunkStore> { | ||
let consolidated: ConsolidatedStore = serde_json::from_slice(&bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to improve on this, json is very inefficient at serializing bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ill add a fixme comment
@@ -238,20 +230,52 @@ fn pyicechunk_store_create<'py>( | |||
}) | |||
} | |||
|
|||
#[pyfunction] | |||
fn pyicechunk_store_from_bytes( | |||
bytes: Cow<[u8]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type is weird, are you plan to copy it? why not &[u8]
or even Bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cow<[u8]>
is automatically converted to and from native python bytes
both in returns and args, but Vec<u8>
is not, so its a design choice we can make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach: Instead of pickling the rust store, we serialize it to bytes to be used in pickling the python wrapper. We then can load the rust store from those bytes.
This doesnt add dask tests, just basic pickle support