Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickle Support for python IcechunkStore [EAR-1326] #134

Merged
merged 15 commits into from
Oct 4, 2024

Conversation

mpiannucci
Copy link
Contributor

@mpiannucci mpiannucci commented Oct 3, 2024

Approach: Instead of pickling the rust store, we serialize it to bytes to be used in pickling the python wrapper. We then can load the rust store from those bytes.

This doesnt add dask tests, just basic pickle support

Copy link

linear bot commented Oct 3, 2024

EAR-1326 add `__eq__`, `__setstate__` , `__getstate__` for PyIcechunkStore

raising NotImplementedError for now and/or skipping zarr integration tests

@mpiannucci mpiannucci marked this pull request as ready for review October 4, 2024 14:13
@@ -50,6 +51,7 @@ impl From<&PyStoreConfig> for RepositoryConfig {
version: None,
inline_chunk_threshold_bytes: config.inline_chunk_threshold_bytes,
unsafe_overwrite_refs: config.unsafe_overwrite_refs,
change_set_bytes: None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChangeSet doesnt implement Eq so we only keep the bytes here

#[pymethods]
impl PyIcechunkStore {
fn as_bytes(&self) -> PyResult<Cow<[u8]>> {
let consolidated = self.rt.block_on(self.as_consolidated())?;
let serialized = serde_json::to_vec(&consolidated)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using serde_json because rmp_serde was not working and no idea why. serde just works. Can optimize later if necessary

@mpiannucci mpiannucci requested a review from paraseba October 4, 2024 14:17
Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the Python side, this looks good.

I won't attempt to review the Rust side. Will leave that to @paraseba.

One question: are there scenarios in which we would want to discard the change set when serializing?

return False
self_bytes = self._store.as_bytes()
other_bytes = value._store.as_bytes()
return self_bytes == other_bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests only cover the equals True case. You should test the equals False path (different store, non store class).

bytes: Cow<[u8]>,
read_only: bool,
) -> PyResult<PyIcechunkStore> {
let consolidated: ConsolidatedStore = serde_json::from_slice(&bytes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to improve on this, json is very inefficient at serializing bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill add a fixme comment

@@ -238,20 +230,52 @@ fn pyicechunk_store_create<'py>(
})
}

#[pyfunction]
fn pyicechunk_store_from_bytes(
bytes: Cow<[u8]>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type is weird, are you plan to copy it? why not &[u8] or even Bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cow<[u8]> is automatically converted to and from native python bytes both in returns and args, but Vec<u8> is not, so its a design choice we can make

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpiannucci mpiannucci enabled auto-merge (squash) October 4, 2024 15:49
@mpiannucci mpiannucci merged commit 54493a1 into main Oct 4, 2024
3 checks passed
@mpiannucci mpiannucci deleted the matt/serailize-store branch October 4, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants