expiration not working as expected #731

Open
rabernat opened this issue Feb 14, 2025 · 3 comments

@rabernat
Contributor

Here's a reproducer:

import icechunk as ic
import zarr
from datetime import datetime, timezone

storage = ic.local_filesystem_storage("./expiration.icechunk")
repo = ic.Repository.create(storage)
session = repo.writable_session("main")
array = zarr.create_array(store=session.store, shape=100, chunks=10, dtype='i4')
session.commit("created array")
session = repo.writable_session("main")
array = zarr.open_array(session.store)
array[:] = 1
session.commit("wrote_data")
print(repo.ancestry(branch="main"))

At this point I have 3 commits:

[SnapshotInfo(id="4KHFWPG444PRDN9JJYC0", parent_id="1B48F0SVAPJH6FRQHRG0", written_at=datetime.datetime(2025,2,14,3,28,55,897712, tzinfo=datetime.timezone.utc), message="wrote_data..."),
 SnapshotInfo(id="1B48F0SVAPJH6FRQHRG0", parent_id="G4WNZE9XN03X1E26JVB0", written_at=datetime.datetime(2025,2,14,3,28,55,890476, tzinfo=datetime.timezone.utc), message="created ar..."),
 SnapshotInfo(id="G4WNZE9XN03X1E26JVB0", parent_id=None, written_at=datetime.datetime(2025,2,14,3,28,55,886074, tzinfo=datetime.timezone.utc), message="Repository...")]

I should be able to expire two of them.

repo.expire_snapshots(datetime.now(timezone.utc))
# -> set()

But this returns an empty set.

@rabernat
Contributor Author

rabernat commented Feb 14, 2025

OK, so I see that the problem is that I gave it a date after the latest snapshot.

If I do

anc = list(repo.ancestry(branch="main"))  # newest snapshot first, as printed above
repo.expire_snapshots(anc[0].written_at)
# -> {'1B48F0SVAPJH6FRQHRG0'}

it expires the middle snapshot.


I guess I was expecting that it would automatically keep the snapshot pointed to by main and expire everything else.
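
To make that expectation concrete, here is a toy model in plain Python. It is not icechunk's implementation and does not call icechunk at all: `Snap` and `expected_expired` are hypothetical names invented for illustration, and the rule they encode (keep the branch tip, expire every other snapshot older than the cutoff) is only the behavior I was expecting, not what the library does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Snap:
    # Hypothetical stand-in for icechunk's SnapshotInfo, for illustration only.
    id: str
    parent_id: Optional[str]
    written_at: datetime


def expected_expired(ancestry, older_than):
    """Ids I expected to be expired: every snapshot older than the cutoff,
    except the branch tip.

    `ancestry` is ordered newest first, like the output printed above.
    """
    # ancestry[0] is the tip (what main points to), so it is always kept.
    return {s.id for s in ancestry[1:] if s.written_at < older_than}


utc = timezone.utc
ancestry = [
    Snap("4KHFWPG444PRDN9JJYC0", "1B48F0SVAPJH6FRQHRG0",
         datetime(2025, 2, 14, 3, 28, 55, 897712, tzinfo=utc)),
    Snap("1B48F0SVAPJH6FRQHRG0", "G4WNZE9XN03X1E26JVB0",
         datetime(2025, 2, 14, 3, 28, 55, 890476, tzinfo=utc)),
    Snap("G4WNZE9XN03X1E26JVB0", None,
         datetime(2025, 2, 14, 3, 28, 55, 886074, tzinfo=utc)),
]

# A cutoff after the latest snapshot, as in the first expire_snapshots call.
expired = expected_expired(ancestry, datetime.now(utc))
print(sorted(expired))
# -> ['1B48F0SVAPJH6FRQHRG0', 'G4WNZE9XN03X1E26JVB0']
```

Under this model both older snapshots would go, which is different from what I actually observed (an empty set for a cutoff of "now", and only the middle snapshot for a cutoff equal to the tip's timestamp).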


Any reason why the first snapshot is not expired?

@paraseba
Collaborator

I think that would be reasonable behavior. I'll need to think about whether it impacts the algorithm in any way.

@rabernat
Contributor Author

It's worth noting that expiring everything but the latest snapshot is probably very dangerous, because there could be other sessions active on the recent snapshots.
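
One way to stay on the safe side is to pick a cutoff some grace period in the past rather than "now". The sketch below is plain Python: `conservative_cutoff` is a hypothetical helper and the 7-day default is an arbitrary assumption, not a recommendation from icechunk. The right value depends on how long sessions can stay open in a given deployment.

```python
from datetime import datetime, timedelta, timezone


def conservative_cutoff(now, grace=timedelta(days=7)):
    """Return a cutoff `grace` before `now`.

    Hypothetical helper: the 7-day default is an assumption chosen for
    illustration, meant to leave recent snapshots alone in case sessions
    are still active on them.
    """
    if now.tzinfo is None:
        # The snapshots above carry UTC timestamps, so insist on an aware datetime.
        raise ValueError("now must be timezone-aware")
    if grace < timedelta(0):
        raise ValueError("grace must be non-negative")
    return now - grace


cutoff = conservative_cutoff(datetime.now(timezone.utc))
# The cutoff would then be passed the same way as in the reproducer:
# repo.expire_snapshots(cutoff)
```

With the reproducer above this would expire nothing, since all three snapshots were written within the same second, which is the point: recent history is left untouched.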
