Explore integration with Icechunk data engine #5
First step toward #5. It was interesting to see that icechunk itself is rather lightweight, with minimal dependencies.
UPDATE: After a little reading, it appears that Icechunk's Virtual Datasets are superior to Kerchunk references or VirtualiZarr datasets when a dataset will be updated, because Icechunk offers "transactional updates, version controlled history, and faster access speeds." With VirtualiZarr, by contrast, "you should not change or add to any of the files comprising the store once created." However, "VirtualiZarr allows you to ingest data as virtual references and write those references into an Icechunk Store." So you can get started in VirtualiZarr and then hand the references over to Icechunk before making updates. References:
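The VirtualiZarr-to-Icechunk handoff described above might look roughly like the sketch below. The file name `data.nc` and the repository path `./my-repo` are hypothetical, and the exact accessor and session API has been evolving across virtualizarr/icechunk releases, so treat this as an illustration rather than a definitive recipe:

```python
# Sketch: create virtual references with VirtualiZarr, then commit them
# to an Icechunk repository to gain transactional, version-controlled updates.
# "data.nc" and "./my-repo" are hypothetical placeholders.
import icechunk
from virtualizarr import open_virtual_dataset

# Build a virtual (reference-only) xarray dataset from an archival file
vds = open_virtual_dataset("data.nc")

# Create a local Icechunk repository and open a writable session
storage = icechunk.local_filesystem_storage("./my-repo")
repo = icechunk.Repository.create(storage)
session = repo.writable_session("main")

# Write the virtual references into the Icechunk store and commit;
# later updates become new versioned commits rather than in-place edits
vds.virtualize.to_icechunk(session.store)
session.commit("Add virtual references from data.nc")
```

From this point on, updates go through Icechunk sessions and commits, which is exactly the property that makes it preferable to a static Kerchunk/VirtualiZarr reference set for evolving datasets.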
Icechunk presently supports only a subset of these formats. VirtualiZarr leverages Kerchunk, as an optional dependency, to create references to COG, FITS, and HDF4 file types, although COG and GRIB support are in the works. Kerchunk supports a wider array of file types, including GRIB2, Zarr2, etc. All this might improve very soon, as the main issue was with the following, which just got merged!
My vision for this package is that it would work seamlessly with a local and/or remote high-performance data catalog and store (i.e., a data engine). Presently, the Icechunk cloud-native transactional tensor storage engine is the most promising option, as it was recently open-sourced by EarthMover as the source code behind their ArrayLake services.
An ideal workflow would be to: