Group.create_array() uses the zstd codec which is not in the Zarr V3 spec #2790

Open
rouault opened this issue Feb 3, 2025 · 4 comments
Labels
bug Potential issues with the zarr-python library

Comments

rouault commented Feb 3, 2025

Zarr version

v3.0.2

Numcodecs version

v0.15.0

Python Version

3.12

Operating System

Linux

Installation

pip install zarr

Description

Not really a zarr-python bug by itself, but more a bug of the Zarr v3 ecosystem (i.e. the combination of zarr-python + https://github.com/zarr-developers/zarr-specs).

Following the tutorial at https://zarr.readthedocs.io/en/stable/user-guide/groups.html#working-with-groups, I discovered that the following generates a Zarr V3 array that uses the zstd codec:

import zarr

root = zarr.open_group('group.zarr', mode='w')
z = root.create_array(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')

The zstd codec is not documented at https://zarr-specs.readthedocs.io/en/latest/v3/codecs.html
This is a bit surprising.
Context: I'm updating the GDAL Zarr driver (https://gdal.org/en/stable/drivers/raster/zarr.html), which is written in C++. It doesn't have access to numcodecs, so new codecs don't come "for free". It would be nice if the defaults of zarr-python matched what is specified in the Zarr v3 spec.

Steps to reproduce

Additional output

No response

rouault added the bug label Feb 3, 2025
Author

rouault commented Feb 3, 2025

ok, I now see the zstd codec spec is a draft at zarr-developers/zarr-specs#256

Contributor

rabernat commented Feb 3, 2025

Hi @rouault -- you're absolutely right about your characterization of this issue. We are working to resolve it asap by fixing the extension mechanism and specification in V3. In the future state, all codecs (besides bytes) will be considered extensions. You can assume that a formal extension for zstd will exist soon.

Author

rouault commented Feb 3, 2025

> all codecs (besides bytes) will be considered extensions

That's perhaps nice from a conceptual point of view, but from a practical one, for a reader outside of the Python ecosystem, interoperability might be miserable...
I believe it would benefit the Zarr interoperability story if there were a minimum set of common core codecs that readers are encouraged to implement, and if writers were made aware that using something outside of it could make datasets unreadable by some implementations.
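As a rough illustration of what such a check could look like, here is a minimal sketch that scans a Zarr v3 array's metadata for codecs outside a chosen core set. The set `ASSUMED_CORE_CODECS`, the helper names, and the trimmed sample metadata are all hypothetical and only for illustration; no official core set is defined in this thread. The sketch relies only on the v3 metadata layout, where `codecs` is a list of objects with a `name` key, and on the sharding codec nesting further codecs under `codecs` and `index_codecs` in its configuration.

```python
# Hypothetical "core" set for illustration; not an official list.
ASSUMED_CORE_CODECS = frozenset(
    {"bytes", "gzip", "blosc", "crc32c", "transpose", "sharding_indexed"}
)


def iter_codec_names(codecs):
    """Yield every codec name in a codec chain, including nested chains."""
    for codec in codecs:
        yield codec["name"]
        # The sharding codec carries inner codec chains in its configuration.
        config = codec.get("configuration", {})
        for key in ("codecs", "index_codecs"):
            yield from iter_codec_names(config.get(key, []))


def codecs_outside_core(array_metadata, core=ASSUMED_CORE_CODECS):
    """Return the sorted codec names in the metadata that are not in `core`."""
    names = set(iter_codec_names(array_metadata.get("codecs", [])))
    return sorted(names - core)


# Trimmed, illustrative array metadata; a real zarr.json has more fields.
example_metadata = {
    "zarr_format": 3,
    "node_type": "array",
    "codecs": [
        {"name": "bytes", "configuration": {"endian": "little"}},
        {"name": "zstd", "configuration": {"level": 0, "checksum": False}},
    ],
}

print(codecs_outside_core(example_metadata))  # prints ['zstd']
```

A writer could run a check like this before publishing a dataset and warn the user, while a reader could use it to report which codec it is missing rather than failing with a generic error.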

Contributor

d-v-b commented Feb 5, 2025

> I believe it would benefit the Zarr interoperability story if there were a minimum set of common core codecs that readers are encouraged to implement, and if writers were made aware that using something outside of it could make datasets unreadable by some implementations.

Agreed! This has been proposed before, and I think the Zarr steering council is working on formalizing this.


3 participants