zarr-python cannot read arrays saved by tensorstore using the zstd compressor #2056
I previously discussed the root cause of this here:
Here's a more compact reproducer. The error occurs with zarr-python version 3.0.2.
Reproducer
import zarr
import tensorstore as ts
zarr_path = "reproduce_zarr-python_issue_2056.zarr"
arr = ts.open({
"driver": "zarr",
"kvstore": {
"driver": "file",
"path": zarr_path
},
"key_encoding": "/",
"metadata": {
"shape": [1024, 1024],
"chunks": [128, 128],
"dtype": "|u1",
"compressor": {
"id": "zstd",
"level": 5
}
}
}, create=True, delete_existing=True).result()
arr.write(1).result()
# open with tensorstore
print(f"Opening {zarr_path} with tensorstore")
arr2 = ts.open({
"driver": "zarr",
"kvstore": {
"driver": "file",
"path": zarr_path
}
}).result()
# read first chunk with tensorstore
print("Reading first chunk with tensorstore")
print(arr2[:128,:128].read().result())
# open with zarr-python
print(f"Opening {zarr_path} with zarr-python")
arr3 = zarr.open(zarr_path)
# read first chunk with zarr-python
print("Reading the first chunk with zarr-python")
print(arr3[:128,:128])
# File "numcodecs/zstd.pyx", line 184, in numcodecs.zstd.decompress
# RuntimeError: Zstd decompression error: invalid input data
Output
pixi.toml
[project]
name = "reproducer"
version = "0.1.0"
description = "Add a short description here"
authors = ["Mark Kittisopikul <markkitt@gmail.com>"]
channels = ["conda-forge"]
platforms = ["linux-64"]
[tasks]
[dependencies]
zarr = ">=3.0.2,<4"
tensorstore = ">=0.1.65,<0.2"
Non-reproduction
The problem does not occur if TensorStore writes a Zarr v3 array, because the zstd frame header then records a known frame content size.
import zarr
import tensorstore as ts
zarr_path = "nonreproduce_zarr-python_issue_2056.zarr"
arr = ts.open({
"driver": "zarr3",
"kvstore": {
"driver": "file",
"path": zarr_path
},
"metadata": {
"shape": [1024, 1024],
"chunk_grid": {
"name": "regular",
"configuration": {
"chunk_shape": [128, 128]
}
},
"data_type": "uint8",
"codecs": [{
"name": "zstd",
"configuration": {
"level": 5
}
}]
}
}, create=True, delete_existing=True).result()
arr.write(1).result()
# open with tensorstore
print(f"Opening {zarr_path} with tensorstore")
arr2 = ts.open({
"driver": "zarr3",
"kvstore": {
"driver": "file",
"path": zarr_path
}
}).result()
# read first chunk with tensorstore
print("Reading first chunk with tensorstore")
print(arr2[:128,:128].read().result())
# open with zarr-python
print(f"Opening {zarr_path} with zarr-python")
arr3 = zarr.open(zarr_path)
# read first chunk with zarr-python
print("Reading the first chunk with zarr-python")
print(arr3[:128,:128])
Output
One indication of the difference between the reproducer and the non-reproducer is the information about the compressed file reported by the zstd command line utility. The
Note that the command line utility can decompress either.
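To check the same thing without the command line utility, here is a minimal sketch that reads the Frame_Content_Size field from the start of a zstd frame, following the frame header layout in RFC 8878 as I understand it. The function name `zstd_frame_content_size` and the hand-built sample headers are illustrative assumptions, not part of zarr-python, numcodecs, or tensorstore.

```python
from typing import Optional

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, stored little-endian


def zstd_frame_content_size(data: bytes) -> Optional[int]:
    """Return Frame_Content_Size from a zstd frame header, or None if it is absent.

    Hypothetical helper for illustration; only the frame header is inspected.
    """
    if len(data) < 5 or data[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")

    descriptor = data[4]                     # Frame_Header_Descriptor
    fcs_flag = descriptor >> 6               # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1   # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 0b11         # bits 1-0: Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                             # skip Window_Descriptor
    pos += (0, 1, 2, 4)[dict_id_flag]        # skip Dictionary_ID, if any

    # With flag 0 and no single-segment bit, the size field is not stored.
    if fcs_flag == 0 and not single_segment:
        return None

    fcs_len = (1, 2, 4, 8)[fcs_flag]
    field = data[pos:pos + fcs_len]
    if len(field) < fcs_len:
        raise ValueError("truncated zstd frame header")

    value = int.from_bytes(field, "little")
    if fcs_len == 2:
        value += 256                         # the 2-byte form stores size - 256
    return value


# Hand-built headers (assumed sample data, not bytes taken from real chunks):
known = ZSTD_MAGIC + bytes([0x40, 0x58]) + (128 * 128 - 256).to_bytes(2, "little")
unknown = ZSTD_MAGIC + bytes([0x00, 0x58])

print(zstd_frame_content_size(known))    # 16384
print(zstd_frame_content_size(unknown))  # None
```

Passing the first 18 bytes of a chunk file from each array should be enough to cover the longest possible header. Going by the explanation above, a chunk from the Zarr v3 array would be expected to return a size, while a result of None would mean the frame does not record its content size.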
Zarr version
v2.18.2
Numcodecs version
v0.12.1
Python Version
3.12.4
Operating System
Linux
Installation
using conda
Description
I get the following error when trying to open a dataset compressed with tensorstore using the zstd compressor.
Steps to reproduce
Additional output
xref: google/tensorstore#182