Refactoring/fixing zarr-python v3 incompatibilities in xarray datatrees #10020

Open · wants to merge 28 commits into base: main
Changes from 12 commits (28 commits total)
0a2a49e
fixing compatibility with relative paths in open_store function withi…
aladinor Feb 2, 2025
ae80662
fixing/refactoring test to be compatible with Zarr-python v3
aladinor Feb 3, 2025
379db18
adding @requires_zarr_v3 decorator to TestZarrDatatreeIO
aladinor Feb 3, 2025
846dc50
replacing 0 with 1 in _create_test_datatree which will write a chunk
aladinor Feb 3, 2025
ddfd0b5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
3f9a8fb
fixing issues with groups
aladinor Feb 3, 2025
f140658
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
0e790eb
Merge branch 'main' into dtree-zarrv3
aladinor Feb 3, 2025
403afa9
fixing issue with dict creation
aladinor Feb 3, 2025
58e8f8e
Merge branch 'dtree-zarrv3' of https://github.com/aladinor/xarray int…
aladinor Feb 3, 2025
fd357fa
fixing issues with Mypy
aladinor Feb 3, 2025
8b993a1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
d4aeeca
refactoring open_store in ZarrStore class to use Zarr.core.group.Grou…
aladinor Feb 3, 2025
3125647
refactoring datatree test for zarr ensuring compatibility with zarr-pyt…
aladinor Feb 3, 2025
0c7485b
importing zarr.core.group only inside open_store function
aladinor Feb 3, 2025
fdeee94
documenting changes in whats-new.rst file
aladinor Feb 3, 2025
f3e2c66
Update xarray/backends/zarr.py
aladinor Feb 4, 2025
f9f1043
keeping group creation compatible with zarr v2
aladinor Feb 6, 2025
c118841
Merge branch 'main' into dtree-zarrv3
aladinor Feb 6, 2025
ec2086a
fixing issue with mypy
aladinor Feb 6, 2025
abaea4e
Merge branch 'main' into dtree-zarrv3
aladinor Feb 12, 2025
aa85bed
Merge branch 'main' into dtree-zarrv3
aladinor Feb 12, 2025
fce2957
adding root_path equal to '/' when opening group in zarr v3 to avoid …
aladinor Feb 12, 2025
e27b4b9
fixing tests accordingly
aladinor Feb 12, 2025
d03b003
Merge branch 'dtree-zarrv3' of https://github.com/aladinor/xarray int…
aladinor Feb 12, 2025
810a623
removing print statement
aladinor Feb 12, 2025
eabcc76
Merge branch 'main' into dtree-zarrv3
aladinor Feb 21, 2025
60e19d9
Merge branch 'main' into dtree-zarrv3
aladinor Feb 26, 2025
22 changes: 14 additions & 8 deletions xarray/backends/zarr.py
@@ -655,10 +655,12 @@ def open_store(
use_zarr_fill_value_as_mask=use_zarr_fill_value_as_mask,
zarr_format=zarr_format,
)
group_paths = list(_iter_zarr_groups(zarr_group, parent=group))

group_members: dict = dict(_iter_zarr_groups(zarr_group, parent=group))

return {
group: cls(
zarr_group.get(group),
store_group,
mode,
consolidate_on_close,
append_dim,
@@ -669,7 +671,7 @@
use_zarr_fill_value_as_mask,
cache_members=cache_members,
)
for group in group_paths
for group, store_group in group_members.items()
}

@classmethod
@@ -1698,12 +1700,16 @@ def open_groups_as_dict(
return groups_dict


def _iter_zarr_groups(root: ZarrGroup, parent: str = "/") -> Iterable[str]:
def _iter_zarr_groups(root: ZarrGroup, parent: str = "/") -> Iterable[tuple[str, Any]]:
from zarr.core.group import Group

parent_nodepath = NodePath(parent)
yield str(parent_nodepath)
for path, group in root.groups():
yield str(parent_nodepath), root
for path, group in root.members():
gpath = parent_nodepath / path
yield from _iter_zarr_groups(group, parent=str(gpath))
if isinstance(group, Group):
yield from _iter_zarr_groups(group, parent=str(gpath))


def _get_open_params(
@@ -1751,7 +1757,7 @@ def _get_open_params(
consolidated = False

if _zarr_v3():
missing_exc = ValueError
missing_exc = AssertionError
else:
missing_exc = zarr.errors.GroupNotFoundError

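The revised `_iter_zarr_groups` now yields `(path, group)` pairs and recurses through `members()`, keeping only group nodes. A self-contained sketch of that traversal logic, using a plain nested dict as a stand-in for a zarr `Group` (names here are illustrative, not zarr's API):

```python
from pathlib import PurePosixPath
from typing import Iterator

def iter_groups(root: dict, parent: str = "/") -> Iterator[tuple[str, dict]]:
    """Yield (path, group) pairs for root and every nested subgroup.

    A dict models a zarr Group; any non-dict member models an array and is
    skipped, mirroring the isinstance(group, Group) check in the diff.
    """
    parent_path = PurePosixPath(parent)
    yield str(parent_path), root
    for name, member in root.items():
        if isinstance(member, dict):  # only recurse into groups, not arrays
            yield from iter_groups(member, parent=str(parent_path / name))

tree = {"Group1": {"subgroup1": {"a": [1]}}, "Group2": {"b": [2]}}
paths = [p for p, _ in iter_groups(tree)]
# paths == ["/", "/Group1", "/Group1/subgroup1", "/Group2"]
```

Yielding the node alongside its path is what lets `open_store` build each store entry from the already-opened group object instead of re-fetching it with `zarr_group.get(group)`.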
2 changes: 1 addition & 1 deletion xarray/tests/conftest.py
@@ -191,7 +191,7 @@ def create_test_datatree():
"""

def _create_test_datatree(modify=lambda ds: ds):
set1_data = modify(xr.Dataset({"a": 0, "b": 1}))
set1_data = modify(xr.Dataset({"a": 1, "b": 2}))
set2_data = modify(xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])}))
root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])}))

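The conftest change swaps a scalar 0 for 1 because zarr may elide chunks whose data equals the array's fill value, so an all-zeros variable can leave no chunk file on disk for the I/O tests to assert on. An illustrative sketch of that elision rule (not zarr's actual implementation; the chunk keys below are made up):

```python
def chunks_to_write(chunks: dict, fill_value) -> dict:
    """Drop chunks whose every element equals the fill value."""
    return {
        key: values
        for key, values in chunks.items()
        if any(v != fill_value for v in values)
    }

stored = chunks_to_write({"c/0": [0, 0], "c/1": [0, 1]}, fill_value=0)
# Only "c/1" survives; the all-zero chunk "c/0" is elided.
```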
79 changes: 41 additions & 38 deletions xarray/tests/test_backends_datatree.py
@@ -15,7 +15,7 @@
requires_dask,
requires_h5netcdf,
requires_netCDF4,
requires_zarr,
requires_zarr_v3,
)

if TYPE_CHECKING:
@@ -141,14 +141,16 @@
a (y) int64 16B ...
b (x) float64 16B ...
"""
from zarr import consolidate_metadata

filepath = tmp_path_factory.mktemp("data") / "unaligned_simple_datatree.zarr"
root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})
set1_data = xr.Dataset({"a": 0, "b": 1})
set2_data = xr.Dataset({"a": ("y", [2, 3]), "b": ("x", [0.1, 0.2])})
root_data.to_zarr(filepath)
set1_data.to_zarr(filepath, group="/Group1", mode="a")
set2_data.to_zarr(filepath, group="/Group2", mode="a")
set1_data.to_zarr(filepath, group="/Group1/subgroup1", mode="a")
consolidate_metadata(filepath)
yield filepath


@@ -373,15 +375,12 @@
engine: T_DataTreeNetcdfEngine | None = "h5netcdf"


@pytest.mark.skipif(
have_zarr_v3, reason="datatree support for zarr 3 is not implemented yet"
)
@requires_zarr
@requires_zarr_v3
class TestZarrDatatreeIO:
engine = "zarr"

def test_to_zarr(self, tmpdir, simple_datatree):
filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt = simple_datatree
original_dt.to_zarr(filepath)

@@ -391,16 +390,31 @@
def test_zarr_encoding(self, tmpdir, simple_datatree):
from numcodecs.blosc import Blosc

filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt = simple_datatree

comp = {"compressor": Blosc(cname="zstd", clevel=3, shuffle=2)}
blosc = Blosc(cname="zstd", clevel=3, shuffle="shuffle").get_config()
comp = {"compressor": {"name": blosc.pop("id"), "configuration": blosc}}
enc = {"/set2": {var: comp for var in original_dt["/set2"].dataset.data_vars}}
original_dt.to_zarr(filepath, encoding=enc)

with open_datatree(filepath, engine="zarr") as roundtrip_dt:
print(roundtrip_dt["/set2/a"].encoding)
assert roundtrip_dt["/set2/a"].encoding["compressor"] == comp["compressor"]
retrieved_compressor = roundtrip_dt["/set2/a"].encoding["compressors"][
0
] # Get the BloscCodec object
assert (
retrieved_compressor.cname.name
== comp["compressor"]["configuration"]["cname"]
)
assert (
retrieved_compressor.clevel
== comp["compressor"]["configuration"]["clevel"]
)
assert (
retrieved_compressor.shuffle.name
== comp["compressor"]["configuration"]["shuffle"]
)

enc["/not/a/group"] = {"foo": "bar"} # type: ignore[dict-item]
with pytest.raises(ValueError, match="unexpected encoding group.*"):
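The updated `test_zarr_encoding` translates a numcodecs `Blosc` config (zarr v2 style) into the `{"name": ..., "configuration": ...}` codec mapping that zarr v3 expects, then compares fields on the round-tripped codec. The translation step can be sketched on its own (the config dict below mimics `Blosc(...).get_config()` output; the values are illustrative):

```python
def to_v3_codec_spec(numcodecs_config: dict) -> dict:
    """Convert a numcodecs get_config() dict into a v3-style codec mapping."""
    config = dict(numcodecs_config)  # copy so the caller's dict is untouched
    name = config.pop("id")          # numcodecs stores the codec name as "id"
    return {"name": name, "configuration": config}

blosc_config = {"id": "blosc", "cname": "zstd", "clevel": 3, "shuffle": "shuffle"}
spec = to_v3_codec_spec(blosc_config)
# spec == {"name": "blosc",
#          "configuration": {"cname": "zstd", "clevel": 3, "shuffle": "shuffle"}}
```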
@@ -409,9 +423,9 @@
def test_to_zarr_zip_store(self, tmpdir, simple_datatree):
from zarr.storage import ZipStore

filepath = tmpdir / "test.zarr.zip"
filepath = str(tmpdir / "test.zarr.zip")
original_dt = simple_datatree
store = ZipStore(filepath)
store = ZipStore(filepath, mode="w")
original_dt.to_zarr(store)

with open_datatree(store, engine="zarr") as roundtrip_dt: # type: ignore[arg-type, unused-ignore]
@@ -432,32 +446,29 @@
assert_equal(original_dt, roundtrip_dt)

def test_to_zarr_default_write_mode(self, tmpdir, simple_datatree):
import zarr

simple_datatree.to_zarr(tmpdir)
simple_datatree.to_zarr(str(tmpdir))

# with default settings, to_zarr should not overwrite an existing dir
with pytest.raises(zarr.errors.ContainsGroupError):
simple_datatree.to_zarr(tmpdir)
with pytest.raises(FileExistsError):
simple_datatree.to_zarr(str(tmpdir))

@requires_dask
def test_to_zarr_compute_false(self, tmpdir, simple_datatree):
import dask.array as da

filepath = tmpdir / "test.zarr"
original_dt = simple_datatree.chunk()
original_dt.to_zarr(filepath, compute=False)
original_dt.to_zarr(str(filepath), compute=False)

for node in original_dt.subtree:
for name, variable in node.dataset.variables.items():
var_dir = filepath / node.path / name
var_files = var_dir.listdir()
assert var_dir / ".zarray" in var_files
assert var_dir / ".zattrs" in var_files
assert var_dir / "zarr.json" in var_files
if isinstance(variable.data, da.Array):
assert var_dir / "0" not in var_files
assert var_dir / "zarr.json" in var_files
else:
assert var_dir / "0" in var_files
assert var_dir / "c" in var_files

def test_to_zarr_inherited_coords(self, tmpdir):
original_dt = DataTree.from_dict(
@@ -466,7 +477,7 @@
"/sub": xr.Dataset({"b": (("x",), [5, 6])}),
}
)
filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt.to_zarr(filepath)

with open_datatree(filepath, engine="zarr") as roundtrip_dt:
@@ -476,7 +487,7 @@

def test_open_groups_round_trip(self, tmpdir, simple_datatree) -> None:
"""Test `open_groups` opens a zarr store with the `simple_datatree` structure."""
filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt = simple_datatree
original_dt.to_zarr(filepath)

@@ -501,7 +512,7 @@

@requires_dask
def test_open_datatree_chunks(self, tmpdir, simple_datatree) -> None:
filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")

chunks = {"x": 2, "y": 1}

@@ -528,7 +539,6 @@

assert "/" in unaligned_dict_of_datasets.keys()
assert "/Group1" in unaligned_dict_of_datasets.keys()
assert "/Group1/subgroup1" in unaligned_dict_of_datasets.keys()
assert "/Group2" in unaligned_dict_of_datasets.keys()
# Check that group name returns the correct datasets
with xr.open_dataset(
@@ -539,10 +549,6 @@
unaligned_datatree_zarr, group="Group1", engine="zarr"
) as expected:
assert_identical(unaligned_dict_of_datasets["/Group1"], expected)
with xr.open_dataset(
unaligned_datatree_zarr, group="/Group1/subgroup1", engine="zarr"
) as expected:
assert_identical(unaligned_dict_of_datasets["/Group1/subgroup1"], expected)
with xr.open_dataset(
unaligned_datatree_zarr, group="/Group2", engine="zarr"
) as expected:
@@ -553,13 +559,13 @@

def test_open_datatree_specific_group(self, tmpdir, simple_datatree) -> None:
"""Test opening a specific group within a Zarr store using `open_datatree`."""
filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
group = "/set2"
original_dt = simple_datatree
original_dt.to_zarr(filepath)
expected_subtree = original_dt[group].copy()
expected_subtree.orphan()
with open_datatree(filepath, group=group, engine=self.engine) as subgroup_tree:

Check failure on line 568 in xarray/tests/test_backends_datatree.py — GitHub Actions (ubuntu-latest py3.11 all-but-dask; ubuntu-latest py3.12 all-but-numba; ubuntu-latest py3.12; macos-latest py3.12):

TestZarrDatatreeIO.test_open_datatree_specific_group RuntimeWarning: Failed to open Zarr store with consolidated metadata, but successfully read with non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider: 1. Consolidating metadata in this existing store with zarr.consolidate_metadata(). 2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or 3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
assert subgroup_tree.root.parent is None
assert_equal(subgroup_tree, expected_subtree)

@@ -568,10 +574,7 @@
"""Test `open_groups` with chunks on a zarr store."""

chunks = {"x": 2, "y": 1}
filepath = tmpdir / "test.zarr"

chunks = {"x": 2, "y": 1}

filepath = str(tmpdir / "test.zarr")
root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})
set1_data = xr.Dataset({"a": ("y", [-1, 0, 1]), "b": ("x", [-10, 6])})
set2_data = xr.Dataset({"a": ("y", [1, 2, 3]), "b": ("x", [0.1, 0.2])})
@@ -605,7 +608,7 @@
expected_dt = original_dt.copy()
expected_dt.name = None

filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt.to_zarr(filepath)

with open_datatree(filepath, engine="zarr") as roundtrip_dt:
@@ -620,7 +623,7 @@
}
)

filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt.to_zarr(filepath, write_inherited_coords=False)

with open_datatree(filepath, engine="zarr") as roundtrip_dt:
@@ -628,7 +631,7 @@

expected_child = original_dt.children["child"].copy(inherit=False)
expected_child.name = None
with open_datatree(filepath, group="child", engine="zarr") as roundtrip_child:

Check failure on line 634 in xarray/tests/test_backends_datatree.py — GitHub Actions (ubuntu-latest py3.11 all-but-dask; ubuntu-latest py3.12 all-but-numba; ubuntu-latest py3.12; macos-latest py3.12):

TestZarrDatatreeIO.test_write_inherited_coords_false RuntimeWarning: Failed to open Zarr store with consolidated metadata, but successfully read with non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider: 1. Consolidating metadata in this existing store with zarr.consolidate_metadata(). 2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or 3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
assert_identical(expected_child, roundtrip_child)

def test_write_inherited_coords_true(self, tmpdir):
@@ -639,7 +642,7 @@
}
)

filepath = tmpdir / "test.zarr"
filepath = str(tmpdir / "test.zarr")
original_dt.to_zarr(filepath, write_inherited_coords=True)

with open_datatree(filepath, engine="zarr") as roundtrip_dt:
@@ -647,5 +650,5 @@

expected_child = original_dt.children["child"].copy(inherit=True)
expected_child.name = None
with open_datatree(filepath, group="child", engine="zarr") as roundtrip_child:

Check failure on line 653 in xarray/tests/test_backends_datatree.py — GitHub Actions (ubuntu-latest py3.11 all-but-dask; ubuntu-latest py3.12 all-but-numba; ubuntu-latest py3.12; macos-latest py3.12):

TestZarrDatatreeIO.test_write_inherited_coords_true RuntimeWarning: Failed to open Zarr store with consolidated metadata, but successfully read with non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider: 1. Consolidating metadata in this existing store with zarr.consolidate_metadata(). 2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or 3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
assert_identical(expected_child, roundtrip_child)