Commit
Treat zarr metadata as a blob (mostly) (#749)
* Treat zarr metadata as a blob (mostly)

We were parsing too much of the Zarr metadata. Icechunk is currently interested only in the array size and the chunk sizes; it may become interested in the dimension names at some point. Still, we were parsing the whole metadata, storing it internally as a parsed object, and then serializing it back to JSON.

We did this when the project started, imagining we might need more from the metadata. For example, we thought we might need to incorporate the codec pipeline into Icechunk.

With this patch, we extract only the parts of the Zarr metadata we care about, and we preserve the original metadata blob as-is in a new user_data byte array. We return this blob in metadata requests.

If we need more from the metadata in the future, we can parse it and add it to the storage.

The result is simpler, with less code. It works with Zarr extensions and is more resilient to Zarr spec changes.

There is a price: we no longer separate the user attributes from the rest of the metadata. The only impact is that we can no longer handle conflicts in user attributes separately from conflicts in the rest of the Zarr metadata. If this proves important in the short term, we can add it back by parsing more of the metadata blobs.

Also in this change:

- No more AttributeFile. We'll implement it when we need it.
- Better snapshot serialization [on-disk breaking change]

* Enable complex arrays in tests

* Fix xarray test

---------

Co-authored-by: Deepak Cherian <deepak@earthmover.io>
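The "metadata as a blob" idea can be illustrated with a minimal Python sketch. It assumes Zarr v3 array metadata (a zarr.json document) with a "regular" chunk grid. ArrayMetadataSketch, from_zarr_json and metadata_conflicts are hypothetical names chosen for illustration; they are not Icechunk's actual implementation or API. The sketch parses only the shape and the chunk shape and keeps the original bytes untouched. The whole-blob comparison in metadata_conflicts is an assumed simplification, meant only to show the trade-off described above: once the blob is opaque, an attribute-only edit is indistinguishable from any other metadata change.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayMetadataSketch:
    """Hypothetical record: the few parsed fields plus the untouched blob."""

    shape: tuple[int, ...]
    chunk_shape: tuple[int, ...]
    user_data: bytes  # original zarr.json bytes, returned verbatim on reads


def from_zarr_json(blob: bytes) -> ArrayMetadataSketch:
    """Extract only shape and chunk shape; keep everything else opaque."""
    doc = json.loads(blob)  # json.loads accepts UTF-8 encoded bytes directly
    grid = doc["chunk_grid"]
    if grid.get("name") != "regular":
        # This sketch only understands the regular chunk grid.
        raise ValueError(f"unsupported chunk grid: {grid.get('name')!r}")
    return ArrayMetadataSketch(
        shape=tuple(doc["shape"]),
        chunk_shape=tuple(grid["configuration"]["chunk_shape"]),
        user_data=blob,  # codecs, attributes, extension fields: stored as-is
    )


def metadata_conflicts(ours: ArrayMetadataSketch, theirs: ArrayMetadataSketch) -> bool:
    """Treat any byte difference in the blobs as a conflict (assumed policy).

    With an opaque blob, an attribute-only edit cannot be told apart from,
    say, a codec change.
    """
    return ours.user_data != theirs.user_data


if __name__ == "__main__":
    raw = (
        b'{"zarr_format": 3, "node_type": "array", "shape": [100, 50],'
        b' "data_type": "float64",'
        b' "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10, 10]}},'
        b' "chunk_key_encoding": {"name": "default"}, "fill_value": 0,'
        b' "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],'
        b' "attributes": {"units": "m"}}'
    )
    meta = from_zarr_json(raw)
    print(meta.shape, meta.chunk_shape)  # (100, 50) (10, 10)
    # Change only a user attribute: the blobs differ, so this counts as a conflict.
    edited = from_zarr_json(raw.replace(b'"units": "m"', b'"units": "km"'))
    print(metadata_conflicts(meta, edited))  # True
```

Keeping the raw bytes means unknown fields (for example, Zarr extensions) round-trip without the parser having to understand them. Finer-grained conflict handling would require parsing more of the blob, for example the "attributes" key.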
Showing 64 changed files with 1,728 additions and 3,226 deletions.
Binary file added: icechunk-python/tests/data/test-repo/manifests/0GQQ44D2837GGMHY81CG (+237 bytes)
Binary file added: icechunk-python/tests/data/test-repo/manifests/73Q2CY1JSN768PFJS2M0 (+168 bytes)
Binary file added: icechunk-python/tests/data/test-repo/manifests/8WT6R2E6WVC9GJ7BS6GG (+277 bytes)
Binary file added: icechunk-python/tests/data/test-repo/manifests/C38XX4Z2517M93GQ5MA0 (+174 bytes)
icechunk-python/tests/data/test-repo/refs/branch.main/ZZZZZZZW.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"A2RD2Y65PR6D3B6BR1K0"}
+{"snapshot":"HNG82GMS51ECXFXFCYJG"}
icechunk-python/tests/data/test-repo/refs/branch.main/ZZZZZZZX.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"K1BMYVG1HNVTNV1FSBH0"}
+{"snapshot":"GNFK0SSWD5B8CVA53XEG"}
icechunk-python/tests/data/test-repo/refs/branch.main/ZZZZZZZY.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"RPA0WQCNM2N9HBBRHJQ0"}
+{"snapshot":"3EKE17N8YF5ZK5NRMZJ0"}
icechunk-python/tests/data/test-repo/refs/branch.main/ZZZZZZZZ.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"6Q9GDTXKF17BGQVSQZFG"}
+{"snapshot":"R7F1RJHPZ428N4AK19K0"}
icechunk-python/tests/data/test-repo/refs/branch.my-branch/ZZZZZZZX.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"949AXZ49X764TMDC6D4G"}
+{"snapshot":"TNE0TX645A2G7VTXFA1G"}
icechunk-python/tests/data/test-repo/refs/branch.my-branch/ZZZZZZZY.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"SNF98D1SK7NWD5KQJM20"}
+{"snapshot":"394QWZDXAY74HP6Q8P3G"}
icechunk-python/tests/data/test-repo/refs/branch.my-branch/ZZZZZZZZ.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"A2RD2Y65PR6D3B6BR1K0"}
+{"snapshot":"HNG82GMS51ECXFXFCYJG"}
File path not shown:
@@ -1 +1 @@
-{"snapshot":"SNF98D1SK7NWD5KQJM20"}
+{"snapshot":"394QWZDXAY74HP6Q8P3G"}
icechunk-python/tests/data/test-repo/refs/tag.it also works!/ref.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"949AXZ49X764TMDC6D4G"}
+{"snapshot":"TNE0TX645A2G7VTXFA1G"}
icechunk-python/tests/data/test-repo/refs/tag.it works!/ref.json: 1 addition, 1 deletion
@@ -1 +1 @@
-{"snapshot":"SNF98D1SK7NWD5KQJM20"}
+{"snapshot":"394QWZDXAY74HP6Q8P3G"}
Binary file added: icechunk-python/tests/data/test-repo/snapshots/394QWZDXAY74HP6Q8P3G (+867 bytes)
Binary file added: icechunk-python/tests/data/test-repo/snapshots/3EKE17N8YF5ZK5NRMZJ0 (+801 bytes)
Binary file added: icechunk-python/tests/data/test-repo/snapshots/GNFK0SSWD5B8CVA53XEG (+867 bytes)
Binary file added: icechunk-python/tests/data/test-repo/snapshots/HNG82GMS51ECXFXFCYJG (+871 bytes)
Binary file added: icechunk-python/tests/data/test-repo/snapshots/R7F1RJHPZ428N4AK19K0 (+178 bytes)
Binary file added: icechunk-python/tests/data/test-repo/transactions/394QWZDXAY74HP6Q8P3G (+147 bytes)
Binary file added: icechunk-python/tests/data/test-repo/transactions/3EKE17N8YF5ZK5NRMZJ0 (+157 bytes)
Binary file added: icechunk-python/tests/data/test-repo/transactions/GNFK0SSWD5B8CVA53XEG (+235 bytes)
Binary file added: icechunk-python/tests/data/test-repo/transactions/HNG82GMS51ECXFXFCYJG (+146 bytes)
Binary file added: icechunk-python/tests/data/test-repo/transactions/TNE0TX645A2G7VTXFA1G (+173 bytes)