From 9216892c1574238f792dda1b0dff9a177a77192f Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Thu, 22 Aug 2024 11:05:04 +0200
Subject: [PATCH 01/10] placeholder:clarify language

---
 docs/v3/codecs.rst | 6 +++++-
 docs/v3/core/v3.0.rst | 32 ++++++++++++++++----------------
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/docs/v3/codecs.rst b/docs/v3/codecs.rst
index 0bb25363..bc7eadce 100644
--- a/docs/v3/codecs.rst
+++ b/docs/v3/codecs.rst
@@ -2,7 +2,11 @@
 Codecs
 ======
 
-Under construction.
+The following documents specify the codecs that Zarr version 3 implementations MUST support. This collection of
+codecs is chosen to form a shared basis for interoperability between Zarr implementations in different languages,
+and to ensure that different Zarr implementations consistently implement key Zarr features.
+
+Read more about codecs in the :ref:`_zarr-core-specification-v3.0#codecs` section of the Zarr version 3 specification.
 
 .. toctree::
    :glob:
diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 4f8d2569..5effa5c7 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1214,9 +1214,22 @@ the following procedure:
 Specifying codecs
 -----------------
 
-To allow for flexibility to define and implement new codecs, this
-specification does not define any codecs, nor restrict the set of
-codecs that may be used. Each codec must be defined via a separate
+Core codecs
+-----------
+
+This specification depends on a set of core codecs which all Zarr implementations must implement.
+The specifications of these core codecs are host alongside this specification in the
+`zarr-specs GitHub repository`_, and which are
+published on the `zarr-specs documentation Web site
+`_. The list of core codecs can be changed via the same mechanism used for
+changing this specification document. Because supporting all of the core codecs is required for Zarr implementations,
+changes to the list of core codecs must be made in close collaboration with extant Zarr v3 implementations. In practice,
+this means that active Zarr v3 implementations must first implement changes
+
+Non-core codecs
+---------------
+
+Each codec must be defined via a separate
 specification. In order to refer to codecs in array metadata
 documents, each codec must have a unique identifier, which is a URI
 that dereferences to a human-readable specification of the codec. A
@@ -1231,19 +1244,6 @@ resulting compression ratio of the data. Configuration parameters must
 be declared in the codec specification, including a definition of how
 configuration parameters are represented as JSON.
 
-The Zarr core development team maintains a repository of codec
-specifications, which are hosted alongside this specification in the
-`zarr-specs GitHub repository`_, and which are
-published on the `zarr-specs documentation Web site
-`_. For ease of discovery, it is
-recommended that codec specifications are contributed to the
-zarr-specs GitHub repository. However, codec specifications may be
-maintained by any group or organisation and published in any location
-on the Web. For further details of the process for contributing a
-codec specification to the zarr-specs GitHub repository, see
-`ZEP 0 `_ which describes
-the process for Zarr specification changes.
-
 Further details of how codecs are configured for an array are given in the `Array metadata`_ section.
 
 Stores

From 92f3b12aad3eca736fb79507ac52c8385f34d037 Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Tue, 10 Sep 2024 23:20:32 +0200
Subject: [PATCH 02/10] move codecs, introduce the concept of communtiy codecs

---
 docs/v3/codecs/{ => core}/blosc/v1.0.rst | 0
 docs/v3/codecs/{ => core}/bytes/v1.0.rst | 0
 docs/v3/codecs/{ => core}/crc32c/v1.0.rst | 0
 docs/v3/codecs/{ => core}/gzip/v1.0.rst | 0
 .../{ => core}/sharding-indexed/sharding.png | Bin
 .../{ => core}/sharding-indexed/v1.0.rst | 0
 docs/v3/codecs/{ => core}/transpose/v1.0.rst | 0
 docs/v3/core/v3.0.rst | 65 +++++++++---------
 8 files changed, 31 insertions(+), 34 deletions(-)
 rename docs/v3/codecs/{ => core}/blosc/v1.0.rst (100%)
 rename docs/v3/codecs/{ => core}/bytes/v1.0.rst (100%)
 rename docs/v3/codecs/{ => core}/crc32c/v1.0.rst (100%)
 rename docs/v3/codecs/{ => core}/gzip/v1.0.rst (100%)
 rename docs/v3/codecs/{ => core}/sharding-indexed/sharding.png (100%)
 rename docs/v3/codecs/{ => core}/sharding-indexed/v1.0.rst (100%)
 rename docs/v3/codecs/{ => core}/transpose/v1.0.rst (100%)

diff --git a/docs/v3/codecs/blosc/v1.0.rst b/docs/v3/codecs/core/blosc/v1.0.rst
similarity index 100%
rename from docs/v3/codecs/blosc/v1.0.rst
rename to docs/v3/codecs/core/blosc/v1.0.rst
diff --git a/docs/v3/codecs/bytes/v1.0.rst b/docs/v3/codecs/core/bytes/v1.0.rst
similarity index 100%
rename from docs/v3/codecs/bytes/v1.0.rst
rename to docs/v3/codecs/core/bytes/v1.0.rst
diff --git a/docs/v3/codecs/crc32c/v1.0.rst b/docs/v3/codecs/core/crc32c/v1.0.rst
similarity index 100%
rename from docs/v3/codecs/crc32c/v1.0.rst
rename to docs/v3/codecs/core/crc32c/v1.0.rst
diff --git a/docs/v3/codecs/gzip/v1.0.rst b/docs/v3/codecs/core/gzip/v1.0.rst
similarity index 100%
rename from docs/v3/codecs/gzip/v1.0.rst
rename to docs/v3/codecs/core/gzip/v1.0.rst
diff --git a/docs/v3/codecs/sharding-indexed/sharding.png b/docs/v3/codecs/core/sharding-indexed/sharding.png
similarity index 100%
rename from docs/v3/codecs/sharding-indexed/sharding.png
rename to docs/v3/codecs/core/sharding-indexed/sharding.png
diff --git a/docs/v3/codecs/sharding-indexed/v1.0.rst b/docs/v3/codecs/core/sharding-indexed/v1.0.rst
similarity index 100%
rename from docs/v3/codecs/sharding-indexed/v1.0.rst
rename to docs/v3/codecs/core/sharding-indexed/v1.0.rst
diff --git a/docs/v3/codecs/transpose/v1.0.rst b/docs/v3/codecs/core/transpose/v1.0.rst
similarity index 100%
rename from docs/v3/codecs/transpose/v1.0.rst
rename to docs/v3/codecs/core/transpose/v1.0.rst
diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 5effa5c7..15df4453 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -295,7 +295,7 @@ The following figure illustrates the first part of the terminology:
 
 *Codec*
 
-    The list of *codecs* specified for an array_ determine the encoded byte
+    The list of *codecs* specified for an array_ determines the encoded byte
     representation of each chunk in the store_.
 
 .. _metadata document:
@@ -634,13 +634,12 @@ mandatory names:
 ``codecs``
 ^^^^^^^^^^
 
-    Specifies a list of codecs to be used for encoding and decoding chunks. The
-    value must be an array of objects, each object containing a member with
-    ``name`` whose value is a string referring to a v3 codec specification. The
-    codec object may also contain a ``configuration`` object which consists of
-    the parameter names and values as defined by the corresponding codec
-    specification. Since an ``array -> bytes`` codec must be specified, the
-    list cannot be empty.
+    Specifies a list of codecs to be used for encoding and decoding chunks. The
+    value MUST be an array of objects, where each object contains a member with the name
+    ``name`` whose value is a string. The ``name`` member identifies the specification for the codec. A
+    codec object MAY contain a member named ``configuration``, which is an object defined by the respective codec
+    specification. Because ``codecs`` MUST contain an ``array -> bytes`` codec, the
+    list cannot be empty (See :ref:`codecs `).
 
 The following members are optional:
 
@@ -1217,34 +1216,32 @@ Specifying codecs
 Core codecs
 -----------
 
-This specification depends on a set of core codecs which all Zarr implementations must implement.
-The specifications of these core codecs are host alongside this specification in the
-`zarr-specs GitHub repository`_, and which are
-published on the `zarr-specs documentation Web site
-`_. The list of core codecs can be changed via the same mechanism used for
-changing this specification document. Because supporting all of the core codecs is required for Zarr implementations,
-changes to the list of core codecs must be made in close collaboration with extant Zarr v3 implementations. In practice,
-this means that active Zarr v3 implementations must first implement changes
+This spec defines a set of codecs ("core codecs") which all Zarr implementations MUST implement.
+This requirement is intended to ensure a minimal level of interoperability between Zarr implementations.
+Core codecs are each defined by specification documents which are hosted in the
+`zarr-specs GitHub repository`_, and are published on the `zarr-specs documentation web site
+`_. The list of core codecs is part of the Zarr v3 specification.
+Changes to the list of core codecs MUST be made via the same protocol used for
+changing the Zarr v3 specification. Changes to the list of core codecs SHOULD be made carefully and
+in close collaboration with extant Zarr v3 implementations. A new core codec SHOULD be added to the
+list when a sufficient number of Zarr implementations support or intend to support that codec.
+An existing core codec SHOULD be removed from the list when a sufficient number of implementation
+developers and Zarr users deem the codec worth removing, e.g. because of a technical flaw in the
+algorithm underlying the codec.
+
+Community codecs
+----------------
 
-Non-core codecs
----------------
+Zarr implementations MAY support a codec that is not in the list of core codecs
+(hereafter termed a "community codec"), provided the community codec does not use an identifier
+that is already used by a core codec, as the identifiers of core codecs are reserved.
 
-Each codec must be defined via a separate
-specification. In order to refer to codecs in array metadata
-documents, each codec must have a unique identifier, which is a URI
-that dereferences to a human-readable specification of the codec. A
-codec specification must declare the codec identifier, and describe
-(or cite documents that describe) the encoding and decoding algorithms
-and the format of the encoded data.
-
-A codec may have configuration parameters which modify the behaviour
-of the codec in some way. For example, a compression codec may have a
-compression level parameter, which is an integer that affects the
-resulting compression ratio of the data. Configuration parameters must
-be declared in the codec specification, including a definition of how
-configuration parameters are represented as JSON.
-
-Further details of how codecs are configured for an array are given in the `Array metadata`_ section.
+This specification places no other constraints on community codecs. It is possible that separate
+developers may define distinct codecs that use the same identifier.
+To minimize the impact of such name collisions, codec developers are strongly encouraged
+to publish their codec specifications as additions to the "community codecs" section of Zarr v3 specification.
+Publication in the "community codecs" section does not confer primacy or an official designation to a codec.
+The list of community codecs exists expressly as a tool to enable coordinated codec development.
 
 Stores
 ======

From 163a8676e568b9d94dfbec7cc48935a58a7aa8bd Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Tue, 10 Sep 2024 23:40:05 +0200
Subject: [PATCH 03/10] fix anchors and update codecs page

---
 docs/v3/codecs.rst | 9 ++++++---
 docs/v3/core/v3.0.rst | 4 ++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/docs/v3/codecs.rst b/docs/v3/codecs.rst
index bc7eadce..8083d9e5 100644
--- a/docs/v3/codecs.rst
+++ b/docs/v3/codecs.rst
@@ -2,16 +2,19 @@
 Codecs
 ======
 
+Core codecs
+-----------
+
 The following documents specify the codecs that Zarr version 3 implementations MUST support. This collection of
 codecs is chosen to form a shared basis for interoperability between Zarr implementations in different languages,
 and to ensure that different Zarr implementations consistently implement key Zarr features.
 
-Read more about codecs in the :ref:`_zarr-core-specification-v3.0#codecs` section of the Zarr version 3 specification.
+Read more about core codecs in the :ref:`core codecs ` section of the Zarr version 3 specification.
 
 .. toctree::
    :glob:
    :maxdepth: 1
    :titlesonly:
-   :caption: Contents:
+   :caption: Core codecs:
 
-   codecs/*/*
+   codecs/core/*/*
\ No newline at end of file
diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 15df4453..c6c9dc77 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1210,9 +1210,13 @@ the following procedure:
 
 4. The chunk array ``A`` is equal to ``EC[0]``.
 
+.. _codec-specification:
+
 Specifying codecs
 -----------------
 
+.. _core-codecs:
+
 Core codecs
 -----------
 

From f273390d8b82cbc0b23773184047f2d7ca7c796d Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Thu, 12 Sep 2024 13:40:26 +0200
Subject: [PATCH 04/10] core codecs: MUST support -> SHOULD support

---
 docs/v3/codecs.rst | 6 +-----
 docs/v3/core/v3.0.rst | 4 ++--
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/docs/v3/codecs.rst b/docs/v3/codecs.rst
index 8083d9e5..80a373ae 100644
--- a/docs/v3/codecs.rst
+++ b/docs/v3/codecs.rst
@@ -5,11 +5,7 @@ Codecs
 Core codecs
 -----------
 
-The following documents specify the codecs that Zarr version 3 implementations MUST support. This collection of
-codecs is chosen to form a shared basis for interoperability between Zarr implementations in different languages,
-and to ensure that different Zarr implementations consistently implement key Zarr features.
-
-Read more about core codecs in the :ref:`core codecs ` section of the Zarr version 3 specification.
+The following documents specify the core codecs. Read more about core codecs in the :ref:`core codecs ` section of the Zarr version 3 specification.
 
 .. toctree::
    :glob:
diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index c6c9dc77..a5292d0e 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1220,8 +1220,8 @@ Specifying codecs
 Core codecs
 -----------
 
-This spec defines a set of codecs ("core codecs") which all Zarr implementations MUST implement.
-This requirement is intended to ensure a minimal level of interoperability between Zarr implementations.
+This spec defines a set of codecs ("core codecs") which all Zarr implementations SHOULD implement in
+order to ensure a minimal level of interoperability between Zarr implementations.
 Core codecs are each defined by specification documents which are hosted in the
 `zarr-specs GitHub repository`_, and are published on the `zarr-specs documentation web site
 `_. The list of core codecs is part of the Zarr v3 specification.

From 4c101581578a7255a03bf58c441be5c14895d5fa Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Thu, 12 Sep 2024 13:41:31 +0200
Subject: [PATCH 05/10] style

---
 docs/v3/core/v3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index a5292d0e..69aab725 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1222,7 +1222,7 @@ Core codecs
 
 This spec defines a set of codecs ("core codecs") which all Zarr implementations SHOULD implement in
 order to ensure a minimal level of interoperability between Zarr implementations.
-Core codecs are each defined by specification documents which are hosted in the
+Each core codec is each defined by specification documents which are hosted in the
 `zarr-specs GitHub repository`_, and are published on the `zarr-specs documentation web site
 `_. The list of core codecs is part of the Zarr v3 specification.
 Changes to the list of core codecs MUST be made via the same protocol used for

From 04f99c7852b75b51476096f7fcedebe9a73dbe6b Mon Sep 17 00:00:00 2001
From: Davis Bennett
Date: Fri, 4 Oct 2024 08:27:19 -0400
Subject: [PATCH 06/10] Update docs/v3/core/v3.0.rst

Co-authored-by: Ryan Abernathey
---
 docs/v3/core/v3.0.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 69aab725..f9557f7c 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1240,8 +1240,8 @@ Zarr implementations MAY support a codec that is not in the list of core codecs
 (hereafter termed a "community codec"), provided the community codec does not use an identifier
 that is already used by a core codec, as the identifiers of core codecs are reserved.
 
-This specification places no other constraints on community codecs. It is possible that separate
-developers may define distinct codecs that use the same identifier.
+This specification places no other constraints on community codecs. It is possible, through discouraged,
+that separate developers may define distinct codecs that use the same identifier.
 To minimize the impact of such name collisions, codec developers are strongly encouraged
 to publish their codec specifications as additions to the "community codecs" section of Zarr v3 specification.
 Publication in the "community codecs" section does not confer primacy or an official designation to a codec.

From c70bcb1950e3ea31e1e1fd42dd0355ad173674da Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Mon, 7 Oct 2024 15:38:51 -0400
Subject: [PATCH 07/10] rename community codecs to extension codecs

---
 docs/v3/core/v3.0.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 69aab725..89e4aa41 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1233,19 +1233,19 @@ An existing core codec SHOULD be removed from the list when a sufficient number
 developers and Zarr users deem the codec worth removing, e.g. because of a technical flaw in the
 algorithm underlying the codec.
 
-Community codecs
+Extension codecs
 ----------------
 
 Zarr implementations MAY support a codec that is not in the list of core codecs
-(hereafter termed a "community codec"), provided the community codec does not use an identifier
-that is already used by a core codec, as the identifiers of core codecs are reserved.
+(hereafter termed an "extension codec"), provided the extension codec does not use an identifier
+that is already used by a core codec; the identifiers of core codecs are reserved.
 
-This specification places no other constraints on community codecs. It is possible that separate
+This specification places no other constraints on extension codecs. It is possible that separate
 developers may define distinct codecs that use the same identifier.
 To minimize the impact of such name collisions, codec developers are strongly encouraged
-to publish their codec specifications as additions to the "community codecs" section of Zarr v3 specification.
-Publication in the "community codecs" section does not confer primacy or an official designation to a codec.
-The list of community codecs exists expressly as a tool to enable coordinated codec development.
+to publish their codec specifications as additions to the "extension codecs" section of Zarr v3 specification.
+Publication in the "extension codecs" section does not confer primacy or an official designation to a codec.
+The list of extension codecs exists expressly as a tool to enable coordinated codec development.
 
 Stores
 ======

From 3519fe5ba34c235ddf174a907c8f7448336cd212 Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Tue, 8 Oct 2024 07:56:35 +0200
Subject: [PATCH 08/10] community -> extension

---
 docs/v3/core/v3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 1720c0e9..5a54cbbb 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1240,7 +1240,7 @@ Zarr implementations MAY support a codec that is not in the list of core codecs
 (hereafter termed an "extension codec"), provided the extension codec does not use an identifier
 that is already used by a core codec; the identifiers of core codecs are reserved.
 
-This specification places no other constraints on community codecs. It is possible, through discouraged,
+This specification places no other constraints on extension codecs. It is possible, through discouraged,
 that separate developers may define distinct codecs that use the same identifier.
 To minimize the impact of such name collisions, codec developers are strongly encouraged
 to publish their codec specifications as additions to the "extension codecs" section of Zarr v3 specification.

From 4a5e52283b7e2d87d780230ec160271d8e5cff5c Mon Sep 17 00:00:00 2001
From: Josh Moore
Date: Wed, 9 Oct 2024 22:06:57 +0200
Subject: [PATCH 09/10] Add missing 'have'

---
 docs/v3/core/v3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index 5a54cbbb..fa74e035 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -775,7 +775,7 @@ above, but using a (currently made up) extension data type::
     - ``chunks`` has been replaced with ``chunk_grid``,
     - ``dimension_separator`` has been replaced with ``chunk_key_encoding``,
     - ``order`` has been replaced by the :ref:`transpose ` codec,
-    - the separate ``filters`` and ``compressor`` fields been combined into the single ``codecs`` field.
+    - the separate ``filters`` and ``compressor`` fields have been combined into the single ``codecs`` field.
 
 .. _group-metadata:
 

From a0f469f09068f60be7ecb61223836d6769dfc288 Mon Sep 17 00:00:00 2001
From: Davis Bennett
Date: Thu, 10 Oct 2024 14:16:59 +0200
Subject: [PATCH 10/10] Update docs/v3/core/v3.0.rst

Co-authored-by: David Stansby
---
 docs/v3/core/v3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/v3/core/v3.0.rst b/docs/v3/core/v3.0.rst
index fa74e035..0ab3cf2c 100644
--- a/docs/v3/core/v3.0.rst
+++ b/docs/v3/core/v3.0.rst
@@ -1226,7 +1226,7 @@ Each core codec is each defined by specification documents which are hosted in t
 `zarr-specs GitHub repository`_, and are published on the `zarr-specs documentation web site
 `_. The list of core codecs is part of the Zarr v3 specification.
 Changes to the list of core codecs MUST be made via the same protocol used for
-changing the Zarr v3 specification. Changes to the list of core codecs SHOULD be made carefully and
+changing the Zarr v3 specification. Changes to the list of core codecs SHOULD be made
 in close collaboration with extant Zarr v3 implementations. A new core codec SHOULD be added to the
 list when a sufficient number of Zarr implementations support or intend to support that codec.
 An existing core codec SHOULD be removed from the list when a sufficient number of implementation