).
+
+We propose the creation of a standalone Python package, "named-array". This package is envisioned to be a version of the `xarray.Variable` data structure, cleanly separated from the heavier dependencies of Xarray. named-array will provide a lightweight, user-friendly array-like data structure with named dimensions, facilitating convenient indexing and broadcasting. The package will use existing scientific Python community standards, such as established array protocols and the new [Python array API standard](https://data-apis.org/array-api/latest), allowing users to wrap multiple duck-array objects, including, but not limited to, NumPy, Dask, Sparse, Pint, CuPy, and PyTorch.
+
+The development of named-array is projected to meet a key community need and is expected to broaden Xarray's user base. By making the core `xarray.Variable` more accessible, we anticipate an increase in contributors and a reduction in the developer burden on current Xarray maintainers.
+
+### Goals
+
+1. **Simple and minimal**: named-array will expose Xarray's [Variable class](https://docs.xarray.dev/en/stable/internals/variable-objects.html) as a standalone object (`NamedArray`) with named axes (dimensions) and arbitrary metadata (attributes) but without coordinate labels. This will make it a lightweight, efficient array data structure that allows convenient broadcasting and indexing.
+
+2. **Interoperability**: named-array will follow established scientific Python community standards, which will allow it to wrap multiple duck-array objects, including, but not limited to, NumPy, Dask, Sparse, Pint, CuPy, and PyTorch.
+
+3. **Community Engagement**: By making the core `xarray.Variable` more accessible, we open the door to increased adoption of this fundamental data structure. As such, we hope to see an increase in contributors and a reduction in the developer burden on current Xarray maintainers.
+
+### Non-Goals
+
+1. **Extensive Data Analysis**: named-array will not provide extensive data analysis features like statistical functions, data cleaning, or visualization. Its primary focus is on providing a data structure that allows users to use dimension names for descriptive array manipulations.
+
+2. **Support for I/O**: named-array will not bundle file reading functions. Instead, users will be expected to handle I/O and then wrap those arrays with the new named-array data structure.
+
+## Backward Compatibility
+
+The creation of named-array is intended to separate the `xarray.Variable` from Xarray into a standalone package. This allows it to be used independently, without the need for Xarray's dependencies, like Pandas. This separation has implications for backward compatibility.
+
+Since the new named-array is envisioned to contain the core features of Xarray's Variable, existing code using Variable from Xarray should be able to switch to named-array with minimal changes. However, there are potential issues related to backward compatibility:
+
+* **API Changes**: as Variable is decoupled from Xarray and moved into named-array, some changes to the API may be necessary, such as differences in function signatures. These changes could break existing code that relies on the current API and associated utility functions (e.g. `as_variable()`). To mitigate this, the `xarray.Variable` object will subclass `NamedArray` and provide the existing interface for compatibility.
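+
+A minimal sketch of what this compatibility layer might look like is shown below. The `NamedArray` constructor signature and the module path are assumptions for illustration only; the actual layout will be settled during the refactor.
+
+```python
+# Hypothetical sketch -- module path, constructor signature, and attribute names are assumptions.
+from named_array import NamedArray
+
+
+class Variable(NamedArray):
+    """Xarray's Variable, keeping its existing interface on top of NamedArray."""
+
+    def __init__(self, dims, data, attrs=None, encoding=None):
+        super().__init__(dims, data, attrs=attrs)
+        # `.encoding` remains an Xarray-level concern rather than part of named-array
+        self._encoding = encoding or {}
+
+    @property
+    def encoding(self):
+        return self._encoding
+```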
+
+## Detailed Description
+
+named-array aims to provide a lightweight, efficient array structure with named dimensions, or axes, that enables convenient broadcasting and indexing. The primary component of named-array is a standalone version of the `xarray.Variable` data structure, currently part of the Xarray library.
+
+The `xarray.Variable` data structure in named-array will maintain the core features of its counterpart in Xarray, including:
+
+* **Named Axes (Dimensions)**: Each axis of the array can be given a name, providing a descriptive and intuitive way to reference the dimensions of the array.
+
+* **Arbitrary Metadata (Attributes)**: named-array will support the attachment of arbitrary metadata to arrays as a dict, providing a mechanism to store additional information about the data that the array represents.
+
+* **Convenient Broadcasting and Indexing**: With named dimensions, broadcasting and indexing operations become more intuitive and less error-prone.
+
+The named-array package is designed to be interoperable with other scientific Python libraries. It will follow established scientific Python community standards and use standard array protocols, as well as the new Python array API standard. This allows named-array to wrap multiple duck-array objects, including, but not limited to, NumPy, Dask, Sparse, Pint, CuPy, and PyTorch.
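+
+To make the intended user experience concrete, a rough usage sketch is shown below. The `NamedArray` constructor and the `dim` keyword are hypothetical; the final API will be settled during implementation.
+
+```python
+# Hypothetical usage sketch -- the NamedArray constructor and method signatures are assumptions.
+import dask.array as da
+import numpy as np
+
+from named_array import NamedArray
+
+# Wrap an in-memory NumPy array with named dimensions and arbitrary metadata
+temperature = NamedArray(
+    dims=("time", "space"),
+    data=np.random.rand(4, 3),
+    attrs={"units": "degC"},
+)
+temperature.mean(dim="time")  # reduce over a dimension by name rather than by axis number
+
+# The same structure can wrap other duck arrays, e.g. a chunked Dask array
+lazy = NamedArray(dims=("time", "space"), data=da.zeros((4, 3), chunks=(2, 3)))
+```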
+
+## Implementation
+
+* **Decoupling**: making `variable.py` agnostic to Xarray internals by decoupling it from the rest of the library. This will make the code more modular and easier to maintain. However, this will also make the code more complex, as we will need to define a clear interface for how the functionality in `variable.py` interacts with the rest of the library, particularly the ExplicitlyIndexed subclasses used to enable lazy indexing of data on disk.
+* **Move Xarray's internal lazy indexing classes to follow standard Array Protocols**: moving the lazy indexing classes like `ExplicitlyIndexed` to use standard array protocols will be a key step in decoupling. It will also potentially improve interoperability with other libraries that use these protocols, and prepare these classes [for eventual movement out](https://github.com/pydata/xarray/issues/5081) of the Xarray code base. However, this will also require significant changes to the code, and we will need to ensure that all existing functionality is preserved.
+ * Use [https://data-apis.org/array-api-compat/](https://data-apis.org/array-api-compat/) to handle compatibility issues?
+* **Leave lazy indexing classes in Xarray for now**
+* **Preserve support for Dask collection protocols**: named-array will preserve existing support for the Dask collections protocol, namely the `__dask_***__` methods (a sketch of this delegation follows this list).
+* **Preserve support for ChunkManagerEntrypoint?** Opening variables backed by Dask vs. Cubed arrays is currently [handled within Variable.chunk](https://github.com/pydata/xarray/blob/92c8b33eb464b09d6f8277265b16cae039ab57ee/xarray/core/variable.py#L1272C15-L1272C15). If we are preserving Dask support, it would be nice to preserve general chunked array type support, but this currently requires an entrypoint.
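+
+The sketch below illustrates one way the Dask collection protocol could be preserved, by delegating the protocol methods to the wrapped duck array. The attribute names and rewrapping logic are assumptions rather than a committed design.
+
+```python
+# Sketch only: delegate the dask collection protocol to the wrapped duck array.
+class NamedArray:
+    def __init__(self, dims, data, attrs=None):
+        self.dims, self._data, self.attrs = tuple(dims), data, attrs or {}
+
+    def __dask_graph__(self):
+        # Returning None signals "not a dask collection" when wrapping an eager array
+        getter = getattr(self._data, "__dask_graph__", None)
+        return getter() if getter is not None else None
+
+    def __dask_keys__(self):
+        return self._data.__dask_keys__()
+
+    def __dask_postcompute__(self):
+        # dask.compute() calls the returned finalizer on the computed chunk results;
+        # rewrap them so the user gets a NamedArray back rather than a bare array
+        array_finalize, array_args = self._data.__dask_postcompute__()
+
+        def finalize(results, *args):
+            return type(self)(self.dims, array_finalize(results, *array_args), self.attrs)
+
+        return finalize, ()
+```
+
+The remaining protocol members (`__dask_postpersist__`, `__dask_tokenize__`, and the scheduler/optimize hooks) would follow the same delegation pattern.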
+
+### Plan
+
+1. Create a new base class for `xarray.Variable` in its own module, e.g. `xarray.core.base_variable`
+2. Remove all imports of internal Xarray classes and utils from `base_variable.py`. `base_variable.Variable` should not depend on anything in xarray.core
+    * Will require moving the lazy indexing classes (subclasses of `ExplicitlyIndexed`) to be standards-compliant containers.
+      * an array API-compliant container that provides `__array_namespace__`
+      * Support `.oindex` and `.vindex` for explicit indexing (a sketch of such a wrapper appears after this plan)
+ * Potentially implement this by introducing a new compliant wrapper object?
+    * Delete the `NON_NUMPY_SUPPORTED_ARRAY_TYPES` variable, which special-cases `ExplicitlyIndexed` and `pd.Index`.
+    * The `ExplicitlyIndexed` class and its subclasses should provide `.oindex` and `.vindex` for indexing by `Variable.__getitem__`: `oindex` and `vindex` were proposed in [NEP 21](https://numpy.org/neps/nep-0021-advanced-indexing.html), but have not been implemented yet.
+ * Delete the ExplicitIndexer objects (`BasicIndexer`, `VectorizedIndexer`, `OuterIndexer`)
+ * Remove explicit support for `pd.Index`. When provided with a `pd.Index` object, Variable will coerce to an array using `np.array(pd.Index)`. For Xarray's purposes, Xarray can use `as_variable` to explicitly wrap these in PandasIndexingAdapter and pass them to `Variable.__init__`.
+3. Define a minimal variable interface that the rest of Xarray can use (a sketch of this interface appears at the end of this section):
+ 1. `dims`: tuple of dimension names
+   2. `data`: numpy/dask/duck arrays
+   3. `attrs`: dictionary of attributes
+
+4. Implement basic functions & methods for manipulating these objects. These methods will be a cleaned-up subset (for now) of functionality on xarray.Variable, with adaptations inspired by the [Python array API](https://data-apis.org/array-api/2022.12/API_specification/index.html).
+5. Existing Variable structures
+   1. Keep the Variable object as a subclass of the new structure, adding the `.encoding` attribute and potentially other methods needed for easy refactoring.
+ 2. IndexVariable will remain in xarray.core.variable and subclass the new named-array data structure pending future deletion.
+6. Docstrings and user-facing APIs will need to be updated to reflect the changed methods on Variable objects.
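+
+As referenced in step 2 above, the sketch below shows one possible shape for a wrapper exposing NEP 21-style `.oindex` and `.vindex` properties. The class names are hypothetical and the indexing logic is simplified (slices are not handled); the real lazy-indexing adapters would be more involved.
+
+```python
+# Sketch only: a hypothetical wrapper exposing NEP 21-style .oindex / .vindex.
+import numpy as np
+
+
+class _IndexerProxy:
+    """Applies a key transform before indexing; returned by .oindex / .vindex below."""
+
+    def __init__(self, array, transform):
+        self._array = array
+        self._transform = transform
+
+    def __getitem__(self, key):
+        return self._array[self._transform(key)]
+
+
+class ExplicitIndexWrapper:
+    def __init__(self, array):
+        self._array = array
+
+    def __getitem__(self, key):
+        # basic indexing passes straight through to the wrapped array
+        return self._array[key]
+
+    @property
+    def oindex(self):
+        # outer (orthogonal) indexing; assumes integer-array keys only
+        return _IndexerProxy(self._array, lambda key: np.ix_(*(np.atleast_1d(k) for k in key)))
+
+    @property
+    def vindex(self):
+        # vectorized (fancy) indexing: index arrays are broadcast against each other
+        return _IndexerProxy(self._array, lambda key: tuple(np.broadcast_arrays(*key)))
+```
+
+With a wrapper along these lines, `Variable.__getitem__` could route basic, outer, and vectorized keys to `wrapped[key]`, `wrapped.oindex[key]`, and `wrapped.vindex[key]` respectively.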
+
+Further implementation details are in Appendix: [Implementation Details](#appendix-implementation-details).
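+
+To make the minimal interface from step 3 concrete, one possible formalization is sketched below. Whether named-array would actually ship such a `typing.Protocol`, and the exact property types, are open assumptions.
+
+```python
+# Hypothetical sketch of the minimal interface from step 3 of the plan above.
+from typing import Any, Protocol
+
+
+class NamedArrayLike(Protocol):
+    @property
+    def dims(self) -> tuple[str, ...]: ...  # tuple of dimension names
+
+    @property
+    def data(self) -> Any: ...  # numpy / dask / other duck array
+
+    @property
+    def attrs(self) -> dict[str, Any]: ...  # arbitrary metadata
+```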
+
+## Project Timeline and Milestones
+
+We have identified the following milestones for the completion of this project:
+
+1. **Write and publish a design document**: this document will explain the purpose of named-array, the intended audience, and the features it will provide. It will also describe the architecture of named-array and how it will be implemented. This will ensure early community awareness and engagement in the project to promote subsequent uptake.
+2. **Refactor `variable.py` to `base_variable.py`** and remove internal Xarray imports.
+3. **Break out the package and create continuous integration infrastructure**: this will entail breaking out the named-array project into a Python package and creating a continuous integration (CI) system. This will help to modularize the code and make it easier to manage. Building a CI system will help ensure that codebase changes do not break existing functionality.
+4. Incrementally add new functions & methods to the new package, ported from xarray. This will start to make named-array useful on its own.
+5. Refactor the existing Xarray codebase to rely on the newly created package (named-array): This will help to demonstrate the usefulness of the new package, and also provide an example for others who may want to use it.
+6. Expand tests, add documentation, and write a blog post: expanding the test suite will help to ensure that the code is reliable and that changes do not introduce bugs. Adding documentation will make it easier for others to understand and use the project.
+7. Finally, we will write a series of blog posts on [xarray.dev](https://xarray.dev/) to promote the project and attract more contributors.
+ * Toward the end of the process, write a few blog posts that demonstrate the use of the newly available data structure
+ * pick the same example applications used by other implementations/applications (e.g. Pytorch, sklearn, and Levanter) to show how it can work.
+
+## Related Work
+
+1. [GitHub - deepmind/graphcast](https://github.com/deepmind/graphcast)
+2. [Getting Started — LArray 0.34 documentation](https://larray.readthedocs.io/en/stable/tutorial/getting_started.html)
+3. [Levanter — Legible, Scalable, Reproducible Foundation Models with JAX](https://crfm.stanford.edu/2023/06/16/levanter-1_0-release.html)
+4. [google/xarray-tensorstore](https://github.com/google/xarray-tensorstore)
+5. [State of Torch Named Tensors · Issue #60832 · pytorch/pytorch · GitHub](https://github.com/pytorch/pytorch/issues/60832)
+   * Incomplete support: many primitive operations result in errors, making it difficult to use NamedTensors in practice. Users often have to resort to removing the names from tensors to avoid these errors.
+   * Lack of active development: the development of the NamedTensor feature in PyTorch is not currently active due to a lack of bandwidth for resolving ambiguities in the design.
+ * Usability issues: the current form of NamedTensor is not user-friendly and sometimes raises errors, making it difficult for users to incorporate NamedTensors into their workflows.
+6. [Scikit-learn Enhancement Proposals (SLEPs) 8, 12, 14](https://github.com/scikit-learn/enhancement_proposals/pull/18)
+ * Some of the key points and limitations discussed in these proposals are:
+ * Inconsistency in feature name handling: Scikit-learn currently lacks a consistent and comprehensive way to handle and propagate feature names through its pipelines and estimators ([SLEP 8](https://github.com/scikit-learn/enhancement_proposals/pull/18),[SLEP 12](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep012/proposal.html)).
+ * Memory intensive for large feature sets: storing and propagating feature names can be memory intensive, particularly in cases where the entire "dictionary" becomes the features, such as in NLP use cases ([SLEP 8](https://github.com/scikit-learn/enhancement_proposals/pull/18),[GitHub issue #35](https://github.com/scikit-learn/enhancement_proposals/issues/35))
+ * Sparse matrices: sparse data structures present a challenge for feature name propagation. For instance, the sparse data structure functionality in Pandas 1.0 only supports converting directly to the coordinate format (COO), which can be an issue with transformers such as the OneHotEncoder.transform that has been optimized to construct a CSR matrix ([SLEP 14](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep014/proposal.html))
+ * New Data structures: the introduction of new data structures, such as "InputArray" or "DataArray" could lead to more burden for third-party estimator maintainers and increase the learning curve for users. Xarray's "DataArray" is mentioned as a potential alternative, but the proposal mentions that the conversion from a Pandas dataframe to a Dataset is not lossless ([SLEP 12](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep012/proposal.html),[SLEP 14](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep014/proposal.html),[GitHub issue #35](https://github.com/scikit-learn/enhancement_proposals/issues/35)).
+    * Dependency on other libraries: solutions that involve using Xarray and/or Pandas to handle feature names come with the challenge of managing dependencies. While a soft dependency approach is suggested, this means users would be able to have/enable the feature only if they have the dependency installed. named-array's integration with other scientific Python libraries could potentially help with this issue ([GitHub issue #35](https://github.com/scikit-learn/enhancement_proposals/issues/35)).
+
+## References and Previous Discussion
+
+* [[Proposal] Expose Variable without Pandas dependency · Issue #3981 · pydata/xarray · GitHub](https://github.com/pydata/xarray/issues/3981)
+* [https://github.com/pydata/xarray/issues/3981#issuecomment-985051449](https://github.com/pydata/xarray/issues/3981#issuecomment-985051449)
+* [Lazy indexing arrays as a stand-alone package · Issue #5081 · pydata/xarray · GitHub](https://github.com/pydata/xarray/issues/5081)
+
+### Appendix: Engagement with the Community
+
+We plan to publicize this document on:
+
+* [x] `Xarray dev call`
+* [ ] `Scientific Python discourse`
+* [ ] `Xarray Github`
+* [ ] `Twitter`
+* [ ] `Respond to NamedTensor and Scikit-Learn issues?`
+* [ ] `Pangeo Discourse`
+* [ ] `Numpy, SciPy email lists?`
+* [ ] `Xarray blog`
+
+Additionally, we plan to write a series of blog posts to showcase the implementation and potential of the newly available functionality. To illustrate this, we will use the same example applications as other established libraries (such as PyTorch and scikit-learn), providing practical demonstrations of how these new data structures can be leveraged.
+
+### Appendix: API Surface
+
+Questions:
+
+1. Document Xarray indexing rules
+2. Document use of .oindex and .vindex protocols
+3. Do we use `.mean` and `.nanmean` or `.mean(skipna=...)`?
+ * Default behavior in named-array should mirror NumPy / the array API standard, not pandas.
+    * nanmean is not (yet) in the [array API](https://github.com/pydata/xarray/pull/7424#issuecomment-1373979208). There are a handful of other key functions (e.g., median) that are also missing. I think that should be OK, as long as what we support is a strict superset of the array API.
+4. What methods need to be exposed on Variable?
+   * `Variable.concat` classmethod: create two functions, one as the equivalent of `np.stack` and the other for `np.concat` (a rough sketch of these two functions follows this list)
+ * `.rolling_window` and `.coarsen_reshape` ?
+   * `named-array.apply_ufunc`: used in astype, clip, quantile, isnull, notnull
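+
+To illustrate the proposed `Variable.concat` split, a rough sketch of the two module-level functions is shown below. The function names and the `NamedArray(dims, data, attrs)` constructor they assume are illustrative only.
+
+```python
+# Hypothetical sketch of splitting Variable.concat into two functions.
+import numpy as np
+
+
+def concat(arrays, dim):
+    """Join NamedArrays along an existing named dimension (analogue of np.concatenate)."""
+    axis = arrays[0].dims.index(dim)
+    data = np.concatenate([a.data for a in arrays], axis=axis)
+    return type(arrays[0])(arrays[0].dims, data, arrays[0].attrs)
+
+
+def stack(arrays, dim):
+    """Join NamedArrays along a new leading dimension (analogue of np.stack)."""
+    data = np.stack([a.data for a in arrays], axis=0)
+    return type(arrays[0])((dim,) + arrays[0].dims, data, arrays[0].attrs)
+```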
+
+#### methods to be preserved from xarray.Variable
+
+```python
+# Sorting
+ Variable.argsort
+ Variable.searchsorted
+
+# NaN handling
+ Variable.fillna
+ Variable.isnull
+ Variable.notnull
+
+# Lazy data handling
+ Variable.chunk # Could instead have accessor interface and recommend users use `Variable.dask.chunk` and `Variable.cubed.chunk`?
+ Variable.to_numpy()
+ Variable.as_numpy()
+
+# Xarray-specific
+ Variable.get_axis_num
+ Variable.isel
+ Variable.to_dict
+
+# Reductions
+ Variable.reduce
+ Variable.all
+ Variable.any
+ Variable.argmax
+ Variable.argmin
+ Variable.count
+ Variable.max
+ Variable.mean
+ Variable.median
+ Variable.min
+ Variable.prod
+ Variable.quantile
+ Variable.std
+ Variable.sum
+ Variable.var
+
+# Accumulate
+ Variable.cumprod
+ Variable.cumsum
+
+# numpy-like Methods
+ Variable.astype
+ Variable.copy
+ Variable.clip
+ Variable.round
+ Variable.item
+ Variable.where
+
+# Reordering/Reshaping
+ Variable.squeeze
+ Variable.pad
+ Variable.roll
+ Variable.shift
+
+```
+
+#### methods to be renamed from xarray.Variable
+
+```python
+# Xarray-specific
+ Variable.concat # create two functions, one as the equivalent of `np.stack` and the other for `np.concat`
+
+ # Given how niche these are, these would be better as functions than methods.
+ # We could also keep these in Xarray, at least for now. If we don't think people will use functionality outside of Xarray it probably is not worth the trouble of porting it (including documentation, etc).
+ Variable.coarsen # This should probably be called something like coarsen_reduce.
+ Variable.coarsen_reshape
+ Variable.rolling_window
+
+ Variable.set_dims # split this into broadcast_to and expand_dims
+
+
+# Reordering/Reshaping
+ Variable.stack # To avoid confusion with np.stack, let's call this stack_dims.
+ Variable.transpose # Could consider calling this permute_dims, like the [array API standard](https://data-apis.org/array-api/2022.12/API_specification/manipulation_functions.html#objects-in-api)
+ Variable.unstack # Likewise, maybe call this unstack_dims?
+```
+
+#### methods to be removed from xarray.Variable
+
+```python
+# Testing
+ Variable.broadcast_equals
+ Variable.equals
+ Variable.identical
+ Variable.no_conflicts
+
+# Lazy data handling
+ Variable.compute # We can probably omit this method for now, too, given that dask.compute() uses a protocol. The other concern is that different array libraries have different notions of "compute" and this one is rather Dask specific, including conversion from Dask to NumPy arrays. For example, in JAX every operation executes eagerly, but in a non-blocking fashion, and you need to call jax.block_until_ready() to ensure computation is finished.
+ Variable.load # Could remove? compute vs load is a common source of confusion.
+
+# Xarray-specific
+ Variable.to_index
+ Variable.to_index_variable
+ Variable.to_variable
+ Variable.to_base_variable
+ Variable.to_coord
+
+ Variable.rank # Uses bottleneck. Delete? Could use https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rankdata.html instead
+
+
+# numpy-like Methods
+ Variable.conjugate # .conj is enough
+ Variable.__array_wrap__ # This is a very old NumPy protocol for duck arrays. We don't need it now that we have `__array_ufunc__` and `__array_function__`
+
+# Encoding
+ Variable.reset_encoding
+
+```
+
+#### Attributes to be preserved from xarray.Variable
+
+```python
+# Properties
+ Variable.attrs
+ Variable.chunks
+ Variable.data
+ Variable.dims
+ Variable.dtype
+
+ Variable.nbytes
+ Variable.ndim
+ Variable.shape
+ Variable.size
+ Variable.sizes
+
+ Variable.T
+ Variable.real
+ Variable.imag
+ Variable.conj
+```
+
+#### Attributes to be renamed from xarray.Variable
+
+```python
+```
+
+#### Attributes to be removed from xarray.Variable
+
+```python
+
+ Variable.values # Probably also remove -- this is a legacy from before Xarray supported dask arrays. ".data" is enough.
+
+# Encoding
+ Variable.encoding
+
+```
+
+### Appendix: Implementation Details
+
+* Merge in `VariableArithmetic`'s parent classes (`AbstractArray`, `NdimSizeLenMixin`) with the new data structure.
+
+```python
+class VariableArithmetic(
+ ImplementsArrayReduce,
+ IncludeReduceMethods,
+ IncludeCumMethods,
+ IncludeNumpySameMethods,
+ SupportsArithmetic,
+ VariableOpsMixin,
+):
+ __slots__ = ()
+ # prioritize our operations over those of numpy.ndarray (priority=0)
+ __array_priority__ = 50
+
+```
+
+* Move over `_typed_ops.VariableOpsMixin`
+* Build a list of utility functions used elsewhere: which of these should become public API?
+ * `broadcast_variables`: `dataset.py`, `dataarray.py`,`missing.py`
+ * This could be just called "broadcast" in named-array.
+ * `Variable._getitem_with_mask` : `alignment.py`
+ * keep this method/function as private and inside Xarray.
+* The Variable constructor will need to be rewritten to no longer accept tuples, encodings, etc. These details should be handled at the Xarray data structure level.
+* What happens to `duck_array_ops?`
+* What about Variable.chunk and "chunk managers"?
+ * Could this functionality be left in Xarray proper for now? Alternative array types like JAX also have some notion of "chunks" for parallel arrays, but the details differ in a number of ways from the Dask/Cubed.
+ * Perhaps variable.chunk/load methods should become functions defined in xarray that convert Variable objects. This is easy so long as xarray can reach in and replace .data
+
+* Utility functions like `as_variable` should be moved out of `base_variable.py` so they can convert BaseVariable objects to/from DataArray or Dataset containing explicitly indexed arrays.
diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst
index 5d825be2e08..527bdcdede2 100644
--- a/doc/api-hidden.rst
+++ b/doc/api-hidden.rst
@@ -9,17 +9,40 @@
.. autosummary::
:toctree: generated/
+ Coordinates.from_pandas_multiindex
+ Coordinates.get
+ Coordinates.items
+ Coordinates.keys
+ Coordinates.values
+ Coordinates.dims
+ Coordinates.dtypes
+ Coordinates.variables
+ Coordinates.xindexes
+ Coordinates.indexes
+ Coordinates.to_dataset
+ Coordinates.to_index
+ Coordinates.update
+ Coordinates.merge
+ Coordinates.copy
+ Coordinates.equals
+ Coordinates.identical
+
core.coordinates.DatasetCoordinates.get
core.coordinates.DatasetCoordinates.items
core.coordinates.DatasetCoordinates.keys
- core.coordinates.DatasetCoordinates.merge
- core.coordinates.DatasetCoordinates.to_dataset
- core.coordinates.DatasetCoordinates.to_index
- core.coordinates.DatasetCoordinates.update
core.coordinates.DatasetCoordinates.values
core.coordinates.DatasetCoordinates.dims
- core.coordinates.DatasetCoordinates.indexes
+ core.coordinates.DatasetCoordinates.dtypes
core.coordinates.DatasetCoordinates.variables
+ core.coordinates.DatasetCoordinates.xindexes
+ core.coordinates.DatasetCoordinates.indexes
+ core.coordinates.DatasetCoordinates.to_dataset
+ core.coordinates.DatasetCoordinates.to_index
+ core.coordinates.DatasetCoordinates.update
+ core.coordinates.DatasetCoordinates.merge
+ core.coordinates.DataArrayCoordinates.copy
+ core.coordinates.DatasetCoordinates.equals
+ core.coordinates.DatasetCoordinates.identical
core.rolling.DatasetCoarsen.boundary
core.rolling.DatasetCoarsen.coord_func
@@ -47,14 +70,19 @@
core.coordinates.DataArrayCoordinates.get
core.coordinates.DataArrayCoordinates.items
core.coordinates.DataArrayCoordinates.keys
- core.coordinates.DataArrayCoordinates.merge
- core.coordinates.DataArrayCoordinates.to_dataset
- core.coordinates.DataArrayCoordinates.to_index
- core.coordinates.DataArrayCoordinates.update
core.coordinates.DataArrayCoordinates.values
core.coordinates.DataArrayCoordinates.dims
- core.coordinates.DataArrayCoordinates.indexes
+ core.coordinates.DataArrayCoordinates.dtypes
core.coordinates.DataArrayCoordinates.variables
+ core.coordinates.DataArrayCoordinates.xindexes
+ core.coordinates.DataArrayCoordinates.indexes
+ core.coordinates.DataArrayCoordinates.to_dataset
+ core.coordinates.DataArrayCoordinates.to_index
+ core.coordinates.DataArrayCoordinates.update
+ core.coordinates.DataArrayCoordinates.merge
+ core.coordinates.DataArrayCoordinates.copy
+ core.coordinates.DataArrayCoordinates.equals
+ core.coordinates.DataArrayCoordinates.identical
core.rolling.DataArrayCoarsen.boundary
core.rolling.DataArrayCoarsen.coord_func
@@ -451,6 +479,21 @@
CFTimeIndex.values
CFTimeIndex.year
+ Index.from_variables
+ Index.concat
+ Index.stack
+ Index.unstack
+ Index.create_variables
+ Index.to_pandas_index
+ Index.isel
+ Index.sel
+ Index.join
+ Index.reindex_like
+ Index.equals
+ Index.roll
+ Index.rename
+ Index.copy
+
backends.NetCDF4DataStore.close
backends.NetCDF4DataStore.encode
backends.NetCDF4DataStore.encode_attribute
diff --git a/doc/api.rst b/doc/api.rst
index 34d6558ed55..0cf07f91df8 100644
--- a/doc/api.rst
+++ b/doc/api.rst
@@ -1085,12 +1085,14 @@ Advanced API
.. autosummary::
:toctree: generated/
+ Coordinates
Dataset.variables
DataArray.variable
Variable
IndexVariable
as_variable
- indexes.Index
+ Index
+ IndexSelResult
Context
register_dataset_accessor
register_dataarray_accessor
diff --git a/doc/conf.py b/doc/conf.py
index eb861004e2f..6c6efb47f6b 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -58,7 +58,7 @@
]
)
-nbsphinx_allow_errors = True
+nbsphinx_allow_errors = False
# -- General configuration ------------------------------------------------
@@ -238,7 +238,7 @@
extra_footer="""Xarray is a fiscally sponsored project of NumFOCUS,
a nonprofit dedicated to supporting the open-source scientific computing community.
Theme by the Executable Book Project
""",
- twitter_url="https://twitter.com/xarray_devs",
+ twitter_url="https://twitter.com/xarray_dev",
icon_links=[], # workaround for pydata/pydata-sphinx-theme#1220
)
@@ -323,6 +323,7 @@
"dask": ("https://docs.dask.org/en/latest", None),
"cftime": ("https://unidata.github.io/cftime", None),
"sparse": ("https://sparse.pydata.org/en/latest/", None),
+ "cubed": ("https://tom-e-white.com/cubed/", None),
}
diff --git a/doc/contributing.rst b/doc/contributing.rst
index 3cc43314d9a..3cdd7dd9933 100644
--- a/doc/contributing.rst
+++ b/doc/contributing.rst
@@ -518,7 +518,7 @@ See the `Installation `_
Including figures and files
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
Image files can be directly included in pages with the ``image::`` directive.
diff --git a/doc/developers-meeting.rst b/doc/developers-meeting.rst
index 1c49a900f66..153f3520f26 100644
--- a/doc/developers-meeting.rst
+++ b/doc/developers-meeting.rst
@@ -3,18 +3,18 @@ Developers meeting
Xarray developers meet bi-weekly every other Wednesday.
-The meeting occurs on `Zoom `__.
+The meeting occurs on `Zoom `__.
-Find the `notes for the meeting here `__.
+Find the `notes for the meeting here `__.
There is a :issue:`GitHub issue for changes to the meeting<4001>`.
You can subscribe to this calendar to be notified of changes:
-* `Google Calendar `__
-* `iCal `__
+* `Google Calendar `__
+* `iCal `__
.. raw:: html
-
+
diff --git a/doc/examples/multidimensional-coords.ipynb b/doc/examples/multidimensional-coords.ipynb
index f7471f05e5d..ce8a091a5da 100644
--- a/doc/examples/multidimensional-coords.ipynb
+++ b/doc/examples/multidimensional-coords.ipynb
@@ -56,7 +56,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the latitudes and longitude of the data."
+ "In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the longitudes and latitudes of the data."
]
},
{
diff --git a/doc/examples/visualization_gallery.ipynb b/doc/examples/visualization_gallery.ipynb
index e6fa564db0d..e7e9196a6f6 100644
--- a/doc/examples/visualization_gallery.ipynb
+++ b/doc/examples/visualization_gallery.ipynb
@@ -193,90 +193,6 @@
"# Show\n",
"plt.tight_layout()"
]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "jp-MarkdownHeadingCollapsed": true,
- "tags": []
- },
- "source": [
- "## `imshow()` and rasterio map projections\n",
- "\n",
- "\n",
- "Using rasterio's projection information for more accurate plots.\n",
- "\n",
- "This example extends `recipes.rasterio` and plots the image in the\n",
- "original map projection instead of relying on pcolormesh and a map\n",
- "transformation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "da = xr.tutorial.open_rasterio(\"RGB.byte\")\n",
- "\n",
- "# The data is in UTM projection. We have to set it manually until\n",
- "# https://github.com/SciTools/cartopy/issues/813 is implemented\n",
- "crs = ccrs.UTM(\"18\")\n",
- "\n",
- "# Plot on a map\n",
- "ax = plt.subplot(projection=crs)\n",
- "da.plot.imshow(ax=ax, rgb=\"band\", transform=crs)\n",
- "ax.coastlines(\"10m\", color=\"r\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Parsing rasterio geocoordinates\n",
- "\n",
- "Converting a projection's cartesian coordinates into 2D longitudes and\n",
- "latitudes.\n",
- "\n",
- "These new coordinates might be handy for plotting and indexing, but it should\n",
- "be kept in mind that a grid which is regular in projection coordinates will\n",
- "likely be irregular in lon/lat. It is often recommended to work in the data's\n",
- "original map projection (see `recipes.rasterio_rgb`)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from pyproj import Transformer\n",
- "import numpy as np\n",
- "\n",
- "da = xr.tutorial.open_rasterio(\"RGB.byte\")\n",
- "\n",
- "x, y = np.meshgrid(da[\"x\"], da[\"y\"])\n",
- "transformer = Transformer.from_crs(da.crs, \"EPSG:4326\", always_xy=True)\n",
- "lon, lat = transformer.transform(x, y)\n",
- "da.coords[\"lon\"] = ((\"y\", \"x\"), lon)\n",
- "da.coords[\"lat\"] = ((\"y\", \"x\"), lat)\n",
- "\n",
- "# Compute a greyscale out of the rgb image\n",
- "greyscale = da.mean(dim=\"band\")\n",
- "\n",
- "# Plot on a map\n",
- "ax = plt.subplot(projection=ccrs.PlateCarree())\n",
- "greyscale.plot(\n",
- " ax=ax,\n",
- " x=\"lon\",\n",
- " y=\"lat\",\n",
- " transform=ccrs.PlateCarree(),\n",
- " cmap=\"Greys_r\",\n",
- " shading=\"auto\",\n",
- " add_colorbar=False,\n",
- ")\n",
- "ax.coastlines(\"10m\", color=\"r\")"
- ]
}
],
"metadata": {
@@ -296,6 +212,13 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
+ },
+ "widgets": {
+ "application/vnd.jupyter.widget-state+json": {
+ "state": {},
+ "version_major": 2,
+ "version_minor": 0
+ }
}
},
"nbformat": 4,
diff --git a/doc/getting-started-guide/installing.rst b/doc/getting-started-guide/installing.rst
index 9fee849a341..ff8650bc0ff 100644
--- a/doc/getting-started-guide/installing.rst
+++ b/doc/getting-started-guide/installing.rst
@@ -7,7 +7,7 @@ Required dependencies
---------------------
- Python (3.9 or later)
-- `numpy `__ (1.21 or later)
+- `numpy `__ (1.22 or later)
- `packaging `__ (21.3 or later)
- `pandas `__ (1.4 or later)
diff --git a/doc/howdoi.rst b/doc/howdoi.rst
index b6374cc5100..8cc4e9939f2 100644
--- a/doc/howdoi.rst
+++ b/doc/howdoi.rst
@@ -42,7 +42,7 @@ How do I ...
* - extract the underlying array (e.g. NumPy or Dask arrays)
- :py:attr:`DataArray.data`
* - convert to and extract the underlying NumPy array
- - :py:attr:`DataArray.values`
+ - :py:attr:`DataArray.to_numpy`
* - convert to a pandas DataFrame
- :py:attr:`Dataset.to_dataframe`
* - sort values
diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst
new file mode 100644
index 00000000000..7192c3f0bc5
--- /dev/null
+++ b/doc/internals/chunked-arrays.rst
@@ -0,0 +1,102 @@
+.. currentmodule:: xarray
+
+.. _internals.chunkedarrays:
+
+Alternative chunked array types
+===============================
+
+.. warning::
+
+ This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_.
+ In particular see discussion on `xarray issue #6807 `_
+
+Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface.
+This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands.
+In particular xarray also supports wrapping :py:class:`cubed.Array` objects
+(see `Cubed's documentation `_ and the `cubed-xarray package `_).
+
+The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over
+the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually
+implements the handling of processing all of the chunks.
+
+Chunked array methods and "core operations"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but must also
+implement additional features.
+
+Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``.
+Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known
+as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``.
+
+The core operations are generalizations of functions first implemented in :py:mod:`dask.array`.
+The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the
+``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`,
+whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`.
+
+In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the
+corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`,
+also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the
+API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array
+methods are also currently dispatched using this class.
+
+Chunked array creation is also handled by this class. As chunked array objects have a one-to-one correspondence with
+in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired
+chunking pattern to an implementation of :py:meth:`~xarray.core.parallelcompat.ChunkManagerEntrypoint.from_array`.
+
+.. note::
+
+ The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a
+ namespace for containing the chunked-aware function primitives. Ideally in the future we would have an API standard
+ for chunked array types which codified this structure, making the entrypoint system unnecessary.
+
+.. currentmodule:: xarray.core.parallelcompat
+
+.. autoclass:: xarray.core.parallelcompat.ChunkManagerEntrypoint
+ :members:
+
+Registering a new ChunkManagerEntrypoint subclass
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an
+entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of
+:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`.
+
+
+To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this::
+
+ [options.entry_points]
+ xarray.chunkmanagers =
+ dask = xarray.core.daskmanager:DaskManager
+
+See also `cubed-xarray `_ for another example.
+
+To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using
+the internal function :py:func:`~xarray.core.parallelcompat.list_chunkmanagers`.
+
+.. autofunction:: list_chunkmanagers
+
+
+User interface
+~~~~~~~~~~~~~~
+
+Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways:
+
+#. By manually passing the array type to the :py:class:`~xarray.DataArray` constructor, see the examples for :ref:`numpy-like arrays `,
+
+#. Calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``,
+
+#. Calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``.
+
+The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict.
+The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'``
+if Dask is installed, otherwise it defaults to whichever chunkmanager is registered if only one is registered.
+If multiple chunkmanagers are registered it will raise an error by default.
+
+Parallel processing without chunks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page
+is theoretically required. Such an array type (e.g. `Ramba `_ or
+`Arkouda `_) could be wrapped using xarray's existing support for
+:ref:`numpy-like "duck" arrays `.
diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst
index d403328aa2f..1f1f57974df 100644
--- a/doc/internals/duck-arrays-integration.rst
+++ b/doc/internals/duck-arrays-integration.rst
@@ -1,23 +1,57 @@
-.. _internals.duck_arrays:
+.. _internals.duckarrays:
Integrating with duck arrays
=============================
.. warning::
- This is a experimental feature.
+ This is an experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_.
-Xarray can wrap custom :term:`duck array` objects as long as they define numpy's
-``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
-``__array_ufunc__`` and ``__array_function__`` methods.
+Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation `.
+This page is intended for developers who are interested in wrapping a new custom array type with xarray.
+
+.. _internals.duckarrays.requirements:
+
+Duck array requirements
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray does not explicitly check that required methods are defined by the underlying duck array object before
+attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:
+
+* ``shape`` property,
+* ``dtype`` property,
+* ``ndim`` property,
+* ``__array__`` method,
+* ``__array_ufunc__`` method,
+* ``__array_function__`` method.
+
+These need to be defined consistently with :py:class:`numpy.ndarray`, for example the array ``shape``
+property needs to obey `numpy's broadcasting rules `_
+(see also the `Python Array API standard's explanation `_
+of these same rules).
+
+Python Array API standard support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a
+big supporter of the `Python Array API Standard `_.
+
+We aim to support any array libraries that follow the Array API standard out-of-the-box. However, xarray does occasionally
+call some numpy functions which are not (yet) part of the standard (e.g. :py:meth:`xarray.DataArray.pad` calls :py:func:`numpy.pad`).
+See `xarray issue #7848 `_ for a list of such functions. We can still support dispatching on these functions through
+the array protocols above, it just means that if you exclusively implement the methods in the Python Array API standard
+then some features in xarray will not work.
+
+Custom inline reprs
+~~~~~~~~~~~~~~~~~~~
In certain situations (e.g. when printing the collapsed preview of
variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
in a single line, truncating it to a certain number of characters. If that
would drop too much information, the :term:`duck array` may define a
``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
-argument:
+argument
.. code:: python
diff --git a/doc/internals/extending-xarray.rst b/doc/internals/extending-xarray.rst
index 56aeb8fa462..a180b85044f 100644
--- a/doc/internals/extending-xarray.rst
+++ b/doc/internals/extending-xarray.rst
@@ -1,4 +1,6 @@
+.. _internals.accessors:
+
Extending xarray using accessors
================================
diff --git a/doc/internals/how-to-create-custom-index.rst b/doc/internals/how-to-create-custom-index.rst
new file mode 100644
index 00000000000..93805229db1
--- /dev/null
+++ b/doc/internals/how-to-create-custom-index.rst
@@ -0,0 +1,233 @@
+.. currentmodule:: xarray
+
+How to create a custom index
+============================
+
+.. warning::
+
+ This feature is highly experimental. Support for custom indexes has been
+ introduced in v2022.06.0 and is still incomplete. API is subject to change
+ without deprecation notice. However we encourage you to experiment and report issues that arise.
+
+Xarray's built-in support for label-based indexing (e.g. `ds.sel(latitude=40, method="nearest")`) and alignment operations
+relies on :py:class:`pandas.Index` objects. Pandas Indexes are powerful and suitable for many
+applications but also have some limitations:
+
+- it only works with 1-dimensional coordinates where explicit labels
+ are fully loaded in memory
+- it is hard to reuse it with irregular data for which there exist more
+ efficient, tree-based structures to perform data selection
+- it doesn't support extra metadata that may be required for indexing and
+ alignment (e.g., a coordinate reference system)
+
+Fortunately, Xarray now allows extending this functionality with custom indexes,
+which can be implemented in 3rd-party libraries.
+
+The Index base class
+--------------------
+
+Every Xarray index must inherit from the :py:class:`Index` base class. It is for
+example the case of Xarray built-in ``PandasIndex`` and ``PandasMultiIndex``
+subclasses, which wrap :py:class:`pandas.Index` and
+:py:class:`pandas.MultiIndex` respectively.
+
+The ``Index`` API closely follows the :py:class:`Dataset` and
+:py:class:`DataArray` API, e.g., for an index to support :py:meth:`DataArray.sel` it needs to
+implement :py:meth:`Index.sel`, to support :py:meth:`DataArray.stack` and :py:meth:`DataArray.unstack` it
+needs to implement :py:meth:`Index.stack` and :py:meth:`Index.unstack`, etc.
+
+Some guidelines and examples are given below. More details can be found in the
+documented :py:class:`Index` API.
+
+Minimal requirements
+--------------------
+
+Every index must at least implement the :py:meth:`Index.from_variables` class
+method, which is used by Xarray to build a new index instance from one or more
+existing coordinates in a Dataset or DataArray.
+
+Since any collection of coordinates can be passed to that method (i.e., the
+number, order and dimensions of the coordinates are all arbitrary), it is the
+responsibility of the index to check the consistency and validity of those input
+coordinates.
+
+For example, :py:class:`~xarray.core.indexes.PandasIndex` accepts only one coordinate and
+:py:class:`~xarray.core.indexes.PandasMultiIndex` accepts one or more 1-dimensional coordinates that must all
+share the same dimension. Other, custom indexes need not have the same
+constraints, e.g.,
+
+- a georeferenced raster index which only accepts two 1-d coordinates with
+ distinct dimensions
+- a staggered grid index which takes coordinates with different dimension name
+ suffixes (e.g., "_c" and "_l" for center and left)
+
+Optional requirements
+---------------------
+
+Pretty much everything else is optional. Depending on the method, in the absence
+of a (re)implementation, an index will either raise a `NotImplementedError`
+or won't do anything specific (just drop, pass or copy itself
+from/to the resulting Dataset or DataArray).
+
+For example, you can just skip re-implementing :py:meth:`Index.rename` if there
+is no internal attribute or object to rename according to the new desired
+coordinate or dimension names. In the case of ``PandasIndex``, we rename the
+underlying ``pandas.Index`` object and/or update the ``PandasIndex.dim``
+attribute since the associated dimension name has been changed.
+
+Wrap index data as coordinate data
+----------------------------------
+
+In some cases it is possible to reuse the index's underlying object or structure
+as coordinate data and hence avoid data duplication.
+
+For ``PandasIndex`` and ``PandasMultiIndex``, we
+leverage the fact that ``pandas.Index`` objects expose some array-like API. In
+Xarray we use some wrappers around those underlying objects as a thin
+compatibility layer to preserve dtypes, handle explicit and n-dimensional
+indexing, etc.
+
+Other structures like tree-based indexes (e.g., kd-tree) may differ too much
+from arrays to be reused as coordinate data.
+
+If the index data can be reused as coordinate data, the ``Index`` subclass
+should implement :py:meth:`Index.create_variables`. This method accepts a
+dictionary of variable names as keys and :py:class:`Variable` objects as values (used for propagating
+variable metadata) and should return a dictionary of new :py:class:`Variable` or
+:py:class:`IndexVariable` objects.
+
+Data selection
+--------------
+
+For an index to support label-based selection, it needs to at least implement
+:py:meth:`Index.sel`. This method accepts a dictionary of labels where the keys
+are coordinate names (already filtered for the current index) and the values can
+be pretty much anything (e.g., a slice, a tuple, a list, a numpy array, a
+:py:class:`Variable` or a :py:class:`DataArray`). It is the responsibility of
+the index to properly handle those input labels.
+
+:py:meth:`Index.sel` must return an instance of :py:class:`IndexSelResult`. The
+latter is a small data class that holds positional indexers (indices) and that
+may also hold new variables, new indexes, names of variables or indexes to drop,
+names of dimensions to rename, etc. For example, this is useful in the case of
+``PandasMultiIndex`` as it allows Xarray to convert it into a single ``PandasIndex``
+when only one level remains after the selection.
+
+The :py:class:`IndexSelResult` class is also used to merge results from label-based
+selection performed by different indexes. Note that it is now possible to have
+two distinct indexes for two 1-d coordinates sharing the same dimension, but it
+is not currently possible to use those two indexes in the same call to
+:py:meth:`Dataset.sel`.
+
+Optionally, the index may also implement :py:meth:`Index.isel`. In the case of
+``PandasIndex`` we use it to create a new index object by just indexing the
+underlying ``pandas.Index`` object. In other cases this may not be possible,
+e.g., a kd-tree object may not be easily indexed. If ``Index.isel()`` is not
+implemented, the index is just dropped in the DataArray or Dataset resulting
+from the selection.
+
+Alignment
+---------
+
+For an index to support alignment, it needs to implement:
+
+- :py:meth:`Index.equals`, which compares the index with another index and
+ returns either ``True`` or ``False``
+- :py:meth:`Index.join`, which combines the index with another index and returns
+ a new Index object
+- :py:meth:`Index.reindex_like`, which queries the index with another index and
+ returns positional indexers that are used to re-index Dataset or DataArray
+ variables along one or more dimensions
+
+Xarray ensures that those three methods are called with an index of the same
+type as argument.
+
+Meta-indexes
+------------
+
+Nothing prevents writing a custom Xarray index that itself encapsulates other
+Xarray index(es). We call such index a "meta-index".
+
+Here is a small example of a meta-index for geospatial, raster datasets (i.e.,
+regularly spaced 2-dimensional data) that internally relies on two
+``PandasIndex`` instances for the x and y dimensions respectively:
+
+.. code-block:: python
+
+ from xarray import Index
+ from xarray.core.indexes import PandasIndex
+ from xarray.core.indexing import merge_sel_results
+
+
+ class RasterIndex(Index):
+ def __init__(self, xy_indexes):
+ assert len(xy_indexes) == 2
+
+ # must have two distinct dimensions
+ dim = [idx.dim for idx in xy_indexes.values()]
+ assert dim[0] != dim[1]
+
+ self._xy_indexes = xy_indexes
+
+ @classmethod
+ def from_variables(cls, variables):
+ assert len(variables) == 2
+
+ xy_indexes = {
+ k: PandasIndex.from_variables({k: v}) for k, v in variables.items()
+ }
+
+ return cls(xy_indexes)
+
+ def create_variables(self, variables):
+ idx_variables = {}
+
+ for index in self._xy_indexes.values():
+ idx_variables.update(index.create_variables(variables))
+
+ return idx_variables
+
+ def sel(self, labels):
+ results = []
+
+ for k, index in self._xy_indexes.items():
+ if k in labels:
+ results.append(index.sel({k: labels[k]}))
+
+ return merge_sel_results(results)
+
+
+This basic index only supports label-based selection. Providing a full-featured
+index by implementing the other ``Index`` methods should be pretty
+straightforward for this example, though.
+
+This example is also not very useful unless we add some extra functionality on
+top of the two encapsulated ``PandasIndex`` objects, such as a coordinate
+reference system.
+
+How to use a custom index
+-------------------------
+
+You can use :py:meth:`Dataset.set_xindex` or :py:meth:`DataArray.set_xindex` to assign a
+custom index to a Dataset or DataArray, e.g., using the ``RasterIndex`` above:
+
+.. code-block:: python
+
+ import numpy as np
+ import xarray as xr
+
+ da = xr.DataArray(
+ np.random.uniform(size=(100, 50)),
+ coords={"x": ("x", np.arange(50)), "y": ("y", np.arange(100))},
+ dims=("y", "x"),
+ )
+
+ # Xarray creates default indexes for the 'x' and 'y' coordinates
+ # we first need to explicitly drop them
+ da = da.drop_indexes(["x", "y"])
+
+ # Build a RasterIndex from the 'x' and 'y' coordinates
+ da_raster = da.set_xindex(["x", "y"], RasterIndex)
+
+ # RasterIndex now takes care of label-based selection
+ selected = da_raster.sel(x=10, y=slice(20, 50))
diff --git a/doc/internals/index.rst b/doc/internals/index.rst
index e4ca9779dd7..7e13f0cfe95 100644
--- a/doc/internals/index.rst
+++ b/doc/internals/index.rst
@@ -8,6 +8,12 @@ stack, NumPy and pandas. It is written in pure Python (no C or Cython
extensions), which makes it easy to develop and extend. Instead, we push
compiled code to :ref:`optional dependencies`.
+The pages in this section are intended for:
+
+* Contributors to xarray who wish to better understand some of the internals,
+* Developers who wish to extend xarray with domain-specific logic, perhaps to support a new scientific community of users,
+* Developers who wish to interface xarray with their existing tooling, e.g. by creating a plugin for reading a new file format, or wrapping a custom array type.
+
.. toctree::
:maxdepth: 2
@@ -15,6 +21,8 @@ compiled code to :ref:`optional dependencies`.
variable-objects
duck-arrays-integration
+ chunked-arrays
extending-xarray
zarr-encoding-spec
how-to-add-new-backend
+ how-to-create-custom-index
diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst
index e0fd4bd0d25..64e7b3625ac 100644
--- a/doc/user-guide/data-structures.rst
+++ b/doc/user-guide/data-structures.rst
@@ -19,7 +19,8 @@ DataArray
:py:class:`xarray.DataArray` is xarray's implementation of a labeled,
multi-dimensional array. It has several key properties:
-- ``values``: a :py:class:`numpy.ndarray` holding the array's values
+- ``values``: a :py:class:`numpy.ndarray` or
+ :ref:`numpy-like array ` holding the array's values
- ``dims``: dimension names for each axis (e.g., ``('x', 'y', 'z')``)
- ``coords``: a dict-like container of arrays (*coordinates*) that label each
point (e.g., 1-dimensional arrays of numbers, datetime objects or
@@ -46,7 +47,8 @@ Creating a DataArray
The :py:class:`~xarray.DataArray` constructor takes:
- ``data``: a multi-dimensional array of values (e.g., a numpy ndarray,
- :py:class:`~pandas.Series`, :py:class:`~pandas.DataFrame` or ``pandas.Panel``)
+ a :ref:`numpy-like array `, :py:class:`~pandas.Series`,
+ :py:class:`~pandas.DataFrame` or ``pandas.Panel``)
- ``coords``: a list or dictionary of coordinates. If a list, it should be a
list of tuples where the first element is the dimension name and the second
element is the corresponding coordinate array_like object.
diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst
index 78c7d1e572a..f0650ac61b5 100644
--- a/doc/user-guide/duckarrays.rst
+++ b/doc/user-guide/duckarrays.rst
@@ -1,30 +1,183 @@
.. currentmodule:: xarray
+.. _userguide.duckarrays:
+
Working with numpy-like arrays
==============================
+NumPy-like arrays (often known as :term:`duck array`\s) are drop-in replacements for the :py:class:`numpy.ndarray`
+class but with different features, such as propagating physical units or a different layout in memory.
+Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the
+additional features of these array libraries.
+
+Some numpy-like array types that xarray already has some support for:
+
+* `Cupy `_ - GPU support (see `cupy-xarray `_),
+* `Sparse `_ - for performant arrays with many zero elements,
+* `Pint `_ - for tracking the physical units of your data (see `pint-xarray `_),
+* `Dask `_ - parallel computing on larger-than-memory arrays (see :ref:`using dask with xarray `),
+* `Cubed `_ - another parallel computing framework that emphasises reliability (see `cubed-xarray `_).
+
.. warning::
- This feature should be considered experimental. Please report any bug you may find on
- xarray’s github repository.
+ This feature should be considered somewhat experimental. Please report any bugs you find on
+ `xarray’s issue tracker `_.
+
+.. note::
+
+ For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
+ described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
+ slightly different user code (e.g. calling ``.chunk`` or ``.compute``). See the docs on :ref:`wrapping chunked arrays `.
+
+Why "duck"?
+-----------
+
+Why is it also called a "duck" array? This comes from a common statement of object-oriented programming -
+"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that
+is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is
+permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead xarray can take the more permissive approach of simply
+treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an
+error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows
+objects and classes from different libraries to work together more easily.
+
+What is a numpy-like array?
+---------------------------
+
+A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key
+numpy-like functionality such as indexing, broadcasting, and computation methods.
+
+For example, the `sparse `_ library provides a sparse array type which is useful for representing nD array objects like sparse matrices
+in a memory-efficient manner. We can create a sparse array object (of the :py:class:`sparse.COO` type) from a numpy array like this:
+
+.. ipython:: python
+
+ from sparse import COO
+
+ x = np.eye(4, dtype=np.uint8) # create diagonal identity matrix
+ s = COO.from_numpy(x)
+ s
-NumPy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
-additional features, like propagating physical units or a different layout in memory.
+This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements.
+This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices).
+Sparse array objects can be converted back to a "dense" numpy array by calling :py:meth:`sparse.COO.todense`.
-:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
-long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
+Just like :py:class:`numpy.ndarray` objects, :py:class:`sparse.COO` arrays support indexing
+
+.. ipython:: python
+
+ s[1, 1] # diagonal elements should be ones
+ s[2, 3] # off-diagonal elements should be zero
+
+broadcasting,
+
+.. ipython:: python
+
+ x2 = np.zeros(
+ (4, 1), dtype=np.uint8
+ ) # create second sparse array of different shape
+ s2 = COO.from_numpy(x2)
+ (s * s2) # multiplication requires broadcasting
+
+and various computation methods
+
+.. ipython:: python
+
+ s.sum(axis=1)
+
+This numpy-like array also supports calling so-called `numpy ufuncs <https://numpy.org/doc/stable/reference/ufuncs.html>`_
+("universal functions") on it directly:
+
+.. ipython:: python
+
+ np.sum(s, axis=1)
+
+
+Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the
+equivalent numpy array - this is the sense in which the sparse array is "numpy-like".
.. note::
- For ``dask`` support see :ref:`dask`.
+ For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`.
+
+Wrapping numpy-like arrays in xarray
+------------------------------------
+
+:py:class:`DataArray`, :py:class:`Dataset`, and :py:class:`Variable` objects can wrap these numpy-like arrays.
+Constructing xarray objects which wrap numpy-like arrays
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Missing features
-----------------
-Most of the API does support :term:`duck array` objects, but there are a few areas where
-the code will still cast to ``numpy`` arrays:
+The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly
+to the constructor of the xarray class. The :ref:`page on xarray data structures ` shows how :py:class:`DataArray` and :py:class:`Dataset`
+both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array.
-- dimension coordinates, and thus all indexing operations:
+For example, we can wrap the sparse array we created earlier inside a new DataArray object:
+
+.. ipython:: python
+
+ s_da = xr.DataArray(s, dims=["i", "j"])
+ s_da
+
+We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable
+representation of the underlying wrapped array.
+
+Of course our sparse array object is still there underneath - it's stored under the ``.data`` attribute of the dataarray:
+
+.. ipython:: python
+
+ s_da.data
+
+Array methods
+~~~~~~~~~~~~~
+
+We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method:
+
+.. ipython:: python
+
+ s_da.sum(dim="j")
+
+Converting wrapped types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to change the type inside your xarray object you can use :py:meth:`DataArray.as_numpy`:
+
+.. ipython:: python
+
+ s_da.as_numpy()
+
+This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array.
+
+If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or
+:py:meth:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - :py:meth:`~DataArray.values`
+always uses :py:func:`numpy.asarray` which will fail for some array types (e.g. ``cupy``), whereas :py:meth:`~DataArray.to_numpy`
+uses the correct method depending on the array type.
+
+.. ipython:: python
+
+ s_da.to_numpy()
+
+.. ipython:: python
+ :okexcept:
+
+ s_da.values
+
+This illustrates the difference between :py:meth:`~DataArray.data` and :py:meth:`~DataArray.values`,
+which is sometimes a point of confusion for new xarray users.
+Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas
+:py:meth:`DataArray.values` converts the underlying array to a numpy array before returning it.
+(This is another reason to use :py:meth:`~DataArray.to_numpy` over :py:meth:`~DataArray.values` - the intention is clearer.)
+
+Conversion to numpy as a fallback
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the
+underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior,
+and report any instances in which it causes problems.
+
+Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where
+the code will still convert to ``numpy`` arrays:
+
+- Dimension coordinates, and thus all indexing operations:
* :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
* :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
@@ -33,7 +186,7 @@ the code will still cast to ``numpy`` arrays:
:py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
data variables and non-dimension coordinates won't be casted
-- functions and methods that depend on external libraries or features of ``numpy`` not
+- Functions and methods that depend on external libraries or features of ``numpy`` not
covered by ``__array_function__`` / ``__array_ufunc__``:
* :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
@@ -49,17 +202,25 @@ the code will still cast to ``numpy`` arrays:
:py:class:`numpy.vectorize`)
* :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)
-- incompatibilities between different :term:`duck array` libraries:
+- Incompatibilities between different :term:`duck array` libraries:
* :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should
- wrap the new ``dask`` array; changing the chunk sizes works.
-
+ wrap the new ``dask`` array; changing the chunk sizes works however.
Extensions using duck arrays
----------------------------
-Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
-easier:
+
+Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also
+makes sense to use an interfacing package to make certain tasks easier.
+
+For example the `pint-xarray package <https://pint-xarray.readthedocs.io/>`_ offers a custom ``.pint`` accessor (see :ref:`internals.accessors`) which provides
+convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes
+creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.
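+
+A minimal sketch of what this can look like (assuming ``pint-xarray`` is installed; the calls shown are illustrative):
+
+.. code:: python
+
+ import pint_xarray  # noqa: F401  # importing registers the ``.pint`` accessor
+ import xarray as xr
+
+ da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "metres"})
+ quantified = da.pint.quantify()  # wrap the underlying data in a pint.Quantity
+ quantified.pint.units  # the physical units of the wrapped array
+ quantified.pint.magnitude  # the bare numpy array, without units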
+
+We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays
+easier. If you know of more that aren't on this list please raise an issue to add them!
- `pint-xarray <https://pint-xarray.readthedocs.io/>`_
- `cupy-xarray <https://cupy-xarray.readthedocs.io/>`_
+- `cubed-xarray <https://github.com/xarray-contrib/cubed-xarray>`_
diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst
index dc495b9f285..c0e88634705 100644
--- a/doc/user-guide/io.rst
+++ b/doc/user-guide/io.rst
@@ -559,6 +559,67 @@ and currently raises a warning unless ``invalid_netcdf=True`` is set:
Note that this produces a file that is likely to be not readable by other netCDF
libraries!
+.. _io.hdf5:
+
+HDF5
+----
+`HDF5`_ is both a file format and a data model for storing information. HDF5 stores
+data hierarchically, using groups to create a nested structure. HDF5 is a more
+general version of the netCDF4 data model, so the nested structure is one of many
+similarities between the two data formats.
+
+Reading HDF5 files in xarray requires the ``h5netcdf`` engine, which can be installed
+with ``conda install h5netcdf``. Once installed we can use xarray to open HDF5 files:
+
+.. code:: python
+
+ xr.open_dataset("/path/to/my/file.h5")
+
+The similarities between HDF5 and netCDF4 mean that HDF5 data can be written with the
+same :py:meth:`Dataset.to_netcdf` method as used for netCDF4 data:
+
+.. ipython:: python
+
+ ds = xr.Dataset(
+ {"foo": (("x", "y"), np.random.rand(4, 5))},
+ coords={
+ "x": [10, 20, 30, 40],
+ "y": pd.date_range("2000-01-01", periods=5),
+ "z": ("x", list("abcd")),
+ },
+ )
+
+ ds.to_netcdf("saved_on_disk.h5")
+
+Groups
+~~~~~~
+
+If you have multiple or highly nested groups, xarray by default may not read the group
+that you want. A particular group of an HDF5 file can be specified using the ``group``
+argument:
+
+.. code:: python
+
+ xr.open_dataset("/path/to/my/file.h5", group="/my/group")
+
+While xarray cannot interrogate an HDF5 file to determine which groups are available,
+the HDF5 Python reader `h5py`_ can be used instead.
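+
+For example, a short snippet along the following lines (illustrative only; substitute a real file path) prints every
+group and dataset in the hierarchy:
+
+.. code:: python
+
+ import h5py
+
+ with h5py.File("/path/to/my/file.h5", "r") as f:
+     # ``visit`` walks the full hierarchy and calls the given function
+     # with the path of each group and dataset it finds
+     f.visit(print)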
+
+Natively the xarray data structures can only handle one level of nesting, organized as
+DataArrays inside of Datasets. If your HDF5 file has additional levels of hierarchy you
+can only access one group at a time and will need to specify group names.
+
+.. note::
+
+ For native handling of multiple HDF5 groups with xarray, including I/O, you might be
+ interested in the experimental
+ `xarray-datatree `_ package.
+
+
+.. _HDF5: https://hdfgroup.github.io/hdf5/index.html
+.. _h5py: https://www.h5py.org/
+
+
.. _io.zarr:
Zarr
diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst
index 24e6ab69927..d99312643aa 100644
--- a/doc/user-guide/terminology.rst
+++ b/doc/user-guide/terminology.rst
@@ -54,23 +54,22 @@ complete examples, please consult the relevant documentation.*
Coordinate
An array that labels a dimension or set of dimensions of another
``DataArray``. In the usual one-dimensional case, the coordinate array's
- values can loosely be thought of as tick labels along a dimension. There
- are two types of coordinate arrays: *dimension coordinates* and
- *non-dimension coordinates* (see below). A coordinate named ``x`` can be
- retrieved from ``arr.coords[x]``. A ``DataArray`` can have more
- coordinates than dimensions because a single dimension can be labeled by
- multiple coordinate arrays. However, only one coordinate array can be a
- assigned as a particular dimension's dimension coordinate array. As a
+ values can loosely be thought of as tick labels along a dimension. We
+ distinguish :term:`Dimension coordinate` vs. :term:`Non-dimension
+ coordinate` and :term:`Indexed coordinate` vs. :term:`Non-indexed
+ coordinate`. A coordinate named ``x`` can be retrieved from
+ ``arr.coords[x]``. A ``DataArray`` can have more coordinates than
+ dimensions because a single dimension can be labeled by multiple
+ coordinate arrays. However, only one coordinate array can be assigned
+ as a particular dimension's dimension coordinate array. As a
consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
Dimension coordinate
A one-dimensional coordinate array assigned to ``arr`` with both a name
- and dimension name in ``arr.dims``. Dimension coordinates are used for
- label-based indexing and alignment, like the index found on a
- :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact,
- dimension coordinates use :py:class:`pandas.Index` objects under the
- hood for efficient computation. Dimension coordinates are marked by
- ``*`` when printing a ``DataArray`` or ``Dataset``.
+ and dimension name in ``arr.dims``. Usually (but not always), a
+ dimension coordinate is also an :term:`Indexed coordinate` so that it can
+ be used for label-based indexing and alignment, like the index found on
+ a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`.
Non-dimension coordinate
A coordinate array assigned to ``arr`` with a name in ``arr.coords`` but
@@ -79,20 +78,40 @@ complete examples, please consult the relevant documentation.*
example, multidimensional coordinates are often used in geoscience
datasets when :doc:`the data's physical coordinates (such as latitude
and longitude) differ from their logical coordinates
- <../examples/multidimensional-coords>`. However, non-dimension coordinates
- are not indexed, and any operation on non-dimension coordinates that
- leverages indexing will fail. Printing ``arr.coords`` will print all of
- ``arr``'s coordinate names, with the corresponding dimension(s) in
- parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.
+ <../examples/multidimensional-coords>`. Printing ``arr.coords`` will
+ print all of ``arr``'s coordinate names, with the corresponding
+ dimension(s) in parentheses. For example, ``coord_name (dim_name) 1 2 3
+ ...``.
+
+ Indexed coordinate
+ A coordinate which has an associated :term:`Index`. Generally this means
+ that the coordinate labels can be used for indexing (selection) and/or
+ alignment. An indexed coordinate may have one or more arbitrary
+ dimensions although in most cases it is also a :term:`Dimension
+ coordinate`. It may or may not be grouped with other indexed coordinates
+ depending on whether they share the same index. Indexed coordinates are
+ marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
+
+ Non-indexed coordinate
+ A coordinate which has no associated :term:`Index`. It may still
+ represent fixed labels along one or more dimensions but it cannot be
+ used for label-based indexing and alignment.
Index
- An *index* is a data structure optimized for efficient selecting and
- slicing of an associated array. Xarray creates indexes for dimension
- coordinates so that operations along dimensions are fast, while
- non-dimension coordinates are not indexed. Under the hood, indexes are
- implemented as :py:class:`pandas.Index` objects. The index associated
- with dimension name ``x`` can be retrieved by ``arr.indexes[x]``. By
- construction, ``len(arr.dims) == len(arr.indexes)``
+ An *index* is a data structure optimized for efficient data selection
+ and alignment within a discrete or continuous space that is defined by
+ coordinate labels (unless it is a functional index). By default, Xarray
+ creates a :py:class:`~xarray.indexes.PandasIndex` object (i.e., a
+ :py:class:`pandas.Index` wrapper) for each :term:`Dimension coordinate`.
+ For more advanced use cases (e.g., staggered or irregular grids,
+ geospatial indexes), Xarray also accepts any instance of a specialized
+ :py:class:`~xarray.indexes.Index` subclass that is associated to one or
+ more arbitrary coordinates. The index associated with the coordinate
+ ``x`` can be retrieved by ``arr.xindexes[x]`` (or ``arr.indexes["x"]``
+ if the index is convertible to a :py:class:`pandas.Index` object). If
+ two coordinates ``x`` and ``y`` share the same index,
+ ``arr.xindexes[x]`` and ``arr.xindexes[y]`` both return the same
+ :py:class:`~xarray.indexes.Index` object.
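+
+ A minimal sketch of retrieving an index (illustrative only):
+
+ .. code:: python
+
+ arr = xr.DataArray([1, 2, 3], dims="x", coords={"x": [10, 20, 30]})
+ arr.xindexes["x"]  # a PandasIndex wrapping pandas.Index([10, 20, 30])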
name
The names of dimensions, coordinates, DataArray objects and data
@@ -112,3 +131,128 @@ complete examples, please consult the relevant documentation.*
``__array_ufunc__`` and ``__array_function__`` protocols are also required.
__ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
+
+ .. ipython:: python
+ :suppress:
+
+ import numpy as np
+ import xarray as xr
+
+ Aligning
+ Aligning refers to the process of ensuring that two or more DataArrays or Datasets
+ have the same dimensions and coordinates, so that they can be combined or compared properly.
+
+ .. ipython:: python
+
+ x = xr.DataArray(
+ [[25, 35], [10, 24]],
+ dims=("lat", "lon"),
+ coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]},
+ )
+ y = xr.DataArray(
+ [[20, 5], [7, 13]],
+ dims=("lat", "lon"),
+ coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]},
+ )
+ x
+ y
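+
+ For example, aligning ``x`` and ``y`` keeps only the labels they share (a minimal sketch; ``join="inner"`` is the default):
+
+ .. ipython:: python
+
+ aligned_x, aligned_y = xr.align(x, y, join="inner")
+ aligned_x
+ aligned_y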
+
+ Broadcasting
+ A technique that allows operations to be performed on arrays with different shapes and dimensions.
+ Before applying an operation, xarray automatically broadcasts the arrays to a common shape based on
+ their dimension names.
+
+ .. ipython:: python
+
+ # 'a' has shape (3,) and 'b' has shape (4,)
+ a = xr.DataArray(np.array([1, 2, 3]), dims=["x"])
+ b = xr.DataArray(np.array([4, 5, 6, 7]), dims=["y"])
+
+ # 2D array with shape (3, 4)
+ a + b
+
+ Merging
+ Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along
+ the same dimensions. When merging, xarray aligns the variables and coordinates of the different datasets along
+ the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates.
+
+ .. ipython:: python
+
+ # create two 1D arrays with names
+ arr1 = xr.DataArray(
+ [1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1"
+ )
+ arr2 = xr.DataArray(
+ [4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2"
+ )
+
+ # merge the two arrays into a new dataset
+ merged_ds = xr.Dataset({"arr1": arr1, "arr2": arr2})
+ merged_ds
+
+ Concatenating
+ Concatenating is used to combine two or more Datasets or DataArrays along a dimension. When concatenating,
+ xarray arranges the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``DataArray``
+ will have the same variables and coordinates along the other dimensions.
+
+ .. ipython:: python
+
+ a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
+ b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))
+ c = xr.concat([a, b], dim="c")
+ c
+
+ Combining
+ Combining is the process of arranging two or more DataArrays or Datasets into a single ``DataArray`` or
+ ``Dataset`` using some combination of merging and concatenation operations.
+
+ .. ipython:: python
+
+ ds1 = xr.Dataset(
+ {"data": xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))},
+ coords={"x": [1, 2], "y": [3, 4]},
+ )
+ ds2 = xr.Dataset(
+ {"data": xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))},
+ coords={"x": [2, 3], "y": [4, 5]},
+ )
+
+ # combine the datasets
+ combined_ds = xr.combine_by_coords([ds1, ds2])
+ combined_ds
+
+ lazy
+ Lazily-evaluated operations do not load data into memory until necessary. Instead of doing calculations
+ right away, xarray lets you plan what calculations you want to do, like finding the
+ average temperature in a dataset. This planning is called "lazy evaluation." Later, when
+ you're ready to see the final result, you ask xarray to carry out the computation, and only
+ then does it work through the steps you planned and return the answer. This lazy approach
+ saves time and memory because xarray only does the work when you actually need the results.
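+
+ A minimal sketch (it assumes a netCDF file ``temperature.nc`` and an installed chunked backend such as dask):
+
+ .. code:: python
+
+ ds = xr.open_dataset("temperature.nc", chunks={})  # no data is loaded yet
+ mean = ds["temperature"].mean("time")  # still lazy: only the computation is recorded
+ mean.compute()  # data is read and the mean is actually computed here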
+
+ labeled
+ Labeled data has metadata describing the context of the data, not just the raw data values.
+ This contextual information can be labels for array axes (i.e. dimension names), tick labels along axes
+ (stored as coordinate variables), or unique names for each array. These labels provide context and
+ meaning to the data, making it easier to understand and work with. For example, if you have temperature
+ data for different cities over time, you can use xarray to label the dimensions: one for cities and
+ another for time.
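+
+ For instance (an illustrative sketch):
+
+ .. ipython:: python
+
+ temperatures = xr.DataArray(
+ [[20.1, 22.3], [18.4, 19.0], [25.2, 26.1]],
+ dims=("city", "time"),
+ coords={"city": ["New York", "Boston", "Miami"]},
+ name="temperature",
+ )
+ temperatures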
+
+ serialization
+ Serialization is the process of converting your data into a format that makes it easy to save and share.
+ When you serialize data in xarray, you're taking all those temperature measurements, along with their
+ labels and other information, and turning them into a format that can be stored in a file or sent over
+ the internet. xarray objects can be serialized into formats which store the labels alongside the data.
+ Some supported serialization formats are files that can then be stored or transferred (e.g. netCDF),
+ whilst others are protocols that allow for data access over a network (e.g. Zarr).
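+
+ A minimal sketch (reusing the ``temperatures`` array from the "labeled" entry above; the file name is arbitrary):
+
+ .. code:: python
+
+ temperatures.to_netcdf("temperatures.nc")  # write the labels and data to disk
+ roundtripped = xr.open_dataset("temperatures.nc")  # read them back, labels intact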
+
+ indexing
+ :ref:`Indexing` is how you select the subsets of your data that you are interested in (a short sketch follows the list below).
+
+ - Label-based Indexing: Selecting data by passing a specific label and comparing it to the labels
+ stored in the associated coordinates. You can use labels to specify what you want like "Give me the
+ temperature for New York on July 15th."
+
+ - Positional Indexing: You can use numbers to refer to positions in the data, like "Give me the third temperature value." This is useful when you know the order of your data but don't need to remember the exact labels.
+
+ - Slicing: You can take a "slice" of your data, like you might want all temperatures from July 1st
+ to July 10th. xarray supports slicing for both positional and label-based indexing.
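+
+ A few illustrative calls (a sketch, not an exhaustive list):
+
+ .. ipython:: python
+
+ arr = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})
+ arr.sel(x=2)  # label-based
+ arr.isel(x=1)  # positional
+ arr.sel(x=slice(1, 2))  # label-based slicing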
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index b94a8a2faa5..8e119361ba1 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -14,11 +14,228 @@ What's New
np.random.seed(123456)
+.. _whats-new.2023.08.0:
+
+v2023.08.0 (Aug 18, 2023)
+-------------------------
+
+This release brings changes to minimum dependencies, allows reading of datasets where a dimension name is
+associated with a multidimensional variable (e.g. finite volume ocean model output), and introduces
+a new :py:class:`xarray.Coordinates` object.
+
+Thanks to the 16 contributors to this release: Anderson Banihirwe, Articoking, Benoit Bovy, Deepak Cherian, Harshitha, Ian Carroll,
+Joe Hamman, Justus Magin, Peter Hill, Rachel Wegener, Riley Kuttruff, Thomas Nicholas, Tom Nicholas, ilgast, quantsnus, vallirep
+
+Announcements
+~~~~~~~~~~~~~
+
+The :py:class:`xarray.Variable` class is being refactored out to a new project titled 'namedarray'.
+See the `design doc `_ for more
+details. Reach out to us on this `discussion topic <https://github.com/pydata/xarray/discussions/8080>`_ if you have any thoughts.
+
+New Features
+~~~~~~~~~~~~
+
+- :py:class:`Coordinates` can now be constructed independently of any Dataset or
+ DataArray (it is also returned by the :py:attr:`Dataset.coords` and
+ :py:attr:`DataArray.coords` properties). ``Coordinates`` objects are useful for
+ passing both coordinate variables and indexes to new Dataset / DataArray objects,
+ e.g., via their constructor or via :py:meth:`Dataset.assign_coords`. We may also
+ wrap coordinate variables in a ``Coordinates`` object in order to skip
+ the automatic creation of (pandas) indexes for dimension coordinates.
+ The :py:class:`Coordinates.from_pandas_multiindex` constructor may be used to
+ create coordinates directly from a :py:class:`pandas.MultiIndex` object (it is
+ preferred over passing it directly as coordinate data, which may be deprecated soon).
+ Like Dataset and DataArray objects, ``Coordinates`` objects may now be used in
+ :py:func:`align` and :py:func:`merge`.
+ (:issue:`6392`, :pull:`7368`).
+ By `Benoît Bovy `_.
+- Visually group together coordinates with the same indexes in the index section of the text repr (:pull:`7225`).
+ By `Justus Magin `_.
+- Allow creating Xarray objects where a multidimensional variable shares its name
+ with a dimension. Examples include output from finite volume models like FVCOM.
+ (:issue:`2233`, :pull:`7989`)
+ By `Deepak Cherian `_ and `Benoit Bovy `_.
+- When outputting :py:class:`Dataset` objects as Zarr via :py:meth:`Dataset.to_zarr`,
+ user can now specify that chunks that will contain no valid data will not be written.
+ Originally, this could be done by specifying ``"write_empty_chunks": True`` in the
+ ``encoding`` parameter; however, this setting would not carry over when appending new
+ data to an existing dataset. (:issue:`8009`) Requires ``zarr>=2.11``.
+
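+For example, a minimal sketch of the new keyword (``ds`` being any Dataset; the store path is illustrative):
+
+.. code:: python
+
+ ds.to_zarr("path/to/store.zarr", write_empty_chunks=False)
+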
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- The minimum versions of some dependencies were changed (:pull:`8022`):
+
+ ===================== ========= ========
+ Package Old New
+ ===================== ========= ========
+ boto3 1.20 1.24
+ cftime 1.5 1.6
+ dask-core 2022.1 2022.7
+ distributed 2022.1 2022.7
+ h5netcdf 0.13 1.0
+ iris 3.1 3.2
+ lxml 4.7 4.9
+ netcdf4 1.5.7 1.6.0
+ numpy 1.21 1.22
+ pint 0.18 0.19
+ pydap 3.2 3.3
+ rasterio 1.2 1.3
+ scipy 1.7 1.8
+ toolz 0.11 0.12
+ typing_extensions 4.0 4.3
+ zarr 2.10 2.12
+ numbagg 0.1 0.2.1
+ ===================== ========= ========
+
+
+Documentation
+~~~~~~~~~~~~~
+
+- Added examples to docstrings of :py:meth:`Dataset.assign_attrs`, :py:meth:`Dataset.broadcast_equals`,
+ :py:meth:`Dataset.equals`, :py:meth:`Dataset.identical`, :py:meth:`Dataset.expand_dims`, :py:meth:`Dataset.drop_vars`
+ (:issue:`6793`, :pull:`7937`) By `Harshitha `_.
+- Add docstrings for the :py:class:`Index` base class and add some documentation on how to
+ create custom, Xarray-compatible indexes (:pull:`6975`)
+ By `Benoît Bovy `_.
+- Added a page clarifying the role of Xarray core team members.
+ (:pull:`7999`) By `Tom Nicholas `_.
+- Fixed broken links in "See also" section of :py:meth:`Dataset.count` (:issue:`8055`, :pull:`8057`)
+ By `Articoking `_.
+- Extended the glossary by adding terms Aligning, Broadcasting, Merging, Concatenating, Combining, lazy,
+ labeled, serialization, indexing (:issue:`3355`, :pull:`7732`)
+ By `Harshitha `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- :py:func:`as_variable` now consistently includes the variable name in any exceptions
+ raised. (:pull:`7995`). By `Peter Hill `_
+- :py:func:`encode_dataset_coordinates` now sorts coordinates automatically assigned to
+ ``coordinates`` attributes during serialization (:issue:`8026`, :pull:`8034`).
+ By `Ian Carroll `_.
+
+.. _whats-new.2023.07.0:
+
+v2023.07.0 (July 17, 2023)
+--------------------------
+
+This release brings improvements to the documentation on wrapping numpy-like arrays, improved docstrings, and bug fixes.
+
+Deprecations
+~~~~~~~~~~~~
+
+- ``hue_style`` is being deprecated for scatter plots (:issue:`7907`, :pull:`7925`).
+ By `Jimmy Westling `_.
+
+Bug fixes
+~~~~~~~~~
+
+- Ensure no forward slashes in variable and dimension names for HDF5-based engines.
+ (:issue:`7943`, :pull:`7953`) By `Kai Mühlbauer `_.
+
+Documentation
+~~~~~~~~~~~~~
+
+- Added examples to docstrings of :py:meth:`Dataset.assign_attrs`, :py:meth:`Dataset.broadcast_equals`,
+ :py:meth:`Dataset.equals`, :py:meth:`Dataset.identical`, :py:meth:`Dataset.expand_dims`, :py:meth:`Dataset.drop_vars`
+ (:issue:`6793`, :pull:`7937`) By `Harshitha `_.
+- Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays.
+ (:pull:`7951`) By `Tom Nicholas `_.
+- Expanded the page on wrapping numpy-like "duck" arrays.
+ (:pull:`7911`) By `Tom Nicholas `_.
+- Added examples to docstrings of :py:meth:`Dataset.isel`, :py:meth:`Dataset.reduce`, :py:meth:`Dataset.argmin`,
+ :py:meth:`Dataset.argmax` (:issue:`6793`, :pull:`7881`)
+ By `Harshitha `_ .
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- Allow chunked non-dask arrays (i.e. Cubed arrays) in groupby operations. (:pull:`7941`)
+ By `Tom Nicholas `_.
+
+
+.. _whats-new.2023.06.0:
+
+v2023.06.0 (June 21, 2023)
+--------------------------
+
+This release adds features to ``curvefit``, improves the performance of concatenation, and fixes various bugs.
+
+Thanks to our 13 contributors to this release:
+Anderson Banihirwe, Deepak Cherian, dependabot[bot], Illviljan, Juniper Tyree, Justus Magin, Martin Fleischmann,
+Mattia Almansi, mgunyho, Rutger van Haasteren, Thomas Nicholas, Tom Nicholas, Tom White.
+
+
+New Features
+~~~~~~~~~~~~
+
+- Added support for multidimensional initial guess and bounds in :py:meth:`DataArray.curvefit` (:issue:`7768`, :pull:`7821`).
+ By `András Gunyhó `_.
+- Add an ``errors`` option to :py:meth:`Dataset.curvefit` that allows
+ returning NaN for the parameters and covariances of failed fits, rather than
+ failing the whole series of fits (:issue:`6317`, :pull:`7891`).
+ By `Dominik Stańczak `_ and `András Gunyhó `_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+
+Deprecations
+~~~~~~~~~~~~
+- Deprecate the `cdms2 `_ conversion methods (:pull:`7876`)
+ By `Justus Magin `_.
+
+Performance
+~~~~~~~~~~~
+- Improve concatenation performance (:issue:`7833`, :pull:`7824`).
+ By `Jimmy Westling `_.
+
+Bug fixes
+~~~~~~~~~
+- Fix bug where weighted ``polyfit`` was changing the original object (:issue:`5644`, :pull:`7900`).
+ By `Mattia Almansi `_.
+- Don't call ``CachingFileManager.__del__`` on interpreter shutdown (:issue:`7814`, :pull:`7880`).
+ By `Justus Magin `_.
+- Preserve vlen dtype for empty string arrays (:issue:`7328`, :pull:`7862`).
+ By `Tom White `_ and `Kai Mühlbauer `_.
+- Ensure dtype of reindex result matches dtype of the original DataArray (:issue:`7299`, :pull:`7917`)
+ By `Anderson Banihirwe `_.
+- Fix bug where a zero-length zarr ``chunk_store`` was ignored as if it was ``None`` (:pull:`7923`)
+ By `Juniper Tyree `_.
+
+Documentation
+~~~~~~~~~~~~~
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- Minor improvements to support of the python `array api standard <https://data-apis.org/array-api/latest/>`_,
+ internally using the function ``xp.astype()`` instead of the method ``arr.astype()``, as the latter is not in the standard.
+ (:pull:`7847`) By `Tom Nicholas `_.
+- Xarray now uploads nightly wheels to https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/ (:issue:`7863`, :pull:`7865`).
+ By `Martin Fleischmann `_.
+- Stop uploading development wheels to TestPyPI (:pull:`7889`)
+ By `Justus Magin `_.
+- Added an exception catch for ``AttributeError`` along with ``ImportError`` when duck typing the dynamic imports in pycompat.py. This catches some name collisions between packages. (:issue:`7870`, :pull:`7874`)
.. _whats-new.2023.05.0:
-v2023.05.0 (unreleased)
------------------------
+v2023.05.0 (May 18, 2023)
+-------------------------
+
+This release adds some new methods and operators, updates our deprecation policy for python versions, fixes some bugs with groupby,
+and introduces experimental support for alternative chunked parallel array computation backends via a new plugin system!
+
+**Note:** If you are using a locally-installed development version of xarray then pulling the changes from this release may require you to re-install.
+This avoids an error where xarray cannot detect dask via the new entrypoints system introduced in :pull:`7019`. See :issue:`7856` for details.
+
+Thanks to our 14 contributors:
+Alan Brammer, crusaderky, David Stansby, dcherian, Deeksha, Deepak Cherian, Illviljan, James McCreight,
+Joe Hamman, Justus Magin, Kyle Sunden, Max Hollmann, mgunyho, and Tom Nicholas
+
New Features
~~~~~~~~~~~~
@@ -27,28 +244,37 @@ New Features
- Add support for lshift and rshift binary operators (``<<``, ``>>``) on
:py:class:`xr.DataArray` of type :py:class:`int` (:issue:`7727` , :pull:`7741`).
By `Alan Brammer `_.
-
+- Keyword argument ``data='array'`` to both :py:meth:`xarray.Dataset.to_dict` and
+ :py:meth:`xarray.DataArray.to_dict` will now return data as the underlying array type.
+ Python lists are returned for ``data='list'`` or ``data=True``. Supplying ``data=False`` only returns the schema without data.
+ ``encoding=True`` returns the encoding dictionary for the underlying variable as well (:issue:`1599`, :pull:`7739`).
+ By `James McCreight `_.
Breaking changes
~~~~~~~~~~~~~~~~
- adjust the deprecation policy for python to once again align with NEP-29 (:issue:`7765`, :pull:`7793`)
By `Justus Magin `_.
-Deprecations
-~~~~~~~~~~~~
-
+Performance
+~~~~~~~~~~~
+- Optimize ``.dt`` accessor performance with ``CFTimeIndex``. (:pull:`7796`)
+ By `Deepak Cherian `_.
Bug fixes
~~~~~~~~~
+- Fix ``as_compatible_data`` for masked float arrays, now always creating a copy when a mask is present (:issue:`2377`, :pull:`7788`).
+ By `Max Hollmann `_.
- Fix groupby binary ops when grouped array is subset relative to other. (:issue:`7797`).
By `Deepak Cherian `_.
-
-Documentation
-~~~~~~~~~~~~~
-
+- Fix groupby sum, prod for all-NaN groups with ``flox``. (:issue:`7808`).
+ By `Deepak Cherian `_.
Internal Changes
~~~~~~~~~~~~~~~~
+- Experimental support for wrapping chunked array libraries other than dask.
+ A new ABC is defined - :py:class:`xr.core.parallelcompat.ChunkManagerEntrypoint` - which can be subclassed and then
+ registered by alternative chunked array implementations. (:issue:`6807`, :pull:`7019`)
+ By `Tom Nicholas `_.
.. _whats-new.2023.04.2:
@@ -109,10 +335,6 @@ New Features
- Added ability to save ``DataArray`` objects directly to Zarr using :py:meth:`~xarray.DataArray.to_zarr`.
(:issue:`7692`, :pull:`7693`) .
By `Joe Hamman `_.
-- Keyword argument `data='array'` to both :py:meth:`xarray.Dataset.to_dict` and
- :py:meth:`xarray.DataArray.to_dict` will now return data as the underlying array type. Python lists are returned for `data='list'` or `data=True`. Supplying `data=False` only returns the schema without data. ``encoding=True`` returns the encoding dictionary for the underlying variable also.
- (:issue:`1599`, :pull:`7739`) .
- By `James McCreight `_.
Breaking changes
~~~~~~~~~~~~~~~~
@@ -645,6 +867,7 @@ Bug fixes
Documentation
~~~~~~~~~~~~~
+
- Update merge docstrings. (:issue:`6935`, :pull:`7033`)
By `Zach Moon `_.
- Raise a more informative error when trying to open a non-existent zarr store. (:issue:`6484`, :pull:`7060`)
diff --git a/pyproject.toml b/pyproject.toml
index 88b34d002d5..4d63fd564ba 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -39,6 +39,7 @@ module = [
"cf_units.*",
"cfgrib.*",
"cftime.*",
+ "cubed.*",
"cupy.*",
"fsspec.*",
"h5netcdf.*",
diff --git a/setup.cfg b/setup.cfg
index 81b7f1c4a0e..85ac8e259e5 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -132,6 +132,10 @@ xarray =
static/css/*
static/html/*
+[options.entry_points]
+xarray.chunkmanagers =
+ dask = xarray.core.daskmanager:DaskManager
+
[tool:pytest]
python_files = test_*.py
testpaths = xarray/tests properties
diff --git a/xarray/__init__.py b/xarray/__init__.py
index 75a58053663..830bc254a71 100644
--- a/xarray/__init__.py
+++ b/xarray/__init__.py
@@ -26,16 +26,19 @@
where,
)
from xarray.core.concat import concat
+from xarray.core.coordinates import Coordinates
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset
from xarray.core.extensions import (
register_dataarray_accessor,
register_dataset_accessor,
)
+from xarray.core.indexes import Index
+from xarray.core.indexing import IndexSelResult
from xarray.core.merge import Context, MergeError, merge
from xarray.core.options import get_options, set_options
from xarray.core.parallel import map_blocks
-from xarray.core.variable import Coordinate, IndexVariable, Variable, as_variable
+from xarray.core.variable import IndexVariable, Variable, as_variable
from xarray.util.print_versions import show_versions
try:
@@ -98,8 +101,11 @@
"CFTimeIndex",
"Context",
"Coordinate",
+ "Coordinates",
"DataArray",
"Dataset",
+ "Index",
+ "IndexSelResult",
"IndexVariable",
"Variable",
# Exceptions
diff --git a/xarray/backends/api.py b/xarray/backends/api.py
index e5adedbb576..e35d85a1e2f 100644
--- a/xarray/backends/api.py
+++ b/xarray/backends/api.py
@@ -3,16 +3,29 @@
import os
from collections.abc import Hashable, Iterable, Mapping, MutableMapping, Sequence
from functools import partial
-from glob import glob
from io import BytesIO
from numbers import Number
-from typing import TYPE_CHECKING, Any, Callable, Final, Literal, Union, cast, overload
+from typing import (
+ TYPE_CHECKING,
+ Any,
+ Callable,
+ Final,
+ Literal,
+ Union,
+ cast,
+ overload,
+)
import numpy as np
from xarray import backends, conventions
from xarray.backends import plugins
-from xarray.backends.common import AbstractDataStore, ArrayWriter, _normalize_path
+from xarray.backends.common import (
+ AbstractDataStore,
+ ArrayWriter,
+ _find_absolute_paths,
+ _normalize_path,
+)
from xarray.backends.locks import _get_scheduler
from xarray.core import indexing
from xarray.core.combine import (
@@ -20,9 +33,11 @@
_nested_combine,
combine_by_coords,
)
+from xarray.core.daskmanager import DaskManager
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset, _get_chunk, _maybe_chunk
from xarray.core.indexes import Index
+from xarray.core.parallelcompat import guess_chunkmanager
from xarray.core.utils import is_remote_uri
if TYPE_CHECKING:
@@ -38,6 +53,7 @@
CompatOptions,
JoinOptions,
NestedSequence,
+ T_Chunks,
)
T_NetcdfEngine = Literal["netcdf4", "scipy", "h5netcdf"]
@@ -48,7 +64,6 @@
str, # no nice typing support for custom backends
None,
]
- T_Chunks = Union[int, dict[Any, Any], Literal["auto"], None]
T_NetcdfTypes = Literal[
"NETCDF4", "NETCDF4_CLASSIC", "NETCDF3_64BIT", "NETCDF3_CLASSIC"
]
@@ -297,17 +312,27 @@ def _chunk_ds(
chunks,
overwrite_encoded_chunks,
inline_array,
+ chunked_array_type,
+ from_array_kwargs,
**extra_tokens,
):
- from dask.base import tokenize
+ chunkmanager = guess_chunkmanager(chunked_array_type)
- mtime = _get_mtime(filename_or_obj)
- token = tokenize(filename_or_obj, mtime, engine, chunks, **extra_tokens)
- name_prefix = f"open_dataset-{token}"
+ # TODO refactor to move this dask-specific logic inside the DaskManager class
+ if isinstance(chunkmanager, DaskManager):
+ from dask.base import tokenize
+
+ mtime = _get_mtime(filename_or_obj)
+ token = tokenize(filename_or_obj, mtime, engine, chunks, **extra_tokens)
+ name_prefix = "open_dataset-"
+ else:
+ # not used
+ token = (None,)
+ name_prefix = None
variables = {}
for name, var in backend_ds.variables.items():
- var_chunks = _get_chunk(var, chunks)
+ var_chunks = _get_chunk(var, chunks, chunkmanager)
variables[name] = _maybe_chunk(
name,
var,
@@ -316,6 +341,8 @@ def _chunk_ds(
name_prefix=name_prefix,
token=token,
inline_array=inline_array,
+ chunked_array_type=chunkmanager,
+ from_array_kwargs=from_array_kwargs.copy(),
)
return backend_ds._replace(variables)
@@ -328,6 +355,8 @@ def _dataset_from_backend_dataset(
cache,
overwrite_encoded_chunks,
inline_array,
+ chunked_array_type,
+ from_array_kwargs,
**extra_tokens,
):
if not isinstance(chunks, (int, dict)) and chunks not in {None, "auto"}:
@@ -346,6 +375,8 @@ def _dataset_from_backend_dataset(
chunks,
overwrite_encoded_chunks,
inline_array,
+ chunked_array_type,
+ from_array_kwargs,
**extra_tokens,
)
@@ -373,6 +404,8 @@ def open_dataset(
decode_coords: Literal["coordinates", "all"] | bool | None = None,
drop_variables: str | Iterable[str] | None = None,
inline_array: bool = False,
+ chunked_array_type: str | None = None,
+ from_array_kwargs: dict[str, Any] | None = None,
backend_kwargs: dict[str, Any] | None = None,
**kwargs,
) -> Dataset:
@@ -465,6 +498,15 @@ def open_dataset(
itself, and each chunk refers to that task by its key. With
``inline_array=True``, Dask will instead inline the array directly
in the values of the task graph. See :py:func:`dask.array.from_array`.
+ chunked_array_type: str, optional
+ Which chunked array type to coerce this dataset's arrays to.
+ Defaults to 'dask' if installed, else whatever is registered via the `ChunkManagerEntrypoint` system.
+ Experimental API that should not be relied upon.
+ from_array_kwargs: dict
+ Additional keyword arguments passed on to the `ChunkManagerEntrypoint.from_array` method used to create
+ chunked arrays, via whichever chunk manager is specified through the `chunked_array_type` kwarg.
+ For example if :py:func:`dask.array.Array` objects are used for chunking, additional kwargs will be passed
+ to :py:func:`dask.array.from_array`. Experimental API that should not be relied upon.
backend_kwargs: dict
Additional keyword arguments passed on to the engine open function,
equivalent to `**kwargs`.
@@ -508,6 +550,9 @@ def open_dataset(
if engine is None:
engine = plugins.guess_engine(filename_or_obj)
+ if from_array_kwargs is None:
+ from_array_kwargs = {}
+
backend = plugins.get_backend(engine)
decoders = _resolve_decoders_kwargs(
@@ -536,6 +581,8 @@ def open_dataset(
cache,
overwrite_encoded_chunks,
inline_array,
+ chunked_array_type,
+ from_array_kwargs,
drop_variables=drop_variables,
**decoders,
**kwargs,
@@ -546,8 +593,8 @@ def open_dataset(
def open_dataarray(
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
*,
- engine: T_Engine = None,
- chunks: T_Chunks = None,
+ engine: T_Engine | None = None,
+ chunks: T_Chunks | None = None,
cache: bool | None = None,
decode_cf: bool | None = None,
mask_and_scale: bool | None = None,
@@ -558,6 +605,8 @@ def open_dataarray(
decode_coords: Literal["coordinates", "all"] | bool | None = None,
drop_variables: str | Iterable[str] | None = None,
inline_array: bool = False,
+ chunked_array_type: str | None = None,
+ from_array_kwargs: dict[str, Any] | None = None,
backend_kwargs: dict[str, Any] | None = None,
**kwargs,
) -> DataArray:
@@ -652,6 +701,15 @@ def open_dataarray(
itself, and each chunk refers to that task by its key. With
``inline_array=True``, Dask will instead inline the array directly
in the values of the task graph. See :py:func:`dask.array.from_array`.
+ chunked_array_type: str, optional
+ Which chunked array type to coerce the underlying data array to.
+ Defaults to 'dask' if installed, else whatever is registered via the `ChunkManagerEntrypoint` system.
+ Experimental API that should not be relied upon.
+ from_array_kwargs: dict
+ Additional keyword arguments passed on to the `ChunkManagerEntrypoint.from_array` method used to create
+ chunked arrays, via whichever chunk manager is specified through the `chunked_array_type` kwarg.
+ For example if :py:func:`dask.array.Array` objects are used for chunking, additional kwargs will be passed
+ to :py:func:`dask.array.from_array`. Experimental API that should not be relied upon.
backend_kwargs: dict
Additional keyword arguments passed on to the engine open function,
equivalent to `**kwargs`.
@@ -695,6 +753,8 @@ def open_dataarray(
cache=cache,
drop_variables=drop_variables,
inline_array=inline_array,
+ chunked_array_type=chunked_array_type,
+ from_array_kwargs=from_array_kwargs,
backend_kwargs=backend_kwargs,
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
@@ -726,7 +786,7 @@ def open_dataarray(
def open_mfdataset(
paths: str | NestedSequence[str | os.PathLike],
- chunks: T_Chunks = None,
+ chunks: T_Chunks | None = None,
concat_dim: str
| DataArray
| Index
@@ -736,7 +796,7 @@ def open_mfdataset(
| None = None,
compat: CompatOptions = "no_conflicts",
preprocess: Callable[[Dataset], Dataset] | None = None,
- engine: T_Engine = None,
+ engine: T_Engine | None = None,
data_vars: Literal["all", "minimal", "different"] | list[str] = "all",
coords="different",
combine: Literal["by_coords", "nested"] = "by_coords",
@@ -911,37 +971,7 @@ def open_mfdataset(
.. [1] https://docs.xarray.dev/en/stable/dask.html
.. [2] https://docs.xarray.dev/en/stable/dask.html#chunking-and-performance
"""
- if isinstance(paths, str):
- if is_remote_uri(paths) and engine == "zarr":
- try:
- from fsspec.core import get_fs_token_paths
- except ImportError as e:
- raise ImportError(
- "The use of remote URLs for opening zarr requires the package fsspec"
- ) from e
-
- fs, _, _ = get_fs_token_paths(
- paths,
- mode="rb",
- storage_options=kwargs.get("backend_kwargs", {}).get(
- "storage_options", {}
- ),
- expand=False,
- )
- tmp_paths = fs.glob(fs._strip_protocol(paths)) # finds directories
- paths = [fs.get_mapper(path) for path in tmp_paths]
- elif is_remote_uri(paths):
- raise ValueError(
- "cannot do wild-card matching for paths that are remote URLs "
- f"unless engine='zarr' is specified. Got paths: {paths}. "
- "Instead, supply paths as an explicit list of strings."
- )
- else:
- paths = sorted(glob(_normalize_path(paths)))
- elif isinstance(paths, os.PathLike):
- paths = [os.fspath(paths)]
- else:
- paths = [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]
+ paths = _find_absolute_paths(paths, engine=engine, **kwargs)
if not paths:
raise OSError("no files to open")
@@ -1490,6 +1520,8 @@ def to_zarr(
safe_chunks: bool = True,
storage_options: dict[str, str] | None = None,
zarr_version: int | None = None,
+ write_empty_chunks: bool | None = None,
+ chunkmanager_store_kwargs: dict[str, Any] | None = None,
) -> backends.ZarrStore:
...
@@ -1512,6 +1544,8 @@ def to_zarr(
safe_chunks: bool = True,
storage_options: dict[str, str] | None = None,
zarr_version: int | None = None,
+ write_empty_chunks: bool | None = None,
+ chunkmanager_store_kwargs: dict[str, Any] | None = None,
) -> Delayed:
...
@@ -1531,6 +1565,8 @@ def to_zarr(
safe_chunks: bool = True,
storage_options: dict[str, str] | None = None,
zarr_version: int | None = None,
+ write_empty_chunks: bool | None = None,
+ chunkmanager_store_kwargs: dict[str, Any] | None = None,
) -> backends.ZarrStore | Delayed:
"""This function creates an appropriate datastore for writing a dataset to
a zarr store
@@ -1623,6 +1659,7 @@ def to_zarr(
safe_chunks=safe_chunks,
stacklevel=4, # for Dataset.to_zarr()
zarr_version=zarr_version,
+ write_empty=write_empty_chunks,
)
if mode in ["a", "r+"]:
@@ -1652,7 +1689,9 @@ def to_zarr(
writer = ArrayWriter()
# TODO: figure out how to properly handle unlimited_dims
dump_to_store(dataset, zstore, writer, encoding=encoding)
- writes = writer.sync(compute=compute)
+ writes = writer.sync(
+ compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
+ )
if compute:
_finalize_store(writes, zstore)
diff --git a/xarray/backends/common.py b/xarray/backends/common.py
index bca8b7f668a..1ac988c6b4f 100644
--- a/xarray/backends/common.py
+++ b/xarray/backends/common.py
@@ -5,19 +5,22 @@
import time
import traceback
from collections.abc import Iterable
+from glob import glob
from typing import TYPE_CHECKING, Any, ClassVar
import numpy as np
from xarray.conventions import cf_encoder
from xarray.core import indexing
-from xarray.core.pycompat import is_duck_dask_array
+from xarray.core.parallelcompat import get_chunked_array_type
+from xarray.core.pycompat import is_chunked_array
from xarray.core.utils import FrozenDict, NdimSizeLenMixin, is_remote_uri
if TYPE_CHECKING:
from io import BufferedIOBase
from xarray.core.dataset import Dataset
+ from xarray.core.types import NestedSequence
# Create a logger object, but don't add any handlers. Leave that to user code.
logger = logging.getLogger(__name__)
@@ -27,6 +30,24 @@
def _normalize_path(path):
+ """
+ Normalize pathlikes to string.
+
+ Parameters
+ ----------
+ path :
+ Path to file.
+
+ Examples
+ --------
+ >>> from pathlib import Path
+
+ >>> directory = Path(xr.backends.common.__file__).parent
+ >>> paths_path = Path(directory).joinpath("comm*n.py")
+ >>> paths_str = xr.backends.common._normalize_path(paths_path)
+ >>> print([type(p) for p in (paths_str,)])
+ [<class 'str'>]
+ """
if isinstance(path, os.PathLike):
path = os.fspath(path)
@@ -36,6 +57,64 @@ def _normalize_path(path):
return path
+def _find_absolute_paths(
+ paths: str | os.PathLike | NestedSequence[str | os.PathLike], **kwargs
+) -> list[str]:
+ """
+ Find absolute paths from the pattern.
+
+ Parameters
+ ----------
+ paths :
+ Path(s) to file(s). Can include wildcards like * .
+ **kwargs :
+ Extra kwargs. Mainly for fsspec.
+
+ Examples
+ --------
+ >>> from pathlib import Path
+
+ >>> directory = Path(xr.backends.common.__file__).parent
+ >>> paths = str(Path(directory).joinpath("comm*n.py")) # Find common with wildcard
+ >>> paths = xr.backends.common._find_absolute_paths(paths)
+ >>> [Path(p).name for p in paths]
+ ['common.py']
+ """
+ if isinstance(paths, str):
+ if is_remote_uri(paths) and kwargs.get("engine", None) == "zarr":
+ try:
+ from fsspec.core import get_fs_token_paths
+ except ImportError as e:
+ raise ImportError(
+ "The use of remote URLs for opening zarr requires the package fsspec"
+ ) from e
+
+ fs, _, _ = get_fs_token_paths(
+ paths,
+ mode="rb",
+ storage_options=kwargs.get("backend_kwargs", {}).get(
+ "storage_options", {}
+ ),
+ expand=False,
+ )
+ tmp_paths = fs.glob(fs._strip_protocol(paths)) # finds directories
+ paths = [fs.get_mapper(path) for path in tmp_paths]
+ elif is_remote_uri(paths):
+ raise ValueError(
+ "cannot do wild-card matching for paths that are remote URLs "
+ f"unless engine='zarr' is specified. Got paths: {paths}. "
+ "Instead, supply paths as an explicit list of strings."
+ )
+ else:
+ paths = sorted(glob(_normalize_path(paths)))
+ elif isinstance(paths, os.PathLike):
+ paths = [os.fspath(paths)]
+ else:
+ paths = [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]
+
+ return paths
+
+
def _encode_variable_name(name):
if name is None:
name = NONE_VAR_NAME
@@ -153,7 +232,7 @@ def __init__(self, lock=None):
self.lock = lock
def add(self, source, target, region=None):
- if is_duck_dask_array(source):
+ if is_chunked_array(source):
self.sources.append(source)
self.targets.append(target)
self.regions.append(region)
@@ -163,21 +242,25 @@ def add(self, source, target, region=None):
else:
target[...] = source
- def sync(self, compute=True):
+ def sync(self, compute=True, chunkmanager_store_kwargs=None):
if self.sources:
- import dask.array as da
+ chunkmanager = get_chunked_array_type(*self.sources)
# TODO: consider wrapping targets with dask.delayed, if this makes
# for any discernible difference in perforance, e.g.,
# targets = [dask.delayed(t) for t in self.targets]
- delayed_store = da.store(
+ if chunkmanager_store_kwargs is None:
+ chunkmanager_store_kwargs = {}
+
+ delayed_store = chunkmanager.store(
self.sources,
self.targets,
lock=self.lock,
compute=compute,
flush=True,
regions=self.regions,
+ **chunkmanager_store_kwargs,
)
self.sources = []
self.targets = []
diff --git a/xarray/backends/file_manager.py b/xarray/backends/file_manager.py
index 91fd15fcaa4..df901f9a1d9 100644
--- a/xarray/backends/file_manager.py
+++ b/xarray/backends/file_manager.py
@@ -1,5 +1,6 @@
from __future__ import annotations
+import atexit
import contextlib
import io
import threading
@@ -289,6 +290,13 @@ def __repr__(self) -> str:
)
+@atexit.register
+def _remove_del_method():
+ # We don't need to close unclosed files at program exit, and may not be able
+ # to, because Python is cleaning up imports / globals.
+ del CachingFileManager.__del__
+
+
class _RefCounter:
"""Class for keeping track of reference counts."""
diff --git a/xarray/backends/h5netcdf_.py b/xarray/backends/h5netcdf_.py
index 7389f6a2862..59f6c362491 100644
--- a/xarray/backends/h5netcdf_.py
+++ b/xarray/backends/h5netcdf_.py
@@ -6,8 +6,6 @@
from collections.abc import Iterable
from typing import TYPE_CHECKING, Any
-from packaging.version import Version
-
from xarray.backends.common import (
BACKEND_ENTRYPOINTS,
BackendEntrypoint,
@@ -20,6 +18,7 @@
from xarray.backends.netCDF4_ import (
BaseNetCDF4Array,
_encode_nc4_variable,
+ _ensure_no_forward_slash_in_name,
_extract_nc4_variable_encoding,
_get_datatype,
_nc4_require_group,
@@ -232,30 +231,17 @@ def get_attrs(self):
return FrozenDict(_read_attributes(self.ds))
def get_dimensions(self):
- import h5netcdf
-
- if Version(h5netcdf.__version__) >= Version("0.14.0.dev0"):
- return FrozenDict((k, len(v)) for k, v in self.ds.dimensions.items())
- else:
- return self.ds.dimensions
+ return FrozenDict((k, len(v)) for k, v in self.ds.dimensions.items())
def get_encoding(self):
- import h5netcdf
-
- if Version(h5netcdf.__version__) >= Version("0.14.0.dev0"):
- return {
- "unlimited_dims": {
- k for k, v in self.ds.dimensions.items() if v.isunlimited()
- }
- }
- else:
- return {
- "unlimited_dims": {
- k for k, v in self.ds.dimensions.items() if v is None
- }
+ return {
+ "unlimited_dims": {
+ k for k, v in self.ds.dimensions.items() if v.isunlimited()
}
+ }
def set_dimension(self, name, length, is_unlimited=False):
+ _ensure_no_forward_slash_in_name(name)
if is_unlimited:
self.ds.dimensions[name] = None
self.ds.resize_dimension(name, length)
@@ -273,6 +259,7 @@ def prepare_variable(
):
import h5py
+ _ensure_no_forward_slash_in_name(name)
attrs = variable.attrs.copy()
dtype = _get_datatype(variable, raise_on_invalid_encoding=check_encoding)
diff --git a/xarray/backends/netCDF4_.py b/xarray/backends/netCDF4_.py
index d3866e90de6..b5c3413e7f8 100644
--- a/xarray/backends/netCDF4_.py
+++ b/xarray/backends/netCDF4_.py
@@ -65,10 +65,12 @@ def __init__(self, variable_name, datastore):
dtype = array.dtype
if dtype is str:
- # use object dtype because that's the only way in numpy to
- # represent variable length strings; it also prevents automatic
- # string concatenation via conventions.decode_cf_variable
- dtype = np.dtype("O")
+ # use object dtype (with additional vlen string metadata) because that's
+ # the only way in numpy to represent variable length strings and to
+ # check vlen string dtype in further steps
+ # it also prevents automatic string concatenation via
+ # conventions.decode_cf_variable
+ dtype = coding.strings.create_vlen_dtype(str)
self.dtype = dtype
def __setitem__(self, key, value):
@@ -192,6 +194,15 @@ def _nc4_require_group(ds, group, mode, create_group=_netcdf4_create_group):
return ds
+def _ensure_no_forward_slash_in_name(name):
+ if "/" in name:
+ raise ValueError(
+ f"Forward slashes '/' are not allowed in variable and dimension names (got {name!r}). "
+ "Forward slashes are used as hierarchy-separators for "
+ "HDF5-based files ('netcdf4'/'h5netcdf')."
+ )
+
+
def _ensure_fill_value_valid(data, attributes):
# work around for netCDF4/scipy issue where _FillValue has the wrong type:
# https://github.com/Unidata/netcdf4-python/issues/271
@@ -445,6 +456,7 @@ def get_encoding(self):
}
def set_dimension(self, name, length, is_unlimited=False):
+ _ensure_no_forward_slash_in_name(name)
dim_length = length if not is_unlimited else None
self.ds.createDimension(name, size=dim_length)
@@ -468,6 +480,8 @@ def encode_variable(self, variable):
def prepare_variable(
self, name, variable, check_encoding=False, unlimited_dims=None
):
+ _ensure_no_forward_slash_in_name(name)
+
datatype = _get_datatype(
variable, self.format, raise_on_invalid_encoding=check_encoding
)
diff --git a/xarray/backends/plugins.py b/xarray/backends/plugins.py
index 232c2300192..a62ca6c9862 100644
--- a/xarray/backends/plugins.py
+++ b/xarray/backends/plugins.py
@@ -146,7 +146,7 @@ def refresh_engines() -> None:
def guess_engine(
store_spec: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
-):
+) -> str | type[BackendEntrypoint]:
engines = list_engines()
for engine, backend in engines.items():
diff --git a/xarray/backends/pydap_.py b/xarray/backends/pydap_.py
index 116c48f5692..9b5bcc82e6f 100644
--- a/xarray/backends/pydap_.py
+++ b/xarray/backends/pydap_.py
@@ -4,7 +4,6 @@
from typing import TYPE_CHECKING, Any
import numpy as np
-from packaging.version import Version
from xarray.backends.common import (
BACKEND_ENTRYPOINTS,
@@ -123,11 +122,10 @@ def open(
"output_grid": output_grid or True,
"timeout": timeout,
}
- if Version(pydap.lib.__version__) >= Version("3.3.0"):
- if verify is not None:
- kwargs.update({"verify": verify})
- if user_charset is not None:
- kwargs.update({"user_charset": user_charset})
+ if verify is not None:
+ kwargs.update({"verify": verify})
+ if user_charset is not None:
+ kwargs.update({"user_charset": user_charset})
ds = pydap.client.open_url(**kwargs)
return cls(ds)
diff --git a/xarray/backends/zarr.py b/xarray/backends/zarr.py
index 7d21c771e06..f88523422bb 100644
--- a/xarray/backends/zarr.py
+++ b/xarray/backends/zarr.py
@@ -19,6 +19,7 @@
)
from xarray.backends.store import StoreBackendEntrypoint
from xarray.core import indexing
+from xarray.core.parallelcompat import guess_chunkmanager
from xarray.core.pycompat import integer_types
from xarray.core.utils import (
FrozenDict,
@@ -69,7 +70,14 @@ def __init__(self, variable_name, datastore):
array = self.get_array()
self.shape = array.shape
- dtype = array.dtype
+ # preserve vlen string object dtype (GH 7328)
+ if array.filters is not None and any(
+ [filt.codec_id == "vlen-utf8" for filt in array.filters]
+ ):
+ dtype = coding.strings.create_vlen_dtype(str)
+ else:
+ dtype = array.dtype
+
self.dtype = dtype
def get_array(self):
@@ -360,6 +368,7 @@ class ZarrStore(AbstractWritableDataStore):
"_synchronizer",
"_write_region",
"_safe_chunks",
+ "_write_empty",
)
@classmethod
@@ -378,6 +387,7 @@ def open_group(
safe_chunks=True,
stacklevel=2,
zarr_version=None,
+ write_empty: bool | None = None,
):
import zarr
@@ -409,7 +419,7 @@ def open_group(
if consolidated is None:
consolidated = False
- if chunk_store:
+ if chunk_store is not None:
open_kwargs["chunk_store"] = chunk_store
if consolidated is None:
consolidated = False
@@ -449,6 +459,7 @@ def open_group(
append_dim,
write_region,
safe_chunks,
+ write_empty,
)
def __init__(
@@ -459,6 +470,7 @@ def __init__(
append_dim=None,
write_region=None,
safe_chunks=True,
+ write_empty: bool | None = None,
):
self.zarr_group = zarr_group
self._read_only = self.zarr_group.read_only
@@ -469,6 +481,7 @@ def __init__(
self._append_dim = append_dim
self._write_region = write_region
self._safe_chunks = safe_chunks
+ self._write_empty = write_empty
@property
def ds(self):
@@ -640,6 +653,8 @@ def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=No
dimensions.
"""
+ import zarr
+
for vn, v in variables.items():
name = _encode_variable_name(vn)
check = vn in check_encoding_set
@@ -657,7 +672,14 @@ def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=No
# TODO: if mode="a", consider overriding the existing variable
# metadata. This would need some case work properly with region
# and append_dim.
- zarr_array = self.zarr_group[name]
+ if self._write_empty is not None:
+ zarr_array = zarr.open(
+ store=self.zarr_group.store,
+ path=f"{self.zarr_group.name}/{name}",
+ write_empty_chunks=self._write_empty,
+ )
+ else:
+ zarr_array = self.zarr_group[name]
else:
# new variable
encoding = extract_zarr_variable_encoding(
@@ -671,8 +693,25 @@ def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=No
if coding.strings.check_vlen_dtype(dtype) == str:
dtype = str
+
+ if self._write_empty is not None:
+ if (
+ "write_empty_chunks" in encoding
+ and encoding["write_empty_chunks"] != self._write_empty
+ ):
+ raise ValueError(
+ 'Differing "write_empty_chunks" values in encoding and parameters'
+ f'Got {encoding["write_empty_chunks"] = } and {self._write_empty = }'
+ )
+ else:
+ encoding["write_empty_chunks"] = self._write_empty
+
zarr_array = self.zarr_group.create(
- name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding
+ name,
+ shape=shape,
+ dtype=dtype,
+ fill_value=fill_value,
+ **encoding,
)
zarr_array = _put_attrs(zarr_array, encoded_attrs)
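The `_write_empty` plumbing above controls whether chunks containing only the fill value are materialised in the store. A hedged usage sketch, assuming the option is surfaced to users as `write_empty_chunks` (the name suggested by the encoding key checked above) and that dask and zarr are installed:

```python
# Hedged sketch: skip writing all-fill-value chunks to reduce the number of
# objects created in the store.
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": (("x",), np.zeros(100))}).chunk({"x": 10})
ds.to_zarr("example.zarr", mode="w", write_empty_chunks=False)
```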
@@ -716,6 +755,8 @@ def open_zarr(
decode_timedelta=None,
use_cftime=None,
zarr_version=None,
+ chunked_array_type: str | None = None,
+ from_array_kwargs: dict[str, Any] | None = None,
**kwargs,
):
"""Load and decode a dataset from a Zarr store.
@@ -800,6 +841,15 @@ def open_zarr(
The desired zarr spec version to target (currently 2 or 3). The default
of None will attempt to determine the zarr version from ``store`` when
possible, otherwise defaulting to 2.
+ chunked_array_type: str, optional
+        Which chunked array type to coerce this dataset's arrays to.
+ Defaults to 'dask' if installed, else whatever is registered via the `ChunkManagerEntryPoint` system.
+ Experimental API that should not be relied upon.
+ from_array_kwargs: dict, optional
+ Additional keyword arguments passed on to the `ChunkManagerEntrypoint.from_array` method used to create
+ chunked arrays, via whichever chunk manager is specified through the `chunked_array_type` kwarg.
+ Defaults to {'manager': 'dask'}, meaning additional kwargs will be passed eventually to
+ :py:func:`dask.array.from_array`. Experimental API that should not be relied upon.
Returns
-------
@@ -817,12 +867,17 @@ def open_zarr(
"""
from xarray.backends.api import open_dataset
+ if from_array_kwargs is None:
+ from_array_kwargs = {}
+
if chunks == "auto":
try:
- import dask.array # noqa
+ guess_chunkmanager(
+ chunked_array_type
+ ) # attempt to import that parallel backend
chunks = {}
- except ImportError:
+ except ValueError:
chunks = None
if kwargs:
@@ -851,6 +906,8 @@ def open_zarr(
engine="zarr",
chunks=chunks,
drop_variables=drop_variables,
+ chunked_array_type=chunked_array_type,
+ from_array_kwargs=from_array_kwargs,
backend_kwargs=backend_kwargs,
decode_timedelta=decode_timedelta,
use_cftime=use_cftime,
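Because `guess_chunkmanager` replaces the hard `import dask.array` check, `chunks="auto"` now works with any registered chunk manager, and the two new keywords are forwarded to `open_dataset`. A short sketch of the resulting call (both keywords are documented above as experimental):

```python
# "dask" is the default chunk manager when installed; from_array_kwargs is
# forwarded to the chunk manager's from_array (here dask.array.from_array).
import xarray as xr

ds = xr.open_zarr(
    "example.zarr",
    chunks="auto",
    chunked_array_type="dask",
    from_array_kwargs={"inline_array": True},
)
```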
diff --git a/xarray/coding/cftimeindex.py b/xarray/coding/cftimeindex.py
index c6a7b9f8763..8f3472dce19 100644
--- a/xarray/coding/cftimeindex.py
+++ b/xarray/coding/cftimeindex.py
@@ -470,13 +470,9 @@ def get_loc(self, key):
else:
return super().get_loc(key)
- def _maybe_cast_slice_bound(self, label, side, kind=None):
+ def _maybe_cast_slice_bound(self, label, side):
"""Adapted from
pandas.tseries.index.DatetimeIndex._maybe_cast_slice_bound
-
- Note that we have never used the kind argument in CFTimeIndex and it is
- deprecated as of pandas version 1.3.0. It exists only for compatibility
- reasons. We can remove it when our minimum version of pandas is 1.3.0.
"""
if not isinstance(label, str):
return label
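Dropping the long-deprecated `kind` argument does not change behaviour: string bounds are still cast against the `CFTimeIndex` when slicing by label. A small sketch of the kind of selection this method supports, assuming `cftime` is installed:

```python
# Partial-string slicing on a CFTimeIndex exercises _maybe_cast_slice_bound.
import xarray as xr

times = xr.cftime_range("0001-01-01", periods=24, freq="MS", calendar="noleap")
da = xr.DataArray(range(24), dims="time", coords={"time": times})
print(da.sel(time=slice("0001-06", "0002-03")).sizes["time"])  # 10
```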
diff --git a/xarray/coding/strings.py b/xarray/coding/strings.py
index 61b3ab7c46c..d0bfb1a7a63 100644
--- a/xarray/coding/strings.py
+++ b/xarray/coding/strings.py
@@ -14,7 +14,7 @@
unpack_for_encoding,
)
from xarray.core import indexing
-from xarray.core.pycompat import is_duck_dask_array
+from xarray.core.parallelcompat import get_chunked_array_type, is_chunked_array
from xarray.core.variable import Variable
@@ -29,7 +29,8 @@ def check_vlen_dtype(dtype):
if dtype.kind != "O" or dtype.metadata is None:
return None
else:
- return dtype.metadata.get("element_type")
+ # check xarray (element_type) as well as h5py (vlen)
+ return dtype.metadata.get("element_type", dtype.metadata.get("vlen"))
def is_unicode_dtype(dtype):
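The widened lookup means variable-length string dtypes created by h5py, which store their marker under `"vlen"`, are now recognised alongside xarray's own `"element_type"` marker. A quick check, assuming h5py is available:

```python
# Both metadata conventions now resolve to str.
import h5py
from xarray.coding.strings import check_vlen_dtype, create_vlen_dtype

print(check_vlen_dtype(create_vlen_dtype(str)))  # <class 'str'>  (xarray: element_type)
print(check_vlen_dtype(h5py.string_dtype()))     # <class 'str'>  (h5py: vlen)
```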
@@ -134,10 +135,10 @@ def bytes_to_char(arr):
if arr.dtype.kind != "S":
raise ValueError("argument must have a fixed-width bytes dtype")
- if is_duck_dask_array(arr):
- import dask.array as da
+ if is_chunked_array(arr):
+ chunkmanager = get_chunked_array_type(arr)
- return da.map_blocks(
+ return chunkmanager.map_blocks(
_numpy_bytes_to_char,
arr,
dtype="S1",
@@ -169,8 +170,8 @@ def char_to_bytes(arr):
# can't make an S0 dtype
return np.zeros(arr.shape[:-1], dtype=np.string_)
- if is_duck_dask_array(arr):
- import dask.array as da
+ if is_chunked_array(arr):
+ chunkmanager = get_chunked_array_type(arr)
if len(arr.chunks[-1]) > 1:
raise ValueError(
@@ -179,7 +180,7 @@ def char_to_bytes(arr):
)
dtype = np.dtype("S" + str(arr.shape[-1]))
- return da.map_blocks(
+ return chunkmanager.map_blocks(
_numpy_char_to_bytes,
arr,
dtype=dtype,
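The pattern that replaces the direct dask import throughout this file is: detect a chunked array, look up its chunk manager, and call `map_blocks` on it, so blockwise string encoding works with any registered chunked backend. An illustrative sketch of the same dispatch (the helper function name below is made up for the example):

```python
# Illustrative only: blockwise apply through whichever chunk manager owns the
# array, with an eager NumPy fallback.
import numpy as np
from xarray.core.parallelcompat import get_chunked_array_type, is_chunked_array

def blockwise_upper(arr):
    if is_chunked_array(arr):
        chunkmanager = get_chunked_array_type(arr)
        return chunkmanager.map_blocks(np.char.upper, arr, dtype=arr.dtype)
    return np.char.upper(arr)

print(blockwise_upper(np.array(["a", "b"])))  # ['A' 'B']
```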
diff --git a/xarray/coding/variables.py b/xarray/coding/variables.py
index 5c6e51c2215..8ba7dcbb0e2 100644
--- a/xarray/coding/variables.py
+++ b/xarray/coding/variables.py
@@ -10,7 +10,8 @@
import pandas as pd
from xarray.core import dtypes, duck_array_ops, indexing
-from xarray.core.pycompat import is_duck_dask_array
+from xarray.core.parallelcompat import get_chunked_array_type
+from xarray.core.pycompat import is_chunked_array
from xarray.core.variable import Variable
if TYPE_CHECKING:
@@ -57,7 +58,7 @@ class _ElementwiseFunctionArray(indexing.ExplicitlyIndexedNDArrayMixin):
"""
def __init__(self, array, func: Callable, dtype: np.typing.DTypeLike):
- assert not is_duck_dask_array(array)
+ assert not is_chunked_array(array)
self.array = indexing.as_indexable(array)
self.func = func
self._dtype = dtype
@@ -158,10 +159,10 @@ def lazy_elemwise_func(array, func: Callable, dtype: np.typing.DTypeLike):
-------
Either a dask.array.Array or _ElementwiseFunctionArray.
"""
- if is_duck_dask_array(array):
- import dask.array as da
+ if is_chunked_array(array):
+ chunkmanager = get_chunked_array_type(array)
- return da.map_blocks(func, array, dtype=dtype)
+ return chunkmanager.map_blocks(func, array, dtype=dtype)
else:
return _ElementwiseFunctionArray(array, func, dtype)
@@ -330,7 +331,7 @@ def encode(self, variable: Variable, name: T_Name = None) -> Variable:
if "scale_factor" in encoding or "add_offset" in encoding:
dtype = _choose_float_dtype(data.dtype, "add_offset" in encoding)
- data = data.astype(dtype=dtype, copy=True)
+ data = duck_array_ops.astype(data, dtype=dtype, copy=True)
if "add_offset" in encoding:
data -= pop_to(encoding, attrs, "add_offset", name=name)
if "scale_factor" in encoding:
@@ -377,7 +378,7 @@ def encode(self, variable: Variable, name: T_Name = None) -> Variable:
if "_FillValue" in attrs:
new_fill = signed_dtype.type(attrs["_FillValue"])
attrs["_FillValue"] = new_fill
- data = duck_array_ops.around(data).astype(signed_dtype)
+ data = duck_array_ops.astype(duck_array_ops.around(data), signed_dtype)
return Variable(dims, data, attrs, encoding, fastpath=True)
else:
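Switching from the `.astype` method to `duck_array_ops.astype` routes the cast through xarray's duck-array dispatch, which matters for wrapped arrays that follow the Python array API rather than NumPy's method signature; for plain NumPy input the result is unchanged. A minimal check:

```python
# Equivalent for NumPy data; dispatches through xarray's compatibility layer
# for other duck arrays.
import numpy as np
from xarray.core import duck_array_ops

data = np.array([1.2, 3.7])
assert (duck_array_ops.astype(data, dtype=np.int64) == data.astype(np.int64)).all()
```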
diff --git a/xarray/conventions.py b/xarray/conventions.py
index ea0787aa1a1..5dd2fbbde74 100644
--- a/xarray/conventions.py
+++ b/xarray/conventions.py
@@ -108,6 +108,10 @@ def ensure_dtype_not_object(var: Variable, name: T_Name = None) -> Variable:
if var.dtype.kind == "O":
dims, data, attrs, encoding = _var_as_tuple(var)
+ # leave vlen dtypes unchanged
+ if strings.check_vlen_dtype(data.dtype) is not None:
+ return var
+
if is_duck_dask_array(data):
warnings.warn(
"variable {} has data in the form of a dask array with "
@@ -690,7 +694,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
if not coords_str and variable_coordinates[name]:
coordinates_text = " ".join(
str(coord_name)
- for coord_name in variable_coordinates[name]
+ for coord_name in sorted(variable_coordinates[name])
if coord_name not in not_technically_coordinates
)
if coordinates_text:
@@ -715,7 +719,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
SerializationWarning,
)
else:
- attributes["coordinates"] = " ".join(map(str, global_coordinates))
+ attributes["coordinates"] = " ".join(sorted(map(str, global_coordinates)))
return variables, attributes
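Sorting the coordinate names before joining makes the CF `coordinates` attribute deterministic, since set iteration order is not, which keeps written files byte-for-byte reproducible. A one-line illustration:

```python
# Deterministic "coordinates" attribute regardless of set iteration order.
coord_names = {"lon", "lat", "height"}
print(" ".join(sorted(map(str, coord_names))))  # "height lat lon"
```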
diff --git a/xarray/core/_aggregations.py b/xarray/core/_aggregations.py
index 3051502beba..d5070f97c6a 100644
--- a/xarray/core/_aggregations.py
+++ b/xarray/core/_aggregations.py
@@ -9,7 +9,7 @@
from xarray.core import duck_array_ops
from xarray.core.options import OPTIONS
from xarray.core.types import Dims
-from xarray.core.utils import contains_only_dask_or_numpy, module_available
+from xarray.core.utils import contains_only_chunked_or_numpy, module_available
if TYPE_CHECKING:
from xarray.core.dataarray import DataArray
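The docstring examples in this file switch their sample data from `[1, 2, 3, 1, 2, nan]` to `[1, 2, 3, 0, 2, nan]`, so every expected value in the hunks that follow changes accordingly. A quick check of the headline numbers (skipna aggregations ignore the NaN):

```python
import numpy as np

data = np.array([1, 2, 3, 0, 2, np.nan])
print(np.nansum(data))            # 8.0
print(np.nanmean(data))           # 1.6
print(np.nanprod(data))           # 0.0
print(round(np.nanvar(data), 2))  # 1.04   (ddof=0)
print(round(np.nanstd(data), 4))  # 1.0198
```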
@@ -65,8 +65,8 @@ def count(
See Also
--------
- numpy.count
- dask.array.count
+ pandas.DataFrame.count
+ dask.dataframe.DataFrame.count
DataArray.count
:ref:`agg`
User guide on reduction or aggregation operations.
@@ -74,7 +74,7 @@ def count(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -89,7 +89,7 @@ def count(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.count()
@@ -296,7 +296,7 @@ def max(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -311,7 +311,7 @@ def max(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.max()
@@ -383,7 +383,7 @@ def min(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -398,13 +398,13 @@ def min(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.min()
Dimensions: ()
Data variables:
- da float64 1.0
+ da float64 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -474,7 +474,7 @@ def mean(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -489,13 +489,13 @@ def mean(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.mean()
Dimensions: ()
Data variables:
- da float64 1.8
+ da float64 1.6
Use ``skipna`` to control whether NaNs are ignored.
@@ -572,7 +572,7 @@ def prod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -587,13 +587,13 @@ def prod(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.prod()
Dimensions: ()
Data variables:
- da float64 12.0
+ da float64 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -609,7 +609,7 @@ def prod(
Dimensions: ()
Data variables:
- da float64 12.0
+ da float64 0.0
"""
return self.reduce(
duck_array_ops.prod,
@@ -679,7 +679,7 @@ def sum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -694,13 +694,13 @@ def sum(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.sum()
Dimensions: ()
Data variables:
- da float64 9.0
+ da float64 8.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -716,7 +716,7 @@ def sum(
Dimensions: ()
Data variables:
- da float64 9.0
+ da float64 8.0
"""
return self.reduce(
duck_array_ops.sum,
@@ -783,7 +783,7 @@ def std(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -798,13 +798,13 @@ def std(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.std()
Dimensions: ()
Data variables:
- da float64 0.7483
+ da float64 1.02
Use ``skipna`` to control whether NaNs are ignored.
@@ -820,7 +820,7 @@ def std(
Dimensions: ()
Data variables:
- da float64 0.8367
+ da float64 1.14
"""
return self.reduce(
duck_array_ops.std,
@@ -887,7 +887,7 @@ def var(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -902,13 +902,13 @@ def var(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.var()
Dimensions: ()
Data variables:
- da float64 0.56
+ da float64 1.04
Use ``skipna`` to control whether NaNs are ignored.
@@ -924,7 +924,7 @@ def var(
Dimensions: ()
Data variables:
- da float64 0.7
+ da float64 1.3
"""
return self.reduce(
duck_array_ops.var,
@@ -987,7 +987,7 @@ def median(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1002,7 +1002,7 @@ def median(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.median()
@@ -1078,7 +1078,7 @@ def cumsum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1093,14 +1093,14 @@ def cumsum(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.cumsum()
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 3.0 6.0 7.0 9.0 9.0
+ da (time) float64 1.0 3.0 6.0 6.0 8.0 8.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -1109,7 +1109,7 @@ def cumsum(
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 3.0 6.0 7.0 9.0 nan
+ da (time) float64 1.0 3.0 6.0 6.0 8.0 nan
"""
return self.reduce(
duck_array_ops.cumsum,
@@ -1171,7 +1171,7 @@ def cumprod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1186,14 +1186,14 @@ def cumprod(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.cumprod()
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 6.0 6.0 12.0 12.0
+ da (time) float64 1.0 2.0 6.0 0.0 0.0 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -1202,7 +1202,7 @@ def cumprod(
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 6.0 6.0 12.0 nan
+ da (time) float64 1.0 2.0 6.0 0.0 0.0 nan
"""
return self.reduce(
duck_array_ops.cumprod,
@@ -1261,8 +1261,8 @@ def count(
See Also
--------
- numpy.count
- dask.array.count
+ pandas.DataFrame.count
+ dask.dataframe.DataFrame.count
Dataset.count
:ref:`agg`
User guide on reduction or aggregation operations.
@@ -1270,7 +1270,7 @@ def count(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1279,7 +1279,7 @@ def count(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1483,7 +1483,7 @@ def max(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1562,14 +1562,14 @@ def min(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.min()
- array(1.)
+ array(0.)
Use ``skipna`` to control whether NaNs are ignored.
@@ -1636,7 +1636,7 @@ def mean(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1645,14 +1645,14 @@ def mean(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.mean()
- array(1.8)
+ array(1.6)
Use ``skipna`` to control whether NaNs are ignored.
@@ -1726,7 +1726,7 @@ def prod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1735,14 +1735,14 @@ def prod(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.prod()
- array(12.)
+ array(0.)
Use ``skipna`` to control whether NaNs are ignored.
@@ -1754,7 +1754,7 @@ def prod(
>>> da.prod(skipna=True, min_count=2)
- array(12.)
+ array(0.)
"""
return self.reduce(
duck_array_ops.prod,
@@ -1823,7 +1823,7 @@ def sum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1832,14 +1832,14 @@ def sum(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.sum()
- array(9.)
+ array(8.)
Use ``skipna`` to control whether NaNs are ignored.
@@ -1851,7 +1851,7 @@ def sum(
>>> da.sum(skipna=True, min_count=2)
- array(9.)
+ array(8.)
"""
return self.reduce(
duck_array_ops.sum,
@@ -1917,7 +1917,7 @@ def std(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -1926,14 +1926,14 @@ def std(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.std()
- array(0.74833148)
+ array(1.0198039)
Use ``skipna`` to control whether NaNs are ignored.
@@ -1945,7 +1945,7 @@ def std(
>>> da.std(skipna=True, ddof=1)
- array(0.83666003)
+ array(1.14017543)
"""
return self.reduce(
duck_array_ops.std,
@@ -2011,7 +2011,7 @@ def var(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2020,14 +2020,14 @@ def var(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.var()
- array(0.56)
+ array(1.04)
Use ``skipna`` to control whether NaNs are ignored.
@@ -2039,7 +2039,7 @@ def var(
>>> da.var(skipna=True, ddof=1)
- array(0.7)
+ array(1.3)
"""
return self.reduce(
duck_array_ops.var,
@@ -2101,7 +2101,7 @@ def median(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2110,7 +2110,7 @@ def median(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2193,14 +2193,14 @@ def cumsum(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.cumsum()
- array([1., 3., 6., 7., 9., 9.])
+ array([1., 3., 6., 6., 8., 8.])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.cumsum(skipna=False)
- array([ 1., 3., 6., 7., 9., nan])
+ array([ 1., 3., 6., 6., 8., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2282,14 +2282,14 @@ def cumprod(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.cumprod()
- array([ 1., 2., 6., 6., 12., 12.])
+ array([1., 2., 6., 0., 0., 0.])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.cumprod(skipna=False)
- array([ 1., 2., 6., 6., 12., nan])
+ array([ 1., 2., 6., 0., 0., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2400,7 +2400,7 @@ def count(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").count()
@@ -2413,7 +2413,7 @@ def count(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="count",
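`contains_only_dask_or_numpy` is replaced by `contains_only_chunked_or_numpy` everywhere the flox fast path is gated, so groupby and resample reductions can dispatch to flox for any registered chunked backend, not just dask. Schematically (names are taken from this file; the shape of the fallback is illustrative):

```python
# Schematic of the fast-path gate repeated throughout this module.
from xarray.core.options import OPTIONS
from xarray.core.utils import contains_only_chunked_or_numpy, module_available

flox_available = module_available("flox")

def _use_flox_fast_path(obj) -> bool:
    return (
        flox_available
        and OPTIONS["use_flox"]
        and contains_only_chunked_or_numpy(obj)
    )
```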
@@ -2511,7 +2511,7 @@ def all(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="all",
@@ -2609,7 +2609,7 @@ def any(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="any",
@@ -2685,7 +2685,7 @@ def max(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2700,7 +2700,7 @@ def max(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").max()
@@ -2723,7 +2723,7 @@ def max(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="max",
@@ -2801,7 +2801,7 @@ def min(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2816,7 +2816,7 @@ def min(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").min()
@@ -2824,7 +2824,7 @@ def min(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 1.0 2.0 1.0
+ da (labels) float64 1.0 2.0 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -2834,12 +2834,12 @@ def min(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 2.0 1.0
+ da (labels) float64 nan 2.0 0.0
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="min",
@@ -2919,7 +2919,7 @@ def mean(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -2934,7 +2934,7 @@ def mean(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").mean()
@@ -2942,7 +2942,7 @@ def mean(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 1.0 2.0 2.0
+ da (labels) float64 1.0 2.0 1.5
Use ``skipna`` to control whether NaNs are ignored.
@@ -2952,12 +2952,12 @@ def mean(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 2.0 2.0
+ da (labels) float64 nan 2.0 1.5
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="mean",
@@ -3044,7 +3044,7 @@ def prod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3059,7 +3059,7 @@ def prod(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").prod()
@@ -3067,7 +3067,7 @@ def prod(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 1.0 4.0 3.0
+ da (labels) float64 1.0 4.0 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -3077,7 +3077,7 @@ def prod(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 4.0 3.0
+ da (labels) float64 nan 4.0 0.0
Specify ``min_count`` for finer control over when NaNs are ignored.
@@ -3087,12 +3087,12 @@ def prod(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 4.0 3.0
+ da (labels) float64 nan 4.0 0.0
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="prod",
@@ -3181,7 +3181,7 @@ def sum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3196,7 +3196,7 @@ def sum(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").sum()
@@ -3204,7 +3204,7 @@ def sum(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 1.0 4.0 4.0
+ da (labels) float64 1.0 4.0 3.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -3214,7 +3214,7 @@ def sum(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 4.0 4.0
+ da (labels) float64 nan 4.0 3.0
Specify ``min_count`` for finer control over when NaNs are ignored.
@@ -3224,12 +3224,12 @@ def sum(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 4.0 4.0
+ da (labels) float64 nan 4.0 3.0
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="sum",
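With labels `['a', 'b', 'c', 'c', 'b', 'a']`, the groups become a = {1, nan}, b = {2, 2}, c = {3, 0}, which is where the updated groupby expectations above and below come from. A quick cross-check with pandas:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1, 2, 3, 0, 2, np.nan],
    index=pd.Index(["a", "b", "c", "c", "b", "a"], name="labels"),
)
print(s.groupby(level="labels").sum())          # a 1.0, b 4.0, c 3.0
print(s.groupby(level="labels").mean())         # a 1.0, b 2.0, c 1.5
print(s.groupby(level="labels").std(ddof=1))    # a NaN, b 0.0, c ~2.1213
```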
@@ -3315,7 +3315,7 @@ def std(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3330,7 +3330,7 @@ def std(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").std()
@@ -3338,7 +3338,7 @@ def std(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 0.0 0.0 1.0
+ da (labels) float64 0.0 0.0 1.5
Use ``skipna`` to control whether NaNs are ignored.
@@ -3348,7 +3348,7 @@ def std(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 0.0 1.0
+ da (labels) float64 nan 0.0 1.5
Specify ``ddof=1`` for an unbiased estimate.
@@ -3358,12 +3358,12 @@ def std(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 0.0 1.414
+ da (labels) float64 nan 0.0 2.121
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="std",
@@ -3449,7 +3449,7 @@ def var(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3464,7 +3464,7 @@ def var(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").var()
@@ -3472,7 +3472,7 @@ def var(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 0.0 0.0 1.0
+ da (labels) float64 0.0 0.0 2.25
Use ``skipna`` to control whether NaNs are ignored.
@@ -3482,7 +3482,7 @@ def var(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 0.0 1.0
+ da (labels) float64 nan 0.0 2.25
Specify ``ddof=1`` for an unbiased estimate.
@@ -3492,12 +3492,12 @@ def var(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 0.0 2.0
+ da (labels) float64 nan 0.0 4.5
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="var",
@@ -3579,7 +3579,7 @@ def median(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3594,7 +3594,7 @@ def median(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").median()
@@ -3602,7 +3602,7 @@ def median(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 1.0 2.0 2.0
+ da (labels) float64 1.0 2.0 1.5
Use ``skipna`` to control whether NaNs are ignored.
@@ -3612,7 +3612,7 @@ def median(
Coordinates:
* labels (labels) object 'a' 'b' 'c'
Data variables:
- da (labels) float64 nan 2.0 2.0
+ da (labels) float64 nan 2.0 1.5
"""
return self.reduce(
duck_array_ops.median,
@@ -3682,7 +3682,7 @@ def cumsum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3697,14 +3697,14 @@ def cumsum(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").cumsum()
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 3.0 4.0 4.0 1.0
+ da (time) float64 1.0 2.0 3.0 3.0 4.0 1.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -3713,7 +3713,7 @@ def cumsum(
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 3.0 4.0 4.0 nan
+ da (time) float64 1.0 2.0 3.0 3.0 4.0 nan
"""
return self.reduce(
duck_array_ops.cumsum,
@@ -3783,7 +3783,7 @@ def cumprod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3798,14 +3798,14 @@ def cumprod(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.groupby("labels").cumprod()
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 3.0 3.0 4.0 1.0
+ da (time) float64 1.0 2.0 3.0 0.0 4.0 1.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -3814,7 +3814,7 @@ def cumprod(
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 3.0 3.0 4.0 nan
+ da (time) float64 1.0 2.0 3.0 0.0 4.0 nan
"""
return self.reduce(
duck_array_ops.cumprod,
@@ -3881,8 +3881,8 @@ def count(
See Also
--------
- numpy.count
- dask.array.count
+ pandas.DataFrame.count
+ dask.dataframe.DataFrame.count
Dataset.count
:ref:`resampling`
User guide on resampling operations.
@@ -3899,7 +3899,7 @@ def count(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -3914,7 +3914,7 @@ def count(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").count()
@@ -3927,7 +3927,7 @@ def count(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="count",
@@ -4025,7 +4025,7 @@ def all(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="all",
@@ -4123,7 +4123,7 @@ def any(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="any",
@@ -4199,7 +4199,7 @@ def max(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4214,7 +4214,7 @@ def max(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").max()
@@ -4237,7 +4237,7 @@ def max(
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="max",
@@ -4315,7 +4315,7 @@ def min(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4330,7 +4330,7 @@ def min(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").min()
@@ -4338,7 +4338,7 @@ def min(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 1.0 2.0
+ da (time) float64 1.0 0.0 2.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -4348,12 +4348,12 @@ def min(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 1.0 nan
+ da (time) float64 1.0 0.0 nan
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="min",
@@ -4433,7 +4433,7 @@ def mean(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4448,7 +4448,7 @@ def mean(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").mean()
@@ -4456,7 +4456,7 @@ def mean(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 2.0 2.0
+ da (time) float64 1.0 1.667 2.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -4466,12 +4466,12 @@ def mean(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 2.0 nan
+ da (time) float64 1.0 1.667 nan
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="mean",
@@ -4558,7 +4558,7 @@ def prod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4573,7 +4573,7 @@ def prod(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").prod()
@@ -4581,7 +4581,7 @@ def prod(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 6.0 2.0
+ da (time) float64 1.0 0.0 2.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -4591,7 +4591,7 @@ def prod(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 6.0 nan
+ da (time) float64 1.0 0.0 nan
Specify ``min_count`` for finer control over when NaNs are ignored.
@@ -4601,12 +4601,12 @@ def prod(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 nan 6.0 nan
+ da (time) float64 nan 0.0 nan
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="prod",
@@ -4695,7 +4695,7 @@ def sum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4710,7 +4710,7 @@ def sum(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").sum()
@@ -4718,7 +4718,7 @@ def sum(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 6.0 2.0
+ da (time) float64 1.0 5.0 2.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -4728,7 +4728,7 @@ def sum(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 1.0 6.0 nan
+ da (time) float64 1.0 5.0 nan
Specify ``min_count`` for finer control over when NaNs are ignored.
@@ -4738,12 +4738,12 @@ def sum(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 nan 6.0 nan
+ da (time) float64 nan 5.0 nan
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="sum",
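Resampling the six monthly values to 3-month bins groups them as {Jan}, {Feb, Mar, Apr}, {May, Jun}, i.e. {1}, {2, 3, 0}, {2, nan}, which yields the updated resample expectations above and below. A quick cross-check with pandas:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1, 2, 3, 0, 2, np.nan],
    index=pd.date_range("2001-01-01", freq="M", periods=6),
)
print(s.resample("3M").sum())                  # 1.0, 5.0, 2.0
print(s.resample("3M").mean().round(3))        # 1.0, 1.667, 2.0
print(s.resample("3M").std(ddof=1).round(3))   # NaN, 1.528, NaN
```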
@@ -4829,7 +4829,7 @@ def std(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4844,7 +4844,7 @@ def std(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").std()
@@ -4852,7 +4852,7 @@ def std(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 0.0 0.8165 0.0
+ da (time) float64 0.0 1.247 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -4862,7 +4862,7 @@ def std(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 0.0 0.8165 nan
+ da (time) float64 0.0 1.247 nan
Specify ``ddof=1`` for an unbiased estimate.
@@ -4872,12 +4872,12 @@ def std(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 nan 1.0 nan
+ da (time) float64 nan 1.528 nan
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="std",
@@ -4963,7 +4963,7 @@ def var(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -4978,7 +4978,7 @@ def var(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").var()
@@ -4986,7 +4986,7 @@ def var(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 0.0 0.6667 0.0
+ da (time) float64 0.0 1.556 0.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -4996,7 +4996,7 @@ def var(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 0.0 0.6667 nan
+ da (time) float64 0.0 1.556 nan
Specify ``ddof=1`` for an unbiased estimate.
@@ -5006,12 +5006,12 @@ def var(
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
Data variables:
- da (time) float64 nan 1.0 nan
+ da (time) float64 nan 2.333 nan
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="var",
@@ -5093,7 +5093,7 @@ def median(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5108,7 +5108,7 @@ def median(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").median()
@@ -5196,7 +5196,7 @@ def cumsum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5211,14 +5211,14 @@ def cumsum(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").cumsum()
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 5.0 6.0 2.0 2.0
+ da (time) float64 1.0 2.0 5.0 5.0 2.0 2.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -5227,7 +5227,7 @@ def cumsum(
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 5.0 6.0 2.0 nan
+ da (time) float64 1.0 2.0 5.0 5.0 2.0 nan
"""
return self.reduce(
duck_array_ops.cumsum,
@@ -5297,7 +5297,7 @@ def cumprod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5312,14 +5312,14 @@ def cumprod(
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> ds.resample(time="3M").cumprod()
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 6.0 6.0 2.0 2.0
+ da (time) float64 1.0 2.0 6.0 0.0 2.0 2.0
Use ``skipna`` to control whether NaNs are ignored.
@@ -5328,7 +5328,7 @@ def cumprod(
Dimensions: (time: 6)
Dimensions without coordinates: time
Data variables:
- da (time) float64 1.0 2.0 6.0 6.0 2.0 nan
+ da (time) float64 1.0 2.0 6.0 0.0 2.0 nan
"""
return self.reduce(
duck_array_ops.cumprod,
@@ -5395,8 +5395,8 @@ def count(
See Also
--------
- numpy.count
- dask.array.count
+ pandas.DataFrame.count
+ dask.dataframe.DataFrame.count
DataArray.count
:ref:`groupby`
User guide on groupby operations.
@@ -5413,7 +5413,7 @@ def count(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5422,7 +5422,7 @@ def count(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5701,7 +5701,7 @@ def max(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5808,14 +5808,14 @@ def min(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").min()
- array([1., 2., 1.])
+ array([1., 2., 0.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -5823,14 +5823,14 @@ def min(
>>> da.groupby("labels").min(skipna=False)
- array([nan, 2., 1.])
+ array([nan, 2., 0.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="min",
@@ -5908,7 +5908,7 @@ def mean(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -5917,14 +5917,14 @@ def mean(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").mean()
- array([1., 2., 2.])
+ array([1. , 2. , 1.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -5932,14 +5932,14 @@ def mean(
>>> da.groupby("labels").mean(skipna=False)
- array([nan, 2., 2.])
+ array([nan, 2. , 1.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="mean",
@@ -6024,7 +6024,7 @@ def prod(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6033,14 +6033,14 @@ def prod(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").prod()
- array([1., 4., 3.])
+ array([1., 4., 0.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6048,7 +6048,7 @@ def prod(
>>> da.groupby("labels").prod(skipna=False)
- array([nan, 4., 3.])
+ array([nan, 4., 0.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6056,14 +6056,14 @@ def prod(
>>> da.groupby("labels").prod(skipna=True, min_count=2)
- array([nan, 4., 3.])
+ array([nan, 4., 0.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="prod",
@@ -6150,7 +6150,7 @@ def sum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6159,14 +6159,14 @@ def sum(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").sum()
- array([1., 4., 4.])
+ array([1., 4., 3.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6174,7 +6174,7 @@ def sum(
>>> da.groupby("labels").sum(skipna=False)
- array([nan, 4., 4.])
+ array([nan, 4., 3.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6182,14 +6182,14 @@ def sum(
>>> da.groupby("labels").sum(skipna=True, min_count=2)
- array([nan, 4., 4.])
+ array([nan, 4., 3.])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="sum",
@@ -6273,7 +6273,7 @@ def std(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6282,14 +6282,14 @@ def std(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").std()
- array([0., 0., 1.])
+ array([0. , 0. , 1.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6297,7 +6297,7 @@ def std(
>>> da.groupby("labels").std(skipna=False)
- array([nan, 0., 1.])
+ array([nan, 0. , 1.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6305,14 +6305,14 @@ def std(
>>> da.groupby("labels").std(skipna=True, ddof=1)
- array([ nan, 0. , 1.41421356])
+ array([ nan, 0. , 2.12132034])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="std",
@@ -6396,7 +6396,7 @@ def var(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6405,14 +6405,14 @@ def var(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").var()
- array([0., 0., 1.])
+ array([0. , 0. , 2.25])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6420,7 +6420,7 @@ def var(
>>> da.groupby("labels").var(skipna=False)
- array([nan, 0., 1.])
+ array([ nan, 0. , 2.25])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6428,14 +6428,14 @@ def var(
>>> da.groupby("labels").var(skipna=True, ddof=1)
- array([nan, 0., 2.])
+ array([nan, 0. , 4.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
if (
flox_available
and OPTIONS["use_flox"]
- and contains_only_dask_or_numpy(self._obj)
+ and contains_only_chunked_or_numpy(self._obj)
):
return self._flox_reduce(
func="var",
@@ -6515,7 +6515,7 @@ def median(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6524,14 +6524,14 @@ def median(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").median()
- array([1., 2., 2.])
+ array([1. , 2. , 1.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
@@ -6539,7 +6539,7 @@ def median(
>>> da.groupby("labels").median(skipna=False)
- array([nan, 2., 2.])
+ array([nan, 2. , 1.5])
Coordinates:
* labels (labels) object 'a' 'b' 'c'
"""
@@ -6610,7 +6610,7 @@ def cumsum(
Examples
--------
>>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6619,14 +6619,14 @@ def cumsum(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").cumsum()
- array([1., 2., 3., 4., 4., 1.])
+ array([1., 2., 3., 3., 4., 1.])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").cumsum(skipna=False)
- array([ 1., 2., 3., 4., 4., nan])
+ array([ 1., 2., 3., 3., 4., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6716,14 +6716,14 @@ def cumprod(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").cumprod()
- array([1., 2., 3., 3., 4., 1.])
+ array([1., 2., 3., 0., 4., 1.])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da.groupby("labels").cumprod(skipna=False)
- array([ 1., 2., 3., 3., 4., nan])
+ array([ 1., 2., 3., 0., 4., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -6828,7 +6828,7 @@ def count(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -7107,7 +7107,7 @@ def max(
... )
>>> da
- array([ 1., 2., 3., 1., 2., nan])
+ array([ 1., 2., 3., 0., 2., nan])
Coordinates:
* time (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-06-30
        labels   (time) <U1 'a' 'b' 'c' 'c' 'b' 'a'

        >>> da = xr.DataArray(
- ... np.array([1, 2, 3, 1, 2, np.nan]),
+ ... np.array([1, 2, 3, 0, 2, np.nan]),
... dims="time",
... coords=dict(
... time=("time", pd.date_range("2001-01-01", freq="M", periods=6)),
@@ -7214,14 +7214,14 @@ def min(
... )
>>> da