Add GRIB field, handle and metadata cache (#428)
* Add grib handle and metadata cache

Co-authored-by: Iain Russell <40060766+iainrussell@users.noreply.github.com>
Co-authored-by: Iain Russell <Iain.Russell@ecmwf.int>
1 parent 81d5d6b, commit 8060550
Showing 20 changed files with 1,203 additions and 156 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
.. _grib-memory: | ||
|
||
GRIB field memory management | ||
////////////////////////////// | ||
|
||
:ref:`grib` is a message-based binary format in which each message is regarded as a field. For reading GRIB, earthkit-data relies on :xref:`eccodes`, which represents each message loaded into memory as a ``GRIB handle``. In the low-level API, the GRIB handle is the object that holds the data and metadata of a GRIB field, so it can use a significant amount of memory.
|
||
Determining when a GRIB handle needs to be created and when it can be released is important for memory management. Earthkit-data provides several settings to control this behaviour depending on how we actually read the data. | ||
|
||
Reading GRIB data as a stream iterator | ||
======================================== | ||
|
||
We can read :ref:`grib` data as a :ref:`stream <streams>` iterator e.g. with the following code: | ||
|
||
.. code-block:: python

    import earthkit.data

    url = "https://get.ecmwf.int/repository/test-data/earthkit-data/examples/test6.grib"
    ds = earthkit.data.from_source("url", url, stream=True)

    # Iterate over the stream; each field exists only inside the loop body
    for f in ds:
        print(f)

Here, field ``f`` is not attached to a fieldlist and only exists within the scope of the iteration (the ``for`` loop). While it exists, the field keeps its GRIB handle in memory; when used as shown above, only one field exists at a time. Once the stream is consumed, the data cannot be accessed again unless we read it with :func:`from_source` again.
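
The one-pass behaviour can be pictured with a plain Python generator. The sketch below is not earthkit-data code: ``iter_messages`` and the length-prefixed record layout are assumptions made purely for illustration, standing in for a stream of GRIB messages.

```python
import io
import struct


def iter_messages(stream):
    """Yield one length-prefixed record at a time from a binary stream.

    Hypothetical stand-in for a GRIB stream reader: each record is a
    4-byte big-endian length followed by that many payload bytes.
    """
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)


# Build an in-memory stream holding three fake messages
payloads = [b"field-0", b"field-1", b"field-2"]
raw = b"".join(struct.pack(">I", len(p)) + p for p in payloads)
stream = io.BytesIO(raw)

# First pass sees every message; only one is held by the loop at a time
first_pass = [msg for msg in iter_messages(stream)]

# The stream is now exhausted, so a second pass yields nothing
second_pass = [msg for msg in iter_messages(stream)]
```

Collecting the messages into a list, as ``first_pass`` does, is roughly the idea behind keeping everything in memory; iterating without storing keeps only the current message alive.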
|
||
Reading all GRIB data from a stream into memory | ||
=============================================== | ||
|
||
We can load :ref:`grib` data fully into memory when we read it as a :ref:`stream <streams>` with the ``read_all=True`` option in :func:`from_source`. | ||
|
||
.. code-block:: python

    import earthkit.data

    url = "https://get.ecmwf.int/repository/test-data/earthkit-data/examples/test6.grib"

    # read_all=True consumes the whole stream and keeps every field in memory
    ds = earthkit.data.from_source("url", url, stream=True, read_all=True)

With this, the entire ``ds`` fieldlist, including all the fields and the related GRIB handles, is stored in memory.
|
||
Reading data from disk and managing its memory | ||
============================================== | ||
|
||
When reading :ref:`grib` data from disk as a :ref:`file source <data-sources-file>`, it is represented as a fieldlist and loaded lazily. After the (fast) initial scan for field offsets and lengths, no actual fields are created and no data is read into memory. When we start using the fieldlist, e.g. by iterating over the fields, accessing data or metadata etc., the fields will be created **on demand** and the related GRIB handles will be loaded from disk **when needed**. Whether this data or part of it stays in memory depends on the following :ref:`settings <settings>`: | ||
|
||
- :ref:`grib-field-policy <grib-field-policy>` | ||
- :ref:`grib-handle-policy <grib-handle-policy>` | ||
- :ref:`grib-handle-cache-size <grib-handle-cache-size>` | ||
- :ref:`use-grib-metadata-cache <use-grib-metadata-cache>` | ||
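
The scan-then-load-on-demand pattern described above can be sketched in a few lines. This is an illustration only, not how earthkit-data or ecCodes implement it. It assumes GRIB edition 2, where the 16-octet indicator section starts with ``GRIB`` and stores the total message length in octets 9-16; the helpers ``make_fake_grib2``, ``scan`` and ``load`` are hypothetical, and the fake messages are not decodable GRIB.

```python
import io
import struct


def make_fake_grib2(payload):
    """Build a GRIB2-shaped message: 16-byte Section 0, payload, '7777'.

    Not a valid GRIB message; only the indicator section layout is realistic.
    """
    total = 16 + len(payload) + 4
    # 'GRIB', 2 reserved octets, discipline, edition, 8-octet total length
    section0 = b"GRIB" + b"\x00\x00" + b"\x00" + b"\x02" + struct.pack(">Q", total)
    return section0 + payload + b"7777"


def scan(f):
    """Return (offset, length) for each message without reading payloads."""
    index = []
    while True:
        offset = f.tell()
        head = f.read(16)
        if len(head) < 16 or head[:4] != b"GRIB":
            break
        (length,) = struct.unpack(">Q", head[8:16])
        index.append((offset, length))
        f.seek(offset + length)  # skip straight to the next message
    return index


def load(f, index, i):
    """Read the bytes of message i only when it is actually needed."""
    offset, length = index[i]
    f.seek(offset)
    return f.read(length)


f = io.BytesIO(make_fake_grib2(b"a" * 10) + make_fake_grib2(b"b" * 20))
index = scan(f)         # cheap: reads 16 bytes per message
msg = load(f, index, 1)  # expensive part happens on demand
```

The index holds only two integers per message, which is why the initial scan can stay fast and memory-light regardless of how large the fields are.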
|
||
.. _grib-field-policy: | ||
|
||
grib-field-policy | ||
++++++++++++++++++++++++++++ | ||
|
||
Controls whether fields are kept in memory. The default is ``"persistent"``. The possible values are: | ||
|
||
- ``"persistent"``: fields are kept in memory until the fieldlist is deleted | ||
- ``"temporary"``: fields are deleted when they go out of scope and recreated on demand | ||
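
As a rough mental model of the two values, consider the toy fieldlist below. The ``Field`` and ``FieldList`` classes are hypothetical and deliberately simplified; they only illustrate the difference between keeping field objects and recreating them on demand.

```python
class Field:
    """Hypothetical lightweight field identified by its index in the file."""

    def __init__(self, index):
        self.index = index


class FieldList:
    """Toy fieldlist that creates fields on demand."""

    def __init__(self, n, field_policy="persistent"):
        self.n = n
        self.field_policy = field_policy
        self._fields = {}  # only used when the policy is "persistent"
        self.created = 0   # counts how many Field objects were built

    def __getitem__(self, i):
        if self.field_policy == "persistent":
            # Create once, then keep until the fieldlist itself is deleted
            if i not in self._fields:
                self._fields[i] = self._create(i)
            return self._fields[i]
        # "temporary": build a fresh object on every access
        return self._create(i)

    def _create(self, i):
        self.created += 1
        return Field(i)


persistent = FieldList(3, field_policy="persistent")
temporary = FieldList(3, field_policy="temporary")
for _ in range(2):
    persistent[0]
    temporary[0]
```

After the loop, the persistent fieldlist has built one ``Field`` and still holds it, while the temporary one has built two and holds none.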
|
||
The actual memory used by a field depends on whether it owns the GRIB handle of the related GRIB message. This is controlled by the :ref:`grib-handle-policy <grib-handle-policy>` setting.
|
||
A field can also cache its metadata access for performance, which increases memory usage. This is controlled by the :ref:`use-grib-metadata-cache <use-grib-metadata-cache>` setting.
|
||
.. _grib-handle-policy: | ||
|
||
grib-handle-policy | ||
++++++++++++++++++++++++++++ | ||
|
||
Controls whether GRIB handles are kept in memory. The default is ``"cache"``. The possible values are: | ||
|
||
- ``"cache"``: a separate in-memory LRU cache is created for the GRIB handles in the fieldlist. The maximum number of GRIB handles kept in this cache is controlled by :ref:`grib-handle-cache-size <grib-handle-cache-size>`. In this mode, field objects are lightweight and only store the GRIB handle cache index, and can only access the GRIB handles via the cache. | ||
- ``"persistent"``: once a GRIB handle is created, a field keeps it in memory until the field is deleted | ||
- ``"temporary"``: for each call to data and metadata access on a field, a new GRIB handle is created and released once the access has finished. | ||
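
The trade-off between ``"persistent"`` and ``"temporary"`` can be illustrated by counting how many handles get created. The classes below are hypothetical stand-ins, not earthkit-data or ecCodes objects, and the ``"cache"`` policy is intentionally left out of this sketch.

```python
class Handle:
    """Stand-in for a GRIB handle; counts how many were created."""

    created = 0

    def __init__(self, index):
        Handle.created += 1
        self.index = index

    def get(self, key):
        return f"{key}-of-message-{self.index}"


class Field:
    """Toy field that owns, or repeatedly recreates, its handle."""

    def __init__(self, index, handle_policy):
        self.index = index
        self.handle_policy = handle_policy
        self._handle = None

    def metadata(self, key):
        if self.handle_policy == "persistent":
            # Create the handle once and keep it for the field's lifetime
            if self._handle is None:
                self._handle = Handle(self.index)
            return self._handle.get(key)
        # "temporary": the handle lives only for the duration of this call
        return Handle(self.index).get(key)


Handle.created = 0
p = Field(0, "persistent")
p.metadata("shortName")
p.metadata("level")
persistent_count = Handle.created

Handle.created = 0
t = Field(0, "temporary")
t.metadata("shortName")
t.metadata("level")
temporary_count = Handle.created
```

The persistent field pays the loading cost once but holds the memory; the temporary field holds nothing between calls but reloads on every access.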
|
||
.. _grib-handle-cache-size: | ||
|
||
grib-handle-cache-size | ||
++++++++++++++++++++++++++++ | ||
|
||
When :ref:`grib-handle-policy <grib-handle-policy>` is ``"cache"``, the setting ``grib-handle-cache-size`` (default is ``1``) specifies the maximum number of GRIB handles kept in an in-memory cache per fieldlist. This is an LRU cache, so when it is full, the least recently used GRIB handle is removed and a new GRIB message is loaded from disk and added to the cache. | ||
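
The eviction behaviour can be sketched with ``collections.OrderedDict`` from the standard library. This is a minimal illustration under assumptions: ``HandleCache`` and ``load_from_disk`` are hypothetical names, and the real cache may well be built differently (the same commit adds ``lru-dict`` to the dependencies).

```python
from collections import OrderedDict


class HandleCache:
    """Toy LRU cache mapping a field index to its (fake) handle."""

    def __init__(self, maxsize, loader):
        self.maxsize = maxsize
        self.loader = loader
        self._cache = OrderedDict()

    def get(self, index):
        if index in self._cache:
            # Cache hit: mark as most recently used
            self._cache.move_to_end(index)
            return self._cache[index]
        # Cache miss: load, store, and evict the least recently used entry
        handle = self.loader(index)
        self._cache[index] = handle
        if len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)
        return handle


loads = []  # records every simulated disk read


def load_from_disk(index):
    loads.append(index)
    return f"handle-{index}"


cache = HandleCache(maxsize=2, loader=load_from_disk)
for i in [0, 1, 0, 2, 1]:
    cache.get(i)
```

With a cache size of 2, the access pattern above triggers four loads: the second access to field 0 is a hit, but field 1 has been evicted by the time it is requested again.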
|
||
.. _use-grib-metadata-cache: | ||
|
||
use-grib-metadata-cache | ||
+++++++++++++++++++++++++++++++++++ | ||
|
||
When ``use-grib-metadata-cache`` is ``True`` (this is the default) all the fields will cache their metadata access. This is an in-memory cache attached to the field and implemented for the low-level metadata accessor for individual keys. This cache can be useful when the same metadata keys are accessed multiple times for the same field. | ||
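
A per-field, per-key cache of this kind might look like the sketch below. The ``Field`` class and its ``lookups`` counter are hypothetical; a plain dict stands in for the GRIB handle so that the example runs without ecCodes.

```python
class Field:
    """Toy field with an optional per-key metadata cache."""

    def __init__(self, raw, use_metadata_cache=True):
        self._raw = raw  # stands in for the GRIB handle
        self._cache = {} if use_metadata_cache else None
        self.lookups = 0  # number of low-level accesses performed

    def _get_from_handle(self, key):
        self.lookups += 1
        return self._raw[key]

    def metadata(self, key):
        if self._cache is None:
            return self._get_from_handle(key)
        if key not in self._cache:
            self._cache[key] = self._get_from_handle(key)
        return self._cache[key]


values = {"shortName": "t", "level": 500}
cached = Field(values, use_metadata_cache=True)
uncached = Field(values, use_metadata_cache=False)
for _ in range(3):
    cached.metadata("shortName")
    uncached.metadata("shortName")
```

Repeated access to the same key reaches the handle once with the cache and three times without it, at the cost of keeping the cached values in memory for as long as the field lives.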
|
||
|
||
Overriding the settings | ||
++++++++++++++++++++++++++++ | ||
|
||
In addition to changing the :ref:`settings` themselves, it is possible to override the four parameters above when loading a given fieldlist by passing them as keyword arguments to :func:`from_source`. The parameter names are the same, with the dashes replaced by underscores. When a parameter is not specified in :func:`from_source` or is set to ``None``, its value is taken from the current :ref:`settings`. For example:
|
||
.. code-block:: python

    import earthkit.data

    # Keyword arguments override the global settings for this fieldlist only
    ds = earthkit.data.from_source(
        "file",
        "test6.grib",
        grib_field_policy="persistent",
        grib_handle_policy="temporary",
        grib_handle_cache_size=0,
        use_grib_metadata_cache=True,
    )

Reading data from disk as a stream
++++++++++++++++++++++++++++++++++
|
||
Whilst the usual way of reading GRIB data from disk loads fields lazily (i.e. only when they are actually used), it is also possible to read all fields up-front and keep them in memory by reading the file as a :ref:`stream source <data-sources-stream>` with the ``read_all=True`` option.
|
||
.. code-block:: python

    import earthkit.data

    # Open the file in binary mode and pass the file object as a stream
    f = open("test6.grib", "rb")
    ds = earthkit.data.from_source("stream", f, read_all=True)

.. warning::

    Use this option carefully since your data might not fit into memory.
|
||
|
||
|
||
.. note::

    The default settings are chosen to keep the memory usage low and the performance high. However, depending on the use case, the settings can be adjusted to optimize memory usage and performance.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,6 +40,7 @@ dependencies = [ | |
"filelock", | ||
"jinja2", | ||
"jsonschema", | ||
"lru-dict", | ||
"markdown", | ||
"multiurl", | ||
"netcdf4", | ||
|