Feature/refactor streaming #371

Merged 14 commits on May 15, 2024

@sandorkertesz (Collaborator) commented Apr 22, 2024

Fixes #364

This PR:

  • adds batched() and group_by() for stream and fieldlist-like objects
  • adds the read_all option to from_source() for use with stream=True
  • removes the batch_size and group_by options from from_source() (this is a breaking change!)

Examples

from earthkit.data import from_source

ds = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds:
    # f is a Field
    print(f)

# at this point ds has consumed the stream
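
The "consumed" remark above is the usual behaviour of a one-shot iterator. The sketch below illustrates it in plain Python with a hypothetical `fake_stream` generator standing in for a streamed source; it does not use earthkit-data and makes no claim about how the library implements streams.

```python
from typing import Iterator


def fake_stream(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streamed source: yields n items, once."""
    for i in range(n):
        yield f"field-{i}"


stream = fake_stream(3)

first_pass = [f for f in stream]   # reads everything the stream has
second_pass = [f for f in stream]  # the stream is exhausted, nothing is left

print(first_pass)
print(second_pass)
```

A second loop over the same object therefore sees no items, which is why a stream has to be re-created (or read fully into memory) before it can be traversed again.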

# batched() works the same way on a file source and on a stream
ds1 = from_source("file", "my_local_data.grib")
ds2 = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds1.batched(2):
    # f is now a Fieldlist with 2 Fields
    print(len(f))

for f in ds2.batched(2):
    # f is now a Fieldlist with 2 Fields
    print(len(f))
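
To make the batching semantics concrete, here is a minimal plain-Python sketch of a `batched` helper that works on both an in-memory list and a one-shot iterator. The helper and the sample data are hypothetical and are not taken from earthkit-data. In particular, yielding a shorter final batch when the input length is not a multiple of `n` is an assumption of this sketch, not something the PR description states.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, n))  # take the next n items, or fewer at the end
        if not batch:
            return
        yield batch


in_memory = ["f0", "f1", "f2", "f3", "f4"]
one_shot = (f for f in in_memory)  # generator: can only be read once

batches_from_list = list(batched(in_memory, 2))
batches_from_stream = list(batched(one_shot, 2))

print(batches_from_list)
print(batches_from_stream)
```

Because the helper only ever calls `next()` on the underlying iterator, it never needs to know the total length, which is what makes the same interface usable for streams.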

# group_by() is also available on both kinds of source
ds1 = from_source("file", "my_local_data.grib")
ds2 = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds1.group_by("level"):
    # f is a Fieldlist
    print(len(f))

for f in ds2.group_by("level"):
    # f is a Fieldlist
    print(len(f))
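
One way to group a stream without reading it all first is to group only adjacent items that share a key. The sketch below shows that approach with `itertools.groupby` on plain dictionaries. The `group_by` helper, the `Field` alias and the sample metadata are hypothetical. Whether earthkit-data groups adjacent fields in exactly this way is an assumption here, since the PR description does not say.

```python
from itertools import groupby
from typing import Any, Dict, Iterable, Iterator, List

Field = Dict[str, Any]  # hypothetical stand-in for a field's metadata


def group_by(fields: Iterable[Field], key: str) -> Iterator[List[Field]]:
    """Yield lists of adjacent fields that share the same value for key."""
    for _, group in groupby(fields, key=lambda f: f[key]):
        yield list(group)


fields = [
    {"param": "t", "level": 500},
    {"param": "u", "level": 500},
    {"param": "t", "level": 850},
    {"param": "u", "level": 850},
    {"param": "t", "level": 500},  # same level again, but not adjacent
]

groups = list(group_by(iter(fields), "level"))
sizes = [len(g) for g in groups]
levels = [g[0]["level"] for g in groups]

print(sizes)
print(levels)
```

Note that with adjacency-based grouping a key value can appear in more than one group if the input is not ordered by that key, as the last item in the sample data shows.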

# read_all=True loads the whole stream into memory
ds = from_source("url", "http://..../my_data.grib", stream=True, read_all=True)

# ds is a Fieldlist in memory, so all these work
len(ds)
r = ds.sel(param="t")

for f in ds:
    # f is a Field
    print(f)

for f in ds.batched(2):
    # f is a Fieldlist with 2 Fields
    print(len(f))

Documentation

Notebooks

https://earthkit-data--371.org.readthedocs.build/en/371/examples/data_from_stream.html
https://earthkit-data--371.org.readthedocs.build/en/371/examples/url_stream.html
https://earthkit-data--371.org.readthedocs.build/en/371/examples/fdb.html

@sandorkertesz sandorkertesz marked this pull request as draft April 22, 2024 13:27
@sandorkertesz sandorkertesz changed the title WIP: Feature/refactor streaming Feature/refactor streaming Apr 24, 2024
@sandorkertesz sandorkertesz marked this pull request as ready for review April 24, 2024 15:50
@iainrussell (Member) left a comment

Apart from that suggestion (what do you think @sandorkertesz?), I think this looks really really nice - a good and more intuitive way of doing things.

Review comment on earthkit/data/core/__init__.py (outdated, resolved)
@sandorkertesz sandorkertesz changed the title Feature/refactor streaming WIP: Feature/refactor streaming Apr 26, 2024
@sandorkertesz sandorkertesz changed the title WIP: Feature/refactor streaming Feature/refactor streaming Apr 27, 2024
@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@sandorkertesz sandorkertesz requested a review from tlmquintino May 3, 2024 12:25
@tlmquintino (Member) left a comment

Looks good. Thanks for clearing up the interface.

@sandorkertesz sandorkertesz merged commit 08a404b into develop May 15, 2024
80 checks passed
@sandorkertesz sandorkertesz deleted the feature/refactor-streaming branch May 15, 2024 08:08
Development

Successfully merging this pull request may close these issues.

Refactor the stream API
4 participants