Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor core.py #1329

Closed
jstriebel opened this issue Jan 23, 2023 · 4 comments
Closed

Refactor core.py #1329

jstriebel opened this issue Jan 23, 2023 · 4 comments

Comments

@jstriebel
Copy link
Member

Atm there is quite some 🍝 code in core.py, especially around the different read and write cases (partial reads, decode, encode, …). It would be great to refactor this a bit, some ideas:

  • add a protocol for codecs to help with partial reads/writes
  • add high-level explanation of indexers and selection methods (probably can be refactored)
@rabernat
Copy link
Contributor

rabernat commented Feb 3, 2023

I agree that a refactoring is needed and would be eager to help.

@jstriebel
Copy link
Member Author

Trying to list different use-cases we might want to have in mind when refactoring the API:

Some questions come to my mind. They don't all need to be answered for the refactoring, but I assume they still might help:

  • What's the level of backwards-compatibility required? E.g. do third-party store implementations need to continue to work?
  • What's the expected (future) usage of zarr with other array libraries, and regarding the array API? This defines what features are expected, e.g. what kind of indexing should work out of the box, is Support for transpose and moveaxis #1256 in the zarr scope or not?
  • How important are in-memory stores except for tests?

(The refactoring should be no means solve any of those things above, rather make solving them easy if we think they might come up soon).

Also, we might want to wait for #1323 to be done before starting a larger refactoring.

What are the next steps? I'd propose to

  • propose APIs and sequence diagrams
  • discuss
  • consider backwards-compatibility
  • implement

cc @d-v-b @martindurant @joshmoore @jakirkham

@martindurant
Copy link
Member

Sorry I am slow to reply here. Some of your suggestions were the target of my "context" suggestion, which was proposed as a way of getting some of this without a refactor. Of course, that's not to say that a refactor might not be the best option in the long term! Also, I know that there already is a route to partial reads (but I don't know how to make it happen) and considerable work on sharding.

As a reminder, the contexts idea was to be able to pass a dict to the storage and codec reader methods, specifying additional information about what was requested (key name, part of the chunk requested, location in output, class of output array...). This could be a way to implement partial reads, sharding, custom array classes - again, in the current API. The calling function can check the signature of a codec/store to see if it accepts the dict, and any implementation can pick and choose what information to make use of, so you maintain backward compatibility.

It is interesting to see async in the list, though! Surfacing that through the current API, as opposed to my POC subclassing, would definitely require code changes at multiple levels.

@jstriebel
Copy link
Member Author

A large refactoring effort is on it's way, for the relevant discussions please see #1480 & #1569. Closing this issue as a duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants