Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add HTTP store support #61

Merged
merged 48 commits into from
Dec 17, 2024
Merged

feat: add HTTP store support #61

merged 48 commits into from
Dec 17, 2024

Conversation

LDeakin
Copy link
Collaborator

@LDeakin LDeakin commented Nov 27, 2024

Local benchmark

http-server -c-1 -s
import zarr
import timeit

import zarrs  # noqa: F401
zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"}) #with/without

def read():
    URL = "http://127.0.0.1:8080/<DATASET>"
    arr = zarr.open(URL)
    print(arr[:].shape)

print(timeit.timeit(read, number=1))

Time in second with the same datasets as https://github.com/LDeakin/zarr_benchmarks, best of 3, hot disk cache

Name Default codec pipeline ZarrsCodecPipeline
Uncompressed 6.85 7.19
Compressed 3.92 2.79
Compressed + sharded 27.72 4.96

@LDeakin LDeakin mentioned this pull request Nov 29, 2024
5 tasks
@LDeakin LDeakin marked this pull request as ready for review December 6, 2024 23:57
@LDeakin
Copy link
Collaborator Author

LDeakin commented Dec 7, 2024

Windows aarch64 build failure: https://github.com/ilan-gold/zarrs-python/actions/runs/12209833011/job/34065169380
I've disabled it for now.

Related: rust-cross/cargo-xwin#76. It seems like more of an issue that needs to be fixed in ring, which is a deep dependency of opendal/reqwest etc.

@LDeakin LDeakin requested a review from ilan-gold December 7, 2024 04:48
Copy link
Owner

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, mostly just little comments. I'd like to take another review though for sure once you're done with these, but looks very nice and clean. I just want to make sure I've got my head fully wrapped around it

@LDeakin LDeakin requested a review from flying-sheep December 10, 2024 21:20
@LDeakin
Copy link
Collaborator Author

LDeakin commented Dec 10, 2024

Please check that out and maybe you’ll get some for inspiration for service reuse (which probably lives in zarrs_opendal already) and type hierarchy.

zarrs_opendal just generalises over opendal operators, it doesn't help with creating them at all. I just did some refactoring (removing CodecPipelineStore) which I think makes all of this a bit cleaner.

Looks pretty good apart from your TODO. Medium-term I’m not a huge fan of using sync APIs for async use cases, so we should look into making everything async in order to get rid of the hacks.

Agreed! Here is my comment back for posterity:

This uses the AsyncToSyncStorageAdapter to use an async store in a sync context. Long term this may not be the way to go, but I don't think we should expend any effort at trying to do anything fancier for now.

@flying-sheep
Copy link
Collaborator

I had to take today off, I’ll do a real review tomorrow

Co-authored-by: Philipp A. <flying-sheep@web.de>
@flying-sheep
Copy link
Collaborator

flying-sheep commented Dec 12, 2024

OK, regarding the type hierarchy: I didn’t realize that the enum isn’t just a simple marker but contains actual data, since last time I used pyo3, it didn‘t support bridging complex enums to Python.

Jij-Inc/pyo3-stub-gen#119

So (if we actually exported StoreConfigType), the actual Python type hierarchy would be something like

class StoreConfig: ...
class FilesystemStoreConfig(StoreConfig): ...
class HttpStoreConfig(StoreConfig): ...

class StoreConfigType:
    class Filesystem(StoreConfigType, tuple[FilesystemStoreConfig]): ...
    class Http(StoreConfigType, tuple[HttpStoreConfig]): ...

which means that my request above wasn’t a good idea and should be reverted.

After this PR, we could maybe remove impl<'py> FromPyObject<'py> for StoreConfigType and follow Ilan’s idea of getting rid of the Raw tuple types, instead directly converting into our types. Then the type hierarchy would be

class FilesystemStoreConfig:
    root: str
class HttpStoreConfig:
    root: str

class StoreConfig:
    class Filesystem(StoreConfigType, tuple[FilesystemStoreConfig]): ...
    class Http(StoreConfigType, tuple[HttpStoreConfig]): ...

@LDeakin
Copy link
Collaborator Author

LDeakin commented Dec 13, 2024

Closing and reopening to clear this Windows aarch64 pending check. Rerunning CI doesn't do it.

@LDeakin LDeakin closed this Dec 13, 2024
@LDeakin LDeakin reopened this Dec 13, 2024
@LDeakin LDeakin enabled auto-merge (squash) December 13, 2024 20:31
@LDeakin
Copy link
Collaborator Author

LDeakin commented Dec 13, 2024

@flying-sheep, any idea why this PR expects windows aarch64, but #72 doesn't? Oh, I guess the ruleset only applies for PRs on main

Co-authored-by: Lachlan Deakin <ljdgit@gmail.com>
@flying-sheep
Copy link
Collaborator

Oh, I guess the ruleset only applies for PRs on main

Yup, seems like it. We can’t modify them. Maybe @ilan-gold can give us the ability to do so for now?

flying-sheep and others added 2 commits December 14, 2024 17:20
RemoteStore will be renamed to FsspecStore in the next beta / RC
@flying-sheep flying-sheep merged commit cca07fc into main Dec 17, 2024
17 checks passed
@flying-sheep flying-sheep deleted the ld/http_store branch December 17, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants