implement LocalCache (1) #54
Codecov Report
@@            Coverage Diff             @@
##             main      #54      +/-   ##
==========================================
- Coverage   83.42%   82.34%   -1.09%
==========================================
  Files          28       28
  Lines        1907     2101     +194
==========================================
+ Hits         1591     1730     +139
- Misses        316      371      +55
Changes made:
Implemented a
Remove dependency on remfile and instead use an internal LindiRemfile. This is necessary so we can use the same sqlite local cache method as the rest of the package (remfile uses a different system). Recall that when EXTERNAL_ARRAY_LINK is used, an h5py client is used to load dataset chunks (this happens when the number of chunks is too large to include in the .lindi.json file). In this case, the lindi.LindiRemfile() is used, and the optional local_cache object is passed in to lindi.LindiRemfile(). There are some other minor differences between Remfile and LindiRemfile as well.
Replace
Add a convenience function
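To illustrate the idea of sharing one sqlite-backed chunk cache between readers, here is a minimal sketch. It is not lindi's actual LocalCache or LindiRemfile code: `ChunkCacheSketch`, `read_chunk`, and the `fetch` callable are hypothetical names invented for this example, and the table layout is an assumption.

```python
import sqlite3
from typing import Callable, Optional


class ChunkCacheSketch:
    """Hypothetical sqlite-backed chunk cache (not lindi's actual LocalCache)."""

    def __init__(self, db_path: str = ":memory:"):
        self._conn = sqlite3.connect(db_path)
        # One row per byte range of a remote file.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "url TEXT, offset INTEGER, size INTEGER, data BLOB, "
            "PRIMARY KEY (url, offset, size))"
        )

    def get(self, url: str, offset: int, size: int) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT data FROM chunks WHERE url = ? AND offset = ? AND size = ?",
            (url, offset, size),
        ).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, url: str, offset: int, size: int, data: bytes) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
            (url, offset, size, data),
        )
        self._conn.commit()


def read_chunk(
    url: str,
    offset: int,
    size: int,
    fetch: Callable[[str, int, int], bytes],
    cache: Optional[ChunkCacheSketch] = None,
) -> bytes:
    """Read a byte range, consulting the optional cache before fetching."""
    if cache is not None:
        cached = cache.get(url, offset, size)
        if cached is not None:
            return cached
    data = fetch(url, offset, size)  # e.g. an HTTP range request
    if cache is not None:
        cache.put(url, offset, size, data)
    return data
```

Because the cache is an optional argument, a reader that receives no cache simply fetches every time, which mirrors the optional local_cache object described above.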
@@ -79,8 +96,8 @@ import lindi
# URL of the remote .nwb.lindi.json file
url = 'https://kerchunk.neurosift.org/dandi/dandisets/000939/assets/11f512ba-5bcf-4230-a8cb-dc8d36db38cb/zarr.json'

# Load the h5py-like client for the reference file system
client = lindi.LindiH5pyFile.from_reference_file_system(url)
# Load the h5py-like client
Suggested change:
- # Load the h5py-like client
+ # Load the h5py-like client with the URL of the remote LINDI file (or the path of a local LINDI file)
How large can this cache grow? If it is unbounded, then I think it would be useful to set a configurable maximum size on the cache so that it does not accidentally swamp the filesystem, especially since it is stored in a hidden directory, and users could be working with very large datasets.
@rly That's a good idea. How do you think we should manage the configuration? This shouldn't be a global setting since different scripts can utilize different cache directories.
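One possible shape for a non-global limit, sketched under assumptions: each cache instance takes its own `max_bytes` and evicts the least recently used chunks when it is exceeded. `BoundedChunkCacheSketch`, `max_bytes`, and the eviction policy are hypothetical and not part of lindi's API.

```python
import sqlite3
from typing import Optional


class BoundedChunkCacheSketch:
    """Hypothetical size-limited cache; the limit belongs to the instance."""

    def __init__(self, db_path: str = ":memory:", max_bytes: Optional[int] = None):
        self.max_bytes = max_bytes  # None means unbounded
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "key TEXT PRIMARY KEY, data BLOB, nbytes INTEGER, last_used INTEGER)"
        )
        # Logical clock, resumed from the database so ordering survives re-opens.
        self._clock = self._conn.execute(
            "SELECT COALESCE(MAX(last_used), 0) FROM chunks"
        ).fetchone()[0]

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def total_bytes(self) -> int:
        return self._conn.execute(
            "SELECT COALESCE(SUM(nbytes), 0) FROM chunks"
        ).fetchone()[0]

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT data FROM chunks WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        # Mark the chunk as recently used.
        self._conn.execute(
            "UPDATE chunks SET last_used = ? WHERE key = ?", (self._tick(), key)
        )
        self._conn.commit()
        return bytes(row[0])

    def put(self, key: str, data: bytes) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
            (key, data, len(data), self._tick()),
        )
        self._evict()
        self._conn.commit()

    def _evict(self) -> None:
        if self.max_bytes is None:
            return
        # Drop least recently used chunks until the total fits the limit.
        while self.total_bytes() > self.max_bytes:
            oldest = self._conn.execute(
                "SELECT key FROM chunks ORDER BY last_used ASC LIMIT 1"
            ).fetchone()
            if oldest is None:
                break
            self._conn.execute("DELETE FROM chunks WHERE key = ?", (oldest[0],))
```

With this shape, two scripts using different cache directories can each choose their own limit, and a chunk larger than `max_bytes` is simply not retained.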
It could be a parameter on
Caching in fsspec has a lot of different options, such as expiry time, compression, and whether to cache chunks or whole files: https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/implementations/cached.html
Since the
The lindi.LocalCache does something very specific... it caches specific chunks of data of remote files... and that's it. I imagine fsspec's system has a much grander scope, but I haven't looked at it.
fsspec's
Not sure if this is the best way to do it, but this is a candidate.
@rly