FAQ question: "Can my dataset be virtualized?" #430

Open
TomNicholas opened this issue Feb 6, 2025 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@TomNicholas
Member

TomNicholas commented Feb 6, 2025

The restrictions of regular-length chunks and consistent codecs are so important that we should add them to the FAQ. Lots of people keep asking me, in effect, "Can my dataset be virtualized?", and the FAQ should answer that question explicitly.

(Inspired by the conversation with ITS_LIVE cc @betolink @sharkinsspatial @abarciauskas-bgse )
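
For illustration only, here is a minimal sketch of the two restrictions as a check over per-file metadata. It does not use VirtualiZarr; `ArrayInfo` and `find_barriers` are hypothetical names, and the rule that every file except the last must end on a chunk boundary is my reading of what a regular chunk grid implies for concatenation, not a statement of VirtualiZarr's exact checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayInfo:
    """Hypothetical per-file summary of how one variable is stored."""
    path: str
    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    codecs: tuple[str, ...]


def find_barriers(infos: list[ArrayInfo], concat_axis: int = 0) -> list[str]:
    """List reasons these arrays could not share one regular chunk grid."""
    problems: list[str] = []
    if not infos:
        return problems

    reference = infos[0]
    for info in infos[1:]:
        # Restriction 1: every file must use the same chunk shape.
        if info.chunks != reference.chunks:
            problems.append(
                f"{info.path}: chunks {info.chunks} differ from {reference.chunks}"
            )
        # Restriction 2: every file must use the same codec pipeline.
        if info.codecs != reference.codecs:
            problems.append(
                f"{info.path}: codecs {info.codecs} differ from {reference.codecs}"
            )

    # Assumption: only the last file may end in a partial chunk along the
    # concatenation axis; a partial chunk in the middle breaks regularity.
    for info in infos[:-1]:
        if info.shape[concat_axis] % info.chunks[concat_axis] != 0:
            problems.append(
                f"{info.path}: length {info.shape[concat_axis]} is not a multiple "
                f"of chunk length {info.chunks[concat_axis]}"
            )
    return problems


# Made-up example: monthly files stored as one chunk per file.
monthly = [
    ArrayInfo("jan.nc", (31, 4), (31, 4), ("zlib",)),
    ArrayInfo("feb.nc", (28, 4), (28, 4), ("zlib",)),
]
for problem in find_barriers(monthly):
    print(problem)
```

In the made-up monthly example, each file is a single chunk, so the chunk lengths follow the month lengths and the combined array has no regular chunk shape.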

@TomNicholas TomNicholas added the documentation Improvements or additions to documentation label Feb 6, 2025
@abarciauskas-bgse
Collaborator

Great idea!

@maxrjones
Member

I like the idea of an FAQ entry on this as well! This question was also an inspiration for developmentseed/ndquirk#1: it can be hard to tell whether a dataset has consistent chunks (e.g., MUR SST), so it'd be awesome if people could drop a list of files into an API to enumerate any barriers to virtualization.
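
As a rough illustration of what such a report could look like (ndquirk has no design yet, so this is not its API), the sketch below groups an ordered file list into runs that share the same chunk shape and codecs. `layout_runs` and the file names and layouts are made up for the example; in practice the tuples would come from reading each file's metadata.

```python
from itertools import groupby


def layout_runs(layouts):
    """Group consecutive files that share the same (chunks, codecs) layout.

    `layouts` is a list of (path, chunks, codecs) tuples in file order.
    Returns a list of ((chunks, codecs), [paths]) pairs, one per run.
    """
    runs = []
    for signature, group in groupby(layouts, key=lambda item: (item[1], item[2])):
        runs.append((signature, [path for path, _, _ in group]))
    return runs


# Hypothetical file list in which the chunking changes partway through.
files = [
    ("f1.nc", (1, 100, 200), ("zlib",)),
    ("f2.nc", (1, 100, 200), ("zlib",)),
    ("f3.nc", (1, 50, 100), ("zlib",)),
]
for (chunks, codecs), paths in layout_runs(files):
    print(chunks, codecs, paths)
```

More than one run means the layout changes somewhere in the list, and the run boundaries show where.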

@TomNicholas
Member Author

TomNicholas commented Feb 7, 2025

> drop a list of files into an API to enumerate any barriers to virtualization.

Isn't that kind of what VirtualiZarr is though? Especially once open_virtual_mfdataset is merged. To check that files in object storage can be virtualized you need to do basically the same set of steps that VirtualiZarr does...

@maxrjones
Member

> > drop a list of files into an API to enumerate any barriers to virtualization.
>
> Isn't that kind of what VirtualiZarr is though? Especially once open_virtual_mfdataset is merged. To check that files in object storage can be virtualized you need to do basically the same set of steps that VirtualiZarr does...

Yeah, there's overlap, but I imagine VirtualiZarr will mostly be used for L3/L4 data, whereas I'd like ndquirk to also identify quirkiness in L1/L2 data. I haven't started on a design doc yet, but an explicit sub-goal is to leverage existing libraries, which would ideally include open_virtual_mfdataset. Though, off the cuff, actually constructing virtual datasets might be slower than leaving off that last step, because of loadable_variables.
