Add codec pipeline strategy #2824

Draft: wants to merge 1 commit into main

dcherian
Contributor

I could use input on whether filters and compressors can be generated independently of each other, or whether they must be consistent with each other. For now I am assuming the latter, so the codecs strategy creates both the filters and the compressors.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@dcherian changed the title from "Add compressor, codec pipeline strategy" to "Add codec pipeline strategy" on Feb 13, 2025
@github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Feb 13, 2025
@dcherian force-pushed the compressor-filter-strategies branch from 7cc1bc2 to 68ad945 on February 13, 2025 15:48
Comment on lines +105 to +127
zarr_codecs = st.one_of(
    st.builds(zcodecs.ZstdCodec),
    st.builds(
        zcodecs.BloscCodec,
        shuffle=st.builds(
            zcodecs.BloscShuffle.from_int, num=st.integers(min_value=0, max_value=2)
        ),
    ),
    st.builds(zcodecs.GzipCodec),
    st.builds(zcodecs.Crc32cCodec),
)
num_codecs_v2 = st.one_of(
    st.builds(numcodecs.Zlib),
    st.builds(numcodecs.LZMA),
    st.builds(numcodecs.Zstd),
    st.builds(numcodecs.Zlib),
)
num_codecs_v3 = st.one_of(
    st.builds(ncodecs.Blosc),
    st.builds(ncodecs.LZMA),
    # st.builds(ncodecs.PCodec),
    # st.builds(ncodecs.ZFPY),
)
Contributor Author

This search space is quite large. How can we be strategic here?

I think we should test the codec pipeline separately, against many different array types. For the arrays strategy, we could restrict ourselves to a much smaller choice (e.g. pick one of: a codec from numcodecs, a codec from zarr codecs, or None).
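
To make the restricted choice concrete, here is a minimal plain-Python sketch of the selection logic only. It is not the PR's code and not a Hypothesis strategy: the codec entries are placeholder strings that mirror the class names in the diff above, and `draw_restricted_codec` is a hypothetical helper name. In Hypothesis the same shape would presumably be expressed with `st.one_of(...)` and `st.none()`.

```python
import random
from typing import Optional

# Placeholder labels standing in for real codec objects (assumption: the
# names mirror the classes quoted in the diff; nothing is imported or built).
NUMCODECS_FAMILY = ("numcodecs.Zlib", "numcodecs.LZMA", "numcodecs.Zstd")
ZARR_CODECS_FAMILY = (
    "zcodecs.ZstdCodec",
    "zcodecs.BloscCodec",
    "zcodecs.GzipCodec",
    "zcodecs.Crc32cCodec",
)


def draw_restricted_codec(rng: random.Random) -> Optional[str]:
    """Pick a family first (numcodecs, zarr codecs, or no codec), then one member.

    Choosing the family first keeps the three options equally likely no
    matter how many codecs each family lists.
    """
    family = rng.choice((NUMCODECS_FAMILY, ZARR_CODECS_FAMILY, None))
    if family is None:
        return None
    return rng.choice(family)


# Draw a batch of samples with a fixed seed so the run is repeatable.
rng = random.Random(0)
samples = [draw_restricted_codec(rng) for _ in range(200)]
```

Each array then carries at most one codec, so the cost of exploring codec combinations stays in the dedicated codec pipeline tests.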
