[WIP] Begin Manifest Sharding #767
// 1. Should we record the "real" extent or the theoretical-max extent (as set in the config).
// 2.
// if not, we aggregate_extents
// this is a choice, we could just always aggregate extents
let chunks = aggregate_extents(&mut from, &mut to, chunks, |ci| &ci.coord);
One question here: now that the extents are defined by the shard sizes, is this function still needed? Or do we want to record "true" extents for some reason?
I think recording the actual extents is useful...
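For reference, a minimal sketch of what "actual" extents could mean: the tight per-dimension bounding box of the chunk coordinates that were really written, as opposed to the configured shard size. `actual_extents` is a hypothetical helper, not the PR's `aggregate_extents`, and it assumes every coordinate has the same number of dimensions.

```rust
use std::ops::Range;

/// Hypothetical helper (not the PR's `aggregate_extents`): the tight
/// per-dimension bounding box of a set of chunk coordinates.
/// Assumes all coordinates have the same number of dimensions.
/// Returns `None` when there are no coordinates.
pub fn actual_extents(coords: &[Vec<u32>]) -> Option<Vec<Range<u32>>> {
    let first = coords.first()?;
    // Start with both bounds equal to the first coordinate.
    let mut from = first.clone();
    let mut to = first.clone();
    for coord in &coords[1..] {
        for (dim, &idx) in coord.iter().enumerate() {
            from[dim] = from[dim].min(idx);
            to[dim] = to[dim].max(idx);
        }
    }
    // Ranges are half-open, so the end is one past the largest index seen.
    // `saturating_add` avoids overflow; an index of `u32::MAX` would then
    // fall outside the range, which this sketch accepts as a limitation.
    Some(
        from.into_iter()
            .zip(to)
            .map(|(lo, hi)| lo..hi.saturating_add(1))
            .collect(),
    )
}
```

With coordinates `[0, 5]`, `[3, 2]` and `[1, 9]` this yields `0..4` and `2..10`, which can be much tighter than the theoretical-max extent of the shard.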
8630650 to fd1c572
pub struct ManifestShards(Vec<ManifestExtents>);

impl ManifestShards {
    pub fn default(ndim: usize) -> Self {
I don't like this, but it is certainly tied to ndim.
impl ManifestShards {
    pub fn default(ndim: usize) -> Self {
        Self(vec![ManifestExtents(repeat_n(0..u32::MAX, ndim).collect())])
I guess I can do
Self(vec![ManifestExtents(repeat_n(0..u32::MAX, ndim).collect())])
Self(vec![ManifestExtents(repeat_n(0.., ndim).collect())])
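One thing worth keeping in mind with this suggestion: `0..u32::MAX` is a `Range<u32>` and excludes `u32::MAX`, while `0..` is a `RangeFrom<u32>` and includes it, so the two are different types. A small sketch, assuming `ManifestExtents` wraps a `Vec<Range<u32>>` (the field type is a guess, it is not visible in the diff) and using `vec![x; n]` in place of `repeat_n` to stay independent of the toolchain version. `unbounded` is a hypothetical name.

```rust
use std::ops::Range;

/// Stand-in for the `ManifestExtents` newtype in the diff.
/// The inner `Vec<Range<u32>>` is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestExtents(pub Vec<Range<u32>>);

impl ManifestExtents {
    /// Hypothetical constructor: one maximal half-open range per dimension.
    pub fn unbounded(ndim: usize) -> Self {
        // `vec![x; n]` clones the range `ndim` times, which gives the same
        // result as collecting `repeat_n(x, ndim)`.
        Self(vec![0..u32::MAX; ndim])
    }
}
```

If the inner type really is `Range<u32>`, switching to `0..` would need the field type to change to `RangeFrom<u32>` or to something generic over range kinds.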
@@ -37,9 +33,77 @@ impl ManifestExtents {
        Self(v)
    }

    pub fn contains(&self, coord: &[u32]) -> bool {
        self.iter().zip(coord.iter()).all(|(range, that)| range.contains(that))
We need to start checking, on writes, that indexes have the proper size for the metadata.
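To illustrate why the check matters: `zip` stops at the shorter of its two inputs and `all` returns `true` on an empty iterator, so `contains` as written accepts a coordinate with too few dimensions. A minimal sketch of a checked variant follows; `checked_contains` and `DimensionMismatch` are hypothetical names, not part of the PR.

```rust
use std::ops::Range;

/// Hypothetical error for a coordinate whose length does not match the extents.
#[derive(Debug, PartialEq)]
pub struct DimensionMismatch {
    pub expected: usize,
    pub got: usize,
}

/// Like the `contains` in the diff, but refuses coordinates of the wrong size
/// instead of letting `zip` silently truncate to the shorter input.
pub fn checked_contains(
    extents: &[Range<u32>],
    coord: &[u32],
) -> Result<bool, DimensionMismatch> {
    if extents.len() != coord.len() {
        return Err(DimensionMismatch { expected: extents.len(), got: coord.len() });
    }
    Ok(extents.iter().zip(coord).all(|(range, idx)| range.contains(idx)))
}
```

Whether a mismatch should be an error at this level or rejected earlier on the write path is a design choice; the sketch only shows the failure mode.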
// but only shard (0,0) and shard (1,1) are populated with data.
// A coord located in (1, 0) should return Err
// Since shard_range need not form a regular grid, we must iterate through and find the first result.
// ManifestExtents in shard_range MUST NOT overlap with each other. How do we ensure this?
We probably should have a better data structure to maintain this invariant.
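One possible shape for such a structure, as a sketch only: keep the `Vec` private and enforce the invariant at insertion time, so lookups can rely on at most one match. `DisjointShards`, `try_push` and `which` are hypothetical names, and the linear overlap scan is the simplest option rather than a recommendation.

```rust
use std::ops::Range;

pub type Extents = Vec<Range<u32>>;

/// Two boxes overlap only if their ranges intersect in every dimension.
/// Boxes with different dimensionality are treated as non-overlapping.
fn overlaps(a: &[Range<u32>], b: &[Range<u32>]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.start < y.end && y.start < x.end)
}

/// Hypothetical container whose shards never overlap.
#[derive(Debug, Default)]
pub struct DisjointShards(Vec<Extents>);

impl DisjointShards {
    /// Adds a shard, or hands it back if it would overlap an existing one.
    pub fn try_push(&mut self, shard: Extents) -> Result<(), Extents> {
        if self.0.iter().any(|existing| overlaps(existing, &shard)) {
            return Err(shard);
        }
        self.0.push(shard);
        Ok(())
    }

    /// Index of the shard containing `coord`; at most one can match.
    pub fn which(&self, coord: &[u32]) -> Option<usize> {
        self.0.iter().position(|s| {
            s.len() == coord.len() && s.iter().zip(coord).all(|(r, c)| r.contains(c))
        })
    }
}
```

With shards `[0..2, 0..2]` and `[2..4, 2..4]`, a coordinate such as `[3, 0]` falls in the unpopulated (1, 0) region and `which` returns `None`, matching the case described in the code comment.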
let mut array_manifests = Vec::with_capacity(1);
let mut array_manifests = Vec::with_capacity(shards.len());
// why peek?
// this seems to handle multiple nodes, but we only send one in?
Because we need to gather chunks for each node; this handles the case of chunks from multiple arrays going to the same manifest. peek lets me know whether I should continue or whether I can finalize the current node.
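The pattern described here can be sketched roughly as follows. This is not the PR's code: `group_by_node` is a hypothetical function, node ids are plain `u32`s, and the input is assumed to arrive already grouped by node.

```rust
/// Groups `(node_id, chunk)` pairs into one batch per node.
/// Assumes chunks for the same node are adjacent in the input.
pub fn group_by_node<C>(
    chunks: impl IntoIterator<Item = (u32, C)>,
) -> Vec<(u32, Vec<C>)> {
    let mut iter = chunks.into_iter().peekable();
    let mut groups = Vec::new();
    while let Some((node, chunk)) = iter.next() {
        let mut batch = vec![chunk];
        // Peek at the next item without consuming it: if it belongs to a
        // different node, the current node is complete and can be finalized.
        while let Some((next_node, _)) = iter.peek() {
            if *next_node != node {
                break;
            }
            if let Some((_, chunk)) = iter.next() {
                batch.push(chunk);
            }
        }
        groups.push((node, batch));
    }
    groups
}
```

With a single node in the input the inner loop simply runs to the end, so the same code covers both the one-node and the multi-node case.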
No description provided.