
implement Iter::group_by #1998


Open
wants to merge 4 commits into main

Conversation

@juliano (Contributor) commented Apr 24, 2025

Fixes #1201


peter-jerry-ye-code-review bot commented Apr 24, 2025

Potentially inefficient array allocation in chunk_by

Category: Performance
Code Snippet:
  buffer = Array::new(capacity=16)
Recommendation: Consider using a dynamic capacity based on input size or previous chunk sizes.
Reasoning: Using a fixed capacity of 16 for the buffer array might be inefficient for large chunks. If chunks are typically larger, this will cause multiple reallocations. If chunks are typically smaller, this wastes memory.
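
As a rough illustration of that recommendation (a sketch only; `prev_len`, the length of the previously emitted chunk, is a hypothetical variable, not something in this PR):

  // Sketch: size the next buffer from the previous chunk instead of a
  // hard-coded 16. `prev_len` is hypothetical.
  let cap = if prev_len > 16 { prev_len } else { 16 }
  buffer = Array::new(capacity=cap)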

Nested closure complexity in chunk_by implementation

Category: Maintainability
Code Snippet:
  fn(yield_) {
    let mut has_current = false
    let mut current_key : K? = None
    let mut buffer : Array[T] = []
    fn emit_current_chunk() -> IterResult { ... }
Recommendation: Consider splitting the implementation into smaller functions or using a struct to hold state.
Reasoning: The nested closure structure with mutable state makes the code harder to follow. Using a struct to encapsulate the chunking state would make the code more maintainable.
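
One possible shape for the suggested refactor (illustrative only, not code from this PR; the struct and field names are made up):

  // Sketch: bundle the mutable chunking state in one value instead of
  // separate `mut` locals captured by the closure.
  priv struct ChunkState[K, T] {
    mut current_key : K?
    mut buffer : Array[T]
  }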

Potential memory leak in chunk_by error handling

Category: Correctness
Code Snippet:
  if emit_current_chunk() == IterEnd {
    return IterEnd
  }
Recommendation: Add cleanup code to ensure the buffer is cleared when iteration ends early.
Reasoning: If iteration ends early due to IterEnd, the remaining buffer contents might not be properly cleaned up. Consider adding a cleanup mechanism or documenting this behavior.
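
A hedged sketch of the kind of cleanup being asked for (it mirrors the snippet above and assumes clearing the buffer with Array::clear is the intended behavior):

  if emit_current_chunk() == IterEnd {
    buffer.clear() // drop any partially built chunk so it is not retained
    return IterEnd
  }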

@coveralls (Collaborator) commented Apr 24, 2025

Pull Request Test Coverage Report for Build 6456

Details

  • 12 of 14 (85.71%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 92.67%

Changes missing coverage:
  • builtin/iter.mbt: 12 of 14 changed/added lines covered (85.71%)
Totals:
  • Change from base Build 6455: 0.0%
  • Covered Lines: 6132
  • Relevant Lines: 6617

💛 - Coveralls

assert_eq!(groups, [
(25, ["Alice", "Bob"]),
(30, ["Charlie"]),
(35, ["Dave"]),
Contributor

Why are there two 30s? That's not what users would expect.

Contributor (Author)

It's the same principle as the test above: a new group is generated every time the value of the key function changes.
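
An illustration of that principle on made-up data (not the test in this PR):

  // With adjacent-run grouping, keying [1, 1, 2, 3, 2] by identity yields
  //   [(1, [1, 1]), (2, [2]), (3, [3]), (2, [2])]
  // The key 2 appears twice because its occurrences are not adjacent.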

@peter-jerry-ye (Collaborator) left a comment

I have doubts about the type signature. I'm not quite sure whether we should have an Iter inside an Iter; it looks kind of weird to me. Maybe Iter[(Key, Array[Value])] is enough. I'm also not certain about the semantics: are we expecting group_by to be an iterator that merges adjacent items, or to group everything visited, in which case the result should be Map[Key, Array[Value]]?
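
For reference, the shapes being weighed here could be written roughly like this (signatures only, a sketch with assumed Eq/Hash bounds, not code from this PR):

  // Adjacent-run grouping, nested iterators (the "Iter inside Iter" shape):
  fn group_by[T, K : Eq](self : Iter[T], f : (T) -> K) -> Iter[(K, Iter[T])]
  // Adjacent-run grouping, each group materialized as an array:
  fn group_by[T, K : Eq](self : Iter[T], f : (T) -> K) -> Iter[(K, Array[T])]
  // Group everything visited, keyed map as the result:
  fn group_by[T, K : Eq + Hash](self : Iter[T], f : (T) -> K) -> Map[K, Array[T]]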

@hackwaly (Contributor)

I think Map[Key, Array[Value]] makes sense.
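
A minimal sketch of that Map-returning variant (illustrative only, not this PR's code; it assumes Map::new, Map::get, map index assignment, and Array::push from the core library):

  pub fn group_by[T, K : Eq + Hash](self : Iter[T], f : (T) -> K) -> Map[K, Array[T]] {
    // Fold every element into a bucket keyed by f(x); all visited items are
    // grouped, not just adjacent runs.
    let groups : Map[K, Array[T]] = Map::new()
    self.each(fn(x) {
      let key = f(x)
      match groups.get(key) {
        Some(arr) => arr.push(x)
        None => groups[key] = [x]
      }
    })
    groups
  }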

@juliano (Contributor, Author) commented Apr 25, 2025

This one was my reference for the implementation, but I probably should have asked beforehand :)

@peter-jerry-ye (Collaborator)

@juliano (Contributor, Author) commented Apr 25, 2025

It was actually my initial thought, but after reading a bit about iterators, that approach made sense.

No problem though, I'll come up with the new version soon.

@peter-jerry-ye (Collaborator) commented Apr 25, 2025

FTR

The original implementation may be renamed with a more proper name, such as:

Though some use group_by:

Also notice that

  • chunk usually means splitting into containers of a fixed size
  • partition usually means splitting into two

@peter-jerry-ye (Collaborator) left a comment

For better performance, we may need #1965

@juliano (Contributor, Author) commented Apr 25, 2025

(quoting @peter-jerry-ye's naming note above)

It seems to me that either partition_by or chunk_by would be appropriate. Any preference, @peter-jerry-ye @hackwaly?

@hackwaly (Contributor) commented Apr 25, 2025

(quoting the naming discussion above)

There's chunk_by in core. The previous implementation should be called chunk_by. LGTM for the new implementation.

@peter-jerry-ye (Collaborator)

Can we keep group_by for this PR and have chunk_by in another PR? I still have some doubts about the signature.

Successfully merging this pull request may close these issues: provide Iter::group_by
4 participants