Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group-by without ordering on aggregate #11706

Open
Tracked by #11924
Jackie-Jiang opened this issue Sep 28, 2023 · 1 comment
Open
Tracked by #11924

Group-by without ordering on aggregate #11706

Jackie-Jiang opened this issue Sep 28, 2023 · 1 comment

Comments

@Jackie-Jiang
Copy link
Contributor

When a group-by query does not have order-by on the aggregate column, we don't need to keep more groups than the LIMIT because the order-by value won't change. We can maintain a heap (PriorityQueue) of LIMIT values. On the group-key generation side, we should also keep only the relevant keys.

One common query is:
SELECT COUNT(*) FROM myTable GROUP BY timeCol ORDER BY timeCol DESC LIMIT 10

Problems to solve:

  1. Group-by query with order-by on the key column:
    Currently we keep Math.max(5000, LIMIT * 5) groups, which is not necessary since only the top LIMIT groups are relevant

  2. Group-by query without order-by:
    Currently we keep random LIMIT groups per server, and there is no guarantee the same group is picked across different servers, which can lead to wrong result when there are more than LIMIT groups

Solution:

  1. To ensure the ordering is deterministic (we need this guarantee to ensure the groups returned from all servers are the same), we should append all non-ordering group keys implicitly. This is one exception: when we want to keep all groups on the server, we don't need this since all groups will be returned anyway.
  2. Optimize the execution when all the ordering keys are group key
@anandheritage
Copy link
Contributor

@Jackie-Jiang If no one have picked this up - I can pick it up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants