When a group-by query does not have order-by on the aggregate column, we don't need to keep more groups than the LIMIT, because a group's order-by value is fixed once the group is created and won't change as more rows are aggregated. We can maintain a heap (PriorityQueue) of at most LIMIT group keys. On the group-key generation side, we should also keep only the relevant keys.
One common query is: SELECT COUNT(*) FROM myTable GROUP BY timeCol ORDER BY timeCol DESC LIMIT 10
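As a rough illustration of the bounded-heap idea for the query above, here is a minimal sketch. It is not Pinot's implementation: the class `BoundedGroupBy` and the method `countTopGroups` are hypothetical names, the time column is assumed to be a plain `long[]`, and only `ORDER BY timeCol DESC` is handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper (not part of Pinot): COUNT(*) ... GROUP BY timeCol
// ORDER BY timeCol DESC LIMIT limit, keeping at most `limit` groups in memory.
final class BoundedGroupBy {
    static LinkedHashMap<Long, Long> countTopGroups(long[] timeCol, int limit) {
        LinkedHashMap<Long, Long> result = new LinkedHashMap<>();
        if (limit <= 0) {
            return result;
        }
        Map<Long, Long> counts = new HashMap<>();
        // Min-heap: the head is the smallest kept key, i.e. the first to evict under DESC.
        PriorityQueue<Long> kept = new PriorityQueue<>();
        for (long t : timeCol) {
            Long current = counts.get(t);
            if (current != null) {
                counts.put(t, current + 1); // existing group: just aggregate
            } else if (kept.size() < limit) {
                kept.add(t);
                counts.put(t, 1L);
            } else if (t > kept.peek()) {
                // The new key outranks the worst kept key, so replace it.
                counts.remove(kept.poll());
                kept.add(t);
                counts.put(t, 1L);
            }
            // Otherwise the key can never reach the top `limit`; skip the row.
        }
        List<Long> keys = new ArrayList<>(kept);
        keys.sort(Comparator.reverseOrder());
        for (Long key : keys) {
            result.put(key, counts.get(key));
        }
        return result;
    }
}
```

Because the smallest kept key only ever increases, an evicted or skipped key can never re-enter the heap, so the counts of the surviving groups are complete.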
Problems to solve:
Group-by query with order-by on the key column:
Currently we keep Math.max(5000, LIMIT * 5) groups, which is not necessary since only the top LIMIT groups are relevant
Group-by query without order-by:
Currently each server keeps an arbitrary set of LIMIT groups, and there is no guarantee that the same groups are picked across different servers, which can lead to wrong results when there are more than LIMIT groups
Solution:
To ensure the ordering is deterministic (we need this guarantee to ensure the groups returned from all servers are the same), we should append all non-ordering group keys implicitly. There is one exception: when we want to keep all groups on the server, we don't need this since all groups will be returned anyway.
Optimize the execution when all the ordering keys are group keys
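The implicit tie-breaking described above could look roughly like the following sketch. It is only an illustration under stated assumptions: `DeterministicOrder`, `fullOrder` and `topGroups` are hypothetical names, group keys are modeled as `List<String>`, columns are referenced by index, and every column is compared in ascending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of Pinot): extends the ORDER BY columns with the
// remaining GROUP BY columns so that ties are always broken the same way.
final class DeterministicOrder {
    static Comparator<List<String>> fullOrder(List<Integer> orderByColumns, int numGroupByColumns) {
        List<Integer> columns = new ArrayList<>(orderByColumns);
        for (int i = 0; i < numGroupByColumns; i++) {
            if (!columns.contains(i)) {
                columns.add(i); // implicitly appended non-ordering group key
            }
        }
        return (left, right) -> {
            for (int c : columns) {
                int cmp = left.get(c).compareTo(right.get(c));
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };
    }

    // Returns the first `limit` groups under the given order, regardless of input order.
    static List<List<String>> topGroups(List<List<String>> groups, Comparator<List<String>> order, int limit) {
        List<List<String>> sorted = new ArrayList<>(groups);
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

With a two-column group key and an order-by on column 0 only, two servers that see the same groups in different orders both select the same top groups, because ties on column 0 are resolved by column 1.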