Refactor(PrefixCache): New load API, per-layer Tries, async ops & stats #269

Open
wants to merge 1 commit into
base: main
from

Conversation

yuyanpeng-google
Collaborator

Add async operations so that device_get does not block the critical path while waiting for the prefill result. Use per-layer tries so that the cache is not loaded from DRAM when the common prefix lengths tie. Add statistics for debugging and benchmarking.
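To illustrate the first point, here is a minimal sketch of moving a blocking device-to-host copy off the caller's thread. It is not the code in this PR: `AsyncSaver`, `fake_device_get`, and the `stats` keys are hypothetical names, and `fake_device_get` only simulates a blocking call such as `jax.device_get` with a sleep.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def fake_device_get(value, delay_s=0.05):
    """Hypothetical stand-in for a blocking device-to-host copy."""
    time.sleep(delay_s)
    return value


class AsyncSaver:
    """Saves values to a host-side store on a background thread (assumed design)."""

    def __init__(self):
        # A single worker keeps saves ordered and the store updates serialized.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._store = {}
        self.stats = {"saves_submitted": 0, "saves_done": 0}

    def save(self, key, device_value) -> Future:
        """Schedules the copy and returns immediately with a Future."""
        self.stats["saves_submitted"] += 1
        return self._pool.submit(self._copy_and_store, key, device_value)

    def _copy_and_store(self, key, device_value):
        # The blocking copy runs here, not on the caller's thread.
        self._store[key] = fake_device_get(device_value)
        self.stats["saves_done"] += 1
        return key

    def load(self, key):
        """Returns the stored value, or None if the save has not finished."""
        return self._store.get(key)

    def close(self):
        self._pool.shutdown(wait=True)


saver = AsyncSaver()
future = saver.save((1, 2, 3), "kv-for-1-2-3")  # returns without waiting for the copy
future.result()  # wait only when the saved value is actually needed
print(saver.load((1, 2, 3)), saver.stats)
saver.close()
```

The caller pays only the cost of submitting the task; the trade-off is that a `load` issued before the save completes will miss.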

Collaborator

@vipannalla vipannalla left a comment

Looks good. Can you share benchmark results from before and after this PR, and which metrics it improved?

@github-actions github-actions bot added the pull ready This label is needed if we want the copybara service to auto sync it to g3. label May 7, 2025
@yuyanpeng-google yuyanpeng-google force-pushed the yuyan-prefix-cache branch 2 times, most recently from fc0d025 to 8e22444 Compare May 8, 2025 09:49
Add async operations so that device_get does not block the critical path while waiting for the prefill result.
Use per-layer tries so that the cache is not loaded from DRAM when the common prefix lengths tie.
Add statistics for debugging and benchmarking.
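One possible reading of the per-layer trie change, sketched under assumptions: each cache layer keeps its own trie of cached token sequences, and a lookup compares the longest matching prefix in each. On a tie, the faster layer wins, so nothing is loaded from DRAM. The layer names `"hbm"` and `"dram"` and the helper `pick_layer` are hypothetical and do not come from this PR.

```python
from typing import Dict, Optional, Sequence, Tuple


class TrieNode:
    def __init__(self):
        self.children: Dict[int, "TrieNode"] = {}


class Trie:
    """Token trie that records which prefixes a single cache layer holds."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens: Sequence[int]) -> None:
        node = self.root
        for token in tokens:
            node = node.children.setdefault(token, TrieNode())

    def longest_common_prefix_len(self, tokens: Sequence[int]) -> int:
        """Returns how many leading tokens match some inserted sequence."""
        node = self.root
        matched = 0
        for token in tokens:
            child = node.children.get(token)
            if child is None:
                break
            node = child
            matched += 1
        return matched


def pick_layer(
    hbm_trie: Trie, dram_trie: Trie, tokens: Sequence[int]
) -> Tuple[Optional[str], int]:
    """Picks the layer with the longer match; ties go to the faster layer."""
    hbm_len = hbm_trie.longest_common_prefix_len(tokens)
    dram_len = dram_trie.longest_common_prefix_len(tokens)
    if hbm_len == 0 and dram_len == 0:
        return None, 0
    if hbm_len >= dram_len:  # tie: avoid loading from DRAM
        return "hbm", hbm_len
    return "dram", dram_len


hbm, dram = Trie(), Trie()
hbm.insert([1, 2, 3, 9])
dram.insert([1, 2, 3, 7])
print(pick_layer(hbm, dram, [1, 2, 3, 4]))  # both layers match 3 tokens
```

With a single trie shared across layers, a tie between entries in different layers would be harder to resolve in favor of the faster one; separate tries make the comparison explicit.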
@yuyanpeng-google
Collaborator Author

Looks good. Can you share benchmark results from before and after this PR, and which metrics it improved?

There are no formal results from before this PR. A few casual, small experiments showed that device_get blocks on the critical path. After this PR, a first-version benchmark result is available in b/397854862.

Labels
pull ready This label is needed if we want the copybara service to auto sync it to g3.
4 participants