
[Enhancement] KV Caching for inference speed #110

Open
soran-ghaderi opened this issue Jan 15, 2024 · 1 comment

@soran-ghaderi (Member)

Is your feature request related to a problem? Please describe.

Caching the key and value matrices during the self-attention mechanism would reduce computational complexity and improve inference speed: without a cache, autoregressive decoding recomputes the keys and values for the entire generated prefix at every step.

Describe the solution you'd like

This caching mechanism removes the need to recompute the full key and value matrices at every iteration of the decoding process: each step computes the key and value vectors for the newest token only and appends them to the cached matrices, leading to faster inference.
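A minimal sketch of the idea in TensorFlow (per the issue's tensorflow label), assuming single-head scaled dot-product attention and greedy decoding; the function `attention_step` and the `cache` dict are illustrative names, not an existing API:

```python
import tensorflow as tf

def attention_step(x_t, w_q, w_k, w_v, cache):
    """One autoregressive decode step with a KV cache (hypothetical helper).

    x_t:   (batch, 1, d_model) embedding of the newest token.
    cache: dict with "k" and "v" tensors of shape (batch, t, d_model),
           or None values on the first step.
    """
    q = tf.matmul(x_t, w_q)      # query for the new token only
    k_new = tf.matmul(x_t, w_k)  # key for the new token only
    v_new = tf.matmul(x_t, w_v)  # value for the new token only

    if cache["k"] is None:
        k, v = k_new, v_new
    else:
        # Append instead of recomputing K and V for the whole prefix.
        k = tf.concat([cache["k"], k_new], axis=1)
        v = tf.concat([cache["v"], v_new], axis=1)
    cache["k"], cache["v"] = k, v

    # Scaled dot-product attention for the single new query.
    d = tf.cast(tf.shape(q)[-1], q.dtype)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d)  # (batch, 1, t+1)
    return tf.matmul(tf.nn.softmax(scores, axis=-1), v), cache

# Usage in a decode loop (shapes assumed for illustration):
# cache = {"k": None, "v": None}
# for t in range(max_len):
#     out, cache = attention_step(x[:, t:t+1, :], w_q, w_k, w_v, cache)
```

With the cache, the projection work per step is constant rather than growing with the prefix length; only the attention product itself still scales with the number of cached tokens.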

soran-ghaderi added the enhancement, help wanted, tests, and tensorflow labels on Jan 15, 2024
soran-ghaderi self-assigned this on Jan 15, 2024