[Enhancement] KV Caching for inference speed #110
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), tensorflow (Related to TensorFlow), tests (Related to tests)
Is your feature request related to a problem? Please describe.
During autoregressive decoding, the self-attention mechanism recomputes the key and value matrices for every previously generated token at each step, so generating n tokens costs on the order of n^2 key/value projections. Caching the key and value matrices computed at earlier steps would cut this redundant computation and improve inference speed.
Describe the solution you'd like
Store the key and value projections for tokens that have already been processed, and at each decoding step compute projections only for the newly generated token, appending them to the cache. This removes the need to recompute the full key and value matrices in every iteration of the decoding process, leading to faster inference.
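A minimal sketch of one cached attention step in TensorFlow, assuming a single attention head; the function name `attend_with_cache`, the weight tensors, and the dict-based cache layout are illustrative assumptions, not this project's actual API:

```python
import tensorflow as tf

def attend_with_cache(x_t, w_q, w_k, w_v, cache):
    """One decoding step of single-head attention with a KV cache.

    x_t:   [batch, 1, d_model] embedding of the newly generated token
    cache: dict holding "k" and "v" tensors of shape [batch, t, d_head]
           from previous steps, or None on the first step
    """
    # Project only the new token; past keys/values come from the cache.
    q     = tf.einsum("btd,dh->bth", x_t, w_q)   # [batch, 1, d_head]
    k_new = tf.einsum("btd,dh->bth", x_t, w_k)   # [batch, 1, d_head]
    v_new = tf.einsum("btd,dh->bth", x_t, w_v)   # [batch, 1, d_head]

    if cache is None:
        k, v = k_new, v_new
    else:
        # Append one row instead of recomputing all past projections.
        k = tf.concat([cache["k"], k_new], axis=1)  # [batch, t+1, d_head]
        v = tf.concat([cache["v"], v_new], axis=1)

    d_head = tf.cast(tf.shape(q)[-1], q.dtype)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_head)  # [batch, 1, t+1]
    out = tf.matmul(tf.nn.softmax(scores, axis=-1), v)            # [batch, 1, d_head]
    return out, {"k": k, "v": v}

# Toy decoding loop: the cache grows by one row per generated token.
batch, d_model, d_head = 2, 8, 8
w_q, w_k, w_v = (tf.random.normal([d_model, d_head]) for _ in range(3))
cache = None
for _ in range(4):
    x_t = tf.random.normal([batch, 1, d_model])  # stand-in for the new token's embedding
    out, cache = attend_with_cache(x_t, w_q, w_k, w_v, cache)
print(cache["k"].shape)  # (2, 4, 8): keys for all four generated tokens
```

Note that `tf.concat` reallocates the cache at every step; a production version would likely preallocate a `[batch, max_len, d_head]` buffer and write the new row in place.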