From bcee9f77949c48bc45da1c478b3f2cd419cfbf0a Mon Sep 17 00:00:00 2001
From: Takeshi Yoneda
Date: Tue, 11 Feb 2025 07:31:19 -0800
Subject: [PATCH] docs: tweaks rate limit doc on model (#318)

**Commit Message**

The model name is extracted by AI Gateway filter, not the one explicitly
added by downstream clients.

**Related Issues/PRs (if applicable)**

Follow up on #260

---------

Signed-off-by: Takeshi Yoneda
Signed-off-by: Loong
---
 site/docs/capabilities/usage-based-ratelimiting.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/site/docs/capabilities/usage-based-ratelimiting.md b/site/docs/capabilities/usage-based-ratelimiting.md
index 6b6fa9aef..bc0e0cb64 100644
--- a/site/docs/capabilities/usage-based-ratelimiting.md
+++ b/site/docs/capabilities/usage-based-ratelimiting.md
@@ -14,7 +14,7 @@ This guide focuses on AI Gateway's specific capabilities for token-based rate li
 AI Gateway leverages Envoy Gateway's Global Rate Limit API to provide token-based rate limiting for LLM requests. Key features include:
 
 - Token usage tracking based on model and user identifiers
 - Configuration for tracking input, output, and total token metadata from LLM responses
-- Model-specific rate limiting using AI Gateway headers (`x-ai-eg-model`)
+- Model-specific rate limiting using AI Gateway headers (`x-ai-eg-model`) which is inserted by the AI Gateway filter with the model name extracted from the request body.
 - Support for custom token cost calculations using CEL expressions
 
 ## Token Usage Behavior
@@ -151,18 +151,17 @@ When configuring rate limits:
 
 For proper cost control and rate limiting, requests must include:
 
 - `x-user-id`: Identifies the user making the request
-- `x-ai-eg-model`: Identifies the model being used
 
 Example request:
 
 ```shell
 curl --fail \
     -H "Content-Type: application/json" \
     -H "x-user-id: user123" \
-    -H "x-ai-eg-model: gpt-4" \ # Both user ID and model are required
     -d '{
         "messages": [
             {
                 "role": "user",
+                "model": "gpt-4",
                 "content": "Hello!"
             }
         ]