docs: added a usage-based ratelimiting section in the docs #260
Conversation
Signed-off-by: melsal13 <mmvsal13@gmail.com>
docs: moved required tools section to top of prerequisites section
Signed-off-by: melsal13 <mmvsal13@gmail.com>
@mathetake Could you take a look to verify that the documentation is correct? Thanks!
Thank you!!! Assigning @missBerg for the rest, overall looks correct to me
request:
  from: Number
  number: 0
I would like an explanation of what zero means here if i were new to this project
Is this request section required?
Good q @arkodg
Yeah, the cost of a request is 0 (by default it's 1, i.e. every request counts 1 toward the total limit), and the cost of a response is Y.
But the check of whether the total count has reached the limit happens during a request.
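To make the discussion above concrete, here is a minimal sketch of what the request/response cost split can look like in an Envoy Gateway `BackendTrafficPolicy`, assuming the Global rate limit `cost` API; the metadata namespace and key are illustrative and should be verified against the AI Gateway docs:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: token-usage-limit   # illustrative name
spec:
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 10000   # interpreted here as a token budget, not a request count
            unit: Hour
          cost:
            request:
              from: Number
              number: 0       # the request itself consumes nothing from the budget...
            response:
              from: Metadata  # ...token usage is charged from dynamic metadata on the response
              metadata:
                namespace: io.envoy.ai_gateway   # assumed namespace, check the docs
                key: llm_total_token             # assumed key, check the docs
```

With `number: 0` on the request side, the limit check still runs at request time (as noted above), but the budget is only decremented by the token usage reported on the response.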
I meant, the default of 1 must not be changed for legacy API use when the top-level cost
field is not given, but when this field is given we can technically default to zero, right? There's no backward-compatibility concern.
Maybe I should've done it before the v1.3 release…
For now we can have this, and at least nothing will break for people when we improve it later ;)
## Understanding the Configuration

### Rate Limit Rules
This feels like documentation for Envoy Gateway, as per the comment above. Ack, this might be helpful, but I would also like to avoid duplicating effort with the Envoy Gateway project (not here). Deferring to @missBerg for the decision.
I think keeping some of it in for now is good; let's add a link to the EG docs for people to dive into more details @melsal13
- Total tokens

This is particularly useful for:
- Controlling costs per user
The cost needs to be controlled at the model level, commonly a combination of user and target model.
Yeah, a model header demonstration would be helpful.
Yeah, let's add that example for the combo of user and target model @melsal13
Hello @yuzisun @mathetake @missBerg
I added an example of a user and target model. Please let me know what you all think :)
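A rough sketch of what such a user-plus-model rule can look like, assuming the Envoy Gateway `clientSelectors` API. The `x-user-id` header name is a hypothetical client-supplied identity header; `x-ai-eg-model` is the model header the AI Gateway filter injects (verify both names against the current docs):

```yaml
rules:
  - clientSelectors:
      - headers:
          - name: x-user-id      # assumed client-supplied user identity header
            type: Distinct       # a separate rate limit bucket per distinct value
          - name: x-ai-eg-model  # model name set by the AI Gateway filter
            type: Distinct
    limit:
      requests: 1000
      unit: Hour
```

Using `Distinct` on both headers gives each (user, model) pair its own counter, so one user's heavy use of an expensive model doesn't consume another user's or model's budget.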
Signed-off-by: melsal13 <mmvsal13@gmail.com>
merging upstream changes into update-docs branch with usage based rate limiting changes
Signed-off-by: melsal13 <mmvsal13@gmail.com>
merging origin into usage based docs
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Just one tiny edit and we can merge 🙏 thanks for the work @melsal13
- Custom limits using CEL expressions

:::note
The token counts are extracted from the model's response. Make sure your model backend provides token usage information in a format compatible with the OpenAI schema.
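For reference, the token counts in question come from the standard `usage` object in an OpenAI-schema chat completion response (field values here are just an example):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 31,
    "total_tokens": 87
  }
}
```

A backend that returns this shape, either natively or via the gateway's response transformation, is what the rate limiter charges usage from.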
Since the AI GW transforms, for example, AWS Bedrock responses into the OpenAI schema, Envoy AI GW can also capture usage from model providers whose requests and responses are transformed into the OpenAI schema. Update this to make clear that, thanks to the request/response transformer into a unified API based on the OpenAI schema, we can capture usage in a unified way 😊 @melsal13
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Thanks! Looks good. We can always add more info later :)
**Commit Message** The model name is extracted by AI Gateway filter, not the one explicitly added by downstream clients. **Related Issues/PRs (if applicable)** Follow up on #260 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Commit Message
Added a usage-based rate-limiting section in the docs, making it easier for people to understand how to configure rate limiting.
Related Issues/PRs (if applicable)
Fixes #242