Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: added a usage-based ratelimiting section in the docs #260

Merged
merged 15 commits into from
Feb 8, 2025

Conversation

melsal13
Copy link
Contributor

@melsal13 melsal13 commented Jan 31, 2025

Commit Message
Added a usage-based ratelimiting section in the docs. Making it easier for people to understand how to configure rate-limiting.

Related Issues/PRs (if applicable)
Fixes #242

docs: moved required tools section to top of prerequisites section
Signed-off-by: melsal13 <mmvsal13@gmail.com>
@melsal13 melsal13 requested a review from a team as a code owner January 31, 2025 21:12
@melsal13
Copy link
Contributor Author

@mathetake Could you take a look to verify that the documentation is correct? Thanks!

Copy link
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!!! Assigning @missBerg for the rest, overall looks correct to me

Comment on lines 70 to 72
request:
from: Number
number: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like an explanation of what zero means here if i were new to this project

Copy link
Contributor

@yuzisun yuzisun Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this request section required ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good q @arkodg

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah cost of a request is 0 ( by default it's 1 i.e. every request costs 1 count towards total limit ) , and cost of a response is Y.

But the check of whether the total count has reached the limit or not happens during a request

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, the default of 1 must not be changed for legacy API use when the top level cost field is not given but when this field is given then we can technically default to zero right? There’s no backward compatibility concern

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe i should’ve done it before v1.3 release…

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now we can have this, and at least people won't have things broken when we get it better ;)

Comment on lines 116 to 118
## Understanding the Configuration

### Rate Limit Rules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like a documentation for Envoy Gateway as per the comment above. Ack this might be helpful but also i would like to avoid the duplicate effort with Envoy Gateway project (not here). defer to @missBerg for the decision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping some of it in for now is good, let's add a link to EG docs for people to dive into more details @melsal13

- Total tokens

This is particularly useful for:
- Controlling costs per user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost needs be controlled at the model level, commonly a combination of user and target model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah model header demonstration would be helpful yea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's add that example for combo of user and target model @melsal13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @yuzisun @mathetake @missBerg

I added an example of a user and target model. Please let me know what you all think :)

Signed-off-by: melsal13 <mmvsal13@gmail.com>
Signed-off-by: melsal13 <mmvsal13@gmail.com>
merging upstream changes into update-docs branch with usage based rate limiting changes
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Copy link
Contributor

@missBerg missBerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny edit and we can merge 🙏 thanks for the work @melsal13

- Custom limits using CEL expressions

:::note
The token counts are extracted from the model's response. Make sure your model backend provides token usage information in a format compatible with the OpenAI schema.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the AI GW transforms for example AWS bedrock responses to OpenAI schema Envoy AI GW can also capture usage from model providers that have transformation of requests and responses into the open a i schema. Update this to be clear that thanks to the request/response transformer into a unified API based on openAI schema we can capture usage in a unified way 😊 @melsal13

@melsal13 melsal13 requested a review from missBerg February 7, 2025 06:22
Copy link
Contributor

@missBerg missBerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good. We can always add more info later :)

@missBerg missBerg enabled auto-merge (squash) February 8, 2025 02:35
@missBerg missBerg merged commit 2ddeb70 into envoyproxy:main Feb 8, 2025
8 checks passed
mathetake added a commit that referenced this pull request Feb 11, 2025
**Commit Message**

The model name is extracted by AI Gateway filter, not the one explicitly
added by downstream clients.


**Related Issues/PRs (if applicable)**

Follow up on #260

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
daixiang0 pushed a commit to daixiang0/ai-gateway that referenced this pull request Feb 12, 2025
**Commit Message**

The model name is extracted by AI Gateway filter, not the one explicitly
added by downstream clients.


**Related Issues/PRs (if applicable)**

Follow up on envoyproxy#260

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
daixiang0 pushed a commit to daixiang0/ai-gateway that referenced this pull request Feb 12, 2025
**Commit Message**

The model name is extracted by AI Gateway filter, not the one explicitly
added by downstream clients.


**Related Issues/PRs (if applicable)**

Follow up on envoyproxy#260

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
daixiang0 pushed a commit to daixiang0/ai-gateway that referenced this pull request Feb 19, 2025
…y#260)

**Commit Message**
Added a usage-based ratelimiting section in the docs. Making it easier
for people to understand how to configure rate-limiting.

**Related Issues/PRs (if applicable)**
Fixes envoyproxy#242

---------

Signed-off-by: melsal13 <mmvsal13@gmail.com>
Signed-off-by: Loong <long0dai@foxmail.com>
daixiang0 pushed a commit to daixiang0/ai-gateway that referenced this pull request Feb 19, 2025
**Commit Message**

The model name is extracted by AI Gateway filter, not the one explicitly
added by downstream clients.


**Related Issues/PRs (if applicable)**

Follow up on envoyproxy#260

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Loong <long0dai@foxmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docs: Configure Usage-based Ratelimiting
5 participants