docs: added a usage-based ratelimiting section in the docs #260
Conversation
Signed-off-by: melsal13 <mmvsal13@gmail.com>
docs: moved required tools section to top of prerequisites section
Signed-off-by: melsal13 <mmvsal13@gmail.com>
@mathetake Could you take a look to verify that the documentation is correct? Thanks!
Thank you!!! Assigning @missBerg for the rest, overall looks correct to me
request:
  from: Number
  number: 0
I would like an explanation of what zero means here if i were new to this project
Is this request section required?
Good q @arkodg
Yeah, the cost of a request is 0 (by default it's 1, i.e. every request counts 1 toward the total limit), and the cost of a response is Y.
But the check of whether the total count has reached the limit happens during a request.
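To make the discussion above concrete, here is a minimal sketch of what the request/response cost split can look like in an Envoy Gateway `BackendTrafficPolicy`, assuming the Global rate limit `cost` API; the metadata namespace and key are illustrative and should be verified against the AI Gateway docs:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: token-usage-limit   # illustrative name
spec:
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 10000   # interpreted here as a token budget, not a request count
            unit: Hour
          cost:
            request:
              from: Number
              number: 0       # the request itself consumes nothing from the budget...
            response:
              from: Metadata  # ...token usage is charged from dynamic metadata on the response
              metadata:
                namespace: io.envoy.ai_gateway   # assumed namespace, check the docs
                key: llm_total_token             # assumed key, check the docs
```

With `number: 0` on the request side, the limit check still runs at request time (as noted above), but the budget is only decremented by the token usage reported on the response.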
I meant, the default of 1 must not be changed for legacy API use when the top-level cost
field is not given, but when this field is given we can technically default to zero, right? There's no backward-compatibility concern.
Maybe I should've done it before the v1.3 release…
For now we can have this, and at least nothing will break for people when we improve it later ;)
## Understanding the Configuration

### Rate Limit Rules
This feels like documentation for Envoy Gateway, as per the comment above. Ack, this might be helpful, but I would also like to avoid duplicating effort with the Envoy Gateway project (not here). Deferring to @missBerg for the decision.
I think keeping some of it in for now is good; let's add a link to the EG docs for people to dive into more details @melsal13
- Total tokens

This is particularly useful for:
- Controlling costs per user
The cost needs to be controlled at the model level, commonly a combination of user and target model.
Yeah, a model header demonstration would be helpful.
Yeah, let's add that example for the combo of user and target model @melsal13
Hello @yuzisun @mathetake @missBerg
I added an example of a user and target model. Please let me know what you all think :)
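A rough sketch of what such a user-plus-model rule can look like, assuming the Envoy Gateway `clientSelectors` API. The `x-user-id` header name is a hypothetical client-supplied identity header; `x-ai-eg-model` is the model header the AI Gateway filter injects (verify both names against the current docs):

```yaml
rules:
  - clientSelectors:
      - headers:
          - name: x-user-id      # assumed client-supplied user identity header
            type: Distinct       # a separate rate limit bucket per distinct value
          - name: x-ai-eg-model  # model name set by the AI Gateway filter
            type: Distinct
    limit:
      requests: 1000
      unit: Hour
```

Using `Distinct` on both headers gives each (user, model) pair its own counter, so one user's heavy use of an expensive model doesn't consume another user's or model's budget.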
Signed-off-by: melsal13 <mmvsal13@gmail.com>
merging upstream changes into update-docs branch with usage based rate limiting changes
Signed-off-by: melsal13 <mmvsal13@gmail.com>
merging origin into usage based docs
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Just one tiny edit and we can merge 🙏 thanks for the work @melsal13
- Custom limits using CEL expressions

:::note
The token counts are extracted from the model's response. Make sure your model backend provides token usage information in a format compatible with the OpenAI schema.
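For reference, the token counts in question come from the standard `usage` object in an OpenAI-schema chat completion response (field values here are just an example):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 31,
    "total_tokens": 87
  }
}
```

A backend that returns this shape, either natively or via the gateway's response transformation, is what the rate limiter charges usage from.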
Since the AI GW transforms, for example, AWS Bedrock responses into the OpenAI schema, Envoy AI GW can also capture usage from model providers whose requests and responses are transformed into the OpenAI schema. Update this to make clear that, thanks to the request/response transformer into a unified API based on the OpenAI schema, we can capture usage in a unified way 😊 @melsal13
Signed-off-by: melsal13 <mmvsal13@gmail.com>
Thanks! Looks good. We can always add more info later :)
**Commit Message** The model name is extracted by AI Gateway filter, not the one explicitly added by downstream clients. **Related Issues/PRs (if applicable)** Follow up on #260 --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Commit Message
Added a usage-based rate-limiting section in the docs, making it easier for people to understand how to configure rate limiting.
Related Issues/PRs (if applicable)
Fixes #242