Adds initial controller implementation #71
Conversation
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
60b9003 to 86ab274
the implementation is mechanical, so nothing special here. I hope someone approves it by the time I wake up tomorrow so I can unblock the implementation part of #43 as well as the rate limit stuff
@@ -22,8 +31,8 @@ import (
//
// inputSchema:
// schema: OpenAI
// backendRoutingHeaderKey: x-backend-name
// modelNameHeaderKey: x-model-name
// selectedBackendHeaderKey: x-envoy-ai-gateway-selected-backend
not sure if we want to call it `selected-backend`; the backend may not necessarily be selected by the gateway. The user can still specify the backend in the request.
well, the backend will be "calculated" and "selected" following the matching rule of LLMRoute, so the word "selected" makes sense regardless of who actually "selects".
this is for internal use only, and not user facing, so it should be fine
This is not true; you might not be able to select the backend just based on the model. For example, there is a case where the same Anthropic models are supported by both Google and AWS. In that case, the user needs to set the backend header to determine where to route to.
See https://aws.amazon.com/bedrock/claude/ and https://cloud.google.com/solutions/anthropic?hl=en.
wait a second, this has nothing to do with the model...
Say you have
```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: LLMRoute
metadata:
  name: some-route
  namespace: default
spec:
  inputSchema:
    schema: OpenAI
  rules:
    - matches:
        - headers:
            - type: Exact
              name: some-random-header-that-can-be-sent-directly-by-clients
              value: foo
      backendRefs:
        - name: some-backend-name
......
```
and clients send `curl -H "some-random-header-that-can-be-sent-directly-by-clients: foo"`, then internally extproc sets the header `$selectedBackendHeaderKey: some-backend-name`. That's what this does, and this is a complete implementation detail. This package is not a CRD but a configuration of extproc itself.
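To make that flow concrete, here is a minimal Go sketch of the matching and header mutation described above; the types and the `selectBackend` function are simplified stand-ins, not the actual extproc code in this PR:

```go
package main

import "fmt"

// HeaderMatch and RouteRule are simplified stand-ins for the real
// extproc configuration types.
type HeaderMatch struct {
	Name  string
	Value string // Exact match only in this sketch.
}

type RouteRule struct {
	Matches     []HeaderMatch
	BackendName string
}

// selectedBackendHeaderKey mirrors the configuration value discussed above.
const selectedBackendHeaderKey = "x-envoy-ai-gateway-selected-backend"

// selectBackend walks the rules in order and returns the header mutation
// that the external processor would apply to the request.
func selectBackend(rules []RouteRule, reqHeaders map[string]string) (string, string, bool) {
	for _, r := range rules {
		matched := true
		for _, m := range r.Matches {
			if reqHeaders[m.Name] != m.Value {
				matched = false
				break
			}
		}
		if matched {
			return selectedBackendHeaderKey, r.BackendName, true
		}
	}
	return "", "", false
}

func main() {
	rules := []RouteRule{{
		Matches:     []HeaderMatch{{Name: "some-random-header-that-can-be-sent-directly-by-clients", Value: "foo"}},
		BackendName: "some-backend-name",
	}}
	req := map[string]string{"some-random-header-that-can-be-sent-directly-by-clients": "foo"}
	if key, value, ok := selectBackend(rules, req); ok {
		fmt.Printf("set %s: %s\n", key, value) // set x-envoy-ai-gateway-selected-backend: some-backend-name
	}
}
```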
So you are saying we are not providing a standard AI gateway backend routing header to the user; they MUST define the matching rules on LLMRoute for each backend the user creates. Is that correct?
yes, exactly, currently, though we could provide some "canonical-backend-choosing-header-key" defined in the api/v1alpha package and prioritize that header value when present, which would effectively ignore LLMRoute.Rules if the header exists. That is another topic we should discuss in a separate issue/PR if you want to support it. This configuration key is a complete implementation detail regardless of whether or not we do that.
I will raise an issue tomorrow!
That is exactly what I was thinking: create those rules for the user, and if this routing header exists then we ignore the rules on the LLMRoute. I think this can provide a better user experience.
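As a rough illustration of that idea, here is a minimal Go sketch assuming a hypothetical canonical backend header; no such constant exists in api/v1alpha1 yet, and the name below is made up:

```go
package main

import "fmt"

// canonicalBackendHeaderKey is hypothetical; it only illustrates the
// override idea from this thread and is not defined anywhere today.
const canonicalBackendHeaderKey = "x-ai-gateway-backend"

// resolveBackend prefers an explicit backend header when present and only
// falls back to the LLMRoute rule matching (represented here by matchRules).
func resolveBackend(headers map[string]string, matchRules func(map[string]string) (string, bool)) (string, bool) {
	if b := headers[canonicalBackendHeaderKey]; b != "" {
		return b, true // LLMRoute.Rules are effectively ignored for this request.
	}
	return matchRules(headers)
}

func main() {
	// Stand-in for the rule evaluation sketched earlier in this thread.
	matchRules := func(map[string]string) (string, bool) { return "backend-from-rules", true }

	b, _ := resolveBackend(map[string]string{canonicalBackendHeaderKey: "aws-bedrock"}, matchRules)
	fmt.Println(b) // aws-bedrock: the explicit header wins over the rules.
}
```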
extprocconfig/extprocconfig.go (Outdated)
// where the input of the external processor is in the OpenAI schema, the model name is populated in the header x-model-name,
// The model name header `x-model-name` is used in the header matching to make the routing decision. **After** the routing decision is made,
// the selected backend name is populated in the header `x-backend-name`. For example, when the model name is `llama3.3333`,
// where the input of the external processor is in the OpenAI schema, the model name is populated in the header x-envoy-ai-gateway-llm-model,
Since the user can configure the model routing header name, we can say "the configured modelNameHeaderKey" here.
well, no, users do not choose the model routing header name at this point.
ai-gateway/api/v1alpha1/api.go
Line 206 in cb6b2e0
LLMModelHeaderKey = "x-envoy-ai-gateway-llm-model"
It would be good to make this part of the LLMRoute resource, but the comment here matches what it is now.
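For context, a rough sketch of how that constant would be used to populate the routing header from an OpenAI-style request body; this is illustrative only, not the actual extproc implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LLMModelHeaderKey mirrors the constant quoted above from api/v1alpha1/api.go.
const LLMModelHeaderKey = "x-envoy-ai-gateway-llm-model"

// extractModelHeader pulls the "model" field out of an OpenAI-style request
// body and returns the header mutation used for the routing decision.
func extractModelHeader(body []byte) (key, value string, err error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", "", err
	}
	return LLMModelHeaderKey, req.Model, nil
}

func main() {
	key, value, err := extractModelHeader([]byte(`{"model":"llama3.3333","messages":[]}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s\n", key, value) // x-envoy-ai-gateway-llm-model: llama3.3333
}
```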
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
	return fmt.Errorf("failed to get extension policy: %w", err)
}
pm := egv1a1.BufferedExtProcBodyProcessingMode
port := gwapiv1.PortNumber(1063)
Why specifically 1063? Should this port be configurable?
yeah we can make this configurable later
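For reference, a minimal sketch of one way to make it configurable later, using a hypothetical `--extproc-port` controller flag; the flag name is made up, and the PR hardcodes the value today:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag; the PR currently hardcodes gwapiv1.PortNumber(1063).
	extProcPort := flag.Int("extproc-port", 1063, "port the extproc service listens on")
	flag.Parse()

	// The controller would then build the extension policy with this value,
	// e.g. port := gwapiv1.PortNumber(*extProcPort), instead of the literal.
	fmt.Printf("configuring extension policy with extproc port %d\n", *extProcPort)
}
```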
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
This comes up in a thread in #71 with @yuzisun, and we might want to remove the LLM prefix. The rationale is that the current functionality has nothing to do with "LLM" but is more about general routing and authn/z with "AI providers" where the input is in the OpenAI format. On the other hand, there "will be" LLM-specific CRDs such as the configurations for prompt guard, semantic caching, etc. --------- Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
This commit adds the initial implementation of the resource management controllers. As per the official recommendation of controller-runtime, the controller interface is implemented per CRD. Therefore, this adds two typical "controllers": one for LLMRoute and another for LLMBackend.
In addition, this implements the "configuration sink", which is the sync mechanism across multiple resource events via a Go channel. It has the global view over all AI Gateway resources and hence has the full context required to create the final extproc configuration as well as the HTTPRoute. The following is a (very) simple diagram of the relationship among k8s events, controllers, and the configSink:
k8s events --async--> {llmRouteCtrl, llmBackendCtrl, ...} --sinkEvent (sync)--> configSink
The follow-up PR will add the end-to-end tests.
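As a rough illustration of the diagram above, here is a minimal Go sketch of the controller-to-sink wiring; the event type and channel handling are simplified stand-ins for what this PR actually implements:

```go
package main

import "fmt"

// sinkEvent is a simplified stand-in for the event a controller sends to the
// configuration sink after reconciling a resource.
type sinkEvent struct {
	kind string // e.g. "LLMRoute" or "LLMBackend"
	name string
}

// configSink drains the channel; having the global view over all resources,
// this is where the extproc configuration and HTTPRoute would be rebuilt.
func configSink(events <-chan sinkEvent, done chan<- struct{}) {
	for ev := range events {
		fmt.Printf("rebuilding extproc config and HTTPRoute after %s/%s update\n", ev.kind, ev.name)
	}
	close(done)
}

func main() {
	events := make(chan sinkEvent)
	done := make(chan struct{})
	go configSink(events, done)

	// Each controller's Reconcile would push an event like this on k8s changes.
	events <- sinkEvent{kind: "LLMRoute", name: "some-route"}
	events <- sinkEvent{kind: "LLMBackend", name: "some-backend-name"}
	close(events)
	<-done
}
```

The synchronous handoff through the channel is what gives the sink a serialized, global view while the controllers themselves stay per-CRD.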