feat: add all methods to vertex API #192
We should not make it different from what people would get on GKE? I think we should rather make it configurable what "type" is used via env, e.g. INFERENCE_TYPE="embed", "rank",.... We will have the same question for SageMaker too.
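To make the env-based proposal concrete, here is a minimal sketch of how a single-route service could pick its task once at startup. Only the variable name `INFERENCE_TYPE` and the values "embed" and "rank" come from the comment above; the helper names, the default value, and the path mapping are hypothetical and not taken from this PR.

```python
import os

# Only "embed" and "rank" are named in the discussion; the full set is an assumption.
SUPPORTED_TYPES = {"embed", "rank"}


def resolve_inference_type(env=None, default="embed"):
    """Read INFERENCE_TYPE once at startup and validate it.

    `default` is a hypothetical fallback, not something the PR specifies.
    """
    env = os.environ if env is None else env
    value = env.get("INFERENCE_TYPE", default).strip().lower()
    if value not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported INFERENCE_TYPE: {value!r}")
    return value


def route_for(inference_type):
    """Map the configured type to the one handler path the service would serve.

    The path layout is illustrative only.
    """
    return f"/{inference_type}"
```

With this shape the task is fixed for the lifetime of the process, which is exactly the limitation raised in the reply that follows: a deployment that needs two tasks would need two services.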
Then how do you use the multiple routes? You just decide at start time which route you want to serve? I think that's an inferior solution in every aspect.
@OlivierDehaene approved and looks good but deferring to @philschmid on the best interface
lgtm
Not my fault all the clouds can't freaking figure out that, oh surprise, you may want to have more than a single route on your service...
You deploy 1 model, so you define 1 task, similar to other models, like The single route is something all Cloud ML Services currently have with Vertex and SageMaker.
Except that's not the case. Some users use both
Is it not already the case? Does GKE use the /vertex route? If not they don't have the
Yeah, and that's a poor design decision that everybody now needs to live with.
No, GKE can use all existing stuff. There is no
What? #183 (review)
Which one is it then?
So if I understood you correctly:
So WHATEVER we do we will have different APIs between the two.
Yes
Yes
I would not do this since it would be different from what we did in TGI. Even if they have a different payload it's the "same" just put into
c2c984c to 339dc4a
@drbh what do you think?
@philschmid FYI this will modify the API to: