AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
Topics: kubernetes, ai, k8s, whisper, autoscaler, openai-api, llm, vllm, faster-whisper, ollama, vllm-operator, ollama-operator, inference-operator
Language: Go · Updated Feb 3, 2025
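To illustrate the OpenAI-compatible API implied by the topics above, here is a minimal sketch of a Go client posting a chat completion request to an in-cluster endpoint exposed by the operator. The service URL, path, and model name are assumptions for illustration only, not taken from the project's documentation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Assumed in-cluster service address; replace with your deployment's actual endpoint.
	url := "http://kubeai.kubeai.svc.cluster.local/openai/v1/chat/completions"

	// Minimal OpenAI-style chat completion request body.
	body, err := json.Marshal(map[string]any{
		"model": "llama-3.1-8b-instruct", // hypothetical model name
		"messages": []map[string]string{
			{"role": "user", "content": "Hello from Kubernetes!"},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Send the request and print the raw JSON response.
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

Because the endpoint follows the OpenAI wire format, the same request could also be issued with any existing OpenAI client library by pointing its base URL at the in-cluster service.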