diff --git a/website/docs/gen-ai/inference/img/nim-architecture.png b/website/docs/gen-ai/inference/img/nim-architecture.png
deleted file mode 100644
index 4b65a235b..000000000
Binary files a/website/docs/gen-ai/inference/img/nim-architecture.png and /dev/null differ
diff --git a/website/docs/gen-ai/inference/img/nim-on-eks-arch.png b/website/docs/gen-ai/inference/img/nim-on-eks-arch.png
new file mode 100644
index 000000000..5a39573bd
Binary files /dev/null and b/website/docs/gen-ai/inference/img/nim-on-eks-arch.png differ
diff --git a/website/docs/gen-ai/inference/nvidia-nim-llama3.md b/website/docs/gen-ai/inference/nvidia-nim-llama3.md
index 34b8b9eca..79d66335b 100644
--- a/website/docs/gen-ai/inference/nvidia-nim-llama3.md
+++ b/website/docs/gen-ai/inference/nvidia-nim-llama3.md
@@ -28,10 +28,6 @@ NIM abstracts away model inference internals such as execution engine and runtim
 
 NIMs are packaged as container images on a per model/model family basis. Each NIM container is with a model, such as `meta/llama3-8b-instruct`. These containers include a runtime that runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. NIM automatically downloads the model from NVIDIA NGC Catalog, leveraging a local filesystem cache if available.
 
-![NIM Architecture](img/nim-architecture.png)
-
-Source: https://docs.nvidia.com/nim/large-language-models/latest/introduction.html#architecture
-
 ## Overview of this deployment pattern on Amazon EKS
 
 This pattern combines the capabilities of NVIDIA NIM, Amazon Elastic Kubernetes Service (EKS), and various AWS services to deliver a high-performance and cost-optimized model serving infrastructure.
@@ -48,6 +44,8 @@ This pattern combines the capabilities of NVIDIA NIM, Amazon Elastic Kubernetes
 By combining these components, our proposed solution delivers a powerful and cost-effective model serving infrastructure tailored for large language models.
 With NVIDIA NIM's seamless integration, Amazon EKS's scalability with Karpenter, customers can achieve high performance while minimizing infrastructure costs.
 
+![NIM on EKS Architecture](img/nim-on-eks-arch.png)
+
 ## Deploying the Solution
 
 ### Prerequisites