diff --git a/README.md b/README.md
index 728771ae..5e6214c5 100644
--- a/README.md
+++ b/README.md
@@ -30,22 +30,6 @@ This plugin is the recommended approach for supporting the Ascend backend within
 By using vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, Embedding, Multi-modal LLMs can run seamlessly on the Ascend NPU.
 
-## Prerequisites
-### Support Devices
-- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 Inference series (Atlas 800I A2)
-
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------| ----------- |------------------------------------------|
-| vLLM | main | main | Required for vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
-
-Find more about how to setup your environment in [here](docs/environment.md).
-
 ## Getting Started
 
 > [!NOTE]
@@ -73,72 +57,7 @@ Run the following command to start the vLLM server with the [Qwen/Qwen2.5-0.5B-I
 vllm serve Qwen/Qwen2.5-0.5B-Instruct
 curl http://localhost:8000/v1/models
 ```
-
-Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
-
-## Building
-
-#### Build Python package from source
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### Build container image from source
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-See [Building and Testing](./CONTRIBUTING.md) for more details, which is a step-by-step guide to help you set up development environment, build and test.
-
-## Feature Support Matrix
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ✗ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Imporve performance in 2025 Q1 |
-| LoRA | ✗ | Plan in 2025 Q1 |
-| Prompt adapter | ✅ ||
-| Speculative decoding | ✅ | Impore accuracy in 2025 Q1|
-| Pooling | ✗ | Plan in 2025 Q1 |
-| Enc-dec | ✗ | Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ✅ ||
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✗ | Plan in 2025 Q1 |
-
-## Model Support Matrix
-
-The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
-| Mistral | | Need test |
-| DeepSeek v2.5 | |Need test |
-| LLama3.1/3.2 | ✅ ||
-| Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
-| GLM-4v | |Need test|
-| Molomo | ✅ ||
-| LLaVA 1.5 | ✅ ||
-| Mllama | |Need test|
-| LLaVA-Next | |Need test|
-| LLaVA-Next-Video | |Need test|
-| Phi-3-Vison/Phi-3.5-Vison | |Need test|
-| Ultravox | |Need test|
-| Qwen2-Audio | ✅ ||
+**Please refer to [Official Docs](./docs/index.md) for more details.**
 
 ## Contributing
 We welcome and value any contributions and collaborations:
diff --git a/README.zh.md b/README.zh.md
index c6fc9fb8..b649a504 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -30,22 +30,6 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个让vLLM在Ascend NPU无缝运行的
 使用 vLLM 昇腾插件,可以让类Transformer、混合专家(MOE)、嵌入、多模态等流行的大语言模型在 Ascend NPU 上无缝运行。
 
-## 前提
-### 支持的设备
-- Atlas A2 训练系列 (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 推理系列 (Atlas 800I A2)
-
-### 依赖
-| 需求 | 支持的版本 | 推荐版本 | 注意 |
-|-------------|-------------------| ----------- |------------------------------------------|
-| vLLM | main | main | vllm-ascend 依赖 |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | vllm 依赖 |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | vllm-ascend and torch-npu 依赖 |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | vllm-ascend 依赖 |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | torch-npu and vllm 依赖 |
-
-在[此处](docs/environment.zh.md)了解更多如何配置您环境的信息。
-
 ## 开始使用
 
 > [!NOTE]
@@ -74,72 +58,7 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct
 curl http://localhost:8000/v1/models
 ```
 
-请参阅 [vLLM 快速入门](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)以获取更多详细信息。
-
-## 构建
-
-#### 从源码构建Python包
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### 构建容器镜像
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-查看[构建和测试](./CONTRIBUTING.zh.md)以获取更多详细信息,其中包含逐步指南,帮助您设置开发环境、构建和测试。
-
-## 特性支持矩阵
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ✗ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Imporve performance in 2025 Q1 |
-| LoRA | ✗ | Plan in 2025 Q1 |
-| Prompt adapter | ✅ ||
-| Speculative decoding | ✅ | Impore accuracy in 2025 Q1|
-| Pooling | ✗ | Plan in 2025 Q1 |
-| Enc-dec | ✗ | Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ✅ ||
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✗ | Plan in 2025 Q1 |
-
-## 模型支持矩阵
-
-此处展示了部分受支持的模型。有关更多详细信息,请参阅 [supported_models](docs/supported_models.md):
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
-| Mistral | | Need test |
-| DeepSeek v2.5 | |Need test |
-| LLama3.1/3.2 | ✅ ||
-| Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
-| GLM-4v | |Need test|
-| Molomo | ✅ ||
-| LLaVA 1.5 | ✅ ||
-| Mllama | |Need test|
-| LLaVA-Next | |Need test|
-| LLaVA-Next-Video | |Need test|
-| Phi-3-Vison/Phi-3.5-Vison | |Need test|
-| Ultravox | |Need test|
-| Qwen2-Audio | ✅ ||
-
+**请参阅[官方文档](./docs/index.md)以获取更多详细信息。**
 
 ## 贡献
 我们欢迎并重视任何形式的贡献与合作:
diff --git a/docs/environment.zh.md b/docs/cn/environment.zh.md
similarity index 100%
rename from docs/environment.zh.md
rename to docs/cn/environment.zh.md
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 00000000..b36640b6
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,11 @@
+# Ascend plugin for vLLM
+This plugin allows you to use Ascend as a backend for vLLM.
+
+## Contents
+
+- [Quick Start](./quick_start.md)
+- [Installation](./installation.md)
+- Usage
+  - [Running vLLM with Ascend](./usage/running_vllm_with_ascend.md)
+  - [Feature Support](./usage/feature_support.md)
+  - [Supported Models](./usage/supported_models.md)
diff --git a/docs/environment.md b/docs/installation.md
similarity index 83%
rename from docs/environment.md
rename to docs/installation.md
index 5dd70b29..b09c6d35 100644
--- a/docs/environment.md
+++ b/docs/installation.md
@@ -1,3 +1,27 @@
+# Installation
+
+
+## Building
+
+#### Build Python package from source
+
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+pip install -e .
+```
+
+#### Build container image from source
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+docker build -t vllm-ascend-dev-image -f ./Dockerfile .
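+# A sketch of how the built image might be run (an illustrative assumption,
+# not part of the build itself; device and driver paths vary per host):
+#   docker run --rm -it --device /dev/davinci0 \
+#     -v /usr/local/Ascend/driver:/usr/local/Ascend/driver vllm-ascend-dev-image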
+```
+
 ### Prepare Ascend NPU environment
 
 ### Dependencies
diff --git a/docs/quick_start.md b/docs/quick_start.md
new file mode 100644
index 00000000..548eb5ac
--- /dev/null
+++ b/docs/quick_start.md
@@ -0,0 +1,17 @@
+# Quick Start
+
+## Prerequisites
+### Supported Devices
+- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
+- Atlas 800I A2 Inference series (Atlas 800I A2)
+
+### Dependencies
+| Requirement | Supported version | Recommended version | Note |
+|-------------|-------------------| ----------- |------------------------------------------|
+| vLLM | main | main | Required for vllm-ascend |
+| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
+| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
+| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
+| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
+
+Find more about how to set up your environment [here](./installation.md).
\ No newline at end of file
diff --git a/docs/supported_models.md b/docs/supported_models.md
deleted file mode 100644
index a0990367..00000000
--- a/docs/supported_models.md
+++ /dev/null
@@ -1 +0,0 @@
-TBD
diff --git a/docs/usage/feature_support.md b/docs/usage/feature_support.md
new file mode 100644
index 00000000..b2ea2bec
--- /dev/null
+++ b/docs/usage/feature_support.md
@@ -0,0 +1,19 @@
+# Feature Support
+
+| Feature | Supported | Note |
+|---------|-----------|------|
+| Chunked Prefill | ✗ | Planned for 2025 Q1 |
+| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
+| LoRA | ✗ | Planned for 2025 Q1 |
+| Prompt adapter | ✅ ||
+| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
+| Pooling | ✗ | Planned for 2025 Q1 |
+| Enc-dec | ✗ | Planned for 2025 Q1 |
+| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
+| LogProbs | ✅ ||
+| Prompt logProbs | ✅ ||
+| Async output | ✅ ||
+| Multi step scheduler | ✅ ||
+| Best of | ✅ ||
+| Beam search | ✅ ||
+| Guided Decoding | ✗ | Planned for 2025 Q1 |
diff --git a/docs/usage/running_vllm_with_ascend.md b/docs/usage/running_vllm_with_ascend.md
new file mode 100644
index 00000000..d86dc25a
--- /dev/null
+++ b/docs/usage/running_vllm_with_ascend.md
@@ -0,0 +1 @@
+# Running vLLM with Ascend
\ No newline at end of file
diff --git a/docs/usage/supported_models.md b/docs/usage/supported_models.md
new file mode 100644
index 00000000..edf3df6c
--- /dev/null
+++ b/docs/usage/supported_models.md
@@ -0,0 +1,41 @@
+# Supported Models
+
+| Model | Supported | Note |
+|---------|-----------|------|
+| Qwen 2.5 | ✅ ||
+| Mistral | | Needs testing |
+| DeepSeek v2.5 | | Needs testing |
+| Llama 3.1/3.2 | ✅ ||
+| Gemma-2 | | Needs testing |
+| Baichuan | | Needs testing |
+| MiniCPM | | Needs testing |
+| InternLM | ✅ ||
+| ChatGLM | ✅ ||
+| InternVL 2.5 | ✅ ||
+| Qwen2-VL | ✅ ||
+| GLM-4v | | Needs testing |
+| Molmo | ✅ ||
+| LLaVA 1.5 | ✅ ||
+| Mllama | | Needs testing |
+| LLaVA-Next | | Needs testing |
+| LLaVA-Next-Video | | Needs testing |
+| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
+| Ultravox | | Needs testing |
+| Qwen2-Audio | ✅ ||
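+
+To spot-check a supported model end to end, vLLM's offline API can be used. The
+sketch below is illustrative: the model name and sampling values are arbitrary,
+and it assumes `vllm` and `vllm-ascend` are installed (the plugin is discovered
+automatically).
+
+```python
+from vllm import LLM, SamplingParams
+
+# Load one of the supported models; vLLM dispatches to the Ascend backend
+# through the installed vllm-ascend plugin.
+llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
+
+# Run a short generation to confirm the model works on the NPU.
+outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
+print(outputs[0].outputs[0].text)
+```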