diff --git a/README.md b/README.md
index 728771ae..52c1e31b 100644
--- a/README.md
+++ b/README.md
@@ -31,20 +31,11 @@ This plugin is the recommended approach for supporting the Ascend backend within
By using the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.

## Prerequisites
-### Support Devices
-- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 Inference series (Atlas 800I A2)
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------| ----------- |------------------------------------------|
-| vLLM | main | main | Required for vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0

-Find more about how to setup your environment in [here](docs/environment.md).
+Find out how to set up your environment step by step in [installation](docs/installation.md).

## Getting Started

@@ -73,78 +64,14 @@ Run the following command to start the vLLM server with the [Qwen/Qwen2.5-0.5B-I
vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```
-
-Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
-
-## Building
-
-#### Build Python package from source
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### Build container image from source
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-See [Building and Testing](./CONTRIBUTING.md) for more details, which is a step-by-step guide to help you set up development environment, build and test.
-
-## Feature Support Matrix
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ✗ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Imporve performance in 2025 Q1 |
-| LoRA | ✗ | Plan in 2025 Q1 |
-| Prompt adapter | ✅ ||
-| Speculative decoding | ✅ | Impore accuracy in 2025 Q1|
-| Pooling | ✗ | Plan in 2025 Q1 |
-| Enc-dec | ✗ | Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ✅ ||
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✗ | Plan in 2025 Q1 |
-
-## Model Support Matrix
-
-The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
-| Mistral | | Need test |
-| DeepSeek v2.5 | |Need test |
-| LLama3.1/3.2 | ✅ ||
-| Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
-| GLM-4v | |Need test|
-| Molomo | ✅ ||
-| LLaVA 1.5 | ✅ ||
-| Mllama | |Need test|
-| LLaVA-Next | |Need test|
-| LLaVA-Next-Video | |Need test|
-| Phi-3-Vison/Phi-3.5-Vison | |Need test|
-| Ultravox | |Need test|
-| Qwen2-Audio | ✅ ||
+**Please refer to [Official Docs](./docs/index.md) for more details.**

## Contributing
+See [CONTRIBUTING](./CONTRIBUTING.md) for more details; it is a step-by-step guide that helps you set up the development environment, build, and test.
+
We welcome and value any contributions and collaborations:
- Please feel free to comment [here](https://github.com/vllm-project/vllm-ascend/issues/19) about your usage of the vLLM Ascend Plugin.
- Please let us know if you encounter a bug by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
-- Please see the guidance on how to contribute in [CONTRIBUTING.md](./CONTRIBUTING.md).

## License
diff --git a/README.zh.md b/README.zh.md
index c6fc9fb8..f5addc46 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -30,21 +30,12 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个让vLLM在Ascend NPU无缝运行的
使用 vLLM 昇腾插件,可以让类Transformer、混合专家(MOE)、嵌入、多模态等流行的大语言模型在 Ascend NPU 上无缝运行。

-## 前提
-### 支持的设备
-- Atlas A2 训练系列 (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 推理系列 (Atlas 800I A2)
-
-### 依赖
-| 需求 | 支持的版本 | 推荐版本 | 注意 |
-|-------------|-------------------| ----------- |------------------------------------------|
-| vLLM | main | main | vllm-ascend 依赖 |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | vllm 依赖 |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | vllm-ascend and torch-npu 依赖 |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | vllm-ascend 依赖 |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | torch-npu and vllm 依赖 |
-
-在[此处](docs/environment.zh.md)了解更多如何配置您环境的信息。
+## 准备
+
+- 硬件:Atlas 800I A2 推理系列、Atlas A2 训练系列
+- 软件:vLLM(与 vllm-ascend 版本相同)、Python >= 3.9、CANN >= 8.0.RC2、PyTorch >= 2.4.0、torch-npu >= 2.4.0
+
+在[此处](docs/installation.md)了解如何逐步配置您的环境。

## 开始使用

@@ -74,78 +65,14 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```
-请参阅 [vLLM 快速入门](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)以获取更多详细信息。
-
-## 构建
-
-#### 从源码构建Python包
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### 构建容器镜像
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-查看[构建和测试](./CONTRIBUTING.zh.md)以获取更多详细信息,其中包含逐步指南,帮助您设置开发环境、构建和测试。
-
-## 特性支持矩阵
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ✗ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Imporve performance in 2025 Q1 |
-| LoRA | ✗ | Plan in 2025 Q1 |
-| Prompt adapter | ✅ ||
-| Speculative decoding | ✅ | Impore accuracy in 2025 Q1|
-| Pooling | ✗ | Plan in 2025 Q1 |
-| Enc-dec | ✗ | Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ✅ ||
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✗ | Plan in 2025 Q1 |
-
-## 模型支持矩阵
-
-此处展示了部分受支持的模型。有关更多详细信息,请参阅 [supported_models](docs/supported_models.md):
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
-| Mistral | | Need test |
-| DeepSeek v2.5 | |Need test |
-| LLama3.1/3.2 | ✅ ||
-| Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
-| GLM-4v | |Need test|
-| Molomo | ✅ ||
-| LLaVA 1.5 | ✅ ||
-| Mllama | |Need test|
-| LLaVA-Next | |Need test|
-| LLaVA-Next-Video | |Need test|
-| Phi-3-Vison/Phi-3.5-Vison | |Need test|
-| Ultravox | |Need test|
-| Qwen2-Audio | ✅ ||
-
+**请参阅[官方文档](./docs/index.md)以获取更多详细信息。**

## 贡献
+有关更多详细信息,请参阅 [CONTRIBUTING](./CONTRIBUTING.md),其中的分步指南可以帮助您搭建开发环境、完成构建和测试。
+
我们欢迎并重视任何形式的贡献与合作:
- 您可以在[这里](https://github.com/vllm-project/vllm-ascend/issues/19)反馈您的使用体验。
- 请通过[提交问题](https://github.com/vllm-project/vllm-ascend/issues)来告知我们您遇到的任何错误。
-- 请参阅 [CONTRIBUTING.zh.md](./CONTRIBUTING.zh.md) 中的贡献指南。

## 许可证
diff --git a/docs/environment.zh.md b/docs/environment.zh.md
deleted file mode 100644
index ceddf608..00000000
--- a/docs/environment.zh.md
+++ /dev/null
@@ -1,38 +0,0 @@
-### 昇腾NPU环境准备
-
-### 依赖
-| 需求 | 支持的版本 | 推荐版本 | 注意 |
-|-------------|-------------------| ----------- |------------------------------------------|
-| vLLM | main | main | vllm-ascend 依赖 |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | vllm 依赖 |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | vllm-ascend and torch-npu 依赖 |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | vllm-ascend 依赖 |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | torch-npu and vllm 依赖 |
-
-
-以下为安装推荐版本软件的简短说明:
-
-#### 容器化安装
-
-您可以直接使用[容器镜像](https://hub.docker.com/r/ascendai/cann),只需一行命令即可:
-
-```bash
-docker run \
-    --name vllm-ascend-env \
-    --device /dev/davinci1 \
-    --device /dev/davinci_manager \
-    --device /dev/devmm_svm \
-    --device /dev/hisi_hdc \
-    -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -it quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10 bash
-```
-
-您无需手动安装 `torch` 和 `torch_npu`,它们将作为 `vllm-ascend` 依赖项自动安装。
-
-#### 手动安装
-
-您也可以选择手动安装,按照[昇腾安装指南](https://ascend.github.io/docs/sources/ascend/quick_install.html)中提供的说明配置环境。
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 00000000..860501b3
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,15 @@
+# Ascend plugin for vLLM
+vLLM Ascend plugin (vllm-ascend) is a community-maintained hardware plugin for running vLLM on the Ascend NPU.
+
+This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162), providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM.
+
+By using the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.
+
+## Contents
+
+- [Quick Start](./quick_start.md)
+- [Installation](./installation.md)
+- Usage
+  - [Running vLLM with Ascend](./usage/running_vllm_with_ascend.md)
+  - [Feature Support](./usage/feature_support.md)
+  - [Supported Models](./usage/supported_models.md)
diff --git a/docs/environment.md b/docs/installation.md
similarity index 83%
rename from docs/environment.md
rename to docs/installation.md
index 5dd70b29..b09c6d35 100644
--- a/docs/environment.md
+++ b/docs/installation.md
@@ -1,3 +1,23 @@
+# Installation
+
+## Building
+
+### Build Python package from source
+
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+pip install -e .
+```
+
+### Build container image from source
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+docker build -t vllm-ascend-dev-image -f ./Dockerfile .
+```
+
### Prepare Ascend NPU environment

### Dependencies
diff --git a/docs/quick_start.md b/docs/quick_start.md
new file mode 100644
index 00000000..548eb5ac
--- /dev/null
+++ b/docs/quick_start.md
@@ -0,0 +1,17 @@
+# Quick Start
+
+## Prerequisites
+### Support Devices
+- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
+- Atlas 800I A2 Inference series (Atlas 800I A2)
+
+### Dependencies
+| Requirement | Supported version | Recommended version | Note |
+|-------------|-------------------| ----------- |------------------------------------------|
+| vLLM | main | main | Required for vllm-ascend |
+| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
+| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
+| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
+| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
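+
+To verify that the torch and torch-npu side of these dependencies is wired up correctly, you can run a quick probe (a minimal sketch; it assumes torch and torch_npu are already installed per the table above):
+
+```bash
+# Should print the torch version, then True if the NPU device is reachable
+python -c "import torch; import torch_npu; print(torch.__version__, torch.npu.is_available())"
+```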
+
+Find out more about how to set up your environment in [installation](./installation.md).
\ No newline at end of file
diff --git a/docs/supported_models.md b/docs/supported_models.md
deleted file mode 100644
index a0990367..00000000
--- a/docs/supported_models.md
+++ /dev/null
@@ -1 +0,0 @@
-TBD
diff --git a/docs/usage/feature_support.md b/docs/usage/feature_support.md
new file mode 100644
index 00000000..b13bbb2d
--- /dev/null
+++ b/docs/usage/feature_support.md
@@ -0,0 +1,19 @@
+# Feature Support
+
+| Feature | Supported | Note |
+|---------|-----------|------|
+| Chunked Prefill | ✗ | Plan in 2025 Q1 |
+| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
+| LoRA | ✗ | Plan in 2025 Q1 |
+| Prompt adapter | ✅ ||
+| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
+| Pooling | ✗ | Plan in 2025 Q1 |
+| Enc-dec | ✗ | Plan in 2025 Q1 |
+| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
+| LogProbs | ✅ ||
+| Prompt logProbs | ✅ ||
+| Async output | ✅ ||
+| Multi step scheduler | ✅ ||
+| Best of | ✅ ||
+| Beam search | ✅ ||
+| Guided Decoding | ✗ | Plan in 2025 Q1 |
diff --git a/docs/usage/running_vllm_with_ascend.md b/docs/usage/running_vllm_with_ascend.md
new file mode 100644
index 00000000..03de8dd5
--- /dev/null
+++ b/docs/usage/running_vllm_with_ascend.md
@@ -0,0 +1 @@
+# Running vLLM with Ascend
\ No newline at end of file
diff --git a/docs/usage/supported_models.md b/docs/usage/supported_models.md
new file mode 100644
index 00000000..edf3df6c
--- /dev/null
+++ b/docs/usage/supported_models.md
@@ -0,0 +1,24 @@
+# Supported Models
+
+| Model | Supported | Note |
+|---------|-----------|------|
+| Qwen 2.5 | ✅ ||
+| Mistral | | Need test |
+| DeepSeek v2.5 | | Need test |
+| Llama 3.1/3.2 | ✅ ||
+| Gemma-2 | | Need test |
+| Baichuan | | Need test |
+| MiniCPM | | Need test |
+| InternLM | ✅ ||
+| ChatGLM | ✅ ||
+| InternVL 2.5 | ✅ ||
+| Qwen2-VL | ✅ ||
+| GLM-4v | | Need test |
+| Molmo | ✅ ||
+| LLaVA 1.5 | ✅ ||
+| Mllama | | Need test |
+| LLaVA-Next | | Need test |
+| LLaVA-Next-Video | | Need test |
+| Phi-3-Vision/Phi-3.5-Vision | | Need test |
+| Ultravox | | Need test |
+| Qwen2-Audio | ✅ ||
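+
+A quick way to smoke-test any model marked ✅ above is to serve it with vLLM and query the OpenAI-compatible API. The snippet below is a minimal sketch, not part of the support matrix; the model ID and prompt are placeholders, so substitute the model you want to verify:
+
+```bash
+# Start the server with a supported model (downloads from the Hugging Face Hub by default)
+vllm serve Qwen/Qwen2.5-0.5B-Instruct
+
+# In another shell, send a completion request to the OpenAI-compatible endpoint
+curl http://localhost:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32}'
+```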