Update doc
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wangxiyuan committed Feb 11, 2025
1 parent 7006835 commit d51ed19
Showing 10 changed files with 94 additions and 165 deletions.
83 changes: 1 addition & 82 deletions README.md
@@ -30,22 +30,6 @@ This plugin is the recommended approach for supporting the Ascend backend within

By using vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, Embedding, Multi-modal LLMs can run seamlessly on the Ascend NPU.

## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------| ----------- |------------------------------------------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |

Find out more about how to set up your environment [here](docs/environment.md).

## Getting Started

> [!NOTE]
@@ -73,72 +57,7 @@ Run the following command to start the vLLM server with the [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:
vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```
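
Once the server is running, you can exercise the OpenAI-compatible API it exposes. A minimal sketch of a completion request (the prompt and sampling parameters are arbitrary examples):

```bash
# Send a test completion request to the server started above.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "The capital of France is",
        "max_tokens": 32,
        "temperature": 0
    }'
```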

Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.

## Building

#### Build Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```
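
To sanity-check the editable install, try importing the package (assuming the installed module is named `vllm_ascend`, matching the repository name; adjust if the project uses a different module name):

```bash
# The module name `vllm_ascend` is an assumption based on the repository name.
python -c "import vllm_ascend; print('vllm-ascend is importable')"
```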

#### Build container image from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```
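
To run the resulting image, the container needs access to the host's NPU devices and driver. A sketch of a typical invocation (the device paths and driver mounts are assumptions that depend on how the Ascend driver is installed on your host):

```bash
# Illustrative only: adjust device paths and mounts to match your host setup.
docker run -it --rm \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi:ro \
    vllm-ascend-dev-image bash
```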

See [Building and Testing](./CONTRIBUTING.md) for a step-by-step guide to setting up the development environment, building, and testing.

## Feature Support Matrix
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ❌ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ❌ | Planned for 2025 Q1 |
| Prompt adapter | ✅ | |
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ❌ | Planned for 2025 Q1 |
| Enc-dec | ❌ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✅ | |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ❌ | Planned for 2025 Q1 |

## Model Support Matrix

The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Needs testing |
| DeepSeek v2.5 | | Needs testing |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Needs testing |
| Baichuan | | Needs testing |
| MiniCPM | | Needs testing |
| InternLM | ✅ | |
| ChatGLM | ✅ | |
| InternVL 2.5 | ✅ | |
| Qwen2-VL | ✅ | |
| GLM-4v | | Needs testing |
| Molmo | ✅ | |
| LLaVA 1.5 | ✅ | |
| Mllama | | Needs testing |
| LLaVA-Next | | Needs testing |
| LLaVA-Next-Video | | Needs testing |
| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
| Ultravox | | Needs testing |
| Qwen2-Audio | ✅ | |

**Please refer to [Official Docs](./docs/index.md) for more details.**

## Contributing
We welcome and value any contributions and collaborations:
83 changes: 1 addition & 82 deletions README.zh.md
@@ -30,22 +30,6 @@ The vLLM Ascend plugin (`vllm-ascend`) is a plugin that lets vLLM run seamlessly on the Ascend NPU

With the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.

## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------|---------------------|------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |

Find out more about how to set up your environment [here](docs/environment.zh.md).

## Getting Started

> [!NOTE]
@@ -74,72 +58,7 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```

Please refer to the [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.

## Building

#### Build Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```

#### Build container image from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```

See [Building and Testing](./CONTRIBUTING.zh.md) for a step-by-step guide to setting up the development environment, building, and testing.

## Feature Support Matrix
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ❌ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ❌ | Planned for 2025 Q1 |
| Prompt adapter | ✅ | |
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ❌ | Planned for 2025 Q1 |
| Enc-dec | ❌ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✅ | |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ❌ | Planned for 2025 Q1 |

## Model Support Matrix

The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Needs testing |
| DeepSeek v2.5 | | Needs testing |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Needs testing |
| Baichuan | | Needs testing |
| MiniCPM | | Needs testing |
| InternLM | ✅ | |
| ChatGLM | ✅ | |
| InternVL 2.5 | ✅ | |
| Qwen2-VL | ✅ | |
| GLM-4v | | Needs testing |
| Molmo | ✅ | |
| LLaVA 1.5 | ✅ | |
| Mllama | | Needs testing |
| LLaVA-Next | | Needs testing |
| LLaVA-Next-Video | | Needs testing |
| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
| Ultravox | | Needs testing |
| Qwen2-Audio | ✅ | |

**Please refer to the [Official Docs](./docs/index.md) for more details.**

## Contributing
We welcome and value any contributions and collaborations:
File renamed without changes.
11 changes: 11 additions & 0 deletions docs/index.md
@@ -0,0 +1,11 @@
# Ascend plugin for vLLM
This plugin allows you to use Ascend as a backend for vLLM.

## Contents

- [Quick Start](./quick_start.md)
- [Installation](./installation.md)
- Usage
- [Running vLLM with Ascend](./usage/running_vllm_with_ascend.md)
- [Feature Support](./usage/feature_support.md)
- [Supported Models](./usage/supported_models.md)
20 changes: 20 additions & 0 deletions docs/environment.md → docs/installation.md
@@ -1,3 +1,23 @@
# Installation


## Building

#### Build Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```

#### Build container image from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```

### Prepare Ascend NPU environment

### Dependencies
17 changes: 17 additions & 0 deletions docs/quick_start.md
@@ -0,0 +1,17 @@
# Quick Start

## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------| ----------- |------------------------------------------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |

Find out more about how to set up your environment in [Installation](./installation.md).
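
After the dependencies are installed, a quick way to confirm the stack is wired up is to check that the driver sees the NPUs and that torch-npu loads. A hedged sketch (`npu-smi` ships with the Ascend driver; `torch_npu` registers the `npu` device in torch):

```bash
# Confirm the Ascend driver can enumerate the NPUs.
npu-smi info

# Confirm torch-npu imports and reports an available NPU device.
python -c "import torch; import torch_npu; print(torch.npu.is_available())"
```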
1 change: 0 additions & 1 deletion docs/supported_models.md

This file was deleted.

19 changes: 19 additions & 0 deletions docs/usage/feature_support.md
@@ -0,0 +1,19 @@
# Feature Support

| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ❌ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ❌ | Planned for 2025 Q1 |
| Prompt adapter | ✅ | |
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ❌ | Planned for 2025 Q1 |
| Enc-dec | ❌ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✅ | |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ❌ | Planned for 2025 Q1 |
1 change: 1 addition & 0 deletions docs/usage/running_vllm_with_ascend.md
@@ -0,0 +1 @@
# Running vLLM with Ascend
24 changes: 24 additions & 0 deletions docs/usage/supported_models.md
@@ -0,0 +1,24 @@
# Supported Models

| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Needs testing |
| DeepSeek v2.5 | | Needs testing |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Needs testing |
| Baichuan | | Needs testing |
| MiniCPM | | Needs testing |
| InternLM | ✅ | |
| ChatGLM | ✅ | |
| InternVL 2.5 | ✅ | |
| Qwen2-VL | ✅ | |
| GLM-4v | | Needs testing |
| Molmo | ✅ | |
| LLaVA 1.5 | ✅ | |
| Mllama | | Needs testing |
| LLaVA-Next | | Needs testing |
| LLaVA-Next-Video | | Needs testing |
| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
| Ultravox | | Needs testing |
| Qwen2-Audio | ✅ | |
