[Doc] Add release note (#59)
Add release note template and init the first release note content

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
wangxiyuan and Yikun authored Feb 18, 2025
1 parent 7cc024a commit 7606977
Showing 7 changed files with 91 additions and 17 deletions.
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -77,6 +77,7 @@
'.DS_Store',
'.venv',
'README.md',
'user_guide/release.template.md',
# TODO(yikun): Remove this after zh supported
'**/*.zh.md'
]
7 changes: 4 additions & 3 deletions docs/source/index.md
@@ -40,10 +40,11 @@ tutorials

% What does vLLM Ascend Plugin support?
:::{toctree}
:caption: Features
:caption: User Guide
:maxdepth: 1
features/suppoted_features
features/supported_models
user_guide/suppoted_features
user_guide/supported_models
user_guide/release_notes
:::

% How to contribute to the vLLM project
67 changes: 53 additions & 14 deletions docs/source/installation.md
@@ -5,7 +5,7 @@ This document describes how to install vllm-ascend manually.
## Requirements

- OS: Linux
- Python: 3.10 or higher
- Python: 3.9 or higher
- Hardware with an Ascend NPU (usually the Atlas 800 A2 series)
- Software:

@@ -15,11 +15,15 @@ This document describes how to install vllm-ascend manually.
| torch-npu | >= 2.5.1rc1 | Required for vllm-ascend |
| torch | >= 2.5.1 | Required for torch-npu and vllm |

You have two ways to install vllm-ascend, as sketched below:
- **Using pip**: first prepare the environment manually or via a CANN image, then install `vllm-ascend` using pip.
- **Using docker**: use the prebuilt `vllm-ascend` docker image directly.
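
For instance, a minimal sketch of the two paths (the package name and image location below are assumptions; check the project docs for the exact ones):

```bash
# Option 1: pip, after preparing the CANN environment as described below
pip install vllm-ascend

# Option 2: docker, pulling an assumed prebuilt image location/tag
docker pull quay.io/ascend/vllm-ascend:latest
```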

## Configure a new environment

Before installing, you need to make sure firmware/driver and CANN is installed correctly.
Before installing, you need to make sure the firmware/driver and CANN are installed correctly. Refer to [this guide](https://ascend.github.io/docs/sources/ascend/quick_install.html) for more details.

### Install firmwares and drivers
### Configure hardware environment

To verify that the Ascend NPU firmware and driver were correctly installed, run:

@@ -29,16 +29,16 @@ npu-smi info

Refer to [Ascend Environment Setup Guide](https://ascend.github.io/docs/sources/ascend/quick_install.html) for more details.
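
If `npu-smi` reports your device correctly, you are done. As an extra sanity check (assuming the default driver install path, which may differ on your system), you can also inspect the device nodes and driver version directly:

```bash
# NPU device nodes should be visible as /dev/davinci0 .. /dev/davinci7
ls /dev/davinci*

# Driver version file at the assumed default install location
cat /usr/local/Ascend/driver/version.info
```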

### Install CANN
### Configure software environment

:::::{tab-set}
:sync-group: install

::::{tab-item} Using pip
::::{tab-item} Before using pip
:selected:
:sync: pip

The easiest way to prepare your CANN environment is using container directly:
The easiest way to prepare your software environment is to use the CANN image directly:

```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
@@ -59,27 +63,38 @@ docker run --rm \
```
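
A minimal sketch of a typical `docker run` invocation for an Ascend container follows; the device mounts and image tag are assumptions based on common Ascend container setups, so adjust them to your system:

```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7
# Assumed CANN image location and tag
IMAGE=quay.io/ascend/cann:latest

docker run --rm -it \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    $IMAGE bash
```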

You can also install CANN manually:
> NOTE: This guide uses aarch64 as an example. If you run on x86, replace `aarch64` with `x86_64` in the package names shown below.
```bash
# Create a virtual environment
python -m venv vllm-ascend-env
source vllm-ascend-env/bin/activate

# Install required python packages.
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple attrs numpy==1.24.0 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple attrs 'numpy<2.0.0' decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions

# Download and install the CANN package.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-toolkit_8.0.0_linux-aarch64.run
sh Ascend-cann-toolkit_8.0.0_linux-aarch64.run --full
chmod +x ./Ascend-cann-toolkit_8.0.0_linux-aarch64.run
./Ascend-cann-toolkit_8.0.0_linux-aarch64.run --full

wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run
sh Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run --full
chmod +x ./Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run
./Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run --install

wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-nnal_8.0.0_linux-aarch64.run
chmod +x ./Ascend-cann-nnal_8.0.0_linux-aarch64.run
./Ascend-cann-nnal_8.0.0_linux-aarch64.run --install

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/set_env.sh
```
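
After sourcing the two `set_env.sh` scripts, a quick sanity check (assuming `set_env.sh` exports `ASCEND_TOOLKIT_HOME`, as recent CANN versions do):

```bash
# Should print the toolkit install path, e.g. /usr/local/Ascend/ascend-toolkit/latest
echo $ASCEND_TOOLKIT_HOME
```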

::::

::::{tab-item} Using Docker
::::{tab-item} Before using docker
:sync: docker
No more extra step if you are using `vllm-ascend` image.
No extra steps are needed if you are using the prebuilt `vllm-ascend` docker image.
::::
:::::

@@ -120,8 +135,6 @@ pip install -e . -f https://download.pytorch.org/whl/torch/
You can just pull the **prebuilt image** and run it with bash.

```bash


# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7
# Update the vllm-ascend image
@@ -172,7 +185,7 @@ prompts = [
# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
# Create an LLM.
llm = LLM(model="facebook/opt-125m")
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
@@ -188,3 +201,29 @@ Then run:
# export VLLM_USE_MODELSCOPE=true to speed up download if huggingface is not reachable.
python example.py
```

The output will look similar to:

```bash
INFO 02-18 02:33:37 __init__.py:28] Available plugins for group vllm.platform_plugins:
INFO 02-18 02:33:37 __init__.py:30] name=ascend, value=vllm_ascend:register
INFO 02-18 02:33:37 __init__.py:32] all available plugins for group vllm.platform_plugins will be loaded.
INFO 02-18 02:33:37 __init__.py:34] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 02-18 02:33:37 __init__.py:42] plugin ascend loaded.
INFO 02-18 02:33:37 __init__.py:174] Platform plugin ascend is activated
INFO 02-18 02:33:50 config.py:526] This model supports multiple tasks: {'reward', 'embed', 'generate', 'score', 'classify'}. Defaulting to 'generate'.
INFO 02-18 02:33:50 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) with config: model='Qwen/Qwen2.5-0.5B-Instruct', speculative_config=None, tokenizer='./opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=./opt-125m, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 02-18 02:33:52 importing.py:14] Triton not installed or not compatible; certain GPU-related functions will not be available.
Loading pt checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 4.30it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 4.29it/s]

INFO 02-18 02:33:59 executor_base.py:108] # CPU blocks: 98559, # CPU blocks: 7281
INFO 02-18 02:33:59 executor_base.py:113] Maximum concurrency for 2048 tokens per request: 769.99x
INFO 02-18 02:33:59 llm_engine.py:429] init engine (profile, create kv cache, warmup model) took 1.52 seconds
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.92it/s, est. speed input: 31.99 toks/s, output: 78.73 toks/s]
Prompt: 'Hello, my name is', Generated text: ' John, I am the daughter of Bill and Jocelyn, I am married'
Prompt: 'The president of the United States is', Generated text: " States President. I don't like him.\nThis is my favorite comment so"
Prompt: 'The capital of France is', Generated text: " Texas and everyone I've spoken to in the city knows the state's name,"
Prompt: 'The future of AI is', Generated text: ' people trying to turn a good computer into a machine, not a computer being human'
```
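
Beyond the offline batch-inference script above, vLLM's standard OpenAI-compatible server entry point should work the same way. A minimal sketch using stock vLLM usage, not specific to this guide:

```bash
# Start an OpenAI-compatible API server on the default port 8000
vllm serve Qwen/Qwen2.5-0.5B-Instruct

# In another shell, query it (assumes curl is available)
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32}'
```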
13 changes: 13 additions & 0 deletions docs/source/user_guide/release.template.md
@@ -0,0 +1,13 @@
## {version}
### Highlights
- {feature}
### Bug fixes
- {bug}
### Other changes
- {change}
### Known issues
- {issue}
### Upgrade Notes
- {upgrade}
### Deprecation Notes
- {deprecation}
20 changes: 20 additions & 0 deletions docs/source/user_guide/release_notes.md
@@ -0,0 +1,20 @@
# Release Notes

## v0.7.1.rc1

We are excited to announce the release candidate of v0.7.1 for vllm-ascend. vllm-ascend is a community-maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.

Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19).

### Highlights

- This is the first release that officially supports running vLLM on the Ascend NPU. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/latest/) to start the journey.

### Other changes

- Added the Ascend quantization config option; the implementation is coming soon.

### Known issues

- This release relies on an unreleased torch_npu version. Please [install](https://vllm-ascend.readthedocs.io/en/latest/installation.html) it manually.
- Logs like `No platform detected, vLLM is running on UnspecifiedPlatform` or `Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")` may appear when running vllm-ascend. They do not affect functionality or performance and can safely be ignored. This has been fixed in this [PR](https://github.com/vllm-project/vllm/pull/12432) and will be included in v0.7.3.
File renamed without changes.
File renamed without changes.
