diff --git a/docs/common/dev/_rkllm-deepseek-r1.mdx b/docs/common/dev/_rkllm-deepseek-r1.mdx new file mode 100644 index 000000000..a9efe329c --- /dev/null +++ b/docs/common/dev/_rkllm-deepseek-r1.mdx @@ -0,0 +1,100 @@ +[DeepSeek-R1](https://api-docs.deepseek.com/news/news250120) 是由杭州[深度求索](https://www.deepseek.com/)公司开发的推理模型, +该模型完全开源了所有训练技术和模型权重,性能对齐闭源的 OpenAI-o1。 +DeepSeek 还利用 DeepSeek-R1 的输出为开源社区蒸馏了 6 个小模型,分别基于 Qwen2.5 和 Llama3.1。 +本文档将讲述如何使用 RKLLM 将 DeepSeek-R1 蒸馏模型 [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) 部署到 RK3588 上,并利用 NPU 进行硬件加速推理。 + +![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_ds_1.webp) + +### 模型文件下载 + +:::tip +瑞莎已提供编译好的 rkllm 模型和可执行文件,用户可直接下载使用;如需了解编译过程,可继续阅读下面的可选部分 +::: + +- 使用 [git LFS](https://git-lfs.com/) 从 [ModelScope](https://modelscope.cn/models/radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM) 下载预编译好的 rkllm 模型 + +```bash +git clone https://www.modelscope.cn/radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM.git +``` + +### (可选)模型编译 + +:::tip +请用户根据 [RKLLM 安装](./rkllm_install) 完成 PC 端和开发板端 RKLLM 工作环境的准备 +::: + +- x86 PC 工作站中下载 [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) 权重文件,如未安装 [git-lfs](https://git-lfs.com/),请自行安装 + ```bash + git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B + ``` +- 激活 rkllm conda 环境,可参考 [RKLLM conda 安装](rkllm_install#x86-pc-工作站) + ```bash + conda activate rkllm + ``` +- 更改 `rknn-llm/rkllm-toolkit/examples/test.py` 中的 modelpath 模型路径、dataset 路径和 rkllm 导出路径 + ```python + 15 modelpath = 'Your DeepSeek-R1-Distill-Qwen-1.5B Folder Path' + 29 dataset = None # 默认是 "./data_quant.json",如无量化校准数据可填写 None + 83 ret = llm.export_rkllm("./DeepSeek-R1-Distill-Qwen-1.5B.rkllm") + ``` +- 运行模型转换脚本 + ```bash + cd rknn-llm/rkllm-toolkit/examples/ + python3 test.py + ``` + 转换成功后可得到 DeepSeek-R1-Distill-Qwen-1.5B.rkllm 模型 + +### (可选)编译可执行文件 + +- 下载交叉编译工具链 [gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu](https://developer.arm.com/downloads/-/gnu-a/10-2-2020-11) +- 修改主程序 `rknn-llm/examples/rkllm_api_demo/src/llm_demo.cpp` 代码,这里修改 PROMPT 格式的定义和 PROMPT 的构造 + + ```cpp + 24 #define PROMPT_TEXT_PREFIX "<|im_start|>system\nYou are a helpful assistant.\n<|im_end|>\n<|im_start|>user\n" + 25 #define PROMPT_TEXT_POSTFIX "\n<|im_end|>\n<|im_start|>assistant\n" + 184 text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX; + 185 // text = input_str; + ``` + + :::tip + 为什么要修改 PROMPT_TEXT_PREFIX 和 PROMPT_TEXT_POSTFIX? 
这里需要参考 [DeepSeek-R1 论文](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf)中 Table1 的说明,需按照其格式对 DeepSeek-R1 模型提问 + ::: + ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_ds_2.webp) + +- 修改 `rknn-llm/examples/rkllm_api_demo/build-linux.sh` 编译脚本中 `GCC_COMPILER_PATH` 路径 + ```bash + GCC_COMPILER_PATH=gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu + ``` +- 运行模型转换脚本 + ```bash + cd rknn-llm/examples/rkllm_api_demo/ + bash build-linux.sh + ``` + 生成的可执行文件在 `build/build_linux_aarch64_Release/llm_demo` + +### 板端部署 + +#### 终端模式 + +- 将转换成功后的 DeepSeek-R1-Distill-Qwen-1.5B.rkllm 模型与编译后的二进制文件 llm_demo 复制到板端 +- 导入环境变量 + ```bash + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64 + ``` + :::tip + 使用 ModelScope 下载的用户可直接 export 下载仓库里的 librkllmrt.so + ::: +- 运行 llm_demo,输入 `exit` 退出 + ```bash + export RKLLM_LOG_LEVEL=1 + ./llm_demo DeepSeek-R1-Distill-Qwen-1.5B.rkllm 10000 10000 + ``` + ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_ds_3.webp) + +### 性能分析 + +对于数学问题: `解方程 x+y=12, 2x+4y=34, 求x,y的值`, 在 RK3588 上达 14.93 token/s +| Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second | +|----------|-----------------|--------|---------------------|-------------------| +| Prefill | 429.63 | 81 | 5.30 | 188.53 | +| Generate | 56103.71 | 851 | 66.99 | 14.93 | diff --git a/docs/common/dev/_rkllm-install.mdx b/docs/common/dev/_rkllm-install.mdx index 5d0b4c53f..ade16188a 100644 --- a/docs/common/dev/_rkllm-install.mdx +++ b/docs/common/dev/_rkllm-install.mdx @@ -6,15 +6,17 @@ RKLLM 可以帮助用户快速将 LLM 模型部署到 Rockchip 芯片中,目 #### 目前支持模型 -- [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/fe8a4ea1ffedaf415f4da2f062534de366a451e6) -- [Qwen 1.8B](https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/1d0f68de57b88cfde81f3c3e537f24464d889081) -- [Qwen2 0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B/tree/8f445e3628f3500ee69f24e1303c9f10f5342a39) -- [Phi-2 2.7B](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a) -- [Phi-3 3.8B](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/291e9e30e38030c23497afa30f3af1f104837aa6) -- [ChatGLM3 6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) -- [Gemma 2B](https://huggingface.co/google/gemma-2b-it/tree/de144fb2268dee1066f515465df532c05e699d48) -- [InternLM2 1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b/tree/ecccbb5c87079ad84e5788baa55dd6e21a9c614d) -- [MiniCPM 2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16/tree/79fbb1db171e6d8bf77cdb0a94076a43003abd9e) +- [LLAMA models](https://huggingface.co/meta-llama) +- [TinyLLAMA models](https://huggingface.co/TinyLlama) +- [Qwen models](https://huggingface.co/models?search=Qwen/Qwen) +- [Phi models](https://huggingface.co/models?search=microsoft/phi) +- [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) +- [Gemma models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) +- [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c) +- [MiniCPM models](https://huggingface.co/collections/openbmb/minicpm-65d48bf958302b9fd25b698f) +- [TeleChat models](https://huggingface.co/Tele-AI) +- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) +- [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V-2_6) ## RKLLM 安装 @@ -39,7 +41,7 @@ RKLLM 可以帮助用户快速将 LLM 模型部署到 Rockchip 芯片中,目 - 创建 conda 环境 ```bash - 
conda create -n rkllm python=3.8 + conda create -n rkllm python=3.8.2 ``` - 进入 rkllm conda 环境 @@ -47,15 +49,15 @@ RKLLM 可以帮助用户快速将 LLM 模型部署到 Rockchip 芯片中,目 conda activate rkllm ``` - - 退出环境 + - _如要退出环境_ ```bash conda deactivate ``` - RKLLM-Toolkit是一套软件开发包,供用户在 PC 上进行 Huggingface 格式的 LLM 模型转换和量化 ```bash - git clone -b release-v1.0.1 https://github.com/airockchip/rknn-llm.git - pip3 install ./rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.0.1-cp38-cp38-linux_x86_64.whl + git clone -b release-v1.1.4 https://github.com/airockchip/rknn-llm.git + pip3 install ./rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl ``` 若执行以下命令没有报错,则安装成功 ```bash @@ -65,18 +67,21 @@ RKLLM 可以帮助用户快速将 LLM 模型部署到 Rockchip 芯片中,目 ### 开发板 -- 检查 NPU 驱动版本是否大于等于 0.9.6,如小于此版本请下载并烧录最新 radxa 6.1 固件 +- 检查 NPU 驱动版本是否大于等于 0.9.8,如小于此版本请下载并烧录最新 radxa 6.1 固件 + :::tip + radxa 6.1 固件默认 NPU 驱动版本为 0.9.6,请通过: `sudo rsetup -> System -> System Update` 升级系统以更新至 0.9.8 驱动。 + ::: ```bash $ sudo cat /sys/kernel/debug/rknpu/version - RKNPU driver: v0.9.6 + RKNPU driver: v0.9.8 ``` - (可选)手动编译 NPU 内核 +- (可选)手动编译 NPU 内核 若用户所使用的为非官方固件,需要对内核进行更新;其中,RKNPU 驱动包支持两个主要内核版本:[kernel-5.10](https://github.com/radxa/kernel/tree/stable-5.10-rock5) 和 [kernel-6.1](https://github.com/radxa/kernel/tree/linux-6.1-stan-rkr1);用户可在内核根目录下的 Makefile 中确认具体版本号。内核的具体的更新步骤如下: - 1) 下载压缩包 [rknpu_driver_0.9.6_20240322.tar.bz2](https://github.com/airockchip/rknn-llm/tree/main/rknpu-driver) + 1) 下载压缩包 [rknpu_driver_0.9.8_20241009.tar.bz2](https://github.com/airockchip/rknn-llm/tree/release-v1.1.4/rknpu-driver) 2) 解压该压缩包,将其中的 rknpu 驱动代码覆盖到当前内核代码目录 @@ -86,5 +91,5 @@ RKLLM 可以帮助用户快速将 LLM 模型部署到 Rockchip 芯片中,目 - RKLLM Runtime 为 Rockchip NPU 平台提供 C/C++ 编程接口,帮助用户部署 RKLLM 模型,加速 LLM 应用的实现 ```bash - git clone https://github.com/airockchip/rknn-llm.git + git clone -b release-v1.1.4 https://github.com/airockchip/rknn-llm.git ``` diff --git a/docs/common/dev/_rkllm-usage.mdx b/docs/common/dev/_rkllm-usage.mdx index 7b885a0be..8ec6a71e1 100644 --- a/docs/common/dev/_rkllm-usage.mdx +++ b/docs/common/dev/_rkllm-usage.mdx @@ -2,41 +2,44 @@ #### 目前支持模型 -- [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/fe8a4ea1ffedaf415f4da2f062534de366a451e6) -- [Qwen 1.8B](https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/1d0f68de57b88cfde81f3c3e537f24464d889081) -- [Qwen2 0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B/tree/8f445e3628f3500ee69f24e1303c9f10f5342a39) -- [Phi-2 2.7B](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a) -- [Phi-3 3.8B](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/291e9e30e38030c23497afa30f3af1f104837aa6) -- [ChatGLM3 6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) -- [Gemma 2B](https://huggingface.co/google/gemma-2b-it/tree/de144fb2268dee1066f515465df532c05e699d48) -- [InternLM2 1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b/tree/ecccbb5c87079ad84e5788baa55dd6e21a9c614d) -- [MiniCPM 2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16/tree/79fbb1db171e6d8bf77cdb0a94076a43003abd9e) - -这里以 TinyLLAMA 1.1B 为例子,完整讲述如何从 0 开始部署大语言模型到搭载 RK3588 芯片的开发版上,并使用 NPU 进行硬件加速推理 +- [LLAMA models](https://huggingface.co/meta-llama) +- [TinyLLAMA models](https://huggingface.co/TinyLlama) +- [Qwen models](https://huggingface.co/models?search=Qwen/Qwen) +- [Phi models](https://huggingface.co/models?search=microsoft/phi) +- [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) +- [Gemma 
models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) +- [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c) +- [MiniCPM models](https://huggingface.co/collections/openbmb/minicpm-65d48bf958302b9fd25b698f) +- [TeleChat models](https://huggingface.co/Tele-AI) +- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) +- [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V-2_6) + +这里以 [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) 为例子,完整讲述如何从 0 开始部署大语言模型到搭载 RK3588 芯片的开发板上,并使用 NPU 进行硬件加速推理 :::tip 如没安装与配置 RKLLM 环境,请参考 [RKLLM 安装](rkllm_install) :::: ### 模型转换 -这里以 TinyLLAMA 1.1B 为例子,用户也可以选择任意[目前支持模型](#目前支持模型)列表中的链接 +这里以 [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) 为例子,用户也可以选择任意[目前支持模型](#目前支持模型)列表中的链接 -- x86 PC 工作站中下载 [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) 所有文件, 如没安装 [git-lfs](https://git-lfs.com/),请自行安装 +- x86 PC 工作站中下载 [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) 权重文件,如未安装 [git-lfs](https://git-lfs.com/),请自行安装 ```bash - git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 + git clone https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct ``` - 激活 rkllm conda 环境, 可参考[RKLLM conda 安装](rkllm_install#x86-pc-工作站) ```bash conda activate rkllm ``` -- 更改 `rknn-llm/rkllm-toolkit/examples/huggingface/test.py` 中 modelpath 模型路径与 rkllm 导出路径 +- 更改 `rknn-llm/rkllm-toolkit/examples/test.py` 中的 modelpath 模型路径、dataset 路径和 rkllm 导出路径 ```python - modelpath = 'Your Huggingface LLM model' - ret = llm.export_rkllm("./Your_Huggingface_LLM_model.rkllm") + 15 modelpath = 'Your Huggingface LLM model' + 29 dataset = None # 默认是 "./data_quant.json",如无量化校准数据可填写 None + 83 ret = llm.export_rkllm("./Your_Huggingface_LLM_model.rkllm") ``` - 运行模型转换脚本 ```bash - cd rknn-llm/rkllm-toolkit/examples/huggingface + cd rknn-llm/rkllm-toolkit/examples/ python3 test.py ``` 转换成功后可得到 rkllm 模型 @@ -44,19 +47,18 @@ ### 编译可执行文件 - 下载交叉编译工具链 [gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu](https://developer.arm.com/downloads/-/gnu-a/10-2-2020-11) -- 修改主程序代码, 这里修改两个地方 +- 修改主程序 `rknn-llm/examples/rkllm_api_demo/src/llm_demo.cpp` 代码,这里修改以下两处 ```vim - 74 param.num_npu_core = 3; // rk3588 num_npu_core 的取值范围 [1,3] - 118 string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX; - 119 // string text = input_str; + 184 text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX; + 185 // text = input_str; ``` -- 修改 `rknn-llm/rkllm-runtime/examples/rkllm_api_demo/build-linux.sh` 编译脚本中 gcc 路径 +- 修改 `rknn-llm/examples/rkllm_api_demo/build-linux.sh` 编译脚本中 `GCC_COMPILER_PATH` 路径 ```bash GCC_COMPILER_PATH=gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu ``` - 运行模型转换脚本 ```bash - cd rknn-llm/rkllm-runtime/examples/rkllm_api_demo + cd rknn-llm/examples/rkllm_api_demo/ bash build-linux.sh ``` 生成的可执行文件在 `build/build_linux_aarch64_Release/llm_demo` @@ -68,102 +70,103 @@ ### 板端部署 #### 终端模式 - 将转换成功后的 rkllm 模型与编译后的二进制文件 llm_demo 复制到板端 - 导入环境变量 ```bash - ulimit -n 102400 - export LD_LIBRARY_PATH=rknn-llm/rkllm-runtime/runtime/Linux/librkllm_api/aarch64:$LD_LIBRARY_PATH + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64 ``` - 运行 llm_demo,输入 `exit` 退出 - `bash -taskset f0 ./llm_demo your_rkllm_path -` - ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_2.webp) - -#### Gradio 模式 - -##### 服务端 - -- 安装 gradio ```bash - pip3 install gradio - ``` -- 复制 `librkllmrt.so` 到 `rkllm_server/lib` - ```bash - 
cd rknn-llm/rkllm-runtime - cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib - ``` -- 修改 gradio_server.py, 禁用 GPU 进行 prefill 加速 - ```python - rknnllm_param.use_gpu = False - ``` -- 启动 gradio server - ```bash - cd examples/rkllm_server_demo/rkllm_server - python3 gradio_server.py --target_platform rk3588 --rkllm_model_path your_model_path - ``` -- 浏览器访问开发板 ip 8080 端口 - ![rkllm_3.webp](/img/general-tutorial/rknn/rkllm_3.webp) - -##### 客户端 - -用户可以在开发板开启 gradio 服务端后在同网络环境其他设备上通过 Gradio API 调用 LLM gradio server - -- 安装 gradio_client - ```bash - pip3 install gradio_client - ``` -- 修改 chat_api_gradio.py 中的 ip 地址,用户需要根据自己部署的具体网址进行修改 - ```python - # 用户需要根据自己部署的具体 ip 进行修改 - client = Client("http://192.168.2.209:8080/") + export RKLLM_LOG_LEVEL=1 + ./llm_demo your_rkllm_path 10000 10000 ``` -- 运行 chat_api_gradio.py - `bash -cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo -python3 chat_api_gradio.py -` - ![rkllm_4.webp](/img/general-tutorial/rknn/rkllm_4.webp) - -#### Falsk 模式 - -##### 服务端 - -- 安装 flask - ```bash - pip3 install flask==2.2.2 Werkzeug==2.2.2 - ``` -- 复制 `librkllmrt.so` 到 `rkllm_server/lib` - ```bash - cd rknn-llm/rkllm-runtime - cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib - ``` -- 修改 flask_server.py, 禁用 GPU 进行 prefill 加速 - - ```python - rknnllm_param.use_gpu = False - ``` - -- 启动 flask server, 端口 8080 - `bash -cd examples/rkllm_server_demo/rkllm_server -python3 flask_server.py --target_platform rk3588 --rkllm_model_path your_model_path -` - ![rkllm_5.webp](/img/general-tutorial/rknn/rkllm_5.webp) - -##### 客户端 - -用户可以在开发板开启 flask 服务端后在同网络环境其他设备上通过 flask API 调用 flask server, 户在进行自定义功能的开发的过程中,只需参考该 API 访问示例使用对应的收发结 -构体进行数据的包装、解析即可 + ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_2.webp) -- 修改 chat_api_flask.py 中的 ip 地址,用户需要根据自己部署的具体网址进行修改 - ```python - # 用户需要根据自己部署的具体 ip 进行修改 - server_url = 'http://192.168.2.209:8080/rkllm_chat' - ``` -- 运行 chat_api_flask.py - `bash -cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo -python3 chat_api_flask.py -` - ![rkllm_6.webp](/img/general-tutorial/rknn/rkllm_6.webp) +{/* #### Gradio 模式 */} + +{/* ##### 服务端 */} + +{/* - 准备一个虚拟环境并进入,请参考[Python 虚拟环境使用](venv_usage) */} +{/* - 安装 gradio */} +{/* ```bash */} +{/* pip3 install gradio */} +{/* ``` */} +{/* - 复制 `librkllmrt.so` 到 `rkllm_server/lib` */} +{/* ```bash */} +{/* cd rkllm-runtime/Linux/librkllm_api/aarch64 */} +{/* cp rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib */} +{/* ``` */} +{/* - 修改 gradio_server.py, 禁用 GPU 进行 prefill 加速 */} +{/* ```python */} +{/* rknnllm_param.use_gpu = False */} +{/* ``` */} +{/* - 启动 gradio server */} +{/* ```bash */} +{/* cd examples/rkllm_server_demo/rkllm_server */} +{/* python3 gradio_server.py --target_platform rk3588 --rkllm_model_path your_model_path */} +{/* ``` */} +{/* - 浏览器访问开发板 ip 8080 端口 */} +{/* ![rkllm_3.webp](/img/general-tutorial/rknn/rkllm_3.webp) */} + +{/* ##### 客户端 */} + +{/* 用户可以在开发板开启 gradio 服务端后在同网络环境其他设备上通过 Gradio API 调用 LLM gradio server */} + +{/* - 安装 gradio_client */} +{/* ```bash */} +{/* pip3 install gradio_client */} +{/* ``` */} +{/* - 修改 chat_api_gradio.py 中的 ip 地址,用户需要根据自己部署的具体网址进行修改 */} +{/* ```python */} +{/* # 用户需要根据自己部署的具体 ip 进行修改 */} +{/* client = Client("http://192.168.2.209:8080/") */} +{/* ``` */} +{/* - 运行 chat_api_gradio.py */} +{/* `bash */} +{/* cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo */} +{/* python3 chat_api_gradio.py */} +{/* ` */} +{/* 
![rkllm_4.webp](/img/general-tutorial/rknn/rkllm_4.webp) */} + +{/* #### Falsk 模式 */} + +{/* ##### 服务端 */} + +{/* - 安装 flask */} +{/* ```bash */} +{/* pip3 install flask==2.2.2 Werkzeug==2.2.2 */} +{/* ``` */} +{/* - 复制 `librkllmrt.so` 到 `rkllm_server/lib` */} +{/* ```bash */} +{/* cd rknn-llm/rkllm-runtime */} +{/* cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib */} +{/* ``` */} +{/* - 修改 flask_server.py, 禁用 GPU 进行 prefill 加速 */} + +{/* ```python */} +{/* rknnllm_param.use_gpu = False */} +{/* ``` */} + +{/* - 启动 flask server, 端口 8080 */} +{/* `bash */} +{/* cd examples/rkllm_server_demo/rkllm_server */} +{/* python3 flask_server.py --target_platform rk3588 --rkllm_model_path your_model_path */} +{/* ` */} +{/* ![rkllm_5.webp](/img/general-tutorial/rknn/rkllm_5.webp) */} + +{/* ##### 客户端 */} + +{/* 用户可以在开发板开启 flask 服务端后在同网络环境其他设备上通过 flask API 调用 flask server, 户在进行自定义功能的开发的过程中,只需参考该 API 访问示例使用对应的收发结 */} +{/* 构体进行数据的包装、解析即可 */} + +{/* - 修改 chat_api_flask.py 中的 ip 地址,用户需要根据自己部署的具体网址进行修改 */} +{/* ```python */} +{/* # 用户需要根据自己部署的具体 ip 进行修改 */} +{/* server_url = 'http://192.168.2.209:8080/rkllm_chat' */} +{/* ``` */} +{/* - 运行 chat_api_flask.py */} +{/* `bash */} +{/* cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo */} +{/* python3 chat_api_flask.py */} +{/* ` */} +{/* ![rkllm_6.webp](/img/general-tutorial/rknn/rkllm_6.webp) */} ### 部分模型性能对比 diff --git a/docs/compute-module/cm5/radxa-os/app-dev/rkllm_deepseek_r1.md b/docs/compute-module/cm5/radxa-os/app-dev/rkllm_deepseek_r1.md new file mode 100644 index 000000000..5188c668b --- /dev/null +++ b/docs/compute-module/cm5/radxa-os/app-dev/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/docs/compute-module/nx5/radxa-os/app-dev/rkllm_deepseek_r1.md b/docs/compute-module/nx5/radxa-os/app-dev/rkllm_deepseek_r1.md new file mode 100644 index 000000000..5188c668b --- /dev/null +++ b/docs/compute-module/nx5/radxa-os/app-dev/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/docs/rock5/rock5a/app-development/rkllm_deepseek_r1.md b/docs/rock5/rock5a/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/docs/rock5/rock5a/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/docs/rock5/rock5b/app-development/rkllm_deepseek_r1.md b/docs/rock5/rock5b/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/docs/rock5/rock5b/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/docs/rock5/rock5c/app-development/rkllm_deepseek_r1.md b/docs/rock5/rock5c/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/docs/rock5/rock5c/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/docs/rock5/rock5itx/app-development/rkllm_deepseek_r1.md 
b/docs/rock5/rock5itx/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/docs/rock5/rock5itx/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/docs/rock5/rock5t/app-development/rkllm_deepseek_r1.md b/docs/rock5/rock5t/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/docs/rock5/rock5t/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-deepseek-r1.mdx b/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-deepseek-r1.mdx new file mode 100644 index 000000000..eda939b4b --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-deepseek-r1.mdx @@ -0,0 +1,97 @@ +[DeepSeek-R1](https://api-docs.deepseek.com/news/news250120) is developed by [DeepSeek](https://www.deepseek.com/), a company based in Hangzhou. This model has fully open-sourced all training techniques and model weights, with performance comparable to the closed-source OpenAI-o1. Through DeepSeek-R1's output, DeepSeek has distilled 6 smaller models for the open-source community, including Qwen2.5 and Llama3.1. This document will explain how to use RKLLM to deploy the distilled model [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on the RK3588, leveraging the NPU for hardware-accelerated inference. + +![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_ds_1.webp) + +### Model File Download + +:::tip +Radxa has provided pre-compiled RKLLM models and executable files. Users can directly download and use them. If you want to reference the compilation process, you can continue to the optional sections. +::: + +- Use [git LFS](https://git-lfs.com/) to download the pre-compiled RKLLM from [ModelScope](https://modelscope.cn/models/radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM): + +```bash +git clone https://www.modelscope.cn/radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM.git +``` + +### (Optional) Model Compilation + +:::tip +Please ensure that the RKLLM working environment is set up on both the PC and the development board by following the [RKLLM Installation Guide](./rkllm_install). +::: + +- Download the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) weight files on an x86 PC workstation. If [git-lfs](https://git-lfs.com/) is not installed, please install it. + ```bash + git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B + ``` +- Activate the RKLLM conda environment. Refer to [RKLLM Installation](./rkllm_install#x86-pc-workstation) for details. + ```bash + conda activate rkllm + ``` +- Modify the `modelpath`, `dataset` path, and RKLLM export path in `rknn-llm/rkllm-toolkit/examples/test.py`: + ```python + 15 modelpath = 'Your DeepSeek-R1-Distill-Qwen-1.5B Folder Path' + 29 dataset = None # Default is "./data_quant.json". If not available, set to None. 
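+ # Note added for clarity: the test.py lines between 29 and 83 (not shown here) load and quantize the + # Hugging Face weights, typically llm.load_huggingface(...) followed by llm.build(...), before the export + # call below; these method names follow the rknn-llm examples and may differ between toolkit versions.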
+ 83 ret = llm.export_rkllm("./DeepSeek-R1-Distill-Qwen-1.5B.rkllm") + ``` +- Run the model conversion script: + ```bash + cd rknn-llm/rkllm-toolkit/examples/ + python3 test.py + ``` + After successful conversion, the `DeepSeek-R1-Distill-Qwen-1.5B.rkllm` model will be generated. + +### (Optional) Compile the Executable File + +- Download the cross-compilation toolchain [gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu](https://developer.arm.com/downloads/-/gnu-a/10-2-2020-11). +- Modify the main program code in `rknn-llm/examples/rkllm_api_demo/src/llm_demo.cpp`, specifically the PROMPT format and construction: + + ```cpp + 24 #define PROMPT_TEXT_PREFIX "<|im_start|>system\nYou are a helpful assistant.\n<|im_end|>\n<|im_start|>user\n" + 25 #define PROMPT_TEXT_POSTFIX "\n<|im_end|>\n<|im_start|>assistant\n" + 184 text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX; + 185 // text = input_str; + ``` + + :::tip + Why modify `PROMPT_TEXT_PREFIX` and `PROMPT_TEXT_POSTFIX`? Refer to Table 1 in the [DeepSeek-R1 Paper](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) for the required format when querying the DeepSeek-R1 model. + ::: + ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_ds_2.webp) + +- Modify the `GCC_COMPILER_PATH` in the `rknn-llm/examples/rkllm_api_demo/build-linux.sh` script: + ```bash + GCC_COMPILER_PATH=gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu + ``` +- Run the model conversion script: + ```bash + cd rknn-llm/examples/rkllm_api_demo/ + bash build-linux.sh + ``` + The executable file will be generated at `build/build_linux_aarch64_Release/llm_demo`. + +### On-Board Deployment + +#### Terminal Mode + +- Copy the converted `DeepSeek-R1-Distill-Qwen-1.5B.rkllm` model and the compiled `llm_demo` binary to the board. +- Set the environment variable: + ```bash + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64 + ``` + :::tip + Users who downloaded from ModelScope can directly export the `librkllmrt.so` from the downloaded repository. + ::: +- Run `llm_demo`. Type `exit` to quit: + ```bash + export RKLLM_LOG_LEVEL=1 + ./llm_demo DeepSeek-R1-Distill-Qwen-1.5B.rkllm 10000 10000 + ``` + ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_ds_3.webp) + +### Performance Analysis + +For the math problem: `Solve the equations x+y=12, 2x+4y=34, find the values of x and y`, the RK3588 achieves 14.93 tokens per second. +| Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second | +|----------|-----------------|--------|---------------------|-------------------| +| Prefill | 429.63 | 81 | 5.30 | 188.53 | +| Generate | 56103.71 | 851 | 66.99 | 14.93 | diff --git a/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-install.mdx b/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-install.mdx index f7a92cb66..2e78c624b 100644 --- a/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-install.mdx +++ b/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-install.mdx @@ -6,15 +6,17 @@ RKLLM helps users deploy LLM models to Rockchip chips quickly. 
Currently, it sup ### Currently Supported Models -- [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/fe8a4ea1ffedaf415f4da2f062534de366a451e6) -- [Qwen 1.8B](https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/1d0f68de57b88cfde81f3c3e537f24464d889081) -- [Qwen2 0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B/tree/8f445e3628f3500ee69f24e1303c9f10f5342a39) -- [Phi-2 2.7B](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a) -- [Phi-3 3.8B](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/291e9e30e38030c23497afa30f3af1f104837aa6) -- [ChatGLM3 6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) -- [Gemma 2B](https://huggingface.co/google/gemma-2b-it/tree/de144fb2268dee1066f515465df532c05e699d48) -- [InternLM2 1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b/tree/ecccbb5c87079ad84e5788baa55dd6e21a9c614d) -- [MiniCPM 2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16/tree/79fbb1db171e6d8bf77cdb0a94076a43003abd9e) +- [LLAMA models](https://huggingface.co/meta-llama) +- [TinyLLAMA models](https://huggingface.co/TinyLlama) +- [Qwen models](https://huggingface.co/models?search=Qwen/Qwen) +- [Phi models](https://huggingface.co/models?search=microsoft/phi) +- [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) +- [Gemma models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) +- [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c) +- [MiniCPM models](https://huggingface.co/collections/openbmb/minicpm-65d48bf958302b9fd25b698f) +- [TeleChat models](https://huggingface.co/Tele-AI) +- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) +- [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V-2_6) ## Install RKLLM @@ -40,7 +42,7 @@ To use RKNPU, users first need to run the RKLLM-Toolkit tool on an x86 workstati - Create a conda environment ```bash - conda create -n rkllm python=3.8 + conda create -n rkllm python=3.8.2 ``` - Activate the rkllm conda environment @@ -58,8 +60,8 @@ To use RKNPU, users first need to run the RKLLM-Toolkit tool on an x86 workstati - RKLLM-Toolkit is a software development package for converting and quantizing Huggingface format LLM models on a PC. ```bash - git clone -b release-v1.0.1 https://github.com/airockchip/rknn-llm.git - pip3 install ./rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.0.1-cp38-cp38-linux_x86_64.whl + git clone -b release-v1.1.4 https://github.com/airockchip/rknn-llm.git + pip3 install ./rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl ``` If the following command runs without errors, the installation is successful: @@ -72,10 +74,13 @@ To use RKNPU, users first need to run the RKLLM-Toolkit tool on an x86 workstati ### Development Board - Check if the NPU driver version is 0.9.6 or higher. If it is lower, download and flash the latest Radxa 6.1 firmware. + :::tip + The default NPU driver version of radxa 6.1 firmware is 0.9.6, please update the system to 0.9.8 driver by: `sudo rsetup -> System -> System Update`. 
+ ::: ```bash $ sudo cat /sys/kernel/debug/rknpu/version - RKNPU driver: v0.9.6 + RKNPU driver: v0.9.8 ``` (Optional) Manually compile the NPU kernel @@ -93,5 +98,5 @@ To use RKNPU, users first need to run the RKLLM-Toolkit tool on an x86 workstati - RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform to help users deploy RKLLM models and accelerate LLM applications. ```bash - git clone https://github.com/airockchip/rknn-llm.git + git clone -b release-v1.1.4 https://github.com/airockchip/rknn-llm.git ``` diff --git a/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-usage.mdx b/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-usage.mdx index a39f869c6..a74feeda8 100644 --- a/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-usage.mdx +++ b/i18n/en/docusaurus-plugin-content-docs/current/common/dev/_rkllm-usage.mdx @@ -2,17 +2,19 @@ This document explains how to deploy large language models in Huggingface format #### Currently Supported Models -- [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/fe8a4ea1ffedaf415f4da2f062534de366a451e6) -- [Qwen 1.8B](https://huggingface.co/Qwen/Qwen-1_8B-Chat/tree/1d0f68de57b88cfde81f3c3e537f24464d889081) -- [Qwen2 0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B/tree/8f445e3628f3500ee69f24e1303c9f10f5342a39) -- [Phi-2 2.7B](https://hf-mirror.com/microsoft/phi-2/tree/834565c23f9b28b96ccbeabe614dd906b6db551a) -- [Phi-3 3.8B](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/291e9e30e38030c23497afa30f3af1f104837aa6) -- [ChatGLM3 6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) -- [Gemma 2B](https://huggingface.co/google/gemma-2b-it/tree/de144fb2268dee1066f515465df532c05e699d48) -- [InternLM2 1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b/tree/ecccbb5c87079ad84e5788baa55dd6e21a9c614d) -- [MiniCPM 2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16/tree/79fbb1db171e6d8bf77cdb0a94076a43003abd9e) - -This guide uses TinyLLAMA 1.1B as an example to show how to deploy a large language model from scratch on a development board equipped with the RK3588 chip and use the NPU for hardware-accelerated inference. +- [LLAMA models](https://huggingface.co/meta-llama) +- [TinyLLAMA models](https://huggingface.co/TinyLlama) +- [Qwen models](https://huggingface.co/models?search=Qwen/Qwen) +- [Phi models](https://huggingface.co/models?search=microsoft/phi) +- [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405) +- [Gemma models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) +- [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c) +- [MiniCPM models](https://huggingface.co/collections/openbmb/minicpm-65d48bf958302b9fd25b698f) +- [TeleChat models](https://huggingface.co/Tele-AI) +- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) +- [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V-2_6) + +This guide uses [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as an example to show how to deploy a large language model from scratch on a development board equipped with the RK3588 chip and use the NPU for hardware-accelerated inference. :::tip If the RKLLM environment is not installed and configured, please refer to [RKLLM Installation](rkllm_install). 
@@ -20,24 +22,25 @@ If the RKLLM environment is not installed and configured, please refer to [RKLLM Installation](rkllm_install). ### Model Conversion -Using TinyLLAMA 1.1B as an example, users can also choose any of the links in the [currently supported models](#currently-supported-models) list. +Using [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as an example, users can also choose any of the links in the [currently supported models](#currently-supported-models) list. -- Download all files of [TinyLLAMA 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on an x86 PC workstation. If [git-lfs](https://git-lfs.com/) is not installed, please install it. +- Download all files of [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on an x86 PC workstation. If [git-lfs](https://git-lfs.com/) is not installed, please install it. ```bash - git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 + git clone https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct ``` - Activate the rkllm conda environment. Refer to [RKLLM conda Installation](rkllm_install#x86-pc-workstation) if needed. ```bash conda activate rkllm ``` -- Change the model path and rkllm export path in `rknn-llm/rkllm-toolkit/examples/huggingface/test.py`. +- Change the modelpath, dataset path, and rkllm export path in `rknn-llm/rkllm-toolkit/examples/test.py`. ```python - modelpath = 'Your Huggingface LLM model' - ret = llm.export_rkllm("./Your_Huggingface_LLM_model.rkllm") + 15 modelpath = 'Your Huggingface LLM model' + 29 dataset = None # Default is "./data_quant.json". If not available, set to None. + 83 ret = llm.export_rkllm("./Your_Huggingface_LLM_model.rkllm") ``` - Run the model conversion script. ```bash - cd rknn-llm/rkllm-toolkit/examples/huggingface + cd rknn-llm/rkllm-toolkit/examples/ python3 test.py ``` After successful conversion, you will get an rkllm model. @@ -45,125 +48,124 @@ Using TinyLLAMA 1.1B as an example, users can also choose any of the links in th ### Compile Executable File - Download the cross-compilation toolchain [gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu](https://developer.arm.com/downloads/-/gnu-a/10-2-2020-11). -- Modify the main program code. Here are two changes: +- Modify the main program code in `rknn-llm/examples/rkllm_api_demo/src/llm_demo.cpp`, changing the following two lines: ```vim - 74 param.num_npu_core = 3; // The value range for rk3588 num_npu_core is [1,3] - 118 string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX; - 119 // string text = input_str; + 184 text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX; + 185 // text = input_str; ``` -- Modify the gcc path in the `rknn-llm/rkllm-runtime/examples/rkllm_api_demo/build-linux.sh` compilation script. +- Modify the `GCC_COMPILER_PATH` in the `rknn-llm/examples/rkllm_api_demo/build-linux.sh` compilation script. ```bash GCC_COMPILER_PATH=gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu ``` - Run the model conversion script. ```bash - cd rknn-llm/rkllm-runtime/examples/rkllm_api_demo + cd rknn-llm/examples/rkllm_api_demo/ bash build-linux.sh ``` The generated executable file is located in `build/build_linux_aarch64_Release/llm_demo`. ### Board Deployment -#### Local Terminal Mode +#### Terminal Mode - Copy the converted rkllm model and the compiled binary file llm_demo to the board. - Import environment variables. 
```bash - ulimit -n 102400 - export LD_LIBRARY_PATH=rknn-llm/rkllm-runtime/runtime/Linux/librkllm_api/aarch64:$LD_LIBRARY_PATH + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64 ``` - Run llm_demo and enter `exit` to quit. ```bash - taskset f0 ./llm_demo your_rkllm_path + export RKLLM_LOG_LEVEL=1 + ./llm_demo your_rkllm_path 10000 10000 ``` ![rkllm_2.webp](/img/general-tutorial/rknn/rkllm_2.webp) -#### Gradio Mode - -##### Server Side - -- Install gradio. - ```bash - pip3 install gradio - ``` -- Copy `librkllmrt.so` to `rkllm_server/lib`. - ```bash - cd rknn-llm/rkllm-runtime - cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib - ``` -- Modify gradio_server.py to disable GPU for prefill acceleration. - ```python - rknnllm_param.use_gpu = False - ``` -- Start the gradio server. - ```bash - cd examples/rkllm_server_demo/rkllm_server - python3 gradio_server.py --target_platform rk3588 --rkllm_model_path your_model_path - ``` -- Access the development board's IP port 8080 in your browser. - ![rkllm_3.webp](/img/general-tutorial/rknn/rkllm_3.webp) - -##### Client Side - -After starting the gradio server on the development board, users can call the LLM gradio server through the Gradio API on other devices in the same network environment. - -- Install gradio_client. - ```bash - pip3 install gradio_client - ``` -- Modify the IP address in chat_api_gradio.py. Users need to adjust this according to their deployment's specific address. - ```python - # Users need to modify according to their deployment's specific IP - client = Client("http://192.168.2.209:8080/") - ``` -- Run chat_api_gradio.py. - ```bash - cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo - python3 chat_api_gradio.py - ``` - ![rkllm_4.webp](/img/general-tutorial/rknn/rkllm_4.webp) - -#### Flask Mode - -##### Server Side - -- Install flask. - ```bash - pip3 install flask==2.2.2 Werkzeug==2.2.2 - ``` -- Copy `librkllmrt.so` to `rkllm_server/lib`. - ```bash - cd rknn-llm/rkllm-runtime - cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib - ``` -- Modify flask_server.py to disable GPU for prefill acceleration. - ```python - rknnllm_param.use_gpu = False - ``` -- Start the flask server on port 8080. - ```bash - cd examples/rkllm_server_demo/rkllm_server - python3 flask_server.py --target_platform rk3588 --rkllm_model_path your_model_path - ``` - ![rkllm_5.webp](/img/general-tutorial/rknn/rkllm_5.webp) - -##### Client Side - -After starting the flask server on the development board, users can call the flask server through the flask API on other devices in the same network environment. Users can refer to this API access example to develop custom functions, using the corresponding send/receive structures for data packaging and parsing. - -- Modify the IP address in chat_api_flask.py. Users need to adjust this according to their deployment's specific address. - ```python - # Users need to modify according to their deployment's specific IP - server_url = 'http://192.168.2.209:8080/rkllm_chat' - ``` -- Run chat_api_flask.py. - ```bash - cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo - python3 chat_api_flask.py - ``` - ![rkllm_6.webp](/img/general-tutorial/rknn/rkllm_6.webp) - -### Performance Comparison of Some Models +{/* #### Gradio Mode */} + +{/* ##### Server Side */} + +{/* - Install gradio. 
*/} +{/* ```bash */} +{/* pip3 install gradio */} +{/* ``` */} +{/* - Copy `librkllmrt.so` to `rkllm_server/lib`. */} +{/* ```bash */} +{/* cd rknn-llm/rkllm-runtime */} +{/* cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib */} +{/* ``` */} +{/* - Modify gradio_server.py to disable GPU for prefill acceleration. */} +{/* ```python */} +{/* rknnllm_param.use_gpu = False */} +{/* ``` */} +{/* - Start the gradio server. */} +{/* ```bash */} +{/* cd examples/rkllm_server_demo/rkllm_server */} +{/* python3 gradio_server.py --target_platform rk3588 --rkllm_model_path your_model_path */} +{/* ``` */} +{/* - Access the development board's IP port 8080 in your browser. */} +{/* ![rkllm_3.webp](/img/general-tutorial/rknn/rkllm_3.webp) */} + +{/* ##### Client Side */} + +{/* After starting the gradio server on the development board, users can call the LLM gradio server through the Gradio API on other devices in the same network environment. */} + +{/* - Install gradio_client. */} +{/* ```bash */} +{/* pip3 install gradio_client */} +{/* ``` */} +{/* - Modify the IP address in chat_api_gradio.py. Users need to adjust this according to their deployment's specific address. */} +{/* ```python */} +{/* # Users need to modify according to their deployment's specific IP */} +{/* client = Client("http://192.168.2.209:8080/") */} +{/* ``` */} +{/* - Run chat_api_gradio.py. */} +{/* ```bash */} +{/* cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo */} +{/* python3 chat_api_gradio.py */} +{/* ``` */} +{/* ![rkllm_4.webp](/img/general-tutorial/rknn/rkllm_4.webp) */} + +{/* #### Flask Mode */} + +{/* ##### Server Side */} + +{/* - Install flask. */} +{/* ```bash */} +{/* pip3 install flask==2.2.2 Werkzeug==2.2.2 */} +{/* ``` */} +{/* - Copy `librkllmrt.so` to `rkllm_server/lib`. */} +{/* ```bash */} +{/* cd rknn-llm/rkllm-runtime */} +{/* cp ./runtime//Linux/librkllm_api/aarch64/librkllmrt.so ./examples/rkllm_server_demo/rkllm_server/lib */} +{/* ``` */} +{/* - Modify flask_server.py to disable GPU for prefill acceleration. */} +{/* ```python */} +{/* rknnllm_param.use_gpu = False */} +{/* ``` */} +{/* - Start the flask server on port 8080. */} +{/* ```bash */} +{/* cd examples/rkllm_server_demo/rkllm_server */} +{/* python3 flask_server.py --target_platform rk3588 --rkllm_model_path your_model_path */} +{/* ``` */} +{/* ![rkllm_5.webp](/img/general-tutorial/rknn/rkllm_5.webp) */} + +{/* ##### Client Side */} + +{/* After starting the flask server on the development board, users can call the flask server through the flask API on other devices in the same network environment. Users can refer to this API access example to develop custom functions, using the corresponding send/receive structures for data packaging and parsing. */} + +{/* - Modify the IP address in chat_api_flask.py. Users need to adjust this according to their deployment's specific address. */} +{/* ```python */} +{/* # Users need to modify according to their deployment's specific IP */} +{/* server_url = 'http://192.168.2.209:8080/rkllm_chat' */} +{/* ``` */} +{/* - Run chat_api_flask.py. 
*/} +{/* ```bash */} +{/* cd rknn-llm/rkllm-runtime/examples/rkllm_server_demo */} +{/* python3 chat_api_flask.py */} +{/* ``` */} +{/* ![rkllm_6.webp](/img/general-tutorial/rknn/rkllm_6.webp) */} + +### Performance Comparison of Models | Model | Parameter Size | Chip | Chip Count | Inference Speed | | --------- | -------------- | ------ | ---------- | --------------- | diff --git a/i18n/en/docusaurus-plugin-content-docs/current/compute-module/cm5/radxa-os/app-dev/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/compute-module/cm5/radxa-os/app-dev/rkllm_deepseek_r1.md new file mode 100644 index 000000000..5188c668b --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/compute-module/cm5/radxa-os/app-dev/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/compute-module/nx5/radxa-os/app-dev/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/compute-module/nx5/radxa-os/app-dev/rkllm_deepseek_r1.md new file mode 100644 index 000000000..5188c668b --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/compute-module/nx5/radxa-os/app-dev/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5a/app-development/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5a/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5a/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5b/app-development/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5b/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5b/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5c/app-development/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5c/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5c/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5itx/app-development/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5itx/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5itx/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from 
'../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5t/app-development/rkllm_deepseek_r1.md b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5t/app-development/rkllm_deepseek_r1.md new file mode 100644 index 000000000..ff12ca8db --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs/current/rock5/rock5t/app-development/rkllm_deepseek_r1.md @@ -0,0 +1,9 @@ +--- +sidebar_position: 23 +--- + +# RKLLM DeepSeek-R1 + +import RKLLMDEEPSEEKR1 from '../../../common/dev/\_rkllm-deepseek-r1.mdx'; + + diff --git a/static/img/general-tutorial/rknn/rkllm_2.webp b/static/img/general-tutorial/rknn/rkllm_2.webp index 33141896b..4ce65ace2 100644 Binary files a/static/img/general-tutorial/rknn/rkllm_2.webp and b/static/img/general-tutorial/rknn/rkllm_2.webp differ diff --git a/static/img/general-tutorial/rknn/rkllm_ds_1.webp b/static/img/general-tutorial/rknn/rkllm_ds_1.webp new file mode 100644 index 000000000..34ca3179a Binary files /dev/null and b/static/img/general-tutorial/rknn/rkllm_ds_1.webp differ diff --git a/static/img/general-tutorial/rknn/rkllm_ds_2.webp b/static/img/general-tutorial/rknn/rkllm_ds_2.webp new file mode 100644 index 000000000..346e7f751 Binary files /dev/null and b/static/img/general-tutorial/rknn/rkllm_ds_2.webp differ diff --git a/static/img/general-tutorial/rknn/rkllm_ds_3.webp b/static/img/general-tutorial/rknn/rkllm_ds_3.webp new file mode 100644 index 000000000..8ff57e9f2 Binary files /dev/null and b/static/img/general-tutorial/rknn/rkllm_ds_3.webp differ