.. faq.rst

FAQ
=======

Device selection
----------------

**Q: Why is my NPU not being used?**

1. Specify the Ascend NPUs with the ``ASCEND_RT_VISIBLE_DEVICES`` environment variable; for example, ``ASCEND_RT_VISIBLE_DEVICES=0,1,2,3`` selects NPUs 0, 1, 2 and 3 for fine-tuning/inference (see the launch sketch after this list).

.. hint::

    Ascend NPUs are numbered starting from 0, and the same numbering applies inside a docker container;
    if physical NPUs 6 and 7 are mapped into a container, they appear inside it as devices 0 and 1.

2. Check that torch-npu is installed; installing LLaMA-Factory with ``pip install -e '.[torch-npu,metrics]'`` is the recommended way to get it. A quick verification is sketched after this list.
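
For example, a fine-tuning run on four NPUs can be launched as follows (the config file name is a placeholder):

.. code-block:: shell

    # use NPUs 0-3 for this run
    ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train <your_train_config>.yaml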
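
To confirm that torch-npu is importable and can see a device, a minimal check (assuming the CANN environment is already sourced) is:

.. code-block:: shell

    # should print True when torch-npu is installed and an NPU is visible
    python -c "import torch, torch_npu; print(torch.npu.is_available())"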

Inference errors
----------------

**Q: Inference on Ascend NPU fails with RuntimeError: ACL stream synchronize failed, error code:507018**

A: Set ``do_sample: false`` to disable the random sampling strategy.
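
A minimal sketch of such a config (the model path and template are placeholders):

.. code-block:: shell

    # hypothetical inference config with sampling disabled (greedy decoding)
    cat > chat_npu.yaml <<'EOF'
    model_name_or_path: /path/to/your/model
    template: llama3
    do_sample: false
    EOF
    llamafactory-cli chat chat_npu.yaml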

Related issues:

- https://github.com/hiyouga/LLaMA-Factory/issues/3840

Fine-tuning/training errors
---------------------------

**Q: Fine-tuning/training a ChatGLM-series model fails with NotImplementedError: Unknown device for graph fuser**

A: In the model repo downloaded from ModelScope or Hugging Face, edit ``modeling_chatglm.py`` and comment out the ``torch.jit`` decorators.
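
One way to locate the decorators in a locally downloaded repo (the path is illustrative):

.. code-block:: shell

    # find the @torch.jit.script decorator lines in the model code
    grep -n "torch.jit" /path/to/chatglm3-6b/modeling_chatglm.py
    # then comment out the matching decorator lines in an editor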

Related issues:

- https://github.com/hiyouga/LLaMA-Factory/issues/3788
- https://github.com/hiyouga/LLaMA-Factory/issues/4228

**Q: After fine-tuning/training starts, HCCL fails with key messages such as the following:**

.. code-block:: shell

    RuntimeError: [ERROR] HCCL error in: torch_npu/csrc/distributed/ProcessGroupHCCL.cpp:64
    [ERROR] 2024-05-21-11:57:54 (PID:927000, Device:3, RankID:3) ERR02200 DIST call hccl api failed.
    EJ0001: 2024-05-21-11:57:54.167.645 Failed to initialize the HCCP process. Reason: Maybe the last training process is running.
    Solution: Wait for 10s after killing the last training process and try again.
    TraceBack (most recent call last):
    tsd client wait response fail, device response code[1]. unknown device error.[FUNC:WaitRsp][FILE:process_mode_manager.cpp][LINE:290]
    Fail to get sq reg virtual addr, deviceId=3, sqId=40.[FUNC:Setup][FILE:stream.cc][LINE:1102]
    stream setup failed, retCode=0x7020010.[FUNC:SyncGetDevMsg][FILE:api_impl.cc][LINE:4643]
    Sync get device msg failed, retCode=0x7020010.[FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4704]
    rtGetDevMsg execute failed, reason=[driver error:internal error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]

A: Kill all processes on the device side, wait 10 seconds, then restart training.
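
A sketch of that recovery, assuming the Ascend driver's ``npu-smi`` tool is available:

.. code-block:: shell

    # list the processes currently occupying each NPU
    npu-smi info
    # kill the stale training processes it reports, wait, then relaunch
    kill -9 <stale_pid>
    sleep 10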

Related issues:

- https://github.com/hiyouga/LLaMA-Factory/issues/3839

.. **Q: Fine-tuning ChatGLM3 with fp16 reports Gradient overflow. Skipping step, Loss scaler reducing loss scale to ...; with bf16, 'loss': 0.0 and 'grad_norm': nan**
.. https://github.com/hiyouga/LLaMA-Factory/issues/3308

**Q: Inference with the TeleChat model on Ascend NPU fails with AssertionError: Torch not compiled with CUDA enabled**

A: This error is usually caused by CUDA-specific hard-coding in the model code. Use the traceback to locate the hard-coded CUDA calls and change them to their NPU equivalents, e.g. replace ``.cuda()`` with ``.npu()`` and ``.to("cuda")`` with ``.to("npu")``.
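
One way to find the offending lines (the repo path is illustrative):

.. code-block:: shell

    # search the downloaded model repo for hard-coded CUDA calls
    grep -rn '\.cuda()\|to("cuda")' /path/to/telechat
    # rewrite each match to its NPU equivalent: .npu() / .to("npu")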

**Q: Fine-tuning fails with DeviceType must be NPU. Actual DeviceType is: cpu, as in the following error:**

.. code-block:: shell

    File "/usr/local/pyenv/versions/3.10.13/envs/x/lib/python3.10/site-packages/transformers-4.41.1-py3.10.egg/transformers/generation/utils.py", line 1842, in generate
        result = self._sample(
    File "/usr/local/pyenv/versions/3.10.13/envs/x/lib/python3.10/site-packages/transformers-4.41.1-py3.10.egg/transformers/generation/utils.py", line 2568, in _sample
        next_tokens = next_tokens * unfinished_sequences + \
    RuntimeError: t == c10::DeviceType::PrivateUse1 INTERNAL ASSERT FAILED at "third_party/op-plugin/op_plugin/ops/base_ops/opapi/MulKernelNpuOpApi.cpp":26, please report a bug to PyTorch. DeviceType must be NPU. Actual DeviceType is: cpu
    [ERROR] 2024-05-29-17:04:48 (PID:70209, Device:0, RankID:-1) ERR00001 PTA invalid parameter

A: Errors of this kind usually mean that some tensors were never moved to the NPU; make sure every operand of the failing operator is on the NPU. In the error above, MulKernelNpuOpApi is the multiplication operator, so both next_tokens and unfinished_sequences must be on the NPU.
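
A toy reproduction and fix, assuming torch-npu is installed:

.. code-block:: shell

    python -c "
    import torch, torch_npu
    a = torch.ones(2).npu()      # on the NPU
    b = torch.ones(2)            # still on the CPU: a * b raises a device mismatch like the one above
    print((a * b.npu()).device)  # fix: move the CPU operand to the NPU first
    "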

**Q: Training with DeepSpeed on a single NPU fails with AttributeError: 'GemmaForCausalLM' object has no attribute 'save_checkpoint', where GemmaForCausalLM may be any other model class**

A: This usually happens when training is launched with ``python src/train.py``, or with ``llamafactory-cli train`` while the environment variable ``FORCE_TORCHRUN`` is set to false or 0.
DeepSpeed wraps the model in a ``DeepSpeedEngine`` only when the program is started by a distributed launcher, and only the wrapped model has methods such as ``save_checkpoint``.
Launching training with ``torchrun`` therefore resolves the problem:

.. code-block:: shell

    torchrun --nproc_per_node $NPROC_PER_NODE \
        --nnodes $NNODES \
        --node_rank $RANK \
        --master_addr $MASTER_ADDR \
        --master_port $MASTER_PORT \
        src/train.py

When ``llamafactory-cli train`` is used together with DeepSpeed, LLaMA-Factory automatically sets ``FORCE_TORCHRUN`` to 1 and launches distributed training, as shown below. If your copy of the code does not do this, update LLaMA-Factory to the latest version.
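
Setting the variable explicitly has the same effect (the config file name is a placeholder):

.. code-block:: shell

    # force llamafactory-cli to launch training via torchrun, even on a single NPU
    FORCE_TORCHRUN=1 llamafactory-cli train <your_train_config>.yaml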

Related issue and PR:

- https://github.com/hiyouga/LLaMA-Factory/issues/4077
- https://github.com/hiyouga/LLaMA-Factory/pull/4082

Feedback
----------

If you run into any problems, feel free to open an issue in the `official community <https://github.com/hiyouga/LLaMA-Factory/issues/>`_, and we will respond as soon as possible.

*Continuously updated ...*

.. index.rst

.. toctree::
   :maxdepth: 2

   install.rst
   quick_start.rst
   faq.rst

.. install.rst

Installation Guide
==================

This tutorial is intended for developers using LLaMA-Factory with Ascend, and walks through installing LLaMA-Factory in an Ascend environment.

Setting up the Ascend environment
---------------------------------

Following the :doc:`Ascend environment quick-install guide <../ascend/quick_install>` for your Ascend product model and CPU architecture, install the Ascend environment, or use a docker image that ships with the Ascend environment and LLaMA-Factory preinstalled:

- TODO

.. warning::
    The minimum CANN version supported by LLaMA-Factory is 8.0.rc1.

Creating a Python environment
-----------------------------

.. note::
    If you have already chosen one of the docker images above, skip this step and start using LLaMA-Factory directly.

.. code-block:: shell
    :linenos:

    # create a Python 3.10 virtual environment
    conda create -n <your_env_name> python=3.10
    # activate the virtual environment
    conda activate <your_env_name>

Installing LLaMA-Factory
------------------------

Install LLaMA-Factory together with torch-npu using the following commands:

.. code-block:: shell
    :linenos:

    git clone git@github.com:hiyouga/LLaMA-Factory.git
    # enter the repository before running the editable install
    cd LLaMA-Factory
    pip install -e .[torch_npu,metrics]

Verifying the installation
--------------------------

Run ``llamafactory-cli env`` to verify the LLaMA-Factory × Ascend installation. If the LLaMA-Factory, PyTorch NPU and CANN version numbers and the NPU model are displayed correctly, as shown below, the installation succeeded.

.. code-block:: shell

    - `llamafactory` version: 0.8.2.dev0
    - Platform: Linux-4.19.90-vhulk2211.3.0.h1543.eulerosv2r10.aarch64-aarch64-with-glibc2.31
    - Python version: 3.10.14
    - PyTorch version: 2.1.0 (NPU)
    - Transformers version: 4.41.2
    - Datasets version: 2.19.2
    - Accelerate version: 0.31.0
    - PEFT version: 0.11.1
    - TRL version: 0.9.4
    - NPU type: xxx
    - CANN version: 8.0.RC2.alpha001

Enjoy fine-tuning and running inference with large language models on LLaMA-Factory × Ascend!