-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Model] Adapt the MiniCPM-2B model #164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
image build failure is a know issue. I'll fix later.
@@ -18,4 +18,6 @@ | |||
|
|||
def register(): | |||
"""Register the NPU platform.""" | |||
# To ensure that the module is correctly replaced, add it at the beginning | |||
import vllm_ascend.patch_module # noqa: F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not suggested to add any imprt vllm
in register
fun to avoid circle import problem. You can just add this to platform.check_and_update_config, or like you did below in worker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to add it to platform.check_and_update_config or worker at first, but it executes after importing the module we need to replace. Then I tried to add it to platform.pre_register_and_update, it executes before importing the module, but maybe other modules already reference old module objects, replacing sys.modules directly does not update their references, it doesn't work. Do you have any suggestions to solve this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I just notice this is to patch vllm.model_executor.layers.fused_moe.fused_moe
. It's fine currently. But for long term support, there are two way to make it better:
- rewrite mincpm and use ModelRegistery to register a new one in vllm-ascend
- contribute to vllm to make the fused_moe code compatible with non-triton supported hardware.
Please rebase the code and fix the lint error. otherwise LGTM |
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
ccc2ffa
to
f96829c
Compare
Done. Thanks a lot |
What this PR does / why we need it?
In order to adapt the MiniCPM-2B model of the vLLM framework to Ascend hardware, the following two modifications are required: 1. The qkv type conversion in the forward function is deleted to support the rope operator. 2. The fused_moe.fused_moe module is replaced to prevent errors caused by triton not supported,this modification can be deleted if triton is supported
Does this PR introduce any user-facing change?
How was this patch tested?