[Ray]Ray Patch #92
base: main
Conversation
A monkey patch is not suggested. Please try to find a better way to fix the issue.

In this PR, vLLM removes the platform detection during import. Currently, I have considered the following three solutions:
### What this PR does / why we need it?

In the case where `backend = ray`, only the main process completes the `forward_oot` call, while the other worker processes call `forward_native`. (This bug should also exist when `backend = mp`.)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

**Environment:**

- CANN: 8.0.0
- PyTorch: 2.5.1
- Torch: 2.5.1rc1
- python: 3.10
- vllm: branch main
- vllm-ascend: branch main

The current implementation avoids the Ray Worker initialization issue, as addressed in the [PR](#92). Then, during the `forward_oot` call, logging is performed.

**Script:**

```bash
python examples/offline_distributed_inference_npu.py
```

**Result:**

```bash
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
forward_oot run. #############################################
forward_oot run. #############################################
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.96s/it, est. speed input: 2.80 toks/s, output: 51.00 toks/s]
Prompt: 'Hello, my name is', Generated text: ' Alex and I am a 16 year old male. I have been diagnosed with a rare genetic disorder called X-linked recessive. I have been told that I will not be able to have children. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of'
Prompt: 'The president of the United States is', Generated text: ' Statesman. He is the leader of the country. He is the one who makes the decisions. He is the one who makes the laws. He is the one who makes the rules. He is the one who makes the country strong. He is the one who makes the country happy. He is the one who makes the country safe. He is the one who makes the country free. He is the one who makes the country beautiful. He is the one who makes the country great. He is'
Prompt: 'The capital of France is', Generated text: ' the city of Paris. It is the largest city in France and the second largest city in Europe. It is located in the center of the country, in the south of the country. It is situated on the banks of the Seine River, which flows through the city. The city is surrounded by the Alps and the Pyrenees mountains. The city is also surrounded by the Mediterranean Sea. The city is known for its beautiful architecture, its museums, its parks, and its food. Paris is'
Prompt: 'The future of AI is', Generated text: ' following the path of the internet, and the internet is following the path of the web. The web is a network of interconnected web pages, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network'
```

---------

Signed-off-by: Chenguang Li <757486878@qq.com>
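As an illustration of the logging check described above, here is a minimal sketch; the class and method names are assumptions for demonstration, not the actual vllm-ascend code:

```python
# Hypothetical sketch: print from both forward paths so each process
# (driver and Ray workers) reports which implementation it actually runs.
import os


class CustomOp:
    """Stand-in for an op with a native and an out-of-tree (NPU) forward."""

    def forward_native(self, x):
        print(f"[pid={os.getpid()}] forward_native run.")
        return x

    def forward_oot(self, x):
        # If every worker process prints this line, the out-of-tree NPU
        # implementation is dispatched on all ranks, not just the driver.
        print(f"[pid={os.getpid()}] forward_oot run. " + "#" * 45)
        return x


if __name__ == "__main__":
    CustomOp().forward_oot(x=None)
```

If only the driver process prints `forward_oot run.` while the Ray workers print `forward_native run.`, the bug described above is still present.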
Force-pushed from d7fcb01 to 04c8d30.
### What this PR does / why we need it?

This PR resolves the issue with inference on the Ray backend. For more details, see [here](#92).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Validation was performed based on v0.7.3, and the specific validation script can be found [here](#92).

---------

Signed-off-by: Chenguang Li <757486878@qq.com>
### What this PR does / why we need it?
This PR enables vLLM to perform inference using the Ray backend.
The current issues encountered when running vLLM with Ray as the backend are as follows:
Script:
Result:
This issue occurs because Ray serializes and deserializes the `RayWorkerWrapper` class when passing it to other worker processes for execution. However, during execution, the required `import torch_npu` is missing, leading to an error. We define a class `NPURayWorkerWrapper` that inherits from `RayWorkerWrapper` and use a monkey patch to import `torch_npu`, as shown in the figure below.

### Does this PR introduce _any_ user-facing change?

No.
### How was this patch tested?
**Environment:**

- CANN: 8.0.0
- PyTorch: 2.5.1
- Torch: 2.5.1rc1
- python: 3.10
- vllm: branch main
- vllm-ascend: branch main
**Script:**

**Result:**