Skip to content

Commit

Permalink
Ray Worker Ops Optimization (#136)
Browse files Browse the repository at this point in the history
### What this PR does / why we need it?
In the case where `backend = ray`, only the main process completes the
`forward_oot` call, while the other worker processes call
`forward_native`. (This bug should also exist when `backend = mp`.)

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
**Environment:**

CANN: 8.0.0
PyTorch: 2.5.1
Torch: 2.5.1rc1
python: 3.10
python: 3.10
vllm: branch main 
vllm-ascend: branch main 
The current implementation avoids the Ray Worker initialization issue,
as addressed in the
[PR](#92). Then, during
the `forward_oot` call, logging will be performed.

**Script:**

```bash
python examples/offline_distributed_inference_npu.py
```

**Result:**
```bash
NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
forward_oot run. #############################################
forward_oot run. #############################################
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.96s/it, est. speed input: 2.80 toks/s, output: 51.00 toks/s]
Prompt: 'Hello, my name is', Generated text: ' Alex and I am a 16 year old male. I have been diagnosed with a rare genetic disorder called X-linked recessive. I have been told that I will not be able to have children. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of'
Prompt: 'The president of the United States is', Generated text: ' Statesman. He is the leader of the country. He is the one who makes the decisions. He is the one who makes the laws. He is the one who makes the rules. He is the one who makes the country strong. He is the one who makes the country happy. He is the one who makes the country safe. He is the one who makes the country free. He is the one who makes the country beautiful. He is the one who makes the country great. He is'
Prompt: 'The capital of France is', Generated text: ' the city of Paris. It is the largest city in France and the second largest city in Europe. It is located in the center of the country, in the south of the country. It is situated on the banks of the Seine River, which flows through the city. The city is surrounded by the Alps and the Pyrenees mountains. The city is also surrounded by the Mediterranean Sea. The city is known for its beautiful architecture, its museums, its parks, and its food. Paris is'
Prompt: 'The future of AI is', Generated text: ' following the path of the internet, and the internet is following the path of the web. The web is a network of interconnected web pages, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network'
```

---------

Signed-off-by: Chenguang Li <757486878@qq.com>
  • Loading branch information
noemotiovon authored Feb 21, 2025
1 parent 386817b commit 202b39a
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 3 deletions.
3 changes: 0 additions & 3 deletions vllm_ascend/platform.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,9 +103,6 @@ def mem_get_info(cls) -> Tuple[int, int]:

@classmethod
def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
# Register ops when setup.
from vllm_ascend import ops # noqa: F401

parallel_config = vllm_config.parallel_config
if parallel_config.worker_cls == "auto":
parallel_config.worker_cls = "vllm_ascend.worker.NPUWorker"
Expand Down
2 changes: 2 additions & 0 deletions vllm_ascend/worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,8 @@ def __init__(
is_driver_worker: bool = False,
model_runner_cls: Optional[Type[ModelRunnerBase]] = None,
) -> None:
# Register ops when worker init.
from vllm_ascend import ops # noqa: F401

WorkerBase.__init__(self, vllm_config=vllm_config)
# Try to import mindie_turbo to accelerate vLLM inference.
Expand Down

0 comments on commit 202b39a

Please sign in to comment.