Update dependency vllm to v0.7.2 [SECURITY] #28
Open
renovate wants to merge 1 commit into main from renovate/pypi-vllm-vulnerability
Conversation
This PR contains the following updates:
| Package | Change |
|---|---|
| vllm | `==v0.6.6` -> `==0.7.2` |
| vllm | `==v0.6.4` -> `==0.7.2` |
| vllm | `==0.6.6` -> `==0.7.2` |
| vllm | `==0.6.4` -> `==0.7.2` |
GitHub Vulnerability Alerts
CVE-2024-8768
A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service.
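The fix is the server-side upgrade this PR delivers, but the failure mode is easy to illustrate with a client-side guard. The sketch below is a hypothetical example, not project code: it assumes an OpenAI-compatible vLLM server at a placeholder localhost URL and simply refuses to send the empty prompt that crashes unpatched versions.

```python
# Sketch: refuse to send an empty prompt to a vLLM OpenAI-compatible
# completions endpoint. The URL and model name are placeholders.
import requests

VLLM_URL = "http://localhost:8000/v1/completions"  # hypothetical deployment

def safe_completion(prompt: str, model: str = "my-model") -> dict:
    # Guard against the empty-prompt request that crashes unpatched servers
    # (CVE-2024-8768); an upgraded server rejects it gracefully instead.
    if not prompt or not prompt.strip():
        raise ValueError("Refusing to send an empty prompt")
    resp = requests.post(
        VLLM_URL,
        json={"model": model, "prompt": prompt, "max_tokens": 16},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```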
CVE-2025-24357
Description
The vllm/model_executor/weight_utils.py module implements hf_model_weights_iterator to load model checkpoints downloaded from Hugging Face. It uses the torch.load function with the weights_only parameter left at its default value of False. As the security warning at https://pytorch.org/docs/stable/generated/torch.load.html explains, torch.load will execute arbitrary code during unpickling when it is given malicious pickle data.
Impact
This vulnerability can be exploited to execute arbitrary code and OS commands on the machine of a victim who fetches a pretrained repository remotely.
Note that most models now use the safetensors format, which is not vulnerable to this issue.
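For illustration, here is a minimal sketch of the unsafe pattern described above and the restricted-loading mitigation; the checkpoint path is a placeholder and this is not vLLM's own loader code.

```python
# Sketch: unsafe vs. restricted checkpoint loading. Paths are placeholders.
import torch

CHECKPOINT = "pytorch_model.bin"  # pickle-based checkpoint from an untrusted repo

# Unsafe: with weights_only left at False, a malicious pickle can run
# arbitrary code during unpickling.
# state_dict = torch.load(CHECKPOINT)

# Restricted: only tensors and a small allow-list of types are unpickled,
# which is the hardening direction the advisory points to.
state_dict = torch.load(CHECKPOINT, map_location="cpu", weights_only=True)

# Safer still: safetensors files contain no executable code at all.
# from safetensors.torch import load_file
# state_dict = load_file("model.safetensors")
```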
CVE-2025-25183
Summary
Maliciously constructed prompts can lead to hash collisions, resulting in prefix cache reuse, which can interfere with subsequent responses and cause unintended behavior.
Details
vLLM's prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions.
Impact
The impact of a collision would be the reuse of a cache entry that was generated from different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use.
Solution
We address this problem by initializing hashes in vllm with a value that is no longer constant and predictable. It will be different each time vllm runs. This restores the behavior seen in Python versions prior to 3.12.
Using a hashing algorithm that is less prone to collision (such as sha256) would be the best way to avoid the possibility of a collision. However, it would have an impact on both performance and memory footprint. Hash collisions may still occur, though they are no longer straightforward to predict.
To give an idea of the likelihood of a collision, for randomly generated hash values (assuming the hash generation built into Python is uniformly distributed), with a cache capacity of 50,000 messages and an average prompt length of 300, a collision will occur on average once every 1 trillion requests.
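A short sketch of the idea, assuming nothing about vLLM internals: hash(None) is a fixed constant on Python 3.12+, and mixing a per-process random seed into the hash inputs, as the fix described above does, makes values unpredictable across runs. The salted_block_hash helper below is purely illustrative, not vLLM's actual implementation.

```python
# Sketch of the described mitigation: mix a per-process random value into
# block hashes so they are no longer predictable across runs.
import os
import sys

# On Python >= 3.12, hash(None) is a fixed constant, which removes the
# per-process randomness the prefix cache implicitly relied on.
print(sys.version_info, hash(None))

# Per-process random seed, regenerated every time the process starts.
_HASH_SEED = int.from_bytes(os.urandom(8), "little")

def salted_block_hash(parent_hash: int, token_ids: tuple[int, ...]) -> int:
    # Combining the seed with the usual inputs restores unpredictability
    # while keeping the cheap built-in hash().
    return hash((_HASH_SEED, parent_hash, token_ids))
```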
vllm: Malicious model to RCE by torch.load in hf_model_weights_iterator
CVE-2025-24357 / GHSA-rh4j-5rhw-hr54
Severity
CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:H
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache
CVE-2025-25183 / GHSA-rm76-4mrf-v9r8
Severity
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Release Notes
vllm-project/vllm (vllm)
v0.7.2
Compare Source
Highlights
- … `transformers` library at the moment (#12604)
- `transformers` backend support via `--model-impl=transformers`. This allows vLLM to be run with arbitrary Hugging Face text models (#11330, #12785, #12727). A usage sketch follows this changelog.
- `torch.compile` applied to fused_moe/grouped_topk, yielding a 5% throughput enhancement (#12637)

Core Engine
- `VLLM_LOGITS_PROCESSOR_THREADS` to speed up structured decoding in high batch size scenarios (#12368)

Security Update

Other

What's Changed
- `transformers` backend support by @ArthurZucker in https://github.com/vllm-project/vllm/pull/11330
- `uncache_blocks` and support recaching full blocks by @comaniac in https://github.com/vllm-project/vllm/pull/12415
- `VLLM_LOGITS_PROCESSOR_THREADS` by @akeshet in https://github.com/vllm-project/vllm/pull/12368
- `Linear` handling in `TransformersModel` by @hmellor in https://github.com/vllm-project/vllm/pull/12727
- `FinishReason` enum and use constant strings by @njhill in https://github.com/vllm-project/vllm/pull/12760
- `TransformersModel` UX by @hmellor in https://github.com/vllm-project/vllm/pull/12785

New Contributors
Full Changelog: vllm-project/vllm@v0.7.1...v0.7.2
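The `--model-impl=transformers` backend and the `VLLM_LOGITS_PROCESSOR_THREADS` knob called out in the v0.7.2 highlights can be exercised offline roughly as follows. This is a hedged sketch: it assumes the CLI flag is also exposed to the Python API as a `model_impl` engine argument, and the model name is a placeholder.

```python
# Sketch: trying the transformers backend and the logits-processor thread
# pool mentioned in the v0.7.2 notes. Assumes the engine argument is exposed
# to the Python API as `model_impl`; the model name is a placeholder.
import os

# Threads for logits processors (helps structured decoding at high batch sizes).
os.environ["VLLM_LOGITS_PROCESSOR_THREADS"] = "8"

from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/some-hf-text-model",  # any Hugging Face text model
    model_impl="transformers",            # fall back to the transformers backend
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```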
v0.7.1
Compare Source
Highlights
This release features MLA optimization for the Deepseek family of models. Compared to v0.7.0, released this Monday, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.
V1
For the V1 architecture, we …
Models
Hardwares
Others
What's Changed
- `prompt_logprobs` with ChunkedPrefill by @NickLucche in https://github.com/vllm-project/vllm/pull/10132
- `pre-commit` hooks by @hmellor in https://github.com/vllm-project/vllm/pull/12475
- `suggestion` `pre-commit` hook multiple times by @hmellor in https://github.com/vllm-project/vllm/pull/12521
- `?device={device}` when changing tab in installation guides by @hmellor in https://github.com/vllm-project/vllm/pull/12560
- `cutlass_scaled_mm` to support 2d group (blockwise) scaling by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/11868
- `sparsity_config.ignore` in Cutlass Integration by @rahul-tuli in https://github.com/vllm-project/vllm/pull/12517

New Contributors
Full Changelog: vllm-project/vllm@v0.7.0...v0.7.1
v0.7.0
Compare Source
Highlights
- … `VLLM_USE_V1=1`. See our blog for more details. (44 commits)
- … (`LLM.sleep`, `LLM.wake_up`, `LLM.collective_rpc`, `LLM.reset_prefix_cache`) in vLLM for the post training frameworks! (#12361, #12084, #12284)
- `torch.compile` is now fully integrated in vLLM, and enabled by default in V1. You can turn it on via the `-O3` engine parameter (#11614, #12243, #12043, #12191, #11677, #12182, #12246).
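A minimal sketch of opting into the V1 engine and using one of the new post-training APIs listed above; the model name is a placeholder, and the availability of `reset_prefix_cache()` as a plain `LLM` method is assumed from the highlight rather than verified here.

```python
# Sketch: opting into the V1 alpha engine and exercising one of the new
# post-training APIs from v0.7.0. The model name is a placeholder.
import os

os.environ["VLLM_USE_V1"] = "1"  # opt into the V1 engine before importing vllm

from vllm import LLM, SamplingParams

llm = LLM(model="some-org/some-small-model", enable_prefix_caching=True)
print(llm.generate(["Hi"], SamplingParams(max_tokens=8))[0].outputs[0].text)

# Drop cached prefixes, e.g. after weights change during post-training.
llm.reset_prefix_cache()
```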
This release features …

Features

Models
- … `get_*_embeddings` methods according to this guide is automatically supported by the V1 engine.

Hardwares
- … `W8A8` (#11785)

Features
- … `collective_rpc` abstraction (#12151, #11256)
- … `moe_align_block_size` for cuda graph and large num_experts (#12222)

Others
- … `weights_only=True` when using `torch.load()` (#12366)

Configuration
📅 Schedule: Branch creation - "" in timezone America/Toronto, Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about these updates again.
This PR was generated by Mend Renovate. View the repository job log.