[Model][VLM] Add Qwen2.5-Omni model support (end-to-end full support) #16347
base: main
Conversation
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
I think we can further split this PR, with the first one (after the Qwen2.5-Omni thinker only) adding
Thanks for this contribution! As we discussed offline, we'll be carefully reviewing this PR/design and thinking about how to enable end-to-end support for models like this in vLLM!
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> (cherry picked from commit 005879f2b22e40b7d03be7063e80686862a72e2d)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
```python
elif 'video' in ele:
    audio_key = 'video'
    audios.append(librosa.load(ele[audio_key], sr=16000)[0])
    videos.append(fetch_and_read_video(audio_key))
```
Suggested change:

```diff
-        videos.append(fetch_and_read_video(audio_key))
+        videos.append(fetch_and_read_video(ele[audio_key]))
```
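The suggestion above matters because `audio_key` holds the literal string `'video'`, not the media URL, so the original line would try to fetch the key name itself. A minimal standalone sketch of the corrected dispatch logic, with stub loaders standing in for `librosa.load` and the PR's `fetch_and_read_video` (both stubs are illustrative, not the real implementations):

```python
def load_audio(path, sr=16000):
    # Stub for librosa.load(path, sr=sr)[0]; returns a placeholder waveform tag.
    return f"waveform({path}@{sr}Hz)"

def fetch_and_read_video(url):
    # Stub for the real video fetcher; returns a placeholder frame tag.
    return f"frames({url})"

def collect_multimodal(elements):
    """Dispatch conversation elements to audio/video loaders."""
    audios, videos = [], []
    for ele in elements:
        if 'audio' in ele:
            audios.append(load_audio(ele['audio']))
        elif 'video' in ele:
            # A video contributes both its audio track and its frames.
            # Pass the URL (ele['video']), not the key string 'video'.
            audios.append(load_audio(ele['video']))
            videos.append(fetch_and_read_video(ele['video']))
    return audios, videos

audios, videos = collect_multimodal([
    {'audio': 'a.wav'},
    {'video': 'b.mp4'},
])
```

With the bug left in place, the loader would receive the string `'video'` instead of `'b.mp4'` and fail to locate any file.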
This draft PR adds support for the Qwen2.5-Omni model (end-to-end full support).
This PR is a later version of #15130; it adds support for the talker, code2wav, and an `OmniLLMEngine` class to manage the end-to-end audio generation process. See #15130 for more details about the `Qwen2.5-Omni` model architecture.
NOTE: Since this PR makes significant changes to vLLM, it is a draft and will not be merged in the short term.
Requirements
This PR requires huggingface/transformers#36752.
Note: You need to install transformers from source, using that branch.
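One way to install from that branch is to point pip directly at the pull-request ref on GitHub (this is a sketch; checking out the branch and running `pip install -e .` works as well):

```shell
# Install transformers from the PR branch (huggingface/transformers#36752).
# pip accepts arbitrary git refs, including GitHub pull-request heads.
pip install "git+https://github.com/huggingface/transformers.git@refs/pull/36752/head"
```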
Example Usage
This command will print the text output and generate `.wav` output files under the current folder.
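As a rough illustration of what "generate `.wav` output files under the current folder" involves, here is a sketch that writes a waveform to disk with the standard-library `wave` module (the PR's actual code2wav pipeline is different; the 440 Hz sine payload below is purely illustrative, standing in for model-generated speech):

```python
import math
import struct
import wave

def write_wav(path, samples, sr=16000):
    # Pack float samples in [-1, 1] as 16-bit PCM and write a mono WAV file.
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit
        f.setframerate(sr)
        frames = b''.join(
            struct.pack('<h', int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)

# Illustrative payload: 0.1 s of a 440 Hz tone in place of model output.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
write_wav('output_0.wav', tone, sr)
```

The 16 kHz sample rate matches the `librosa.load(..., sr=16000)` call used for audio inputs elsewhere in this PR.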