[Bug]: AttributeError: 'Qwen2_5OmniConfig' object has no attribute 'num_attention_heads' #16645


Open
1 task done
jieguolove opened this issue Apr 15, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@jieguolove

Your current environment

just look here:
https://github.com/huggingface/transformers/issues/37515#issuecomment-2804126324

🐛 Describe the bug

System Info
root@445d74596699:/vllm-workspace# transformers-cli env

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

transformers version: 4.52.0.dev0
Platform: Linux-5.15.0-43-generic-x86_64-with-glibc2.35
Python version: 3.12.9
Huggingface_hub version: 0.30.2
Safetensors version: 0.5.3
Accelerate version: 1.5.2
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (GPU?): 2.6.0+cu124 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?:
Using GPU in script?:
GPU type: NVIDIA L20
(base) root@node15:/disk2/Qwen2.5-Omni-7B# more docker-compose.yml
#version: '3.3'
services:
  vllm-openai:
    image: vllm/vllm-openai:v0.8.2
    container_name: Qwen2.5-Omni-7B
    restart: unless-stopped
    runtime: nvidia
    ports:
      - 8007:8000
    volumes:
      - /disk2:/models
    command: >
      --model /models/Qwen2.5-Omni-7B
      --tokenizer_mode="auto"
      --trust-remote-code
      --dtype=bfloat16
      --max_num_seqs=256
      --tensor_parallel_size=1
      --gpu-memory-utilization=0.9
      --max-model-len=65536
      --served-model-name=Qwen2.5-Omni-7B
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ["1"]
    ipc: host
networks:
  vllm:
(base) root@node15:/disk2/Qwen2.5-Omni-7B# docker commit 445d74596699 vllm/vllm-openai:v0.8.2
    sha256:fdf1171c4bc4edc473bb3857597124ae73176c1691a27befccb4360c81ff0d60
    (base) root@node15:/disk2/Qwen2.5-Omni-7B# docker compose -f docker-compose.yml up -d
    [+] Running 2/2
    ✔ Network qwen25-omni-7b_default Created 0.0s
    ✔ Container Qwen2.5-Omni-7B Started 0.6s
    (base) root@node15:/disk2/Qwen2.5-Omni-7B# docker logs -f Qwen2.5-Omni-7B
    INFO 04-15 00:06:11 [init.py:239] Automatically detected platform cuda.
    INFO 04-15 00:06:13 [api_server.py:981] vLLM API server version 0.8.2
    INFO 04-15 00:06:13 [api_server.py:982] args: Namespace(host=None, port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/models/Qwen2.5-Omni-7B', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='bfloat16', kv_cache_dtype='auto', max_model_len=65536, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, 
limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['Qwen2.5-Omni-7B'], qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
    Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
    INFO 04-15 00:06:22 [config.py:585] This model supports multiple tasks: {'reward', 'generate', 'classify', 'score', 'embed'}. Defaulting to 'generate'.
    INFO 04-15 00:06:22 [config.py:1697] Chunked prefill is enabled with max_num_batched_tokens=2048.
    INFO 04-15 00:06:24 [core.py:54] Initializing a V1 LLM engine (v0.8.2) with config: model='/models/Qwen2.5-Omni-7B', speculative_config=None, tokenizer='/models/Qwen2.5-Omni-7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=Qwen2.5-Omni-7B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
    WARNING 04-15 00:06:25 [utils.py:2321] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fabea685df0>
    INFO 04-15 00:06:26 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
    ERROR 04-15 00:06:26 [core.py:343] EngineCore hit an exception: Traceback (most recent call last):
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
    ERROR 04-15 00:06:26 [core.py:343] engine_core = EngineCoreProc(*args, **kwargs)
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 290, in init
    ERROR 04-15 00:06:26 [core.py:343] super().init(vllm_config, executor_class, log_stats)
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 60, in init
    ERROR 04-15 00:06:26 [core.py:343] self.model_executor = executor_class(vllm_config)
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in init
    ERROR 04-15 00:06:26 [core.py:343] self._init_executor()
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 46, in _init_executor
    ERROR 04-15 00:06:26 [core.py:343] self.collective_rpc("init_device")
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    ERROR 04-15 00:06:26 [core.py:343] answer = run_method(self.driver_worker, method, args, kwargs)
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2255, in run_method
    ERROR 04-15 00:06:26 [core.py:343] return func(*args, **kwargs)
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 604, in init_device
    ERROR 04-15 00:06:26 [core.py:343] self.worker.init_device() # type: ignore
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 120, in init_device
    ERROR 04-15 00:06:26 [core.py:343] self.model_runner: GPUModelRunner = GPUModelRunner(
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 106, in init
    ERROR 04-15 00:06:26 [core.py:343] self.num_kv_heads = model_config.get_num_kv_heads(parallel_config)
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 884, in get_num_kv_heads
    ERROR 04-15 00:06:26 [core.py:343] total_num_kv_heads = self.get_total_num_kv_heads()
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 876, in get_total_num_kv_heads
    ERROR 04-15 00:06:26 [core.py:343] return self.hf_text_config.num_attention_heads
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 211, in getattribute
    ERROR 04-15 00:06:26 [core.py:343] return super().getattribute(key)
    ERROR 04-15 00:06:26 [core.py:343] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ERROR 04-15 00:06:26 [core.py:343] AttributeError: 'Qwen2_5OmniConfig' object has no attribute 'num_attention_heads'
    ERROR 04-15 00:06:26 [core.py:343]
    CRITICAL 04-15 00:06:26 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
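For context on the traceback above: in Qwen2.5-Omni's config.json, the text-model fields such as `num_attention_heads` live under `thinker_config.text_config` rather than at the top level of `Qwen2_5OmniConfig`, so vLLM 0.8.2's `get_total_num_kv_heads`, which reads the attribute off the top-level config, raises the `AttributeError`. A minimal sketch of that nesting, using plain stand-in objects (the head counts 28/4 are illustrative values matching the 7B text model, not read from this machine):

```python
class Cfg:
    """Tiny stand-in for an HF PretrainedConfig subclass."""
    def __init__(self, **kw):
        self.__dict__.update(kw)

# Top-level Qwen2_5OmniConfig: no num_attention_heads here; the text-model
# attributes sit under thinker_config.text_config instead.
omni = Cfg(thinker_config=Cfg(text_config=Cfg(num_attention_heads=28,
                                              num_key_value_heads=4)))

# This mirrors what vLLM 0.8.2's get_total_num_kv_heads does and fails:
try:
    omni.num_attention_heads
except AttributeError as e:
    print(f"AttributeError: {e}")

# The field is actually one level down:
print(omni.thinker_config.text_config.num_attention_heads)  # -> 28
```

A vLLM release with proper Qwen2.5-Omni support resolves this by drilling into the nested text config instead of the top-level one.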

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@jieguolove jieguolove added the bug Something isn't working label Apr 15, 2025
@lengrongfu
Contributor

The current vLLM release does not support Qwen2.5-Omni yet; support is being added in #16347.
