I was trying to serve the llama-3.2-vision-instruct model on a new machine, and I am getting the error below. The call stack is quite deep, so the terminal output is long. It looks to me like torch dynamo compiled the graph into Triton code, and the compiled result then could not be imported. I am unfamiliar with how torch dynamo or Triton works. Can someone provide some insight into what might be wrong?
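One thing that may be worth checking first: the final `ImportError` in the log reads `failed to map segment from shared object` for a `__triton_launcher.so` under `/tmp`. That loader message is commonly seen when the directory holding the `.so` sits on a filesystem mounted with `noexec`. Whether that is the cause on this machine is an assumption, not something the log confirms. A minimal sketch of the check follows; the helper names `is_mounted_noexec` and `describe_cache_dir` are hypothetical and not part of torch, Triton, or sglang.

```python
import os
import tempfile


def is_mounted_noexec(path: str) -> bool:
    """Return True if the filesystem holding `path` is mounted with noexec."""
    # f_flag carries the mount flags; ST_NOEXEC is set for noexec mounts (Unix only).
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_NOEXEC)


def describe_cache_dir(path: str) -> str:
    """Summarize whether shared objects written under `path` can be mapped as executable."""
    if is_mounted_noexec(path):
        return f"{path}: noexec; shared objects written here cannot be loaded"
    return f"{path}: exec allowed"


if __name__ == "__main__":
    # The log shows the compile cache under the system temp directory,
    # so that is the directory inspected here.
    print(describe_cache_dir(tempfile.gettempdir()))
```

If the check reports `noexec`, setting the `TORCHINDUCTOR_CACHE_DIR` environment variable to a directory on a filesystem that permits execution before launching the server may be worth trying. In the log the Triton cache lives under the Inductor cache directory, so moving the Inductor cache would presumably move the launcher as well, though that is an inference from the paths rather than a confirmed behavior.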
(sglang04) user@cuda0100:~/huggingface$ TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1 CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --host 0.0.0.0 --port 10000 --model-path llama-3-2-11b-vision-instruct --served-model-name llama-3-2-11b-vision-instruct --chat-template llama_3_vision
[2025-01-11 00:44:00] server_args=ServerArgs(model_path='llama-3-2-11b-vision-instruct', tokenizer_path='llama-3-2-11b-vision-instruct', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='llama-3-2-11b-vision-instruct', chat_template='llama_3_vision', is_embedding=False, revision=None, host='0.0.0.0', port=10000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=1, stream_interval=1, random_seed=629956608, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
[2025-01-11 00:44:01] Use chat template for the OpenAI-compatible API server: llama_3_vision
[2025-01-11 00:44:06 TP0] Overlap scheduler is disabled for multimodal models.
[2025-01-11 00:44:06 TP0] Automatically turn off --chunked-prefill-size and adjust --mem-fraction-static for multimodal models.
[2025-01-11 00:44:06 TP0] Init torch distributed begin.
[W111 00:44:06.050399691 socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [localhost.sri.com]:10768 (errno: 97 - Address family not supported by protocol).
[2025-01-11 00:44:06 TP0] Load weight begin. avail mem=92.60 GB
Loading safetensors checkpoint shards: 0% Completed | 0/5 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 20% Completed | 1/5 [00:00<00:00, 4.89it/s]
Loading safetensors checkpoint shards: 40% Completed | 2/5 [00:00<00:01, 1.89it/s]
Loading safetensors checkpoint shards: 60% Completed | 3/5 [00:01<00:01, 1.44it/s]
Loading safetensors checkpoint shards: 80% Completed | 4/5 [00:02<00:00, 1.36it/s]
Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:03<00:00, 1.29it/s]
Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:03<00:00, 1.43it/s]
[2025-01-11 00:44:10 TP0] Load weight end. type=MllamaForConditionalGeneration, dtype=torch.bfloat16, avail mem=72.43 GB
[2025-01-11 00:44:10 TP0] Memory pool end. avail mem=14.09 GB
[2025-01-11 00:44:10 TP0] Capture cuda graph begin. This can take up to several minutes.
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] torchdynamo start compiling clamp_position /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:106, stack (elided 6 frames):
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "<string>", line 1, in <module>
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/multiprocessing/spawn.py", line 120, in spawn_main
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] exitcode = _main(fd, parent_sentinel)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/multiprocessing/spawn.py", line 133, in _main
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] return self._bootstrap(parent_sentinel)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self.run()
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/multiprocessing/process.py", line 108, in run
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self._target(*self._args, **self._kwargs)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 1489, in run_scheduler_process
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 194, in __init__
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self.tp_worker = TpWorkerClass(
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 62, in __init__
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self.model_runner = ModelRunner(
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 173, in __init__
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self.init_cuda_graphs()
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 624, in init_cuda_graphs
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self.cuda_graph_runner = CudaGraphRunner(self)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 193, in __init__
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] self.capture()
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 254, in capture
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] ) = self.capture_one_batch_size(bs, forward)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 318, in capture_one_batch_size
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] run_once()
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 306, in run_once
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] positions=clamp_position(seq_lens),
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] return fn(*args, **kwargs)
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0] return self._torchdynamo_orig_callable(
[rank0]:V0111 00:44:11.495000 140677839730496 torch/_dynamo/convert_frame.py:776] [0/0]
[rank0]:I0111 00:44:11.496000 140677839730496 torch/_dynamo/logging.py:56] [0/0] Step 1: torchdynamo start tracing clamp_position /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:106
[rank0]:V0111 00:44:11.497000 140677839730496 torch/fx/experimental/symbolic_shapes.py:2529] [0/0] create_env
[rank0]:V0111 00:44:11.501000 140677839730496 torch/_dynamo/symbolic_convert.py:775] [0/0] [__trace_source] TRACE starts_line /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:106 in clamp_position ()
[rank0]:V0111 00:44:11.501000 140677839730496 torch/_dynamo/symbolic_convert.py:775] [0/0] [__trace_source] @maybe_torch_compile(dynamic=True)
[rank0]:V0111 00:44:11.507000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE RESUME 0 []
[rank0]:V0111 00:44:11.507000 140677839730496 torch/_dynamo/symbolic_convert.py:775] [0/0] [__trace_source] TRACE starts_line /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:108 in clamp_position (clamp_position)
[rank0]:V0111 00:44:11.507000 140677839730496 torch/_dynamo/symbolic_convert.py:775] [0/0] [__trace_source] return torch.clamp((seq_lens - 1), min=0).to(torch.int64)
[rank0]:V0111 00:44:11.507000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_GLOBAL torch []
[rank0]:V0111 00:44:11.508000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_ATTR clamp [NullVariable(), PythonModuleVariable(<module 'torch' from '/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/__init__.py'>)]
[rank0]:V0111 00:44:11.509000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_FAST seq_lens [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>)]
[rank0]:V0111 00:44:11.509000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_CONST 1 [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>), LazyVariableTracker()]
[rank0]:V0111 00:44:11.509000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE BINARY_OP 10 [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>), LazyVariableTracker(), ConstantVariable()]
[rank0]:V0111 00:44:11.509000 140677839730496 torch/_dynamo/output_graph.py:2033] [0/0] create_graph_input L_seq_lens_ L['seq_lens']
[rank0]:V0111 00:44:11.509000 140677839730496 torch/_dynamo/variables/builder.py:2268] [0/0] wrap_to_fake L['seq_lens'] (1,) StatefulSymbolicContext(dynamic_sizes=[<DimDynamic.DUCK: 1>], constraint_sizes=[None], view_base_context=StatefulSymbolicContext(dynamic_sizes=[<DimDynamic.DUCK: 1>], constraint_sizes=[None], view_base_context=None, tensor_source=AttrSource(base=LocalSource(local_name='seq_lens', cell_or_freevar=False), member='_base'), shape_env_to_source_to_symbol_cache={}), tensor_source=LocalSource(local_name='seq_lens', cell_or_freevar=False), shape_env_to_source_to_symbol_cache={}) <class 'torch.Tensor'>
[rank0]:I0111 00:44:11.531000 140677839730496 torch/fx/experimental/symbolic_shapes.py:3549] [0/0] create_symbol s0 = 160 for L['seq_lens']._base.size()[0] [2, 9223372036854775806] at sglang/srt/model_executor/cuda_graph_runner.py:108 in clamp_position (_dynamo/variables/builder.py:2276 in <lambda>), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s0"
[rank0]:V0111 00:44:11.531000 140677839730496 torch/fx/experimental/symbolic_shapes.py:5167] [0/0] eval True == True [statically known]
[rank0]:V0111 00:44:11.532000 140677839730496 torch/fx/experimental/symbolic_shapes.py:5167] [0/0] eval False == False [statically known]
[rank0]:V0111 00:44:11.533000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] TRACE FX call sub from /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:108 in clamp_position (clamp_position)
[rank0]:V0111 00:44:11.533000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] return torch.clamp((seq_lens - 1), min=0).to(torch.int64)
[rank0]:V0111 00:44:11.533000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] ~~~~~~~~~^~~
[rank0]:V0111 00:44:11.534000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_CONST 0 [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>), TensorVariable()]
[rank0]:V0111 00:44:11.534000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE KW_NAMES ('min',) [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>), TensorVariable(), ConstantVariable()]
[rank0]:V0111 00:44:11.534000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE PRECALL 2 [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>), TensorVariable(), ConstantVariable()]
[rank0]:V0111 00:44:11.534000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE CALL 2 [NullVariable(), TorchInGraphFunctionVariable(<built-in method clamp of type object at 0x7ff176266580>), TensorVariable(), ConstantVariable()]
[rank0]:V0111 00:44:11.535000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] TRACE FX call clamp from /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:108 in clamp_position (clamp_position)
[rank0]:V0111 00:44:11.535000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] return torch.clamp((seq_lens - 1), min=0).to(torch.int64)
[rank0]:V0111 00:44:11.535000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:V0111 00:44:11.536000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_METHOD to [TensorVariable()]
[rank0]:V0111 00:44:11.536000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_GLOBAL torch [NullVariable(), GetAttrVariable()]
[rank0]:V0111 00:44:11.536000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE LOAD_ATTR int64 [NullVariable(), GetAttrVariable(), PythonModuleVariable(<module 'torch' from '/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/__init__.py'>)]
[rank0]:V0111 00:44:11.536000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE PRECALL 1 [NullVariable(), GetAttrVariable(), ConstantVariable()]
[rank0]:V0111 00:44:11.536000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE CALL 1 [NullVariable(), GetAttrVariable(), ConstantVariable()]
[rank0]:V0111 00:44:11.537000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] TRACE FX call to from /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:108 in clamp_position (clamp_position)
[rank0]:V0111 00:44:11.537000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] return torch.clamp((seq_lens - 1), min=0).to(torch.int64)
[rank0]:V0111 00:44:11.537000 140677839730496 torch/_dynamo/output_graph.py:1892] [0/0] [__trace_call] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
[rank0]:V0111 00:44:11.538000 140677839730496 torch/_dynamo/symbolic_convert.py:798] [0/0] [__trace_bytecode] TRACE RETURN_VALUE None [TensorVariable()]
[rank0]:I0111 00:44:11.538000 140677839730496 torch/_dynamo/logging.py:56] [0/0] Step 1: torchdynamo done tracing clamp_position (RETURN_VALUE)
[rank0]:V0111 00:44:11.538000 140677839730496 torch/_dynamo/symbolic_convert.py:2626] [0/0] RETURN_VALUE triggered compile
[rank0]:V0111 00:44:11.538000 140677839730496 torch/_dynamo/output_graph.py:972] [0/0] COMPILING GRAPH due to GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py, line 108 in clamp_position>], graph_break=False)
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] TRACED GRAPH
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] ===== __compiled_fn_1 =====
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] def forward(self, L_seq_lens_: "i32[1][1]cuda:0"):
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] l_seq_lens_ = L_seq_lens_
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code]
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] # File: /export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py:108 in clamp_position, code: return torch.clamp((seq_lens - 1), min=0).to(torch.int64)
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] sub: "i32[1][1]cuda:0" = l_seq_lens_ - 1; l_seq_lens_ = None
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] clamp: "i32[1][1]cuda:0" = torch.clamp(sub, min = 0); sub = None
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] to: "i64[1][1]cuda:0" = clamp.to(torch.int64); clamp = None
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code] return (to,)
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code]
[rank0]:V0111 00:44:11.539000 140677839730496 torch/_dynamo/output_graph.py:1291] [0/0] [__graph_code]
[rank0]:I0111 00:44:11.540000 140677839730496 torch/_dynamo/logging.py:56] [0/0] Step 2: calling compiler function inductor
[rank0]:V0111 00:44:11.919000 140677839730496 torch/fx/experimental/symbolic_shapes.py:5167] [0/0] eval True == True [statically known]
[2025-01-11 00:44:11 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 193, in __init__
self.capture()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 254, in capture
) = self.capture_one_batch_size(bs, forward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 318, in capture_one_batch_size
run_once()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 306, in run_once
positions=clamp_position(seq_lens),
^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
return self._torchdynamo_orig_callable(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
result = self._inner_convert(
^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
return _compile(
^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_utils_internal.py", line 84, in wrapper_function
return StrobelightCompileTimeProfiler.profile_compile_time(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
out_code = transform_code_object(code, transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
transformations(instructions, code_options)
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 582, in transform
tracer.run()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
super().run()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
while self.step():
^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
self.dispatch_table[inst.opcode](self, inst)
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE
self._return(inst)
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
self.output.compile_subgraph(
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1098, in compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1318, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1409, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1390, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
compiled_gm = compiler_fn(gm, example_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/__init__.py", line 1951, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx
return aot_autograd(
^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/backends/common.py", line 69, in __call__
cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified
compiled_fn, _ = create_aot_dispatcher_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function
compiled_fn, fw_metadata = compiler_fn(
^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 168, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1410, in fw_compiler_base
return inner_compile(
^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 84, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 527, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 831, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1749, in compile_to_fn
return self.compile_to_module().call
^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1699, in compile_to_module
mod = PyCodeCache.load_by_key_path(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 3062, in load_by_key_path
mod = _reload_python_module(key, path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_user/x7/cx7ftdl3gx6m3kzpn5wlhqfephhq6xyofs2frmmwocxcghrn6acq.py", line 73, in <module>
async_compile.wait(globals())
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/async_compile.py", line 247, in wait
scope[key] = result.result()
^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 3424, in result
self.kernel.precompile()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 232, in precompile
compiled_binary, launcher = self._precompile_config(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 440, in _precompile_config
binary._init_handles()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/triton/compiler/compiler.py", line 370, in _init_handles
self.run = driver.active.launcher_cls(self.src, self.metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 361, in __init__
mod = compile_module_from_src(src, "__triton_launcher")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 62, in compile_module_from_src
mod = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 573, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1233, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
ImportError: /tmp/torchinductor_user/triton/0/5ac196b3fec222314712cdadd63505a7ab3c79a3594e5268f06da8bf9d1e5d8c/__triton_launcher.so: failed to map segment from shared object
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 1489, in run_scheduler_process
scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/managers/scheduler.py", line 194, in __init__
self.tp_worker = TpWorkerClass(
^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 62, in __init__
self.model_runner = ModelRunner(
^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 173, in __init__
self.init_cuda_graphs()
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 624, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
^^^^^^^^^^^^^^^^^^^^^
File "/export/home/cuda01001/user/miniforge3/envs/sglang04/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 195, in __init__
raise Exception(
Exception: Capture cuda graph failed: backend='inductor' raised:
ImportError: /tmp/torchinductor_user/triton/0/5ac196b3fec222314712cdadd63505a7ab3c79a3594e5268f06da8bf9d1e5d8c/__triton_launcher.so: failed to map segment from shared object
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Possible solutions:
1. disable cuda graph by --disable-cuda-graph
2. set --mem-fraction-static to a smaller value (e.g., 0.8 or 0.7)
3. disable torch compile by not using --enable-torch-compile
Open an issue on GitHub https://github.com/sgl-project/sglang/issues/new/choose
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] TorchDynamo compilation metrics:
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] Function, Runtimes (s)
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] _compile.<locals>.compile_inner, 0.0000
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] OutputGraph.call_user_compiler, 0.0000
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] create_aot_dispatcher_function, 0.0000
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] compile_fx.<locals>.fw_compiler_base, 0.0000
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] compile_fx_inner, 0.0000
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] GraphLowering.run, 0.0038
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] GraphLowering.compile_to_module, 0.0000
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] Scheduler.__init__, 0.0061
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] Scheduler.codegen, 0.0075
[rank0]:I0111 00:44:12.016000 140677839730496 torch/_dynamo/utils.py:335] WrapperCodeGen.generate, 0.0003
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats constrain_symbol_range: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats evaluate_expr: CacheInfo(hits=4, misses=3, maxsize=256, currsize=3)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _simplify_floor_div: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _maybe_guard_rel: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _find: CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats has_hint: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats size_hint: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats simplify: CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _update_divisible: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats replace: CacheInfo(hits=74, misses=7, maxsize=None, currsize=7)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _maybe_evaluate_static: CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats get_implications: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats get_axioms: CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats safe_expand: CacheInfo(hits=15, misses=8, maxsize=256, currsize=8)
[rank0]:V0111 00:44:12.016000 140677839730496 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats uninteresting_files: CacheInfo(hits=12, misses=1, maxsize=None, currsize=1)
Killed
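A note on reading the trace above: the `ImportError: ... failed to map segment from shared object` is raised when the dynamic loader cannot map `__triton_launcher.so` into memory. One common cause (an assumption here, the log does not confirm it) is that the directory holding the file, `/tmp` in this run, sits on a filesystem mounted with the `noexec` option; a tight memory limit on the process is another possibility. Below is a minimal Python sketch for checking the first case. The helper names `is_noexec` and `report` are hypothetical and are not part of torch, triton, or sglang.

```python
import os
import tempfile


def is_noexec(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted noexec."""
    # os.statvfs exposes the mount flags; ST_NOEXEC is set for noexec mounts.
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_NOEXEC)


def report(path: str) -> str:
    """Build a one-line, human-readable summary for `path`."""
    if is_noexec(path):
        state = "noexec (shared objects cannot be mapped from here)"
    else:
        state = "exec allowed"
    return f"{path}: {state}"


# Check the default temp directory, which is where the cache in the trace lives.
print(report(tempfile.gettempdir()))
```

If the check reports `noexec`, it may be worth pointing the Inductor cache at a directory on an executable filesystem through the `TORCHINDUCTOR_CACHE_DIR` environment variable before launching the server, since the Triton launcher in the trace is written under that cache directory.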