Core dump when the model generates nothing with exclude_input_in_output true #261

Closed
FightingMan opened this issue Dec 26, 2023 · 3 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

@FightingMan

Setting exclude_input_in_output to true causes a core dump when the model generates nothing:

 Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:  29314) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000006751c tensorrt_llm::batch_manager::GptManager::returnCompletedRequests()  :0
 2 0x000000000006ce1e tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop()  :0
 3 0x00000000000dc253 std::error_code::default_error_condition()  ???:0
 4 0x0000000000094ac3 pthread_condattr_setpshared()  ???:0
 5 0x0000000000126660 __xmknodat()  ???:0
=================================
[multi-test-0:29284] *** Process received signal ***
[multi-test-0:29284] Signal: Segmentation fault (11)
[multi-test-0:29284] Signal code:  (-6)
[multi-test-0:29284] Failing at address: 0x7264
[multi-test-0:29284] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fb5b726f520]
[multi-test-0:29284] [ 1] /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6751c)[0x7fb5434bb51c]
[multi-test-0:29284] [ 2] /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6ce1e)[0x7fb5434c0e1e]
[multi-test-0:29284] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253)[0x7fb5b7531253]
[multi-test-0:29284] [ 4] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fb5b72c1ac3]
[multi-test-0:29284] [ 5] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126660)[0x7fb5b7353660]
[multi-test-0:29284] *** End of error message ***
 0# 0x000056351A572C2D in /opt/tritonserver/bin/tritonserver
 1# 0x00007F731366F520 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F729F4BB51C in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 3# 0x00007F729F4C0E1E in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 4# 0x00007F7313931253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F73136C1AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
 6# 0x00007F7313753660 in /usr/lib/x86_64-linux-gnu/libc.so.6

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node multi-test-0 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

With exclude_input_in_output set to false, it works fine when calling the tensorrt_llm model directly (without the ensemble).
In ensemble mode with exclude_input_in_output set to false, it returns:

{"error":"attempt to access non-existing array index '0'"}

With exclude_input_in_output set to true, it core dumps in both ensemble mode and when calling the tensorrt_llm model directly.
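For reference, a minimal reproduction sketch of the request I send (assuming the Triton generate endpoint is enabled and the default "ensemble" model with the text_input/max_tokens tensor names from the example config; the prompt is a placeholder for one that makes the fine-tuned model emit zero new tokens):

# Reproduction sketch (assumptions: default "ensemble" model deployed, generate
# endpoint enabled, text_input/max_tokens names taken from the example ensemble config).
import requests

payload = {
    "text_input": "<prompt that makes the model emit zero new tokens>",
    "max_tokens": 64,
}
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json=payload,
    timeout=60,
)
# With exclude_input_in_output=false this request returns
#   {"error":"attempt to access non-existing array index '0'"}
# With exclude_input_in_output=true the server segfaults instead.
print(resp.status_code, resp.text)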

Triton server built from master on 2023/12/18:

| server_version | 2.42.0dev |

TensorRT-LLM built from the latest master.
The model is a QWEN model with tp=1. Testing it directly with python run.py works fine without a core dump, so this looks like a tensorrtllm_backend problem.

@byshiue added the triaged label (Issue has been triaged by maintainers) on Dec 28, 2023
@juney-nvidia
Collaborator

juney-nvidia commented Jan 1, 2024

@FightingMan

Hi, thanks for reporting this.

Can you follow this template to provide the concrete steps to reproduce your issue? That will make it easier for our engineers to investigate it and provide help.

June

@FightingMan
Author

@juney-nvidia thanks. I don't think you can reproduce it, since it happens with a special fine-tuned model that generates nothing.
It is fixed for me now when building from the main branch:
https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#option-2-build-via-docker

@nullxjx

nullxjx commented Feb 1, 2024

I encountered this problem again using an image compiled from the main branch on 2024-01-26; the problem is exactly the same. In ensemble mode with exclude_input_in_output set to false, when the model generates nothing, it returns:

{"error":"attempt to access non-existing array index '0'"}

I investigated this problem and I do not think this error comes from the postprocessing module in ensemble mode.

This error occurs only when the model generates nothing.

@juney-nvidia can you help me fix it?
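
To illustrate the failure mode I suspect (this is only an assumption about where the empty output bites, not the backend's actual code): when the input is excluded from the output and the model generates nothing, the output token array is empty, and anything that indexes element 0 of it fails.

# Illustration only (hypothetical shapes, not the backend's real code):
# an output with zero generated tokens once the prompt is excluded.
import numpy as np

output_ids = np.empty((1, 1, 0), dtype=np.int32)  # [batch, beam, generated_tokens]

tokens = output_ids[0][0]      # empty array, nothing to detokenize
print(tokens.tolist())         # []
# Any code that assumes tokens[0] exists raises an IndexError here, which is
# consistent with the "attempt to access non-existing array index '0'" message.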
