Segmentation fault: TensorRT-LLM crashes when using xgrammar guided decoding and KV cache reuse together #2660

Open · Somasundaram-Palaniappan opened this issue Jan 6, 2025 · 2 comments
Labels: bug (Something isn't working)

@Somasundaram-Palaniappan

System Info

  • CPU architecture: x86_64
  • GPU: NVIDIA A100
  • TensorRT-LLM version: 0.16.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Description:

  • TensorRT-LLM crashes with the stack dump below when xgrammar guided decoding and KV cache reuse are enabled together.
  • If KV cache reuse is disabled, there is no crash.

Steps to reproduce this issue

  • Start from the guided decoding example at TensorRT-LLM/examples/llm-api/llm_guided_decoding.py.
  • Modify the LLM object to enable KV cache block reuse (see the snippet below).
  • Run the example code.

import tensorrt_llm.bindings.executor as trtllm
from tensorrt_llm import LLM

# Enabling KV cache block reuse is what triggers the crash.
kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=True)

llm = LLM(model="/trt_engines/a100/llama/tp1/compiled-model/",
          tokenizer="/trt_engines/a100/llama/tp1/tokenizer",
          guided_decoding_backend='xgrammar',
          kv_cache_config=kv_cache_config)
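For a self-contained run, a minimal end-to-end sketch in the spirit of llm_guided_decoding.py follows. The JSON schema and prompt are illustrative, and the GuidedDecodingParams/SamplingParams usage reflects the 0.16 LLM API as I understand it; exact names may differ across versions.

from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import GuidedDecodingParams
import tensorrt_llm.bindings.executor as trtllm

# Enable KV cache block reuse (the trigger for the crash).
kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=True)

llm = LLM(model="/trt_engines/a100/llama/tp1/compiled-model/",
          tokenizer="/trt_engines/a100/llama/tp1/tokenizer",
          guided_decoding_backend='xgrammar',
          kv_cache_config=kv_cache_config)

# Illustrative schema; any schema-constrained request exercises xgrammar.
schema = '{"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}'

output = llm.generate(
    "What is the capital of France? Answer in JSON.",
    sampling_params=SamplingParams(
        max_tokens=50,
        guided_decoding=GuidedDecodingParams(json=schema)))
print(output.outputs[0].text)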

Expected behavior

No crash

Actual behavior

Segmentation fault crash.

Error

[baf3afa81c47:1647813] Signal: Segmentation fault (11)
[baf3afa81c47:1647813] Signal code: Address not mapped (1)
[baf3afa81c47:1647813] Failing at address: (nil)
[baf3afa81c47:1647813] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f8bcffdf520]
[baf3afa81c47:1647813] [ 1] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN8xgrammar14GrammarMatcher20FillNextTokenBitmaskEP8DLTensori+0x0)[0x7f89d845f220]
[baf3afa81c47:1647813] [ 2] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager13GuidedDecoder5buildERKNS0_17ScheduledRequestsE+0x296)[0x7f89d8391fb6]
[baf3afa81c47:1647813] [ 3] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager27TrtGptModelInflightBatching12forwardAsyncERKSt4listISt10shared_ptrINS0_10LlmRequestEESaIS5_EE+0x6bf)[0x7f89d83f929f]
[baf3afa81c47:1647813] [ 4] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl12forwardAsyncERSt4listISt10shared_ptrINS_13batch_manager10LlmRequestEESaIS7_EE+0x1e6)[0x7f89d848e746]
[baf3afa81c47:1647813] [ 5] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl13executionLoopEv+0x501)[0x7f89d8495331]
[baf3afa81c47:1647813] [ 6] /tensorrt-0.16/lib/python3.10/site-packages/torch/lib/libtorch.so(+0x145c0)[0x7f8bc80ad5c0]
[baf3afa81c47:1647813] [ 7] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f8bd0031ac3]
[baf3afa81c47:1647813] [ 8] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f8bd00c2bf4]

Additional notes

  • Workaround: with KV cache block reuse disabled (enable_block_reuse=False), the example runs without crashing; a sketch follows below.
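A minimal sketch of the interim configuration change, assuming the same LLM setup as in the reproduction snippet above:

import tensorrt_llm.bindings.executor as trtllm

# Interim workaround: leave block reuse off until the fix lands,
# trading away prefix caching for stability.
kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=False)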
Somasundaram-Palaniappan added the bug label on Jan 6, 2025
@nv-guomingz (Collaborator)

Hi @syuoni, would you please take a look at this xgrammar-related issue?

@syuoni (Collaborator) commented Jan 6, 2025

Thanks for reporting this issue. We also found this issue internally, and it's currently being fixed.
