Segmentation fault: TensorRT-LLM crashes when using xgrammar guided decoding and KV cache reuse together #2660

Open · Somasundaram-Palaniappan opened this issue Jan 6, 2025 · 2 comments
Labels: bug (Something isn't working)

@Somasundaram-Palaniappan

System Info

  • CPU architecture: x86_64
  • GPU: NVIDIA A100
  • TensorRT-LLM version: 0.16.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Description:

  • TensorRT-LLM crashes with the stack dump below when xgrammar guided decoding and KV cache reuse are enabled together.
  • If KV cache reuse is disabled, there is no crash.

Steps to reproduce this issue

  • Start from the guided decoding example at TensorRT-LLM/examples/llm-api/llm_guided_decoding.py.
  • Modify the LLM object to enable KV cache block reuse (see the snippet below).
  • Run the example code.

import tensorrt_llm.bindings.executor as trtllm
from tensorrt_llm import LLM

# Enabling KV cache block reuse is what triggers the crash.
kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=True)

llm = LLM(model="/trt_engines/a100/llama/tp1/compiled-model/",
          tokenizer="/trt_engines/a100/llama/tp1/tokenizer",
          guided_decoding_backend='xgrammar',
          kv_cache_config=kv_cache_config)
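For a self-contained run, a minimal end-to-end sketch in the spirit of llm_guided_decoding.py follows. The JSON schema and prompt are illustrative, and the GuidedDecodingParams/SamplingParams usage reflects the 0.16 LLM API as I understand it; exact names may differ across versions.

from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import GuidedDecodingParams
import tensorrt_llm.bindings.executor as trtllm

# Enable KV cache block reuse (the trigger for the crash).
kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=True)

llm = LLM(model="/trt_engines/a100/llama/tp1/compiled-model/",
          tokenizer="/trt_engines/a100/llama/tp1/tokenizer",
          guided_decoding_backend='xgrammar',
          kv_cache_config=kv_cache_config)

# Illustrative schema; any schema-constrained request exercises xgrammar.
schema = '{"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}'

output = llm.generate(
    "What is the capital of France? Answer in JSON.",
    sampling_params=SamplingParams(
        max_tokens=50,
        guided_decoding=GuidedDecodingParams(json=schema)))
print(output.outputs[0].text)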

Expected behavior

No crash

Actual behavior

Segmentation fault crash.

Error

[baf3afa81c47:1647813] Signal: Segmentation fault (11)
[baf3afa81c47:1647813] Signal code: Address not mapped (1)
[baf3afa81c47:1647813] Failing at address: (nil)
[baf3afa81c47:1647813] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f8bcffdf520]
[baf3afa81c47:1647813] [ 1] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN8xgrammar14GrammarMatcher20FillNextTokenBitmaskEP8DLTensori+0x0)[0x7f89d845f220]
[baf3afa81c47:1647813] [ 2] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager13GuidedDecoder5buildERKNS0_17ScheduledRequestsE+0x296)[0x7f89d8391fb6]
[baf3afa81c47:1647813] [ 3] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager27TrtGptModelInflightBatching12forwardAsyncERKSt4listISt10shared_ptrINS0_10LlmRequestEESaIS5_EE+0x6bf)[0x7f89d83f929f]
[baf3afa81c47:1647813] [ 4] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl12forwardAsyncERSt4listISt10shared_ptrINS_13batch_manager10LlmRequestEESaIS7_EE+0x1e6)[0x7f89d848e746]
[baf3afa81c47:1647813] [ 5] /tensorrt-0.16/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl13executionLoopEv+0x501)[0x7f89d8495331]
[baf3afa81c47:1647813] [ 6] /tensorrt-0.16/lib/python3.10/site-packages/torch/lib/libtorch.so(+0x145c0)[0x7f8bc80ad5c0]
[baf3afa81c47:1647813] [ 7] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f8bd0031ac3]
[baf3afa81c47:1647813] [ 8] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f8bd00c2bf4]

Additional notes

  • Workaround: with KV cache block reuse disabled (enable_block_reuse=False), the example runs without crashing; a sketch follows below.
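A minimal sketch of the interim configuration change, assuming the same LLM setup as in the reproduction snippet above:

import tensorrt_llm.bindings.executor as trtllm

# Interim workaround: leave block reuse off until the fix lands,
# trading away prefix caching for stability.
kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=False)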
Somasundaram-Palaniappan added the bug label on Jan 6, 2025
@nv-guomingz (Collaborator)

Hi @syuoni, would you please take a look at this xgrammar-related issue?

@syuoni (Collaborator) commented Jan 6, 2025

Thanks for reporting this issue. We also found this issue internally, and it's currently being fixed.
