Issues: NVIDIA/TensorRT-LLM
[Issue Template] Short one-line summary of the issue #270
#783 opened Jan 1, 2024 by juney-nvidia
Help needed: No clear documentation/examples for implementing speculative decoding with backend serve
#2671 opened Jan 8, 2025 by e1ijah1
trtllm-serve produces no output with Qwen2.5-7B
Labels: bug (something isn't working), OpenAI API
#2667 opened Jan 8, 2025 by Justin-12138
fp8 quantization for CohereForCausalLM
Labels: Investigating, Low Precision (lower-bit quantization, including int8, int4, fp8), triaged (issue has been triaged by maintainers)
#2666 opened Jan 7, 2025 by Alireza3242
What are the supported low-bit (int8/fp8/int4) data types in MLP and Attention layers?
Labels: Investigating, Low Precision, triaged
#2664 opened Jan 6, 2025 by mirzadeh
QTIP Quantization Support?
Labels: Investigating, Low Precision, triaged
#2663 opened Jan 6, 2025 by aikitoria
Segmentation fault: TensorRT-LLM crashes when using guided decoding (xgrammar) and KV cache reuse
Labels: bug
#2660 opened Jan 6, 2025 by Somasundaram-Palaniappan
[QST] Why does the f16xs8 mixed GEMM implementation differ between TRT-LLM and the native CUTLASS mixed GEMM example?
Labels: Investigating, Performance (performance numbers), triaged
#2659 opened Jan 5, 2025 by danielhua23
Qwen2-VL cannot be converted to a checkpoint on TensorRT-LLM
Labels: bug, Investigating, LLM API/Workflow, triaged
#2658 opened Jan 5, 2025 by xunuohope1107
"No module named 'tensorrt_llm.bindings'" error message
Labels: triaged
#2656 opened Jan 3, 2025 by maulikmadhavi
setuptools conflict
Labels: Investigating, Low Precision, triaged
#2655 opened Jan 3, 2025 by kanebay
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
Labels: bug, triaged
#2652 opened Jan 3, 2025 by Whisht
Gemma 2 convert_checkpoint uses more GPU RAM than needed
Labels: bug, Investigating, LLM API/Workflow, triaged
#2647 opened Jan 2, 2025 by Alireza3242
Failed to build engine with lookahead_decoding
Labels: bug, Investigating, Speculative Decoding, triaged
#2641 opened Dec 31, 2024 by aikitoria
Multi-modal TRT-LLM on aarch64 (Holoscan IGX Devkit) fails to convert VILA checkpoints
Labels: bug
#2638 opened Dec 30, 2024 by MMelQin
How to suppress INFO log output in executorExampleBasic.cpp?
#2637 opened Dec 28, 2024 by aaIce
C++ runner outputs wrong results when using LoRA + tensor parallelism
Labels: bug, Investigating, Lora/P-tuning, triaged
#2634 opened Dec 28, 2024 by ShuaiShao93
Troubleshoot Mistral model
Labels: bug
#2632 opened Dec 26, 2024 by krishnanpooja
[Performance] KV cache reuse is slower when batch size > 1
Labels: Investigating, KV-Cache Management, triaged
#2631 opened Dec 26, 2024 by ReginaZh