Issues: NVIDIA/TensorRT-LLM

Issues list

AWQ performance issue for higher batches [bug]
#1757 opened Jun 8, 2024 by canamika27

server.cc:251] failed to enable peer access for some device pairs [bug]
#1754 opened Jun 8, 2024 by Godlovecui

[Question] "Building from source code is necessary if you want the best performance" [question, triaged]
#1750 opened Jun 6, 2024 by DreamGenX

Does TensorRT-LLM support high concurrent requests? [bug, waiting for feedback]
#1745 opened Jun 6, 2024 by Godlovecui

Quantizing Phi-3 128k Instruct to FP8 fails [feature request, Investigating, quantization]
#1741 opened Jun 5, 2024 by kalradivyanshu

[ERROR] Assertion failed: Can't free tmp workspace for GEMM tactics profiling [duplicate, feature request, Investigating]
#1739 opened Jun 5, 2024 by grvsh02

Inflight batching for fp8 Llama and Mixtral is broken [bug, Investigating, quantization, triaged]
#1738 opened Jun 5, 2024 by bprus

Conditionals seem to be evaluated eagerly [feature request, Investigating]
#1724 opened Jun 4, 2024 by CrimsonRadiator

LoRA support with Llama3-70B and AWQ quantization [feature request, Investigating, triaged]
#1721 opened Jun 4, 2024 by smehta2000

When the request is large, the Triton server has a very high TTFT [bug]
#1719 opened Jun 4, 2024 by Godlovecui

After deployment, each request exception generates a core.xxxx file [bug]
#1715 opened Jun 3, 2024 by taorui-plus

AssertionError: Each dimension must specify a 3-elements tuple or list in the order of (min, opt, max), got {dim=} [Investigating, quantization, triaged]
#1714 opened Jun 3, 2024 by doruksonmez

Llava multimodal example is giving a segfault [bug, triaged]
#1709 opened Jun 1, 2024 by buddhapuneeth

Diversity Search not resulting in diverse outputs [triaged]
#1707 opened May 31, 2024 by Bhuvanesh09

Support for Python 3.11 (+ Windows) [bug]
#1706 opened May 30, 2024 by Sharrnah

Docker image built from source is too large [need more info, triaged]
#1705 opened May 30, 2024 by Fred-cell

24.05-trtllm-python-py3 image size [question, triaged]
#1704 opened May 30, 2024 by Prots

High WER and incomplete transcription with Whisper [bug, triaged]
#1697 opened May 29, 2024 by teith