Issues: vllm-project/vllm
[Bug]: Unable to serve Llama3 using vLLM Docker container (bug)
#4725, opened May 10, 2024 by vecorro
[Performance]: Why does vLLM use so much memory even with the OPT model? (performance)
#4723, opened May 9, 2024 by MitchellX
[Feature]: Enforce formatting standards for C++ and CUDA code (feature request)
#4721, opened May 9, 2024 by mgoin
[Bug]: Unable to run LoRA inference with Phi-3 (bug)
#4715, opened May 9, 2024 by WeiXiaoSummer
[Bug]: Export fails when FP8-quantizing the KV cache of Qwen1.5-72B-Chat-GPTQ-Int4 (bug)
#4714, opened May 9, 2024 by frankxyy
[Bug]: Running Punica LoRA on the Qwen1.5 32B model raises RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16 (bug)
#4708, opened May 9, 2024 by victorzhz111
[Doc]: OpenAI Server Command Line Args Broken (documentation)
#4707, opened May 9, 2024 by noamgat
[Bug]: KeyError: request_id when using a thread across multiple calls (bug)
#4706, opened May 9, 2024 by xubzhlin
[Feature]: Is it possible to dynamically adjust the LoRA TP policy for different situations? (feature request)
#4704, opened May 9, 2024 by yyccli
[Performance]: Why is HF better than vLLM in the throughput benchmark? (performance)
#4702, opened May 9, 2024 by yuki252111
[Feature]: Supporting a version of Consistency LLM (feature request)
#4701, opened May 9, 2024 by usaxena-asapp
[Performance]: Generation throughput drops sharply as SamplingParams.logprobs increases (performance)
#4699, opened May 9, 2024 by jeffrey-fong
[Performance]: Benchmarking the vLLM copy kernel and PyTorch index copy (help wanted, performance)
#4698, opened May 9, 2024 by youkaichao
[Bug]: Batched prefill returns gibberish in some cases (bug)
#4697, opened May 8, 2024 by fmmoret
[Feature]: Bind Python and C++ through tools other than pybind11 (feature request, help wanted)
#4694, opened May 8, 2024 by youkaichao
Installation on CPU fails with errors (installation)
#4692, opened May 8, 2024 by ming-ddtechcg
[Doc]: API reference for LLM class (documentation)
#4684, opened May 8, 2024 by zplizzi
[Usage]: Get time statistics with each request (usage)
#4683, opened May 8, 2024 by arunpatala
[Bug]: assert parts[0] == "base_model" AssertionError (bug)
#4682, opened May 8, 2024 by Edisonwei54
[Usage]: Out of memory with multiple models (usage)
#4678, opened May 8, 2024 by yudataguy
[Feature]: Support LoRA for models such as Qwen-7B and Qwen1.5 (feature request)
#4677, opened May 8, 2024 by kynow2
[Bug]: Cannot use vLLM LoRA with 2 GPUs, but 1 GPU works fine (bug)
#4676, opened May 8, 2024 by kynow2