Issues: vllm-project/vllm
[Bug]: Unable to serve Llama3 using vLLM Docker container (bug)
#4725, opened May 10, 2024 by vecorro
[Performance]: Why does vLLM use so much memory even with the OPT model? (performance)
#4723, opened May 9, 2024 by MitchellX
[Feature]: Enforce formatting standards for C++ and CUDA code (feature request)
#4721, opened May 9, 2024 by mgoin
[Bug]: Unable to run LoRA inference with Phi-3 (bug)
#4715, opened May 9, 2024 by WeiXiaoSummer
[Bug]: Export fails when FP8-quantizing the KV cache of Qwen1.5-72B-Chat-GPTQ-Int4 (bug)
#4714, opened May 9, 2024 by frankxyy
[Bug]: Running Punica LoRA on the Qwen1.5 32B model raises RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16 (bug)
#4708, opened May 9, 2024 by victorzhz111
[Doc]: OpenAI Server Command Line Args Broken (documentation)
#4707, opened May 9, 2024 by noamgat
[Bug]: KeyError: request_id when using a thread across multiple calls (bug)
#4706, opened May 9, 2024 by xubzhlin
[Feature]: Is it possible to dynamically adjust the LoRA TP policy for different situations? (feature request)
#4704, opened May 9, 2024 by yyccli
[Performance]: Why is HF better than vLLM in the throughput benchmark? (performance)
#4702, opened May 9, 2024 by yuki252111
[Feature]: Supporting a version of Consistency LLM (feature request)
#4701, opened May 9, 2024 by usaxena-asapp
[Performance]: Generation throughput drops sharply as SamplingParams.logprobs increases (performance)
#4699, opened May 9, 2024 by jeffrey-fong
[Performance]: Benchmarking the vLLM copy kernel and PyTorch index copy (help wanted, performance)
#4698, opened May 9, 2024 by youkaichao
[Bug]: Batched prefill returns gibberish in some cases (bug)
#4697, opened May 8, 2024 by fmmoret
[Feature]: Bind Python and C++ through tools other than pybind11 (feature request, help wanted)
#4694, opened May 8, 2024 by youkaichao
Installation on CPU fails with errors (installation)
#4692, opened May 8, 2024 by ming-ddtechcg
[Doc]: API reference for LLM class (documentation)
#4684, opened May 8, 2024 by zplizzi
[Usage]: Get time statistics with each request (usage)
#4683, opened May 8, 2024 by arunpatala
[Bug]: assert parts[0] == "base_model" AssertionError (bug)
#4682, opened May 8, 2024 by Edisonwei54
[Usage]: Out of memory with multiple models (usage)
#4678, opened May 8, 2024 by yudataguy
[Feature]: Support LoRA for models such as Qwen-7B and Qwen1.5 (feature request)
#4677, opened May 8, 2024 by kynow2
[Bug]: Cannot use vLLM LoRA with 2 GPUs, but 1 GPU works fine (bug)
#4676, opened May 8, 2024 by kynow2