Pull requests: vllm-project/vllm
- [Core] Fix circular reference which leaked llm instance in local dev env (#4737, opened May 10, 2024 by rkooo567)
- [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722, opened May 9, 2024 by mgoin)
- [CI/Build] Tweak Marlin Nondeterminism Issues (#4713, opened May 9, 2024 by robertgshaw2-neuralmagic)
- [Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696, opened May 8, 2024 by KuntaiDu)
- [Frontend] OpenAI API server: Do not add bos token by default when encoding (#4688, opened May 8, 2024 by bofenghuang)
- [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681, opened May 8, 2024 by rkooo567)
- [Misc] Enable DynamicNTKScalingRotaryEmbedding and YaRNScalingRotaryEmbedding test cases (#4668, opened May 8, 2024 by AllenDou)
- [ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used (#4658, label: rocm, opened May 7, 2024 by alexeykondrat)
- [CORE] Adding support for insertion of soft-tuned prompts (#4645, opened May 7, 2024 by SwapnilDreams100)
- [Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643, opened May 7, 2024 by Avinash-Raj)
Search tip: pull requests updated in the last three days can be found with the query updated:>2024-05-07.