Pull requests: vllm-project/vllm
- [Core] Fix circular reference which leaked llm instance in local dev env (#4737, opened May 10, 2024 by rkooo567)
- [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722, opened May 9, 2024 by mgoin)
- [CI/Build] Tweak Marlin Nondeterminism Issues (#4713, opened May 9, 2024 by robertgshaw2-neuralmagic)
- [Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696, opened May 8, 2024 by KuntaiDu)
- [Frontend] OpenAI API server: Do not add bos token by default when encoding (#4688, opened May 8, 2024 by bofenghuang)
- [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681, opened May 8, 2024 by rkooo567)
- [Misc] Enable DynamicNTKScalingRotaryEmbedding and YaRNScalingRotaryEmbedding test cases (#4668, opened May 8, 2024 by AllenDou)
- [ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used (#4658, label: rocm, opened May 7, 2024 by alexeykondrat)
- [CORE] Adding support for insertion of soft-tuned prompts (#4645, opened May 7, 2024 by SwapnilDreams100)
- [Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643, opened May 7, 2024 by Avinash-Raj)
Search tip: pull requests updated in the last three days can be found with the query updated:>2024-05-07.