neuralmagic / nm-vllm Public

forked from vllm-project/vllm

Pull requests: neuralmagic/nm-vllm

30 Open 219 Closed

Pull requests list

updates for automation (and release)

#265 opened May 23, 2024 by andy-neuma

update install commands

#264 opened May 23, 2024 by dhuangnm

Fix quantization rounding

#263 opened May 22, 2024 by varun-sundar-rabindranath

Update Dockerfile for the release

#262 opened May 22, 2024 by dhuangnm

Address py38/39 incompatibilities

#261 opened May 22, 2024 by dbarbuzzi

Lwilkinson/metrics expansion

#258 opened May 22, 2024 by LucasWilkinson • Draft

Upstream sync 2024 05 19

#249 opened May 19, 2024 by robertgshaw2-neuralmagic

Reasonable int8 configs

#238 opened May 13, 2024 by varun-sundar-rabindranath

[CI/Build] Basic server correctness test

#237 opened May 13, 2024 by derekk-nm • Draft

Skipping refactor

#234 opened May 13, 2024 by robertgshaw2-neuralmagic

Create test_optional_libraries.py

#230 opened May 9, 2024 by mgoin

[Activation Quantization] Dynamic Per Token Support

#225 opened May 6, 2024 by dsikka

Initial CompressedTensors config + Activation Quantization support …

#219 opened Apr 30, 2024 by dsikka

Torch compile fusion backend prototype

#209 opened Apr 25, 2024 by bnellnm • Draft

[WIP] Please do not delete - comparing changes between branches

#203 opened Apr 23, 2024 by afeldman-nm

[WIP] FLAN-T5 integration

#194 opened Apr 17, 2024 by afeldman-nm

WIP: basic correctness test

#192 opened Apr 17, 2024 by derekk-nm • Draft

whl centric

#191 opened Apr 17, 2024 by andy-neuma

Prototype FP8Linear W8A8 runtime quantization

#190 opened Apr 15, 2024 by mgoin • Draft

Entrypoint for hosting local Kobold Lite chat interface

#184 opened Apr 12, 2024 by mgoin

Added Docker Compose Example

#182 opened Apr 12, 2024 by robertgshaw2-neuralmagic

vllm - quantization : DO NOT MERGE

#180 opened Apr 11, 2024 by varun-sundar-rabindranath

Pypi and updates

#177 opened Apr 9, 2024 by andy-neuma

[WIP] Upstream encoder/decoder support based on multiple blocktables

#161 opened Apr 2, 2024 by afeldman-nm • Draft

Support for compressed-tensors

#159 opened Apr 2, 2024 by dbogunowicz
