SnapKV 📷

We introduce SnapKV, an innovative, out-of-the-box KV cache compression method.

Requirements

Currently tested with transformers==4.37.0; compatibility with later versions has not been verified yet.

transformers>=4.36
flash-attn==2.4.0

Installation

git clone git@github.com:FasterDecoding/SnapKV.git
cd SnapKV
pip install -e .

Quick Start

Use SnapKV-optimized Models

For example:

from snapkv.monkeypatch.monkeypatch import replace_mistral
replace_mistral()  # apply the monkey patches that enable SnapKV

See the example notebook in the notebooks/ directory.

Customize Your SnapKV-optimized Models

SnapKV can be easily integrated with other models.

To build your own SnapKV-optimized model, follow the comments marked [SnapKV] in the existing model implementations. The Llama family, Mistral, and Mixtral are currently supported.
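The integration pattern can be illustrated on a toy module, assuming that a helper such as replace_mistral works by reassigning methods on the relevant model classes (a common monkey-patching pattern; see snapkv/monkeypatch for the actual code). Everything below is hypothetical: ToyAttention, compress_prompt_kv, snapkv_forward, and replace_toy are illustrative names, and the placeholder compression simply keeps the most recent entries. It is not SnapKV's selection rule; it only marks where that rule would plug in.

```python
class ToyAttention:
    """Hypothetical stand-in for a model's attention module (not a transformers class)."""

    def forward(self, key_states, value_states, past_kv=None):
        # Append the new keys/values to the existing cache, if any.
        if past_kv is not None:
            key_states = past_kv[0] + key_states
            value_states = past_kv[1] + value_states
        return key_states, value_states


_original_forward = ToyAttention.forward  # handle to the unpatched method


def compress_prompt_kv(keys, values, capacity):
    # Placeholder policy (NOT SnapKV): keep the most recent `capacity` entries.
    # A real integration would select positions via observation-window votes.
    return keys[-capacity:], values[-capacity:]


def snapkv_forward(self, key_states, value_states, past_kv=None):
    keys, values = _original_forward(self, key_states, value_states, past_kv)
    # [SnapKV] compress once, at prefill (no cache yet), if the prompt is too long.
    if past_kv is None and len(keys) > self.kv_capacity:
        keys, values = compress_prompt_kv(keys, values, self.kv_capacity)
    return keys, values


def replace_toy(capacity=4):
    """Hypothetical analogue of a replace_* helper: patch the class in place."""
    ToyAttention.kv_capacity = capacity
    ToyAttention.forward = snapkv_forward
```

Because the patch replaces the method on the class, it also affects instances created before replace_toy() is called. Compression runs only at prefill, so later decoding steps simply append to the already compressed cache.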

The core SnapKV algorithm is implemented in snapkv_utils.py.
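The selection step can be illustrated with a minimal single-head sketch in pure Python. It assumes the procedure described in the SnapKV report: queries in an observation window at the end of the prompt "vote" on earlier positions through their attention weights, the votes are smoothed by pooling so that neighbouring tokens stay together, and only the top-scoring prefix KV entries plus the window itself are kept. The function name snapkv_select and its parameters (capacity, kernel_size) are hypothetical, max pooling is used here for simplicity, and this is not the code in snapkv_utils.py.

```python
import math


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def snapkv_select(window_queries, keys, capacity, kernel_size=1):
    """Hypothetical single-head sketch: return the sorted prompt positions whose
    KV entries are kept. The last len(window_queries) prompt tokens form the
    observation window, and their KV entries are always kept."""
    n = len(keys)
    window_size = len(window_queries)
    if n <= capacity:
        return list(range(n))  # prompt already fits: nothing to drop
    assert capacity > window_size and kernel_size % 2 == 1
    prefix_len = n - window_size
    scale = 1.0 / math.sqrt(len(keys[0]))

    # 1. Voting: each window query attends causally over the prompt, and the
    #    attention weight it places on each prefix position is accumulated.
    votes = [0.0] * prefix_len
    for j, q in enumerate(window_queries):
        pos = prefix_len + j  # absolute position of this query
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[: pos + 1]]
        weights = _softmax(scores)
        for i in range(prefix_len):
            votes[i] += weights[i]

    # 2. Pooling: max over a centered window (stride 1) so that neighbours of
    #    a highly voted token are kept together.
    half = kernel_size // 2
    pooled = [max(votes[max(0, i - half): i + half + 1]) for i in range(prefix_len)]

    # 3. Selection: top (capacity - window_size) prefix positions plus the window.
    budget = capacity - window_size
    top = sorted(range(prefix_len), key=lambda i: pooled[i], reverse=True)[:budget]
    return sorted(top) + list(range(prefix_len, n))
```

For example, with keys [[0.0], [5.0], [0.0], [3.0], [0.0], [0.0], [1.0], [1.0]], window queries [[1.0], [1.0]], and capacity=4, the sketch keeps positions [1, 3, 6, 7]. The two strongly attended prefix tokens are kept along with the two window tokens. With capacity=5 and kernel_size=3, pooling lets the neighbours of position 1 inherit its score, so positions [0, 1, 2, 6, 7] are kept.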

Partial Results

TODO

Add observation experiments for reproduction.
Add LongBench experiments for reproduction.
Explore prompt-phase compression.

Citation

If you find this project helpful, please consider citing our report 😊

@article{li2024snapkv,
  title={SnapKV: LLM Knows What You are Looking for Before Generation},
  author={Li, Yuhong and Huang, Yingbing and Yang, Bowen and Venkitesh, Bharat and Locatelli, Acyr and Ye, Hanchen and Cai, Tianle and Lewis, Patrick and Chen, Deming},
  journal={arXiv preprint arXiv:2404.14469},
  year={2024}
}
