MLSys Literature


Inference

  1. Full Stack Optimization of Transformer Inference: a Survey: https://arxiv.org/pdf/2302.14017.pdf
  2. Large Transformer Model Inference Optimization: https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
  3. High-throughput Generative Inference of Large Language Models with a Single GPU: https://arxiv.org/pdf/2303.06865.pdf
  4. Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations: https://arxiv.org/pdf/2304.11267.pdf

Quantization

  1. Up or Down? Adaptive Rounding for Post-Training Quantization: https://arxiv.org/pdf/2004.10568.pdf
  2. 8-bit Optimizers via Block-wise Quantization: https://arxiv.org/pdf/2110.02861.pdf
  3. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale: https://arxiv.org/pdf/2208.07339.pdf
  4. ULPPACK: Fast Sub-8-bit Matrix Multiply on Commodity SIMD Hardware: https://proceedings.mlsys.org/paper/2022/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
  5. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers: https://arxiv.org/pdf/2210.17323.pdf
  6. RPTQ: Reorder-based Post-training Quantization for Large Language Models: https://arxiv.org/pdf/2304.01089.pdf

Training

  1. Training Compute-Optimal Large Language Models: https://arxiv.org/pdf/2203.15556.pdf
  2. Decentralized Training of Foundation Models in Heterogeneous Environments: https://arxiv.org/pdf/2206.01288.pdf
  3. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models: https://arxiv.org/pdf/1910.02054.pdf
  4. Stable and low-precision training for large-scale vision-language models: https://arxiv.org/pdf/2304.13013.pdf

Scaling

  1. Sparse is Enough in Scaling Transformers: https://proceedings.neurips.cc/paper/2021/file/51f15efdd170e6043fa02a74882f0470-Paper.pdf
  2. Scaling Transformer to 1M tokens and beyond with RMT: https://arxiv.org/pdf/2304.11062.pdf

Compilation

  1. RAF: Holistic Compilation for Deep Learning Model Training: https://arxiv.org/pdf/2303.04759v1.pdf
  2. Graphene: An IR for Optimized Tensor Computations on GPUs: https://dl.acm.org/doi/pdf/10.1145/3582016.3582018
  3. Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs: https://arxiv.org/pdf/2210.09603.pdf

Fine-Tuning

  1. LoRA: Low-Rank Adaptation of Large Language Models: https://arxiv.org/pdf/2106.09685.pdf

Sparsity

  1. JaxPruner: A Concise Library for Sparsity Research: https://arxiv.org/pdf/2304.14082.pdf

About

A collection of papers related to machine learning systems (still in an early stage of collection).
