v0.7.1: patch release

Latest

fxmarty released this 01 Mar 13:14

Support loading sharded quantized checkpoints

Sharded quantized checkpoints can now be loaded with the from_quantized method.

Support loading sharded quantized checkpoints. by @LaaZa in #425
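
As a rough illustration of what loading a sharded checkpoint might look like, here is a minimal sketch. It assumes the checkpoint directory follows the usual Hugging Face sharded layout, where an index JSON file holds a "weight_map" from tensor names to shard files; the exact index filename for a given quantized model may differ. The helper names list_shard_files and load_sharded_quantized are hypothetical and not part of AutoGPTQ.

```python
import json
from pathlib import Path


def list_shard_files(index_path):
    """Return the sorted, de-duplicated shard filenames named in an index file.

    Assumes the Hugging Face sharded layout: a JSON file whose "weight_map"
    maps each tensor name to the shard file that stores it.
    """
    index = json.loads(Path(index_path).read_text())
    return sorted(set(index["weight_map"].values()))


def load_sharded_quantized(model_dir, device="cuda:0"):
    """Load a (possibly sharded) GPTQ checkpoint from a directory or Hub ID.

    Hypothetical wrapper; needs auto-gptq >= 0.7.1 and a supported GPU.
    """
    # Imported lazily so the helper above works without auto-gptq installed.
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(
        model_dir, device=device, use_safetensors=True
    )
```

Calling load_sharded_quantized on a local directory is expected to work the same way whether the weights are in one file or several; list_shard_files is only there to inspect which shard files an index refers to.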

Gemma GPTQ quantization

Gemma models can now be quantized with AutoGPTQ.

Add support for Gemma models. by @LaaZa in #561
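
A minimal sketch of quantizing a Gemma model could look like the following. The settings shown (4 bits, group size 128, desc_act off) are illustrative assumptions rather than recommended values, and the helper names make_gptq_settings and quantize_gemma are hypothetical. A real run needs a GPU, access to the model weights, and a meaningful set of calibration texts.

```python
def make_gptq_settings(bits=4, group_size=128, desc_act=False):
    """Build keyword arguments for BaseQuantizeConfig (illustrative defaults)."""
    if bits not in (2, 3, 4, 8):
        raise ValueError("GPTQ bit width must be one of 2, 3, 4, 8")
    return {"bits": bits, "group_size": group_size, "desc_act": desc_act}


def quantize_gemma(model_id, calibration_texts, out_dir, **settings):
    """Quantize a Gemma checkpoint with GPTQ and save it to out_dir.

    Hypothetical wrapper; needs auto-gptq >= 0.7.1, transformers, and a GPU.
    """
    # Imported lazily so make_gptq_settings works without these packages.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Each calibration example is a tokenized text (input_ids, attention_mask).
    examples = [tokenizer(text) for text in calibration_texts]

    config = BaseQuantizeConfig(**make_gptq_settings(**settings))
    model = AutoGPTQForCausalLM.from_pretrained(model_id, config)
    model.quantize(examples)
    model.save_quantized(out_dir, use_safetensors=True)
    tokenizer.save_pretrained(out_dir)


# Example call (model ID and output path are assumptions):
# quantize_gemma("google/gemma-2b", ["AutoGPTQ is a quantization library."], "gemma-2b-gptq")
```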

Other changes and fixes

Add back missing import by @fxmarty in #553
Fix bias materialization for Marlin by @fxmarty in #554
Fix shape check marlin by @fxmarty in #557
Explicitly check compute capability in marlin's QLinear by @fxmarty in #567
Compatibility with latest transformers by @fxmarty in #573

Full Changelog: v0.7.0...v0.7.1

Contributors

LaaZa and fxmarty
