
Mistral, RoPE Scaling, CodeLlama

@danielhanchen released this 14 Dec 02:11
  1. Preliminary Mistral support (4K context). Solves #2.
  2. FINAL Mistral support (Sliding Window Attention). Solves #2.
  3. Solves #10.
  4. Preliminary fix for #8 and #6: now supports Yi, TinyLlama, and all models with Grouped Query Attention.
  5. FINAL GQA support: allows a Flash Attention v2 install path.
  6. Solves #5.
  7. Solves #7: supports vocab sizes larger than 2^15 but below 2^16.
  8. Updated the README.
  9. Preliminary DPO support, based on an example from https://github.com/152334H (a sketch is included at the end of these notes).
  10. WSL (Windows) support, confirmed by https://github.com/RandomInternetPreson.

Use Mistral as follows:

pip install "unsloth[colab_ampere] @ git+https://github.com/unslothai/unsloth.git"

from unsloth import FastMistralModel
import torch

# Example settings; model_name can be any Mistral-architecture model on the Hugging Face Hub.
model_name = "mistralai/Mistral-7B-v0.1"
max_seq_length = 4096  # Mistral context length supported in this release
dtype = None           # None auto-detects float16 / bfloat16
load_in_4bit = True    # Load the base model in 4-bit to reduce memory usage

model, tokenizer = FastMistralModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastMistralModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none",    # Currently only supports bias = "none"
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
)
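
Once the LoRA adapters are attached, training proceeds with standard Hugging Face tooling. Below is a minimal sketch using TRL's SFTTrainer; the dataset variable and the training arguments are illustrative assumptions, not part of this release.

from trl import SFTTrainer
from transformers import TrainingArguments

# `dataset` is assumed to be a Hugging Face dataset with a "text" column (illustrative).
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        output_dir = "outputs",
    ),
)
trainer.train()

The resulting PEFT model can then be saved with model.save_pretrained("lora_model") as usual.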

See https://unsloth.ai/blog/mistral-benchmark for full benchmarks and more details.
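
For the preliminary DPO support (item 9), a minimal sketch using TRL's DPOTrainer follows. It assumes model and tokenizer were loaded and wrapped with FastMistralModel as above, and that preference_dataset is a hypothetical Hugging Face dataset with "prompt", "chosen", and "rejected" columns; the hyperparameters are illustrative only.

from trl import DPOTrainer
from transformers import TrainingArguments

# `model` is the LoRA-wrapped model from get_peft_model above, so no separate
# reference model is needed (DPOTrainer disables the adapters to score the reference).
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    beta = 0.1,                           # DPO temperature
    train_dataset = preference_dataset,   # hypothetical dataset with prompt/chosen/rejected columns
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 5e-6,
        max_steps = 60,
        remove_unused_columns = False,    # DPOTrainer uses its own data collator
        output_dir = "dpo_outputs",
    ),
)
dpo_trainer.train()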