Releases: erfanzar/EasyDeL

Pallas Fusion: GPU Turbocharged 🚀

16 May 09:33

EasyDeL version 0.0.65

  • New Features

    • Pallas Flash Attention is now available on CPU/GPU/TPU via FJFormer, with bias support.
    • ORPO Trainer is now available.
    • WebSocket Serve Engine.
    • Now EasyDeL is 30% faster on GPUs.
    • JAX-Triton is no longer needed to run GPU kernels.
    • Now you can specify the backward kernel implementation for Pallas Attention.
    • You now have to import EasyDeL as easydel instead of EasyDel.
  • New Models

    • The OpenELM model series is now available.
    • The DeepseekV2 model series is now available.
  • Fixed Bugs

    • CUDNN FlashAttention bugs are now fixed.
    • 8-bit parameter quantization for the Llama3 model is much improved.
    • Splash Attention bugs on TPUs are now fixed.
    • Dbrx model bugs are fixed.
    • DPOTrainer bugs are fixed (creating dataset).
  • Known Bugs

    • Splash Attention won't work on TPUv3.
    • Pallas Attention won't work on TPUv3.
    • You need to install flash_attn in order to convert HF DeepseekV2 to EasyDeL (bug in DeepseekV2 implementation from original authors).
    • Some examples are outdated.

Full Changelog: 0.0.63...0.0.65

0.0.63

27 Apr 12:56

What's Changed

  • Phi3 Model Added.
  • Dbrx Model Added.
  • Arctic Model Added.
  • LoRA Fine-Tuning Bugs Fixed.
  • Vanilla Attention is Optimized.
  • Sharded Vanilla is the default attention mechanism now.

Full Changelog: 0.0.61...0.0.63

EasyDeL-0.0.61 Dynamic Changes

17 Apr 15:45

What's Changed

  • Add support for iterable dataset loading by @yhavinga in #138
  • SFTTrainer bugs are fixed.
  • Parameter quantization is now available for all of the models.
  • AutoEasyDeLModelForCausalLM now supports load_in_8bit.
  • Memory Management improved.
  • Gemma Models Generation Issue is now Fixed.
  • Trainers are now 2~8% faster.
  • Attention Operation is improved.
  • The Cohere Model is now present.
  • JAXServer is improved.
  • Due to recent changes, many of the examples in the documentation have changed, and the rest will be updated soon.

Full Changelog: 0.0.60...0.0.61

EasyDeL Version 0.0.60

06 Apr 15:50

What's Changed

  • SFTTrainer is now available.
  • VideoCausalLanguageModelTrainer is now available.
  • New models such as Grok-1, Qwen2Moe, Mamba, Rwkv, and Whisper are available.
  • MoE models had some speed improvements.
  • Training Speed is now 18%~42% faster.
  • Normal Attention is now 12%~30% faster (#131).
  • DPOTrainer Bugs Fixed.
  • CausalLanguageModelTrainer is now more customizable.
  • WANDB logging has improved.
  • Performance Mode is added to Training Arguments.
  • Model configs pass attributes to PretrainedConfig to prevent override… by @yhavinga in #122
  • Ignore token label smooth z loss by @yhavinga in #123
  • Time the whole train loop instead of only call to train step function by @yhavinga in #124
  • Add save_total_limit argument to delete older checkpoints by @yhavinga in #127
  • Add gradient norm logging, fix metric collection on multi-worker setup by @yhavinga in #135
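
As a rough illustration of what a save_total_limit setting typically does, the sketch below keeps only the newest checkpoints in a directory and deletes the rest. It is not EasyDeL's implementation: the function name rotate_checkpoints, the checkpoint-<step> file naming, and the demo helper are all assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path


def rotate_checkpoints(checkpoint_dir, save_total_limit):
    """Delete the oldest checkpoint files so at most `save_total_limit` remain.

    Assumes (for this sketch only) that checkpoints are files named
    `checkpoint-<step>`. Returns the list of deleted paths.
    """
    if save_total_limit is None or save_total_limit <= 0:
        return []  # no limit configured: keep everything

    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))

    found.sort()  # oldest (lowest step) first
    excess = max(0, len(found) - save_total_limit)

    deleted = []
    for _, path in found[:excess]:
        path.unlink()
        deleted.append(path)
    return deleted


def demo():
    """Create four fake checkpoints, keep the newest two, list what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        for step in (100, 200, 300, 400):
            (Path(tmp) / f"checkpoint-{step}").write_text("weights placeholder")
        rotate_checkpoints(tmp, save_total_limit=2)
        return sorted(p.name for p in Path(tmp).iterdir())
```

Sorting by the numeric step rather than by file name avoids the usual pitfall where checkpoint-1000 sorts before checkpoint-200 as a string.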

Full Changelog: 0.0.55...0.0.60

EasyDeL Version 0.0.55

03 Mar 09:30

  • JAX DPOTrainer Bugs Fixed
  • StableLM models are supported with FlashAttention and RingAttention
  • RingAttention is supported for up to 512K or 1M token training and inference
  • Chunk MLP is supported for up to 512K or 1M token training and inference
  • All models now support sharded key and value caching for high context length inference; enable it with use_sharded_kv_caching=True in the model config (see examples).
  • EasyDeL successfully passed inference at a context length of 1,256,000 tokens on TPUs (tested with a Llama model)
  • Vision Trainer is added; you might expect some bugs from it.
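
The idea behind chunking an MLP for long sequences can be sketched in a few lines: because a feed-forward block acts on each position independently, it can be applied to a slice of the sequence at a time, so the large hidden activation only ever exists for one chunk. The NumPy sketch below illustrates that general technique under stated assumptions; it is not EasyDeL's JAX implementation, and the function names, shapes, and ReLU activation are chosen for this example only.

```python
import numpy as np


def mlp(x, w_up, w_down):
    """Position-wise two-layer feed-forward block with a ReLU activation."""
    return np.maximum(x @ w_up, 0.0) @ w_down


def chunked_mlp(x, w_up, w_down, chunk_size):
    """Apply `mlp` to `chunk_size` positions at a time along the sequence axis.

    The hidden activation has shape (chunk_size, d_hidden) instead of
    (seq_len, d_hidden), which lowers peak memory for long sequences.
    """
    outputs = [
        mlp(x[start:start + chunk_size], w_up, w_down)
        for start in range(0, x.shape[0], chunk_size)
    ]
    return np.concatenate(outputs, axis=0)


# Small shapes for illustration; real workloads would be far larger.
rng = np.random.default_rng(0)
seq_len, d_model, d_hidden = 10, 4, 16
x = rng.normal(size=(seq_len, d_model))
w_up = rng.normal(size=(d_model, d_hidden))
w_down = rng.normal(size=(d_hidden, d_model))

full = mlp(x, w_up, w_down)
chunked = chunked_mlp(x, w_up, w_down, chunk_size=3)
```

Both paths compute the same values up to floating-point rounding; the chunk size trades peak memory against the number of separate matrix multiplications.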

Full Changelog: 0.0.50...0.0.55

0.0.50 Mixture of EasyDeL experts

08 Feb 11:40

What's Changed

  • Optimize mean loss and accuracy calculation by @yhavinga in #100
  • Mixtral Models are fully supported and they are PJIT-compatible
  • A wider range of models now support FlashAttention on TPU
  • Qwen 1, Qwen 2, Phi 2, and RoBERTa are newly added models that support FlashAttention on TPU and EasyBIT
  • LoRA support for the trainer is now added (EasyDeLXRapTureConfig)
  • Adding EasyDel Serve Engine APIs
  • Adding Prompter (Beta and might be removed in future updates)
  • Training is now 21% faster in 0.0.50 than in 0.0.42.
  • Transform functions are now automated for all models (except MosaicMPT, for which you still have to use static methods)
  • The Trainer APIs have changed and are now faster, more dynamic, and more hackable.
  • The default JAX version is now 0.4.22, to support FJFormer's custom Pallas kernels.

Full Changelog: 0.0.42...0.0.50

Version 0.0.42 Easy State

11 Jan 12:56

New Features:

  • EasyDelState is added
  • Auto converters from torch > huggingface > jax > flax > EasyDel are added
  • Trainer has a lot of improvements

Full Changelog: 0.0.41...0.0.42

0.0.41

26 Dec 17:41

What has changed so far in 0.0.41

  • API Changes
  • The CausalLanguageModel Trainer is now separate from the other trainers
  • Custom Errors added
  • Timer bugs fixed
  • AutoEasyDelForCasualLM is now more automated and Falcon bugs have been fixed
  • A 4D mesh is now used for better partitioning
  • And many many more

Full Changelog: 0.0.40...0.0.41

Version 0.0.40

19 Dec 21:09

Full Changelog: 0.0.38...0.0.40

v 0.0.38