Releases: erfanzar/EasyDeL
Pallas Fusion: GPU Turbocharged 🚀
EasyDeL version 0.0.65
New Features
- Pallas Flash Attention is now available on CPU/GPU/TPU via FJFormer, with bias support.
- ORPO Trainer is added.
- WebSocket Serve Engine.
- EasyDeL is now 30% faster on GPUs.
- JAX-Triton is no longer needed to run GPU kernels.
- Now you can specify the backward kernel implementation for Pallas Attention.
- Now you have to import EasyDeL as easydel instead of EasyDel.
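For code that has to run against releases on both sides of the import rename, a small check like the following may help. It is only a sketch: the helper name find_easydel_module is hypothetical and not part of EasyDeL, and it relies solely on the standard library to test which module name is importable in the current environment.

```python
import importlib.util


def find_easydel_module(candidates=("easydel", "EasyDel")):
    """Return the first importable module name among `candidates`, or None.

    Hypothetical helper for illustration; it is not part of EasyDeL.
    The new name (easydel) is tried first, the old one (EasyDel) second.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


module_name = find_easydel_module()
if module_name is None:
    print("EasyDeL does not appear to be installed")
else:
    print(f"EasyDeL is importable as: {module_name}")
```

On 0.0.65 and later only the lowercase name should be found, so a plain import of easydel is enough once older releases no longer need to be supported.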
New Models
- OpenELM model series are now present.
- DeepseekV2 model series are now present.
Fixed Bugs
- cuDNN FlashAttention bugs are now fixed.
- 8-bit parameter quantization for the Llama3 model has been improved considerably.
- Splash Attention bugs on TPUs are now fixed.
- Dbrx model bugs are fixed.
- DPOTrainer bugs (in dataset creation) are fixed.
Known Bugs
- Splash Attention won't work on TPUv3.
- Pallas Attention won't work on TPUv3.
- You need to install flash_attn in order to convert HF DeepseekV2 to EasyDeL (bug in DeepseekV2 implementation from original authors).
- Some examples are outdated.
Full Changelog: 0.0.63...0.0.65
0.0.63
What's Changed
- Phi3 Model Added.
- Dbrx Model Added.
- Arctic Model Added.
- Lora Fine-Tuning Bugs Fixed.
- Vanilla Attention is Optimized.
- Sharded Vanilla is the default attention mechanism now.
Full Changelog: 0.0.61...0.0.63
EasyDeL-0.0.61 Dynamic Changes
What's Changed
- Add support for iterable dataset loading by @yhavinga in #138
- SFTTrainer bugs are fixed.
- Parameter quantization is now available for all of the models.
- AutoEasyDeLModelForCausalLM now supports load_in_8bit.
- Memory Management improved.
- Gemma Models Generation Issue is now Fixed.
- Trainers are now 2~8% faster.
- Attention Operation is improved.
- The Cohere Model is now present.
- JAXServer is improved.
- Due to recent changes a lot of examples of documentation have changed and will be changed soon.
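To give a rough idea of what 8-bit parameter quantization means, here is a generic symmetric (absmax) int8 sketch in plain Python. This is an assumption-laden illustration of the general technique, not EasyDeL's actual quantization scheme; the function names quantize_int8 and dequantize_int8 are hypothetical.

```python
def quantize_int8(values):
    """Symmetric absmax quantization: map floats to integers in [-127, 127].

    Illustrative only; EasyDeL's real 8-bit path may differ.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Each restored value is within half a quantization step of the original.
for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Storing the integer codes plus one scale takes about a quarter of the memory of float32 parameters, at the cost of a rounding error of at most half a step per value.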
Full Changelog: 0.0.60...0.0.61
EasyDeL Version 0.0.60
What's Changed
- SFTTrainer is now available.
- VideoCausalLanguageModelTrainer is now available.
- New models such as Grok-1, Qwen2Moe, Mamba, Rwkv, and Whisper are available.
- MoE models had some speed improvements.
- Training Speed is now 18%~42% faster.
- Normal Attention is now 12%~30% faster (#131).
- DPOTrainer Bugs Fixed.
- CausalLanguageModelTrainer is now more customizable.
- WANDB logging has improved.
- Performance Mode is added to Training Arguments.
- Model configs pass attributes to PretrainedConfig to prevent override… by @yhavinga in #122
- Ignore token label smooth z loss by @yhavinga in #123
- Time the whole train loop instead of only call to train step function by @yhavinga in #124
- Add save_total_limit argument to delete older checkpoints by @yhavinga in #127
- Add gradient norm logging, fix metric collection on multi-worker setup by @yhavinga in #135
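The save_total_limit argument caps how many checkpoints are kept by deleting older ones. The sketch below shows one way such pruning could work; it is not the code from #127. The function name prune_checkpoints and the checkpoint-<step> directory naming are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def prune_checkpoints(checkpoint_dir, save_total_limit):
    """Keep only the newest `save_total_limit` checkpoints, delete the rest.

    Assumes (hypothetically) that checkpoints are directories named
    "checkpoint-<step>". Returns the list of removed paths.
    """
    if save_total_limit is None or save_total_limit <= 0:
        return []  # no limit configured: keep everything
    checkpoints = sorted(
        (
            p
            for p in Path(checkpoint_dir).glob("checkpoint-*")
            if p.is_dir() and p.name.split("-")[-1].isdigit()
        ),
        key=lambda p: int(p.name.split("-")[-1]),  # order by training step
    )
    stale = checkpoints[:-save_total_limit]
    for path in stale:
        shutil.rmtree(path)
    return stale


with tempfile.TemporaryDirectory() as root:
    for step in (100, 200, 300, 400, 500):
        (Path(root) / f"checkpoint-{step}").mkdir()
    prune_checkpoints(root, save_total_limit=2)
    print(sorted(p.name for p in Path(root).iterdir()))
    # prints ['checkpoint-400', 'checkpoint-500']
```

Sorting by the numeric step rather than by name avoids treating checkpoint-1000 as older than checkpoint-200.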
Full Changelog: 0.0.55...0.0.60
EasyDeL Version 0.0.55
- JAX DPOTrainer bugs fixed.
- StableLM models are supported with FlashAttention and RingAttention.
- RingAttention is supported for up to 512K or 1M token training and inference.
- Chunked MLP is supported for up to 512K or 1M token training and inference.
- All models now support sharded key and value caching for high-context-length inference; enable it with use_sharded_kv_caching=True in the model config (see examples).
- EasyDeL successfully passed 1256000 context length inference on TPUs (Llama model tested).
- Vision Trainer is added; you might expect some bugs from it.
Full Changelog: 0.0.50...0.0.55
0.0.50 Mixture of EasyDeL experts
What's Changed
- Optimize mean loss and accuracy calculation by @yhavinga in #100
- Mixtral models are fully supported and are PJIT-compatible.
- A wider range of models now support FlashAttention on TPU.
- Qwen 1, Qwen 2, PHI 2, and Roberta are newly added models that support FlashAttention on TPU and EasyBIT.
- LoRA support for the trainer is now added (EasyDeLXRapTureConfig).
- Adding EasyDel Serve Engine APIs.
- Adding Prompter (beta, might be removed in future updates).
- The training process is now 21% faster in 0.0.50 than in 0.0.42.
- Transform functions are now automated for all the models (except MosaicMPT, for which you still have to use static methods).
- The Trainer APIs have changed and are now faster, more dynamic, and more hackable.
- The default JAX version is now 0.4.22, for use with FJFormer custom Pallas kernels.
Full Changelog: 0.0.42...0.0.50
Version 0.0.42 Easy State
New Features:
- EasyDelState is added
- Auto converters from torch > huggingface > jax > flax > EasyDel are added
- Trainer has a lot of improvements
Full Changelog: 0.0.41...0.0.42
0.0.41
What has changed so far in 0.0.41
- API Changes
- making CausalLanguageModel Trainer separated from others
- Custom Errors added
- Timer bugs fixed
- AutoEasyDelForCasualLM is now more automated and falcon bugs have been fixed
- 4D Mesh being used for better partitioning
- And many many more
Full Changelog: 0.0.40...0.0.41
Version 0.0.40
What's Changed
- Updating JAXBeta branch from main branch by @erfanzar in #42
- Update Beta Branch by @erfanzar in #48
- Update V0.0.40 Beta (Adding Flash Attention, Adding 8,6,4 Bit models ,improving Documentations) by @erfanzar in #52
- Fix eval batch loop (beta branch) by @w11wo in #51
- Support Sphinx Docstring Format by @w11wo in #53
- Update Beta Branch by @erfanzar in #55
- Update Beta Branch by @erfanzar in #56
- Updating Mistral and Llama Models by @erfanzar in #57
- Updating Beta Branch by @erfanzar in #58
- Changing Mesh by @erfanzar in #60
- Updating Beta by @erfanzar in #62
- 4D Mesh now supported for all the Models and BITs improved by @erfanzar in #64
Full Changelog: 0.0.38...0.0.40
v 0.0.38
Changes and Latest Commits:
Full Changelog: 0.0.37...0.0.38