Releases: hpcaitech/ColossalAI

Version v0.3.7 Release Today!

27 Apr 11:00
4cfbf30

What's Changed

Release

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

News

Lazyinit

Shardformer

Fix

  • [Fix]: implement thread-safe singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season (see the sketch after this list)
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao
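
The thread-safe singleton in #5625 refers to the classic double-checked-locking pattern. A minimal generic Python sketch of that pattern (not ColossalAI's actual class; `Coordinator` is only a hypothetical user):

```python
import threading


class SingletonMeta(type):
    """Metaclass that makes instantiation thread-safe via double-checked locking."""

    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Fast path: skip the lock once the instance exists.
        if cls not in cls._instances:
            with cls._lock:
                # Re-check inside the lock: another thread may have created
                # the instance while this one was waiting.
                if cls not in cls._instances:
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Coordinator(metaclass=SingletonMeta):  # hypothetical example class
    pass


assert Coordinator() is Coordinator()  # every call yields the same object
```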

Coloattention

Example

Feature

Zero

Doc

Devops

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
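
A sketch of how the new ratio might be wired up, assuming it is exposed through HybridParallelPlugin's gradient checkpoint config; the keyword names below are inferred from the PR title and are not a verified signature (the real API may use a dedicated config object):

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Assumed knob: recompute activations for only half of the llama layers on
# each pipeline stage, instead of the all-or-nothing checkpointing default.
plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    gradient_checkpoint_config=dict(gradient_checkpointing_ratio=0.5),  # assumption
)
booster = Booster(plugin=plugin)
```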

Colossalchat

Format

Full Changelog: v0.3.6...v0.3.7

Version v0.3.6 Release Today!

07 Mar 15:38
8020f42

What's Changed

Release

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Hotfix

Doc

Eval-hotfix

Devops

Example

Workflow

Shardformer

Setup

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo
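
A sketch of sharded save/load through the Booster checkpoint API, assuming the FSDP plugin follows the same `shard=True` convention as the other plugins; the paths and the toy model are placeholders:

```python
import torch.nn as nn
from torch.optim import Adam

from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin

# Assumes colossalai.launch_from_torch(...) has already initialized
# the distributed process group.
model = nn.Linear(32, 32)
optimizer = Adam(model.parameters(), lr=1e-3)

booster = Booster(plugin=TorchFSDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)

booster.save_model(model, "ckpt/model", shard=True)        # one shard per rank
booster.save_optimizer(optimizer, "ckpt/optim", shard=True)
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optim")
```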

Extension

Llama

Full Changelog: v0.3.5...v0.3.6

Version v0.3.5 Release Today!

23 Feb 08:46
adae123

What's Changed

Release

Llama

Moe

Lr-scheduler

Eval

Gemini

Fix

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
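
A minimal sketch of the optimizer checkpoint path this fixes, assuming the usual Booster `shard`/`size_per_shard` keywords and a HybridAdam optimizer for Gemini:

```python
import torch.nn as nn

from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

# Assumes colossalai.launch_from_torch(...) has initialized distributed state.
model = nn.Linear(16, 16)
optimizer = HybridAdam(model.parameters(), lr=1e-3)

booster = Booster(plugin=GeminiPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)

# ... training steps ...
booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)  # MB per shard
booster.load_optimizer(optimizer, "ckpt/optim")
```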

Chat

Extension

Doc

Tests

Accelerator

Workflow

Feat

Nfc

Hotfix

Sync

Shardformer

  • [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer] fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

Npu

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
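
Interleaved scheduling from the entries above is selected on the hybrid parallel plugin; a minimal sketch, assuming `pp_style` and `num_model_chunks` are the relevant constructor knobs:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Assumed knobs: split each pipeline rank into 2 model chunks and use the
# interleaved (virtual pipeline) schedule instead of plain 1F1B.
plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=4,
    pp_style="interleaved",   # assumption: the default is "1f1b"
    num_model_chunks=2,
    num_microbatches=8,
)
booster = Booster(plugin=plugin)
```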

Format

Version v0.3.4 Release Today!

01 Nov 05:57
8993c8a

What's Changed

Release

Pipeline inference

  • [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
  • [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
  • [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Doc

Hotfix

  • [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
  • [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
  • [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
  • [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
  • [hotfix] fix bug in sequence parallel test (#4887) by littsk
  • [hotfix] Correct several erroneous code comments (#4794) by littsk
  • [hotfix] fix norm type error in zero optimizer (#4795) by littsk
  • [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

  • [Kernels] Updated Triton kernels to 2.1.0 and added flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

  • [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
  • [Inference] Add Bench Chatglm2 script (#4963) by Jianghai
  • [inference] add reference and fix some bugs (#4937) by Xu Kai
  • [inference] Add smmoothquant for llama (#4904) by Xu Kai
  • [inference] add llama2 support (#4898) by Xu Kai
  • [inference] fix import bug and delete useless init (#4830) by Jianghai

Test

  • [test] merge old components to test to model zoo (#4945) by Hongxin Liu
  • [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
  • Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
  • [test] modify model supporting part of low_level_zero plugin (including corresponding docs) by Zhongkai Zhao

Refactor

  • [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Nfc

Format

Gemini

Kernel

  • [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

  • [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
  • [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk (see the sketch after this list)
  • [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen
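
For #4837, a sketch of plugin-level gradient clipping, assuming it is exposed as a `max_norm` constructor argument:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Assumption: total-norm clipping is applied inside optimizer.step() when
# max_norm > 0, so no explicit torch.nn.utils.clip_grad_norm_ call is needed.
plugin = HybridParallelPlugin(tp_size=2, pp_size=1, max_norm=1.0)
booster = Booster(plugin=plugin)
```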

Checkpointio

Infer

Chat

Misc

  • [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu
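
A sketch of resuming a schedule with `last_epoch`, assuming the scheduler keeps PyTorch's convention that `last_epoch` counts already-completed steps (the `initial_lr` backfill is the standard PyTorch requirement when constructing a scheduler with `last_epoch >= 0` without loading scheduler state):

```python
import torch.nn as nn
from torch.optim import Adam

from colossalai.nn.lr_scheduler import CosineAnnealingWarmupLR

model = nn.Linear(8, 8)
optimizer = Adam(model.parameters(), lr=1e-3)

# PyTorch-style schedulers require `initial_lr` in each param group when
# constructed with last_epoch >= 0.
for group in optimizer.param_groups:
    group.setdefault("initial_lr", group["lr"])

# Resume as if 499 steps of a 1000-step schedule had already run.
scheduler = CosineAnnealingWarmupLR(
    optimizer, total_steps=1000, warmup_steps=100, last_epoch=499
)
```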

Lazy

Fix

Full Changelog: v0.3.3...v0.3.4

Version v0.3.3 Release Today!

22 Sep 10:30
4146f1c

What's Changed

Release

Inference

Feature

  • [feature] add gptq for inference (#4754) by Xu Kai
  • [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

  • [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
  • [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang

Lazy

Chat

Doc

Shardformer

Misc

Format

Legacy

Kernel

Example

Hotfix

Devops

Pipeline

Full Changelog: v0.3.2...v0.3.3

Version v0.3.2 Release Today!

06 Sep 15:42
9709b8f

What's Changed

Release

Shardformer

Legacy

Test

Zero

  • [zero] hotfix master param sync (#4618) by Hongxin Liu
  • [zero] fix zero ckptIO with offload (#4529) by LuGY
  • [zero] support zero2 with gradient accumulation (#4511) by LuGY
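
With ZeRO-2 the gradients are partitioned across ranks, so accumulation has to go through the plugin's backward path; a minimal sketch of accumulating under LowLevelZeroPlugin (shapes and step counts are placeholders):

```python
import torch
import torch.nn as nn

from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

# Assumes colossalai.launch_from_torch(...) has set up distributed state.
model = nn.Linear(32, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

booster = Booster(plugin=LowLevelZeroPlugin(stage=2))  # ZeRO-2
model, optimizer, *_ = booster.boost(model, optimizer)

accum_steps = 4
for step in range(100):
    x = torch.randn(8, 32, device="cuda")
    loss = model(x).mean() / accum_steps  # scale the loss for accumulation
    booster.backward(loss, optimizer)     # ZeRO needs the booster's backward
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```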

Checkpointio

Coati

Doc

Pipeline

  • [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
  • [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
  • [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
  • [pipeline] add chatglm (#4363) by Jianghai
  • [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
  • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
  • [pipeline] add unit test for 1f1b (#4303) by LuGY
  • [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang

Version v0.3.1 Release Today!

01 Aug 07:02
8064771

What's Changed

Release

Chat

Zero

  • [zero] optimize the optimizer step time (#4221) by LuGY
  • [zero] support shard optimizer state dict of zero (#4194) by LuGY
  • [zero] add state dict for low level zero (#4179) by LuGY
  • [zero] allow passing process group to zero12 (#4153) by LuGY
  • [zero] support no_sync method for zero1 plugin (#4138) by LuGY
  • [zero] refactor low level zero for shard evenly (#4030) by LuGY

Nfc

  • [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
  • [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
  • [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
  • [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
  • [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
  • [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
  • [NFC] fix: format (#4270) by dayellow
  • [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
  • [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
  • [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
  • [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
  • [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
  • [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
  • [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
  • [NFC] polish applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
  • [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
  • [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
  • [NFC] Fix format for mixed precision (#4253) by Jianghai
  • [nfc] fix ColossalaiOptimizer is not defined (#4122) by digger yu
  • [nfc] fix dim not defined and fix typo (#3991) by digger yu
  • [nfc] fix typo colossalai/zero (#3923) by digger yu
  • [nfc] fix typo colossalai/pipeline tensor nn (#3899) by digger yu
  • [nfc] fix typo colossalai/nn (#3887) by digger yu
  • [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Example

Ci

Checkpointio

Lazy

Kernels

  • [Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li

Docker

Dtensor

Workflow

Cli

Format

Shardformer

Version v0.3.0 Release Today!

25 May 08:26
d42b1be

What's Changed

Release

Nfc

  • [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
  • [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
  • [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
  • [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
  • [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
  • [NFC] polish colossalai/engine/gradient_handler/init.py code style (#3329) by Ofey Chan
  • [NFC] polish colossalai/context/random/init.py code style (#3327) by yuxuan-lou
  • [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
  • [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
  • [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
  • [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
  • [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
  • [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
  • [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
  • [NFC] polish colossalai/amp/init.py code style (#3272) by lucasliunju
  • [NFC] polish code style (#3273) by Xuanlei Zhao
  • [NFC] polish colossalai/fx/proxy.py code style (#3269) by CZYCW
  • [NFC] polish code style (#3268) by Yuanchen
  • [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
  • [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
  • [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
  • [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
  • [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Doc

Workflow

Booster

Docs

  • [docs] change placememt_policy to placement_policy (#3829) by digger yu

Evaluation

  • [evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Docker

Api

  • [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Test

Version v0.2.8 Release Today!

29 Mar 02:26
a0b3749

What's Changed

Release

Format

Doc

Application

Chat

Coati

Colossalchat

Examples

Fx

Booster

Ci

Api

Hotfix

Chatgpt

Lazyinit

  • [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
  • [lazyinit] add correctness verification (#3147) by ver217
  • [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217

Auto

Analyzer

Dreambooth

  • [dreambooth] fixing the incompatibility in requirements.txt (#3190) by NatalieC323

Auto-parallel

  • [auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

  • [zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH

Test

Refactor

Tests

  • [tests] model zoo add torchaudio models (#3138) by ver217
  • [tests] diffuser models in model zoo (#3136) by HELSON

Docker

Dtensor

Workflow

  • [workflow] purged extension cache before GPT test (#3128) by Frank Lee

Autochunk

Tutorial

Nvidia

Full Changelog: v0.2.7...v0.2.8

Version v0.2.7 Release Today!

10 Mar 06:56
26db1cb

What's Changed

Release

Chatgpt

Kernel

  • [kernel] added kernel loader to softmax autograd function (#3093) by Frank Lee
  • [kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Analyzer

  • [analyzer] a minimal implementation of static graph analyzer (#2852) by Super Daniel

Diffusers

Doc

Autochunk

Dtensor

Workflow

  • [workflow] fixed doc build trigger condition (#3072) by Frank Lee
  • [workflow] supported conda package installation in doc test (#3028) by Frank Lee
  • [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
  • [workflow] added auto doc test on PR (#2929) by Frank Lee
  • [workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

Example

Hotfix

Revert

Format

Pipeline

Fx

Refactor

Misc

Autoparallel

  • [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
  • [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
  • [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
  • [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
  • [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao