
Releases: microsoft/tutel

Tutel v0.3.2

08 May 06:47
d4c20c3

What's New in v0.3.2:

  1. Add a --use_tensorcore option for benchmarking in tutel.examples.helloworld.
  2. Read the TUTEL_GLOBAL_TIMEOUT_SEC environment variable to configure the NCCL timeout setting.
  3. Extend tutel.examples.helloworld_custom_expert to demonstrate how to override the MoE layer with customized expert layers.
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.2.tar.gz
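A minimal sketch of how the new timeout variable can be consumed. Only the name TUTEL_GLOBAL_TIMEOUT_SEC comes from this release; the 1800-second default and the surrounding setup are assumptions for illustration, not Tutel's actual implementation:

```python
import os
from datetime import timedelta

# TUTEL_GLOBAL_TIMEOUT_SEC is the variable added in this release; the
# 1800s (30 min) fallback mirrors PyTorch's usual NCCL default and is an
# assumption here, not a value taken from Tutel's source.
timeout_sec = int(os.environ.get("TUTEL_GLOBAL_TIMEOUT_SEC", "1800"))
nccl_timeout = timedelta(seconds=timeout_sec)

# A timedelta like this would typically be passed to
# torch.distributed.init_process_group(backend="nccl", timeout=nccl_timeout).
print(nccl_timeout.total_seconds())
```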

Tutel v0.3.1

06 Jan 10:27
fdf6e59

What's New in v0.3.1:

  1. Add two additional collective communication primitives: net.batch_all_to_all_v() and net.batch_all_gather_v().
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.1.tar.gz
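The semantics of a variable-length all-to-all can be shown with a single-process, pure-Python sketch. This illustrates what a batched all-to-all-v exchanges between ranks; it is not Tutel's torch-based API:

```python
def all_to_all_v(send):
    """Simulate an all-to-all-v across n ranks in a single process.

    send[i][j] is the (variable-length) chunk rank i sends to rank j;
    in the result, recv[j][i] is that same chunk, now held by rank j.
    """
    n = len(send)
    return [[send[i][j] for i in range(n)] for j in range(n)]

# Two ranks: rank 0 keeps [1] and sends [2, 3] to rank 1, and so on.
send = [[[1], [2, 3]],
        [[4, 5, 6], [7]]]
recv = all_to_all_v(send)
print(recv)  # [[[1], [4, 5, 6]], [[2, 3], [7]]]
```

Because the exchange is a transpose of the send buffers, applying it twice returns the original layout.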

Tutel v0.3.0

05 Aug 16:41
71db950

What's New in v0.3.0:

  1. Support Megablocks-style dMoE inference (see README.md for more information).
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.0.tar.gz

Tutel v0.2.1

30 Mar 22:00
d61df8d

What's New in v0.2.1:

  1. Support Switchable Parallelism with example tutel.examples.helloworld_switch.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.1.tar.gz

Tutel v0.2.0

11 Aug 04:19

What's New in v0.2.0:

  1. Support Windows Python3 + Torch installation;
  2. Add examples to enable Tutel MoE in Fairseq;
  3. Refactor the MoE layer implementation so that all features (e.g. top-X, overlap, parallel_type, capacity, ..) can be changed across different forward iterations;
  4. New features: load_importance_loss, cosine router, inequivalent_tokens;
  5. Extend capacity_factor to accept zero and negative values for smarter capacity estimation;
  6. Add tutel.checkpoint conversion tools to reformat checkpoint files, so existing checkpoints can be used to train/infer with a different world size.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.0.tar.gz
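One way to picture the extended capacity_factor semantics: a positive factor fixes capacity, zero fits whatever is routed (no token drop), and a negative factor acts as an adaptive value with a cap. The helper below encodes that reading as an assumption; it is an illustration, not Tutel's internal formula:

```python
import math

def expert_capacity(num_tokens, num_experts, top_k, capacity_factor, max_routed):
    """Illustrative capacity rule (assumed semantics, not Tutel's exact code):
    - factor > 0: fixed capacity = ceil(factor * k * tokens / experts)
    - factor == 0: adaptive, exactly fits the busiest expert (no token drop)
    - factor < 0: adaptive, but capped at |factor| * k * tokens / experts
    max_routed is the token count of the busiest expert this step.
    """
    base = math.ceil(top_k * num_tokens / num_experts)
    if capacity_factor > 0:
        return math.ceil(capacity_factor * base)
    if capacity_factor == 0:
        return max_routed
    return min(max_routed, math.ceil(-capacity_factor * base))

print(expert_capacity(1024, 8, 2, 1.25, max_routed=300))  # fixed: 320
print(expert_capacity(1024, 8, 2, 0.0, max_routed=300))   # adaptive: 300
print(expert_capacity(1024, 8, 2, -1.0, max_routed=300))  # capped: 256
```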

Tutel v0.1.5

26 Feb 07:19
bddc915

What's New in v0.1.5:

  1. Add a 2D hierarchical a2a algorithm for extremely large scaling;
  2. Support different parallel_type values for MoE computation: data, model, auto;
  3. Combine different expert granularities (e.g. normal, sharded experts, megatron dense ffn) into the same programming interface & style;
  4. New feature: is_postscore to specify whether gating scores are weighted during encoding or decoding;
  5. Enhance existing features: JIT compiler, a2a overlap with 2D.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.5.tar.gz

Contributors: @abuccts, @yzygitzh, @ghostplant, @EricWangCN
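The is_postscore option above decides where gate scores are applied: after the expert (decoding) or before it (encoding). For a nonlinear expert the two placements are not equivalent, which the stand-in below demonstrates (illustrative only, not Tutel's code):

```python
def moe_out(x, gate, expert, postscore=True):
    """is_postscore sketch: weight a token by its gate score either after
    the expert runs (decoding) or before (encoding). expert is a stand-in
    for a real FFN expert."""
    return gate * expert(x) if postscore else expert(gate * x)

# A nonlinear stand-in expert: squared ReLU.
relu_sq = lambda v: max(v, 0.0) ** 2

print(moe_out(2.0, 0.5, relu_sq, postscore=True))   # 0.5 * 4.0 = 2.0
print(moe_out(2.0, 0.5, relu_sq, postscore=False))  # (0.5 * 2.0)^2 = 1.0
```

For a linear expert the two results would coincide; the choice only matters once the expert is nonlinear.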

Tutel v0.1.4

09 Feb 04:34
ca6f018

What's New in v0.1.4:

  1. Enhance communication features: a2a overlap with computation, different granularities of group creation, etc.;
  2. Add a single-thread CPU implementation for correctness checks & reference;
  3. Refine the JIT compiler interface for flexible usability: jit::inject_source & jit::jit_execute;
  4. Enhance examples: fp64 support, CUDA AMP, checkpointing, etc.;
  5. Support execution inside torch.distributed.pipeline.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.4.tar.gz

Contributors: @yzygitzh, @ghostplant, @EricWangCN
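The a2a-overlap idea from item 1 can be sketched as a tiny chunked schedule: while chunk i is being computed, chunk i+1's communication is already in flight on a background worker. Here comm and compute are stand-ins for the real collective and expert FFN, not Tutel's API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_overlapped(chunks, comm, compute):
    """Overlap 'communication' with 'computation' across chunks:
    issue comm for chunk i+1 before computing chunk i's result."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(comm, chunks[0])      # first a2a in flight
        for i in range(len(chunks)):
            received = pending.result()           # wait for chunk i's a2a
            if i + 1 < len(chunks):
                pending = io.submit(comm, chunks[i + 1])  # next a2a in flight
            results.append(compute(received))     # overlaps with that a2a
    return results

out = run_overlapped([1, 2, 3], comm=lambda x: x, compute=lambda x: x * 10)
print(out)  # [10, 20, 30]
```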

Tutel v0.1.3

29 Dec 03:29
ea17ea6

What's New in v0.1.3:

  1. Add Tutel Launcher Support based on Open MPI;
  2. Support Establishing Data/Model Parallelism at Initialization;
  3. Support a Single Expert Evenly Sharded Across Multiple GPUs;
  4. Support a List of Gates and Forwarding the MoE Layer with a Specified Gating Index;
  5. Fix NVRTC Compatibility when Enabling USE_NVRTC=1;
  6. Other Implementation Enhancements & Correctness Checking.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.3.tar.gz

Contributors: @ghostplant, @EricWangCN, @guoshzhao.

Tutel v0.1.2

16 Nov 08:26
6b434d9

What's New in v0.1.2:

  1. General-purpose top-k gating with {'type': 'top', 'k': 2};
  2. Add Megatron-LM Tensor Parallel as a gating type;
  3. Add deepspeed-based & megatron-based helloworld examples for fair comparison;
  4. Add torch.bfloat16 datatype support for single-GPU.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.2.tar.gz

Contributors: @ghostplant, @EricWangCN, @foreveronehundred.
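Top-k gating as configured by {'type': 'top', 'k': 2} can be summarized with a single-thread reference: per token, keep the k highest-scoring experts and normalize their scores. This is an illustrative sketch of the routing rule, not Tutel's gating implementation:

```python
def topk_dispatch(scores, k):
    """Reference top-k routing: for each token (one row of scores), pick
    the k highest-scoring experts and normalize their scores to sum to 1.
    Returns, per token, a list of (expert_index, weight) pairs."""
    routed = []
    for row in scores:
        top = sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k]
        total = sum(row[e] for e in top)
        routed.append([(e, row[e] / total) for e in top])
    return routed

# 2 tokens, 4 experts, top-2 routing ({'type': 'top', 'k': 2}).
scores = [[0.1, 0.4, 0.3, 0.2],
          [0.5, 0.1, 0.1, 0.3]]
print(topk_dispatch(scores, k=2))
```

Token 0 is routed to experts 1 and 2 with weights 0.4/0.7 and 0.3/0.7; token 1 to experts 0 and 3 with weights 0.625 and 0.375.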

Tutel v0.1.1

10 Oct 14:00

What's New in v0.1.1:

  1. Enable fp16 support for AMD GPUs.
  2. Use NVRTC for JIT compilation when available.
  3. Add a new system_init interface for initializing NUMA settings on distributed GPUs.
  4. Extend more gating types: Top3Gate & Top4Gate.
  5. Allow higher-level code to change the capacity value in the Tutel fast dispatcher.
  6. Add a custom AllToAll extension for older PyTorch versions without built-in AllToAll operator support.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.1.tar.gz

Contributors: @jspark1105, @ngoyal2707, @guoshzhao, @ghostplant.