[BUG] Fix some bugs in OpenMoE Implementation #5267

Orion-Zheng · 2024-01-14T10:23:14Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

The original positional encoding used during inference is inconsistent with the flaxformer implementation.

The original implementation raises errors when running on CPU, so I modified colossalai/moe/layers.py to import LoadBalancer only when it is needed.
The original convert_openmoe_checkpoint.py is very memory-hungry (it takes up almost 3x the size of the model weights), which makes converting the 34B version inconvenient (it requires ~500GB of memory). I added init_empty_weights and removed the unnecessary model-loading code to save memory during conversion. The 34B version can now be converted on CPU with ~170GB of memory.
Update the inference example/script to allow multi-GPU inference (using the HuggingFace library) and provide an inference demo on Colab, which is linked in the new README.md.
Update the model configs of the base and 8B versions, and add a config file for the 34B model.
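
For reviewers, the "import only when needed" change can be pictured with the minimal sketch below. It is an illustration of the deferred-import pattern, not the actual code in colossalai/moe/layers.py: the class `MoeLayerSketch`, the `enable_load_balance` flag, and the placeholder module name are all hypothetical, and a standard-library module stands in for the real LoadBalancer so the snippet runs anywhere.

```python
import importlib
from typing import Any, Optional


class MoeLayerSketch:
    """Hypothetical stand-in for an MoE layer (not the Colossal-AI class)."""

    def __init__(
        self,
        enable_load_balance: bool = False,
        # Placeholder names: the real module and class paths are not shown here.
        balancer_module: str = "no_such_gpu_only_module",
        balancer_attr: str = "LoadBalancer",
    ) -> None:
        self.load_balancer_cls: Optional[Any] = None
        if enable_load_balance:
            # Deferred import: the module is loaded only on this branch, so a
            # CPU-only run that never enables load balancing never touches it.
            module = importlib.import_module(balancer_module)
            self.load_balancer_cls = getattr(module, balancer_attr)


# Without load balancing, construction succeeds even though the placeholder
# module cannot be imported.
cpu_layer = MoeLayerSketch(enable_load_balance=False)

# With load balancing, the import happens at construction time. The stdlib
# "fractions" module is used purely as an importable stand-in.
balanced_layer = MoeLayerSketch(
    enable_load_balance=True,
    balancer_module="fractions",
    balancer_attr="Fraction",
)
```

The trade-off of this pattern is that an import failure surfaces when the feature is first enabled rather than when the module is loaded.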

To verify this PR, you can run the Colab notebook below, which exercises all of my modifications.
https://colab.research.google.com/drive/1Je0oAm3o9ZyZC5Yp7fJmWUn-qrsrASSc?usp=sharing

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

Tell us more if you don't enjoy contributing to Colossal-AI.

Orion-Zheng added 8 commits January 14, 2024 03:25

only import LoadBalancer when needed to avoid errors

e65d2b2

Update README with inference demo and update requirements

a44a830

update config files for different models

fb8c5ec

update inference example

3d000cc

correct some bugs in modeling_openmoe.py

64bc50b

Save half of the memory used for converting checkpoints by using init_empty_weights and lazy_parameters

2c416de

update convert_openmoe_ckpt.sh and fix typos in modeling_openmoe.py

2eee01f

add cpu support to device.py

80eeb06

Orion-Zheng requested a review from a team as a code owner January 14, 2024 10:23