Issues: NVIDIA/Megatron-LM
- [QUESTION] Why is expert parallelism not supported during fp16 training? (#810, opened May 7, 2024 by yutian-mt)
- [QUESTION] Is it expected to compute the grad norm on the dense optimizer and the MoE optimizer separately? (#785, opened Apr 19, 2024 by ezioliao; see the grad-norm sketch after this list)
- [QUESTION] Found NaN in local grad norm in the backward pass, before the data-parallel communication collective (#780, opened Apr 16, 2024 by ftgreat)
- [BUG] Interaction between the Megatron-core, transformer-impl, and flash-attention options (#778, opened Apr 12, 2024 by Baibaifan)
- [BUG] Passed the wrong type of argument to torch.distributed.broadcast (#774, opened Apr 11, 2024 by sandyhouse; see the broadcast sketch after this list)
- [QUESTION] vicuna-7b-v1.5 weight conversion from Hugging Face to Megatron-LM format (#773, opened Apr 10, 2024 by uehara-mech)
- [QUESTION] Why does megatron-core seem slower and use more GPU memory than legacy for GPT pretraining? (#770, opened Apr 9, 2024 by REIGN12)
- [QUESTION] Why is F.embedding() replaced with [] indexing in the VocabParallelEmbedding class? (#769, opened Apr 9, 2024 by starkhu; see the embedding sketch after this list)
- [BUG] How to checkpoint a specific microbatch in pipeline parallelism? (#767, opened Apr 7, 2024 by robotsp)
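For context on #785: when dense and MoE parameters live in separate optimizers and each computes its own gradient norm, a single global L2 norm can still be recovered for clipping, because the squared L2 norm is additive across disjoint parameter sets. The sketch below is a minimal illustration in plain PyTorch under that assumption; the function name and parameter grouping are hypothetical, not Megatron-LM's actual implementation.

```python
import torch

# Hypothetical sketch for the question in #785: if gradient norms are computed
# separately for dense and MoE parameters, the global L2 norm can be recovered
# as ||g||_2 = sqrt(||g_dense||_2^2 + ||g_moe||_2^2), since the parameter sets
# are disjoint. Not Megatron-LM's code; a plain-PyTorch illustration.

def combined_grad_norm(dense_params, moe_params) -> float:
    def l2(params):
        grads = [p.grad for p in params if p.grad is not None]
        if not grads:
            return torch.tensor(0.0)
        # Norm of per-tensor norms equals the norm of the concatenated grads.
        return torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads]))

    dense_norm, moe_norm = l2(dense_params), l2(moe_params)
    return float(torch.sqrt(dense_norm ** 2 + moe_norm ** 2))
```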
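For context on #774: torch.distributed.broadcast operates in place on a Tensor, so passing a Python int or float raises a type error. A minimal sketch of the correct pattern follows, assuming a process group has already been initialized (e.g. via dist.init_process_group) and rank 0 is the source; the helper name is hypothetical.

```python
import torch
import torch.distributed as dist

# Hypothetical helper illustrating the pitfall in #774: wrap a scalar in a
# tensor on the right device before broadcasting, because dist.broadcast
# mutates a Tensor in place and does not accept plain Python numbers.

def broadcast_scalar(value: float, src: int = 0) -> float:
    device = (torch.device("cuda", torch.cuda.current_device())
              if torch.cuda.is_available() else torch.device("cpu"))
    buf = torch.tensor([value], dtype=torch.float32, device=device)
    dist.broadcast(buf, src=src)  # in place; every rank receives rank src's value
    return buf.item()
```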
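For context on #769: in a vocab-parallel embedding, each tensor-parallel rank holds only a slice of the vocabulary, so token IDs outside the local shard must be masked before the lookup and their output rows zeroed afterwards; once that masking is done by hand, plain weight indexing does the job and F.embedding adds nothing. The sketch below illustrates the pattern only; the function name and arguments are assumptions, not the library's exact code.

```python
import torch

# Hypothetical illustration of the masked-lookup pattern asked about in #769.
# Each rank owns vocabulary IDs in [vocab_start, vocab_end); out-of-range IDs
# are parked at 0 for the lookup and their outputs zeroed afterwards.

def vocab_parallel_lookup(token_ids, weight, vocab_start, vocab_end):
    # Tokens whose IDs fall on other ranks' vocabulary shards.
    out_of_range = (token_ids < vocab_start) | (token_ids >= vocab_end)
    # Shift into the shard-local index space; masked IDs go to 0 so the
    # indexing stays in bounds.
    local_ids = token_ids - vocab_start
    local_ids[out_of_range] = 0
    # Plain tensor indexing -- the "[]" the issue title asks about.
    output = weight[local_ids]
    # Zero rows for tokens this shard does not own; under tensor parallelism
    # an all-reduce across ranks (omitted here) then sums the partial outputs.
    output[out_of_range] = 0.0
    return output

# Example: a rank owning vocab IDs [4, 8) with hidden size 3.
weight = torch.randn(4, 3)
ids = torch.tensor([[2, 5], [7, 9]])
print(vocab_parallel_lookup(ids, weight, 4, 8))
```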