
support alternative parallelism #2

Open
152334H opened this issue Dec 10, 2023 · 1 comment

Comments

@152334H
Contributor

152334H commented Dec 10, 2023

--num-gpus is implemented by sharding each expert layer across GPUs, i.e. expert parallelism.

This is probably not advisable for local experimentation, especially at batch size 1, where EP only adds communication overhead with no speed benefit versus naive model/pipeline parallelism.
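For context on the alternative being suggested: naive pipeline parallelism keeps each expert layer intact and instead places contiguous blocks of whole layers on each GPU, so a batch-size-1 forward pass only crosses devices at block boundaries rather than doing a per-layer all-to-all. A minimal placement sketch (hypothetical helper, not code from this repository):

```python
def pipeline_partition(num_layers: int, num_gpus: int) -> list[int]:
    """Assign transformer layers to GPUs in contiguous blocks
    (naive pipeline parallelism). Entry i is the GPU holding layer i.
    """
    base, extra = divmod(num_layers, num_gpus)
    assignment: list[int] = []
    for gpu in range(num_gpus):
        # earlier GPUs take one extra layer when the split is uneven
        count = base + (1 if gpu < extra else 0)
        assignment.extend([gpu] * count)
    return assignment

# e.g. 32 transformer blocks over 4 GPUs: 8 consecutive layers per device,
# so activations cross a device boundary only 3 times per token.
print(pipeline_partition(32, 4))
```

With this placement, expert weights stay co-located with their router, which is why it avoids the EP communication overhead the comment above describes.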

@tonysy
Contributor

tonysy commented Dec 10, 2023

Good suggestion; I am working on other parallelism methods. Contributions are also welcome.
