Assuming I have a sequence of length 512, but num_slot * num_expert = 9, it should still be able to run. In this case, would there be a performance drop?
I'd have to validate that empirically, but I would definitely expect a performance drop here. Because of the softmax, my intuition is that one token is likely to dominate each expert, and some tokens get short shrift. This might be fine if you have like, 2x more slots, or maybe even 4x or 8x, but >50 tokens per slot (512 / 9 ≈ 57 in your case) I imagine would be skimping a bit and likely to hurt results.
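To make the intuition a bit more concrete, here is a rough pure-Python sketch of the dispatch step: each slot takes a softmax over all tokens, so the total weight handed out is fixed at one unit per slot no matter how long the sequence is. The random Gaussian logits, the `scale` value, and the helper names (`dispatch_weights`, `summarize`) are all assumptions for illustration, not code from the repository, and the numbers from a trained model may look quite different.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dispatch_weights(logits):
    # logits[t][s] is the affinity of token t for slot s.
    # Returns weights[s][t]: for each slot, a softmax over all tokens.
    num_tokens = len(logits)
    num_slots = len(logits[0])
    return [
        softmax([logits[t][s] for t in range(num_tokens)])
        for s in range(num_slots)
    ]


def summarize(num_tokens, num_slots, scale=4.0, seed=0):
    # Assumption: logits are i.i.d. Gaussian with standard deviation `scale`.
    rng = random.Random(seed)
    logits = [
        [rng.gauss(0.0, scale) for _ in range(num_slots)]
        for _ in range(num_tokens)
    ]
    weights = dispatch_weights(logits)
    # Largest single-token weight in each slot (how much one token dominates).
    top_share = [max(w) for w in weights]
    # Total weight each token receives, summed over all slots.
    token_mass = [
        sum(weights[s][t] for s in range(num_slots))
        for t in range(num_tokens)
    ]
    return {
        "tokens_per_slot": num_tokens / num_slots,
        "mean_top_share": sum(top_share) / num_slots,
        "min_token_mass": min(token_mass),
        "total_mass": sum(token_mass),
    }


stats = summarize(num_tokens=512, num_slots=9)
print(stats)
```

Whatever the logits are, `total_mass` equals the number of slots (9 here), so the average token gets only 9 / 512 of a slot's worth of weight. How unevenly that is spread (`mean_top_share`, `min_token_mass`) depends on how peaked the logits are, which is the part that would need checking empirically.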