Hi, when I look at the model/wte tensor (the token embedding weights) in the GPT-2 117M example, it has a shape of [768, 50257], which I take to be [embedding_dim, vocabulary_size].
Should the usual dimension order for wte be [vocabulary_size, embedding_dim] instead?
If so, why does a ggml tensor store the dimensions in reversed order?
Similarly, model/wpe (the positional encoding weights) has a shape of [768, 1024], which also seems reversed from the usual [1024, 768] order.
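For context, here is a minimal sketch of how I understand the shapes to arise. It is based on my reading of the gpt-2 example, not a copy of it; the hyperparameter constants, the context memory size, and the variable names are my own illustrative assumptions. The last two arguments of ggml_new_tensor_2d become ne[0] and ne[1], which is exactly the [768, 50257] / [768, 1024] I see when printing the shapes:

```cpp
#include <cstdio>
#include "ggml.h"

int main() {
    // Minimal ggml context, sized just large enough for this illustration
    // (the 256 MB figure is an assumption, not a value from the example).
    struct ggml_init_params params = {
        /*.mem_size   =*/ 256*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int n_embd  = 768;    // embedding dimension of GPT-2 117M
    const int n_vocab = 50257;  // vocabulary size
    const int n_ctx   = 1024;   // maximum context length

    // model/wte: created with ne0 = n_embd, ne1 = n_vocab,
    // so its reported shape is [768, 50257]
    struct ggml_tensor * wte = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_vocab);

    // model/wpe: created with ne0 = n_embd, ne1 = n_ctx,
    // so its reported shape is [768, 1024]
    struct ggml_tensor * wpe = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_ctx);

    printf("wte: [%lld, %lld]\n", (long long) wte->ne[0], (long long) wte->ne[1]);
    printf("wpe: [%lld, %lld]\n", (long long) wpe->ne[0], (long long) wpe->ne[1]);

    ggml_free(ctx);
    return 0;
}
```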
Thanks,