`self.cos_cached` and `self.sin_cached` have the same shape as `x`, don't they?

annotated_deep_learning_paper_implementations/labml_nn/transformers/rope/__init__.py, line 188 in f42c0e9

So if this line is intended to apply RoPE to only part of `x`, i.e. `x[..., :self.d]`, I think it should be:

x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])

Please let me know if I'm wrong.
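To make the shape question concrete, here is a small runnable sketch. The dimensions are hypothetical (chosen for illustration, not taken from the repo): if the cached tables really had the same shape as `x`, only a feature-dimension slice like the one proposed above would broadcast against `x[..., :self.d]`.

```python
import torch

# Hypothetical shapes, purely for illustration (not values from the repo).
seq_len, batch, n_heads, d_model, d = 10, 2, 4, 64, 32

x = torch.randn(seq_len, batch, n_heads, d_model)
x_rope = x[..., :d]  # the part RoPE rotates: [10, 2, 4, 32]

# If cos_cached had the same shape as x, slicing only the sequence
# dimension would leave a 64-wide last dim that cannot broadcast with x_rope:
cos_like_x = torch.randn(seq_len, batch, n_heads, d_model)
# x_rope * cos_like_x[:x.shape[0]]   # would raise: size 32 vs 64 in last dim

# ...whereas the feature-dimension slice proposed above lines up:
assert (x_rope * cos_like_x[..., :d]).shape == (seq_len, batch, n_heads, d)
```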
You are correct that `self.cos_cached` and `self.sin_cached` have the same shape as `x`. The modification is also correct: it would ensure that the rotary embeddings are applied only to the subset of features specified by `self.d`.
They have similar shapes, not identical ones. The truncation of the cached sin/cos to `x.shape[0]` truncates them to the sequence length, because the sequence length (the number of tokens per sample) changes from batch to batch.
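For reference, here is a minimal self-contained sketch of this kind of cache. It is modeled on the RoPE module under discussion, but it is not a verbatim copy of the file at f42c0e9: the cached tables are built with shape `[seq_len, 1, 1, d]`, so their last dimension is already `self.d`, and the `[:x.shape[0]]` slice only trims the sequence dimension.

```python
import torch
from torch import nn


class RotaryPositionalEmbeddings(nn.Module):
    """Sketch: RoPE with a sin/cos cache keyed on sequence length."""

    def __init__(self, d: int, base: int = 10_000):
        super().__init__()
        self.d = d        # number of features to rotate (may be < d_model)
        self.base = base
        self.cos_cached = None
        self.sin_cached = None

    def _build_cache(self, x: torch.Tensor):
        # x: [seq_len, batch, n_heads, d_model]; rebuild only when the
        # cached tables are shorter than the current sequence.
        if self.cos_cached is not None and x.shape[0] <= self.cos_cached.shape[0]:
            return
        seq_len = x.shape[0]
        # theta_i = base^(-2i/d) for i in [0, d/2)
        theta = 1.0 / (self.base ** (torch.arange(0, self.d, 2, device=x.device).float() / self.d))
        seq_idx = torch.arange(seq_len, device=x.device).float()
        idx_theta = torch.einsum('n,d->nd', seq_idx, theta)    # [seq_len, d/2]
        idx_theta2 = torch.cat([idx_theta, idx_theta], dim=1)  # [seq_len, d]
        # [seq_len, 1, 1, d]: broadcasts over batch and heads; the last
        # dimension is already self.d, so it never needs slicing.
        self.cos_cached = idx_theta2.cos()[:, None, None, :]
        self.sin_cached = idx_theta2.sin()[:, None, None, :]

    def _neg_half(self, x: torch.Tensor):
        d_2 = self.d // 2
        # [-x_{d/2:}, x_{:d/2}]: the "swap halves and negate" of the rotation
        return torch.cat([-x[..., d_2:], x[..., :d_2]], dim=-1)

    def forward(self, x: torch.Tensor):
        self._build_cache(x)
        # Rotate only the first self.d features; pass the rest through.
        x_rope, x_pass = x[..., :self.d], x[..., self.d:]
        neg_half_x = self._neg_half(x_rope)
        # [:x.shape[0]] trims the cache to the current sequence length.
        x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])
        return torch.cat((x_rope, x_pass), dim=-1)
```

With this construction, `RotaryPositionalEmbeddings(d=32)(torch.randn(10, 2, 4, 64))` builds a cache of shape `[10, 1, 1, 32]`; a later, shorter batch reuses it, and `[:x.shape[0]]` just trims it to that batch's sequence length, which is why the slice is on the sequence dimension rather than the features.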