question about RoPE code #227

Open
yukyeongmin opened this issue Nov 15, 2023 · 3 comments

Comments

@yukyeongmin

yukyeongmin commented Nov 15, 2023

x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])

self.cos_cached and self.sin_cached have the same shape as x, don't they?

So if this line is intended to apply RoPE to only part of x, i.e. x[..., :self.d],
I think it should be
x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])

Please let me know if I'm wrong.
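Whether the proposed slice is safe depends on how the cache is laid out. The NumPy check below is a sketch under assumptions not stated in the thread: the cache has shape [max_seq_len, 1, 1, d], x has shape [seq_len, batch, heads, d_model], and the sizes are made up for illustration. It compares the broadcast shapes produced by the original slice and by the proposed one.

```python
import numpy as np

# Hypothetical sizes: cache built for 8 positions with d = 4 rotated features;
# the current input has 5 tokens and 16 features per head.
max_seq_len, d = 8, 4
seq_len, batch, heads, d_model = 5, 2, 3, 16

cos_cached = np.ones((max_seq_len, 1, 1, d))    # assumed layout [max_seq_len, 1, 1, d]
x = np.zeros((seq_len, batch, heads, d_model))  # assumed layout [seq, batch, heads, features]
x_rope = x[..., :d]                             # [5, 2, 3, 4]

# Original line: slice the sequence axis (axis 0) down to the current length.
original_shape = np.broadcast_shapes(x_rope.shape, cos_cached[:x.shape[0]].shape)
print("original:", original_shape)  # original: (5, 2, 3, 4)

# Proposed line: slice the feature axis; axis 0 stays at max_seq_len = 8.
try:
    np.broadcast_shapes(x_rope.shape, cos_cached[..., :d].shape)
    proposed_broadcasts = True
except ValueError:
    proposed_broadcasts = False
print("proposed broadcasts:", proposed_broadcasts)  # proposed broadcasts: False
```

Under these assumptions the cache's feature axis already equals d, so slicing it with `:self.d` changes nothing, and the two lines only behave the same when the cache was built for exactly the current sequence length.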

@nagamonish

You are correct that self.cos_cached and self.sin_cached have the same shape as x.

As for the modification, that is also correct, because it ensures that the rotary embeddings are applied only to the subset of features specified by self.d.

@vpj
Member

vpj commented Nov 26, 2023

They have similar shapes. Truncating the cached sin/cos to x.shape[0] truncates them to the sequence length, because the sequence length (the number of tokens per sample) can change from one input to the next.
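The truncation described above can be sketched as follows. NumPy stands in for PyTorch, the names `build_cache`, `neg_half` and `apply_partial_rope` are hypothetical, and the layouts ([max_seq_len, 1, 1, d] for the cache, [seq_len, batch, heads, d_model] for x) are assumptions rather than details confirmed in the thread. The cache is built once for a maximum length with the rotated size d as its last axis, and each call slices it along axis 0 to the current number of tokens.

```python
import numpy as np


def build_cache(max_seq_len: int, d: int, base: float = 10_000.0):
    """Precompute cos/sin tables of shape [max_seq_len, 1, 1, d].

    Only the first d features are rotated, so the last axis is d,
    not the full feature size of x.
    """
    theta = 1.0 / (base ** (np.arange(0, d, 2) / d))  # [d/2] rotation frequencies
    positions = np.arange(max_seq_len)                 # [max_seq_len]
    angles = np.outer(positions, theta)                # [max_seq_len, d/2]
    angles = np.concatenate([angles, angles], axis=1)  # [max_seq_len, d]
    return np.cos(angles)[:, None, None, :], np.sin(angles)[:, None, None, :]


def neg_half(x: np.ndarray) -> np.ndarray:
    """Map halves [x1, x2] to [-x2, x1] along the last axis."""
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)


def apply_partial_rope(x, cos_cached, sin_cached, d):
    """Rotate x[..., :d] and pass x[..., d:] through unchanged.

    x is assumed to be [seq_len, batch, heads, d_model].
    """
    seq_len = x.shape[0]
    x_rope, x_pass = x[..., :d], x[..., d:]
    # Truncate the caches along the sequence axis; the feature axis is already d.
    cos = cos_cached[:seq_len]
    sin = sin_cached[:seq_len]
    x_rope = x_rope * cos + neg_half(x_rope) * sin
    return np.concatenate([x_rope, x_pass], axis=-1)


# Usage: cache for up to 32 positions, input with 5 tokens and 16 features.
cos_c, sin_c = build_cache(max_seq_len=32, d=4)
x = np.random.default_rng(0).standard_normal((5, 2, 3, 16))
y = apply_partial_rope(x, cos_c, sin_c, d=4)
print(y.shape)  # (5, 2, 3, 16)
```

Because the size-1 middle axes of the cache broadcast over batch and heads, only the sequence axis needs slicing; the same cache then serves any input with at most 32 tokens.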

@yukyeongmin
Author

Thanks for the reply! @vpj @nagamonish

Didn't you have any problems running that code? The original code didn't work for me with inputs of a different shape, and I assumed it was a syntax issue.
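The thread does not say which input shape failed, so the following is only a hypothetical diagnostic, not a fix for the reported error. It assumes x is laid out as [seq_len, batch, heads, d_model] and the cache as [max_seq_len, 1, 1, d]; `check_rope_inputs` is an illustrative name, and NumPy stands in for PyTorch. The idea is to turn an opaque broadcasting error into a message that names the mismatched axis.

```python
import numpy as np


def check_rope_inputs(x: np.ndarray, cos_cached: np.ndarray, d: int) -> None:
    """Raise a descriptive ValueError if x does not match the assumed layout.

    Assumed (hypothetical) layout:
      x          : [seq_len, batch, heads, d_model] with d_model >= d
      cos_cached : [max_seq_len, 1, 1, d]
    """
    if x.ndim != 4:
        raise ValueError(f"expected a 4-D [seq, batch, heads, features] array, got shape {x.shape}")
    if x.shape[-1] < d:
        raise ValueError(f"feature size {x.shape[-1]} is smaller than the rotated size d={d}")
    if cos_cached.shape[1:] != (1, 1, d):
        raise ValueError(f"cache shape {cos_cached.shape} does not end in (1, 1, {d})")
    if x.shape[0] > cos_cached.shape[0]:
        raise ValueError(
            f"sequence length {x.shape[0]} exceeds cached length {cos_cached.shape[0]}; "
            "rebuild the cache or check that axis 0 is the sequence axis"
        )
```

These checks cannot catch every layout mistake: a batch-first tensor passes them whenever its batch size fits in the cache, and is then rotated by the wrong positions without any error, because axis 0 is the batch axis rather than the sequence axis.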
