question about RoPE code #227

yukyeongmin · 2023-11-15T13:33:55Z

annotated_deep_learning_paper_implementations/labml_nn/transformers/rope/__init__.py

Line 188 in f42c0e9

    
    x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])

self.cos_cached and self.sin_cached have the same shape as x, don't they?

So if this line is intended to compute RoPE on only part of x, that is x[..., :self.d],
I think it should be:
x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])

Please let me know if I'm wrong.


nagamonish · 2023-11-26T01:50:40Z

You are correct that self.cos_cached and self.sin_cached have the same shape as x.

As for the modification, that is also correct, because it would ensure that the rotary embeddings are applied only to the subset of features specified by self.d.

vpj · 2023-11-26T10:09:28Z

They have similar shapes. Slicing the cached sin/cos to x.shape[0] truncates them to the sequence length, because the sequence length (number of tokens per sample) changes between inputs.
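To illustrate the point about truncating to sequence length, here is a minimal, framework-free sketch. It is not the labml_nn code: build_cache and apply_rope are hypothetical names, and it assumes a single sequence stored as [seq_len][d_model] lists with the cache holding one row of d values per position. Under those assumptions, slicing the cache by the sequence length selects positions, while the split into rotated and pass-through features happens on x itself.

```python
import math

def build_cache(max_len, d, base=10000.0):
    """Return per-position cos and sin rows, each of length d (d must be even)."""
    half = d // 2
    theta = [base ** (-2.0 * i / d) for i in range(half)]
    cos_cached, sin_cached = [], []
    for pos in range(max_len):
        angles = [pos * t for t in theta]
        angles = angles + angles  # duplicate so each row has d entries
        cos_cached.append([math.cos(a) for a in angles])
        sin_cached.append([math.sin(a) for a in angles])
    return cos_cached, sin_cached

def apply_rope(x, cos_cached, sin_cached, d):
    """Rotate the first d features of each token; pass the rest through unchanged."""
    seq_len = len(x)
    # Truncate the cache along the position axis only (the analogue of [:x.shape[0]]).
    cos, sin = cos_cached[:seq_len], sin_cached[:seq_len]
    half = d // 2
    out = []
    for pos, vec in enumerate(x):
        x_rope, x_pass = vec[:d], vec[d:]
        # [-x_{half..d-1}, x_{0..half-1}], the counterpart of neg_half_x.
        neg_half = [-v for v in x_rope[half:]] + x_rope[:half]
        rotated = [a * c + b * s
                   for a, b, c, s in zip(x_rope, neg_half, cos[pos], sin[pos])]
        out.append(rotated + x_pass)
    return out

# Cache built for 8 positions, input has only 3 tokens with d_model = 6 and d = 4.
cos_cached, sin_cached = build_cache(max_len=8, d=4)
x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0] for _ in range(3)]
y = apply_rope(x, cos_cached, sin_cached, d=4)
```

In this sketch the cache rows already have length d, so no feature-axis slice of the cache is needed; only the position axis is cut down to the current input length.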

yukyeongmin · 2023-11-26T11:13:18Z

Thanks for the reply! @vpj @nagamonish

Did you not have any problems running that code? The original code didn't work for me with inputs of a different shape, and I thought it was a syntax issue.
