Question about RoPE code #253

Open
rangehow opened this issue May 11, 2024 · 2 comments

Comments

@rangehow

I found a difference between the RoPE implementation here and the one in the paper, mostly in how the elements are permuted. Does this difference not affect the final result? I'm not sure my reasoning is right, so I'd sincerely appreciate your advice : )

Paper version should be:
[image]

Version in this repo:
[image]

@vpj
Member

vpj commented May 20, 2024

The ordering is different. So it won't affect training from scratch, but you can't load a model that was trained with a different ordering.
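To make the ordering point concrete, here is a minimal pure-Python sketch. It assumes the two variants being compared are the adjacent-pair rotation, which pairs `(x[2i], x[2i+1])`, and the half-split rotation, which pairs `(x[i], x[i + d/2])`. The function names, the base of 10000, and the sample vectors are illustrative and not taken from this repo. It checks that the two variants agree once the input features are permuted consistently, so attention scores come out the same.

```python
import math


def _angles(d, pos, base=10000.0):
    # One rotation angle per feature pair: pos * base^(-2i/d), i = 0 .. d/2 - 1.
    return [pos * base ** (-2.0 * i / d) for i in range(d // 2)]


def rope_interleaved(x, pos, base=10000.0):
    # Assumed "paper-style" layout: rotate adjacent pairs (x[2i], x[2i+1]).
    d = len(x)
    out = [0.0] * d
    for i, a in enumerate(_angles(d, pos, base)):
        c, s = math.cos(a), math.sin(a)
        x1, x2 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x1 * c - x2 * s
        out[2 * i + 1] = x1 * s + x2 * c
    return out


def rope_half_split(x, pos, base=10000.0):
    # Assumed "half-split" layout: rotate pairs (x[i], x[i + d/2]).
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i, a in enumerate(_angles(d, pos, base)):
        c, s = math.cos(a), math.sin(a)
        x1, x2 = x[i], x[i + h]
        out[i] = x1 * c - x2 * s
        out[i + h] = x1 * s + x2 * c
    return out


def to_half_layout(x):
    # Permutation from the interleaved layout to the half-split layout:
    # even-indexed features first, then odd-indexed features.
    return x[0::2] + x[1::2]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical query/key vectors (d = 8) at positions 3 and 7.
q = [0.1, -0.4, 0.7, 0.2, -0.3, 0.9, 0.5, -0.6]
k = [0.8, 0.3, -0.2, 0.6, 0.4, -0.7, 0.1, 0.5]

score_interleaved = dot(rope_interleaved(q, 3), rope_interleaved(k, 7))
score_half_split = dot(
    rope_half_split(to_half_layout(q), 3),
    rope_half_split(to_half_layout(k), 7),
)

# The two scores agree up to floating-point rounding.
print(math.isclose(score_interleaved, score_half_split, rel_tol=1e-9, abs_tol=1e-12))
```

When training from scratch, the query and key projection weights are free to learn either feature layout, so nothing is lost. A checkpoint trained under one layout has the permutation baked into those weights, which is why it can't be loaded into the other variant without reordering the corresponding weight rows.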

@rangehow
Author

Thanks for your answer : ) Is there a reason why the latter implementation is more widely used in code than the former?
