Faster AVX2 matrix multiplications for legacy quants #405

Merged

jart merged 4 commits into Mozilla-Ocho:main from ikawrakow:ik/new_legacy_mul_mat

May 10, 2024

Commits on May 10, 2024

Matrix multiplications for legacy quants

Kawrakow committed May 10, 2024
8f7394f

Very slightly faster Q5 dequantization
```
Somehow memcpy is kind of slow, so to
get 4 bytes from 2-byte-aligned data
it is faster to just OR together two
consecutive 16-bit entries.
```
Kawrakow committed May 10, 2024
897be80

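The Q5 commit message above describes assembling a 32-bit value from two consecutive 16-bit entries instead of calling memcpy. A minimal sketch of that idea follows; it is not the code from this PR. The `demo_block` struct and both loader names are hypothetical, and the high bits are declared as `uint16_t` here so that the 16-bit reads are well defined in standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical block, loosely modeled on a Q5-style layout: a 16-bit scale
// followed by 32 "high bits". Because the scale is only 2 bytes wide, the
// high bits sit at a 2-byte-aligned (not 4-byte-aligned) offset.
typedef struct {
    uint16_t d;      // scale, stored as raw fp16 bits
    uint16_t qh[2];  // 32 high bits, viewed as two 16-bit halves
} demo_block;

static_assert(sizeof(demo_block) == 6, "demo_block is expected to be packed");

// Portable baseline: copy 4 bytes out of possibly unaligned storage.
static inline uint32_t load_u32_memcpy(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

// Alternative from the commit message: OR two consecutive 16-bit entries.
// p[0] supplies the low half and p[1] the high half, which matches the
// memcpy result on little-endian targets such as x86-64.
static inline uint32_t load_u32_or16(const uint16_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 16);
}
```

Whether the OR variant is actually faster depends on the compiler and the surrounding code, so it is worth benchmarking both loaders on the target CPU before preferring one.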
Make it work after rebase
```
However, the way it is currently, we have lost the
zen4-tuned version.
```
Kawrakow committed May 10, 2024
a3bca82

Restore faster AVX512VNNI+AVX512VL performance

Kawrakow committed May 10, 2024
610a3e9