Faster AVX2 matrix multiplications for legacy quants #405

Merged: 4 commits, May 10, 2024

Commits on May 10, 2024

  1. Commit 8f7394f
  2. Very slightly faster Q5 dequantization

    Somehow memcpy is kind of slow here, so to
    get 4 bytes from 2-byte-aligned data
    it is faster to just OR together two
    consecutive 16-bit entries.
    Kawrakow committed May 10, 2024
    Commit 897be80
  3. Make it work after rebase

    However, as it currently stands, the
    Zen 4-tuned version has been lost.
    Kawrakow committed May 10, 2024
    Commit a3bca82
  4. Commit 610a3e9
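
The Q5 dequantization note in commit 2 describes reading 4 bytes from data that is only 2-byte aligned by combining two 16-bit loads instead of calling memcpy. A minimal sketch of that idea follows. It is not the code from this PR: the function names `load32_memcpy`, `load32_or`, and `check_loads` are hypothetical, and the example assumes a little-endian target (as on any AVX2-capable x86 CPU), where the second 16-bit entry supplies the high half of the 32-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Baseline: portable 4-byte load through memcpy. */
static inline uint32_t load32_memcpy(const uint16_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Alternative: OR together two consecutive 16-bit entries.
 * The second entry is shifted into the upper 16 bits, which
 * matches the memcpy result on a little-endian machine. */
static inline uint32_t load32_or(const uint16_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 16);
}

/* Hypothetical self-check: both loads agree on little-endian. */
static inline void check_loads(void) {
    const uint16_t data[2] = {0x1234, 0x5678};
    assert(load32_or(data) == 0x56781234u);
    assert(load32_or(data) == load32_memcpy(data));
}
```

Whether the OR variant is actually faster depends on the compiler and the surrounding code; the commit message reports only a very slight gain, so any such change is worth benchmarking.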