Meow 0.6 candidate functions #82

Open
NoHatCoder opened this issue Aug 12, 2021 · 4 comments

Comments

@NoHatCoder

I wanted to share what I have been working on. It still needs some work, but I have 4 hash functions that I'm reasonably pleased with so far. Check them out: https://github.com/NoHatCoder/Meow-Hash-0.6-Candidate

Not in the code, but I also finally figured out how we might utilize AVX-512 without running out of registers too much on older CPUs. We would run 4 parallel tracks that don't intermingle before finalization. In 128-bit code, for each block of several KiB, we would do one lane at a time, so that we don't have to keep swapping which lane resides in registers. Finalization gets more complicated, so we probably want to fall back to the plain 128-bit version for short input.
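The lane-at-a-time scheduling described above can be sketched roughly as follows. This is only an illustration of the ordering idea, not code from the candidate repository: `toy_absorb` is a placeholder mixer (not AES and not a Meow round), the 4 KiB block size is an assumed value for "several KiB", and all names (`lane_state`, `tracks_init`, `absorb_interleaved`, `absorb_lane_at_a_time`) are hypothetical. Finalization and tail handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES          4                     /* four 128-bit tracks, one per AVX-512 lane */
#define LANE_BYTES     16
#define ROW_BYTES      (LANES * LANE_BYTES)  /* 64 bytes: one 512-bit load */
#define ROWS_PER_BLOCK 64                    /* assumed block size: 64 rows = 4 KiB */

typedef struct { uint64_t lo, hi; } lane_state;  /* stand-in for one 128-bit register */

/* Placeholder mixing step. NOT AES and NOT a Meow round. */
static void toy_absorb(lane_state *s, const uint8_t *p)
{
    uint64_t a, b;
    memcpy(&a, p, 8);
    memcpy(&b, p + 8, 8);
    s->lo = (s->lo ^ a) * 0x9E3779B97F4A7C15ULL;
    s->hi = (s->hi ^ b) * 0xC2B2AE3D27D4EB4FULL;
    s->lo ^= s->hi >> 29;
    s->hi ^= s->lo >> 31;
}

/* Give each track a distinct (arbitrary) starting state. */
static void tracks_init(lane_state st[LANES])
{
    for (int l = 0; l < LANES; l++) {
        st[l].lo = 0x0123456789ABCDEFULL + (uint64_t)l;
        st[l].hi = 0xFEDCBA9876543210ULL - (uint64_t)l;
    }
}

/* Row order: what a 512-bit implementation would do, all four lanes per row. */
static void absorb_interleaved(lane_state st[LANES], const uint8_t *data, size_t len)
{
    assert(len % ROW_BYTES == 0);  /* tail handling omitted in this sketch */
    for (size_t row = 0; row < len / ROW_BYTES; row++)
        for (int l = 0; l < LANES; l++)
            toy_absorb(&st[l], data + row * ROW_BYTES + (size_t)l * LANE_BYTES);
}

/* Block order: within each block, finish one lane before touching the next,
   so only one lane's state has to be live in registers at a time. */
static void absorb_lane_at_a_time(lane_state st[LANES], const uint8_t *data, size_t len)
{
    assert(len % ROW_BYTES == 0);  /* tail handling omitted in this sketch */
    size_t rows = len / ROW_BYTES;
    for (size_t row0 = 0; row0 < rows; row0 += ROWS_PER_BLOCK) {
        size_t end = row0 + ROWS_PER_BLOCK < rows ? row0 + ROWS_PER_BLOCK : rows;
        for (int l = 0; l < LANES; l++) {
            lane_state s = st[l];  /* load this lane's state once per block */
            for (size_t row = row0; row < end; row++)
                toy_absorb(&s, data + row * ROW_BYTES + (size_t)l * LANE_BYTES);
            st[l] = s;             /* store it back before moving to the next lane */
        }
    }
}
```

Because the tracks never interact before finalization, both orders leave the four lane states identical; only the memory access pattern and the number of states that must be live at once differ.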

Poke @cmuratori @petersn

@cmuratori
Owner

Awesome! I will take a look.

Separately, I am curious: the four-parallel-track construction is how I did the original Meow Hash (the one that didn't have enough diffusion). If it can work for AVX-512, why could it not be retained from the original Meow Hash for 128-bit? In general, parallel-stream construction is the best kind of construction for throughput, since AES instructions have 4-cycle latency...

- Casey

@NoHatCoder
Author

I didn't think of this until now. Maybe you considered this construction obvious, but I had assumed that if we did parallel tracks we would run out of registers, and thus add a bunch of overhead to the 128-bit implementation.

@cmuratori
Owner

Well, it's not so much that I considered it obvious as that it was the original design of Meow Hash :) My assumption was that since you didn't use any parallel construction in your blocks for the 128-bit version, your reasoning was that the hash was not as good if it was mixed at the end. But I guess that is not true? If not, that is excellent, because the more parallel tracks you can do, the faster you can go, typically, and that's why I designed the original one that way.

- Casey

@cmuratori
Owner

So, since I was looking at ChaCha20 and AES-256-CTR recently, I also have some important updates: it turns out both Zen2/3 and Tiger Lake added a second AES unit. That means that parallel construction becomes much more important now for speed, because the newer x64 chips can issue two AES instructions every cycle even without VAES!

I need to take a look at what you've got so far @NoHatCoder and I'll think about how it could be arranged.

- Casey
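As a back-of-the-envelope illustration of why a second AES unit raises the stakes for parallel construction: a dependency chain can only issue its next AES instruction once the previous one has finished, so keeping every issue slot full takes roughly latency times issue width independent chains. The helper below is hypothetical and takes the 4-cycle latency mentioned earlier in the thread at face value; actual latencies vary by microarchitecture.

```c
#include <assert.h>

/* Independent dependency chains needed to keep the AES units busy:
   each chain can start one AES op every `latency_cycles`, so filling
   `issues_per_cycle` slots every cycle takes latency * issue width chains. */
static unsigned chains_to_saturate(unsigned latency_cycles, unsigned issues_per_cycle)
{
    assert(latency_cycles > 0 && issues_per_cycle > 0);
    return latency_cycles * issues_per_cycle;
}
```

Under those assumptions, `chains_to_saturate(4, 1)` gives 4 chains for a single AES unit and `chains_to_saturate(4, 2)` gives 8 for two units.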

2 participants