Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

examples/matmul: fix host matrix printing & verification code #1480

Merged
merged 9 commits into from
May 29, 2024

Conversation

andrej
Copy link
Contributor

@andrej andrej commented May 13, 2024

Joe, I just noticed the issues with verification you caught a couple weeks ago still are there. I think I pointed to the fix over e-mail but it looks like it never made it in.

I think it would be good to merge this quickly, as currently printing matrices with non-square sizes could lead to segfaults due to the errors this PR fixes.

@andrej
Copy link
Contributor Author

andrej commented May 13, 2024

This also contains the code for the stochastic verification, but it's not used anywhere. I just copied the entire file from my branch, that's why this is in there too.

@andrej
Copy link
Contributor Author

andrej commented May 13, 2024

And one last note, I reduced the relative tolerance for float comparisons to 0.1 (previously was 0.5). I think 0.5 was way too high, completely wrong results still passed. Somewhere along the line, I think the vectorized matvec kernel started to fail silently because the verification with this huge tolerance let it slip through. So I temporarily swapped in the scalar kernel.

@fifield
Copy link
Collaborator

fifield commented May 14, 2024

I think it would be good to merge this quickly, as currently printing matrices with non-square sizes could lead to segfaults due to the errors this PR fixes.

I had numerous segfaults last time I tried to run the sweep script. Was this the cause?

@andrej
Copy link
Contributor Author

andrej commented May 14, 2024

@fifield I would say this is the likely culprit, yes, since during a sweep there would be many non-square matrix sizes. Likely there still remain other mistakes. I haven't run the sweep in a while and have made some silly mistakes lately...

I also just figured out why the vectorized matrix-vector didn't verify. I'll push the fix for that into this branch as well in a minute.

Edit: I pushed the fix. I still had to crank relative tolerance up to 15% for it to pass, which seems high still. But maybe that's just the precision you get for bf16?

@jgmelber jgmelber enabled auto-merge May 29, 2024 14:13
@jgmelber jgmelber added this pull request to the merge queue May 29, 2024
Merged via the queue into Xilinx:main with commit 61ef658 May 29, 2024
51 checks passed
singagan pushed a commit that referenced this pull request Jun 5, 2024
Co-authored-by: Joseph Melber <jgmelber@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants