examples/matmul: fix host matrix printing & verification code #1480

andrej · 2024-05-13T21:08:24Z

Joe, I just noticed that the verification issues you caught a couple of weeks ago are still there. I think I pointed to the fix over e-mail, but it looks like it never made it in.

I think it would be good to merge this quickly, as currently printing matrices with non-square sizes could lead to segfaults due to the errors this PR fixes.
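For illustration only: the thread does not say what the exact printing bug was. One common way a host-side print routine breaks on non-square sizes is by using the wrong dimension as the row stride of a row-major buffer. The sketch below (in Python, with hypothetical helper names `format_matrix` and `max_index_with_wrong_stride` that are not from this PR) shows the correct stride and how far a wrong stride would reach.

```python
def format_matrix(data, n_rows, n_cols):
    """Format a row-major flat buffer as text, one matrix row per line."""
    if len(data) != n_rows * n_cols:
        raise ValueError("buffer length does not match n_rows * n_cols")
    lines = []
    for i in range(n_rows):
        # The row stride is the number of columns, not the number of rows.
        row = data[i * n_cols:(i + 1) * n_cols]
        lines.append(" ".join(f"{v:6.2f}" for v in row))
    return "\n".join(lines)


def max_index_with_wrong_stride(n_rows, n_cols):
    """Largest flat index touched if n_rows is mistakenly used as the stride.

    This models an assumed bug, not the one actually fixed in this PR.
    """
    return (n_rows - 1) * n_rows + (n_cols - 1)
```

With a square matrix the two strides coincide, so such a mistake would stay hidden. For a 3x2 matrix the wrong stride reaches index 7 in a 6-element buffer, which in C++ host code would be an out-of-bounds read.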


andrej · 2024-05-13T21:09:24Z

This also contains the code for the stochastic verification, but it's not used anywhere. I copied the entire file from my branch, which is why it is in here too.

andrej · 2024-05-13T21:11:53Z

And one last note: I reduced the relative tolerance for float comparisons to 0.1 (previously 0.5). I think 0.5 was way too high; completely wrong results still passed. Somewhere along the line, I think the vectorized matvec kernel started to fail silently because verification with this huge tolerance let it slip through. So I temporarily swapped in the scalar kernel.
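To make the tolerance point concrete, here is a minimal sketch of a relative-tolerance check. It is not the verification code from this PR: `count_mismatches` is a hypothetical helper, and the comparison rule is the one implemented by Python's `math.isclose`, which may differ from the rule used in the example's host code.

```python
import math


def count_mismatches(expected, actual, rel_tol):
    """Count element pairs that differ by more than the relative tolerance.

    math.isclose(e, a, rel_tol=t) passes when
    abs(e - a) <= t * max(abs(e), abs(a)).
    """
    return sum(
        1
        for e, a in zip(expected, actual)
        if not math.isclose(e, a, rel_tol=rel_tol)
    )


expected = [10.0, 20.0]
actual = [14.0, 21.0]  # the first element is off by 40% of the expected value

loose = count_mismatches(expected, actual, rel_tol=0.5)
strict = count_mismatches(expected, actual, rel_tol=0.1)
```

With a tolerance of 0.5, the value 14.0 is accepted in place of 10.0, so `loose` is 0. With 0.1 the same pair is flagged, so `strict` is 1.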

fifield · 2024-05-14T14:02:56Z

> I think it would be good to merge this quickly, as currently printing matrices with non-square sizes could lead to segfaults due to the errors this PR fixes.

I had numerous segfaults last time I tried to run the sweep script. Was this the cause?

andrej · 2024-05-14T19:28:05Z

@fifield I would say this is the likely culprit, yes, since a sweep covers many non-square matrix sizes. There are likely other mistakes remaining. I haven't run the sweep in a while and have made some silly mistakes lately...

I also just figured out why the vectorized matrix-vector kernel didn't verify. I'll push the fix for that into this branch as well in a minute.

Edit: I pushed the fix. I still had to crank the relative tolerance up to 15% for it to pass, which still seems high. But maybe that's just the precision you get for bf16?
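As a rough reference point for the bf16 question: bfloat16 keeps 8 significant bits (7 stored), so rounding a single value to bf16 introduces a relative error of at most 2^-8, about 0.4%. The sketch below emulates that rounding in plain Python. `to_bf16` and `rel_error` are hypothetical helpers, round-to-nearest-even is an assumption about the rounding mode, and NaN, infinity, and overflow are not handled.

```python
import struct


def to_bf16(x):
    """Round a finite value to bfloat16 precision and return it as a float."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that bf16 drops.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rel_error(x):
    """Relative error introduced by rounding x to bfloat16."""
    return abs(to_bf16(x) - x) / abs(x)
```

A single rounding is therefore far below 15%. The error in a matrix-vector product also depends on how many products are accumulated and on the accumulator precision, so this sketch alone does not settle whether 15% is expected.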

Co-authored-by: Joseph Melber <jgmelber@gmail.com>

andrej added 2 commits May 13, 2024 14:01

examples/matmul: fix host matrix printing & verification code

ef301f8

examples/matvec: temporarily switch to scalar mat-vec since it otherwise fails with the fixed verification

ee3a610

andrej requested review from denolf, jgmelber and fifield as code owners May 13, 2024 21:08

fifield approved these changes May 14, 2024


examples/matvec: fix data layout transformation

d249893

jgmelber approved these changes May 14, 2024


jgmelber and others added 6 commits May 14, 2024 22:23

Merge branch 'main' into fix-matmul

2441cb0

matvec: fix formatting

b896b7f

Merge branch 'main' into fix-matmul

a509e4c

Merge branch 'main' into fix-matmul

a21caa2

Merge branch 'main' into fix-matmul

1bafe6c

Merge branch 'main' into fix-matmul

8740727

jgmelber enabled auto-merge May 29, 2024 14:13

jgmelber added this pull request to the merge queue May 29, 2024

Merged via the queue into Xilinx:main with commit 61ef658 May 29, 2024
51 checks passed

singagan pushed a commit that referenced this pull request Jun 5, 2024

examples/matmul: fix host matrix printing & verification code (#1480)

0dc15c8

Co-authored-by: Joseph Melber <jgmelber@gmail.com>
