How do I match the results of profiling with the parameters of the cost model? #131

Open
xvanQ opened this issue Jan 31, 2024 · 1 comment

xvanQ commented Jan 31, 2024

The output of profile bandwidth is as follows:
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s

size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s

Which of these values corresponds to ctog_bdw, which to gtoc_bdw_cache, and which to gtoc_bdw_hidden?

The output of profile matmul is as follows:
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026

device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924

Which of these values correspond to mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, and cpu_flops?
Thanks
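
For anyone comparing these numbers against the cost-model parameters, here is a minimal sketch that parses the profiler output above into structured records and reports the highest measured value per transfer direction and per device. The function names `parse_profile` and `peak_by_key` are hypothetical helpers, not part of the project, and the sketch does not decide which measurement maps to which parameter; that mapping is exactly what this issue asks about.

```python
import re

# Patterns for the two kinds of lines shown in the profiler output above.
BANDWIDTH_RE = re.compile(
    r"size:\s*([\d.]+)\s*MB,\s*(\S+)\s+bandwidth:\s*([\d.]+)\s*GB/s"
)
MATMUL_RE = re.compile(
    r"device:\s*(\w+),\s*N:\s*(\d+),\s*latency:\s*([\d.]+)\s*ms,\s*TFLOPS:\s*([\d.]+)"
)


def parse_profile(text):
    """Hypothetical helper: split profiler text into bandwidth and matmul records."""
    bandwidth, matmul = [], []
    for line in text.splitlines():
        m = BANDWIDTH_RE.search(line)
        if m:
            bandwidth.append({
                "size_mb": float(m.group(1)),
                "direction": m.group(2),
                "gb_per_s": float(m.group(3)),
            })
            continue
        m = MATMUL_RE.search(line)
        if m:
            matmul.append({
                "device": m.group(1),
                "n": int(m.group(2)),
                "latency_ms": float(m.group(3)),
                "tflops": float(m.group(4)),
            })
    return bandwidth, matmul


def peak_by_key(records, key, value):
    """Hypothetical helper: highest `value` seen for each distinct `key`."""
    peaks = {}
    for record in records:
        peaks[record[key]] = max(peaks.get(record[key], 0.0), record[value])
    return peaks


SAMPLE = """\
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
"""

bandwidth_records, matmul_records = parse_profile(SAMPLE)
bandwidth_peaks = peak_by_key(bandwidth_records, "direction", "gb_per_s")
matmul_peaks = peak_by_key(matmul_records, "device", "tflops")
print(bandwidth_peaks)
print(matmul_peaks)
```

Taking the peak is only one possible way to summarize the measurements; an average over the larger transfer sizes might be equally reasonable, depending on what the cost model expects.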

@nustart0720

Have you figured this out? I have the same question.
