[ir] Make statements final, and change IRNode::as to use static_cast #7896

Open
wants to merge 2 commits into
base: master

Conversation

bobcao3 (Collaborator) commented Apr 26, 2023

In theory, and in small-scale experiments with different compilers, this results in fewer calls to dynamic_cast and potentially cheaper dynamic_cast and virtual function calls. However, on Windows the benefits appear to be within run-to-run variance, at least on my machine.
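To illustrate why `final` can help here, the following is a minimal sketch, not the code from this PR. `Node` and `Leaf` are hypothetical stand-ins for the IR classes. Because a `final` class cannot have further derived types, a cast to it only needs an exact type comparison, and a compiler may also devirtualize calls made through it; whether it actually does so depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical hierarchy for illustration only; not the project's real classes.
struct Node {
  virtual ~Node() = default;
  virtual int cost() const { return 1; }
};

// `final` guarantees that no class derives from Leaf.
struct Leaf final : Node {
  int cost() const override { return 2; }
};

// For a final target type, "is a Leaf" is the same as "is exactly a Leaf",
// so an exact typeid comparison gives the same answer as dynamic_cast<Leaf *>.
inline Leaf *as_leaf(Node *n) {
  return (n != nullptr && typeid(*n) == typeid(Leaf))
             ? static_cast<Leaf *>(n)
             : nullptr;
}

// The static type is a final class, so the compiler is allowed to call
// Leaf::cost directly instead of going through the vtable.
inline int leaf_cost(const Leaf &leaf) {
  return leaf.cost();
}
```

This only shows the language-level guarantee; any speedup would still have to be measured, as the description above notes.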


netlify bot commented Apr 26, 2023

Deploy Preview for docsite-preview canceled.

Name Link
🔨 Latest commit 5e0a34e
🔍 Latest deploy log https://app.netlify.com/sites/docsite-preview/deploys/64487de2bce23700080d2e59

bobcao3 (Collaborator, Author) commented Apr 26, 2023

/benchmark

jim19930609 (Contributor) left a comment

Unless there's significant performance gain, I would suggest that we keep the type check TI_ASSERT(is<T>());.

A static_cast to a derived-class pointer performs no runtime check and can easily cause undefined behavior if not used with caution.
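The trade-off in this review comment can be sketched as follows. This is an illustrative example under stated assumptions, not the project's actual `IRNode`: the class layout, the statement types `AllocaStmt` and `ConstStmt`, and their members are hypothetical, and the standard `assert` stands in for the project's own assertion macro.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an IR base class; names are illustrative only.
struct IRNode {
  virtual ~IRNode() = default;

  // Checked query: true if the dynamic type is T or derives from T.
  template <typename T>
  bool is() const {
    return dynamic_cast<const T *>(this) != nullptr;
  }

  // Unchecked downcast, guarded by a type check that exists only in
  // builds where NDEBUG is not defined. Calling this with the wrong T
  // in a build without the check is undefined behavior.
  template <typename T>
  T *as() {
    static_assert(std::is_base_of<IRNode, T>::value,
                  "T must derive from IRNode");
    assert(is<T>());
    return static_cast<T *>(this);
  }

  // Checked downcast: returns nullptr on a type mismatch.
  template <typename T>
  T *cast() {
    return dynamic_cast<T *>(this);
  }
};

struct AllocaStmt final : IRNode {
  int size = 4;
};

struct ConstStmt final : IRNode {
  int value = 7;
};
```

With this shape, callers that are unsure of the type use `cast<T>()` and test for `nullptr`, while `as<T>()` is reserved for call sites where the type is already known.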

taichi-gardener (Contributor)

Benchmark Report

Baseline: v1.5.0
Current: 5e0a34e4222e6da8abc0b07afa51aa8e4e13f61f

Item Baseline Current Change
c_api_library_size:android@arch=x64 0.000 1960320.000 $\textcolor{gray}{\textsf{ 0.00\%}}$
c_api_library_size:linux@arch=x64 0.000 52474888.000 $\textcolor{gray}{\textsf{ 0.00\%}}$
c_api_library_size:wall_time@arch=x64 0.000 0.0177 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=cuda,n=134217728 0.000 826.719 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=cuda,n=16777216 0.000 808.394 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=cuda,n=33554432 0.000 818.861 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=cuda,n=536870912 0.000 818.053 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=vulkan,n=134217728 0.000 688.889 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=vulkan,n=16777216 0.000 694.280 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=vulkan,n=33554432 0.000 708.785 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:GB/s@arch=vulkan,n=536870912 0.000 651.315 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=cuda,n=134217728 0.000 0.605 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=cuda,n=16777216 0.000 0.0773 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=cuda,n=33554432 0.000 0.153 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=cuda,n=536870912 0.000 2.445 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=vulkan,n=134217728 0.000 0.726 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=vulkan,n=16777216 0.000 0.09 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=vulkan,n=33554432 0.000 0.176 $\textcolor{gray}{\textsf{ 0.00\%}}$
fill:wall_time@arch=vulkan,n=536870912 0.000 3.071 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=2,n_grid=128 0.000 2067.345 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=2,n_grid=256 0.000 1210.752 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=2,n_grid=32 0.000 2119.636 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=2,n_grid=64 0.000 2126.721 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=3,n_grid=128 0.000 13.852 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=3,n_grid=256 0.000 0.333 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=3,n_grid=32 0.000 468.748 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=cuda,dim=3,n_grid=64 0.000 92.409 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=2,n_grid=128 0.000 1415.555 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=2,n_grid=256 0.000 953.419 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=2,n_grid=32 0.000 2203.425 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=2,n_grid=64 0.000 2236.451 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=3,n_grid=128 0.000 20.088 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=3,n_grid=256 0.000 0.354 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=3,n_grid=32 0.000 944.886 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:fps@arch=vulkan,dim=3,n_grid=64 0.000 121.122 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=2,n_grid=128 0.000 423392168.593 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=2,n_grid=256 0.000 991848129.001 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=2,n_grid=32 0.000 27131344.642 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=2,n_grid=64 0.000 108888091.572 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=3,n_grid=128 0.000 181556803.801 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=3,n_grid=256 0.000 34924451.290 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=3,n_grid=32 0.000 95999583.015 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=cuda,dim=3,n_grid=64 0.000 151403596.190 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=2,n_grid=128 0.000 289905703.604 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=2,n_grid=256 0.000 781041065.492 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=2,n_grid=32 0.000 28203835.939 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=2,n_grid=64 0.000 114506311.519 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=3,n_grid=128 0.000 263294523.345 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=3,n_grid=256 0.000 37114474.846 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=3,n_grid=32 0.000 193512621.704 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:particles/s@arch=vulkan,dim=3,n_grid=64 0.000 198446251.496 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=2,n_grid=128 0.000 0.484 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=2,n_grid=256 0.000 0.826 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=2,n_grid=32 0.000 0.472 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=2,n_grid=64 0.000 0.47 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=3,n_grid=128 0.000 72.193 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=3,n_grid=256 0.000 3002.412 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=3,n_grid=32 0.000 2.133 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=cuda,dim=3,n_grid=64 0.000 10.821 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=2,n_grid=128 0.000 0.706 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=2,n_grid=256 0.000 1.049 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=2,n_grid=32 0.000 0.454 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=2,n_grid=64 0.000 0.447 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=3,n_grid=128 0.000 49.782 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=3,n_grid=256 0.000 2825.248 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=3,n_grid=32 0.000 1.058 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm:wall_time@arch=vulkan,dim=3,n_grid=64 0.000 8.256 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=2,n_grid=128 0.000 0.0922 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=2,n_grid=256 0.000 0.0921 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=2,n_grid=32 0.000 0.091 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=2,n_grid=64 0.000 0.0925 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=3,n_grid=128 0.000 0.202 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=3,n_grid=256 0.000 0.204 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=3,n_grid=32 0.000 0.201 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=cuda,dim=3,n_grid=64 0.000 0.201 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=2,n_grid=128 0.000 0.0212 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=2,n_grid=256 0.000 0.0266 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=2,n_grid=32 0.000 0.0214 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=2,n_grid=64 0.000 0.0215 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=3,n_grid=128 0.000 0.0367 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=3,n_grid=256 0.000 0.0365 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=3,n_grid=32 0.000 0.0427 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:compile_time@arch=vulkan,dim=3,n_grid=64 0.000 0.0363 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=2,n_grid=128 0.000 92.229 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=2,n_grid=256 0.000 92.075 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=2,n_grid=32 0.000 90.994 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=2,n_grid=64 0.000 92.479 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=3,n_grid=128 0.000 202.352 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=3,n_grid=256 0.000 203.797 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=3,n_grid=32 0.000 201.266 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=cuda,dim=3,n_grid=64 0.000 200.872 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=128 0.000 21.173 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=256 0.000 26.642 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=32 0.000 21.397 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=64 0.000 21.548 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=128 0.000 36.658 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=256 0.000 36.533 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=32 0.000 42.722 $\textcolor{gray}{\textsf{ 0.00\%}}$
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=64 0.000 36.302 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=128,variant=CacheBlock 0.000 0.783 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=128,variant=Naive 0.000 0.782 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=256,variant=CacheBlock 0.000 2.977 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=256,variant=Naive 0.000 2.726 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=262144,variant=CacheBlock 0.000 1047.359 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=262144,variant=Naive 0.000 526.561 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=512,variant=CacheBlock 0.000 11.185 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=cuda,n=512,variant=Naive 0.000 7.711 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=vulkan,n=128,variant=Naive 0.000 0.402 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=vulkan,n=256,variant=Naive 0.000 1.210 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=vulkan,n=262144,variant=Naive 0.000 359.861 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:bips@arch=vulkan,n=512,variant=Naive 0.000 3.180 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=128,variant=CacheBlock 0.000 0.0209 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=128,variant=Naive 0.000 0.021 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=256,variant=CacheBlock 0.000 0.022 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=256,variant=Naive 0.000 0.024 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=262144,variant=CacheBlock 0.000 65.612 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=262144,variant=Naive 0.000 130.506 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=512,variant=CacheBlock 0.000 0.0234 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=cuda,n=512,variant=Naive 0.000 0.034 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=vulkan,n=128,variant=Naive 0.000 0.0408 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=vulkan,n=256,variant=Naive 0.000 0.0542 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=vulkan,n=262144,variant=Naive 0.000 190.961 $\textcolor{gray}{\textsf{ 0.00\%}}$
nbody:wall_time@arch=vulkan,n=512,variant=Naive 0.000 0.0824 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=1024 0.000 753.313 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=2048 0.000 778.279 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=256 0.000 46.529 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=4096 0.000 815.515 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=1024 0.000 721.281 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=2048 0.000 771.020 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=256 0.000 45.686 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=4096 0.000 799.939 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=1024 0.000 514.921 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=2048 0.000 606.787 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=256 0.000 46.340 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=4096 0.000 643.080 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=1024 0.000 740.469 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=2048 0.000 776.373 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=256 0.000 46.580 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=4096 0.000 813.678 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=1024 0.000 730.783 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=2048 0.000 776.433 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=256 0.000 46.218 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=4096 0.000 812.121 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=1024 0.000 776.054 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=2048 0.000 745.159 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=256 0.000 51.987 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=4096 0.000 818.705 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=1024 0.000 570.670 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=2048 0.000 749.802 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=256 0.000 49.972 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=4096 0.000 821.555 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=1024 0.000 444.688 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=2048 0.000 575.649 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=256 0.000 48.452 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=4096 0.000 609.640 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=1024 0.000 644.072 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=2048 0.000 743.627 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=256 0.000 50.121 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=4096 0.000 817.045 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=1024 0.000 629.983 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=2048 0.000 744.279 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=256 0.000 50.977 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=4096 0.000 817.569 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=1024 0.000 125.552 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=2048 0.000 129.713 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=256 0.000 7.755 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=4096 0.000 135.919 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=1024 0.000 15387.322 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=2048 0.000 16448.430 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=256 0.000 974.640 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=4096 0.000 17065.363 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=1024 0.000 21969.967 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=2048 0.000 25889.584 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=256 0.000 1977.163 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=4096 0.000 27438.091 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=1024 0.000 3949.166 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=2048 0.000 4140.654 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=256 0.000 248.428 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=4096 0.000 4339.616 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=1024 0.000 7795.019 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=2048 0.000 8281.950 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=256 0.000 492.988 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=4096 0.000 8662.629 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=1024 0.000 129.342 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=2048 0.000 124.193 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=256 0.000 8.664 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=4096 0.000 136.451 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=1024 0.000 12174.299 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=2048 0.000 15995.769 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=256 0.000 1066.061 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=4096 0.000 17526.507 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=1024 0.000 18973.365 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=2048 0.000 24561.023 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=256 0.000 2067.285 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=4096 0.000 26011.324 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=1024 0.000 3435.049 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=2048 0.000 3966.012 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=256 0.000 267.312 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=4096 0.000 4357.573 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=1024 0.000 6719.819 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=2048 0.000 7938.979 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=256 0.000 543.754 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=4096 0.000 8720.734 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=1024 0.000 0.0167 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=2048 0.000 0.0647 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=256 0.000 0.0169 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=4096 0.000 0.247 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=1024 0.000 0.0174 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=2048 0.000 0.0653 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=256 0.000 0.0172 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=4096 0.000 0.252 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=1024 0.000 0.0244 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=2048 0.000 0.0829 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=256 0.000 0.017 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=4096 0.000 0.313 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=1024 0.000 0.017 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=2048 0.000 0.0648 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=256 0.000 0.0169 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=4096 0.000 0.247 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=1024 0.000 0.0172 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=2048 0.000 0.0648 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=256 0.000 0.017 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=4096 0.000 0.248 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=1024 0.000 0.0162 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=2048 0.000 0.0675 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=256 0.000 0.0151 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=4096 0.000 0.246 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=1024 0.000 0.022 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=2048 0.000 0.0671 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=256 0.000 0.0157 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=4096 0.000 0.245 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=1024 0.000 0.0283 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=2048 0.000 0.0874 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=256 0.000 0.0162 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=4096 0.000 0.33 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=1024 0.000 0.0195 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=2048 0.000 0.0677 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=256 0.000 0.0157 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=4096 0.000 0.246 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=1024 0.000 0.02 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=2048 0.000 0.0676 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=256 0.000 0.0154 $\textcolor{gray}{\textsf{ 0.00\%}}$
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=4096 0.000 0.246 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:GB/s@N=1024,arch=cuda 0.000 282.343 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:GB/s@N=1024,arch=vulkan 0.000 160.411 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:GB/s@N=256,arch=cuda 0.000 25.745 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:GB/s@N=256,arch=vulkan 0.000 10.208 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:GB/s@N=4096,arch=cuda 0.000 360.692 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:GB/s@N=4096,arch=vulkan 0.000 358.051 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:fps@N=1024,arch=cuda 0.000 33657.965 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:fps@N=1024,arch=vulkan 0.000 19122.506 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:fps@N=256,arch=cuda 0.000 49105.421 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:fps@N=256,arch=vulkan 0.000 19470.291 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:fps@N=4096,arch=cuda 0.000 2687.366 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:fps@N=4096,arch=vulkan 0.000 2667.686 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:wall_time@N=1024,arch=cuda 0.000 0.0297 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:wall_time@N=1024,arch=vulkan 0.000 0.0523 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:wall_time@N=256,arch=cuda 0.000 0.0204 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:wall_time@N=256,arch=vulkan 0.000 0.0514 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:wall_time@N=4096,arch=cuda 0.000 0.372 $\textcolor{gray}{\textsf{ 0.00\%}}$
stencil:wall_time@N=4096,arch=vulkan 0.000 0.375 $\textcolor{gray}{\textsf{ 0.00\%}}$

bobcao3 (Collaborator, Author) commented Apr 26, 2023

> Unless there's significant performance gain, I would suggest that we keep the type check TI_ASSERT(is<T>());.
>
> A static_cast to a derived-class pointer performs no runtime check and can easily cause undefined behavior if not used with caution.

The perf gain is minimal, probably because modern CPUs have become very good at prediction. One potential approach is to use the system assert instead of TI_ASSERT, since assert is compiled out in Release builds. This could go in conjunction with shipping both release-mode and debug-mode binaries.
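A small sketch of the difference described above, assuming (as the comment implies) that the project's macro stays active in release builds. `ALWAYS_CHECK` is a hypothetical stand-in for such a macro, and the helper functions exist only for this example. The standard `assert` expands to nothing when `NDEBUG` is defined, so its argument is not even evaluated.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical always-on check, standing in for a project-level assertion
// macro that is kept in release builds.
#define ALWAYS_CHECK(cond) \
  do {                     \
    if (!(cond))           \
      std::abort();        \
  } while (0)

// Counts how many times a check expression is actually evaluated.
inline int &check_count() {
  static int n = 0;
  return n;
}

inline bool counted_check(bool ok) {
  ++check_count();
  return ok;
}

inline int run_checks() {
  check_count() = 0;
  ALWAYS_CHECK(counted_check(true));  // evaluated in every build
  assert(counted_check(true));        // evaluated only when NDEBUG is not defined
  return check_count();
}

// Number of evaluations expected for the current build configuration.
#ifdef NDEBUG
constexpr int kExpectedChecks = 1;
#else
constexpr int kExpectedChecks = 2;
#endif
```

In a debug build both checks run; in a build with `NDEBUG` defined only the always-on check remains, which is what makes the debug-only type check free in release binaries.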

jim19930609 (Contributor)

> > Unless there's significant performance gain, I would suggest that we keep the type check TI_ASSERT(is<T>());.
> > A static_cast to a derived-class pointer performs no runtime check and can easily cause undefined behavior if not used with caution.
>
> The perf gain is minimal, probably because modern CPUs have become very good at prediction. One potential approach is to use the system assert instead of TI_ASSERT, since assert is compiled out in Release builds. This could go in conjunction with shipping both release-mode and debug-mode binaries.

Agreed!

mpm_compile:wall_time@arch=cuda,dim=3,n_grid=32 0.000 201.266  0.00%
mpm_compile:wall_time@arch=cuda,dim=3,n_grid=64 0.000 200.872  0.00%
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=128 0.000 21.173  0.00%
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=256 0.000 26.642  0.00%
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=32 0.000 21.397  0.00%
mpm_compile:wall_time@arch=vulkan,dim=2,n_grid=64 0.000 21.548  0.00%
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=128 0.000 36.658  0.00%
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=256 0.000 36.533  0.00%
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=32 0.000 42.722  0.00%
mpm_compile:wall_time@arch=vulkan,dim=3,n_grid=64 0.000 36.302  0.00%
nbody:bips@arch=cuda,n=128,variant=CacheBlock 0.000 0.783  0.00%
nbody:bips@arch=cuda,n=128,variant=Naive 0.000 0.782  0.00%
nbody:bips@arch=cuda,n=256,variant=CacheBlock 0.000 2.977  0.00%
nbody:bips@arch=cuda,n=256,variant=Naive 0.000 2.726  0.00%
nbody:bips@arch=cuda,n=262144,variant=CacheBlock 0.000 1047.359  0.00%
nbody:bips@arch=cuda,n=262144,variant=Naive 0.000 526.561  0.00%
nbody:bips@arch=cuda,n=512,variant=CacheBlock 0.000 11.185  0.00%
nbody:bips@arch=cuda,n=512,variant=Naive 0.000 7.711  0.00%
nbody:bips@arch=vulkan,n=128,variant=Naive 0.000 0.402  0.00%
nbody:bips@arch=vulkan,n=256,variant=Naive 0.000 1.210  0.00%
nbody:bips@arch=vulkan,n=262144,variant=Naive 0.000 359.861  0.00%
nbody:bips@arch=vulkan,n=512,variant=Naive 0.000 3.180  0.00%
nbody:wall_time@arch=cuda,n=128,variant=CacheBlock 0.000 0.0209  0.00%
nbody:wall_time@arch=cuda,n=128,variant=Naive 0.000 0.021  0.00%
nbody:wall_time@arch=cuda,n=256,variant=CacheBlock 0.000 0.022  0.00%
nbody:wall_time@arch=cuda,n=256,variant=Naive 0.000 0.024  0.00%
nbody:wall_time@arch=cuda,n=262144,variant=CacheBlock 0.000 65.612  0.00%
nbody:wall_time@arch=cuda,n=262144,variant=Naive 0.000 130.506  0.00%
nbody:wall_time@arch=cuda,n=512,variant=CacheBlock 0.000 0.0234  0.00%
nbody:wall_time@arch=cuda,n=512,variant=Naive 0.000 0.034  0.00%
nbody:wall_time@arch=vulkan,n=128,variant=Naive 0.000 0.0408  0.00%
nbody:wall_time@arch=vulkan,n=256,variant=Naive 0.000 0.0542  0.00%
nbody:wall_time@arch=vulkan,n=262144,variant=Naive 0.000 190.961  0.00%
nbody:wall_time@arch=vulkan,n=512,variant=Naive 0.000 0.0824  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=1024 0.000 753.313  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=2048 0.000 778.279  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=256 0.000 46.529  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=1,n=4096 0.000 815.515  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=1024 0.000 721.281  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=2048 0.000 771.020  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=256 0.000 45.686  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=128,n=4096 0.000 799.939  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=1024 0.000 514.921  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=2048 0.000 606.787  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=256 0.000 46.340  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=256,n=4096 0.000 643.080  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=1024 0.000 740.469  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=2048 0.000 776.373  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=256 0.000 46.580  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=32,n=4096 0.000 813.678  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=1024 0.000 730.783  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=2048 0.000 776.433  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=256 0.000 46.218  0.00%
nested-saxpy:gbs@arch=cuda,len_coeff=64,n=4096 0.000 812.121  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=1024 0.000 776.054  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=2048 0.000 745.159  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=256 0.000 51.987  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=1,n=4096 0.000 818.705  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=1024 0.000 570.670  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=2048 0.000 749.802  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=256 0.000 49.972  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=128,n=4096 0.000 821.555  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=1024 0.000 444.688  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=2048 0.000 575.649  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=256 0.000 48.452  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=256,n=4096 0.000 609.640  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=1024 0.000 644.072  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=2048 0.000 743.627  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=256 0.000 50.121  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=32,n=4096 0.000 817.045  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=1024 0.000 629.983  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=2048 0.000 744.279  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=256 0.000 50.977  0.00%
nested-saxpy:gbs@arch=vulkan,len_coeff=64,n=4096 0.000 817.569  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=1024 0.000 125.552  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=2048 0.000 129.713  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=256 0.000 7.755  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=1,n=4096 0.000 135.919  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=1024 0.000 15387.322  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=2048 0.000 16448.430  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=256 0.000 974.640  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=128,n=4096 0.000 17065.363  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=1024 0.000 21969.967  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=2048 0.000 25889.584  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=256 0.000 1977.163  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=256,n=4096 0.000 27438.091  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=1024 0.000 3949.166  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=2048 0.000 4140.654  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=256 0.000 248.428  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=32,n=4096 0.000 4339.616  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=1024 0.000 7795.019  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=2048 0.000 8281.950  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=256 0.000 492.988  0.00%
nested-saxpy:gflops@arch=cuda,len_coeff=64,n=4096 0.000 8662.629  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=1024 0.000 129.342  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=2048 0.000 124.193  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=256 0.000 8.664  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=1,n=4096 0.000 136.451  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=1024 0.000 12174.299  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=2048 0.000 15995.769  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=256 0.000 1066.061  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=128,n=4096 0.000 17526.507  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=1024 0.000 18973.365  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=2048 0.000 24561.023  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=256 0.000 2067.285  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=256,n=4096 0.000 26011.324  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=1024 0.000 3435.049  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=2048 0.000 3966.012  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=256 0.000 267.312  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=32,n=4096 0.000 4357.573  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=1024 0.000 6719.819  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=2048 0.000 7938.979  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=256 0.000 543.754  0.00%
nested-saxpy:gflops@arch=vulkan,len_coeff=64,n=4096 0.000 8720.734  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=1024 0.000 0.0167  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=2048 0.000 0.0647  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=256 0.000 0.0169  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=1,n=4096 0.000 0.247  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=1024 0.000 0.0174  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=2048 0.000 0.0653  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=256 0.000 0.0172  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=128,n=4096 0.000 0.252  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=1024 0.000 0.0244  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=2048 0.000 0.0829  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=256 0.000 0.017  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=256,n=4096 0.000 0.313  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=1024 0.000 0.017  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=2048 0.000 0.0648  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=256 0.000 0.0169  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=32,n=4096 0.000 0.247  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=1024 0.000 0.0172  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=2048 0.000 0.0648  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=256 0.000 0.017  0.00%
nested-saxpy:wall_time@arch=cuda,len_coeff=64,n=4096 0.000 0.248  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=1024 0.000 0.0162  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=2048 0.000 0.0675  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=256 0.000 0.0151  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=1,n=4096 0.000 0.246  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=1024 0.000 0.022  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=2048 0.000 0.0671  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=256 0.000 0.0157  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=128,n=4096 0.000 0.245  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=1024 0.000 0.0283  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=2048 0.000 0.0874  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=256 0.000 0.0162  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=256,n=4096 0.000 0.33  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=1024 0.000 0.0195  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=2048 0.000 0.0677  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=256 0.000 0.0157  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=32,n=4096 0.000 0.246  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=1024 0.000 0.02  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=2048 0.000 0.0676  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=256 0.000 0.0154  0.00%
nested-saxpy:wall_time@arch=vulkan,len_coeff=64,n=4096 0.000 0.246  0.00%
stencil:GB/s@N=1024,arch=cuda 0.000 282.343  0.00%
stencil:GB/s@N=1024,arch=vulkan 0.000 160.411  0.00%
stencil:GB/s@N=256,arch=cuda 0.000 25.745  0.00%
stencil:GB/s@N=256,arch=vulkan 0.000 10.208  0.00%
stencil:GB/s@N=4096,arch=cuda 0.000 360.692  0.00%
stencil:GB/s@N=4096,arch=vulkan 0.000 358.051  0.00%
stencil:fps@N=1024,arch=cuda 0.000 33657.965  0.00%
stencil:fps@N=1024,arch=vulkan 0.000 19122.506  0.00%
stencil:fps@N=256,arch=cuda 0.000 49105.421  0.00%
stencil:fps@N=256,arch=vulkan 0.000 19470.291  0.00%
stencil:fps@N=4096,arch=cuda 0.000 2687.366  0.00%
stencil:fps@N=4096,arch=vulkan 0.000 2667.686  0.00%
stencil:wall_time@N=1024,arch=cuda 0.000 0.0297  0.00%
stencil:wall_time@N=1024,arch=vulkan 0.000 0.0523  0.00%
stencil:wall_time@N=256,arch=cuda 0.000 0.0204  0.00%
stencil:wall_time@N=256,arch=vulkan 0.000 0.0514  0.00%
stencil:wall_time@N=4096,arch=cuda 0.000 0.372  0.00%
stencil:wall_time@N=4096,arch=vulkan 0.000 0.375  0.00%

A diff of 0% on every item seems pretty weird, and every baseline value in the report is 0.000. Is there anything wrong with our baseline? @feisuzhu
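
For context on why a 0.000 baseline makes the Change column uninformative: a relative change divides by the baseline, so it is undefined when the baseline is zero. How the report bot actually computes this column is not shown in this thread. The sketch below is only an illustration with hypothetical helper names (`relative_change`, `format_change`), showing one way a report could flag a missing baseline instead of printing 0.00%.

```python
import math


def relative_change(baseline: float, current: float) -> float:
    """Return (current - baseline) / baseline, or NaN when the baseline is zero."""
    if baseline == 0.0:
        # No meaningful percentage exists against a zero (or missing) baseline.
        return math.nan
    return (current - baseline) / baseline


def format_change(baseline: float, current: float) -> str:
    """Format the change as a signed percentage, or "n/a" if it is undefined."""
    change = relative_change(baseline, current)
    if math.isnan(change):
        return "n/a"
    return f"{change * 100:+.2f}%"


# Values taken from the first mpm row of the report above.
print(format_change(0.000, 953.419))
```

With a report formatted this way, rows whose baseline was never recorded would stand out as "n/a" rather than looking like a measured 0% difference.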
