This is an extension of the main Turbine refactoring work: #1931. To enable future performance-related work, we should recreate the 1.0 benchmarking mode from `vicuna.py`:
**Enablement**
- [ ] Allow llama2 to be run with a single prompt using a CLI script (@raikonenfnu)
- [ ] Port the benchmarking/statistics options from `vicuna.py` (e.g., setting the prompts, generating exactly K output tokens, running multiple iterations and reporting the averages, etc.)
- [ ] Add a README with benchmarking instructions
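The ported benchmarking mode could look roughly like the sketch below. This is a hypothetical outline, not the Turbine API: `generate` stands in for whatever llama2 entry point the CLI script ends up exposing, taking a prompt and a token budget and returning the generated tokens.

```python
import time

def benchmark(generate, prompt, num_tokens=128, iterations=3):
    """Run `generate` several times, forcing exactly `num_tokens` output
    tokens per run, and report the average latency and tokens/second.

    `generate` is a hypothetical callable (prompt, max_tokens) -> list of
    tokens; substitute the real Turbine/llama2 entry point.
    """
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        tokens = generate(prompt, num_tokens)
        elapsed = time.perf_counter() - start
        # Benchmark mode must not stop early at EOS, or runs are not comparable.
        assert len(tokens) == num_tokens
        times.append(elapsed)
    avg = sum(times) / len(times)
    return {"avg_seconds": avg, "tokens_per_second": num_tokens / avg}
```

The key properties to preserve from `vicuna.py` are the fixed output length (so runs are comparable) and averaging over multiple iterations (to smooth out warm-up and jitter).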
**Correctness**
- [ ] Make sure the output is human-readable with 7b/13b/70b on the targets of interest (gfx9, gfx11, and others)
**Performance**
- [ ] Add ukernel for argmax for ROCm GFX9 (@raikonenfnu)
- [ ] Add ukernel for argmax for ROCm GFX11 (@raikonenfnu)
- [ ] Add kernel for argmax for SPIR-V/Vulkan (@qedawkins)
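For context on the argmax items: the op picks the index of the largest logit for each decode step (greedy next-token selection), and a GPU ukernel typically computes it as a blocked (max, index) reduction rather than a single linear scan. A small host-side sketch of that reduction structure, useful as a correctness oracle when testing the kernels (`block` here mimics a workgroup's tile; the real tile sizes are a kernel-tuning detail):

```python
import numpy as np

def blocked_argmax(logits, block=256):
    """Mimic the two-level reduction a GPU argmax kernel performs:
    each block is reduced to a local (max value, index) pair, then the
    per-block winners are reduced to the global argmax. Ties resolve to
    the first occurrence, matching np.argmax."""
    best_val, best_idx = -np.inf, -1
    for start in range(0, len(logits), block):
        chunk = logits[start:start + block]
        local = int(np.argmax(chunk))          # per-block reduction
        if chunk[local] > best_val:            # cross-block reduction
            best_val, best_idx = chunk[local], start + local
    return best_idx
```

Because each block reduces independently, the per-block step maps naturally onto a workgroup/wavefront, which is what makes a dedicated ukernel worthwhile for large vocabularies.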