This is an extension of the main Turbine refactoring work: #1931. To enable future performance-related work, we should recreate the 1.0 benchmarking mode from `vicuna.py`:
**Enablement**
- [ ] Allow llama2 to be run with a single prompt using a CLI script (@raikonenfnu)
- [ ] Port the benchmarking/statistics options from `vicuna.py` (e.g., setting the prompts, generating exactly K output tokens, running multiple iterations and reporting the averages, etc.)
- [ ] Add a README with benchmarking instructions
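The ported benchmarking mode could look roughly like the sketch below. This is a hypothetical outline, not the Turbine API: `generate` stands in for whatever llama2 entry point the CLI script ends up exposing, taking a prompt and a token budget and returning the generated tokens.

```python
import time

def benchmark(generate, prompt, num_tokens=128, iterations=3):
    """Run `generate` several times, forcing exactly `num_tokens` output
    tokens per run, and report the average latency and tokens/second.

    `generate` is a hypothetical callable (prompt, max_tokens) -> list of
    tokens; substitute the real Turbine/llama2 entry point.
    """
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        tokens = generate(prompt, num_tokens)
        elapsed = time.perf_counter() - start
        # Benchmark mode must not stop early at EOS, or runs are not comparable.
        assert len(tokens) == num_tokens
        times.append(elapsed)
    avg = sum(times) / len(times)
    return {"avg_seconds": avg, "tokens_per_second": num_tokens / avg}
```

The key properties to preserve from `vicuna.py` are the fixed output length (so runs are comparable) and averaging over multiple iterations (to smooth out warm-up and jitter).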
**Correctness**
- [ ] Make sure the output is human-readable with 7b/13b/70b on the targets of interest (gfx9, gfx11, and others)
**Performance**
- [ ] Add ukernel for argmax for ROCm GFX9 (@raikonenfnu)
- [ ] Add ukernel for argmax for ROCm GFX11 (@raikonenfnu)
- [ ] Add kernel for argmax for SPIR-V/Vulkan (@qedawkins)
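For context on the argmax items: the op picks the index of the largest logit for each decode step (greedy next-token selection), and a GPU ukernel typically computes it as a blocked (max, index) reduction rather than a single linear scan. A small host-side sketch of that reduction structure, useful as a correctness oracle when testing the kernels (`block` here mimics a workgroup's tile; the real tile sizes are a kernel-tuning detail):

```python
import numpy as np

def blocked_argmax(logits, block=256):
    """Mimic the two-level reduction a GPU argmax kernel performs:
    each block is reduced to a local (max value, index) pair, then the
    per-block winners are reduced to the global argmax. Ties resolve to
    the first occurrence, matching np.argmax."""
    best_val, best_idx = -np.inf, -1
    for start in range(0, len(logits), block):
        chunk = logits[start:start + block]
        local = int(np.argmax(chunk))          # per-block reduction
        if chunk[local] > best_val:            # cross-block reduction
            best_val, best_idx = chunk[local], start + local
    return best_idx
```

Because each block reduces independently, the per-block step maps naturally onto a workgroup/wavefront, which is what makes a dedicated ukernel worthwhile for large vocabularies.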