let application pass session options to runtime, allow float16 for llm kv-cache #631

guschmue · 2024-03-07T01:13:30Z

This sits on top of #596.

This PR introduces two changes:

Enhanced Session Options:

Applications can now specify their own execution provider. This matters when the execution provider is not a plain string but a dictionary carrying parameters, as can be the case for WebGPU or WebNN.
For WebGPU, ort-web now lets session options set preferredOutputLocation['outputName'] to 'gpu-buffer'. Outputs then stay on the GPU and can be passed by reference instead of being copied, which gives a substantial performance improvement for the LLM KV cache.
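
As a rough illustration of the two points above, the sketch below builds a session options object that selects an execution provider and keeps selected outputs on the GPU. The helper `buildSessionOptions` and the output names `present.0.key` / `present.0.value` are placeholders invented for this example, not names taken from this PR; the real names depend on the exported model.

```javascript
// Hypothetical helper (not part of this PR): builds an ort-web session options
// object that keeps the given outputs on the GPU.
function buildSessionOptions(executionProvider, gpuOutputNames) {
  const preferredOutputLocation = {};
  for (const name of gpuOutputNames) {
    // 'gpu-buffer' asks ort-web to leave this output in GPU memory.
    preferredOutputLocation[name] = 'gpu-buffer';
  }
  return {
    // Either a plain string or an object with parameters.
    executionProviders: [executionProvider],
    preferredOutputLocation,
  };
}

// Placeholder KV-cache output names; check the actual model for the real ones.
const kvOutputs = ['present.0.key', 'present.0.value'];

// Plain-string execution provider.
const webgpuOptions = buildSessionOptions('webgpu', kvOutputs);

// Execution provider given as an object with parameters. No GPU output
// location is requested here, since the text above describes it for WebGPU.
const webnnOptions = buildSessionOptions({ name: 'webnn', deviceType: 'gpu' }, []);

console.log(JSON.stringify(webgpuOptions, null, 2));
console.log(JSON.stringify(webnnOptions, null, 2));

// With onnxruntime-web loaded as `ort`, such an object would be passed as the
// second argument when creating a session, for example:
//   const session = await ort.InferenceSession.create(modelUrl, webgpuOptions);
```

How transformers.js forwards this object to the runtime is not shown here, since the exact option name is not stated in this description.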

Support for External Data and Precision Specification:

ort-web now supports setting externalData within the session options, allowing the use of ONNX external data files.
The model configuration file has been updated to allow precision specification. While the default remains float32, users can opt for float16 to leverage the capabilities of WebGPU and WebNN.
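
A minimal sketch of how an application might combine these two settings, under stated assumptions: the config key `precision`, the helper `emptyKvCacheStorage`, the file name `model.onnx_data`, and the URL are all hypothetical and chosen only for illustration. The sketch assumes float16 tensor data is held as raw 16-bit words in a `Uint16Array`, which is how ort-web has commonly represented it.

```javascript
// Hypothetical model config; the key name `precision` is an assumption.
const modelConfig = { precision: 'float16' };

// Hypothetical helper: picks the tensor type and backing array for an empty
// KV cache based on the configured precision (float32 is the default).
function emptyKvCacheStorage(config, numElements) {
  if (config.precision === 'float16') {
    // Assumption: float16 values are stored as raw 16-bit words.
    return { type: 'float16', data: new Uint16Array(numElements) };
  }
  return { type: 'float32', data: new Float32Array(numElements) };
}

// Session options pointing ort-web at an ONNX external data file.
// `path` is the name the model refers to; `data` is where to fetch it from.
// Both values below are placeholders.
const externalDataOptions = {
  externalData: [
    { path: 'model.onnx_data', data: 'https://example.com/models/model.onnx_data' },
  ],
};

const fp16Cache = emptyKvCacheStorage(modelConfig, 0);
const fp32Cache = emptyKvCacheStorage({}, 0);

console.log(fp16Cache.type, fp32Cache.type);

// With onnxruntime-web loaded as `ort`, the storage could back a tensor such as:
//   new ort.Tensor(fp16Cache.type, fp16Cache.data, [1, numHeads, 0, headDim]);
```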

We know the documentation for these options is currently lacking and are working on it.
In the long term, transformers.js itself is expected to use these new features, but that integration is more involved. Until then, applications can configure these settings themselves as needed.

guschmue added 3 commits March 6, 2024 16:32

let application pass session options to the runtime

9e461b6

allow float16 for llm kv-cache

ed1ba9b

merge main

e800d37
