let application pass session options to runtime, allow float16 for llm kv-cache #631

Open: wants to merge 3 commits into base: main

@guschmue commented Mar 7, 2024

This sits on top of #596.

This PR introduces two changes:

Enhanced Session Options:

Applications can now pass their own session options, including the execution provider, to the runtime. This is essential when the execution provider is not a plain string but an object carrying parameters for WebGPU or WebNN.
For WebGPU, ort-web has been updated so that session options can set preferredOutputLocation['outputName'] to 'gpu-buffer'. The named output then stays on the GPU and can be passed by reference to the next run without being copied back to the CPU. This is particularly beneficial for the LLM kv-cache, whose outputs from one decoding step become the inputs of the next, and it yields substantial performance improvements.
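A minimal sketch of the options an application might now hand to the runtime, assuming the option names onnxruntime-web uses in `InferenceSession.SessionOptions` (`executionProviders`, `preferredOutputLocation`). To keep it self-contained, the types are local stand-ins rather than imports from onnxruntime-web. The output names `present.N.key`/`present.N.value` and input names `past_key_values.N.key`/`past_key_values.N.value` are hypothetical, following a common export convention for decoder models; check the names in your own model.

```typescript
// Local stand-ins for a subset of onnxruntime-web's session option types (assumption).
type DataLocation = "cpu" | "gpu-buffer";

interface ExecutionProviderOption {
  name: string;
  [param: string]: unknown; // provider-specific parameters, e.g. WebNN deviceType
}

interface SessionOptionsSketch {
  executionProviders: (string | ExecutionProviderOption)[];
  preferredOutputLocation?: Record<string, DataLocation>;
}

// Hypothetical kv-cache naming convention; real models may differ.
const presentName = (layer: number, kind: "key" | "value") => `present.${layer}.${kind}`;
const pastName = (layer: number, kind: "key" | "value") => `past_key_values.${layer}.${kind}`;

// Keep every kv-cache output on the GPU; other outputs (e.g. logits) keep the default location.
function buildSessionOptions(
  ep: string | ExecutionProviderOption,
  numLayers: number,
): SessionOptionsSketch {
  const preferredOutputLocation: Record<string, DataLocation> = {};
  for (let layer = 0; layer < numLayers; layer++) {
    for (const kind of ["key", "value"] as const) {
      preferredOutputLocation[presentName(layer, kind)] = "gpu-buffer";
    }
  }
  return { executionProviders: [ep], preferredOutputLocation };
}

// After a decoding step, map present.* outputs to past_key_values.* inputs.
// Only references move here; no tensor data is copied.
function nextPastFeeds<T>(outputs: Record<string, T>, numLayers: number): Record<string, T> {
  const feeds: Record<string, T> = {};
  for (let layer = 0; layer < numLayers; layer++) {
    for (const kind of ["key", "value"] as const) {
      const tensor = outputs[presentName(layer, kind)];
      if (tensor === undefined) {
        throw new Error(`missing output ${presentName(layer, kind)}`);
      }
      feeds[pastName(layer, kind)] = tensor;
    }
  }
  return feeds;
}

// Usage: a plain provider string, or an object with provider parameters.
const webgpuOptions = buildSessionOptions("webgpu", 2);
const webnnOptions = buildSessionOptions({ name: "webnn", deviceType: "gpu" }, 2);
```

Because GPU-resident outputs are owned by the application, the past tensors from the previous step would typically need to be released (in onnxruntime-web, via the tensor's `dispose()` method) once they have been replaced.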

Support for External Data and Precision Specification:

ort-web now accepts externalData in the session options, so models whose weights are stored in ONNX external data files can be loaded.
The model configuration file can now specify the precision of the LLM kv-cache. The default remains float32, but users can opt for float16, which halves the cache's memory footprint and takes advantage of float16 support in WebGPU and WebNN.
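A sketch of what these two options involve, under stated assumptions. The `KvDtype` knob and the `DecoderConfig` fields (`numLayers`, `numKvHeads`, `headDim`) are hypothetical and do not reflect the actual keys in the model configuration file. The external data entry assumes the `{ path, data }` shape onnxruntime-web documents for `externalData`, where `path` must match the file name recorded inside the `.onnx` model and `data` is either the bytes or a URL. Float16 tensor data is held as raw half-precision bits in a `Uint16Array`.

```typescript
// Hypothetical precision knob; the real config key may differ.
type KvDtype = "float32" | "float16";

// Assumed subset of a decoder model's config; field names are illustrative.
interface DecoderConfig {
  numLayers: number;
  numKvHeads: number;
  headDim: number;
}

const BYTES_PER_ELEMENT: Record<KvDtype, number> = { float32: 4, float16: 2 };

// Total kv-cache size in bytes (keys and values, all layers).
function kvCacheBytes(cfg: DecoderConfig, dtype: KvDtype, batch: number, seqLen: number): number {
  const perTensor = batch * cfg.numKvHeads * seqLen * cfg.headDim;
  return 2 * cfg.numLayers * perTensor * BYTES_PER_ELEMENT[dtype];
}

// Zero-filled backing data for one past key or value tensor.
// float16 is stored as raw bits in a Uint16Array; all-zero bits encode +0.0.
function emptyPast(cfg: DecoderConfig, dtype: KvDtype, batch: number, seqLen: number) {
  const dims = [batch, cfg.numKvHeads, seqLen, cfg.headDim];
  const size = dims.reduce((a, b) => a * b, 1);
  const data = dtype === "float16" ? new Uint16Array(size) : new Float32Array(size);
  return { type: dtype, data, dims };
}

// Assumed externalData entry shape: `path` as recorded in the .onnx file,
// `data` as raw bytes or a URL for the runtime to fetch.
interface ExternalDataFile {
  path: string;
  data: Uint8Array | string;
}

// Point the external data file at a URL next to the model file.
function externalDataFor(modelUrl: string, dataFileName: string): ExternalDataFile[] {
  const base = modelUrl.slice(0, modelUrl.lastIndexOf("/") + 1);
  return [{ path: dataFileName, data: base + dataFileName }];
}
```

Since the cache grows linearly with sequence length, halving the bytes per element matters most for long contexts, where the kv-cache can dominate GPU memory.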

Documentation for these options is currently lacking, and we are working on it.
In the long term, transformers.js is expected to use these features directly, but that integration is more involved. In the meantime, applications can configure these settings themselves.
