let application pass session options to runtime, allow float16 for llm kv-cache #631
+61
−73
This sits on top of #596.
This PR introduces two changes:
Enhanced Session Options:
Applications can now specify their own execution provider. This is essential when the execution provider is given not as a plain string but as a dictionary carrying parameters for WebGPU or WebNN.
For WebGPU, ort-web has been updated so that session options can set preferredOutputLocation['outputName'] to 'gpu-buffer'. Those outputs then remain on the GPU and can be passed by reference to the next run without being copied. This is particularly beneficial for the LLM KV cache, where it yields a substantial performance improvement.
Support for External Data and Precision Specification:
ort-web now supports setting externalData within the session options, allowing the use of ONNX external data files.
The model configuration file has been updated to allow precision specification. While the default remains float32, users can opt for float16 to leverage the capabilities of WebGPU and WebNN.
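As a rough illustration of the session options described above, an application might assemble them along these lines. This is a minimal sketch, not the code from this PR: the helper `buildSessionOptions`, the local type names, the KV-cache output names `present.{i}.key` / `present.{i}.value`, and the file name `model.onnx_data` are all assumptions made for the example. Only the option keys `executionProviders`, `preferredOutputLocation`, and `externalData` are meant to mirror ort-web's session options.

```typescript
// Local stand-ins for the relevant ort-web option shapes, so the sketch
// runs without the library installed. These types are assumptions.
type DataLocation = 'cpu' | 'gpu-buffer';
type ExecutionProvider = string | { name: string; [option: string]: unknown };

interface ExternalDataEntry {
  path: string; // file name as referenced inside the ONNX model
  data: string; // URL the application fetches the data from
}

interface SessionOptionsSketch {
  executionProviders: ExecutionProvider[];
  preferredOutputLocation: Record<string, DataLocation>;
  externalData?: ExternalDataEntry[];
}

// Hypothetical helper: ask the runtime to keep every KV-cache output on the GPU.
// The output naming scheme `present.{i}.key|value` is assumed, not taken from the PR.
function buildSessionOptions(
  numLayers: number,
  provider: ExecutionProvider,
  externalData?: ExternalDataEntry,
): SessionOptionsSketch {
  const preferredOutputLocation: Record<string, DataLocation> = {};
  for (let i = 0; i < numLayers; i++) {
    preferredOutputLocation[`present.${i}.key`] = 'gpu-buffer';
    preferredOutputLocation[`present.${i}.value`] = 'gpu-buffer';
  }

  const options: SessionOptionsSketch = {
    executionProviders: [provider],
    preferredOutputLocation,
  };
  if (externalData !== undefined) {
    options.externalData = [externalData];
  }
  return options;
}

// Example: a two-layer model on WebGPU with an external data file.
const sessionOptions = buildSessionOptions(2, 'webgpu', {
  path: 'model.onnx_data',
  data: './model.onnx_data',
});
```

The resulting object would then be handed to the runtime when the session is created. A dictionary-style provider such as `{ name: 'webnn', deviceType: 'gpu' }` can be passed in place of the `'webgpu'` string.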
We acknowledge that the documentation for these options is currently lacking, and efforts are underway to address this.
In the long term, transformers.js is expected to use these features itself, but that integration is a more complex undertaking. In the interim, applications can configure these settings as needed.