LFX Mentorship (Jun-Aug, 2024): Support piper as a new backend of the WASI-NN WasmEdge plugin #3381
This project looks very interesting, and I really want to attend it! I have rich experience in Rust/Wasm programming and deep learning, so I believe I am a good fit for this task.
Hi @hydai,
If you are interested in this project and would like to apply for it, please ensure you can build the piper framework and run the sample applications. Since the whole project is to integrate piper as one of the WASI-NN backends, the most important part is to understand the piper workflow.
I have built Piper and run several applications on it; everything is working fine so far. Now, I want to begin working on the project and explore more about Piper as per the project's requirements. Has any previous work been done on this project, or do we have to start from scratch? Additionally, if we need to start from scratch, could you please provide some similar references?
None for the Piper integration. But there are lots of different backends for the WASI-NN plugin. You can see the appendix section.
As with the previous question, start from scratch for the Piper part. There are existing WASI-NN implementations for other backends.
@hydai
Hey, @Raunak2024
@hydai Can someone please guide me on this? Also, I am currently trying to understand the WASI-NN backend implementation for PyTorch. Is there any developer documentation or other resource available that can help me understand the implementation?
If you are still facing this, feel free to reach out to me on Discord. I ran into something similar when setting up the RAG server. Discord ID: angadsinghh
Okay! But I was referring to the official site of WasmEdge.
Check out the official repo of WebAssembly once |
As far as I know, Piper seems to use ONNX models and onnxruntime. I think instead of making Piper a new WASI-NN backend, we should support ONNX. There are some relationships between backends and model file formats:
- OpenVINO - .bin and .xml
- ONNX - .onnx
- PyTorch - .pt
- TensorFlowLite - .tflite
- GGML - .gguf
I think Piper sits at a higher level. Therefore, for Piper to work in WasmEdge, it should have some libraries that use the ONNX backend for synthesis (phoneme ids to audio), while other processing (like phonemization) runs in wasm (the non-system-level part). Please let me know what you think.
Regarding ONNX, WasmEdge seems to have supported it before, but it was removed for some reason. Related links:
Anyway, I've modified the code in the master branch and successfully built WasmEdge with ONNX support using onnxruntime. It can run F32 models successfully. However, piper uses models with an I64 tensor type, while WasmEdge currently understands only F16, F32, U8, and I32. We will need I64 support for Piper to work (see plugin/wasi_nn/types.h).
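Since WASI-NN passes tensors as raw bytes, I64 support is largely a question of an agreed byte layout. Here is a minimal sketch (the ids and helper name are made up for illustration; only the little-endian i64 packing is the point) of how phoneme ids could be serialized for a set_input call:

```rust
// Hypothetical helper: serialize I64 phoneme ids into the raw byte
// buffer a WASI-NN set_input call expects. The ids below are made up;
// the byte layout (little-endian i64, 8 bytes per id) is the point.
fn phoneme_ids_to_bytes(ids: &[i64]) -> Vec<u8> {
    ids.iter().flat_map(|id| id.to_le_bytes()).collect()
}

fn main() {
    let ids: Vec<i64> = vec![1, 14, 29, 2]; // made-up phoneme ids
    let bytes = phoneme_ids_to_bytes(&ids);
    assert_eq!(bytes.len(), ids.len() * 8);
    println!("{} ids -> {} bytes", ids.len(), bytes.len());
}
```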
wasmedge-wasi-nn has more tensor types (see https://github.com/second-state/wasmedge-wasi-nn/blob/ggml/rust/src/tensor.rs).
The wasi-nn proposal also has BF16 (see https://github.com/WebAssembly/wasi-nn/blob/main/wit/wasi-nn.wit).
The problem here is that they use different values for the tensor type enum, which may cause problems if we update the enum in WasmEdge. However, most backends return WASINN::ErrNo::InvalidArgument when given a tensor type other than F32. The example at https://github.com/second-state/WasmEdge-WASINN-examples uses U8 for GGML, but the GGML backend doesn't seem to check tensor type values anyway. The value of the F32 enum is unchanged, so the impact is likely to be small.
Is there a specific version of the wasi-nn proposal that WasmEdge currently supports or intends to support?
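To make the mismatch concrete, here is a hedged sketch in Rust. The discriminant values are assumptions for illustration (F16..I32 as 0..3 on the WasmEdge side, and a wit-style ordering FP16, FP32, FP64, BF16, U8, I32, I64), not copied from either repository:

```rust
// Illustrative only: assumed discriminants, not the actual definitions
// from plugin/wasi_nn/types.h or wasi-nn.wit.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum WasmEdgeTensorType { F16 = 0, F32 = 1, U8 = 2, I32 = 3 }

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum WitTensorType { Fp16 = 0, Fp32 = 1, Fp64 = 2, Bf16 = 3, U8 = 4, I32 = 5, I64 = 6 }

fn main() {
    // F32 keeps the same value under both orderings, so backends that
    // only accept F32 would be unaffected by adopting the wit ordering.
    assert_eq!(WasmEdgeTensorType::F32 as u32, WitTensorType::Fp32 as u32);
    // U8 and I32 shift, which is the compatibility concern raised above.
    assert_ne!(WasmEdgeTensorType::U8 as u32, WitTensorType::U8 as u32);
    assert_ne!(WasmEdgeTensorType::I32 as u32, WitTensorType::I32 as u32);
}
```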
@hydai
Why not both? ONNX could be one of the WASI-NN backends, and so could Piper.
Since nobody will maintain it, this backend is currently not enabled on the master branch.
Because WASI-NN is migrating to the Component Model proposal; however, it makes no sense to wait for it to ship in the preview 2 proposals. We are forking wasmedge-wasi-nn as the SDK we mentioned in the appendix section and using it for now.
The WASI-NN API is very powerful. Since it accepts bytes (tensors) and outputs bytes (tensors), almost any function can be wrapped around it. I think it would be abusing the API if we add too much stuff to it.

One of the benefits of performing machine learning inference in Wasm is its sandbox security. If there are too many backends, the security will depend heavily on the implementation of the backends. If we really need Piper, would it be better to make it just another plugin (such as "wasmedge-piper-tts")?

When I see something claiming to be WASI-NN, I expect it to follow the WASI-NN proposal. The WASI-NN Component Model currently does not have a graph-encoding (backend) enum value for Piper. Also, Piper does not look like a graph IR, but a very specific TTS system to me. Do you think Piper will enter the WASI-NN list in the future?

If I understand correctly, WasmEdge currently does not support multiple WASI-NN backends at the same time (this can probably be changed). One possible use case for multiple backends is to use GGML to generate an LLM response and pass it to Piper to synthesize audio. If Piper is in another plugin, there will be no problem.

To summarize, the benefits of making Piper another plugin outside of WASI-NN are keeping WASI-NN aligned with its proposal, limiting how much backend code the sandbox's security depends on, and sidestepping the multiple-backend limitation. Implementing a completely new plugin can be more difficult, though.
The WASI-NN API should be a general API. I think all ML/AI-related frameworks should be among its backends if we can do this. Otherwise, there will be WASI-TTS, WASI-LLM, WASI-ObjectDetection, WASI-SpeechToText, and more. I don't think we will really need so many different specs in the future. Also, if there are too many plugins, the security will likewise depend heavily on their implementation. Backends and plugins are the same; both are host functions.
We can do it, but we may not want to. If the execution flow is just like the WASI-NN style, which is load, init, set-input, compute, and get-output, why don't we choose WASI-NN as our spec instead of creating a brand-new one?
I would say it is possible; the original WASI-NN proposal didn't contain ggml/gguf, but it does now.
We do support multiple backends; the only thing you need to do is enable multiple backends, that's all. We just don't release pre-built assets, since there are various combinations. The most important thing is that TF/PyTorch rely on dynamic libraries; that's why we don't want to create all-in-one WASI-NN pre-built assets, to avoid a dependency nightmare. However,
So, I am fine with accepting Piper as a standalone plugin if it offers more benefits. Once the execution flow is complete, it's pretty easy to move it to a WASI-NN backend or keep it standalone. |
Yes, I understand backends and plugins are the same. My original thought was that we would only provide a limited set of major NN graph backends, and libraries for specific tasks would have to be implemented in wasm user functions. I get your idea, though.
I just realized WASMEDGE_PLUGIN_WASI_NN_BACKEND can be a semicolon- or whitespace-separated list because of the foreach command. Before, I had only read this CMakeLists.txt and thought multiple backends were not supported. Thank you for your detailed explanation. This really helps me understand the goal of this mentorship.
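For what it's worth, enabling several backends at configure time would then look something like this (the backend names and generator here are illustrative, not verified against the current CMake options):

```shell
# Illustrative configure invocation: the backend list passed to
# WASMEDGE_PLUGIN_WASI_NN_BACKEND is semicolon-separated
# (whitespace also works, via CMake's foreach handling).
cmake -Bbuild -GNinja \
  -DWASMEDGE_PLUGIN_WASI_NN_BACKEND="GGML;PyTorch" .
cmake --build build
```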
I have created a fork of WasmEdge which supports piper as a backend at https://github.com/PeterD1524/WasmEdge/tree/wasi_nn_piper. It works almost the same as piper's command line, but with JSON as input. Piper currently does not expose its code as a library, so I have to patch its CMakeLists to make things work. Maybe there is a better way to do this. Here is an example using the API with the Rust wasmedge-wasi-nn crate:

```rust
fn main() {
    let graph = wasmedge_wasi_nn::GraphBuilder::new(
        wasmedge_wasi_nn::GraphEncoding::Piper,
        wasmedge_wasi_nn::ExecutionTarget::CPU,
    )
    .build_from_bytes([serde_json::json!({
        "model": "en_US-lessac-medium.onnx", // path to .onnx voice file, required
        "config": "en_US-lessac-medium.onnx.json", // path to model config, default is model path + .json
        "espeak_data": "espeak-ng-data", // path to espeak-ng data directory, required for espeak phonemes
    })
    .to_string()])
    .unwrap();
    let mut context = graph.init_execution_context().unwrap();
    context
        .set_input(
            0,
            wasmedge_wasi_nn::TensorType::U8,
            &[1],
            "Welcome to the world of speech synthesis!".as_bytes(),
        )
        .unwrap();
    context.compute().unwrap();
    // output is wav by default
    let mut out_buffer = vec![0u8; 1 << 20];
    let size = context.get_output(0, &mut out_buffer).unwrap();
    std::fs::write("welcome.wav", &out_buffer[..size]).unwrap();
}
```

The enum
Summary
Motivation
WasmEdge supports PyTorch, TensorFlow Lite, llama.cpp, and more NN backends. Handling text-to-speech is a big thing that we want to achieve. To make it possible, we would like to integrate piper, a fast, local neural text-to-speech system in C++, as a new WASI-NN backend.
Details
Application link
https://mentorship.lfx.linuxfoundation.org/project/61014739-ac16-4188-bdab-c87c0a502470
Appendix