You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm wondering if ExecuTorch can be compiled for WebAssembly target? As far as I understand, XNNPACK exists for wasm-simd, so theoretically at least for CPU it can be done? (e.g. to be compared with tflite+tfjs, ort-web and tvm-wasm at least for some popular models like MobileNets)
(This is especially interesting if strong fusion/codegen can be done to produce fused wasm-simd code/fused WebGPU programs - although maybe this is an ask for Inductor)
The text was updated successfully, but these errors were encountered:
I've talked with @digantdesai about this before. I think for xnnpack he mentioned it should just be plug and play. Ive been wanting to try out wasm for sometime now just havent had the bandwidth.
I also wonder about the fusion capabilities of executorch :) Does it allow Inductor codegen'd fused kernels (e.g. think quant/dequant fused into the flash attn kernel directly, with positional embedding computation also fused into this kernel)?
Another interesting backend is webgpu/wgpu: https://github.com/huggingface/ratchet or even directly wgpu/wgsl shaders could in theory be a compilation target for fused kernels
But even if executorch does not support wild codegen/fusions - it's still be good to have it as a baseline with comparisons against ort-web and tflate-tfjs and tvm-wasm and ggml compiled to wasm. This should show roughly where all these frameworks stand (especially if compiling is relatively doable)
And given that currently PyTorch does not have its own inference wasm/WebGPU story, having executorch compiled to wasm-simd might be a nice baseline to have (especially if it's minimalistic and relatively simple to compile)
I'm wondering if ExecuTorch can be compiled for WebAssembly target? As far as I understand, XNNPACK exists for wasm-simd, so theoretically at least for CPU it can be done? (e.g. to be compared with tflite+tfjs, ort-web and tvm-wasm at least for some popular models like MobileNets)
(This is especially interesting if strong fusion/codegen can be done to produce fused wasm-simd code/fused WebGPU programs - although maybe this is an ask for Inductor)
The text was updated successfully, but these errors were encountered: