
[MLC-LLM] Uncaught (in promise) LinkError: WebAssembly.instantiate(): Import #4 "env" #373

Open

DavidGOrtega opened this issue Apr 21, 2024 · 2 comments

@DavidGOrtega (Contributor)

I have set up mlc-llm at the latest commit to compile my models; however, they do not work.

Uncaught (in promise) LinkError: WebAssembly.instantiate(): Import #4 "env" "_ZN3mlc3llm5serve16JSONSchemaToEBNFENSt3__212basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEENS2_8optionalIiEENS9_INS2_4pairIS8_S8_EEEEb": function import requires a callable
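
The failing import is a mangled C++ symbol; demangling it (for example with c++filt from binutils, or llvm-cxxfilt) shows which mlc-llm runtime function the WASM expects the host to provide:

echo '_ZN3mlc3llm5serve16JSONSchemaToEBNFENSt3__212basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEENS2_8optionalIiEENS9_INS2_4pairIS8_S8_EEEEb' | c++filt
# -> mlc::llm::serve::JSONSchemaToEBNF(std::__2::basic_string<...>, std::__2::optional<int>, std::__2::optional<std::__2::pair<...>>, bool)  (template arguments abbreviated)

That is, the module was compiled expecting the grammar helper mlc::llm::serve::JSONSchemaToEBNF, and the runtime loading it does not supply that function.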

How to reproduce:
Clone phi-2 (see the sketch below), then run:
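
A minimal sketch, assuming "phi-2" refers to the Hugging Face microsoft/phi-2 checkpoint:

git lfs install
git clone https://huggingface.co/microsoft/phi-2 /your/path/models/phi-2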

export TVM_HOME=/your/path/mlc-llm/3rdparty/tvm
export MLC_LLM_HOME=/your/path/mlc-llm

export MODEL=/your/path/models/phi-2
export QUANTIZATION=q0f16

mlc_llm convert_weight $MODEL --quantization $QUANTIZATION -o $MODEL/MLC
mlc_llm gen_config $MODEL --quantization $QUANTIZATION  --conv-template phi-2 -o $MODEL/MLC
mlc_llm compile $MODEL/MLC/mlc-chat-config.json --device webgpu -o $MODEL/MLC/webllm.wasm

Output:

mlc_llm compile $MODEL/MLC/mlc-chat-config.json --device webgpu -o $MODEL/MLC/webllm.wasm
[2024-04-21 14:17:06] INFO auto_config.py:69: Found model configuration: /models/phi-2/MLC/mlc-chat-config.json
[2024-04-21 14:17:06] INFO auto_config.py:153: Found model type: phi. Use `--model-type` to override.
Compiling with arguments:
  --config          Phi1Config(vocab_size=51200, hidden_size=2560, intermediate_size=10240, num_hidden_layers=32, num_attention_heads=32, layer_norm_eps=1e-05, position_embedding_base=10000.0, partial_rotary_factor=0.4, num_key_value_heads=32, context_window_size=2048, prefill_chunk_size=2048, head_dim=80, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
  --quantization    NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
  --model-type      phi
  --target          {"host": {"kind": "llvm", "tag": "", "keys": ["cpu"], "mtriple": "wasm32-unknown-unknown-wasm"}, "max_num_threads": 256, "kind": "webgpu", "tag": "", "keys": ["webgpu", "gpu"]}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          /models/phi-2/MLC/webllm.wasm
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None
[2024-04-21 14:17:06] INFO compile.py:137: Creating model from: Phi1Config(vocab_size=51200, hidden_size=2560, intermediate_size=10240, num_hidden_layers=32, num_attention_heads=32, layer_norm_eps=1e-05, position_embedding_base=10000.0, partial_rotary_factor=0.4, num_key_value_heads=32, context_window_size=2048, prefill_chunk_size=2048, head_dim=80, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
[2024-04-21 14:17:06] INFO compile.py:156: Exporting the model to TVM Unity compiler
[2024-04-21 14:17:06] INFO compile.py:162: Running optimizations using TVM Unity
[2024-04-21 14:17:06] INFO compile.py:176: Registering metadata: {'model_type': 'phi', 'quantization': 'q0f16', 'context_window_size': 2048, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}
[2024-04-21 14:17:06] WARNING auto_target.py:123: --system-lib-prefix is not specified when building a static library
[2024-04-21 14:17:07] INFO pipeline.py:50: Running TVM Relax graph-level optimizations
[2024-04-21 14:17:08] INFO pipeline.py:50: Lowering to TVM TIR kernels
[2024-04-21 14:17:10] INFO pipeline.py:50: Running TVM TIR-level optimizations
[2024-04-21 14:17:15] INFO pipeline.py:50: Running TVM Dlight low-level optimizations
[2024-04-21 14:17:16] INFO pipeline.py:50: Lowering to VM bytecode
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `alloc_embedding_tensor`: 10.00 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_decode`: 3.91 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_prefill`: 100.78 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_verify`: 100.00 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `decode`: 0.05 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `embed`: 10.00 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `prefill`: 100.01 MB
[2024-04-21 14:17:17] INFO estimate_memory_usage.py:57: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-04-21 14:17:18] INFO pipeline.py:50: Compiling external modules
[2024-04-21 14:17:18] INFO pipeline.py:50: Compilation complete! Exporting to disk
[14:17:20] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
[2024-04-21 14:17:47] INFO model_metadata.py:96: Total memory usage: 5402.61 MB (Parameters: 5301.83 MB. KVCache: 0.00 MB. Temporary buffer: 100.78 MB)
[2024-04-21 14:17:47] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
[2024-04-21 14:17:47] INFO compile.py:198: Generated: /models/phi-2/MLC/webllm.wasm
(mlc-chat-venv) Davids-MacBook-Pro:mlc-llm davidgortega$ mlc_llm gen_config $MODEL --quantization $QUANTIZATION --conv-template phi-2 -o $MODEL/MLC
[2024-04-21 14:18:59] INFO auto_config.py:115: Found model configuration: /models/phi-2/config.json
[2024-04-21 14:18:59] INFO auto_config.py:153: Found model type: phi. Use `--model-type` to override.
[2024-04-21 14:18:59] INFO phi_model.py:53: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2024-04-21 14:18:59] INFO config.py:106: Overriding max_batch_size from 1 to 80
[2024-04-21 14:18:59] INFO gen_config.py:187: [generation_config.json] Setting eos_token_id: 50256
[2024-04-21 14:18:59] INFO gen_config.py:187: [generation_config.json] Setting bos_token_id: 50256
[2024-04-21 14:18:59] INFO gen_config.py:201: Not found tokenizer config: /models/phi-2/tokenizer.model
[2024-04-21 14:18:59] INFO gen_config.py:199: Found tokenizer config: /models/phi-2/tokenizer.json. Copying to /models/phi-2/MLC/tokenizer.json
[2024-04-21 14:18:59] INFO gen_config.py:199: Found tokenizer config: /models/phi-2/vocab.json. Copying to /models/phi-2/MLC/vocab.json
[2024-04-21 14:18:59] INFO gen_config.py:199: Found tokenizer config: /models/phi-2/merges.txt. Copying to /models/phi-2/MLC/merges.txt
[2024-04-21 14:18:59] INFO gen_config.py:199: Found tokenizer config: /models/phi-2/added_tokens.json. Copying to /models/phi-2/MLC/added_tokens.json
[2024-04-21 14:18:59] INFO gen_config.py:199: Found tokenizer config: /models/phi-2/tokenizer_config.json. Copying to /models/phi-2/MLC/tokenizer_config.json
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting pad_token_id: 0
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting temperature: 0.7
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting presence_penalty: 0.0
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting frequency_penalty: 0.0
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting repetition_penalty: 1.0
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting top_p: 0.95
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting mean_gen_len: 128
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting max_gen_len: 512
[2024-04-21 14:18:59] INFO gen_config.py:76: [System default] Setting shift_fill_factor: 0.3
[2024-04-21 14:18:59] INFO gen_config.py:263: Dumping configuration file to: /models/phi-2/MLC/mlc-chat-config.json
(mlc-chat-venv) Davids-MacBook-Pro:mlc-llm davidgortega$ mlc_llm compile $MODEL/MLC/mlc-chat-config.json --device webgpu -o $MODEL/MLC/webllm.wasm
[2024-04-21 14:19:07] INFO auto_config.py:69: Found model configuration: /models/phi-2/MLC/mlc-chat-config.json
[2024-04-21 14:19:07] INFO auto_config.py:153: Found model type: phi. Use `--model-type` to override.
Compiling with arguments:
  --config          Phi1Config(vocab_size=51200, hidden_size=2560, intermediate_size=10240, num_hidden_layers=32, num_attention_heads=32, layer_norm_eps=1e-05, position_embedding_base=10000.0, partial_rotary_factor=0.4, num_key_value_heads=32, context_window_size=2048, prefill_chunk_size=2048, head_dim=80, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
  --quantization    NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
  --model-type      phi
  --target          {"host": {"kind": "llvm", "tag": "", "keys": ["cpu"], "mtriple": "wasm32-unknown-unknown-wasm"}, "max_num_threads": 256, "kind": "webgpu", "tag": "", "keys": ["webgpu", "gpu"]}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output         /models/phi-2/MLC/webllm.wasm
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None
[2024-04-21 14:19:07] INFO compile.py:137: Creating model from: Phi1Config(vocab_size=51200, hidden_size=2560, intermediate_size=10240, num_hidden_layers=32, num_attention_heads=32, layer_norm_eps=1e-05, position_embedding_base=10000.0, partial_rotary_factor=0.4, num_key_value_heads=32, context_window_size=2048, prefill_chunk_size=2048, head_dim=80, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
[2024-04-21 14:19:07] INFO compile.py:156: Exporting the model to TVM Unity compiler
[2024-04-21 14:19:07] INFO compile.py:162: Running optimizations using TVM Unity
[2024-04-21 14:19:07] INFO compile.py:176: Registering metadata: {'model_type': 'phi', 'quantization': 'q0f16', 'context_window_size': 2048, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}
[2024-04-21 14:19:07] WARNING auto_target.py:123: --system-lib-prefix is not specified when building a static library
[2024-04-21 14:19:08] INFO pipeline.py:50: Running TVM Relax graph-level optimizations
[2024-04-21 14:19:10] INFO pipeline.py:50: Lowering to TVM TIR kernels
[2024-04-21 14:19:12] INFO pipeline.py:50: Running TVM TIR-level optimizations
[2024-04-21 14:19:17] INFO pipeline.py:50: Running TVM Dlight low-level optimizations
[2024-04-21 14:19:18] INFO pipeline.py:50: Lowering to VM bytecode
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `alloc_embedding_tensor`: 10.00 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_decode`: 3.91 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_prefill`: 100.78 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_verify`: 100.00 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `decode`: 0.05 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `embed`: 10.00 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `prefill`: 100.01 MB
[2024-04-21 14:19:19] INFO estimate_memory_usage.py:57: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-04-21 14:19:20] INFO pipeline.py:50: Compiling external modules
[2024-04-21 14:19:20] INFO pipeline.py:50: Compilation complete! Exporting to disk
[14:19:22] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
[2024-04-21 14:19:48] INFO model_metadata.py:96: Total memory usage: 5402.61 MB (Parameters: 5301.83 MB. KVCache: 0.00 MB. Temporary buffer: 100.78 MB)
[2024-04-21 14:19:48] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
[2024-04-21 14:19:48] INFO compile.py:198: Generated: /models/phi-2/MLC/webllm.wasm

If I point the app at mlc's prebuilt phi-2 WASM instead, it works.
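
A quick way to A/B the two libraries is to serve the local artifacts over HTTP and swap only the model library URL in the app config (a hedged sketch; port 8000 is an arbitrary choice):

cd /your/path/models/phi-2/MLC
python3 -m http.server 8000
# then point the app's model library URL at http://localhost:8000/webllm.wasm instead of the prebuilt one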

@DavidGOrtega (Contributor, Author)

I'm trying to work on adding WebGPU tests using Cypress, but I'm spending more time looking at this than anything else.

@CharlieFRuan (Contributor)

It should be fixed now via mlc-ai/mlc-llm#2187.

We recently started using EMCC to include runtime code from https://github.com/mlc-ai/mlc-llm in the model WASM, as of now mainly for grammar usage. Currently, I am prioritizing making sure that users of the prebuilt models have a smooth experience; for instance, we introduced WASM versioning.

For compiling a customized model, as a current workaround, building against the commits specified in the WASM version PRs should guarantee a working WASM (though admittedly this is a bit inconvenient).
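
A rough sketch of that workaround, with a placeholder hash (the actual commit must be taken from the corresponding WASM version PR):

cd /your/path/mlc-llm
git checkout <commit-from-wasm-version-PR>
git submodule update --init --recursive
# then rebuild/reinstall mlc_llm from this checkout before re-running `mlc_llm compile`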

In the future, we will find ways to make sure that the compile-runtime flow is also smooth.
