Amazing! Thank you! I've checked that now and both of those commands work for me. I've also been able to get the Python version to run with the cached versions that were generated through those commands.

```python
model = "/Users/xxx/.cache/mlc_llm/model_weights/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
model_lib = "/Users/xxx/.cache/mlc_llm/model_lib/31c3a3541d55119244b74860b99d4176.dylib"
```
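In case it's useful to anyone else, a small sketch that lists what MLC has cached locally. It assumes the default `~/.cache/mlc_llm` location with `model_weights/` and `model_lib/` subdirectories, which matches the paths above; adjust the root if your cache lives elsewhere.

```python
from pathlib import Path

# Default cache layout used by HF:// downloads (matches the paths above).
cache_root = Path.home() / ".cache" / "mlc_llm"

counts = {}
for sub in ("model_weights", "model_lib"):
    root = cache_root / sub
    # Count everything under each subtree; 0 if the directory doesn't exist yet.
    counts[sub] = sum(1 for _ in root.rglob("*")) if root.is_dir() else 0
    print(f"{sub}: {counts[sub]} cached entries")
```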
❓ General Questions
I have an Apple M2 Max 32GB.
I've created a conda environment (tried Python 3.9 and 3.8) and can't seem to figure out how to get MLC to run without errors.
```shell
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib
git clone https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC
```
Things I was unsure of from documentation:
- these are both in my code directory, rather than in the package directory?
- If I use prebuilt models, it asks for config files, so I grabbed them from huggingface
I call it from the demo Python file with:

```python
model = "dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC"
model_lib = "dist/prebuilt/lib/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-metal.so"
engine = MLCEngine(model, model_lib=model_lib)
```
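Since both paths above are relative to the working directory, a cheap preflight check that they actually resolve may help before constructing the engine. This is just a sketch; `ndarray-cache.json` is the shard index I believe the MLC weight repos ship alongside `mlc-chat-config.json`, so treat that filename as my assumption.

```python
from pathlib import Path

# Same relative paths as in the snippet above.
model_dir = Path("dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC")
model_lib = Path("dist/prebuilt/lib/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-metal.so")

def preflight(model_dir: Path, model_lib: Path) -> list:
    """Return the expected files that are missing relative to the cwd."""
    expected = [
        model_dir / "mlc-chat-config.json",  # chat config the engine loads
        model_dir / "ndarray-cache.json",    # shard index (assumed name)
        model_lib,                           # compiled Metal library
    ]
    return [p for p in expected if not p.is_file()]

for p in preflight(model_dir, model_lib):
    print("missing:", p)
```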
```
code % python3 test_mlc.py
[2024-05-10 10:35:07] INFO auto_device.py:88: Not found device: cuda:0
[2024-05-10 10:35:08] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-10 10:35:08] INFO auto_device.py:79: Found device: metal:0
[2024-05-10 10:35:09] INFO auto_device.py:88: Not found device: vulkan:0
[2024-05-10 10:35:10] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-10 10:35:10] INFO auto_device.py:35: Using device: metal:0
[2024-05-10 10:35:10] INFO chat_module.py:379: Using model folder: /Users/xxx/work/xxx/code/dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC
[2024-05-10 10:35:10] INFO chat_module.py:380: Using mlc chat config: /Users/xxx/work/xxx/code/dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json
[2024-05-10 10:35:10] INFO chat_module.py:529: Using library model: dist/prebuilt/lib/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-metal.so
[2024-05-10 10:35:10] INFO engine_base.py:124: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-05-10 10:35:10] INFO engine_base.py:149: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-05-10 10:35:10] INFO engine_base.py:154: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[10:35:10] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/mlc-llm/cpp/metadata/model.cc:97: Warning: Failed to parse metadata:
{"model_type": "llama", "quantization": "q4f16_1", "context_window_size": 4096, "prefill_chunk_size": 4096, "sliding_window_size": -1, "attention_sink_size": -1, "tensor_parallel_shards": 1, "params": [{"name": "model.embed_tokens.q_weight", "shape": [32000, 512], "dtype": "uint32", "preprocs": []}, {"name": "model.embed_tokens.q_scale", "shape": [32000, 128], "dtype": "float16", "preprocs": []}, {"name": "model.layers.0.self_attn.qkv_proj.q_weight", "shape": [12288, 512], "dtype": "uint32", "preprocs": []}, {"name": "model.layers.0.self_attn.qkv_proj.q_scale", "shape": [12288, 128], "dtype": "float16", "preprocs": []}, {"name":
...
"model.layers.31.post_attention_layernorm.weight", "shape": [4096], "dtype": "float16", "preprocs": []}, {"name": "model.norm.weight", "shape": [4096], "dtype": "float16", "preprocs": []}, {"name": "lm_head.q_weight", "shape": [32000, 512], "dtype": "uint32", "preprocs": []}, {"name": "lm_head.q_scale", "shape": [32000, 128], "dtype": "float16", "preprocs": []}], "memory_usage": {"_initialize_effect": 0, "decode": 34523648, "prefill": 3479311360, "softmax_with_temperature": 0}}
libc++abi: terminating due to uncaught exception of type std::exception: std::exception
```
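Since the crash comes right after the "Failed to parse metadata" warning, one cheap sanity check (my own sketch, not something the docs prescribe) is that the config file the runtime found is valid JSON and declares the quantization I expect, using the path from the clone above:

```python
import json
from pathlib import Path

def check_config(cfg_path: Path) -> str:
    """Report the model_type/quantization a config declares, or why it can't."""
    if not cfg_path.is_file():
        return f"config not found: {cfg_path}"
    try:
        cfg = json.loads(cfg_path.read_text())
    except json.JSONDecodeError as exc:
        return f"config is not valid JSON: {exc}"
    # These two keys appear in the metadata dump above, so they should be
    # present in a healthy mlc-chat-config.json as well.
    return f"model_type={cfg.get('model_type')} quantization={cfg.get('quantization')}"

print(check_config(Path("dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json")))
```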
If I try to run it from the Hugging Face link with:

```python
model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
engine = MLCEngine(model)
```

I get a multiprocessing error:
```
Traceback (most recent call last):
  File "test_mlc.py", line 11, in <module>
    engine = MLCEngine(model)
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine.py", line 1442, in __init__
    super().__init__(
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 442, in __init__
    ) = _process_model_args(models, device)
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 115, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 115, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 85, in _convert_model_info
    model_path, config_file_path = _get_model_path(model.model)
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/chat_module.py", line 363, in _get_model_path
    mlc_dir = download_mlc_weights(model)
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/support/download.py", line 153, in download_mlc_weights
    file_url, file_dest = future.result()
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
/Users/xxx/miniforge3/envs/llms/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
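One thing that might be worth trying for the BrokenProcessPool (a guess on my part, not a confirmed fix): macOS Python starts worker processes with "spawn", which re-imports the script, so a script that creates the engine at module top level can kill the process pool that `download_mlc_weights` uses. The standard remedy is the `if __name__ == "__main__":` guard, sketched here with a plain `ProcessPoolExecutor` standing in for the engine:

```python
from concurrent.futures import ProcessPoolExecutor

def square(x: int) -> int:
    return x * x

def main() -> list:
    # With spawn-based multiprocessing (the macOS default), pool workers
    # re-import this module, so pool creation must sit behind the guard
    # below -- the same applies to MLCEngine(model) in test_mlc.py.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(square, range(4)))

if __name__ == "__main__":
    print(main())  # [0, 1, 4, 9]
```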
If I try the cloned version on my local machine:

```
ValueError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
ValueError: Error when loading parameters from params_shard_0.bin: [10:44:23] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (65536000 vs. 133) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again
```
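The 65536000-vs-133 mismatch looks (to me, guessing) like the classic Git LFS symptom: cloning a Hugging Face weights repo without git-lfs leaves each `*.bin` as a ~130-byte pointer stub instead of the real shard, so `git lfs install` followed by `git lfs pull` in the weights directory may be the fix. Below is a sketch that flags such stubs; the `ndarray-cache.json` filename and its `records[*].dataPath`/`nbytes` fields are my reading of the MLC weight layout, so verify them against your checkout.

```python
import json
import tempfile
from pathlib import Path

# Git LFS pointer files start with this header.
LFS_MAGIC = b"version https://git-lfs.github.com/spec"

def find_bad_shards(weights_dir: Path) -> list:
    """Compare each shard's on-disk size with the nbytes the index declares,
    and flag Git LFS pointer stubs left behind by a clone without git-lfs."""
    index = json.loads((weights_dir / "ndarray-cache.json").read_text())
    bad = []
    for rec in index["records"]:
        shard = weights_dir / rec["dataPath"]
        data = shard.read_bytes() if shard.is_file() else b""
        if data.startswith(LFS_MAGIC):
            bad.append((rec["dataPath"], "git-lfs pointer stub"))
        elif len(data) != rec["nbytes"]:
            bad.append((rec["dataPath"], f"{len(data)} bytes, expected {rec['nbytes']}"))
    return bad

if __name__ == "__main__":
    # Demo on a throwaway directory that simulates the 133-byte stub above.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "ndarray-cache.json").write_text(json.dumps(
            {"records": [{"dataPath": "params_shard_0.bin", "nbytes": 65536000}]}))
        (root / "params_shard_0.bin").write_bytes(LFS_MAGIC + b" ...")
        print(find_bad_shards(root))  # [('params_shard_0.bin', 'git-lfs pointer stub')]
```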
I appreciate some of these moves might not make sense given my architecture and what's coming off Hugging Face, but I've just been trying anything to get it going.
Any tips appreciated.
Thank you,
Tam
Environment:
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-arm64
annotated-types=0.6.0=pypi_0
anyio=4.3.0=pypi_0
attrs=23.2.0=pypi_0
bzip2=1.0.8=h93a5062_5
ca-certificates=2024.2.2=hf0a4a13_0
certifi=2024.2.2=pypi_0
charset-normalizer=3.3.2=pypi_0
click=8.1.7=pypi_0
cloudpickle=3.0.0=pypi_0
decorator=5.1.1=pypi_0
distro=1.9.0=pypi_0
dnspython=2.6.1=pypi_0
email-validator=2.1.1=pypi_0
exceptiongroup=1.2.1=pypi_0
fastapi=0.111.0=pypi_0
fastapi-cli=0.0.3=pypi_0
filelock=3.14.0=pypi_0
fsspec=2024.3.1=pypi_0
git-lfs=3.5.1=hce30654_0
h11=0.14.0=pypi_0
httpcore=1.0.5=pypi_0
httptools=0.6.1=pypi_0
httpx=0.27.0=pypi_0
idna=3.7=pypi_0
jinja2=3.1.4=pypi_0
libffi=3.4.2=h3422bc3_5
libsqlite=3.45.3=h091b4b1_0
libzlib=1.2.13=h53f4e23_5
markdown-it-py=3.0.0=pypi_0
markupsafe=2.1.5=pypi_0
mdurl=0.1.2=pypi_0
ml-dtypes=0.2.0=pypi_0
mlc-ai-nightly=0.15.dev315=pypi_0
mlc-llm-nightly=0.1.dev1213=pypi_0
mpmath=1.3.0=pypi_0
ncurses=6.5=hb89a1cb_0
networkx=3.1=pypi_0
numpy=1.24.4=pypi_0
openai=1.28.0=pypi_0
openssl=3.3.0=h0d3ecfb_0
orjson=3.10.3=pypi_0
pip=24.0=pyhd8ed1ab_0
prompt-toolkit=3.0.43=pypi_0
psutil=5.9.8=pypi_0
pydantic=2.7.1=pypi_0
pydantic-core=2.18.2=pypi_0
pygments=2.18.0=pypi_0
python=3.8.19=h2469fbe_0_cpython
python-dotenv=1.0.1=pypi_0
python-multipart=0.0.9=pypi_0
pyyaml=6.0.1=pypi_0
readline=8.2=h92ec313_1
regex=2024.5.10=pypi_0
requests=2.31.0=pypi_0
rich=13.7.1=pypi_0
safetensors=0.4.3=pypi_0
scipy=1.10.1=pypi_0
setuptools=69.5.1=pyhd8ed1ab_0
shellingham=1.5.4=pypi_0
shortuuid=1.0.13=pypi_0
sniffio=1.3.1=pypi_0
starlette=0.37.2=pypi_0
sympy=1.12.1rc1=pypi_0
tiktoken=0.6.0=pypi_0
tk=8.6.13=h5083fa2_1
torch=2.3.0=pypi_0
tornado=6.4=pypi_0
tqdm=4.66.4=pypi_0
typer=0.12.3=pypi_0
typing-extensions=4.11.0=pypi_0
ujson=5.9.0=pypi_0
urllib3=2.2.1=pypi_0
uvicorn=0.29.0=pypi_0
uvloop=0.19.0=pypi_0
watchfiles=0.21.0=pypi_0
wcwidth=0.2.13=pypi_0
websockets=12.0=pypi_0
wheel=0.43.0=pyhd8ed1ab_1
xz=5.2.6=h57fd34a_0