
[Question] Can't get going on Mac M2 Chip #2316

Closed
polajnta opened this issue May 10, 2024 · 3 comments
Labels: question (Question about the usage)

Comments


polajnta commented May 10, 2024

❓ General Questions

I have an Apple M2 Max 32GB.
I've created a conda environment (tried Python 3.9 and 3.8) and can't figure out how to get MLC to run without errors.

  1. I tried running a prebuilt model:
    git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib
    git clone https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC

Things I was unsure of from the documentation:

  • these are both in my code directory, rather than in the package directory? (layout sketched below)
  • if I use prebuilt models, it asks for config files, so I grabbed them from Hugging Face
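
For reference, the layout I end up with after the two clones above, as I understand it (the inner files are just what I see in the cloned weights repo):

    code/
      dist/prebuilt/
        lib/                                      <- binary-mlc-llm-libs clone
          Llama-2-7b-chat-hf/
            Llama-2-7b-chat-hf-q4f16_1-metal.so
        Llama-2-7b-chat-hf-q4f16_1-MLC/           <- Hugging Face weights clone
          mlc-chat-config.json
          params_shard_0.bin, params_shard_1.bin, ...
          (plus the config/tokenizer files I copied from Hugging Face)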

I call it from the demo Python file using:

    model = "dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC"
    model_lib = "dist/prebuilt/lib/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-metal.so"
    engine = MLCEngine(model, model_lib=model_lib)
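
For context, a sketch of the full demo script I'm calling this from, following the quick-start example (the prompt and the chat.completions loop are just the example usage, not something specific to my setup):

    from mlc_llm import MLCEngine

    # local prebuilt weights plus the Metal library from dist/prebuilt
    model = "dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC"
    model_lib = "dist/prebuilt/lib/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-metal.so"
    engine = MLCEngine(model, model_lib=model_lib)

    # OpenAI-style streaming chat completion
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "Hello, who are you?"}],
        model=model,
        stream=True,
    ):
        for choice in response.choices:
            print(choice.delta.content or "", end="", flush=True)
    print()

    engine.terminate()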

code % python3 test_mlc.py
[2024-05-10 10:35:07] INFO auto_device.py:88: Not found device: cuda:0
[2024-05-10 10:35:08] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-10 10:35:08] INFO auto_device.py:79: Found device: metal:0
[2024-05-10 10:35:09] INFO auto_device.py:88: Not found device: vulkan:0
[2024-05-10 10:35:10] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-10 10:35:10] INFO auto_device.py:35: Using device: metal:0
[2024-05-10 10:35:10] INFO chat_module.py:379: Using model folder: /Users/xxx/work/xxx/code/dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC
[2024-05-10 10:35:10] INFO chat_module.py:380: Using mlc chat config: /Users/xxx/work/xxx/code/dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json
[2024-05-10 10:35:10] INFO chat_module.py:529: Using library model: dist/prebuilt/lib/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-metal.so
[2024-05-10 10:35:10] INFO engine_base.py:124: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-05-10 10:35:10] INFO engine_base.py:149: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-05-10 10:35:10] INFO engine_base.py:154: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[10:35:10] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/mlc-llm/cpp/metadata/model.cc:97: Warning: Failed to parse metadata:
{"model_type": "llama", "quantization": "q4f16_1", "context_window_size": 4096, "prefill_chunk_size": 4096, "sliding_window_size": -1, "attention_sink_size": -1, "tensor_parallel_shards": 1, "params": [{"name": "model.embed_tokens.q_weight", "shape": [32000, 512], "dtype": "uint32", "preprocs": []}, {"name": "model.embed_tokens.q_scale", "shape": [32000, 128], "dtype": "float16", "preprocs": []}, {"name": "model.layers.0.self_attn.qkv_proj.q_weight", "shape": [12288, 512], "dtype": "uint32", "preprocs": []}, {"name": "model.layers.0.self_attn.qkv_proj.q_scale", "shape": [12288, 128], "dtype": "float16", "preprocs": []}, {"name":
...
"model.layers.31.post_attention_layernorm.weight", "shape": [4096], "dtype": "float16", "preprocs": []}, {"name": "model.norm.weight", "shape": [4096], "dtype": "float16", "preprocs": []}, {"name": "lm_head.q_weight", "shape": [32000, 512], "dtype": "uint32", "preprocs": []}, {"name": "lm_head.q_scale", "shape": [32000, 128], "dtype": "float16", "preprocs": []}], "memory_usage": {"_initialize_effect": 0, "decode": 34523648, "prefill": 3479311360, "softmax_with_temperature": 0}}
libc++abi: terminating due to uncaught exception of type std::exception: std::exception

  2. If I try to run it from the Hugging Face link with

    model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
    engine = MLCEngine(model)

    I get a multiprocessing error:

    Traceback (most recent call last):
      File "test_mlc.py", line 11, in <module>
        engine = MLCEngine(model)
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine.py", line 1442, in __init__
        super().__init__(
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 442, in __init__
        ) = _process_model_args(models, device)
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 115, in _process_model_args
        model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 115, in <listcomp>
        model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/serve/engine_base.py", line 85, in _convert_model_info
        model_path, config_file_path = _get_model_path(model.model)
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/chat_module.py", line 363, in _get_model_path
        mlc_dir = download_mlc_weights(model)
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/site-packages/mlc_llm/support/download.py", line 153, in download_mlc_weights
        file_url, file_dest = future.result()
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/concurrent/futures/_base.py", line 437, in result
        return self.__get_result()
      File "/Users/xxx/miniforge3/envs/llms/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
        raise self._exception
    concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
    /Users/xxx/miniforge3/envs/llms/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '
  3. If I try the cloned version on my local machine:

    ValueError: Traceback (most recent call last):
      File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
    ValueError: Error when loading parameters from params_shard_0.bin: [10:44:23] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (65536000 vs. 133) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again

I appreciate some of these attempts might not make sense given my architecture and what's coming off Hugging Face, but I've just been trying anything to get it going; one observation about the shard error in 3 is noted below.
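
One observation on error 3: the shard that fails the check is only 133 bytes, which looks more like a Git LFS pointer file than real weight data, so it may be that my clone of the weights repo never pulled the LFS blobs. If that's the cause, something like this should confirm and fix it (paths as cloned above):

    ls -l dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC/params_shard_0.bin
    cd dist/prebuilt/Llama-2-7b-chat-hf-q4f16_1-MLC && git lfs pull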

Any tips appreciated,
Thank you,
Tam

Environment:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-arm64
annotated-types=0.6.0=pypi_0
anyio=4.3.0=pypi_0
attrs=23.2.0=pypi_0
bzip2=1.0.8=h93a5062_5
ca-certificates=2024.2.2=hf0a4a13_0
certifi=2024.2.2=pypi_0
charset-normalizer=3.3.2=pypi_0
click=8.1.7=pypi_0
cloudpickle=3.0.0=pypi_0
decorator=5.1.1=pypi_0
distro=1.9.0=pypi_0
dnspython=2.6.1=pypi_0
email-validator=2.1.1=pypi_0
exceptiongroup=1.2.1=pypi_0
fastapi=0.111.0=pypi_0
fastapi-cli=0.0.3=pypi_0
filelock=3.14.0=pypi_0
fsspec=2024.3.1=pypi_0
git-lfs=3.5.1=hce30654_0
h11=0.14.0=pypi_0
httpcore=1.0.5=pypi_0
httptools=0.6.1=pypi_0
httpx=0.27.0=pypi_0
idna=3.7=pypi_0
jinja2=3.1.4=pypi_0
libffi=3.4.2=h3422bc3_5
libsqlite=3.45.3=h091b4b1_0
libzlib=1.2.13=h53f4e23_5
markdown-it-py=3.0.0=pypi_0
markupsafe=2.1.5=pypi_0
mdurl=0.1.2=pypi_0
ml-dtypes=0.2.0=pypi_0
mlc-ai-nightly=0.15.dev315=pypi_0
mlc-llm-nightly=0.1.dev1213=pypi_0
mpmath=1.3.0=pypi_0
ncurses=6.5=hb89a1cb_0
networkx=3.1=pypi_0
numpy=1.24.4=pypi_0
openai=1.28.0=pypi_0
openssl=3.3.0=h0d3ecfb_0
orjson=3.10.3=pypi_0
pip=24.0=pyhd8ed1ab_0
prompt-toolkit=3.0.43=pypi_0
psutil=5.9.8=pypi_0
pydantic=2.7.1=pypi_0
pydantic-core=2.18.2=pypi_0
pygments=2.18.0=pypi_0
python=3.8.19=h2469fbe_0_cpython
python-dotenv=1.0.1=pypi_0
python-multipart=0.0.9=pypi_0
pyyaml=6.0.1=pypi_0
readline=8.2=h92ec313_1
regex=2024.5.10=pypi_0
requests=2.31.0=pypi_0
rich=13.7.1=pypi_0
safetensors=0.4.3=pypi_0
scipy=1.10.1=pypi_0
setuptools=69.5.1=pyhd8ed1ab_0
shellingham=1.5.4=pypi_0
shortuuid=1.0.13=pypi_0
sniffio=1.3.1=pypi_0
starlette=0.37.2=pypi_0
sympy=1.12.1rc1=pypi_0
tiktoken=0.6.0=pypi_0
tk=8.6.13=h5083fa2_1
torch=2.3.0=pypi_0
tornado=6.4=pypi_0
tqdm=4.66.4=pypi_0
typer=0.12.3=pypi_0
typing-extensions=4.11.0=pypi_0
ujson=5.9.0=pypi_0
urllib3=2.2.1=pypi_0
uvicorn=0.29.0=pypi_0
uvloop=0.19.0=pypi_0
watchfiles=0.21.0=pypi_0
wcwidth=0.2.13=pypi_0
websockets=12.0=pypi_0
wheel=0.43.0=pyhd8ed1ab_1
xz=5.2.6=h57fd34a_0

tqchen (Contributor) commented May 10, 2024

Both work on my M2:

  • mlc_llm chat HF://Llama-2-7b-chat-hf-q4f16_1-MLC
  • mlc_llm serve HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC

polajnta (Author) commented May 10, 2024

Amazing! Thank you! I've checked that now and both of those commands work for me. I've also been able to get the Python version to run with the cached versions that were generated through those commands.

model = "/Users/xxx/.cache/mlc_llm/model_weights/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
model_lib = "/Users/xxx/.cache/mlc_llm/model_lib/31c3a3541d55119244b74860b99d4176.dylib"
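
For completeness, the call that works for me now, built around those cached paths (the completion at the end is just the quick-start example, not specific to this fix):

    from mlc_llm import MLCEngine

    model = "/Users/xxx/.cache/mlc_llm/model_weights/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
    model_lib = "/Users/xxx/.cache/mlc_llm/model_lib/31c3a3541d55119244b74860b99d4176.dylib"
    engine = MLCEngine(model, model_lib=model_lib)

    # non-streaming chat completion
    response = engine.chat.completions.create(
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        model=model,
    )
    print(response.choices[0].message.content)

    engine.terminate()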

Thanks again!

@tqchen tqchen closed this as completed May 10, 2024
tqchen (Contributor) commented May 10, 2024

glad it works
