[MLC-LLM] Introducing Llama 3 running locally on Android using MLC-LLM

HamidShojanazeri committed May 16, 2024
2 parents 227fd59 + 76cb603 commit 519c5a6
Showing 5 changed files with 204 additions and 1 deletion.
2 changes: 1 addition & 1 deletion recipes/README.md
@@ -5,7 +5,7 @@ This folder contains examples organized by topic:
[quickstart](./quickstart)|The "Hello World" of using Llama 3, start here if you are new to using Llama 3
[multilingual](./multilingual)|Scripts to add a new language to Llama
[finetuning](./finetuning)|Scripts to finetune Llama 3 on single-GPU and multi-GPU setups
[inference](./inference)|Scripts to deploy Llama 3 for inference locally and using model servers
[inference](./inference)|Scripts to deploy Llama 3 for inference [locally](./inference/local_inference/), on mobile [Android](./inference/mobile_inference/android_inference/), and using [model servers](./inference/model_servers/)
[use_cases](./use_cases)|Scripts showing common applications of Llama 3
[responsible_ai](./responsible_ai)|Scripts to use PurpleLlama for safeguarding model outputs
[llama_api_providers](./llama_api_providers)|Scripts to run inference on Llama via hosted endpoints
147 changes: 147 additions & 0 deletions recipes/inference/mobile_inference/android_inference/README.md
@@ -0,0 +1,147 @@
# Running Llama3 8B Instruct on Android with MLC-LLM

Author: Thierry Moreau - tmoreau@octo.ai

# Overview
In this tutorial we'll learn how to deploy Llama3 8B Instruct on an Android-based phone using MLC-LLM.

Machine Learning Compilation for Large Language Models (MLC LLM) is a high-performance, universal deployment solution that enables native deployment of any large language model through native APIs with compiler acceleration. The mission of this project is to enable everyone to develop, optimize, and deploy AI models natively on their own devices with ML compilation techniques.

You can read more about MLC-LLM at the following [link](https://github.com/mlc-ai/mlc-llm).

MLC-LLM is also what powers the Llama3 inference APIs provided by [OctoAI](https://octo.ai/). You can use OctoAI for your Llama3 cloud-based inference needs by trying out the examples under the [following path](../../../llama_api_providers/OctoAI_API_examples/).

This tutorial was tested with the following setup:
* MacBook Pro 16 inch from 2021 with Apple M1 Max and 32GB of RAM running Sonoma 14.3.1
* OnePlus 12 Android Smartphone with a Snapdragon 8Gen3 SoC and 12GB of RAM, running OxygenOS 14.0

Running Llama3 on a phone will likely require a powerful chipset. We haven't extensively tested the range of chipsets that can support this use case. Feel free to update this README.md to specify which devices were successfully tested.

| Phone | Chipset | RAM | Status | Comments |
|------------|------------------|------|---------|----------|
| OnePlus 12 | Snapdragon 8Gen3 | 12GB | Success | None |
| | | | | |

This guide is heavily based on the [MLC Android Guide](https://llm.mlc.ai/docs/deploy/android.html), but several steps have been taken to streamline the instructions.

# Pre-requisites

## Python

Whether you're using conda or a virtual environment to manage your Python setup, we highly recommend starting from scratch with a clean, new environment.

For instance, with a virtual environment:
```bash
python3 -m venv .venv
source .venv/bin/activate
```

Next you'll need to install the following packages:
```bash
python3 -m pip install -r requirements.txt
```
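
To quickly confirm the install worked, you can try importing the package (a sanity check, not part of the official setup):

```bash
# Sanity check: the import succeeds only if the MLC wheels installed correctly
python3 -c "import mlc_llm; print(mlc_llm.__file__)"
```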

## Rust

[Rust](https://www.rust-lang.org/tools/install) is needed to cross-compile HuggingFace tokenizers to Android.
Make sure `rustc`, `cargo`, and `rustup` are available in `$PATH`.
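
If you don't have Rust installed yet, the official rustup installer is one way to set up all three (this is the standard command from rust-lang.org, not something specific to this guide):

```bash
# Install the Rust toolchain via rustup, then confirm it is on $PATH
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustc --version && cargo --version && rustup --version
```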


## Android Studio

Install Android Studio from <!-- markdown-link-check-disable -->https://developer.android.com/studio<!-- markdown-link-check-enable --> with NDK and CMake.

To install NDK and CMake: on the Android Studio welcome page, click “Projects → SDK Manager → SDK Tools” and select both for installation. Then set up the following environment variables:

* `ANDROID_NDK` so that `$ANDROID_NDK/build/cmake/android.toolchain.cmake` is available.
* `TVM_NDK_CC` that points to the NDK's clang compiler.

For instance, the paths will look like the following on OSX for user `moreau`:
```bash
# Android + TVM setup
export ANDROID_NDK="/Users/moreau/Library/Android/sdk/ndk/26.1.10909125"
export TVM_NDK_CC="$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android24-clang"
```
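
As a quick sanity check that both variables point at real files (the NDK version in the paths above will differ on your machine):

```bash
# Verify the CMake toolchain file exists and the NDK clang runs
ls "$ANDROID_NDK/build/cmake/android.toolchain.cmake"
"$TVM_NDK_CC" --version
```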

This tutorial was tested successfully on Android Studio Hedgehog | 2023.1.1 Patch 1.

## JDK

A JDK, such as OpenJDK >= 17, is needed to compile the Java bindings of the TVM Unity runtime.

We strongly recommend setting `JAVA_HOME` to the JDK bundled with Android Studio. Using Android Studio's JBR bundle as recommended (<!-- markdown-link-check-disable -->https://developer.android.com/build/jdks<!-- markdown-link-check-enable -->) will reduce the chances of potential errors in JNI compilation.

For instance on macOS, you'll need to point `JAVA_HOME` to the following:

```bash
export JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home
```

To make sure the java binary can be found, run `ls $JAVA_HOME/bin/java`.

## MLC-LLM

Let's clone mlc-llm from its repo in the directory of your choice:

```bash
cd /path/to/where/to/clone/repo
git clone https://github.com/mlc-ai/mlc-llm --recursive
export MLC_LLM_HOME=/path/to/mlc-llm
```

At the time of writing this README, we tested `mlc-llm` at the following sha: `21feb7010db02e0c2149489f5972d6a8a796b5a0`.
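
If you want to build against that exact revision, you can pin the checkout (optional, but it should keep this guide reproducible):

```bash
# Optionally pin mlc-llm to the revision this guide was tested against
cd $MLC_LLM_HOME
git checkout 21feb7010db02e0c2149489f5972d6a8a796b5a0
git submodule update --init --recursive
```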

## Phone Setup

Enable debugging in your phone's developer settings. Each phone manufacturer has its own approach to enabling debug mode, so a simple Google search should equip you with the steps for your particular phone.

In addition, make sure to change your USB configuration from "Charging" to "MTP (Media Transfer Protocol)". This will allow the development machine to connect to the device.

Connect your phone to your development machine. On OSX, you'll be prompted on the dev machine whether you want to allow the accessory to connect. Hit "Allow".
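
Assuming the Android platform-tools that ship with Android Studio are on your `$PATH`, you can confirm the connection with `adb`:

```bash
# Your phone should be listed as "device"; "unauthorized" means you
# still need to accept the debugging prompt on the phone
adb devices
```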

# Build Steps

## Building the Android Package with MLC

First, replace the contents of the file under `android/MLCChat/mlc-package-config.json` with the [mlc-package-config.json](./mlc-package-config.json) provided in llama-recipes.

To understand what these JSON fields mean you can refer to this [documentation](https://llm.mlc.ai/docs/deploy/android.html#step-2-build-runtime-and-model-libraries).


From the `mlc-llm` project root directory:

```bash
cd $MLC_LLM_HOME
cd android/MLCChat
python3 -m mlc_llm package --package-config mlc-package-config.json --output dist
```

The command above will take a few minutes to run as it works through the following steps:

* Compile the Llama 3 8B Instruct model specified in `mlc-package-config.json` into a binary model library.
* Build the `mlc-llm` runtime and tokenizer. In addition to the model itself, a lightweight runtime and tokenizer are required to actually run the LLM.
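
You can inspect what the packaging step produced; the exact layout under `dist` may vary across `mlc-llm` revisions, so treat this as illustrative:

```bash
# Inspect the packaging output (runtime libraries and model bundle)
ls -R dist
```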

## Building and Running MLC Chat in Android Studio

Now let's launch Android Studio.

* On the "Welcome to Android Studio" page, hit "Open", and navigate to `$MLC_LLM_HOME/android/MLCChat`, then hit "Open"
* A window will pop up asking whether to "Trust and Open project 'MLCChat'" - hit "Trust Project"
* The project will now launch
* Under File -> Project Structure... -> Project, change the Gradle Version (the second drop-down from the top) to 8.5

Connect your phone to your development machine. Assuming you've followed the setup steps in the pre-requisites section, you should be able to see the device.

Next you'll need to:

* Hit Build -> Make Project.
* Hit Run -> Run 'app'
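
Alternatively, you can sideload the debug build from the command line. The APK path below assumes the standard Gradle output layout and may differ for your build configuration:

```bash
# Install the debug APK onto the connected phone
cd $MLC_LLM_HOME/android/MLCChat
adb install -r app/build/outputs/apk/debug/app-debug.apk
```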

The MLCChat app will launch on your phone. Now, on your phone:

* Under Model List you'll see the `Llama-3-8B-Instruct` LLM listed.
* The model isn't quite ready to launch yet, because the weights first need to be downloaded over Wifi. Hit the Download button to the right of the model name to download the weights from HuggingFace.

Note that you can change the build settings to bundle the weights with the MLCChat app so you don't have to download the weights over wifi. To do so you can follow the instructions [here](https://llm.mlc.ai/docs/deploy/android.html#bundle-model-weights).

Once the model weights are downloaded you can now interact with Llama 3 locally on your Android phone!
14 changes: 14 additions & 0 deletions recipes/inference/mobile_inference/android_inference/mlc-package-config.json
@@ -0,0 +1,14 @@
{
"device": "android",
"model_list": [
{
"model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
"estimated_vram_bytes": 4348727787,
"model_id": "Llama-3-8B-Instruct",
"overrides": {
"context_window_size": 768,
"prefill_chunk_size": 256
}
}
]
}
14 changes: 14 additions & 0 deletions recipes/inference/mobile_inference/android_inference/requirements.txt
@@ -0,0 +1,14 @@
--pre
--find-links https://mlc.ai/wheels
mlc-llm-nightly
mlc-ai-nightly
attrs
decorator
numpy
psutil
pydantic
requests
scipy
setuptools
torch
tqdm
28 changes: 28 additions & 0 deletions scripts/spellcheck_conf/wordlist.txt
@@ -1314,6 +1314,34 @@ AgentExecutor
LangGraph
langgraph
vectorstore
CMake
Chipset
JBR
JNI
MLCChat
MTP
MacBook
Moreau
NDK
NDK's
OSX
OnePlus
OxygenOS
SoC
Sonoma
TVM
Thierry
Wifi
chipset
feb
moreau
octo
rustc
rustup
sha
tmoreau
toolchain
wifi
AgentFinish
ReAct
customizable
