# MLX port of CLIP

This example converts CoreNet's CLIP model implementation to MLX's CLIP example, with some custom modifications. MLX is a machine learning framework with native support for Apple Silicon hardware.
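MLX records computations lazily and runs them on Apple Silicon's unified memory when evaluation is forced. A minimal standalone illustration, unrelated to the CLIP code in this directory:

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b   # recorded lazily; nothing has run yet
mx.eval(c)  # forces evaluation on the default device (the GPU on Apple Silicon)
```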

## Conversion

To convert a CoreNet CLIP model to the MLX CLIP example using the files in this directory:

```bash
cd mlx_examples/clip/

# Install required dependencies.
# We assume that the main requirements.txt is already installed.
pip install -r requirements.txt

# Convert the model.
python main_clip_to_mlx.py \
    --common.config-file "../../projects/range_augment/clip/clip_vit_base.yaml" \
    --model.multi-modal-image-text.pretrained https://docs-assets.developer.apple.com/ml-research/models/cvnets-v2/examples/range_augment/clip/clip_vit_base_16.pt \
    --common.results-loc results/mlx_model/

# Try example inference.
python clip.py
```
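The conversion script's internals are not shown here; conceptually, a port like this loads the PyTorch checkpoint and re-exports each tensor as an MLX array. A minimal sketch of that idea, with an illustrative checkpoint path and output file (the actual key remapping done by `main_clip_to_mlx.py` is omitted):

```python
import torch
import mlx.core as mx

# Load the PyTorch checkpoint on CPU (path is illustrative).
state_dict = torch.load("clip_vit_base_16.pt", map_location="cpu")

# Re-export every tensor as an MLX array. A real port would also rename
# keys to match the MLX module layout; that mapping is omitted here.
weights = {k: mx.array(v.numpy()) for k, v in state_dict.items()}

# Save to a single .npz file that can be loaded back with mx.load().
mx.savez("results/mlx_model/weights.npz", **weights)
```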

## Benchmarking results

Compared to PyTorch, with the prompts `["a photo of cat", "a photo of dog"]` and the `assets/{cat,dog}.jpeg` images as input, the results on an M2 Ultra are as follows:

| Model | PyTorch time, 100 iters (s) | MLX time, 100 iters (s) | Speedup (%) |
| --- | --- | --- | --- |
| FP16 Base variant | 2.7322 | 1.0743 | 60.68% |
| FP16 Huge variant | 4.9098 | 4.3189 | 12.04% |
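Because MLX evaluates lazily, a fair timing loop must force evaluation on every iteration. A sketch of one way such timings can be collected (an assumed methodology, not the harness used for the table above):

```python
import time
import mlx.core as mx

def benchmark(forward, iters=100):
    # Warm up once so one-time compilation/allocation cost is excluded.
    mx.eval(forward())
    start = time.perf_counter()
    for _ in range(iters):
        mx.eval(forward())  # force the lazy graph to actually run
    return time.perf_counter() - start

# Usage (hypothetical): benchmark(lambda: model(input_ids, pixel_values))
```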