# MLX port of CLIP

This example converts CoreNet's CLIP model implementation to MLX's CLIP example, with some custom modifications. MLX is a machine learning framework with native support for Apple Silicon hardware.
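MLX records computations lazily and runs them on Apple Silicon's unified memory when evaluation is forced. A minimal standalone illustration, unrelated to the CLIP code in this directory:

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b   # recorded lazily; nothing has run yet
mx.eval(c)  # forces evaluation on the default device (the GPU on Apple Silicon)
```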

## Conversion

To convert a CoreNet CLIP model to the MLX CLIP example using the files in this directory:

```bash
cd mlx_examples/clip/

# Install required dependencies.
# We assume that the main requirements.txt is already installed.
pip install -r requirements.txt

# Convert the model.
python main_clip_to_mlx.py \
    --common.config-file "../../projects/range_augment/clip/clip_vit_base.yaml" \
    --model.multi-modal-image-text.pretrained https://docs-assets.developer.apple.com/ml-research/models/cvnets-v2/examples/range_augment/clip/clip_vit_base_16.pt \
    --common.results-loc results/mlx_model/

# Try example inference.
python clip.py
```
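The conversion script's internals are not shown here; conceptually, a port like this loads the PyTorch checkpoint and re-exports each tensor as an MLX array. A minimal sketch of that idea, with an illustrative checkpoint path and output file (the actual key remapping done by `main_clip_to_mlx.py` is omitted):

```python
import torch
import mlx.core as mx

# Load the PyTorch checkpoint on CPU (path is illustrative).
state_dict = torch.load("clip_vit_base_16.pt", map_location="cpu")

# Re-export every tensor as an MLX array. A real port would also rename
# keys to match the MLX module layout; that mapping is omitted here.
weights = {k: mx.array(v.numpy()) for k, v in state_dict.items()}

# Save to a single .npz file that can be loaded back with mx.load().
mx.savez("results/mlx_model/weights.npz", **weights)
```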

## Benchmarking results

Compared to PyTorch, with the prompts `["a photo of cat", "a photo of dog"]` and the `assets/{cat,dog}.jpeg` images as input, the results on an M2 Ultra are as follows:

| Model | PyTorch time, 100 iters (s) | MLX time, 100 iters (s) | Speedup (%) |
| --- | --- | --- | --- |
| FP16 Base variant | 2.7322 | 1.0743 | 60.68% |
| FP16 Huge variant | 4.9098 | 4.3189 | 12.04% |
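Because MLX evaluates lazily, a fair timing loop must force evaluation on every iteration. A sketch of one way such timings can be collected (an assumed methodology, not the harness used for the table above):

```python
import time
import mlx.core as mx

def benchmark(forward, iters=100):
    # Warm up once so one-time compilation/allocation cost is excluded.
    mx.eval(forward())
    start = time.perf_counter()
    for _ in range(iters):
        mx.eval(forward())  # force the lazy graph to actually run
    return time.perf_counter() - start

# Usage (hypothetical): benchmark(lambda: model(input_ids, pixel_values))
```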