What is Rapid-MLX and how do I install Gemma 4?

Pip install rapid-mlx. Then rapid-mlx serve gemma-4-26b. Downloads 4-bit model, spins OpenAI API on localhost:8000.

Does Gemma 4 on Apple Silicon beat Ollama?

Yes – 85 vs 75 tok/s on 26B. Wider on small models (2.4x). Full tools included.

M3 Ultra (192GB) hits 85 tok/s. 32GB Pro fine for 26B. 16GB Air crushes 4B at 168 tok/s.

Everyone figured big open models like Gemma 4 would crawl on Apple Silicon. Wrong. One pip, 85 tokens/second, tools included – Ollama's toast.

theAIcatchup Apr 07, 2026 3 min read

Gemma 4 runs at 85 tok/s on M3 Ultra via one pip install – beats Ollama decode speed. 𝕏
Built-in tool calling for 18 model families, OpenAI-compatible for all major frameworks. 𝕏
MLX stack with prompt cache makes multi-turn agents buttery smooth. 𝕏

Published by

Ship faster. Build smarter.

#Rapid-MLX #apple silicon

Get the best Developer Tools stories of the week in your inbox — no noise, no spam.

Originally reported by dev.to