🤖 AI Dev Tools
Gemma 4 Hits 85 Tokens/Second on Your Mac – Pip Install Magic
Everyone figured big open models like Gemma 4 would crawl on Apple Silicon. Wrong. One pip, 85 tokens/second, tools included – Ollama's toast.
theAIcatchup
Apr 07, 2026
3 min read
⚡ Key Takeaways
-
Gemma 4 runs at 85 tok/s on M3 Ultra via one pip install – beats Ollama decode speed.
𝕏
-
Built-in tool calling for 18 model families, OpenAI-compatible for all major frameworks.
𝕏
-
MLX stack with prompt cache makes multi-turn agents buttery smooth.
𝕏
The 60-Second TL;DR
- Gemma 4 runs at 85 tok/s on M3 Ultra via one pip install – beats Ollama decode speed.
- Built-in tool calling for 18 model families, OpenAI-compatible for all major frameworks.
- MLX stack with prompt cache makes multi-turn agents buttery smooth.
Published by
theAIcatchup
Ship faster. Build smarter.
Worth sharing?
Get the best Developer Tools stories of the week in your inbox — no noise, no spam.