🤖 AI Dev Tools

Gemma 4 Hits 85 Tokens/Second on Your Mac – Pip Install Magic

Everyone figured big open models like Gemma 4 would crawl on Apple Silicon. Wrong. One pip, 85 tokens/second, tools included – Ollama's toast.

Benchmark chart showing Gemma 4 at 85 tokens per second on M3 Ultra Mac

⚡ Key Takeaways

  • Gemma 4 runs at 85 tok/s on M3 Ultra via one pip install – beats Ollama decode speed. 𝕏
  • Built-in tool calling for 18 model families, OpenAI-compatible for all major frameworks. 𝕏
  • MLX stack with prompt cache makes multi-turn agents buttery smooth. 𝕏
Published by

theAIcatchup

Ship faster. Build smarter.

Worth sharing?

Get the best Developer Tools stories of the week in your inbox — no noise, no spam.

Originally reported by dev.to

Stay in the loop

The week's most important stories from theAIcatchup, delivered once a week.