🤖 AI Dev Tools

Gemma 4 at 21 tok/s on Ryzen Mini PC: Vulkan's Messy Win

Forget cloud LLMs. A $500 Ryzen mini PC cranks Gemma 4 at 21 tokens per second—locally. But it's a Vulkan-fueled headache that exposes local AI's dirty secrets.

Minisforum UM760 Slim Ryzen mini PC running llama.cpp with Gemma 4 model inference

⚡ Key Takeaways

  • Ryzen mini PC with 96GB RAM hits 21 tok/s on Gemma 4 via llama.cpp Vulkan—no cloud needed. 𝕏
  • Setup's messy: BIOS tweaks, compiles, OOM fights. Not for casuals. 𝕏
  • Unique edge: Local APIs mimic OpenAI, perfect for VS Code/Copilot alternatives. 𝕏
Published by

theAIcatchup

Ship faster. Build smarter.

Worth sharing?

Get the best Developer Tools stories of the week in your inbox — no noise, no spam.

Originally reported by dev.to

Stay in the loop

The week's most important stories from theAIcatchup, delivered once a week.