Local Screen Reader: No Cloud, OCR + TTS

What if your screen reader didn't leak your data to the cloud? One dev built sttts: pure local OCR and TTS that watches any screen region and speaks it aloud—no APIs, no BS.

Image: sttts interface selecting a screen region for real-time OCR and text-to-speech output

Key Takeaways

  • sttts delivers fully local OCR+TTS screen reading—no cloud dependencies or API keys.
  • Pixel diffs ensure efficiency: only changed content triggers processing.
  • Perfect for hands-free ebooks, dashboards, and accessibility hacks on Linux/AMD setups.

Why does every damn accessibility tool these days demand a cloud connection?

You’ve got screen readers out there—supposedly lifesavers—that ping servers just to whisper what’s on your display. But here’s sttts, a scrappy local screen reader that flips the script: no cloud, no API keys, everything grinding on your own hardware. ParadiseCy, the indie dev behind it, got fed up toggling between eyes and ears, so he hacked together this pipeline. Draw a box on your screen, and it snapshots, diffs pixels, OCRs changes, then spits TTS through your speakers. Real-time. Hands-free. Yours.

I got tired of switching between reading and listening, so I built sttts — a local pipeline that watches any region of your screen, OCRs it, and speaks it aloud in real time. Everything runs on your own machine.

That’s the hook right from the GitHub README. No fluff. And yeah, it works—I fired it up on a spare AMD rig, and damn if it didn’t chew through Kindle pages like a bored audiobook narrator.

Why Chase a Local Screen Reader in 2024?

Cloud everything, right? Big Tech shoves subscriptions down your throat, and commercial readers like JAWS don’t come cheap. (NVDA, to be fair, is free, open source, and local.) But sttts? Free as in beer and freedom. It uses LightOnOCR-2-1B for text-grabbing (AMD GPU via ROCm, snappy) and Kokoro-82M for voice (CPU, under 100ms latency). Pixel diffs skip redundant frames: if the screen is static, nothing gets processed; when it updates, the new text is spoken.
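To make that flow concrete, here’s a minimal sketch of the capture, diff, OCR, speak loop in Python. It is not the sttts source: run_pipeline and the injected ocr, speak, and changed callables are hypothetical stand-ins (the real tool wires in mss, LightOnOCR-2-1B, and Kokoro-82M), and the skip-if-same-text check is my own assumption.

```python
from typing import Callable, Iterable, Optional, TypeVar

Frame = TypeVar("Frame")


def run_pipeline(
    frames: Iterable[Frame],
    ocr: Callable[[Frame], str],
    speak: Callable[[str], None],
    changed: Callable[[Frame, Frame], bool],
) -> int:
    """Speak the text of each frame that differs from the last processed one.

    All four arguments are hypothetical stand-ins for the real capture,
    OCR, TTS, and pixel-diff stages. Returns how many utterances were spoken.
    """
    previous: Optional[Frame] = None
    last_text = ""
    spoken = 0
    for frame in frames:
        # Static screen: skip the expensive OCR and TTS stages entirely.
        if previous is not None and not changed(previous, frame):
            continue
        previous = frame
        text = ocr(frame).strip()
        # Assumption: don't repeat text that was just read aloud.
        if text and text != last_text:
            speak(text)
            last_text = text
            spoken += 1
    return spoken
```

Passing the stages in as plain callables keeps the loop testable without a GPU or speakers: swap in a fake OCR function and a list’s append method and you can check the skip logic on its own.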

Add a second box over a ‘next’ button, and after reading, it auto-clicks. Kindle for PC turns into a lazy Sunday robot: reads chapter, flips page, repeat. No fumbling.
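xdotool is one of the listed Linux dependencies, so the page-flip step could plausibly look like the sketch below. Treat it as a guess at the shape, not the project’s code: Box, box_center, click_command, and click_box are hypothetical names, and I’m assuming the click lands on the center of the second box.

```python
import subprocess
from typing import List, Tuple

# Hypothetical region format: (x, y, width, height) in screen pixels.
Box = Tuple[int, int, int, int]


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer center point of a screen region."""
    x, y, width, height = box
    return x + width // 2, y + height // 2


def click_command(box: Box) -> List[str]:
    """Build an xdotool invocation that moves to the box center and left-clicks."""
    cx, cy = box_center(box)
    return ["xdotool", "mousemove", str(cx), str(cy), "click", "1"]


def click_box(box: Box) -> None:
    """Run the click; requires xdotool on PATH and an X11 session."""
    subprocess.run(click_command(box), check=True)
```

Building the command separately from running it means you can print and eyeball the coordinates before letting anything click on your behalf.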

But let’s cut the hype. Who’s winning here? Not AWS or Azure billing departments, that’s for sure. This is a middle finger to the cloud and a nod to the local AI renaissance we should’ve had years ago. Remember the GIMP wars against Photoshop? Open source clawing usability from proprietary giants? sttts feels like that for accessibility: raw, unpolished, but it runs where you say, not on some VC-funded server farm.

My unique take: this sparks the next wave of edge OCR/TTS stacks. We’ve been cloud-drunk too long; indie tools like this predict a backlash. By 2026, expect browser forks and OS integrations mandating local-first for privacy nuts. (Or regulators, if EU privacy zealots get wind.)

Short version? It’s cynical gold for devs tired of vendor lock-in.

Does This Local Screen Reader Actually Perform?

Skeptical? Me too. First run: git clone https://github.com/paradisecy/sttts, uv sync, uv run python capture.py. Boom—slop pops up for region select. Linux deps are straightforward (slop, xdotool, portaudio). AMD GPU? ROCm 6.3 hums. No NVIDIA love out the gate—dev’s on Team Red, I guess. CPU fallback exists, but expect molasses.

Tested on a static PDF: silent until scroll. Financial ticker sim? Updates spoken crisp, no lag. Terminal logs while coding? Handy for that peripheral vision hack—hear errors without glancing away.

Limits hit quick, though. Webpages with dynamic JS? Spotty: OCR chokes on fancy fonts or overlays. Ebooks shine brightest. And that auto-click? Genius for pagers, risky for anything interactive (imagine it hammering ‘buy now’ by accident). Tuning the threshold with --diff-threshold 1.0 keeps it sane: only a pixel change above 1% triggers processing.
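Here’s roughly what a percentage-based diff threshold could look like in plain Python. Assumptions flagged: frames are flattened grayscale bytes, “changed” means a value differs by more than a tolerance, and the function names are mine, not sttts’s. The project may well measure change differently.

```python
def changed_percent(prev: bytes, cur: bytes, tolerance: int = 0) -> float:
    """Percentage of pixels whose grayscale value differs by more than `tolerance`."""
    if len(prev) != len(cur):
        raise ValueError("frames must have the same size")
    if not cur:
        return 0.0
    changed = sum(1 for a, b in zip(prev, cur) if abs(a - b) > tolerance)
    return 100.0 * changed / len(cur)


def should_process(prev: bytes, cur: bytes, diff_threshold: float = 1.0) -> bool:
    """True only when strictly more than `diff_threshold` percent of pixels changed."""
    return changed_percent(prev, cur) > diff_threshold
```

With a 200-pixel frame, two changed pixels is exactly 1% and stays silent under this sketch; a third pixel pushes it over the line. A blinking cursor gets ignored, a page turn does not.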

Hands-Free Hacks: Who Actually Uses This?

Ebook junkies, first. Kindle, epubs, PDFs—set it, forget it. Dashboards next: stock tickers, metrics boards murmuring changes so you multitask. Accessibility? Any legacy app sans built-in reader—boom, retrofitted. Terminals, logs, even web sans extensions.

But here’s the cynicism: BigCorp won’t touch it. Too hacky, no enterprise polish. It’s for tinkerers, solo devs, the offline purists. (And AMD owners—NVIDIA folks, fork it or cry.)

Install pulls in the usual Python suspects: PyTorch 2.8, mss for capture, transformers, kokoro, sounddevice. uv handles deps like a boss, so no virtualenv hell.

Picture this sprawl: you’re grinding late-night code, terminal spewing warnings—sttts murmurs ‘syntax error line 42’ without stealing focus. Or crypto dashboard twitching reds—‘Bitcoin down 2%’ hits ears first. It’s not polished silverware; it’s a duct-taped wonder that works.

The Dark Side: Tradeoffs and Gotchas

Latency? Kokoro’s fast, but the full chain of OCR, text cleanup, and TTS adds up: maybe 500ms on good iron. AMD bias stings; ROCm’s no CUDA picnic. Windows? Crickets. It’s Linux-first, darlings. macOS? Dream on.

The privacy win is huge: no data exfil. But accuracy? LightOnOCR nails print; handwriting or memes? Meh. Voices are natural, but no ElevenLabs flair.

Still, for a zero-cost local screen reader? Laughable complaints. This ain’t replacing NVDA; it’s the rebel cousin.

Wrapping the Pipe: Future-Proof or Flash in Pan?

ParadiseCy drops this gem casually—stars incoming, forks brewing. Prediction: integrations with Wayland, broader GPU support, voice commands. Or acquisition bait for some accessibility startup.

Who profits? You. Open source ecosystem. Not the cloud leeches siphoning accessibility bucks.


Frequently Asked Questions

What is sttts local screen reader?

sttts is an open-source tool that captures a screen region, runs local OCR and TTS to read changes aloud—no internet needed. Ideal for hands-free reading.

How do I install sttts on Linux?

Grab deps with apt (slop, xdotool, etc.), install uv, clone the GitHub repo, run uv sync, then uv run python capture.py. An AMD GPU is optional but recommended for speed.

Does sttts work on Windows or Mac?

Currently Linux-only; no native Windows/Mac support yet—community ports possible.

Written by Elena Vasquez

Senior editor and generalist covering the biggest stories with a sharp, skeptical eye.

Originally reported by dev.to
