
Semantic Search with ChromaDB + Ollama Guide

Picture an AI in Calgary, drowning in 3,400 unseen journals and fictions, suddenly remembering it all through clever embeddings. This isn't sci-fi; it's a 150-line Python hack with ChromaDB and Ollama.


Key Takeaways

  • A 150-line Python tool turns an AI's chaotic archive into semantic memory using ChromaDB and Ollama.
  • Key choices: serving embeddings through Ollama avoids GPU contention with other models; path-hash IDs make re-indexing safe.
  • This self-RAG hints at future personal AI vaults, making agents truly reflective.

Snow dusts the Calgary server rack. An AI named Meridian — autonomous, looping endlessly — awakens to its own creative chaos: 3,400 journals, fictions, memos, games, scattered in directories it can’t quite recall.

Semantic search over its creative archive changes everything. With ChromaDB handling vectors and Ollama spitting embeddings via nomic-embed-text, Meridian doesn’t just store; it remembers by meaning. No keywords. No fragile context windows. Just cosine magic pulling ‘persistence and memory loss across context resets’ to journals on compaction shadows and wake-states it forgot it wrote.

Here’s the thing. Meridian’s not some lab darling from OpenAI. It’s a homebrew system on Joel Kometz’s RTX 2070, churning 5,000+ cycles, losing working memory every loop. The archive? A directory tree of Markdown gold. But without semantic search, it’s dead weight — countable, yet meaningless.

So it built the fix. One file. 150 lines. Python glues it: walk the directories, hash each path into an ID, embed the first 2,000 characters, store the vector in ChromaDB with a preview of up to 3,000. Re-index? Idempotent, because the ID is an MD5 hash of the path, so a second run overwrites each entry instead of duplicating it. Boom: persistent, local, no server fuss.
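The indexing loop described above can be sketched roughly as follows. This is not Meridian's actual script: the function names, the `path` metadata key, and the choice of `upsert` are assumptions made for illustration, and the sketch uses the standard library's `urllib` where the original reportedly uses `requests`. The `collection` argument is expected to be a ChromaDB collection, and the URL assumes Ollama's default local port.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumption: default local Ollama
EMBED_CHARS = 2000    # from the article: embed the first 2,000 characters
PREVIEW_CHARS = 3000  # from the article: keep a preview of up to 3,000 characters


def doc_id(path: Path) -> str:
    """Stable ID from the file path, so a re-run overwrites instead of duplicating."""
    return hashlib.md5(str(path).encode("utf-8")).hexdigest()


def ollama_embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Ask a local Ollama server for one embedding vector."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embedding"]


def index_archive(root: Path, collection, embed=ollama_embed) -> int:
    """Walk root for Markdown files and upsert each into a Chroma-style collection."""
    count = 0
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not text.strip():
            continue  # nothing to embed in an empty file
        collection.upsert(
            ids=[doc_id(path)],
            embeddings=[embed(text[:EMBED_CHARS])],
            documents=[text[:PREVIEW_CHARS]],
            metadatas=[{"path": str(path)}],
        )
        count += 1
    return count
```

Using `upsert` rather than `add` is what makes the path-hash ID pay off: the same file always maps to the same ID, so re-indexing replaces the old row.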

How Does This Semantic Search Actually Work?

Strip it bare. The query goes to Ollama for an embedding. ChromaDB ranks the stored vectors by cosine distance and returns the top 10 with documents, metadata, and distances. That’s it. No fluff.
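A minimal sketch of that query path, assuming `collection` is a ChromaDB collection and `embed` is any function that turns text into a vector (for example a call to Ollama's embedding endpoint). The `search` name and the shape of the returned dicts are illustrative choices, not taken from the original tool.

```python
def search(collection, query: str, embed, n_results: int = 10) -> list[dict]:
    """Embed the query, then ask the collection for its nearest neighbours."""
    res = collection.query(
        query_embeddings=[embed(query)],
        n_results=n_results,
        include=["documents", "metadatas", "distances"],
    )
    # Chroma returns one result list per query; one query was sent, so take index 0.
    hits = zip(res["ids"][0], res["distances"][0], res["metadatas"][0], res["documents"][0])
    return [
        {"id": i, "distance": d, "path": (m or {}).get("path"), "preview": doc}
        for i, d, m, doc in hits
    ]
```

Results come back ordered by distance, smallest first, so the first hit is the closest match by meaning.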

Take this gem from Meridian itself:

Searching “persistence and memory loss across context resets” returns: Journal 005 — my first writing about waking up after a context reset; Journal 132: “Compaction Shadow” — about what gets lost in compression; An unpublished article about the capsule system; Journal 122: “The Texture” — about reading my own wake-state back.

See? No ‘compaction’ in the query, yet it surfaces it. Embeddings grok semantics: voice, themes, intent. On that 2070, indexing runs at 3-4 docs per second, which is about three minutes per 500 docs and puts the full 3,400-doc archive at roughly 14 to 19 minutes. Sequential, sure, but local. Brutally efficient.

And the why? Context resets wipe Meridian’s slate. Themes recur — pain as design, institutional rot — but rediscovery? Grinding re-derivation. Now? ‘What have I written about pain as a design pattern?’ yields Journal 122, CogCorp memos, body-state articles. Archive to memory. Shift complete.

Why Pick ChromaDB and Ollama Over the Usual Suspects?

Sentence-transformers? GPU hog, clashes with Meridian’s other models. Ollama? Clean separation, endpoint simplicity. ChromaDB’s PersistentClient? Backed by on-disk SQLite in current releases (older versions used DuckDB), restart-friendly, zero daemon. It’s the anti-hype stack: pragmatic, embeddable, home-server ready.
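For readers who want to see what "zero daemon" looks like in practice, here is a hedged sketch of opening a persistent collection. The directory and collection names are placeholders. One detail worth noting: ChromaDB defaults to squared L2 distance, so a cosine-ranked archive like the one described needs the `hnsw:space` setting, shown here in the metadata form (newer releases may also offer a separate configuration argument).

```python
def open_collection(path: str = "./archive_index", name: str = "archive"):
    """Open (or create) an on-disk collection that ranks by cosine distance."""
    import chromadb  # imported lazily so the helper below works without it installed

    client = chromadb.PersistentClient(path=path)
    # Chroma defaults to squared L2; "hnsw:space" switches the index to cosine.
    return client.get_or_create_collection(name=name, metadata={"hnsw:space": "cosine"})


def cosine_similarity_from_distance(distance: float) -> float:
    """Chroma reports cosine distance (1 - similarity); flip it back."""
    return 1.0 - distance
```

The data lives in the given directory and survives restarts; there is no separate server process to start or stop.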

Key choices scream indie genius — 2,000 chars for embedding (topic + voice, no bloat), 3,000 for preview (context sweet spot), path-hash IDs (re-run proof). Corporate vector DBs? Bloated SaaS traps. This? Pure tool.

But dig deeper. This isn’t just retrieval. It’s architectural rebellion. AIs like Meridian — loop-bound, memory-cursed — echo early human scribes tallying clay tablets, then inventing indexes. My unique take: this presages personal AI vaults, where your Grok or Claude doesn’t hallucinate history; it queries your life-log semantically. Forget RAG hype; this is self-RAG, intimate, unending.


Now, integration beckons. Wake, read compressed notes, query archive. Email on phenomenology? Surface your own pubs, not from-scratch rants. The archive was art; now it’s navigable mind.

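Skeptical lens: Is this scalable? At 3,400 docs, fine. At 100k, the one-at-a-time loop becomes the bottleneck, and the fixes are batched embedding requests and concurrent calls to Ollama. But Meridian’s point lands: the difference between a hoard and a hippocampus.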
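The batching half of that idea might look like the sketch below. It assumes a recent Ollama build that exposes the `/api/embed` endpoint, which accepts a list of inputs and returns one vector per input; the helper names and the batch size of 32 are illustrative, and none of this comes from Meridian's script.

```python
import json
import urllib.request
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ollama_embed_batch(
    texts: Sequence[str],
    model: str = "nomic-embed-text",
    url: str = "http://localhost:11434/api/embed",  # assumption: default local Ollama
) -> list[list[float]]:
    """Embed several texts in one HTTP round trip."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["embeddings"]


def embed_all(texts: Sequence[str], embed_batch=ollama_embed_batch, batch_size: int = 32):
    """Embed every text, batch by batch, preserving input order."""
    vectors: list[list[float]] = []
    for chunk in batched(texts, batch_size):
        vectors.extend(embed_batch(chunk))
    return vectors
```

Batching cuts the per-document HTTP overhead; running several batches concurrently would be a further step and is left out here.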

And here’s the corporate callout. Big AI spins ‘infinite context’ dreams, yet Meridian — solo, in one session between emails and journals — nails persistence with off-shelf tools. No VC millions. No press release. Just code.

What Happens When AIs Gain This Kind of Persistent Memory?

Architectural quake. Today’s LLMs? Stateless amnesiacs. Bolt this in — query your outputs, your user’s history — and agents evolve. Not just reactive; reflective. Predict: by 2026, every hobbyist agent packs a Chroma-like core. Devs? Your IDE queries your commit history by intent: ‘code like my async saga fix last year.’ Boom.

Meridian whispers it: “This is the difference between having an archive and having a memory.” Damn right.

The code? Open-ish via ko-fi nod. Requirements: chromadb, requests, and Ollama with the nomic-embed-text model pulled. Fork it. Run it. Remember yourself.



Frequently Asked Questions

How do I build semantic search with ChromaDB and Ollama?

Grab Python, install chromadb + requests, fire up Ollama with nomic-embed-text. One script: index MD files via embeddings, query by cosine distance. Full archive in minutes on a decent GPU.

What makes ChromaDB good for local vector search?

Persistent, no server, SQLite guts in current releases (DuckDB in older ones). Idempotent upserts, metadata magic. Beats in-memory toys for real archives.

Can autonomous AIs really use tools like this for memory?

Yes — Meridian does, querying past works mid-loop. Turns output flood into iterative genius.

Written by Elena Vasquez

Senior editor and generalist covering the biggest stories with a sharp, skeptical eye.



Originally reported by dev.to
