
Databases on Kubernetes: StatefulSets & Beyond

Kubernetes was built for stateless apps. Databases, not so much. Here's how to wrestle stateful workloads into K8s without losing your data.

Diagram showing a Kubernetes cluster with database pods connected via StatefulSets, illustrating primary and replica roles and replication flow.

Key Takeaways

  • Kubernetes was initially designed for stateless workloads; running stateful databases requires careful consideration and specific K8s features like StatefulSets.
  • The three main options for running databases on Kubernetes are cloud-managed services, vendor-specific managed services, and self-managed solutions (ideally using Kubernetes Operators).
  • StatefulSets provide stable network identities, persistent storage, and ordered deployment for pods, crucial for database reliability, but don't solve all stateful challenges on their own.

Everyone thought Kubernetes was for ephemeral, stateless workloads. You know, those apps where you can spin up a new instance, destroy an old one, and not blink. It’s perfect for a web server. A database? Not so much.

Databases are the poster children for stateful systems. They hoard data on disk, they have this whole primary/replica song and dance, and if you poke them with a stick the wrong way, you risk data corruption or the dreaded split-brain scenario. Classic.

Now, the K8s community, bless their hearts, eventually wised up. They introduced StatefulSets — which have been pretty stable since version 1.9. But even with these fancy new toys, running a database in production still requires more than a passing glance. You need actual knowledge. And planning. Lots of planning.

So, what are your options when your app needs a database humming along in Kubernetes? Three main paths, as it turns out.

The ‘Cloud Provider Did It For Me’ Route

This is the easy button. Managed services from AWS, GCP, or Azure handle backups, patching, and high availability for you, and you can get started lickety-split. The downside? You're not the DBA. Slow queries are still your problem, not theirs. Vendor lock-in? Baked in. The cost scales with usage, which can get ugly. Oh, and if you need air-gapped environments or have strict data sovereignty rules? Tough luck.

The ‘Vendor-Specific’ Approach

Then there are managed services from database vendors themselves, like Crunchy Data for PostgreSQL or Percona for MySQL. These are optimized for their specific database engine. They boast deep expertise. Sounds good. But it’s still vendor lock-in, just a different flavor. And you’re usually limited to just one database engine.

The ‘DIY Warrior’ Path

This is where you roll up your sleeves. Full control. No vendor lock-in. Works anywhere – on-prem, any cloud. Most flexible. The catch? You need deep Kubernetes and database knowledge. All operational tasks? They fall squarely on your shoulders. And if you mess it up, the risk is high. It’s also incredibly time-consuming to maintain.

But here’s the kicker: the DIY route can be made dramatically safer and simpler. How? Kubernetes Operators. More on that later.

Why StatefulSets Aren’t Just Fancy Deployments

A regular Kubernetes Deployment is like a box of identical LEGO bricks. Pod names are random, interchangeable. Destroy one, spin up another – no biggie. But databases? They need stability. They need identity.

A StatefulSet changes the game. It gives each pod a persistent, predictable identity. Think myapp-0, myapp-1, myapp-2. These names stick. If myapp-1 crashes and gets rescheduled, it comes back as myapp-1, not some random string.

This stable identity brings crucial features:

  1. Ordered Startup: Pods boot up one by one, in order. myapp-1 won’t start until myapp-0 is alive and well. Essential for syncing.
  2. Stable Network Identity: Each pod gets a predictable DNS name. myapp-0.myapp-svc.default.svc.cluster.local. Replicas know exactly where to find the primary.
  3. Stable Storage: Each pod gets its own dedicated disk (via PersistentVolumeClaim). If myapp-1 moves nodes, it reconnects to its own disk. No data loss.
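One detail the list above depends on: the serviceName a StatefulSet references must point at a headless Service (clusterIP: None), which is what creates those per-pod DNS records. A minimal sketch for the myapp example:

```yaml
# Headless governing Service for the StatefulSet below.
# clusterIP: None means DNS returns individual pod IPs
# (myapp-0.myapp-svc, myapp-1.myapp-svc, ...) instead of a virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  clusterIP: None
  selector:
    app: myapp
  ports:
    - name: mysql
      port: 3306
```

Without this Service, the pods still get stable names, but nothing in cluster DNS resolves them.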

A simplified StatefulSet looks something like this:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp
spec:
  serviceName: "myapp-svc"
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
  volumeClaimTemplates:          # ← Each pod gets its own PVC
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Replication: The Heartbeat of High Availability

In a typical three-replica database StatefulSet, there’s a strict protocol:

Rule #1: All writes go to the primary. By convention, myapp-0 acts as the primary and is the only pod that accepts writes. Note that Kubernetes doesn't enforce this; your database's replication setup does. Clients connect to it using its stable DNS name.

Rule #2: Reads can be distributed. Replicas (myapp-1, myapp-2) handle read-only traffic, which boosts read throughput. You connect to them directly by pod DNS name, or resolve the headless service, which hands back every pod's IP and leaves the client to pick one. That's DNS round-robin, not true load balancing; if you need smarter routing, put a regular read-only Service or a proxy in front of the replicas.
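As a concrete sketch of the two rules (the hostnames come from the StatefulSet example in this article; the ConfigMap itself and its key names are hypothetical), an application might wire up its endpoints like this:

```yaml
# Hypothetical app config: writes target the primary's stable per-pod DNS
# name; reads target the headless Service, which resolves to every pod.
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-db-endpoints
data:
  WRITE_HOST: "myapp-0.myapp-svc.default.svc.cluster.local"  # primary only
  READ_HOST: "myapp-svc.default.svc.cluster.local"           # all replicas
```

The key point is the asymmetry: the write endpoint names exactly one pod, while the read endpoint deliberately resolves to many.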

This setup ensures replication consistency, but it’s not without its perils.

Avoiding the Inconsistency Abyss

What happens when things go sideways? Replication lag is the enemy. If a replica isn’t synced, it might serve stale data. Or worse, if the primary goes down and a replica isn’t fully caught up, you might elect a new primary that’s missing recent writes. Data loss. Split-brain. Nightmares.

This is where careful configuration, monitoring, and a deep understanding of your database’s replication mechanisms become non-negotiable. It’s not just about setting up StatefulSets; it’s about mastering the database’s internal workings within that Kubernetes context.
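One way to wire that monitoring into Kubernetes itself is a readiness probe that takes a lagging replica out of the read pool automatically. A hedged sketch for MySQL 8 on a replica pod (the 30-second threshold and passwordless local login are illustrative assumptions; SHOW REPLICA STATUS requires MySQL 8.0.22+):

```yaml
# Container snippet for a replica pod: mark the pod NotReady (so Services
# stop routing read traffic to it) whenever replication lag exceeds 30s.
readinessProbe:
  exec:
    command:
      - bash
      - -c
      - |
        LAG=$(mysql -h 127.0.0.1 -N -e "SHOW REPLICA STATUS\G" \
              | awk '/Seconds_Behind_Source:/ {print $2}')
        # Fail if lag is unknown (replication broken) or above the threshold.
        [ -n "$LAG" ] && [ "$LAG" != "NULL" ] && [ "$LAG" -le 30 ]
  periodSeconds: 10
  failureThreshold: 3
```

This doesn't prevent lag, but it stops stale replicas from serving reads while they catch up.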

Self-Managed vs. Kubernetes Operators: The Ultimate Showdown

Remember the DIY route? It can be made manageable with a Kubernetes Operator. An Operator is essentially a custom controller that codifies operational knowledge for a specific application—in this case, a database. It automates tasks like:

  • Deploying the database.
  • Handling upgrades.
  • Performing backups and restores.
  • Managing replication and failover.

For self-managed databases, you’re building all that logic yourself, often from scratch. It’s like performing open-heart surgery with a butter knife. Operators provide a more refined, specialized tool.
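To make that concrete, here is roughly what "leveraging an Operator" looks like in practice. With Crunchy Data's PGO installed, a replicated PostgreSQL cluster with backups is declared as a single custom resource; the Operator does the rest. A sketch (field names follow PGO v5; the cluster name, version, and storage sizes are illustrative):

```yaml
# One custom resource stands in for all the StatefulSet/Service/backup
# plumbing you'd otherwise hand-build.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 16
  instances:
    - name: instance1
      replicas: 3                      # Operator manages replication + failover
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
  backups:
    pgbackrest:                        # automated backups, declared not scripted
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi
```

Compare this to the raw StatefulSet earlier: the desired state is the same, but failover, backups, and upgrades become the Operator's job instead of yours.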

This distinction is critical. When someone says ‘running databases on Kubernetes,’ the how is paramount. Are they just dropping a MySQL container into a StatefulSet and praying? Or are they leveraging an Operator designed by experts? The difference is stability vs. chaos.

When to Choose What: A Simple Breakdown

  • Cloud Managed: Easiest, highest vendor lock-in, can be costly. Good for standard apps, less so for specialized needs.
  • Vendor Managed: Deeper database expertise, still vendor lock-in. Good if you’re committed to a specific DB and its vendor ecosystem.
  • Self-Managed (with Operator): Full control, minimal lock-in, complex to set up initially but easier to manage long-term. Best for complex needs, hybrid clouds, or when you absolutely need control.
  • Self-Managed (without Operator): Maximum control, maximum risk, maximum effort. Only for the truly masochistic or those with unique, underserved requirements.

The Verdict

Kubernetes is no longer just for stateless apps. But treating your databases with the same cavalier attitude you might afford a web server is a recipe for disaster. StatefulSets are a huge step forward, but they’re just the foundation. The real work lies in understanding replication, consistency, and, increasingly, leveraging the power of Kubernetes Operators to bring strong, battle-tested operational logic to your stateful workloads.



Written by Jordan Kim

Cloud and infrastructure correspondent. Covers Kubernetes, DevOps tooling, and platform engineering.


Originally reported by dev.to
