Did you know that most supposedly successful VM migrations are actually time bombs? It’s a grim thought, but one we need to confront, especially once the consultants pack up their briefcases, the project lead closes the ticket, and everyone breathes a sigh of relief. Three weeks later, the silence is deafening. Backup jobs are failing. Monitoring dashboards are dark. Nobody knows what ‘normal’ even looks like on the new platform. The VM conversion worked, all right. The migration? Not so much.
This is the insidious lift-and-shift KVM fallacy. And let’s be clear: it’s not a KVM problem. It’s a scoping problem. A colossal, blindingly obvious, scoping problem. Most VMware-to-KVM migration plans fixate on the hypervisor itself – the shiny new ESXi replacement. Everything built around that hypervisor? Suddenly, it’s “someone else’s project.” That’s where the Operating Model Gap creeps in, a gaping hole left by faulty assumptions.
Lift-and-shift KVM means compute moves. Disk images transfer. Network configurations get ported. VM settings are painstakingly recreated on the other side. From a purely data-plane perspective, it looks like a success. The workloads are running, aren’t they? But that’s like saying a car is fixed because the engine starts, even though the steering wheel is gone.
What doesn’t move? Oh, just the entire operational backbone.
- Operational runbooks referencing vCenter constructs. Gone.
- Backup architecture built against vSphere APIs. Poof.
- Monitoring thresholds calibrated to vSphere metrics. Meaningless.
- Provisioning workflows targeting vCenter endpoints. Dead.
- Snapshot behavior assumptions encoded in recovery procedures. Useless.
- Storage policy logic tied to vSAN semantics. Erased.
- Identity and access models mapped to vCenter RBAC. Invalid.
- Operator muscle memory built over years of vCenter navigation. Unusable.
None of this makes it into the migration plan. All of it breaks after cutover. The Operating Model Gap is the yawning chasm between what the plan claimed to capture and what the platform actually required to function. Every single item on that list is a component of the operating model. The hypervisor swap breaks every one of them, and the plan for the swap addresses precisely none.
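One way to start sizing that gap is to inventory which runbooks and scripts mention VMware-specific constructs at all. The following is a minimal sketch, not a complete audit tool: the marker patterns, the `scan_text` and `scan_documents` helpers, and the sample document names are all hypothetical, and a real inventory would need patterns tuned to the tooling your team actually names.

```python
import re
from collections import Counter

# Hypothetical marker patterns. Extend these with whatever your runbooks reference.
VSPHERE_MARKERS = {
    "vCenter": re.compile(r"\bvcenter\b", re.IGNORECASE),
    "vSphere API/CLI": re.compile(r"\b(vsphere|powercli|govc|pyvmomi)\b", re.IGNORECASE),
    "vSAN": re.compile(r"\bvsan\b", re.IGNORECASE),
    "NSX": re.compile(r"\bnsx\b", re.IGNORECASE),
    "VADP": re.compile(r"\bvadp\b", re.IGNORECASE),
    "DRS/vMotion": re.compile(r"\b(drs|vmotion)\b", re.IGNORECASE),
}


def scan_text(text):
    """Count the lines in `text` that mention each VMware-specific marker."""
    hits = Counter()
    for line in text.splitlines():
        for label, pattern in VSPHERE_MARKERS.items():
            if pattern.search(line):
                hits[label] += 1
    return hits


def scan_documents(docs):
    """Map document name -> marker counts, keeping only documents with hits."""
    report = {}
    for name, text in docs.items():
        hits = scan_text(text)
        if hits:
            report[name] = dict(hits)
    return report


if __name__ == "__main__":
    # Made-up sample content standing in for real runbooks.
    sample = {
        "restore-runbook.md": "1. Log in to vCenter\n2. Revert the VADP snapshot\n",
        "rotate-logs.md": "Run logrotate on the syslog host\n",
    }
    for name, hits in scan_documents(sample).items():
        print(name, hits)
```

A text scan like this only catches explicit references. It says nothing about operator muscle memory or assumptions that were never written down, so treat the result as a floor, not a total.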
The framing that spawns these disastrous lift-and-shift KVM plans is deceptively simple: VMware equals ESXi. Replace ESXi with KVM. Migration complete. That framing is, to put it mildly, horseshit. VMware was never just ESXi. VMware was the control plane your entire operating model was built around, the invisible hand guiding everything.
| What the plan says | What actually changes |
|---|---|
| ESXi → KVM | vCenter (lifecycle and provisioning control) |
| | vMotion semantics (live migration behavior) |
| | vSAN (storage abstraction and policy model) |
| | NSX (network policy and microsegmentation) |
| | vROps / vRealize (observability and alerting logic) |
| | VADP (backup API framework) |
| | DRS (scheduling and placement policy) |
| | Snapshot behavior (application-consistent logic) |
A VMware environment isn’t some hypervisor with a few tacked-on features. It’s a deeply integrated control surface where compute scheduling, storage policy, network segmentation, observability, and recovery operations all converge. Replace ESXi with KVM, and every single one of those layers needs a replacement or a complete rebuild. And unlike the VMware stack, KVM doesn’t ship with an instruction manual for assembling them all.
KVM is a kernel module. The management plane, the storage architecture, the network abstraction, the observability stack—that’s all on you to assemble, integrate, and operate. That assembly is the real migration work. The work that most lift-and-shift plans conveniently forget to scope.
The Operating Model Test: What If vCenter Vanished?
If vCenter disappeared tomorrow, what percentage of your operating model would vanish with it? For the vast majority of VMware shops, the honest answer is likely between 60% and 90%. That percentage is the scope of what a lift-and-shift to KVM spectacularly fails to address. These migrations don’t fail at cutover. They fail in the trenches of operations. The failures are predictable, they arrive in a sequence, and they are almost never in the migration plan.
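To answer that question with a number rather than a gut feeling, you can tag each operational procedure by whether it still works without vCenter and compute the share that does not. This is a rough sketch under stated assumptions: the `Procedure` type, the `vcenter_dependency_pct` function, and the five sample entries are invented for illustration, and counting every procedure equally ignores how critical each one is.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    name: str
    depends_on_vcenter: bool


def vcenter_dependency_pct(procedures):
    """Return the share (0-100) of procedures that stop working without vCenter."""
    if not procedures:
        return 0.0
    dependent = sum(1 for p in procedures if p.depends_on_vcenter)
    return 100.0 * dependent / len(procedures)


if __name__ == "__main__":
    # Illustrative inventory only; replace with your own procedure list.
    inventory = [
        Procedure("Provision a new VM", True),
        Procedure("Restore from backup", True),
        Procedure("Grant operator access", True),
        Procedure("Respond to a capacity alert", True),
        Procedure("Patch the guest OS", False),
    ]
    print(f"{vcenter_dependency_pct(inventory):.0f}% of procedures depend on vCenter")
```

With this made-up inventory, four of five procedures depend on vCenter, so the script reports 80%.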
Why Did It All Break?
You didn’t just replace ESXi. You nuked vCenter. vCenter was the operational control surface for everything: provisioning new workloads, managing VM lifecycle, enforcing placement policy, controlling access, automating tasks. Move to KVM, and vCenter is gone. Everything that pointed at it needs a new target.
The KVM ecosystem offers options: libvirt for direct management, Proxmox VE for a GUI-centric approach, oVirt for a vCenter-like experience, OpenStack for massive cloud-scale orchestration. Each represents a fundamentally different operating model. None is a drop-in replacement. A team that spent a decade operating vCenter doesn’t magically know how to operate any of these under pressure at 2 AM. This is the first stall point. Not because a management plane doesn’t exist, but because the operating model loses its control surface. The team has to rebuild operational confidence from absolute zero.
And it’s not just the control plane. You didn’t lose shared storage. You lost the storage abstraction your platform behavior depended on. vSAN provided a distributed storage fabric with defined behavior around replication, failure domains, snapshot consistency, and policy-based placement. That abstraction encoded a set of assumptions that your entire backup architecture, your recovery procedures, and your performance baselines were built against. In a KVM environment, that abstraction is gone. You’re now operating whatever storage backend you chose, whether that’s Ceph, NFS, or something else entirely, and every assumption you made about its behavior needs a complete re-evaluation. Suddenly, your backup verification jobs aren’t just silently failing; they’re screaming in your face.
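One assumption worth checking explicitly on the new platform is that every VM still has a recent successful backup. The sketch below assumes you can export a last-success timestamp per VM from whatever backup tool you run on the KVM side. The `stale_backups` function, the 26-hour threshold, and the VM names are hypothetical and not part of any specific product.

```python
from datetime import datetime, timedelta, timezone


def stale_backups(last_success, now, max_age=timedelta(hours=26)):
    """Return sorted VM names whose last successful backup is missing or too old.

    `last_success` maps VM name -> timezone-aware datetime, or None if the VM
    has never had a successful backup on the new platform.
    """
    stale = []
    for vm in sorted(last_success):
        ts = last_success[vm]
        if ts is None or now - ts > max_age:
            stale.append(vm)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Made-up data standing in for an export from your backup tool.
    last_success = {
        "db01": now - timedelta(hours=2),
        "web01": now - timedelta(hours=30),
        "app01": None,
    }
    for vm in stale_backups(last_success, now):
        print(f"STALE BACKUP: {vm}")
```

Treating a missing timestamp as stale is a deliberate choice here: a VM that has never been backed up on the new platform should be the loudest alert, not an ignored edge case.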