Here’s a statistic to make you squint: 50 lines. That’s it. Apparently, you can build a working Linux container with Go using a mere fifty lines of code. Fifty. Forget your Dockerfiles, your Kubernetes YAMLs, your entire cloud-native empire. Someone decided we could just… write it ourselves in a weekend project. And honestly? They’re mostly right.
This isn’t some abstract concept for the theoretical elite. This is a hands-on dive into the guts of what makes a container behave. Part two of this series picks up where the last left off, fixing a rather gaping security hole. You thought CLONE_NEWUTS and forking were isolation? Cute. Turns out, your shiny new container could still cd .. and wreak havoc on the host. Amateur hour.
The Root of the Problem (Literally)
The magic word is chroot. Short for ‘change root,’ this humble syscall lets you tell a process, ‘Hey, everything you think is the universe? It’s just this folder now.’ For that process, anything outside is pure, unadulterated oblivion. The original code adds a syscall.Chroot(pwd) call, pointing our new universe to the current directory. Simple enough.
Except it spectacularly implodes. Run sudo go run main.go run /bin/bash and you’re met with a delightful panic: fork/exec /bin/bash: no such file or directory. Why? Because when you chroot, you’re not just changing a directory; you’re fundamentally altering the process’s filesystem perspective. It’s looking for /bin/bash within its own tiny, new universe – which, at that point, is just your Go project folder. Oops.
We just told our process that our current directory is the entire universe. So, when we ask exec.Command to run /bin/bash, it isn’t looking at your computer’s actual hard drive anymore. It is looking inside your project folder for a directory called bin containing an executable called bash.
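To make that lookup concrete, here is a minimal sketch that translates a path as the chrooted process sees it into the host path that actually gets consulted. The names hostView and existsInRoot are hypothetical helpers, not part of the original code, and the existence check is only an approximation: it does not account for absolute symlinks inside the new root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hostView (hypothetical helper) translates a path as seen from inside
// chroot(newRoot) into the host path it actually refers to.
func hostView(newRoot, inside string) string {
	return filepath.Join(newRoot, inside)
}

// existsInRoot (hypothetical helper) reports whether inside would resolve
// after chroot(newRoot). Approximation: absolute symlinks inside newRoot
// are followed against the host root here, not against newRoot.
func existsInRoot(newRoot, inside string) bool {
	_, err := os.Stat(hostView(newRoot, inside))
	return err == nil
}

func main() {
	pwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// With chroot(pwd), "/bin/bash" means "<pwd>/bin/bash" on the host.
	fmt.Printf("/bin/bash would be looked up at %s\n", hostView(pwd, "/bin/bash"))
	fmt.Println("found:", existsInRoot(pwd, "/bin/bash"))
}
```

Run from a typical Go project folder, the second line should report that nothing was found, which is the same gap the fork/exec error is complaining about.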
This is where common sense kicks in. You need an actual root filesystem. You need a place with /bin, /usr, /lib: all the plumbing a basic shell expects. And how do you get one? You could, you know, spend hours painstakingly assembling it. Or, you could do what these brave souls did: use Docker itself. A quick docker export $(docker create ubuntu) > ubuntu.tar, then mkdir ubuntu-rootfs and tar -xf ubuntu.tar -C ubuntu-rootfs (tar’s -C needs the directory to exist first), and voilà. You have a miniature Ubuntu OS ready to be chrooted into. Now, point the chroot call to that ubuntu-rootfs folder, change the working directory to / right after it, and suddenly, /bin/bash works. You can try to cd .. and find yourself back where you started: inside the new root, / is its own parent, so the shell can’t simply walk back up into the host. Nicely done.
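The chroot-then-chdir step can be sketched roughly as follows. This is not the original article’s code: enterRoot is a hypothetical helper name, the ubuntu-rootfs path is assumed to sit in the current directory, and the program needs to run as root on Linux.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// enterRoot (hypothetical helper) makes rootfs the process's root directory
// and then moves the working directory inside it. Without the chdir, the
// working directory would still point outside the new root.
func enterRoot(rootfs string) error {
	if err := syscall.Chroot(rootfs); err != nil {
		return fmt.Errorf("chroot %s: %w", rootfs, err)
	}
	if err := os.Chdir("/"); err != nil {
		return fmt.Errorf("chdir /: %w", err)
	}
	return nil
}

func main() {
	// Assumption: ubuntu-rootfs was extracted next to this program.
	if err := enterRoot("ubuntu-rootfs"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// /bin/bash is now resolved inside ubuntu-rootfs, not on the host.
	cmd := exec.Command("/bin/bash")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Returning wrapped errors instead of panicking keeps the failure readable when the rootfs folder is missing or the program is run without sudo.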
PID Namespaces and the /proc Problem
But isolation isn’t just about filesystems. Remember ps aux inside a proper container? You see a handful of processes, not the whole host’s chaotic mess. Try running it now, and you get an error: Error, do this: mount -t proc proc /proc. The ps command trawls /proc, a special virtual filesystem that shows live process data. Your isolated rootfs has an empty /proc directory. The kernel hasn’t been told to populate it with your container’s process list. So, ps starves.
To fix this, you need two things: PID namespaces and mounting the proc filesystem. PID namespaces give your container its own process IDs, starting from 1. Crucially, the code adds syscall.CLONE_NEWPID to the Cloneflags in SysProcAttr. This is the key to making ps think it’s only seeing its own little world. The CLONE_NEWNS flag, though confusingly named (mount namespaces came first, so they got the generic ‘new namespace’ name), is for Mount namespaces, which give the container its own view of mount points, and that matters as soon as you start mounting /proc.
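As a rough illustration of those flags, the sketch below starts a shell in new UTS, PID and mount namespaces and asks it for its own PID. containerAttr is a hypothetical helper name, and running /bin/sh directly is a simplification for the example rather than the structure of the original program. It needs root on Linux.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// containerAttr (hypothetical helper) bundles the namespace flags discussed
// above: UTS from part one, plus PID and mount namespaces.
func containerAttr() *syscall.SysProcAttr {
	return &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
}

func main() {
	// The child is the first process in its new PID namespace, so the shell
	// should see itself as PID 1 when this is run as root.
	cmd := exec.Command("/bin/sh", "-c", "echo pid inside: $$")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = containerAttr()
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run:", err)
		os.Exit(1)
	}
}
```

Note that the new PID namespace alone does not fix ps: until a matching proc filesystem is mounted, /proc still shows whatever was mounted there before.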
Finally, the code includes defer statements to mount and unmount the proc filesystem. This is the crucial step that actually populates /proc with data for the processes running within the container’s PID namespace. You’re essentially telling the kernel, ‘Hey, this /proc directory? Make it show these processes.’
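A minimal sketch of that mount-and-clean-up pairing might look like this. mountProc and run are hypothetical names, and the code assumes it executes as root, already inside the new PID and mount namespaces and after the chroot, so that /proc refers to the container’s own directory.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// mountProc (hypothetical helper) mounts a proc filesystem at target and
// returns a function that unmounts it again.
func mountProc(target string) (func() error, error) {
	// Equivalent to: mount -t proc proc <target>
	if err := syscall.Mount("proc", target, "proc", 0, ""); err != nil {
		return nil, fmt.Errorf("mount proc on %s: %w", target, err)
	}
	return func() error { return syscall.Unmount(target, 0) }, nil
}

// run mounts /proc, lists processes, and unmounts on the way out.
func run() error {
	unmount, err := mountProc("/proc")
	if err != nil {
		return err
	}
	defer unmount() // best-effort cleanup; the error is ignored in this sketch

	cmd := exec.Command("ps", "aux")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the defer inside run, rather than in main next to os.Exit, makes sure the unmount actually executes, since os.Exit skips deferred calls.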
A Glimpse Behind the Curtain
This is it. The core of it, anyway. You’ve got filesystem isolation via chroot and namespaces, and process isolation via PID namespaces. It’s a far cry from a full-blown container runtime like containerd or runc, which rely on more sophisticated syscalls like pivot_root for even tighter isolation and more complex mount propagation. But for understanding the fundamental building blocks? This 50-line wonder is surprisingly effective. It strips away the abstraction and shows you the bare metal. It’s a reminder that the powerful tools we use daily are built on surprisingly simple, albeit slightly arcane, Linux primitives. Don’t let the complexity of modern orchestration fool you; the foundations are still here, waiting to be explored.
This exercise highlights how much unnecessary ceremony often surrounds what are fundamentally basic OS features. While production systems need layers of security and management, this Go example proves that the core isolation mechanisms are accessible and, frankly, not that scary. It’s a valuable lesson for anyone who feels intimidated by the sheer scale of cloud-native tooling.
Is This a Real Container Runtime?
No, not really. This is a highly simplified demonstration for educational purposes. Production container runtimes like Docker or containerd use a more extensive set of Linux namespaces, cgroups for resource control, and more advanced filesystem manipulation (like pivot_root) for security and functionality. This code demonstrates the concept of isolation using basic Go and Linux syscalls.
What are Linux Namespaces and chroot?
Linux namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources, while another set of processes sees a different set. Key namespaces include PID, NET, UTS, MNT, IPC, and USER. chroot (change root) is a syscall that changes the apparent root directory for the current running process and its children. It’s a form of filesystem isolation but doesn’t offer process or network isolation on its own.
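For reference, each of the namespaces listed above corresponds to a CLONE_NEW* constant in Go’s syscall package on Linux. The lookup table and the flagsFor helper below are hypothetical conveniences for illustration, not something the original code defines.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
)

// namespaceFlags maps the short namespace names used in the text to the
// clone flags that request them.
var namespaceFlags = map[string]uintptr{
	"PID":  syscall.CLONE_NEWPID,
	"NET":  syscall.CLONE_NEWNET,
	"UTS":  syscall.CLONE_NEWUTS,
	"MNT":  syscall.CLONE_NEWNS, // the oddly named mount namespace flag
	"IPC":  syscall.CLONE_NEWIPC,
	"USER": syscall.CLONE_NEWUSER,
}

// flagsFor (hypothetical helper) ORs together the flags for the given names.
func flagsFor(names ...string) (uintptr, error) {
	var flags uintptr
	for _, name := range names {
		f, ok := namespaceFlags[name]
		if !ok {
			return 0, fmt.Errorf("unknown namespace %q", name)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	flags, err := flagsFor("UTS", "PID", "MNT")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The result can be assigned to syscall.SysProcAttr.Cloneflags.
	fmt.Printf("Cloneflags: %#x\n", flags)
}
```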
Can I Run Docker Containers with This Go Code?
Absolutely not. This code builds its own minimal container from scratch using OS primitives. It does not have the capability to interpret Docker images or interact with the Docker daemon. It’s a standalone example of how containerization works at a fundamental level.