How Containers Actually Work: Linux Namespaces and Cgroups Explained
An exploration of the Linux kernel primitives—namespaces and cgroups—that power containerization. Learn how these features create the isolation and resource limits that make Docker possible.
June 2026 · 6 min read
Before Docker, there was no magical docker run. Containers—the kind we actually use in production—are not virtual machines in disguise. They are ordinary Linux processes, tricked into believing they are alone on the machine. The illusion is built on two ancient pieces of Linux kernel machinery: namespaces and cgroups.
Here’s how they work, and why they’re the real reason containers became a revolution—not a fad.
The Prison Break Problem
Imagine a runaway process. It eats 100% CPU, hogs memory, and keeps crashing. Before containers, that process could take down the entire server. Worse, every process could see every other process on the machine, and all of them shared one view of the filesystem and one network stack. Beyond ordinary user permissions, there was little isolation.
Containers solve that by giving every process its own jail cell—but without the overhead of a hypervisor. The key is that the jail cell is purely conceptual: namespaces create the view, cgroups enforce the limits.
Namespaces: The Great Lie
A namespace wraps a global system resource in an abstraction, so that processes inside the namespace appear to have their own private instance of it. When a process looks at that resource, it only sees what lives inside its namespace. What’s outside might as well not exist.
Linux exposes eight namespace types. The three that matter most for containers:
| Namespace | What it isolates |
|---|---|
| PID | Process IDs. Process 1 inside a container is not process 1 on the host. |
| Network | Own network stack—interfaces, IP addresses, routing tables, firewall rules. |
| Mount | Filesystem mount points. A container sees its own root filesystem, not the host’s /. |
When Docker runs a container, it creates a new set of namespaces for each of these. The container’s init process (PID 1) is isolated inside a PID namespace—it cannot see or signal processes outside its own cell. The network namespace gives it a private eth0, possibly a different IP range. The mount namespace provides its own /proc, /sys, /etc/resolv.conf.
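To see which namespaces a process belongs to, you can read the symlinks the kernel exposes under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper name `list_namespaces` is made up for illustration.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, read from /proc.

    Each entry under /proc/<pid>/ns is a symlink whose target looks like
    'pid:[4026531836]'. Two processes share a namespace exactly when the
    identifiers match. Returns an empty dict where /proc is unavailable
    (non-Linux systems) or the process cannot be inspected.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        for name in sorted(os.listdir(ns_dir)):
            result[name] = os.readlink(os.path.join(ns_dir, name))
    except OSError:
        return {}
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:18} {ident}")
```

Running this once on the host and once inside a container should show different identifiers for the namespace types the container runtime unshared.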
Real-world implication: You can have 10 containers all running an Nginx server on port 80. Each thinks it owns port 80 inside its own network namespace. The host maps their internal ports to external ports (like 8080, 8081) using network address translation.
The "Clone" System Call
Namespaces are created using the clone() system call with flags like CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWNS. When a child process is created with CLONE_NEWPID, it gets PID 1 inside its namespace. The parent process on the host sees the child as having some other PID—say, 13245. To the child, it’s the lone init.
That means a container cannot kill -9 a process outside its namespace. It can’t even see it in /proc. This is isolation that chroot alone never provided.
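The namespace flags are plain bit masks that get OR-ed together. The sketch below defines them with the values from <linux/sched.h> and wraps unshare(2), a sibling of clone() that applies the same flags to the calling process and is easier to reach from Python. The helper names `describe` and `unshare` are illustrative, and actually creating these namespaces normally requires root (CAP_SYS_ADMIN).

```python
import ctypes
import os

# Flag values as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000   # mount namespace
CLONE_NEWPID = 0x20000000  # PID namespace
CLONE_NEWNET = 0x40000000  # network namespace

FLAG_NAMES = {
    CLONE_NEWNS: "CLONE_NEWNS",
    CLONE_NEWPID: "CLONE_NEWPID",
    CLONE_NEWNET: "CLONE_NEWNET",
}

# The three namespaces discussed above, combined into one mask.
CONTAINER_FLAGS = CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS


def describe(flags):
    """Return the sorted names of the namespace flags set in a mask."""
    return sorted(name for bit, name in FLAG_NAMES.items() if flags & bit)


def unshare(flags):
    """Call unshare(2) through libc; raises OSError on failure.

    Note: with CLONE_NEWPID the caller itself keeps its PID; the next
    child it forks becomes PID 1 of the new namespace.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    print(hex(CONTAINER_FLAGS), describe(CONTAINER_FLAGS))
```

Calling `unshare(CONTAINER_FLAGS)` as an unprivileged user is expected to fail with a permission error, which is itself a useful demonstration that namespace creation is a privileged operation.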
Cgroups: The Iron Fist
Namespaces give illusion. Cgroups deliver enforcement. Without cgroups, a container could still eat all memory or spawn processes without limit until the host’s OOM killer starts killing unrelated processes.
Cgroups (control groups) are a kernel feature that limits, accounts for, and isolates resource usage: CPU, memory, disk I/O, and number of processes. Network bandwidth is not limited by cgroups directly; they can only tag a group’s traffic so that tools like tc can shape it.
When you run:
docker run --memory=512m --cpus=1 nginx
Docker places the container’s processes into a cgroup with a memory limit of 512 MiB and a CPU quota worth one core. If the processes inside use more memory than the limit, the kernel first tries to reclaim memory from that cgroup and, if that fails, invokes the OOM killer inside it. If they try to use more CPU time than allowed, they get throttled until the next scheduling period.
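As a rough illustration of what those two flags turn into, here is a sketch that converts them to the values found in the cgroup v2 files memory.max and cpu.max. It assumes Docker's default CPU period of 100000 microseconds and binary size units; the function names are hypothetical and this is not Docker's own code.

```python
CPU_PERIOD_US = 100_000  # assumed default scheduling period, in microseconds

UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(value):
    """Convert a size like '512m' into bytes (the memory.max value)."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # no suffix: already a byte count


def cpu_max(cpus, period_us=CPU_PERIOD_US):
    """Convert a CPU count like 1 or 1.5 into a cpu.max line.

    The format is '<quota> <period>': the group may run for <quota>
    microseconds out of every <period> microseconds before being throttled.
    """
    quota = int(cpus * period_us)
    return f"{quota} {period_us}"


if __name__ == "__main__":
    print("memory.max:", parse_memory("512m"))  # 536870912
    print("cpu.max:   ", cpu_max(1))            # 100000 100000
```

A value of 1.5 CPUs would give a quota of 150000 microseconds per 100000-microsecond period, which only makes sense on a machine with at least two cores.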
The hierarchy
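Cgroups are organized as a tree of groups, and each resource is managed by a controller (also called a subsystem) attached to that tree:
- cpu – sets CPU shares and quotas
- memory – sets soft and hard limits
- blkio – limits read/write rates for block devices (named io in cgroup v2)
- pids – limits the number of processes and threads in the group, so fork() fails once the limit is hit (prevents fork bombs)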
The kernel checks these limits whenever it charges a resource to a cgroup. Overhead? Small. It’s a fast in-memory check, not a hypervisor trap.
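Every process can find its own place in the cgroup tree by reading /proc/self/cgroup, where each line has the form hierarchy-ID:controller-list:path. The sketch below parses that format; the function names are illustrative, and the sample paths in the usage note are made up.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into a list of dicts.

    On cgroup v2 there is a single line with hierarchy 0 and an empty
    controller list; on cgroup v1 there is one line per hierarchy.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path may itself contain ':', so split at most twice.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def read_own_cgroups():
    """Return the parsed cgroup membership of this process, or [] off Linux."""
    try:
        with open("/proc/self/cgroup") as f:
            return parse_proc_cgroup(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    for entry in read_own_cgroups():
        print(entry)
```

For example, `parse_proc_cgroup("0::/system.slice/example.scope")` yields one entry with hierarchy 0, no controllers, and the path `/system.slice/example.scope`.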
The OOM Guardian
One underrated feature: cgroup’s OOM killer. If a container exceeds its memory limit, the kernel kills a process inside that container’s cgroup, not a random process on the host. That’s why one hungry container doesn’t knock out the rest—the cage locks itself.
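On cgroup v2, each group's memory.events file keeps counters for these events, including an `oom_kill` line that counts processes killed inside the group. Here is a small sketch of reading it. Where the file lives under /sys/fs/cgroup depends on how the host names its cgroups, so the parser takes the file's text rather than guessing a path, and the function names are hypothetical.

```python
def parse_memory_events(text):
    """Parse 'key value' lines from a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def was_oom_killed(text):
    """Return True if any process in the cgroup was killed by the OOM killer."""
    return parse_memory_events(text).get("oom_kill", 0) > 0


if __name__ == "__main__":
    # Illustrative sample contents, not taken from a real system.
    sample = "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\n"
    print(parse_memory_events(sample))
    print("OOM-killed:", was_oom_killed(sample))
```

A nonzero `max` with a zero `oom_kill` suggests the group hit its limit but the kernel managed to reclaim enough memory without killing anything.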
Putting It Together: A Container’s Birth
When you type docker run, here’s what actually happens:
- Docker calls clone() with a set of namespace flags to create a new, isolated process tree.
- The new process gets its own PID namespace (sees itself as PID 1).
- Docker creates a new network namespace, attaches a veth pair, and bridges it to the host’s network.
- Docker creates a new mount namespace and calls pivot_root to switch to the container image’s filesystem.
- Docker creates a cgroup for the container and assigns it CPU/memory/pids limits.
- When you later run docker exec, Docker joins the container’s namespaces with setns() so the new process runs inside it.
The container’s init process then runs as a normal Linux process—scheduled by the same kernel scheduler, sharing the same hardware. No virtualization overhead. No emulation. Just isolation.
Why This Matters Today
Understanding namespaces and cgroups helps troubleshoot without magic:
- "Why can’t I ping the host from the container?" – Network namespace isolation. You need a bridge or port mapping.
- "Why does my container get killed for no reason?" – Cgroup memory limit. Check dmesg for OOM events.
- "Can I run a privileged container?" – Yes, with --privileged, which grants all capabilities and access to host devices and so removes most of the isolation. That’s a security risk, not a feature.
The ecosystem has moved to user namespaces, cgroup v2, and more granular control—but the fundamentals haven’t changed since 2008. Containers are not a new technology. They’re a clever repackaging of Linux’s most powerful isolation primitives.
Next time you run a container, remember: it’s just a process with a carefully constructed hallucination, locked inside a cell with a firm budget. That’s the secret sauce.