Inside the Box: How Linux Containers Actually Work (and Why They're Not Just Lightweight VMs)

Learn how Linux containers use namespaces, cgroups, union filesystems, and seccomp to isolate processes and enforce resource limits—no magic, just kernel-level mechanisms that have been around for over a decade.

June 2026 · 7 min read

You've heard the pitch a thousand times: "Containers are lightweight, portable, and isolate your applications." But here's the secret that most Docker tutorials skip: containers aren't doing anything magical. There is no single "container" feature in the kernel. A container is an ordinary process wrapped in a combination of Linux kernel mechanisms, most of which have been around for over a decade.

Let's crack open that black box and see what's really going on inside.

The Core Trick: Shared Kernel, Isolated View

Containers work because every container on a host shares the same Linux kernel. That's the fundamental difference from virtual machines, which each run their own full OS. Containers don't need to boot — they just need to feel like they're on their own machine.

But how do you make a process feel like it's alone? You lie to it. Beautifully.

Namespaces: The Art of the White Lie

Linux namespaces are the isolation backbone. Each namespace wraps a global system resource in a way that makes it look unique to processes inside it. There are eight types (mount, UTS, IPC, PID, network, user, cgroup, and time), but these are the big players:

PID namespace: The first process inside sees itself as PID 1, while on the host it has an ordinary PID such as 3456. Processes in one container cannot list or signal processes in another.
Network namespace: The container gets its own network interfaces (starting with its own lo), IP addresses, and routing tables. It can't see the host's interfaces.
Mount namespace: The container has its own mount table, so its filesystem tree is independent of the host's. What it sees as / is typically just a directory on the host's disk.
UTS namespace: Hostname and domain name isolation. The container's hostname can be web-server-42 without touching the host's.
IPC namespace: System V IPC objects and POSIX message queues stay contained.

When you run docker run, the container runtime asks the kernel for a fresh set of namespaces (through system calls such as clone() and unshare()) and starts the new process inside them. From the inside, the process has no easy way to tell the difference.
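To see which namespaces a process actually lives in, you can read the symbolic links under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux host with /proc mounted. The helper names (parse_ns_link, namespaces_of, shared_namespaces) are made up for this illustration and are not part of any container runtime.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Return {entry name: inode} for a process, or {} if /proc is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, or no permission to inspect this pid
    for entry in entries:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or do not understand
        result[entry] = inode
    return result


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    for name, inode in namespaces_of().items():
        print(f"{name:20} {inode}")
```

Two processes are in the same namespace exactly when the inode numbers match. Running this on the host and again inside a container should show different numbers for pid, net, mnt, uts, and ipc.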

Cgroups: The Quiet Enforcer

Namespaces handle what you see. Cgroups (control groups) handle what you use. Without them, one container could eat all your RAM or peg every CPU core.

Cgroups let you set hard and soft limits on:

CPU shares and quotas (proportional or capped)
Memory limits and swap
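Block I/O weight and throughput
Network traffic tagging (cgroup v1's net_cls and net_prio mark packets; bandwidth shaping itself is done by tc, not by cgroups)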
PID counts (prevent fork bombs)

The real power? These are enforced by the kernel itself, with no polling and no monitoring agent. If a container hits its memory limit and the kernel can't reclaim enough pages, the OOM killer terminates a process inside that cgroup rather than somewhere else on the host.
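The limits themselves are plain text files that the kernel exposes in the cgroup filesystem. Here is a rough sketch of reading two of them, assuming cgroup v2 mounted at /sys/fs/cgroup (the usual location on current distributions, but an assumption here). The function names are illustrative only.

```python
from pathlib import Path


def parse_cpu_max(text):
    """Turn a cgroup v2 cpu.max value into a CPU count, or None if uncapped.

    The file holds "<quota> <period>" in microseconds, e.g. "50000 100000",
    or "max <period>" when there is no cap.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text):
    """Turn a cgroup v2 memory.max value into bytes, or None if unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_limits(cgroup_dir="/sys/fs/cgroup"):
    """Read both limits; None means either 'no limit' or 'file not readable'."""
    parsers = (("cpu.max", parse_cpu_max), ("memory.max", parse_memory_max))
    limits = {}
    for name, parser in parsers:
        try:
            limits[name] = parser((Path(cgroup_dir) / name).read_text())
        except (OSError, ValueError):
            limits[name] = None  # cgroup v1, root cgroup, or not Linux
    return limits


if __name__ == "__main__":
    print(read_limits())
```

A cpu.max of "50000 100000" means the group may use 50 ms of CPU time in every 100 ms period, which is half a core.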

The Filesystem Deception: Layers and Copy-on-Write

Containers need their own root filesystem. But copying entire OS images would be wasteful. Enter union filesystems (OverlayFS, AUFS, or similar).

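Your container image is built from layers, each a read-only filesystem snapshot. Every container adds one thin writable layer on top. When a container modifies an existing file, the system uses copy-on-write: it first copies the file up from the read-only lower layer into the writable upper layer, and the change lands there. New files are created directly in the upper layer. Everything untouched stays shared across containers. This is why 50 containers started from the same Ubuntu image need roughly one copy of the image on disk, plus whatever each container writes.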
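The lookup and copy-up behavior can be modeled in a few lines. This is a toy in-memory model, not how OverlayFS is implemented: layers are plain dictionaries, and the ToyOverlay class and its method names are invented for the illustration.

```python
class ToyOverlay:
    """A toy model of layered lookup with copy-on-write (not real OverlayFS)."""

    def __init__(self, *lower_layers):
        # Read-only layers, top-most first; each maps a path to its bytes.
        self.lower = lower_layers
        self.upper = {}        # this container's private writable layer
        self.deleted = set()   # stands in for whiteout entries

    def read(self, path):
        """Return the first match, searching the upper layer before lower ones."""
        if path in self.deleted:
            raise FileNotFoundError(path)
        if path in self.upper:
            return self.upper[path]
        for layer in self.lower:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        """Create or replace a file; only the upper layer is ever written."""
        self.deleted.discard(path)
        self.upper[path] = data

    def append(self, path, data):
        """Modify a file in place, copying it up from a lower layer first."""
        if path not in self.upper:
            self.upper[path] = self.read(path)  # the copy-up step
        self.upper[path] += data

    def delete(self, path):
        """Hide a file without touching the read-only layers."""
        self.read(path)  # raises FileNotFoundError if there is nothing to hide
        self.upper.pop(path, None)
        self.deleted.add(path)


if __name__ == "__main__":
    image = {"/etc/motd": b"hello\n"}            # one shared read-only layer
    a, b = ToyOverlay(image), ToyOverlay(image)  # two containers, same image
    a.append("/etc/motd", b"patched\n")
    print(a.read("/etc/motd"))  # container a sees its own modified copy
    print(b.read("/etc/motd"))  # container b still sees the shared original
```

Both containers reference the same image dictionary, and only the container that writes pays for a private copy of the file.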

Seccomp and Capabilities: The Security Salt

Namespaces isolate. Cgroups throttle. But what about kernel-level attacks? You need to restrict system calls.

Seccomp (secure computing mode) lets a process install a filter that the kernel checks on every system call. Docker's default profile blocks over 40 dangerous syscalls (like reboot, kexec_load, or bpf), and you can supply a custom profile per container.
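Whether a process is running under a seccomp filter is visible in the Seccomp field of /proc/<pid>/status. Below is a small sketch that reads it, assuming a Linux /proc. The helper names are hypothetical, and the mode numbers (0, 1, 2) follow the kernel's documented values.

```python
# Values of the "Seccomp:" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text):
    """Parse the "Key:<tab>value" lines of a /proc status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def seccomp_mode(status_text):
    """Return 'disabled', 'strict', 'filter', or 'unknown'."""
    raw = parse_status(status_text).get("Seccomp")
    if raw is None:
        return "unknown"  # field absent (kernel built without seccomp)
    return SECCOMP_MODES.get(int(raw), "unknown")


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            print("seccomp:", seccomp_mode(handle.read()))
    except OSError:
        print("no /proc/self/status here (not Linux?)")
```

Inside a Docker container with the default profile you would expect "filter". A plain shell on the host usually reports "disabled".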

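Linux capabilities take a finer approach. Instead of giving root unlimited power, capabilities break it into individual permissions like CAP_NET_BIND_SERVICE (bind to ports under 1024) or CAP_SYS_ADMIN (a broad catch-all that includes mounting filesystems). Docker drops most capabilities by default and keeps a short list that ordinary services need, such as CAP_CHOWN and CAP_NET_BIND_SERVICE. CAP_SYS_ADMIN is not on that list.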
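A process's capability sets appear as hexadecimal bitmasks (CapEff, CapPrm, and so on) in /proc/<pid>/status, with one bit per capability. Here is a sketch of decoding such a mask, using the bit numbering from the kernel's capability header. The sample mask a80425fb is the value commonly reported inside default Docker containers; treat it as an assumption, since it can differ between runtime versions.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask):
    """Turn a hex capability mask into a list of names, lowest bit first."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def effective_caps(status_text):
    """Extract and decode the CapEff line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split(":", 1)[1].strip())
    return []


if __name__ == "__main__":
    # Assumed sample: the mask typically seen in a default Docker container.
    for name in decode_caps("00000000a80425fb"):
        print(name)
```

Running effective_caps on the contents of /proc/self/status inside and outside a container makes the difference concrete: root on the host holds every bit, while root in a default container holds only a subset.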

Why This Matters for Performance

This layered approach means starting a container costs little more than starting an ordinary process: no BIOS, no kernel boot, no init system. And because there's no hypervisor in the path, you get near-native performance for CPU and memory.

The trade-off? Your isolation is only as good as the kernel. A kernel exploit can escape a container. That's why production setups use additional layers like SELinux, AppArmor, or user namespaces with non-root containers.
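User namespaces are what let "root" inside a container be an unprivileged user on the host. The mapping is published in /proc/<pid>/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. Here is a brief sketch of applying such a map. The function names and the 100000 base are illustrative assumptions, not defaults of any particular runtime.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, length = (int(part) for part in line.split())
        ranges.append((inside, outside, length))
    return ranges


def to_host_id(ranges, inside_id):
    """Translate an ID seen inside the namespace to the host's view of it."""
    for inside, outside, length in ranges:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # unmapped IDs have no identity on the host side


if __name__ == "__main__":
    # Hypothetical map: container IDs 0..65535 become host IDs 100000..165535.
    sample = parse_id_map("         0     100000      65536\n")
    print(to_host_id(sample, 0))
```

With a map like this, a process that escapes the container as UID 0 arrives on the host as an ordinary unprivileged user.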

The Real Takeaway

Next time someone explains containers as "lightweight VMs," you can gently correct them. Containers are process-level isolation — namespaces for illusion, cgroups for enforcement, and layered filesystems for efficiency. It's not magic. It's just Linux, applied with surgical precision.

And that's exactly what makes them powerful.
