Taming State in Kubernetes: The Complete Guide to Persistent Volumes
Learn how to manage stateful workloads in Kubernetes using Persistent Volumes (PVs), Persistent Volume Claims (PVCs), and StatefulSets to ensure data persistence and resilience.
June 2026 · 8 min read
Kubernetes was built for stateless apps. Containers crash, restart, and move between nodes at will. That’s the whole point of orchestration. But the real world runs on databases, file stores, and message queues. So how do you run stateful workloads in a system designed to treat infrastructure as cattle, not pets?
The answer is Persistent Volumes (PVs) and Persistent Volume Claims (PVCs)—Kubernetes’s abstraction layer for storage. Get them right, and you can run PostgreSQL, Elasticsearch, or even legacy file-based apps with the same resilience as your microservices. Get them wrong, and you’ll lose data, face 3AM pager alerts, or watch your storage bills balloon.
Let’s break down how PVs actually work, when to use each StorageClass, and how to manage stateful workloads without pain.
The Core Abstraction: PVs and PVCs
Before anything else, understand the two-part system:
- Persistent Volume (PV) – A piece of storage in the cluster, provisioned either by an administrator or dynamically through a StorageClass. It exists independently of any pod and can be backed by NFS, AWS EBS, GCE Persistent Disk, iSCSI, or cloud-managed services like Azure Disk.
- Persistent Volume Claim (PVC) – A request for storage by a user or pod. It specifies size, access modes (ReadWriteOnce, ReadOnlyMany, ReadWriteMany), and optionally a StorageClass.
The magic: pods don’t reference PVs directly. They use PVCs. Kubernetes binds PVCs to matching PVs automatically—like a job board matching a storage request to a storage supply.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
```
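This PVC says “I need 10 GiB of storage that one node at a time can mount read-write, provisioned through the ‘standard’ StorageClass.” Kubernetes binds it to a matching PV, or has the StorageClass’s provisioner create one. Your MySQL pod mounts it by referencing the claim. Clean.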
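To make the “pods use PVCs, not PVs” rule concrete, here is a minimal sketch of a MySQL pod that mounts mysql-pvc by claim name. It assumes a Secret called mysql-secret with a password key (a hypothetical name you would create first); the subPath line is a common precaution, not a requirement, that keeps the volume’s lost+found directory out of MySQL’s data directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        # The mysql image will not initialize without a root-password setting.
        # "mysql-secret" is a hypothetical Secret you would create beforehand.
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
      volumeMounts:
        - name: data              # refers to the pod-level volume below
          mountPath: /var/lib/mysql
          subPath: mysql          # subdirectory keeps lost+found out of the data dir
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-pvc      # the pod names the claim, never the PV
```

A bare Pod is not rescheduled if its node dies, so in practice you would put the same template inside a Deployment or StatefulSet.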
The StorageClass Secret
PVs can be statically provisioned (someone in ops creates a raw NFS share, then defines a PV object) or dynamically provisioned. Dynamic provisioning is what separates “I can sleep at night” from “I spent Sunday debugging storage.”
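For contrast, here is a sketch of the static route: an admin-defined PV pointing at an existing NFS export, plus a claim that opts out of dynamic provisioning so it binds to that PV. The server, export path, and object names are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""            # no class: only claims that also ask for "" can bind
  nfs:
    server: nfs.example.internal  # hypothetical NFS server
    path: /exports/shared         # hypothetical export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # opt out of the default StorageClass
  volumeName: shared-nfs-pv       # optional: pin the claim to this specific PV
  resources:
    requests:
      storage: 100Gi
```

The empty storageClassName matters: if a PVC omits the field and the cluster has a default StorageClass, the claim is assigned that class and a fresh volume gets provisioned instead of binding to your hand-made PV.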
A StorageClass is the API object that tells Kubernetes how to provision storage on a particular backend. Each StorageClass names a provisioner, usually a CSI driver such as ebs.csi.aws.com, filestore.csi.storage.gke.io, or rbd.csi.ceph.com, plus the parameters to pass it (e.g., volume type, IOPS, replication factor). The provisioner does the actual talking to the storage API.
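```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
# AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs plugin was removed in Kubernetes 1.27
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  # gp3 takes an absolute IOPS figure (3,000 is included at baseline);
  # a per-GiB ratio such as iopsPerGB: "10" yields too few IOPS on small gp3 volumes
  iops: "4000"
reclaimPolicy: Retain
```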
When a PVC requests storageClassName: fast-ssd, the provisioner spins up a new EBS volume. No human intervention. No ticket to the storage team.
Choose your StorageClass based on workload:
- Databases – Use block storage (AWS EBS, GCE Persistent Disk, Azure Disk). Low latency, consistent IOPS. Usually ReadWriteOnce only: one node can mount it read-write at a time.
- Content repositories or logs – Use file storage (NFS, Azure Files, EFS). ReadWriteMany—multiple pods can read and write simultaneously.
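- Large data lakes – Object storage (S3, GCS, MinIO). Applications usually access it directly through its API rather than through a PV. FUSE-based CSI drivers (for example, the Mountpoint for Amazon S3 CSI driver) can mount a bucket, but without full POSIX file-system semantics.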
Access Modes: The Triangular Constraint
Understanding access modes saves you from late-night “why won’t my pod start” frustrations:
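- ReadWriteOnce (RWO) – One node can mount the volume read-write. Classic for a single-instance database. The limit is per node, not per pod: several pods on the same node can share an RWO volume. If you need exactly one pod, use ReadWriteOncePod (RWOP), GA since Kubernetes 1.29 and supported for CSI volumes only.
- ReadOnlyMany (ROX) – Many nodes can mount the volume read-only. Great for static assets, reference datasets, or other shared data that rarely changes. (ConfigMaps don’t need this; they mount as their own volume type.)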
- ReadWriteMany (RWX) – Many nodes can read and write. The rarest and hardest to get right. Needed for shared file systems in clustered apps.
Pro tip: Most cloud block storage only supports RWO. If you need RWX, you’re looking at NFS, GlusterFS, or cloud-specific file services (EFS, Azure Files). Don’t force a database onto RWX—it’s a recipe for corruption.
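For the RWX case on AWS, here is a sketch assuming an EKS cluster with the Amazon EFS CSI driver installed and an existing EFS file system. The file system ID and object names are placeholders.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap             # one EFS access point per provisioned PV
  fileSystemId: fs-0123456789abcdef0   # placeholder; use your EFS file system ID
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content
spec:
  accessModes:
    - ReadWriteMany                    # pods on different nodes can all mount read-write
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi                     # required by Kubernetes; EFS is elastic and does not enforce it
```

Every pod that references shared-content, on any node, mounts the same file system read-write. That is exactly what you want for shared content and exactly what you don’t want under a database.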
StatefulSets: The Pod Identity Cloak
A Deployment with a PVC is fine for a single instance. But what about a three-node Cassandra cluster? Each node needs its own persistent disk, with a stable identity, and the ability to recover its data after rescheduling.
Enter StatefulSet. Unlike a Deployment that creates pods with random names (like web-7hk3d), StatefulSets give each pod a fixed ordinal index and hostname (db-0, db-1, db-2). They also pair each pod with its own PVC via volumeClaimTemplates.
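```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"        # headless Service that gives pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            # The image refuses to initialize without a superuser password
            # (assumes a Secret named postgres-secret with a "password" key).
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            # Keep data in a subdirectory: initdb rejects a non-empty mount point (e.g. lost+found).
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:          # one PVC per pod: data-postgres-0, data-postgres-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```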
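When postgres-0 dies, the replacement pod gets the same name, the same hostname, and the same PVC (data-postgres-0, which still points at the old disk). Clustered software that identifies members by hostname or by state on disk sees the same member come back with its data, which keeps membership and quorum stable. The data survives the pod; it does not survive losing the disk itself, so you still need backups.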
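PVCs created from volumeClaimTemplates are named template-statefulset-ordinal (data-postgres-0 and so on) and, by default, outlive both scale-downs and deletion of the StatefulSet itself. On recent Kubernetes versions you can make that choice explicit with persistentVolumeClaimRetentionPolicy. A sketch of the relevant fragment, to be merged into the StatefulSet spec above rather than applied on its own:

```yaml
# Fragment: merge into the StatefulSet's spec; not a standalone manifest.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep data-postgres-N PVCs if the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs of pods removed by a scale-down
```

Setting either field to Delete trades that safety for automatic cleanup, which may suit throwaway dev clusters.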
When StatefulSet Makes Sense
- Clustered databases (Cassandra, MongoDB, CockroachDB)
- Message queues (RabbitMQ, Kafka)
- Key-value stores (Redis, etcd)
- Any app that needs stable network identity or ordered startup/shutdown
Do not use StatefulSet for every pod. If your app is stateless or uses an external database, stick with Deployments. Simpler. Cheaper.
Reclaim Policy: Keep or Destroy?
When a PVC is deleted, what happens to its underlying PV? That’s governed by reclaimPolicy:
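- Retain – The PV and its data stay around after the PVC is deleted, and the PV moves to the Released phase. It won’t bind to a new claim until an admin cleans it up manually (or deletes and re-creates the PV). Good for production databases where you want data to survive accidental PVC deletion.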
- Delete – PV and its storage backend are automatically removed. Default for dynamic provisioning. Good for dev clusters or ephemeral data.
- Recycle – Deprecated. Avoid.
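For production, set reclaimPolicy: Retain on the StorageClass behind critical volumes; dynamically provisioned PVs inherit it when they are created. For a PV that already exists, change its persistentVolumeReclaimPolicy field instead, e.g. kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'. It adds a manual cleanup step but saves your bacon during a config mistake.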
Common Pitfalls and Fixes
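Pods stuck in Pending state – Check PVC status with kubectl get pvc, then run kubectl describe pvc <name> to see its events. A PVC stuck in “Pending” usually means no matching PV exists, the StorageClass is missing or misspelled, or the provisioner failed; check the CSI driver’s controller (provisioner) logs. One exception: a StorageClass with volumeBindingMode: WaitForFirstConsumer deliberately leaves the PVC Pending until a pod that uses it is scheduled.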
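Volume mounts failing after a node restart or pod move – Block storage such as AWS EBS with ReadWriteOnce can be attached to only one node at a time. When a pod moves to a different node, the volume must first detach from the old node, which can take seconds to minutes; until then the new pod waits in ContainerCreating with a Multi-Attach error event. If the old node is down and cannot confirm the unmount, recovery can stall entirely. Once you are sure the node is really off, applying the node.kubernetes.io/out-of-service taint (non-graceful node shutdown) lets Kubernetes force-detach the volume.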
Running out of disk but resizing is locked – Volume expansion must be allowed at the StorageClass level: check that your StorageClass has allowVolumeExpansion: true. You then grow a volume by raising spec.resources.requests.storage on its PVC. The CSI driver must also support expansion, and volumes can only grow, never shrink.
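A sketch of both halves, assuming the AWS EBS CSI driver and hypothetical names (expandable-ssd, app-data): the StorageClass opts in once, and each resize is just a bigger request on an existing claim, applied with kubectl apply or kubectl edit.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd          # hypothetical name
provisioner: ebs.csi.aws.com    # assumption: AWS EBS CSI driver, which supports expansion
parameters:
  type: gp3
allowVolumeExpansion: true      # without this, the API server rejects PVC size increases
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                # hypothetical claim, originally created with 10Gi
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: expandable-ssd
  resources:
    requests:
      storage: 20Gi             # raised from 10Gi; requests can only go up
```

Follow the resize with kubectl describe pvc app-data. Depending on the driver, the file system may only be grown once a pod mounts the volume, so the new size can show up in two steps.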
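RWX madness – You copy data into a shared volume, but pods on other nodes see stale data. Client-side caching is the usual culprit: NFS clients cache file attributes and directory entries and only guarantee close-to-open consistency (a reader that opens a file after the writer has closed it sees the new contents). For fresher views, shorten or disable attribute caching with mount options such as actimeo=<seconds> or noac, at a real performance cost. Use hard mounts (soft mounts can cause silent data corruption when the server is briefly unreachable) and a modern protocol version, e.g. mountOptions: ["hard","nfsvers=4.2"]. With a CSI driver such as csi-driver-nfs, you set these through the StorageClass’s mountOptions field.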
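Here is a sketch of those options on a StorageClass, assuming the kubernetes-csi csi-driver-nfs driver is installed. The server and share are hypothetical, and actimeo=3 is an illustrative value, not a recommendation.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-shared
provisioner: nfs.csi.k8s.io          # kubernetes-csi/csi-driver-nfs
parameters:
  server: nfs.example.internal       # hypothetical NFS server
  share: /exports                    # hypothetical export
reclaimPolicy: Retain
mountOptions:
  - nfsvers=4.2
  - hard        # retry indefinitely instead of returning I/O errors mid-write
  - actimeo=3   # cap attribute caching at 3 seconds: fresher views, more server traffic
```

Kubernetes does not validate mount options, so a typo only surfaces as a failed mount on the node. Test a new class with a throwaway PVC and pod first.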
Real-World Stateful Workload Template
Deploy PostgreSQL with a StatefulSet, explicit resource requests and limits, SSD-backed block storage from a dedicated StorageClass (standard-ssd, which must already exist in the cluster), and a headless Service for stable DNS:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  clusterIP: None                     # headless: DNS resolves to individual pod IPs
  selector:
    app: postgres
  ports:
    - port: 5432
      name: pgsql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
              name: pgsql
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA            # subdirectory avoids initdb failing on lost+found
              value: /var/lib/postgresql/data/pgdata
          resources:                  # illustrative values; size for your workload
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              memory: 2Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard-ssd  # must already exist in the cluster
        resources:
          requests:
            storage: 100Gi
```
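This gives you three PostgreSQL pods, each with its own disk (data-postgres-0, data-postgres-1, data-postgres-2) and a stable DNS name such as postgres-0.postgres, and each pod reattaches to the same disk when it is rescheduled. Note that these are three independent PostgreSQL servers, not a replicated cluster: the postgres image does not configure replication between them. For that, set up streaming replication yourself or use a PostgreSQL operator such as CloudNativePG.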
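The template assumes two objects it does not create: the postgres-secret Secret and the standard-ssd StorageClass. A minimal sketch of both, assuming an AWS cluster with the EBS CSI driver installed (swap the provisioner and parameters for your platform); the password value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
type: Opaque
stringData:
  password: change-me          # placeholder only; generate a real one and keep it out of Git
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-ssd
provisioner: ebs.csi.aws.com   # assumption: AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # provision each disk only after its pod is scheduled
```

WaitForFirstConsumer matters for zonal block storage: because the disk is created after the scheduler picks a node, each EBS volume lands in the same availability zone as the pod that uses it.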
Storage Is Not Free—Manage It
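Dynamic provisioning creates cloud volumes with generated identifiers (the PV itself is named pvc-<uid>) that are hard to trace back to an owner in your cloud console unless you tag them. Use labels and annotations on PVCs to track ownership. Set cost alerts on your cloud provider. Regularly clean up orphaned PVs (those in the Released phase), and remember that with reclaimPolicy: Retain, deleting the PV object does not delete the underlying disk; remove that in your cloud provider too.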
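Here is a sketch of an ownership-labeled claim. The app.kubernetes.io/ keys follow Kubernetes’ recommended label names; the values, the team label, and the example.com annotation are hypothetical conventions you would replace with your own.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reports-cache                   # hypothetical claim
  labels:
    app.kubernetes.io/name: reports     # recommended key; hypothetical value
    app.kubernetes.io/part-of: billing  # recommended key; hypothetical value
    team: data-platform                 # hypothetical ownership label
  annotations:
    example.com/cost-center: "cc-1234"  # hypothetical annotation key and value
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
```

With consistent labels, kubectl get pvc -A -l team=data-platform lists every claim a team owns across namespaces. Whether PVC labels also become tags on the cloud disk depends on your CSI driver, so check its documentation before relying on that for billing.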
Kubernetes storage is powerful, but it’s not magic. The same rules that apply in your data center apply here: plan your volume types, understand your access patterns, and always, always test your disaster recovery. Your stateful workloads deserve it.