How to Monitor Docker Containers: Mastering Logs and Performance Metrics
Learn how to track Docker container health using built-in logging and stats tools, and scale to professional monitoring stacks like Prometheus, Grafana, and the ELK stack.
June 2026 · 5 min read
If you’ve ever stared at a crashing Docker container and thought, "What on earth just happened?", you aren't alone. Containers are designed to be ephemeral—meaning they can vanish in a heartbeat—which makes capturing their "last words" and performance trends critical for survival in production.
Monitoring Docker containers is a two-pronged strategy: Logs (the "what" and "why") and Metrics (the "how much" and "how fast"). Here is how to master both.
Part 1: Mastering Docker Logs
Logs are the narrative of your application. When a Python script throws a Traceback or a database connection fails, the logs are where the evidence lives.
The Basics: docker logs
The most immediate way to see what's happening is the docker logs command, which prints whatever the container's logging driver has captured from the application's output.
- View all logs: docker logs <container_id>
- Follow logs in real time: docker logs -f <container_id> (essential for debugging during a deployment)
- See the last N lines: docker logs --tail 100 <container_id>
- Filter by time: docker logs --since 30m <container_id>
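If you want to pull logs from a script rather than a terminal, the same flags can be assembled programmatically. The following is a minimal sketch, assuming the Docker CLI is installed on the host; the helper names build_logs_command and fetch_logs are hypothetical and not part of any Docker library.

```python
import subprocess


def build_logs_command(container, tail=None, since=None, follow=False):
    """Return the argv list for a `docker logs` call using the flags listed above."""
    cmd = ["docker", "logs"]
    if follow:
        cmd.append("--follow")          # long form of -f
    if tail is not None:
        cmd += ["--tail", str(tail)]    # only the last N lines
    if since is not None:
        cmd += ["--since", since]       # e.g. "30m"
    cmd.append(container)
    return cmd


def fetch_logs(container, **options):
    """Run `docker logs` once and return (stdout, stderr) as text.

    Needs a local Docker CLI. Do not pass follow=True here: the call
    would block until the container stops.
    """
    result = subprocess.run(
        build_logs_command(container, **options),
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout, result.stderr
```

Calling fetch_logs("web", tail=100, since="30m") would return the container's stdout and stderr streams separately, which mirrors how Docker keeps the two apart.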
The Golden Rule: Log to Stdout/Stderr
For Docker to capture logs, your application must write to the standard output (stdout) and standard error (stderr) streams rather than writing to a .log file inside the container.
If you are using Python’s logging module, ensure your handler is configured to stream to the console. This allows Docker to intercept the logs and forward them to whatever logging driver you've chosen.
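As a rough illustration of the point above, here is one way to route Python's logging output to the console. The function name configure_stdout_logging, the logger name "app", and the format string are illustrative choices, not requirements.

```python
import logging
import sys


def configure_stdout_logging(level=logging.INFO):
    """Send all log records to stdout so Docker can capture them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.handlers.clear()     # drop handlers configured earlier, e.g. file handlers
    root.addHandler(handler)
    root.setLevel(level)
    return root


configure_stdout_logging()
logging.getLogger("app").info("service started")
```

Because the handler is attached to the root logger, records from any module's logger propagate to it and end up on the container's stdout.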
Moving Beyond the CLI: Centralized Logging
In a production environment with 20+ containers, jumping from one to another with docker logs quickly becomes impractical. This is where the ELK Stack (Elasticsearch, Logstash, Kibana) or Grafana Loki comes in.
Instead of storing logs locally, you configure a Docker logging driver (like splunk, gelf, or awslogs) to ship logs to a central server. This allows you to search across all containers using a single dashboard.
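One place to set a default logging driver is the Docker daemon's daemon.json file, using the log-driver and log-opts keys. The sketch below only renders that JSON; the helper name render_daemon_config and the GELF address are assumptions for illustration, so substitute your own endpoint.

```python
import json


def render_daemon_config(driver, options):
    """Return daemon.json text that sets the default logging driver."""
    config = {
        "log-driver": driver,
        # Docker expects every log-opts value to be a string.
        "log-opts": {key: str(value) for key, value in options.items()},
    }
    return json.dumps(config, indent=2)


# Hypothetical GELF endpoint (for example a Logstash gelf input).
print(render_daemon_config("gelf", {"gelf-address": "udp://logs.example.internal:12201"}))
```

A changed daemon.json takes effect after the Docker daemon restarts, and it applies to containers created afterwards, so existing containers keep the driver they were created with.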
Part 2: Tracking Performance Metrics
While logs tell you why an app crashed, metrics tell you when it's about to crash. Metrics track CPU usage, memory consumption, network I/O, and disk pressure.
The Quick Look: docker stats
For a high-level, real-time snapshot, use the built-in stats command:
docker stats
This provides a live stream of:
- CPU %: Is your Python loop eating all the available cores?
- MEM USAGE / LIMIT: Are you hitting your memory ceiling and risking an OOM (Out of Memory) kill?
- NET I/O: Is your container sending massive amounts of data unexpectedly?
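For scripted checks, docker stats can emit one JSON object per container with docker stats --no-stream --format "{{json .}}". The sketch below parses a single such line; parse_stats_line is a hypothetical helper, and the sample line is made up for illustration rather than captured from a real host.

```python
import json


def parse_stats_line(line):
    """Convert one JSON line from `docker stats` into numeric fields."""
    row = json.loads(line)
    return {
        "name": row["Name"],
        "cpu_percent": float(row["CPUPerc"].rstrip("%")),
        "mem_percent": float(row["MemPerc"].rstrip("%")),
        "mem_usage": row["MemUsage"],   # kept as text, e.g. "180MiB / 200MiB"
        "net_io": row["NetIO"],
    }


# Illustrative sample line, not real output.
sample = (
    '{"Name":"web","CPUPerc":"12.50%","MemPerc":"90.00%",'
    '"MemUsage":"180MiB / 200MiB","NetIO":"1.2MB / 3.4MB"}'
)
stats = parse_stats_line(sample)
if stats["mem_percent"] >= 90.0:
    print(f"{stats['name']} is close to its memory limit")
```

The percentages arrive as strings with a trailing percent sign, which is why the sketch strips it before converting to a float.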
Professional Monitoring: The Prometheus + Grafana Duo
docker stats is great for a quick check, but it doesn't provide history. To see a trend line of memory growth over the last 24 hours, you need a time-series database.
A widely used open-source stack combines cAdvisor, Prometheus, and Grafana:
- cAdvisor (Container Advisor): An open-source agent by Google that runs as a container. It collects low-level resource usage and performance characteristics from all running containers on the host.
- Prometheus: Scrapes the data from cAdvisor at regular intervals and stores it.
- Grafana: Connects to Prometheus to turn that raw data into beautiful, alertable graphs.
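Once Prometheus is scraping cAdvisor, the stored data can also be read through the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus listens on http://localhost:9090 and that cAdvisor attaches a name label to each container's series; both are assumptions about your setup, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_instant_vector(payload):
    """Map each container name to its sample value from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for series in payload["data"]["result"]:
        name = series["metric"].get("name", "<unnamed>")
        _timestamp, raw_value = series["value"]   # the value arrives as a string
        values[name] = float(raw_value)
    return values


def memory_usage_by_container(base_url="http://localhost:9090"):
    """Fetch current memory usage per container; needs a reachable Prometheus."""
    url = build_query_url(base_url, 'container_memory_usage_bytes{name!=""}')
    with urlopen(url, timeout=5) as response:
        return parse_instant_vector(json.load(response))
```

Grafana issues similar PromQL queries behind its panels, so a query that returns sensible values here should also work as the basis for a dashboard graph.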
Key Metrics to Watch
When setting up your dashboards, prioritize these "Red Flags":
- Memory Usage vs. Limit: If usage is consistently at 90% of the limit, you are one spike away from a crash.
- CPU Throttling: If a container keeps hitting its CPU limit, the kernel throttles it and application response times climb sharply.
- Restart Count: A container that restarts every 10 minutes looks "stable" in the sense that it's usually running, but it's failing fundamentally.
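The three red flags above can be expressed as a small check. This is a sketch under stated assumptions: the 90% memory threshold comes from the list above, while the 25% throttling ratio and the restart ceiling of 3 are arbitrary example values, and find_red_flags is a hypothetical helper. The inputs could come from cAdvisor counters such as container_memory_usage_bytes and container_cpu_cfs_throttled_periods_total, plus the RestartCount field reported by docker inspect.

```python
def find_red_flags(mem_usage, mem_limit, throttled_periods, total_periods,
                   restart_count, mem_threshold=0.9, throttle_threshold=0.25,
                   max_restarts=3):
    """Return a list of warnings for one container's current numbers."""
    flags = []
    # Skip the memory check when no limit is known (limit of 0).
    if mem_limit > 0 and mem_usage / mem_limit >= mem_threshold:
        flags.append("memory near limit")
    # Share of CPU scheduling periods in which the container was throttled.
    if total_periods > 0 and throttled_periods / total_periods >= throttle_threshold:
        flags.append("cpu throttled")
    if restart_count > max_restarts:
        flags.append("restart loop")
    return flags


print(find_red_flags(mem_usage=180, mem_limit=200,
                     throttled_periods=10, total_periods=100,
                     restart_count=0))
```

In a real setup these thresholds would more likely live in Prometheus or Grafana alert rules; the function is only meant to make the arithmetic behind each flag explicit.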
Summary Checklist for Monitoring
| Tool | Best For... | Type |
|---|---|---|
| docker logs -f | Immediate debugging | Logs |
| docker stats | Instant resource checks | Metrics |
| Loki / ELK | Searching history & patterns | Logs |
| Prometheus | Long-term trend analysis | Metrics |
| Grafana | Visualizing health & alerting | Visualization |
By combining real-time logs with historical metrics, you move from reactive firefighting (fixing things after they break) to proactive management (scaling resources before the crash happens).