The Complete Guide to Logging and Monitoring Python Applications

Learn production-grade logging and monitoring for Python apps: replace print() with structured JSON logs, set up Prometheus metrics, add health checks, and centralize logs with the ELK stack or SaaS tools.

June 2026 · 10 min read

You’ve deployed your Python app. It’s running in production. Then something breaks — silently. No error message, no stack trace, just a user saying “it doesn’t work.” You stare at a blank terminal, realizing you have no idea what happened.

That’s the moment you learn that print() is not a logging strategy.

Logging and monitoring aren’t just "nice to haves" for Python applications. They’re your eyes and ears when you can’t be there. Let’s build a production-grade logging setup from the ground up.


Why print() Isn't Enough

Sure, print() works on your laptop. But in production, you need:

  • Log levels — not everything is a crisis
  • Structured output — so machines can parse your logs
  • Rotation — so your disk doesn’t fill up
  • Remote aggregation — so you can search across servers

print() gives you none of that. Let’s fix it.


Setting Up Python’s Built-in Logging Module

Python ships with a surprisingly powerful logging module. Here’s the baseline every app should start with:

import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler("app.log")
    ]
)

logger = logging.getLogger(__name__)

This gets you:

  • Timestamps on every entry
  • Different levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  • Output to both console and a file
  • The module name as the logger name (huge for debugging)

Use it like this:

logger.info("User logged in successfully")
logger.warning("Rate limit approaching for user %s", user_id)
logger.error("Failed to process payment", exc_info=True)

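That exc_info=True parameter includes the full traceback, so don't skip it on errors. It only has something to attach while an exception is being handled, so use it inside an except block.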
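
To make the traceback point concrete, here is a minimal sketch of error logging inside an except block. It assumes a hypothetical charge_card function standing in for real payment code, and it writes to an in-memory stream only so the example is self-contained. logger.exception is the standard-library shorthand for logger.error(..., exc_info=True).

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(name)s: %(message)s"))

logger = logging.getLogger("payments.demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def charge_card(amount):
    """Hypothetical payment call that always fails, for illustration."""
    raise ValueError(f"card declined for amount {amount}")


def process_payment(amount):
    try:
        charge_card(amount)
        return True
    except ValueError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Failed to process payment of %s", amount)
        return False


ok = process_payment(29.99)
output = stream.getvalue()
print(output)
```

The printed entry contains the ERROR line followed by the full traceback, which is usually what you want when reconstructing a failure after the fact.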


Log Levels: Don’t Cry Wolf

The cardinal sin of logging is treating everything as an emergency. Here’s a sane level guide:

Level      When to use it
DEBUG      "I'm debugging locally and need to know every variable value"
INFO       "The system is doing what it's supposed to do"
WARNING    "Something unexpected happened, but we recovered"
ERROR      "Something failed and a user was affected"
CRITICAL   "The entire application is going down"

Rule of thumb: In production, set your root level to INFO. Set specific modules to DEBUG only when troubleshooting.

# In production config
logging.basicConfig(level=logging.INFO)

# During a firefight with payment processing
logging.getLogger("payments").setLevel(logging.DEBUG)
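
The per-module override works because loggers inherit their level from the nearest configured ancestor. A small sketch of that behavior, assuming made-up logger names (demo_app, demo_app.payments, demo_app.orders) and an in-memory stream so nothing touches your real configuration:

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

# Use a dedicated parent logger instead of the root logger so the demo
# does not disturb any existing logging configuration.
app_logger = logging.getLogger("demo_app")
app_logger.setLevel(logging.INFO)
app_logger.addHandler(handler)
app_logger.propagate = False

payments = logging.getLogger("demo_app.payments")
orders = logging.getLogger("demo_app.orders")

# Only the payments module is switched to DEBUG.
payments.setLevel(logging.DEBUG)

payments.debug("card token refreshed")  # emitted: payments is at DEBUG
orders.debug("cart recalculated")       # dropped: orders inherits INFO
orders.info("order placed")             # emitted

lines = stream.getvalue().splitlines()
print(lines)
```

Only the payments debug line and the orders info line reach the handler, so the extra noise stays confined to the module you are investigating.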

Structured Logging: Make Your Logs Machine-Readable

Text logs are fine for humans reading in real-time. But when you have 10 million log entries to search? You need structured data.

Enter python-json-logger:

pip install python-json-logger

from pythonjsonlogger import jsonlogger
import logging

handler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(
    fmt="%(asctime)s %(levelname)s %(name)s %(message)s"
)
handler.setFormatter(formatter)

logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Payment processed", extra={
    "user_id": 12345,
    "amount": 29.99,
    "currency": "USD",
    "payment_method": "card"
})

This outputs a JSON line:

{"asctime": "2024-03-15 10:30:00,123", "levelname": "INFO", "name": "__main__", "message": "Payment processed", "user_id": 12345, "amount": 29.99, "currency": "USD", "payment_method": "card"}

Now tools like Elasticsearch, Datadog, or Splunk can index and search every field. You can query "all failed payments over $100 in the last hour" in seconds.
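
If you want to see what a JSON formatter does under the hood, or cannot add a dependency, the same idea can be sketched with the standard library alone. This is an illustration, not a replacement for python-json-logger: the class name SimpleJsonFormatter and the chosen field set are assumptions made for this example.

```python
import io
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class SimpleJsonFormatter(logging.Formatter):
    """Illustrative formatter: one JSON object per log record."""

    def format(self, record):
        payload = {
            "asctime": self.formatTime(record),
            "levelname": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        # Copy fields passed via `extra=` into the JSON object.
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(SimpleJsonFormatter())

logger = logging.getLogger("json_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("Payment processed", extra={"user_id": 12345, "amount": 29.99})

entry = json.loads(stream.getvalue())
print(entry["message"], entry["user_id"], entry["amount"])
```

Because every record is a single JSON object on its own line, a log shipper can parse it without custom regular expressions.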


Log Rotation: Don’t Fill Up the Disk

Production apps generate logs fast. Without rotation, your logs will eventually consume all available disk space and crash the server.

from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    "app.log",
    maxBytes=10_000_000,  # 10MB
    backupCount=5         # Keep 5 old log files
)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    handlers=[handler]
)

If you would rather have one file per day, consider TimedRotatingFileHandler. With when="midnight" it rotates once a day at midnight, regardless of file size:

from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    "app.log",
    when="midnight",
    interval=1,
    backupCount=30  # Keep 30 days of logs
)
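
To watch size-based rotation happen, here is a small sketch that uses deliberately tiny limits in a temporary directory. The 200-byte limit, the backup count of 2, and the logger name are assumptions chosen only to make rotation visible right away.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Tiny limits so rotation is visible immediately; real apps use megabytes.
handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("rotation_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

for i in range(50):
    logger.info("request %03d handled", i)

handler.close()

# The directory holds the active file plus at most `backupCount` backups.
files = sorted(os.listdir(log_dir))
with open(log_path) as f:
    current = f.read()
print(files)
```

Older entries beyond the backup count are deleted, which is exactly the trade-off rotation makes: bounded disk usage in exchange for limited local history.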

Monitoring: Beyond Logs

Logs tell you what happened. Monitoring tells you what is happening right now. You need both.

Application Metrics with Prometheus

Prometheus is a widely used monitoring system built on a pull model: your app exposes a metrics endpoint, and Prometheus scrapes it on a schedule.

pip install prometheus-client

from prometheus_client import Counter, Histogram, start_http_server
import time
import random

# Define metrics
request_count = Counter("http_requests_total", "Total HTTP requests", ["method", "endpoint"])
request_duration = Histogram("http_request_duration_seconds", "Request duration", ["endpoint"])

# Start the metrics server on port 8000
start_http_server(8000)

# Use them in your app
def handle_request():
    endpoint = "/api/users"
    method = "POST"

    request_count.labels(method=method, endpoint=endpoint).inc()

    start = time.time()
    # ... your actual request handling logic ...
    time.sleep(random.uniform(0.1, 0.5))

    request_duration.labels(endpoint=endpoint).observe(time.time() - start)

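Once Prometheus is configured to scrape http://localhost:8000/metrics (every 15 seconds is a common interval), you can build dashboards, typically in Grafana, for:

  • Request rate (requests per second)
  • Error rate (this needs a status label or a separate error counter, which the example above does not include)
  • Latency percentiles (p50, p95, p99)
  • Active users (this needs its own gauge)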
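
Latency percentiles are estimated from histogram buckets: a Prometheus histogram keeps a cumulative count of observations at or below each bucket boundary, plus a running sum and count. The following plain-Python sketch models that bookkeeping. The class name BucketHistogram and the bucket boundaries are assumptions for illustration and are not part of the prometheus_client API.

```python
import bisect


class BucketHistogram:
    """Toy model of cumulative histogram buckets (illustrative only)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)                 # upper bounds, in seconds
        self.counts = [0] * (len(self.bounds) + 1)   # last slot is +Inf
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # Find the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        """Return (upper_bound, observations <= upper_bound) pairs."""
        running = 0
        result = []
        for bound, n in zip(self.bounds + [float("inf")], self.counts):
            running += n
            result.append((bound, running))
        return result


hist = BucketHistogram([0.1, 0.25, 0.5, 1.0])
for duration in [0.05, 0.12, 0.2, 0.3, 0.45, 0.9, 2.0]:
    hist.observe(duration)

buckets = hist.cumulative()
print(buckets)
```

Since percentiles are interpolated from these counts, bucket boundaries should sit close to the latencies you care about; a p99 that falls in a very wide bucket will be a rough estimate.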

Health Checks: Is Your App Alive?

Every production app should expose a health endpoint:

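import time

from flask import Flask, jsonify

app = Flask(__name__)
start_time = time.time()  # recorded once at startup

# check_db_connection() and check_cache_connection() are your own helpers;
# each should return True when the dependency is reachable.

@app.route("/health")
def health():
    db_ok = check_db_connection()
    cache_ok = check_cache_connection()
    healthy = db_ok and cache_ok
    body = jsonify({
        "status": "healthy" if healthy else "unhealthy",
        "database": db_ok,
        "cache": cache_ok,
        "uptime_seconds": time.time() - start_time
    })
    # Return 503 on failure so the orchestrator can tell something is wrong
    return body, (200 if healthy else 503)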

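Your orchestrator polls this endpoint every few seconds. What happens on a non-200 response depends on the tool and its configuration: a Kubernetes liveness probe restarts the container after repeated failures, while a Docker Compose healthcheck marks the container as unhealthy.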
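
The status code matters more than the response body, because that is what the orchestrator acts on. Here is a framework-independent sketch of the decision logic. The function name build_health_response and the failing_cache_check helper are hypothetical, introduced only for this example.

```python
import json


def build_health_response(checks):
    """Run each named check and return (http_status, json_body).

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is reachable.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False

    healthy = all(results.values())
    body = {"status": "healthy" if healthy else "unhealthy", **results}
    return (200 if healthy else 503), json.dumps(body)


def failing_cache_check():
    """Hypothetical check that simulates an unreachable cache."""
    raise ConnectionError("cache unreachable")


ok_status, ok_body = build_health_response({"database": lambda: True})
bad_status, bad_body = build_health_response(
    {"database": lambda: True, "cache": failing_cache_check}
)
print(ok_status, ok_body)
print(bad_status, bad_body)
```

Catching exceptions inside the loop keeps one broken dependency check from turning the health endpoint itself into a 500 error with no useful detail.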


Centralized Logging: The One True Pattern

Individual log files on each server are a nightmare. Centralize everything.

Option 1: The Elastic Stack (ELK)

App -> Filebeat -> Logstash -> Elasticsearch -> Kibana

Your app writes JSON logs to files. Filebeat tails those files and ships them to Logstash, which transforms them and sends them to Elasticsearch. Kibana gives you the dashboards.

Option 2: SaaS Solutions

  • Datadog: Just pip install ddtrace and set environment variables
  • Sentry: Perfect for error tracking with context
  • Better Stack / Logtail: Simple HTTP shipping

Example of shipping logs via HTTP to Logtail:

import logging
from logtail import LogtailHandler

handler = LogtailHandler(source_token="your-source-token")
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User action tracked", extra={"user_id": user.id, "action": "purchase"})

The Quick-Start Checklist

Before you deploy your next Python app, run through this:

  • [ ] Replace all print() with logger.info() / logger.error()
  • [ ] Set up JSON structured logging
  • [ ] Configure log rotation
  • [ ] Add a /health endpoint
  • [ ] Expose Prometheus metrics for request rate, error rate, and latency
  • [ ] Ship logs to a centralized location
  • [ ] Set up alerts (PagerDuty, Slack, email) for ERROR and CRITICAL logs
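
For the alerting item, one possible approach is a custom logging handler that only accepts ERROR and above and forwards each record to a notification function. This is a sketch under assumptions: AlertHandler and the notify callable are hypothetical names, and a plain list stands in for a real Slack, PagerDuty, or email client.

```python
import logging


class AlertHandler(logging.Handler):
    """Forward ERROR and CRITICAL records to a notification callable."""

    def __init__(self, notify):
        super().__init__(level=logging.ERROR)
        self.notify = notify

    def emit(self, record):
        try:
            self.notify(
                f"[{record.levelname}] {record.name}: {record.getMessage()}"
            )
        except Exception:
            # Never let a failing alert channel break the application.
            self.handleError(record)


sent_alerts = []  # stand-in for a real Slack/PagerDuty/email client

logger = logging.getLogger("alert_demo")
logger.setLevel(logging.INFO)
logger.addHandler(AlertHandler(sent_alerts.append))
logger.propagate = False

logger.info("Nightly job finished")      # below ERROR: no alert
logger.error("Payment gateway timeout")  # alert
logger.critical("Database unreachable")  # alert

print(sent_alerts)
```

In a real deployment you would likely add rate limiting or deduplication, since a tight error loop could otherwise flood the alert channel.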

The Bare Minimum

If you take nothing else from this guide: install Sentry. It takes 5 minutes, gives you automatic error capture with full context (user, request, environment), and will save your weekend more times than you can count.

pip install sentry-sdk

import sentry_sdk

sentry_sdk.init(
    dsn="https://your-dsn@sentry.io/12345",
    traces_sample_rate=1.0  # traces 100% of requests; lower this for high-traffic apps
)

# That's it. Every unhandled exception is now captured.

Logging is infrastructure. You won't appreciate it until the moment you need it — and then, you need it desperately.
