Tutorial
Observability for Python Developers: Metrics and Tracing with Prometheus and OpenTelemetry
A beginner-friendly guide to instrumenting Python apps with Prometheus metrics and OpenTelemetry tracing. Learn why print statements fail in production and how to set up production-ready observability in minutes with open-source tools.
June 2026 · 7 min read
A single print() statement will get you through a tutorial, but it will betray you the moment your code runs in production. When a background job silently fails at 3 AM, or an API endpoint suddenly slows to a crawl, you need more than a terminal window full of debug prints. You need metrics to know what is happening, and tracing to know where it’s happening.
This guide cuts through the buzzwords and gives you a practical, beginner-friendly approach to instrumenting your Python apps—without drowning you in over-engineered setups.
Why Not Just Print Statements?
Imagine your web app takes 10 seconds to load a page. With print("Loaded page"), you know a page loaded, but you have no idea which page, how long it took, or where the delay happened.
Metrics and tracing solve this:
- Metrics are numbers (request count, latency, error rate). They tell you if something is wrong.
- Tracing follows a single request through every function call and database query. It tells you why it’s wrong.
You don’t need a $10,000 observability platform to start. The tools covered here are free, open-source, and run on a laptop.
The Three Pillars: Metrics, Logs, and Traces
Before diving into code, it helps to know what you’re collecting.
| Pillar | What it measures | Example |
|---|---|---|
| Metric | Aggregate numbers | Requests per second, CPU usage |
| Log | Discrete events | "User login failed: invalid password" |
| Trace | Request flow through microservices or functions | "get_user() took 200ms → SQL took 180ms" |
For a single Python service, you really only need metrics and traces. Logs are for debugging specific errors. Metrics and traces tell you if your system is healthy before users complain.
Setting Up Your First Metrics with Prometheus
Prometheus is the de facto open-source standard for collecting metrics. It pulls (scrapes) data from your app over HTTP at a regular interval (e.g., every 15 seconds). The official Python client library, prometheus_client, keeps the app side short.
Step 1: Install and Start Prometheus
Download Prometheus from the official site. Create a minimal config file (prometheus.yml):
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'my_app'
    static_configs:
      - targets: ['localhost:8000']
Run it: ./prometheus --config.file=prometheus.yml
Step 2: Instrument Your Python App
Here’s a simple Flask app with Prometheus metrics (install the dependencies first with pip install flask prometheus-client):
import time

from flask import Flask
from prometheus_client import CONTENT_TYPE_LATEST, REGISTRY, Counter, Histogram, generate_latest

app = Flask(__name__)

# Define custom metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'HTTP request latency', ['method', 'endpoint'])

@app.route('/')
def home():
    # Count the request, then time the work done to serve it
    REQUEST_COUNT.labels(method='GET', endpoint='/').inc()
    with REQUEST_LATENCY.labels(method='GET', endpoint='/').time():
        time.sleep(0.1)  # Simulated work
        return "Hello, world!"

@app.route('/metrics')
def metrics():
    # Serve every registered metric in the Prometheus text format
    return generate_latest(REGISTRY), 200, {'Content-Type': CONTENT_TYPE_LATEST}

if __name__ == '__main__':
    app.run(port=8000)
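Hit http://localhost:8000/metrics and you’ll see raw metric data in the Prometheus text format. Prometheus scrapes that endpoint at the configured scrape_interval; if you don’t set one, the built-in default is one minute, not 15 seconds.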
Key insight: You don’t need to push metrics to Prometheus from your app’s logic. Your app just exposes the data; Prometheus handles the collection.
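As routes multiply, repeating the counter and histogram lines in every handler gets tedious. One option is a small decorator. The sketch below is only an illustration: track is a hypothetical helper (not part of prometheus_client), and it uses a private CollectorRegistry so it runs without Flask and without touching the global registry.

```python
import time
from functools import wraps

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this sketch isolated from the global REGISTRY.
registry = CollectorRegistry()
REQUEST_COUNT = Counter(
    "http_requests_total", "Total HTTP requests",
    ["method", "endpoint"], registry=registry,
)
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds", "HTTP request latency",
    ["method", "endpoint"], registry=registry,
)


def track(method, endpoint):
    """Hypothetical helper: count and time every call to the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            REQUEST_COUNT.labels(method=method, endpoint=endpoint).inc()
            with REQUEST_LATENCY.labels(method=method, endpoint=endpoint).time():
                return func(*args, **kwargs)
        return wrapper
    return decorator


@track("GET", "/")
def home():
    time.sleep(0.01)  # Simulated work
    return "Hello, world!"


home()
home()

# The same text a /metrics route would return for this registry.
exposition = generate_latest(registry).decode()
print(exposition)
```

In the Flask app, @app.route('/') would sit above @track("GET", "/") so that Flask registers the wrapped function. The app still only records and exposes numbers; nothing here talks to Prometheus.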
Adding Tracing with OpenTelemetry
Metrics tell you how many requests are slow. Tracing tells you which part of a single request is slow.
OpenTelemetry is the unified standard for traces (and metrics, but we’ll focus on traces). Use the opentelemetry-api and opentelemetry-sdk packages.
Step 1: Install OpenTelemetry
pip install opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-flask opentelemetry-exporter-otlp
Step 2: Automagically Instrument Your App
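Instrumentation packages such as opentelemetry-instrumentation-flask can automatically wrap Flask, requests, and many other libraries. The SDK then processes and exports the spans they create: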
import time

from flask import Flask
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Set up tracing; service.name is how the tracing backend labels this app
provider = TracerProvider(resource=Resource.create({"service.name": "my_app"}))
# Export spans over OTLP/gRPC to a local collector (plaintext, no TLS)
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)  # Auto-instrument Flask

@app.route('/')
def home():
    # Manual span (function-level tracing)
    with tracer.start_as_current_span("process_home"):
        # Simulate a database call
        response = fake_db_query()
        return response

def fake_db_query():
    # Not traced on its own: it has no span. A real driver such as psycopg2
    # is traced only once its instrumentation package is installed and enabled.
    time.sleep(0.05)
    return "data"

if __name__ == '__main__':
    app.run(port=8000)
Step 3: View Traces
You’ll need a backend to visualize traces. Jaeger is the easiest to set up:
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4317:4317 \
jaegertracing/all-in-one:latest
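Now open http://localhost:16686. Hit your Flask app a few times, then pick your service in the Jaeger search panel and click Find Traces. You’ll see a waterfall chart with the Flask request span at the top and process_home nested beneath it. fake_db_query only appears as its own bar if it is wrapped in a span or goes through an instrumented library.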
Making It Practical: The Minimal Observability Stack
You don’t need to install everything in production right away. Here’s a realistic beginner setup:
- Expose Prometheus metrics from your app (takes 10 lines of code).
- Scrape with Prometheus on the same machine (or a lightweight instance).
- Add OpenTelemetry auto-instrumentation to your app (also ~10 lines).
- Run Jaeger in Docker on your dev machine to view traces.
That’s it. Once you see the data, you’ll immediately know where to optimize.
Common Pitfalls (and How to Avoid Them)
- Over-instrumenting from day one: Start with one metric (request count) and one trace (your main route). Add more as you encounter real problems.
- Forgetting to export traces: Auto-instrumentation creates spans, but you never see them unless you attach a span processor and an exporter (such as OTLP) to the tracer provider.
- Using the same metric name for different things: http_requests_total with different label values is fine. Using it for both API calls and static file downloads is confusing.
- Ignoring cardinality: Every distinct combination of label values becomes its own time series, so avoid labels with unbounded values (like user IDs or session tokens) unless you want Prometheus to choke.
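The cardinality pitfall is easier to see with rough arithmetic. This sketch estimates an upper bound on the number of time series one metric can produce; the label values and the figure of 10,000 users are invented for illustration.

```python
from math import prod


def series_count(label_values):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(len(values) for values in label_values.values())


# Hypothetical label sets for http_requests_total.
bounded = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "endpoint": ["/", "/login", "/orders", "/metrics"],
}
# Adding a per-user label multiplies every existing combination.
unbounded = {**bounded, "user_id": range(10_000)}

print(series_count(bounded))    # 16
print(series_count(unbounded))  # 160000
```

A histogram makes this worse, because each label combination is multiplied again by the number of buckets.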
Where to Go Next
Once you’re comfortable with metrics and traces, add:
- Structured logging with structlog or python-json-logger so your logs integrate with trace IDs.
- Dashboards in Grafana that show latency histograms and error rates.
- Alerting in Prometheus for 5xx errors or high p99 latency.
The goal is not to know everything at once. It’s to know one thing that was invisible before—and then build from there.
Go instrument one endpoint today. You’ll thank yourself the next time something breaks.