Detect Outliers in CSV Data Using Z-Score in Python

Read a CSV file and detect outliers in a numeric column by computing z-scores and flagging values whose absolute z-score exceeds a given threshold. No machine learning required.

Easy, Python 3.9+, Jun 28, 2026, Files & data

Python code

import csv
import statistics

def detect_outliers(csv_path, column_name, threshold=2.0):
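    """Detect outliers in a numeric column using the z-score method.

    Returns a list of (index, value, z_score) tuples. The index is the
    0-based position among the successfully parsed values; rows that
    fail to parse are skipped and not counted.
    """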
    values = []
    with open(csv_path, 'r', newline='') as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to an empty list
        if column_name not in (reader.fieldnames or []):
            raise ValueError(f"Column '{column_name}' not found")
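        for row in reader:
            try:
                val = float(row[column_name])
                values.append(val)
            except (ValueError, TypeError):
                # Skip blank, non-numeric, or missing (None) cells
                continue

    # The sample standard deviation needs at least two values
    if len(values) < 2:
        return []

    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    # All values identical: z-scores are undefined
    if stdev == 0:
        return []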
    
    outliers = []
    for i, val in enumerate(values):
        z_score = (val - mean) / stdev
        if abs(z_score) > threshold:
            outliers.append((i, val, z_score))
    return outliers

if __name__ == "__main__":
    # Example: create sample CSV data
    sample_data = "value\n10\n12\n11\n13\n100\n9\n11\n12\n10\n200\n14\n"
    with open('sample.csv', 'w') as f:
        f.write(sample_data)
    
    result = detect_outliers('sample.csv', 'value')
    print("Outliers detected (index, value, z-score):")
    for idx, val, z in result:
        print(f"  Row {idx+1}: {val} (z={z:.2f})")

Output

stdout
Outliers detected (index, value, z-score):
  Row 10: 200.0 (z=2.71)

How it works

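The function reads the CSV file with csv.DictReader so the column can be accessed by name, and raises ValueError if the column is missing. Each cell is converted with float(); cells that are blank, non-numeric, or missing are skipped via try/except. A z-score measures how many sample standard deviations a value lies from the mean: z = (value - mean) / stdev. Values with abs(z) greater than the threshold are flagged as outliers; the function defaults to 2.0, and 2 or 3 are common choices. If fewer than two values parse, or all values are identical (stdev of 0), the function returns an empty list. Because the mean and standard deviation are computed from data that includes the outliers, one very large value inflates both and can mask smaller outliers. Using only the standard library avoids external dependencies and keeps the method simple and transparent, which suits quick data screening.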
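The z-score calculation can be tried on its own, without a CSV file. The following is a minimal sketch; the helper name `z_scores` and the `readings` list are made up for illustration and are not part of the snippet above.

```python
import statistics

def z_scores(values):
    """Return the z-score of each value, using the sample standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / stdev for v in values]

# Illustrative data: mean is 5.0, sample standard deviation is about 2.14
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scores = z_scores(readings)

for value, z in zip(readings, scores):
    print(f"{value:>4} -> z={z:+.2f}")
```

Here the largest reading, 9.0, ends up about 1.87 standard deviations above the mean, so it would not be flagged with a threshold of 2.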

Common mistakes

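  • Forgetting to handle non-numeric or missing data, so that `float()` raises an exception and the script crashes.
  • Using `statistics.pstdev` (population standard deviation, divides by n) when the data is a sample; `statistics.stdev` divides by n - 1 and is the right choice here.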
  • Setting the threshold too low (e.g., 1.5) and flagging normal variation as outliers.
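The first two mistakes can be illustrated with a short sketch. The helper `parse_float` and the `cells` list are hypothetical names used only for this example.

```python
import statistics

def parse_float(cell):
    """Return the cell as a float, or None if it cannot be parsed.

    Hypothetical helper for illustration only.
    """
    try:
        return float(cell)
    except (ValueError, TypeError):
        # ValueError: blank or non-numeric text.
        # TypeError: None, which csv.DictReader uses by default for
        # fields missing from a short row.
        return None

cells = ["10", "12", "", "n/a", None, "11", "13", "9"]
data = [v for v in map(parse_float, cells) if v is not None]

sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print(data)
print(f"stdev={sample_sd:.4f} pstdev={population_sd:.4f}")
```

For the same data the sample standard deviation is always the larger of the two, so switching to `pstdev` makes every z-score slightly bigger. The gap matters most on small datasets.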

Variations

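  1. Use `pandas` with `scipy.stats.zscore` for vectorized outlier detection on large datasets. Note that `scipy.stats.zscore` defaults to `ddof=0` (population standard deviation); pass `ddof=1` to match `statistics.stdev`.
  2. Apply the IQR method: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.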
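The IQR variation can be sketched with the standard library alone. This is one possible version, assuming quartiles from `statistics.quantiles` with its default "exclusive" method; other quartile definitions give slightly different fences. The function name `iqr_outliers` and the parameter `k` are illustrative choices.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs that fall outside the IQR fences."""
    if len(values) < 2:
        return []  # statistics.quantiles needs at least two data points
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

# Same numbers as the sample CSV in the main snippet
values = [10, 12, 11, 13, 100, 9, 11, 12, 10, 200, 14]
print(iqr_outliers(values))  # [(4, 100), (9, 200)]
```

Quartiles are far less sensitive to extreme values than the mean and standard deviation, which is why this method is often preferred when the data may contain more than one large outlier.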

Real-world use cases

  • Flag abnormal sensor readings in an IoT data pipeline before alerting operators.
  • Identify anomalous transaction amounts in financial log reviews to reduce fraud investigation scope.
  • Quickly spot data entry errors in survey results before running statistical analysis.
