Detect Outliers in CSV Data Using Z-Score in Python

Read a CSV file and detect outliers in a numeric column by computing z-scores and flagging values whose absolute z-score exceeds a given threshold. No machine learning required.

Easy Python 3.9+ Jun 28, 2026 Files & data

outlier-detection z-score csv statistics data-cleaning files

Python code

import csv
import statistics

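def detect_outliers(csv_path, column_name, threshold=2.0):
    """Detect outliers in a numeric column using the z-score method.

    Returns a list of (index, value, z_score) tuples, where index is the
    0-based position of the data row in the file (header excluded).
    """
    rows = []  # (data-row index, parsed value) pairs
    with open(csv_path, 'r', newline='') as f:
        reader = csv.DictReader(f)
        # fieldnames is None when the file is empty
        if not reader.fieldnames or column_name not in reader.fieldnames:
            raise ValueError(f"Column '{column_name}' not found")
        for i, row in enumerate(reader):
            try:
                rows.append((i, float(row[column_name])))
            except (ValueError, TypeError):
                # Skip blank, missing, or non-numeric cells
                continue

    # The sample standard deviation needs at least two values
    if len(rows) < 2:
        return []

    values = [val for _, val in rows]
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    # All values identical: z-scores are undefined
    if stdev == 0:
        return []

    outliers = []
    for i, val in rows:
        z_score = (val - mean) / stdev
        if abs(z_score) > threshold:
            outliers.append((i, val, z_score))
    return outliers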

if __name__ == "__main__":
    # Example: create sample CSV data
    sample_data = "value\n10\n12\n11\n13\n100\n9\n11\n12\n10\n200\n14\n"
    with open('sample.csv', 'w') as f:
        f.write(sample_data)
    
    result = detect_outliers('sample.csv', 'value')
    print("Outliers detected (index, value, z-score):")
    for idx, val, z in result:
        print(f"  Row {idx+1}: {val} (z={z:.2f})")

Output

Outliers detected (index, value, z-score):
  Row 10: 200.0 (z=2.71)

How it works

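The function reads the CSV file with csv.DictReader so the column can be accessed by name. Each cell is converted with float(), and cells that fail to parse are skipped via try/except. A z-score measures how many standard deviations a value lies from the mean; here the sample standard deviation (statistics.stdev) is used, and common thresholds are 2 or 3. Values whose absolute z-score exceeds the threshold are flagged as outliers. If fewer than two values parse, or all values are identical, the function returns an empty list. Because the mean and standard deviation are computed from data that includes the outliers, one extreme value can inflate both and mask smaller outliers. Using only the standard library avoids external dependencies and keeps the method simple and transparent, which suits quick data screening.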
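
To make the arithmetic concrete, the sketch below recomputes the z-scores for the two largest values in the sample data written by the script. The variable names are illustrative and not part of the original function.

```python
import statistics

# Same values the example script writes to sample.csv
values = [10, 12, 11, 13, 100, 9, 11, 12, 10, 200, 14]

mean = statistics.mean(values)    # about 36.55
stdev = statistics.stdev(values)  # sample standard deviation, about 60.40

# z-score: distance from the mean in units of standard deviation
z_100 = (100 - mean) / stdev
z_200 = (200 - mean) / stdev

print(f"mean={mean:.2f} stdev={stdev:.2f}")
print(f"z(100)={z_100:.2f} z(200)={z_200:.2f}")
```

With the default threshold of 2.0, only 200 is flagged (z is about 2.71). The value 100 scores about 1.05, because 200 inflates both the mean and the standard deviation.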

Common mistakes

Forgetting to handle non-numeric or missing data, which crashes the script.
Using `statistics.pstdev` (population standard deviation, divides by n) when the data is a sample; `statistics.stdev` (divides by n - 1) is the sample version.
Setting the threshold too low (e.g., 1.5) and flagging normal variation as outliers.
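
The `pstdev` versus `stdev` point above is easy to check directly. As a minimal sketch, the snippet below compares the two functions on a small hypothetical list. The population version is smaller whenever the values are not all identical, so it yields larger z-scores from the same data.

```python
import statistics

# Hypothetical readings, used only for illustration
values = [10, 12, 11, 13, 50]

mean = statistics.mean(values)
sample_sd = statistics.stdev(values)       # divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n

# Same value, same mean, different denominators
z_sample = (50 - mean) / sample_sd
z_population = (50 - mean) / population_sd

print(f"stdev={sample_sd:.2f} z={z_sample:.2f}")
print(f"pstdev={population_sd:.2f} z={z_population:.2f}")
```

The gap shrinks as the number of values grows, but on short columns it can decide whether a value crosses the threshold.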

Variations

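Use `pandas` with `scipy.stats.zscore` for vectorized outlier detection on large datasets. Note that `scipy.stats.zscore` defaults to `ddof=0` (population standard deviation); pass `ddof=1` to match `statistics.stdev`.
Apply the IQR method: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where IQR = Q3 - Q1.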
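
The IQR variation can also stay within the standard library. The following is a rough sketch, assuming `statistics.quantiles` with its default quartile method is acceptable; the helper name `iqr_outliers` and its parameter `k` are hypothetical and do not appear in the original script.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Same values as the sample CSV in the main script
values = [10, 12, 11, 13, 100, 9, 11, 12, 10, 200, 14]
print(iqr_outliers(values))
```

On the sample data this flags both 100 and 200, since quartiles are not pulled toward extreme values the way the mean and standard deviation are. Other tools may compute quartiles with a slightly different interpolation rule, so the bounds can differ a little between libraries.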

Real-world use cases

Flag abnormal sensor readings in an IoT data pipeline before alerting operators.
Identify anomalous transaction amounts in financial log reviews to reduce fraud investigation scope.
Quickly spot data entry errors in survey results before running statistical analysis.
