How to Build a CSV Comparison Tool That Highlights Every Changed Cell in Python

Read two CSV files with DictReader, compare cell by cell, and return a list of dictionaries describing each changed cell using only the standard library.

Medium Python 3.9+ Jun 28, 2026 Files & data 2 views 0 copies

csv comparison diff file-processing dictreader

Python code

41 lines

Python 3.9+

import csv
from pathlib import Path

def csv_cell_diff(file_a: str, file_b: str) -> list[dict]:
    rows_a = list(csv.DictReader(Path(file_a).open('r', newline='')))
    rows_b = list(csv.DictReader(Path(file_b).open('r', newline='')))
    if not rows_a or not rows_b:
        return []
    columns = list(rows_a[0].keys())
    changes = []
    max_rows = max(len(rows_a), len(rows_b))
    for i in range(max_rows):
        row_a = rows_a[i] if i < len(rows_a) else {}
        row_b = rows_b[i] if i < len(rows_b) else {}
        for col in columns:
            val_a = row_a.get(col, "")
            val_b = row_b.get(col, "")
            if val_a != val_b:
                changes.append({
                    "row": i + 2,  # 1-indexed, header is row 1
                    "column": col,
                    "old": val_a,
                    "new": val_b
                })
    return changes

if __name__ == "__main__":
    import tempfile, os
    old_data = "name,age\nAlice,30\nBob,25"
    new_data = "name,age\nAlice,31\nBob,25\nCharlie,40"
    f1 = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
    f1.write(old_data)
    f1.close()
    f2 = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
    f2.write(new_data)
    f2.close()
    diffs = csv_cell_diff(f1.name, f2.name)
    for d in diffs:
        print(f"Row {d['row']}, Column '{d['column']}': '{d['old']}' -> '{d['new']}'")
    os.unlink(f1.name)
    os.unlink(f2.name)

Output

stdout

Row 2, Column 'age': '30' -> '31'
Row 4, Column 'name': '' -> 'Charlie'
Row 4, Column 'age': '' -> '40'

How it works

The csv_cell_diff function opens both CSV files with csv.DictReader to map each row to a dictionary keyed by the header row. It pads missing rows with empty dictionaries so files with different row counts are handled gracefully. For each row index and each column, it compares the old and new values using .get(col, "") to avoid KeyError on missing columns. The result is a list of change records with the row (1‑indexed, header = row 1), column name, old value, and new value. The final __main__ block writes temporary CSV files to demonstrate the output, then cleans them up.

Common mistakes

Assuming both files have the same number of rows, causing an IndexError when iterating.
Forgetting that csv.DictReader does not load the whole file into memory — calling `list()` is necessary for random access.
Using a 0‑based row index instead of offsetting by 2 (header row + 1‑based indexing).
Not handling empty files or files with different column sets.

Variations

Use `pandas.read_csv` with `pandas.DataFrame.compare` for a DataFrame‑based diff.
Write differences to a new CSV with an additional 'change' column instead of printing.

Real-world use cases

Validating that a nightly data export matches the previous day's output before loading into a data warehouse.
Comparing configuration CSV files across deployment environments to catch unintentional drift.
Auditing student grade spreadsheets after bulk updates to identify which cells changed between versions.

How to Build a CSV Comparison Tool That Highlights Every Changed Cell in Python

Python code

Output

How it works

Common mistakes

Variations

Real-world use cases

More from Files & data

Tutorials

Quizzes

Python code

Output

How it works

Common mistakes

Variations

Real-world use cases

More from Files & data

Keep learning

Tutorials

Quizzes