Maintenance

Site is under maintenance — quizzes are still available.

Go to quizzes
Sponsored Reserved space — layout preview until AdSense is connected

How to Build a CSV Comparison Tool That Highlights Every Changed Cell in Python

Read two CSV files with DictReader, compare cell by cell, and return a list of dictionaries describing each changed cell using only the standard library.

Medium Python 3.9+ Jun 28, 2026 Files & data 2 views 0 copies

Python code

41 lines
Python 3.9+
import csv
from pathlib import Path

def csv_cell_diff(file_a: str, file_b: str) -> list[dict]:
    rows_a = list(csv.DictReader(Path(file_a).open('r', newline='')))
    rows_b = list(csv.DictReader(Path(file_b).open('r', newline='')))
    if not rows_a or not rows_b:
        return []
    columns = list(rows_a[0].keys())
    changes = []
    max_rows = max(len(rows_a), len(rows_b))
    for i in range(max_rows):
        row_a = rows_a[i] if i < len(rows_a) else {}
        row_b = rows_b[i] if i < len(rows_b) else {}
        for col in columns:
            val_a = row_a.get(col, "")
            val_b = row_b.get(col, "")
            if val_a != val_b:
                changes.append({
                    "row": i + 2,  # 1-indexed, header is row 1
                    "column": col,
                    "old": val_a,
                    "new": val_b
                })
    return changes

if __name__ == "__main__":
    import tempfile, os
    old_data = "name,age\nAlice,30\nBob,25"
    new_data = "name,age\nAlice,31\nBob,25\nCharlie,40"
    f1 = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
    f1.write(old_data)
    f1.close()
    f2 = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
    f2.write(new_data)
    f2.close()
    diffs = csv_cell_diff(f1.name, f2.name)
    for d in diffs:
        print(f"Row {d['row']}, Column '{d['column']}': '{d['old']}' -> '{d['new']}'")
    os.unlink(f1.name)
    os.unlink(f2.name)

Output

stdout
Row 2, Column 'age': '30' -> '31'
Row 4, Column 'name': '' -> 'Charlie'
Row 4, Column 'age': '' -> '40'

How it works

The csv_cell_diff function opens both CSV files with csv.DictReader to map each row to a dictionary keyed by the header row. It pads missing rows with empty dictionaries so files with different row counts are handled gracefully. For each row index and each column, it compares the old and new values using .get(col, "") to avoid KeyError on missing columns. The result is a list of change records with the row (1‑indexed, header = row 1), column name, old value, and new value. The final __main__ block writes temporary CSV files to demonstrate the output, then cleans them up.

Common mistakes

  • Assuming both files have the same number of rows, causing an IndexError when iterating.
  • Forgetting that csv.DictReader does not load the whole file into memory — calling `list()` is necessary for random access.
  • Using a 0‑based row index instead of offsetting by 2 (header row + 1‑based indexing).
  • Not handling empty files or files with different column sets.

Variations

  1. Use `pandas.read_csv` with `pandas.DataFrame.compare` for a DataFrame‑based diff.
  2. Write differences to a new CSV with an additional 'change' column instead of printing.

Real-world use cases

  • Validating that a nightly data export matches the previous day's output before loading into a data warehouse.
  • Comparing configuration CSV files across deployment environments to catch unintentional drift.
  • Auditing student grade spreadsheets after bulk updates to identify which cells changed between versions.

Sponsored

Sponsored Reserved space — layout preview until AdSense is connected

Run this sample

Open the browser IDE to tweak the example and see results without installing anything.

Open editor

More from Files & data

Related tutorials and quizzes for this topic.