How to Build a CSV Comparison Tool That Highlights Every Changed Cell in Python
Read two CSV files with DictReader, compare cell by cell, and return a list of dictionaries describing each changed cell using only the standard library.
Python code
41 linesimport csv
from pathlib import Path
def csv_cell_diff(file_a: str, file_b: str) -> list[dict]:
rows_a = list(csv.DictReader(Path(file_a).open('r', newline='')))
rows_b = list(csv.DictReader(Path(file_b).open('r', newline='')))
if not rows_a or not rows_b:
return []
columns = list(rows_a[0].keys())
changes = []
max_rows = max(len(rows_a), len(rows_b))
for i in range(max_rows):
row_a = rows_a[i] if i < len(rows_a) else {}
row_b = rows_b[i] if i < len(rows_b) else {}
for col in columns:
val_a = row_a.get(col, "")
val_b = row_b.get(col, "")
if val_a != val_b:
changes.append({
"row": i + 2, # 1-indexed, header is row 1
"column": col,
"old": val_a,
"new": val_b
})
return changes
if __name__ == "__main__":
import tempfile, os
old_data = "name,age\nAlice,30\nBob,25"
new_data = "name,age\nAlice,31\nBob,25\nCharlie,40"
f1 = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
f1.write(old_data)
f1.close()
f2 = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
f2.write(new_data)
f2.close()
diffs = csv_cell_diff(f1.name, f2.name)
for d in diffs:
print(f"Row {d['row']}, Column '{d['column']}': '{d['old']}' -> '{d['new']}'")
os.unlink(f1.name)
os.unlink(f2.name)
Output
Row 2, Column 'age': '30' -> '31'
Row 4, Column 'name': '' -> 'Charlie'
Row 4, Column 'age': '' -> '40'
How it works
The csv_cell_diff function opens both CSV files with csv.DictReader to map each row to a dictionary keyed by the header row. It pads missing rows with empty dictionaries so files with different row counts are handled gracefully. For each row index and each column, it compares the old and new values using .get(col, "") to avoid KeyError on missing columns. The result is a list of change records with the row (1‑indexed, header = row 1), column name, old value, and new value. The final __main__ block writes temporary CSV files to demonstrate the output, then cleans them up.
Common mistakes
- Assuming both files have the same number of rows, causing an IndexError when iterating.
- Forgetting that csv.DictReader does not load the whole file into memory — calling `list()` is necessary for random access.
- Using a 0‑based row index instead of offsetting by 2 (header row + 1‑based indexing).
- Not handling empty files or files with different column sets.
Variations
- Use `pandas.read_csv` with `pandas.DataFrame.compare` for a DataFrame‑based diff.
- Write differences to a new CSV with an additional 'change' column instead of printing.
Real-world use cases
- Validating that a nightly data export matches the previous day's output before loading into a data warehouse.
- Comparing configuration CSV files across deployment environments to catch unintentional drift.
- Auditing student grade spreadsheets after bulk updates to identify which cells changed between versions.
Sponsored
More from Files & data
- Audit File Permissions Across a Project in Python easy
- Automatically Detect Corrupted Files Using SHA-256 Checksums in Python easy
- Automatically Highlight Data Validation Errors Inside Excel Files in Python easy
- Build a Command-Line To-Do List Application with Data Persistence in Python easy
- Build a Personal Work Hours Tracker in Python medium
- Build a Python Script That Detects and Deletes Empty Files Across Folders easy
Keep learning
Related tutorials and quizzes for this topic.