Automatically Highlight Data Validation Errors Inside Excel Files in Python
Load an Excel file with openpyxl, iterate over cells, and highlight invalid data (empty, negative) with a red fill and error message.
pip install openpyxl
Python code
48 linesimport openpyxl
from openpyxl.styles import PatternFill
from pathlib import Path
def highlight_validation_errors(filepath: str, output_path: str = None):
wb = openpyxl.load_workbook(filepath)
red_fill = PatternFill(start_color="FF0000", end_color="FF0000", fill_type="solid")
for sheet in wb.worksheets:
for row in sheet.iter_rows(min_row=2): # Skip header row
for cell in row:
value = cell.value
error = False
# Example validation: non-null, non-negative numbers
if value is None or value == "":
error = True
elif isinstance(value, (int, float)) and value < 0:
error = True
elif isinstance(value, str) and len(value.strip()) == 0:
error = True
if error:
cell.fill = red_fill
cell.value = f"ERROR: {value}" if value is not None else "ERROR: empty"
output = output_path or filepath.replace(".xlsx", "_validated.xlsx")
wb.save(output)
print(f"Errors highlighted and saved to: {output}")
return output
if __name__ == "__main__":
import tempfile
from openpyxl import Workbook
# Create a sample workbook with errors
test_wb = Workbook()
ws = test_wb.active
ws.append(["Name", "Age", "Score"])
ws.append(["Alice", 30, 95])
ws.append(["Bob", -5, 0]) # Negative age error
ws.append(["", 25, 100]) # Empty name error
ws.append(["Dave", None, 88]) # None age error
sample_path = Path(tempfile.gettempdir()) / "test_errors.xlsx"
test_wb.save(sample_path)
highlight_validation_errors(str(sample_path))
Output
Errors highlighted and saved to: /tmp/test_errors_validated.xlsx
How it works
The script uses openpyxl's PatternFill to apply a red background to cells that fail validation. It skips the first header row with min_row=2, checks for None, empty strings, negative numbers, and blank whitespace-only strings. Each flagged cell gets its value replaced with an ERROR: prefix so the problem is instantly visible. The modified workbook is saved to a new file (default _validated.xlsx suffix) to avoid overwriting the original.
Common mistakes
- Forgetting to skip the header row, causing column names to be flagged as errors.
- Using `end_color` wrong in `PatternFill` (both must match for solid fill).
- Not installing openpyxl before running the script.
Variations
- Use conditional formatting (`openpyxl.formatting.rule.CellIsRule`) for non-destructive highlighting without changing cell values.
- Log errors to a separate summary sheet or JSON file instead of overwriting cell data.
Real-world use cases
- Flagging negative inventory quantities in a warehouse stock spreadsheet before month-end reporting.
- Highlighting missing or blank employee IDs in HR payroll Excel exports during data migration.
- Automating QA for CSV-to-Excel data ingestion where source fields may contain invalid values.
Sponsored
More from Files & data
- Audit File Permissions Across a Project in Python easy
- Automatically Detect Corrupted Files Using SHA-256 Checksums in Python easy
- Build a Command-Line To-Do List Application with Data Persistence in Python easy
- Build a Personal Work Hours Tracker in Python medium
- Build a Python Script That Detects and Deletes Empty Files Across Folders easy
- Build a Secure Local Password Vault with Encrypted Storage in Python medium
Keep learning
Related tutorials and quizzes for this topic.