Build a Python Utility That Detects Duplicate Records Across Multiple Excel Sheets

A Python utility that uses pandas to find overlapping records across different Excel sheets based on specified key columns.

Medium Python 3.9+ Jun 28, 2026 Data pipelines & processing 2 views 0 copies

pandas excel data cleaning duplicates file processing

Requires third-party packages — install first

pip install pandas openpyxl

Python code

44 lines

Python 3.9+

import pandas as pd
from pathlib import Path

def find_duplicate_records_across_sheets(file_path: str, key_columns: list, sheet_names: list) -> dict:
    """
    Detect duplicate records across multiple Excel sheets based on specified key columns.
    
    Args:
        file_path: Path to the Excel file
        key_columns: List of column names to check for duplicates
        sheet_names: List of sheet names to compare
    
    Returns:
        Dictionary with duplicate records per sheet pair
    """
    xls = pd.ExcelFile(file_path)
    duplicates = {}
    
    for i, sheet1 in enumerate(sheet_names):
        df1 = pd.read_excel(xls, sheet_name=sheet1)
        for j, sheet2 in enumerate(sheet_names):
            if i >= j:
                continue
            df2 = pd.read_excel(xls, sheet_name=sheet2)
            merged = df1.merge(df2, on=key_columns, how='inner', suffixes=(f'_{sheet1}', f'_{sheet2}'))
            if not merged.empty:
                duplicates[f'{sheet1}_vs_{sheet2}'] = merged
    
    return duplicates

if __name__ == "__main__":
    sample_file = Path("sample_data.xlsx")
    if sample_file.exists():
        result = find_duplicate_records_across_sheets(
            "sample_data.xlsx", 
            key_columns=["ID"], 
            sheet_names=["Sheet1", "Sheet2", "Sheet3"]
        )
        for pair, records in result.items():
            print(f"Duplicate records in {pair}:")
            print(records.to_string(index=False))
            print()
    else:
        print("Create sample_data.xlsx with multiple sheets for testing.")

Output

stdout

Duplicate records in Sheet1_vs_Sheet2:
ID  Name_Sheet1  Age_Sheet1 Name_Sheet2  Age_Sheet2
1   Alice        30        Alice        30
2   Bob          25        Bob          25

Duplicate records in Sheet1_vs_Sheet3:
ID  Name_Sheet1  Age_Sheet1 Name_Sheet3  Age_Sheet3
3   Charlie      35        Charlie      35

How it works

The function loads an Excel file once with pd.ExcelFile for efficiency, then iterates over unique sheet pairs (using i >= j to avoid duplicates and self-comparison). For each pair, an inner merge on the specified key columns identifies records that exist in both sheets. The suffixes f'_{sheet1}' and f'_{sheet2}' prevent column name collisions when sheets share column names. The result is a dictionary of DataFrames, one per sheet pair, containing the full duplicate records.

Common mistakes

Forgetting to install `openpyxl` or `xlrd` as the Excel engine
Not excluding self-comparisons (Sheet1 vs Sheet1) leading to redundant output
Assuming exact column name matches across sheets without handling case or whitespace
Using a single key column when data requires composite keys for uniqueness

Variations

Use `df1[df1.duplicated(subset=key_columns, keep=False)]` to find duplicates within a single sheet instead of across sheets
Output duplicates to a consolidated CSV file with `pd.concat(result.values()).to_csv('duplicates.csv', index=False)`

Real-world use cases

Reconciling monthly sales data where customer records may appear across multiple regional sheets.
Auditing employee databases from different departments stored in separate Excel tabs.
Merging inventory lists from multiple warehouses before running a stock redistribution report.

Build a Python Utility That Detects Duplicate Records Across Multiple Excel Sheets

Python code

Output

How it works

Common mistakes

Variations

Real-world use cases

More from Data pipelines & processing

Tutorials

Quizzes

Python code

Output

How it works

Common mistakes

Variations

Real-world use cases

More from Data pipelines & processing

Keep learning

Tutorials

Quizzes