Data pipelines & processing
ETL-style flows, batch transforms, validation, and moving data between formats.
Build a Python Utility That Detects Duplicate Records Across Multiple Excel Sheets
A Python utility that uses pandas to find overlapping records across different Excel sheets based on specified key columns.
import pandas as pd
from pathlib import Path

def find_duplicate_records_across_sheets(file_path: str, key_columns: list, sheet_names: list) -> dict:
    """
    Detect duplicate records across multiple Excel sheets based on specified key columns.

    Args:
        file_path: Path to the Excel file
        key_co…
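The preview above is cut off, so the full implementation is not shown here. As a rough illustration of the idea only, the sketch below flags rows whose key-column values appear in more than one sheet. It is not the sample's actual code: the function name `find_cross_sheet_duplicates`, the helper column `_sheet`, and the choice to take already-loaded DataFrames instead of a file path are all assumptions made for this example.

```python
import pandas as pd


def find_cross_sheet_duplicates(sheets: dict, key_columns: list) -> pd.DataFrame:
    """Return rows whose key-column values occur in more than one sheet.

    `sheets` maps sheet name -> DataFrame (hypothetical input shape,
    chosen so the sketch runs without an Excel file on disk).
    """
    frames = []
    for name, df in sheets.items():
        missing = [col for col in key_columns if col not in df.columns]
        if missing:
            raise KeyError(f"Sheet {name!r} is missing key columns: {missing}")
        # Tag every row with its source sheet so matches can be traced back.
        # `_sheet` is an illustrative column name, not part of the original sample.
        frames.append(df.assign(_sheet=name))

    combined = pd.concat(frames, ignore_index=True)

    # Count the number of distinct sheets each key combination appears in.
    # groupby drops rows with missing key values by default, so they never match.
    sheet_counts = combined.groupby(key_columns)["_sheet"].nunique()
    shared_keys = sheet_counts[sheet_counts > 1].reset_index()[key_columns]

    # Keep only rows whose keys are shared by at least two sheets.
    result = combined.merge(shared_keys, on=key_columns, how="inner")
    return result.sort_values(key_columns + ["_sheet"]).reset_index(drop=True)
```

To run this against a workbook, the `sheets` argument can come from `pd.read_excel(path, sheet_name=None)`, which returns a dict of DataFrames keyed by sheet name. Repeats that occur only within a single sheet are deliberately not reported, since the goal here is overlap between sheets.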
How to Find Missing Values in Large Datasets in Python
Analyze missing values across multiple large pandas DataFrames with counts and percentages.
import pandas as pd
import numpy as np

def find_missing_values_summary(datasets):
    """Analyze missing values across multiple datasets (dict of name: DataFrame)."""
    summary = {}
    for name, df in datasets.items():
        missing_count = df.isnull().sum()
        total_rows = len(df)
        missing_pct = (mi…
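The preview above stops mid-expression, so the rest of the sample is not visible here. The sketch below shows one way the described summary (counts and percentages of missing values per column) could be computed. The function name `summarize_missing_values`, the output column names, and the decision to list only columns that have gaps are assumptions for illustration, not the sample's actual code.

```python
import pandas as pd


def summarize_missing_values(datasets: dict) -> dict:
    """Map each dataset name to a per-column table of missing counts and percentages."""
    summary = {}
    for name, df in datasets.items():
        missing_count = df.isnull().sum()
        total_rows = len(df)
        if total_rows:
            missing_pct = (missing_count / total_rows * 100).round(2)
        else:
            # Empty frame: report 0.0 instead of dividing by zero.
            missing_pct = missing_count.astype(float)

        report = pd.DataFrame(
            {"missing_count": missing_count, "missing_pct": missing_pct}
        )
        # Assumption: only columns with at least one missing value are reported.
        report = report[report["missing_count"] > 0]
        summary[name] = report.sort_values("missing_count", ascending=False)
    return summary


# Illustrative data, not taken from the original sample.
orders = pd.DataFrame(
    {"id": [1, 2, 3, 4], "price": [9.5, None, 3.0, None], "note": [None, "x", "y", "z"]}
)
print(summarize_missing_values({"orders": orders})["orders"])
```

For very large frames, `df.isnull().sum()` still makes a full pass over the data, so the cost grows with the number of cells in each dataset.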
Data pipelines & processing — Python code examples
What you will find here
This page collects data pipeline and processing snippets: short, copy-ready Python you can paste into our free online IDE and run without installing anything. Each sample includes a plain-English explanation and the full source code.
Samples vs tutorials and challenges
Samples are quick references, one concept per page. For step-by-step teaching, use our Python tutorials. To test yourself, try the quizzes or coding challenges. To clean up code style, use the Python formatter.