Reference library

Data pipelines & processing

ETL-style flows, batch transforms, validation, and moving data between formats.

Data pipelines & processing medium

Build a Python Utility That Detects Duplicate Records Across Multiple Excel Sheets

A Python utility that uses pandas to find overlapping records across different Excel sheets based on specified key columns.

Tags: pandas, excel, data cleaning
Python
import pandas as pd
from pathlib import Path

def find_duplicate_records_across_sheets(file_path: str, key_columns: list, sheet_names: list) -> dict:
    """
    Detect duplicate records across multiple Excel sheets based on specified key columns.
    
    Args:
        file_path: Path to the Excel file
        key_co…
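
The preview above is cut off, so the following is only a minimal sketch of the idea the description states (finding records whose key columns appear in more than one sheet), not the original utility. The function name `find_cross_sheet_duplicates` and the `source_sheet` column are hypothetical, and the sketch assumes the sheets have already been loaded into a dict of DataFrames so it can run without an Excel file.

```python
import pandas as pd


def find_cross_sheet_duplicates(sheets: dict, key_columns: list) -> pd.DataFrame:
    """Return the rows whose key-column values occur in more than one sheet.

    sheets: mapping of sheet name -> DataFrame (hypothetical input shape).
    key_columns: columns that together identify a record.
    """
    frames = []
    for sheet_name, df in sheets.items():
        # Fail early if a sheet lacks one of the key columns.
        missing = [col for col in key_columns if col not in df.columns]
        if missing:
            raise KeyError(f"Sheet {sheet_name!r} is missing key columns: {missing}")
        tagged = df.copy()
        tagged["source_sheet"] = sheet_name  # remember where each row came from
        frames.append(tagged)

    if not frames:
        return pd.DataFrame(columns=[*key_columns, "source_sheet"])

    combined = pd.concat(frames, ignore_index=True)
    # Rows with a missing key value cannot be matched reliably, so drop them.
    combined = combined.dropna(subset=key_columns)

    # For every row, count how many distinct sheets share its key.
    sheets_per_key = combined.groupby(key_columns)["source_sheet"].transform("nunique")

    # Keep keys seen in at least two sheets; repeats inside one sheet do not count.
    duplicates = combined[sheets_per_key > 1]
    return duplicates.sort_values([*key_columns, "source_sheet"]).reset_index(drop=True)
```

To use it on a workbook, `pd.read_excel(file_path, sheet_name=sheet_names)` returns a dict of DataFrames keyed by sheet name when `sheet_name` is a list, which can be passed in directly.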
Data pipelines & processing medium

Extract Schema.org Structured Data from Any Website in Python

A Python tool that fetches a webpage and extracts all JSON-LD structured data (Schema.org) embedded in <script> tags with type="application/ld+json".

Tags: web-scraping, structured-data, schema-org
Python
import requests
from bs4 import BeautifulSoup
import json

def extract_schema_org(url):
    """Extract structured data (Schema.org) from a website."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        return {"err…
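
The preview above is cut off before the parsing step. As a rough illustration of that step only, the sketch below pulls JSON-LD blocks out of an HTML string. It deliberately swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra packages or network access; the names `JsonLdParser` and `extract_json_ld` are hypothetical and are not taken from the original snippet.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []  # raw JSON strings, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            if type_attr.strip().lower() == "application/ld+json":
                self._in_json_ld = True
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.blocks.append("".join(self._buffer))


def extract_json_ld(html: str) -> list:
    """Return all parsed JSON-LD items found in an HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()

    items = []
    for raw in parser.blocks:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        # A single script tag may hold one object or a list of objects.
        if isinstance(parsed, list):
            items.extend(parsed)
        else:
            items.append(parsed)
    return items
```

In a fetch-then-parse flow like the one previewed above, the HTML string would presumably be `response.text` from the `requests.get` call.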
Data pipelines & processing medium

How to Find Missing Values in Large Datasets in Python

Analyze missing values across multiple large pandas DataFrames with counts and percentages.

Tags: pandas, missing-data, data-cleaning
Python
import pandas as pd
import numpy as np

def find_missing_values_summary(datasets):
    """Analyze missing values across multiple datasets (dict of name: DataFrame)."""
    summary = {}
    for name, df in datasets.items():
        missing_count = df.isnull().sum()
        total_rows = len(df)
        missing_pct = (mi…
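
The preview above stops partway through the percentage calculation. The sketch below shows one way to produce the counts and percentages the description mentions. It is an assumption about the general approach, not the original function: `missing_values_report` and the output column names are hypothetical.

```python
import pandas as pd


def missing_values_report(datasets: dict) -> dict:
    """Summarize missing values for each DataFrame in a name -> DataFrame dict.

    Returns a dict of name -> DataFrame indexed by column name, with
    'missing_count' and 'missing_pct' columns, listing only the columns
    that contain at least one missing value.
    """
    report = {}
    for name, df in datasets.items():
        missing_count = df.isna().sum()
        total_rows = len(df)
        if total_rows:
            missing_pct = (missing_count / total_rows * 100).round(2)
        else:
            # Avoid dividing by zero for an empty DataFrame.
            missing_pct = missing_count.astype(float)

        table = pd.DataFrame(
            {"missing_count": missing_count, "missing_pct": missing_pct}
        )
        # Show the worst-affected columns first.
        report[name] = (
            table[table["missing_count"] > 0]
            .sort_values("missing_count", ascending=False)
        )
    return report
```

For datasets too large to load at once, the same counts could be accumulated chunk by chunk (for example with the `chunksize` option of `pd.read_csv`), though that variation is not shown here.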

Data pipelines & processing — Python code examples

What you will find here

This page collects data pipelines & processing snippets: short, copy-ready Python that you can paste into our free online IDE and run without installing anything. Each sample includes a plain-English explanation and the full source code.

Samples vs tutorials and challenges

Samples are quick references, with one concept per page. For step-by-step teaching, use our Python tutorials. To test yourself, try the quizzes or coding challenges. To clean up code style, use the Python formatter.