Maintenance

Site is under maintenance — quizzes are still available.

Go to quizzes
Sponsored Reserved space — layout preview until AdSense is connected
Reference library

Files & data

Read and write files safely; parse JSON, CSV, and common text formats.

1 match
Sponsored Reserved space — layout preview until AdSense is connected
Files & data medium

Find Duplicate Web Pages by Content Similarity in Python

Compute SHA-256 hashes of file contents to detect and report duplicate HTML pages or any files in a directory.

duplicate-detection hashing sha256
Python
import hashlib
import os
from collections import defaultdict

def get_file_hash(filepath):
    """Compute SHA-256 hash of file contents."""
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            sha256.update(chunk)
    return sha256.hexdiges…
2 0 Open

Browse by section

Each section groups closely related Python snippets.

Files & data — Python code examples

What you will find here

This page collects files & data snippets — short, copy-ready Python you can paste into our free online IDE and run without installing anything. Each sample includes a plain-English explanation and the full source code.

Samples vs tutorials and challenges

Samples are quick reference — one concept per page. For step-by-step teaching, use our Python tutorials. To test yourself, try quizzes or coding challenges. Clean up style with the Python formatter.