Automatically Detect Corrupted Files Using SHA-256 Checksums in Python
Compute SHA-256 checksums of files and compare them to detect corruption in Python.
Python code
```python
import hashlib
import os


def compute_sha256(filepath: str) -> str:
    """Compute the SHA-256 checksum of a file."""
    sha256 = hashlib.sha256()
    # Open in binary mode and read 4 KB at a time so large files
    # never have to fit in memory at once.
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def validate_file_integrity(filepath: str, expected_checksum: str) -> bool:
    """Return True if the file's checksum matches the expected one."""
    # hexdigest() is lowercase; normalize the expected value so an
    # uppercase or whitespace-padded published checksum still matches.
    return compute_sha256(filepath) == expected_checksum.strip().lower()


# Example: create a test file, compute its checksum, then detect corruption
test_file = "demo.txt"
with open(test_file, "w") as f:
    f.write("Hello, world! This is a test file.")

original_checksum = compute_sha256(test_file)
print(f"Original checksum: {original_checksum}")

# Simulate corruption: append extra data
with open(test_file, "a") as f:
    f.write("CORRUPTED")

is_corrupted = not validate_file_integrity(test_file, original_checksum)
print(f"File corrupted: {is_corrupted}")

# Cleanup
os.remove(test_file)
```
Output
Original checksum: <64-character lowercase hex digest of demo.txt>
File corrupted: True
How it works
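The function opens the file in binary mode and reads it in 4 KB chunks: `iter(lambda: f.read(4096), b'')` keeps calling `read(4096)` until it returns an empty bytes object at end of file. Each chunk is passed to the hash object's `update()` method, so even very large files are hashed without being loaded into memory all at once. `hexdigest()` then returns the final digest as a 64-character hexadecimal string. Because changing even a single byte of the content produces a completely different digest, comparing a freshly computed checksum with one recorded earlier reliably reveals corruption or modification. This only works if the reference checksum was recorded while the file was known to be good. The technique is widely used to verify file integrity after transfers and while files sit in storage.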
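Both properties can be checked in isolation with a small standalone snippet built on made-up sample bytes. It shows that feeding data to `update()` in 4 KB pieces gives the same digest as hashing it in one call, and that flipping a single bit gives a different digest.

```python
import hashlib

data = b"Hello, world! This is a test file." * 1000  # 34,000 bytes of sample data

# Feed the data to the hash object in 4 KB pieces, as compute_sha256() does
chunked = hashlib.sha256()
for start in range(0, len(data), 4096):
    chunked.update(data[start:start + 4096])

# Hash the same bytes in a single call
whole = hashlib.sha256(data).hexdigest()
print(chunked.hexdigest() == whole)  # True: chunking does not change the result
print(len(whole))                    # 64

# Change exactly one byte and hash again
tampered = bytearray(data)
tampered[0] ^= 0x01  # flip the lowest bit of the first byte ("H" becomes "I")
print(hashlib.sha256(tampered).hexdigest() == whole)  # False
```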
Common mistakes
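- Opening the file in text mode instead of binary mode ('rb'). Text mode returns `str`, which `update()` rejects with a `TypeError`, and if you encode it back to bytes, newline translation (for example `\r\n` read as `\n`) means you are no longer hashing the bytes actually on disk.
- Calling `.read()` with no size argument, which loads the entire file into memory and can exhaust RAM on very large files.
- Hashing a file before the data written to it has been flushed, so the checksum covers incomplete content. Writing inside a `with` block, as the example does, closes and flushes the file automatically.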
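The text-mode mistake can be reproduced on any operating system with a short sketch. It forces Windows-style line endings with `newline="\r\n"`, then hashes the same file read in binary mode and in text mode. The temporary file comes from `tempfile`, and the line contents are arbitrary sample data.

```python
import hashlib
import os
import tempfile

# Write a file with \r\n line endings, regardless of the host OS
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w", newline="\r\n") as f:
    f.write("line one\nline two\n")

with open(path, "rb") as f:
    raw = f.read()   # b"line one\r\nline two\r\n": the bytes on disk
with open(path, "r") as f:
    text = f.read()  # "line one\nline two\n": universal newlines translated \r\n

binary_digest = hashlib.sha256(raw).hexdigest()
text_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(binary_digest == text_digest)  # False: text mode changed the bytes

# Passing a str directly is rejected outright
try:
    hashlib.sha256().update(text)
except TypeError as exc:
    print(f"TypeError: {exc}")

os.remove(path)
```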
Variations
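- Swap in hashlib.md5 or hashlib.sha1, which are often faster, when you only need to catch accidental corruption. Both have practical collision attacks, so do not rely on them to detect deliberate tampering.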
- Validate a list of files by storing checksums in a JSON manifest file.
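Here is a minimal sketch of both variations. It assumes the manifest is a flat JSON object mapping file paths to hex digests. The helper names (`compute_digest`, `build_manifest`, `verify_manifest`) and the `checksums.json` file name are hypothetical. `hashlib.new()` takes an algorithm name such as "sha256" or "md5", so switching algorithms is a one-argument change. This manifest format does not record which algorithm produced each digest, so building and verifying must use the same one.

```python
import hashlib
import json
import os
import shutil
import tempfile


def compute_digest(filepath: str, algorithm: str = "sha256") -> str:
    """Hash a file in 4 KB chunks with any algorithm name hashlib.new() accepts."""
    h = hashlib.new(algorithm)
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths: list[str], manifest_path: str, algorithm: str = "sha256") -> None:
    """Hypothetical helper: record {path: digest} for each file as JSON."""
    manifest = {p: compute_digest(p, algorithm) for p in paths}
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)


def verify_manifest(manifest_path: str, algorithm: str = "sha256") -> list[str]:
    """Hypothetical helper: return files that are missing or whose digest changed."""
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)
    return [
        path
        for path, expected in manifest.items()
        if not os.path.exists(path) or compute_digest(path, algorithm) != expected
    ]


# Usage: two sample files in a temporary directory
workdir = tempfile.mkdtemp()
files = []
for name, content in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
    path = os.path.join(workdir, name)
    with open(path, "wb") as f:
        f.write(content)
    files.append(path)

manifest_path = os.path.join(workdir, "checksums.json")
build_manifest(files, manifest_path)
before = verify_manifest(manifest_path)  # []: nothing has changed yet

with open(files[1], "ab") as f:  # corrupt b.txt by appending one byte
    f.write(b"!")
after = verify_manifest(manifest_path)   # [path to b.txt]
print(before, after)

md5_of_a = compute_digest(files[0], "md5")  # same helper, different algorithm
shutil.rmtree(workdir)
```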
Real-world use cases
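- Verify downloaded software packages against their published checksums to confirm they were not corrupted or tampered with. For tamper detection, the checksum itself must come from a trusted source, such as a signed checksum file.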
- Monitor critical configuration files in production for unauthorized modifications.
- Validate backup files after transfer to cloud storage to catch silent corruption.
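For the download case, many projects publish digests in the format written by GNU `sha256sum`. Each line holds the hex digest, a space, then either a space (text mode) or `*` (binary mode), then the file name. The sketch below parses that format and checks each listed file. The function names and `package.tar.gz` are hypothetical. On systems with coreutils, `sha256sum -c SHA256SUMS` does the same job.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Chunked SHA-256 of a file, same approach as compute_sha256() above."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return h.hexdigest()


def check_sums_file(sums_path: str) -> dict[str, bool]:
    """Check every entry of a sha256sum-style file: '<hex>  <name>' or '<hex> *<name>'.

    Names are resolved relative to the directory that holds the sums file.
    """
    base = os.path.dirname(os.path.abspath(sums_path))
    results = {}
    with open(sums_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue
            digest, _, rest = line.partition(" ")
            name = rest[1:]  # drop the mode marker: ' ' (text) or '*' (binary)
            path = os.path.join(base, name)
            results[name] = os.path.isfile(path) and sha256_of(path) == digest.lower()
    return results


# Usage with a hypothetical downloaded archive and its published sums file
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "package.tar.gz")
with open(archive, "wb") as f:
    f.write(b"pretend archive contents")
sums_path = os.path.join(workdir, "SHA256SUMS")
with open(sums_path, "w", encoding="utf-8") as f:
    f.write(f"{sha256_of(archive)}  package.tar.gz\n")

intact = check_sums_file(sums_path)   # {'package.tar.gz': True}
with open(archive, "ab") as f:        # simulate a corrupted download
    f.write(b"\x00")
damaged = check_sums_file(sums_path)  # {'package.tar.gz': False}
print(intact, damaged)

shutil.rmtree(workdir)
```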