Build a Python Utility That Verifies Backup Integrity Automatically
Automatically compute and verify SHA-256 checksums of backup files using a JSON manifest to detect missing or corrupted data.
Python code
47 linesimport hashlib
import os
import json
def compute_checksum(filepath, algorithm='sha256'):
"""Compute checksum for the given file."""
hash_func = hashlib.new(algorithm)
with open(filepath, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b''):
hash_func.update(chunk)
return hash_func.hexdigest()
def verify_backup_integrity(backup_dir, manifest_file='integrity_check.json'):
"""Verify backup files against a stored manifest of checksums."""
if not os.path.isfile(manifest_file):
print(f"Manifest file {manifest_file} not found. Creating it now...")
manifest = {}
for root, _, files in os.walk(backup_dir):
for file in files:
filepath = os.path.join(root, file)
manifest[filepath] = compute_checksum(filepath)
with open(manifest_file, 'w') as f:
json.dump(manifest, f, indent=2)
print("Manifest created. Run again to verify.")
return
with open(manifest_file, 'r') as f:
manifest = json.load(f)
errors = []
for filepath, expected_checksum in manifest.items():
if not os.path.exists(filepath):
errors.append(f"Missing: {filepath}")
continue
current_checksum = compute_checksum(filepath)
if current_checksum != expected_checksum:
errors.append(f"Corrupted: {filepath} (expected {expected_checksum}, got {current_checksum})")
if errors:
print("Integrity check failed:")
for err in errors:
print(f" - {err}")
else:
print("Backup integrity verified successfully.")
if __name__ == "__main__":
verify_backup_integrity('/path/to/backup')
Output
Manifest file integrity_check.json not found. Creating it now...
Manifest created. Run again to verify.
# Second run:
Backup integrity verified successfully.
How it works
The script first walks the backup directory tree with os.walk(), computing a SHA-256 checksum for every file using hashlib. Each checksum is stored in a JSON manifest file. On subsequent runs, it reloads this manifest and recomputes checksums for every listed file, flagging any mismatches as corruption and any missing paths as deletions. This two-phase approach lets you detect both file tampering and accidental deletions.
Common mistakes
- Forgetting to update the manifest after adding new files — the script only creates the manifest once.
- Using the wrong algorithm name (e.g., 'sha' instead of 'sha256') which raises an exception.
- Storing the manifest inside the backup directory itself, causing a false recursive inclusion.
Variations
- Use `blake2b` instead of `sha256` for faster hashing on some systems.
- Add parallel hashing with `concurrent.futures.ThreadPoolExecutor` for large backup sets.
Real-world use cases
- Verifying nightly database backups before archiving to cloud storage.
- Monitoring a backup drive for silent data corruption between rotation cycles.
- Validating that a migrated file server copy matches the original checksums.
Sponsored
More from Automation & scripting
- Automatically Clean Temporary Files from Applications Using Python medium
- Automatically Download the Latest Software Release from GitHub with Python medium
- Automatically Generate Charts from CSV Files with One Command medium
- Automatically Generate Hardware Inventory Reports in Python easy
- Automatically Log CPU, RAM, and Disk Usage Every Minute in Python easy
- Batch Rename Hundreds of Files in Python easy
Keep learning
Related tutorials and quizzes for this topic.