Track File Changes with Version History in Python
A Python utility that monitors a file for changes, creating versioned backups with SHA-256 hashing to detect modifications and store a local JSON history.
Python code
40 linesimport hashlib, json, os, shutil, time
from pathlib import Path
class FileTracker:
def __init__(self, history_file="file_history.json"):
self.history_file = Path(history_file)
self.history = self._load_history()
def _load_history(self):
if self.history_file.exists():
return json.loads(self.history_file.read_text())
return {}
def _save_history(self):
self.history_file.write_text(json.dumps(self.history, indent=2))
def _file_hash(self, path):
return hashlib.sha256(Path(path).read_bytes()).hexdigest()
def track(self, path):
path = Path(path)
if not path.exists():
print(f"Error: {path} does not exist.")
return
file_hash = self._file_hash(path)
versions = self.history.get(str(path), [])
if not versions or versions[-1]["hash"] != file_hash:
version_path = path.with_name(f"{path.stem}_v{len(versions)+1}{path.suffix}")
shutil.copy2(path, version_path)
entry = {"version": len(versions)+1, "hash": file_hash, "timestamp": time.time(), "file": str(version_path)}
versions.append(entry)
self.history[str(path)] = versions
self._save_history()
print(f"Version {entry['version']} saved as {version_path}")
else:
print(f"No changes detected for {path}")
if __name__ == "__main__":
ft = FileTracker()
ft.track("test.txt")
Output
Version 1 saved as test_v1.txt
No changes detected for test.txt
How it works
The FileTracker class stores a JSON history of file versions keyed by original path. Each track() call computes the SHA-256 hash of the file and compares it with the latest version stored. If the hash differs, the file is copied with a _v{n} suffix appended before the extension. The version counter and timestamp are saved, giving a lightweight audit trail. This approach works with any file type and uses only the standard library, making it portable and dependency-free.
Common mistakes
- Forgetting to handle the case where the tracking file itself is modified, causing infinite version loops.
- Not using `shutil.copy2` to preserve metadata, or relying on `shutil.copy` which loses timestamps.
- Hard-coding paths instead of using `Path` from `pathlib` for cross-platform compatibility.
Variations
- Use `filecmp.cmp` instead of hashing for faster binary comparison on large files.
- Store history in a SQLite database for querying or rollback features.
Real-world use cases
- Backing up configuration files before automated edits in CI/CD pipelines.
- Monitoring log files for rotation or corruption in production environments.
- Tracking changes to data dumps or reports during batch processing jobs.
Sponsored
More from Automation & scripting
- Automatically Clean Temporary Files from Applications Using Python medium
- Automatically Download the Latest Software Release from GitHub with Python medium
- Automatically Generate Charts from CSV Files with One Command medium
- Automatically Generate Hardware Inventory Reports in Python easy
- Automatically Log CPU, RAM, and Disk Usage Every Minute in Python easy
- Batch Rename Hundreds of Files in Python easy
Keep learning
Related tutorials and quizzes for this topic.