Find the Largest Files Consuming Disk Space with a Beautiful Terminal Report in Python
Scan a directory recursively and print a formatted terminal report of the largest files, with human-readable sizes.
Python code
import sys
from pathlib import Path


def get_largest_files(directory: str, count: int = 10) -> list:
    """
    Scan the given directory and return the largest files.

    Args:
        directory: Path to the directory to scan
        count: Number of largest files to return

    Returns:
        List of tuples (file_path, size_in_bytes)
    """
    files_with_sizes = []
    root_path = Path(directory)
    # is_dir() is False for missing paths, so no separate exists() check is needed
    if not root_path.is_dir():
        print(f"Error: '{directory}' is not a valid directory.")
        return []
    for file_path in root_path.rglob('*'):
        try:
            # is_file() calls stat() internally and may raise on some Python
            # versions, so it sits inside the try as well
            if file_path.is_file():
                size = file_path.stat().st_size
                files_with_sizes.append((file_path, size))
        except OSError:
            # PermissionError is a subclass of OSError; skip unreadable entries
            continue
    # Largest first, then keep only the top `count`
    files_with_sizes.sort(key=lambda x: x[1], reverse=True)
    return files_with_sizes[:count]


def format_size(size_bytes: float) -> str:
    """Convert bytes to human readable format."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if size_bytes < 1024:
            return f"{size_bytes:.2f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.2f} PB"


def print_report(files: list) -> None:
    """Print a beautiful terminal report of largest files."""
    if not files:
        print("No files found.")
        return
    print("\n" + "=" * 70)
    print("🔍 LARGEST FILES REPORT")
    print("=" * 70)
    for i, (file_path, size) in enumerate(files, 1):
        size_str = format_size(size)
        relative_path = file_path
        try:
            # Show paths relative to the current directory when possible
            relative_path = file_path.relative_to(Path.cwd())
        except ValueError:
            # Path is already relative or lies outside the current directory
            pass
        print(f" {i:>2}. {size_str:>9} {relative_path}")
    print("=" * 70)
    total_size = sum(size for _, size in files)
    print(f" Total of top {len(files)} files: {format_size(total_size)}")
    print("=" * 70 + "\n")


if __name__ == "__main__":
    # Default: scan current directory, show top 10 largest files
    target_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    top_n = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    largest_files = get_largest_files(target_dir, top_n)
    print_report(largest_files)
Output
======================================================================
🔍 LARGEST FILES REPORT
======================================================================
  1. 450.00 MB large_file.zip
  2. 320.50 MB another_video.mp4
  3. 120.00 MB data_backup.tar.gz
  4.  89.75 MB project.iso
  5.  65.25 MB archive.7z
  6.  45.00 MB big_log.txt
  7.  32.10 MB image_sequence.png
  8.  28.40 MB presentation.pptx
  9.  15.60 MB dataset.csv
 10.  12.80 MB report.pdf
======================================================================
 Total of top 10 files: 1.15 GB
======================================================================
How it works
`Path.rglob('*')` walks the entire subtree inside the given directory, and the script records the size of every regular file via `st_size`, skipping entries it cannot read. Sorting the list by size in descending order and slicing to the top N gives the largest files. The `format_size` function converts raw bytes into a human-friendly string by dividing by 1024 until the value drops below 1024, moving one unit up (B, KB, MB, GB, TB) at each step. Finally, `print_report` outputs a bordered, numbered table, showing each path relative to the current working directory when that is possible and unchanged otherwise.
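The sort-and-slice step holds every file's size in memory before discarding all but the top N. As a hedged alternative, not the approach the script above uses, `heapq.nlargest` can consume a generator and keep only N candidates at a time. The names `iter_file_sizes` and `largest_files_heap` are made up for this sketch.

```python
import heapq
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_file_sizes(directory: str) -> Iterator[Tuple[int, str]]:
    """Yield (size_in_bytes, path) for every readable file under directory."""
    for path in Path(directory).rglob("*"):
        try:
            if path.is_file():
                yield path.stat().st_size, str(path)
        except OSError:
            # Unreadable or vanished entries are skipped.
            continue


def largest_files_heap(directory: str, count: int = 10) -> List[Tuple[int, str]]:
    """Return the `count` largest files, largest first, without a full sort."""
    # Tuples compare by size first, so nlargest ranks by size
    # (ties fall back to comparing the path strings).
    return heapq.nlargest(count, iter_file_sizes(directory))
```

The difference only matters on very large trees; for a few thousand files the plain sort is just as good and arguably easier to read.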
Common mistakes
- Not handling PermissionError for files the user cannot read, causing the scan to crash.
- Calling `os.path.getsize` on a path without checking that it is a regular file; for a directory it returns the size of the directory entry, not of its contents.
- Forgetting to convert bytes to a readable unit before printing large numbers.
- Passing an invalid directory path without early validation.
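Several of these mistakes can be avoided with a small guard around the size lookup. The sketch below is one possible shape; `safe_file_size` is a hypothetical helper, not part of the script above. It returns `None` for directories, missing paths, and unreadable files, so the caller can decide what to do instead of crashing.

```python
import os
from typing import Optional


def safe_file_size(path: str) -> Optional[int]:
    """Return the size of a regular file in bytes, or None if it cannot be measured."""
    try:
        # os.path.isfile is False for directories and for missing paths.
        if not os.path.isfile(path):
            return None
        return os.path.getsize(path)
    except OSError:
        # Covers PermissionError and files deleted between the two calls.
        return None
```

A caller would then skip `None` results, for example `sizes = [s for s in map(safe_file_size, paths) if s is not None]`.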
Variations
- Use `os.walk()` instead of `pathlib.Path.rglob()` when you want to prune subdirectories during the scan or must support Python versions older than 3.4, which lack `pathlib`.
- Use `shutil.disk_usage()` to also show free and total disk space alongside the report.
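Both variations can be sketched together. The function names `largest_files_walk` and `disk_summary` are assumptions made for illustration, and the summary follows the main script's convention of labelling 1024-based units as GB.

```python
import os
import shutil
from typing import List, Tuple


def largest_files_walk(directory: str, count: int = 10) -> List[Tuple[str, int]]:
    """Collect (path, size) pairs with os.walk and return the largest `count`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                found.append((full_path, os.path.getsize(full_path)))
            except OSError:
                # Broken symlinks and unreadable files are skipped.
                continue
    found.sort(key=lambda item: item[1], reverse=True)
    return found[:count]


def disk_summary(directory: str) -> str:
    """Describe total, used, and free space on the disk holding `directory`."""
    usage = shutil.disk_usage(directory)
    gb = 1024 ** 3
    return (
        f"Disk: {usage.total / gb:.2f} GB total, "
        f"{usage.used / gb:.2f} GB used, {usage.free / gb:.2f} GB free"
    )
```

Printing `disk_summary(".")` before or after the report gives the file sizes some context, since a 2 GB file matters more on a nearly full disk.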
Real-world use cases
- Running as a cron job to identify disk hogs on a production server and alert the team.
- Integrating into a CI pipeline to warn when workspace artifacts exceed a size limit.
- Building a personal utility to clean up old projects by finding the largest unnecessary files.
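For the CI use case, one rough way to turn the scan into a pass/fail check is to return an exit code when any file exceeds a threshold. This is a sketch under assumptions: `files_over_limit` and `check_workspace` are hypothetical names, and the limit is whatever your pipeline chooses.

```python
from pathlib import Path
from typing import List, Tuple


def files_over_limit(directory: str, limit_bytes: int) -> List[Tuple[Path, int]]:
    """Return (path, size) for every file larger than limit_bytes, largest first."""
    offenders = []
    for path in Path(directory).rglob("*"):
        try:
            if path.is_file():
                size = path.stat().st_size
                if size > limit_bytes:
                    offenders.append((path, size))
        except OSError:
            # Skip entries that cannot be read.
            continue
    offenders.sort(key=lambda item: item[1], reverse=True)
    return offenders


def check_workspace(directory: str, limit_bytes: int) -> int:
    """Print offenders and return an exit code: 0 if none, 1 if any file is too large."""
    offenders = files_over_limit(directory, limit_bytes)
    for path, size in offenders:
        print(f"WARNING: {path} is {size} bytes (limit {limit_bytes})")
    return 1 if offenders else 0
```

A CI step could end with `sys.exit(check_workspace("build", 50 * 1024 * 1024))`, where the `build` directory and the 50 MB limit are placeholders.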