Find the Largest Files Consuming Disk Space with a Beautiful Terminal Report in Python

Scan a directory recursively and print a formatted terminal report of the largest files, with human-readable sizes.

Medium · Python 3.9+ · Jun 27, 2026 · Automation & scripting

Python code

import sys
from pathlib import Path

def get_largest_files(directory: str, count: int = 10) -> list:
    """
    Scan the given directory and return the largest files.
    
    Args:
        directory: Path to the directory to scan
        count: Number of largest files to return
        
    Returns:
        List of tuples (file_path, size_in_bytes)
    """
    files_with_sizes = []
    root_path = Path(directory)
    
    # is_dir() is already False for paths that do not exist
    if not root_path.is_dir():
        print(f"Error: '{directory}' is not a valid directory.", file=sys.stderr)
        return []
    
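    for file_path in root_path.rglob('*'):
        try:
            # is_file() can itself raise (e.g. PermissionError), so it belongs inside the try
            if file_path.is_file():
                files_with_sizes.append((file_path, file_path.stat().st_size))
        except OSError:
            # PermissionError is a subclass of OSError; skip entries that cannot be read
            continue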
    
    files_with_sizes.sort(key=lambda x: x[1], reverse=True)
    return files_with_sizes[:count]

def format_size(size_bytes: int) -> str:
    """Convert a byte count to a human-readable string using 1024-based units."""
    size = float(size_bytes)  # work on a float copy instead of reassigning the int argument
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} PB"

def print_report(files: list) -> None:
    """Print a beautiful terminal report of largest files."""
    if not files:
        print("No files found.")
        return
    
    print("\n" + "=" * 70)
    print("🔍 LARGEST FILES REPORT")
    print("=" * 70)
    
    for i, (file_path, size) in enumerate(files, 1):
        size_str = format_size(size)
        relative_path = file_path
        try:
            relative_path = file_path.relative_to(Path.cwd())
        except ValueError:
            pass
        # Width 10 fits the longest possible size string, e.g. "1023.99 MB"
        print(f"  {i:>2}. {size_str:>10}  {relative_path}")
    
    print("=" * 70)
    total_size = sum(size for _, size in files)
    print(f"  Total of top {len(files)} files: {format_size(total_size)}")
    print("=" * 70 + "\n")

if __name__ == "__main__":
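    # Optional arguments: directory to scan (default: current directory)
    # and number of files to show (default: 10)
    target_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    try:
        top_n = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    except ValueError:
        sys.exit(f"Error: count must be an integer, got '{sys.argv[2]}'.")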
    
    largest_files = get_largest_files(target_dir, top_n)
    print_report(largest_files)

Output

stdout
======================================================================
🔍 LARGEST FILES REPORT
======================================================================
   1.  450.00 MB  large_file.zip
   2.  320.50 MB  another_video.mp4
   3.  120.00 MB  data_backup.tar.gz
   4.   89.75 MB  project.iso
   5.   65.25 MB  archive.7z
   6.   45.00 MB  big_log.txt
   7.   32.10 MB  image_sequence.png
   8.   28.40 MB  presentation.pptx
   9.   15.60 MB  dataset.csv
  10.   12.80 MB  report.pdf
======================================================================
  Total of top 10 files: 1.15 GB
======================================================================

How it works

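Path.rglob('*') walks the entire subtree under the given directory. Each entry that is a regular file is recorded together with its size in bytes, read from stat().st_size; entries whose size cannot be read are skipped. Sorting the list by size in descending order and slicing off the first N items gives the largest files. The format_size function turns a raw byte count into a human-friendly string by dividing by 1024 until the value drops below 1024, then attaching the matching unit (B, KB, MB, GB, TB, or PB). Finally, print_report prints a bordered, numbered table plus a combined total for the listed files. Paths are shown relative to the current working directory when possible and unchanged otherwise.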
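The sort-then-slice step holds every file in memory before trimming the list. As a possible alternative (not what the script above does), the sketch below streams sizes through `heapq.nlargest`, which only keeps the current top N. The helper names `iter_file_sizes` and `top_n_files` are hypothetical and chosen for illustration.

```python
import heapq
from pathlib import Path


def iter_file_sizes(root: Path):
    """Yield (path, size_in_bytes) for each regular file under root."""
    for path in root.rglob("*"):
        try:
            if path.is_file():
                yield path, path.stat().st_size
        except OSError:
            # Unreadable entry: skip it and keep scanning
            continue


def top_n_files(root: Path, n: int = 10) -> list:
    """Return the n largest files as (path, size) pairs, largest first."""
    # nlargest consumes the generator lazily and returns a list sorted descending
    return heapq.nlargest(n, iter_file_sizes(root), key=lambda item: item[1])
```

The result has the same shape as the list returned by `get_largest_files`, so it could be passed to `print_report` unchanged. On small directories the difference is negligible; the trade-off mainly matters for trees with very many files.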

Common mistakes

  • Not handling PermissionError for files the user cannot read, causing the scan to crash.
  • Using `os.path.getsize` without checking that the path is a regular file. For a directory it returns the size of the directory entry itself, not of its contents.
  • Forgetting to convert bytes to a readable unit before printing large numbers.
  • Passing an invalid directory path without early validation.
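A minimal sketch of how the first two mistakes might be avoided when working with `os.path` instead of `pathlib`. The helper name `safe_file_size` is hypothetical; it returns `None` rather than raising, so the caller decides how to treat skipped paths.

```python
import os
from typing import Optional


def safe_file_size(path: str) -> Optional[int]:
    """Return the size of a regular file in bytes, or None if it cannot be measured."""
    try:
        # Directories and missing paths are rejected up front
        if not os.path.isfile(path):
            return None
        return os.path.getsize(path)
    except OSError:
        # Covers PermissionError and files that vanish between the two calls
        return None
```

Returning `None` keeps "not a file" distinct from a genuinely empty file, whose size is `0`.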

Variations

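  1. Use `os.walk()` instead of `pathlib.Path.rglob()` when you want to prune directories during the scan or handle directory errors through its `onerror` callback. Both are available on every supported Python version, so the choice is not about version compatibility.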
  2. Use `shutil.disk_usage()` to also show free and total disk space alongside the report.
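Both variations can be sketched together. The function names `walk_file_sizes` and `disk_summary` and the default `skip_dirs` values are assumptions made for this example, not part of the original script.

```python
import os
import shutil


def walk_file_sizes(directory: str, skip_dirs=(".git", "node_modules")):
    """Yield (path, size_in_bytes) using os.walk, skipping directories named in skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(directory):
        # Editing dirnames in place tells os.walk not to descend into those folders
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself instead of following it
                yield full_path, os.lstat(full_path).st_size
            except OSError:
                continue


def disk_summary(directory: str) -> str:
    """Describe free and total space on the disk that holds directory."""
    usage = shutil.disk_usage(directory)  # named tuple: total, used, free (bytes)
    gb = 1024 ** 3  # 1024-based, matching format_size in the script
    return f"Disk: {usage.free / gb:.2f} GB free of {usage.total / gb:.2f} GB"
```

A call such as `sorted(walk_file_sizes("."), key=lambda item: item[1], reverse=True)[:10]` would give a top-10 list comparable to the one in the main script, with paths as plain strings rather than `Path` objects.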

Real-world use cases

  • Running as a cron job to identify disk hogs on a production server and alert the team.
  • Integrating into a CI pipeline to warn when workspace artifacts exceed a size limit.
  • Building a personal utility to clean up old projects by finding the largest unnecessary files.
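For the CI use case, one possible shape is a check that returns a non-zero exit code when any file exceeds a limit. This is a sketch under stated assumptions: the names `files_over_limit` and `check_workspace` are hypothetical, and how the exit code is consumed depends on the CI system.

```python
import sys
from pathlib import Path


def files_over_limit(directory: str, limit_bytes: int) -> list:
    """Return (path, size) pairs for files larger than limit_bytes, largest first."""
    oversized = []
    for path in Path(directory).rglob("*"):
        try:
            if path.is_file():
                size = path.stat().st_size
                if size > limit_bytes:
                    oversized.append((path, size))
        except OSError:
            continue
    return sorted(oversized, key=lambda item: item[1], reverse=True)


def check_workspace(directory: str, limit_bytes: int) -> int:
    """Report oversized files on stderr and return 1 if any exist, else 0."""
    offenders = files_over_limit(directory, limit_bytes)
    for path, size in offenders:
        print(f"{size:>12} bytes  {path}", file=sys.stderr)
    return 1 if offenders else 0
```

A pipeline step could end with `sys.exit(check_workspace("build", 50 * 1024 * 1024))`, where the `build` directory and the 50 MB limit are placeholder values.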
