Find Orphan Files Not Referenced Anywhere in Python

Scan a project directory for files whose names never appear in the content of other files, identifying potentially unused resources.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting

Python code

from pathlib import Path
from typing import Optional

def find_orphan_files(root_dir: str, extensions: Optional[set] = None, ignore_patterns: Optional[list] = None) -> list:
    """Find files whose names never appear in the content of any other file."""
    if extensions is None:
        extensions = {'.txt', '.md', '.py', '.html', '.css', '.js', '.json', '.yaml', '.yml'}
    if ignore_patterns is None:
        ignore_patterns = ['.git', '__pycache__', '.DS_Store']

    root = Path(root_dir)

    # Pass 1: collect candidate files. Ignore patterns are prefix matches on
    # path components below the root, so '.git' also skips '.github'.
    all_files = []
    for filepath in root.rglob('*'):
        if not filepath.is_file() or filepath.suffix not in extensions:
            continue
        parts = filepath.relative_to(root).parts
        if any(part.startswith(pattern.rstrip('*')) for pattern in ignore_patterns for part in parts):
            continue
        all_files.append(filepath)

    # Pass 2: read each file once and record every *other* file whose name
    # appears in it. Collecting first means discovery order cannot hide a reference.
    references = set()
    for filepath in all_files:
        try:
            content = filepath.read_text(encoding='utf-8', errors='ignore')
        except OSError:
            continue  # Unreadable file: it cannot contribute references
        for ref_file in all_files:
            if ref_file != filepath and ref_file.name in content:
                references.add(ref_file)

    # Sort so the result does not depend on directory traversal order.
    return sorted(f for f in all_files if f not in references)

if __name__ == "__main__":
    example_dir = "."  # Current directory
    orphans = find_orphan_files(example_dir)
    print(f"Found {len(orphans)} orphan file(s):")
    for orphan in orphans:
        print(f"  {orphan}")

Output

stdout
Found 2 orphan file(s):
  readme_backup.md
  unused_config.old.json

How it works

The function walks the directory tree, collects files with the given extensions, then reads each file's text and uses a plain substring check to see whether any other file's name appears in it. Files whose names never appear are returned as orphans. The approach is intentionally lightweight: it compares bare filenames only. A path-qualified reference such as assets/logo.png still counts, because it contains logo.png, but references that omit the extension (import utils for utils.py) or build the name at runtime are missed, and same-named files in different directories cannot be told apart. For larger projects, consider a parser that understands imports or includes.
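The substring check described above can be tried on plain strings, without touching the file system. This is a minimal sketch: `is_referenced` and the sample contents are hypothetical, chosen only to illustrate what the check catches and what it misses.

```python
def is_referenced(filename: str, contents: list[str]) -> bool:
    """Return True if the bare filename appears in any of the given texts."""
    return any(filename in text for text in contents)


# Hypothetical file contents standing in for text read from disk.
contents = [
    '<img src="assets/logo.png">',  # path-qualified reference
    'import utils',                 # Python import, no ".py" extension
]

print(is_referenced('logo.png', contents))  # True: the name is part of the path
print(is_referenced('utils.py', contents))  # False: the import omits ".py"
print(is_referenced('go.png', contents))    # True: partial match inside "logo.png"
```

The last call shows the cost of a plain substring test: a short filename can match inside a longer one, which hides a genuine orphan.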

Common mistakes

  • Comparing a file against its own content: a file that mentions its own name would then never be reported as an orphan.
  • Storing all_files in a set: sets have no defined order, so the output order can change between runs.
  • Forgetting to ignore tool directories such as .git or __pycache__, whose files can show up as false orphans.
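One way to apply the ignore rule from the last point is to compare each path component below the scan root against a set of names. The helper below is a sketch, not part of the snippet above: `is_ignored` and `IGNORED_NAMES` are hypothetical names, and it assumes exact matches where the snippet uses prefix matches.

```python
from pathlib import Path

# Hypothetical ignore list: exact component names, not prefixes.
IGNORED_NAMES = {'.git', '__pycache__', '.DS_Store'}


def is_ignored(path: Path, root: Path, ignored: set = IGNORED_NAMES) -> bool:
    """Return True if any component of path below root is in the ignore set."""
    return any(part in ignored for part in path.relative_to(root).parts)


root = Path('project')
print(is_ignored(root / '.git' / 'config.json', root))          # True
print(is_ignored(root / 'src' / '__pycache__' / 'a.py', root))  # True
print(is_ignored(root / '.github' / 'ci.yml', root))            # False: not an exact match
print(is_ignored(root / 'docs' / 'index.md', root))             # False
```

Checking components relative to the root keeps a parent directory that happens to carry an ignored name from excluding the whole project.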

Variations

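  1. Match on the path relative to the root (for example filepath.relative_to(root).as_posix()) instead of the bare name, so that same-named files in different directories are tracked separately. This is stricter: a reference that uses only the filename no longer counts.
  2. Use a regular expression that matches only complete filenames (e.g., r'\b' + re.escape(ref_file.name) + r'\b', which needs import re) to avoid partial matches such as config.json inside old_config.json.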
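The second variation can be sketched as follows. This version swaps the `\b` anchors for lookarounds, because `\b` does not match in front of a name that starts with a dot (such as `.env`) when it follows a space. `name_pattern` is a hypothetical helper name, and the choice of which neighbouring characters to reject is an assumption you may want to adjust.

```python
import re


def name_pattern(filename: str) -> re.Pattern:
    """Compile a pattern that matches filename only as a complete name."""
    # The lookbehind rejects a preceding word character, dot, or hyphen.
    # The lookahead rejects a following word character or hyphen; a trailing
    # dot is allowed so the name still matches at the end of a sentence.
    return re.compile(r'(?<![\w.-])' + re.escape(filename) + r'(?![\w-])')


pattern = name_pattern('config.json')
print(bool(pattern.search('load("config.json")')))        # True
print(bool(pattern.search('see conf/config.json here')))  # True
print(bool(pattern.search('old_config.json')))            # False
print(bool(pattern.search('config.json5')))               # False
print(bool(name_pattern('.env').search('load .env')))     # True
```

To use it in the scan, compile one pattern per candidate file up front and replace the `in` test with `pattern.search(content)`.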

Real-world use cases

  • Clean up stale assets in a static site or documentation project.
  • Prepare a pull request that removes unused configuration files before a deployment.
  • Audit a legacy codebase for leftover test fixtures or data files that are no longer imported.
