How to Detect Unused Images in a Project with Python
A Python script that scans a website project folder, identifies all image files, and checks HTML/CSS/JS files to find which images are never referenced.
Python code
import os
import re
from pathlib import Path


def find_unused_images(project_path):
    image_exts = {'.png', '.jpg', '.jpeg', '.gif', '.svg', '.webp'}
    used_images = set()
    all_images = set()

    # Find all image files (normalized so they compare equal to resolved references)
    for root, _, files in os.walk(project_path):
        for file in files:
            if Path(file).suffix.lower() in image_exts:
                all_images.add(os.path.normpath(os.path.join(root, file)))

    # Find references in HTML/CSS/JS files
    for root, _, files in os.walk(project_path):
        for file in files:
            if file.lower().endswith(('.html', '.css', '.js')):
                filepath = os.path.join(root, file)
                try:
                    with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                        content = f.read()
                except OSError:
                    # Skip files that cannot be read
                    continue
                # Find image references in src, url(), etc.
                # jpe?g plus the trailing \b ensures "photo.jpeg" is not captured as "photo.jpg"
                refs = re.findall(r'[\'"]?([\w./-]+\.(?:png|jpe?g|gif|svg|webp))\b[\'"]?', content, re.IGNORECASE)
                for ref in refs:
                    # Resolve relative paths against the referencing file's directory
                    abs_path = os.path.normpath(os.path.join(os.path.dirname(filepath), ref))
                    if os.path.exists(abs_path):
                        used_images.add(abs_path)

    unused = all_images - used_images
    return sorted(unused)
if __name__ == "__main__":
    import tempfile
    import shutil

    # Create test project structure
    test_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(test_dir, 'images'))
    os.makedirs(os.path.join(test_dir, 'css'))

    # Create some test files
    Path(os.path.join(test_dir, 'images', 'logo.png')).touch()
    Path(os.path.join(test_dir, 'images', 'banner.jpg')).touch()
    Path(os.path.join(test_dir, 'images', 'old_bg.gif')).touch()
    Path(os.path.join(test_dir, 'images', 'icon.svg')).touch()
    with open(os.path.join(test_dir, 'index.html'), 'w') as f:
        f.write('<img src="images/logo.png" alt="Logo">')
    with open(os.path.join(test_dir, 'css', 'style.css'), 'w') as f:
        f.write('background: url("../images/banner.jpg");')

    unused = find_unused_images(test_dir)
    print("Unused images found:")
    for img in unused:
        print(f" {os.path.relpath(img, test_dir)}")

    shutil.rmtree(test_dir)
Output
Unused images found:
images/icon.svg
images/old_bg.gif
How it works
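The script walks the entire project directory twice: once to collect all image paths, then again to scan HTML, CSS, and JS files for string references to images using a regular expression. Each reference found in an HTML src attribute, a CSS url() call, or a JavaScript string is resolved relative to the directory of the file that contains it, and it counts as used only if the resolved path exists on disk. Unused images are the full set minus the referenced set. The regex matches common extensions case-insensitively, and the script skips any source file it cannot read. Because matching is purely textual, file names assembled at runtime in JavaScript and root-relative references such as `/images/logo.png` are not detected, so those images will be reported as unused.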
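The extraction and resolution step described above can be tried in isolation on an in-memory string. The following is a minimal sketch, not the tutorial's exact code: the names `IMAGE_REF` and `extract_refs` are hypothetical, and the pattern is a slightly tightened variant. One detail worth knowing: if `jpg` is listed before `jpeg` in an alternation with no trailing boundary, a reference to `photo.jpeg` is captured as `photo.jpg`, so the sketch uses `jpe?g` followed by `\b`.

```python
import os
import re

# Hypothetical pattern for illustration: jpe?g covers both .jpg and .jpeg,
# and the trailing \b stops a match from ending in the middle of an extension.
IMAGE_REF = re.compile(r'([\w./-]+\.(?:png|jpe?g|gif|svg|webp))\b', re.IGNORECASE)


def extract_refs(content, source_file):
    """Return image references found in content, resolved against source_file's directory."""
    base = os.path.dirname(source_file)
    return [os.path.normpath(os.path.join(base, ref)) for ref in IMAGE_REF.findall(content)]


css = ('body { background: url("../images/banner.jpg"); } '
       '.hero { background: url(../images/Hero.JPEG); }')
resolved = extract_refs(css, os.path.join('site', 'css', 'style.css'))
for path in resolved:
    print(path)
```

Both references resolve into `site/images/`, one level above the stylesheet's own directory, which is why the referencing file's location has to be part of the resolution.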
Common mistakes
- Forgetting to account for relative paths in CSS, which may use `../images/` style references
- Not including all common image extensions like `.webp` or `.svg` in the regex
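- Assuming file names and references always agree in case; the script matches extensions case-insensitively, but on a case-sensitive file system a file named `Logo.PNG` referenced as `logo.png` will not resolve and is reported as unused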
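The case-mismatch pitfall in the last item can be softened by comparing paths in a normalized, lowercased form. This is only a sketch under assumptions: the helper name `unused_case_insensitive` is hypothetical, and it expects a list of resolved references that has not been filtered by an existence check, since that check would already drop a mismatched reference on a case-sensitive file system.

```python
import os


def unused_case_insensitive(all_images, referenced):
    """Compare paths by normalized, lowercased form; return the original-cased unused paths."""
    def key(path):
        return os.path.normpath(path).lower()

    used_keys = {key(path) for path in referenced}
    return sorted(path for path in all_images if key(path) not in used_keys)


all_images = ['images/Logo.PNG', 'images/old_bg.gif']
referenced = ['images/logo.png']
print(unused_case_insensitive(all_images, referenced))
```

The trade-off is that a lenient comparison can hide a reference that is genuinely broken on a case-sensitive server, so it may be better suited to reporting than to automatic deletion.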
Variations
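- Use `Path(project_path).rglob('*')` from `pathlib` to collect image and source files in a single pass instead of walking the tree twice.
- Add an optional delete step guarded by a dry-run flag that only reports what would be removed, or integrate the script with a codebase cleanup tool.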
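The single-pass variation might look like the sketch below. The function name `collect_files` and the two module-level sets are assumptions made for illustration; `Path.rglob` and `Path.suffix` are standard `pathlib` features.

```python
from pathlib import Path

IMAGE_EXTS = {'.png', '.jpg', '.jpeg', '.gif', '.svg', '.webp'}
SOURCE_EXTS = {'.html', '.css', '.js'}


def collect_files(project_path):
    """Walk the tree once and return (image_paths, source_paths) as sets of Path objects."""
    images, sources = set(), set()
    for path in Path(project_path).rglob('*'):
        if not path.is_file():
            continue  # skip directories
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTS:
            images.add(path)
        elif suffix in SOURCE_EXTS:
            sources.add(path)
    return images, sources
```

The returned source set can then be scanned for references exactly as before, so only the file discovery changes.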
Real-world use cases
- Cleaning up legacy website repos before deployment to reduce repository size and build time.
- Auditing image assets in a CMS static export to remove orphaned files from the server.
- Running in CI/CD pipelines to alert on unused assets and keep the project lean.
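For the CI/CD case, one possible approach is to turn the list of unused images into a process exit code so the pipeline can warn or fail. The helper `ci_exit_code` and its `max_allowed` parameter are hypothetical, and the sample list stands in for the result of `find_unused_images`.

```python
def ci_exit_code(unused, max_allowed=0):
    """Print a short report and return 0 if within the allowed limit, else 1."""
    for path in unused:
        print(f"unused image: {path}")
    if len(unused) > max_allowed:
        print(f"{len(unused)} unused image(s) found (limit: {max_allowed})")
        return 1
    return 0


# Sample data standing in for the result of find_unused_images(".")
sample = ["images/icon.svg", "images/old_bg.gif"]
print("exit code:", ci_exit_code(sample))

# In a real pipeline the return value would be passed to sys.exit(), for example:
#   sys.exit(ci_exit_code(find_unused_images(".")))
```

A non-zero `max_allowed` lets a team adopt the check gradually on a repository that already contains orphaned assets.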