Download Images from a Web Page Automatically in Python
Scrape all images from a webpage, filter by extension, and save them to a local folder using requests and BeautifulSoup.
pip install requests beautifulsoup4
Python code
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Extensions accepted by the filter (compared in lowercase).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def download_images(url, output_folder="downloaded_images"):
    """Download all images from a given URL."""
    os.makedirs(output_folder, exist_ok=True)

    # Fetch the page; the timeout keeps a stalled server from hanging the script.
    response = requests.get(url, timeout=5)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    img_tags = soup.find_all("img")

    downloaded = 0
    for img in img_tags:
        src = img.get("src")
        if not src:
            continue

        # Resolve relative paths such as /images/photo.jpg against the page URL.
        img_url = urljoin(url, src)

        # Check the extension on the URL path only, so a query string
        # like ?v=2 does not defeat the filter or end up in the filename.
        path = urlparse(img_url).path
        if not path.lower().endswith(IMAGE_EXTENSIONS):
            continue

        try:
            img_data = requests.get(img_url, timeout=5)
            img_data.raise_for_status()

            filename = os.path.join(output_folder, os.path.basename(path))
            with open(filename, "wb") as f:
                f.write(img_data.content)

            downloaded += 1
            print(f"Downloaded: {filename}")
        except (requests.RequestException, OSError) as e:
            # Report the failure and move on to the next image.
            print(f"Failed to download {img_url}: {e}")

    print(f"\nDownloaded {downloaded} images to '{output_folder}/'")


if __name__ == "__main__":
    url = "https://example.com"  # Replace with actual webpage URL
    download_images(url)
Output
Downloaded: downloaded_images/logo.png
Downloaded: downloaded_images/banner.jpg
Failed to download https://example.com/broken.gif: HTTPSConnectionPool(...)
Downloaded 2 images to 'downloaded_images/'
How it works
The script uses requests.get to fetch the HTML of a webpage and BeautifulSoup to parse it. It finds all <img> tags, extracts the src attribute, and resolves relative URLs with urljoin. Only images with common extensions (.png, .jpg, .jpeg, .gif, .webp) are downloaded. Each image is fetched individually with a 5-second timeout and saved to a local folder using the original filename. Errors for individual downloads are caught and printed so the script continues with other images.
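To make the URL-resolution step concrete, here is a small sketch of how `urljoin` treats the kinds of `src` values a page typically contains. The page URL and image paths are made-up examples, not taken from a real site.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
page_url = "https://example.com/blog/post.html"

# Root-relative path: replaces the whole path of the page URL.
root_relative = urljoin(page_url, "/images/photo.jpg")

# Document-relative path: resolved against the page's directory.
doc_relative = urljoin(page_url, "img/chart.png")

# Absolute URL: returned unchanged.
absolute = urljoin(page_url, "https://cdn.example.net/logo.png")

# Protocol-relative URL: inherits the scheme of the page URL.
protocol_relative = urljoin(page_url, "//cdn.example.net/banner.webp")

for resolved in (root_relative, doc_relative, absolute, protocol_relative):
    print(resolved)
```

Because absolute URLs pass through untouched, the script can call `urljoin` on every `src` without first checking whether it is relative.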
Common mistakes
- Forgetting to handle relative URLs: `urljoin` is needed to resolve `src` attributes like `/images/photo.jpg` against the page URL.
- Not filtering by image extension, which makes the script try to fetch everything an `<img>` tag points to, including SVGs, icons, and inline `data:` URIs.
- Missing error handling for individual image downloads, causing the script to abort on a single 404.
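The filtering pitfall above can be sketched as a small predicate. The helper name `is_downloadable_image` is hypothetical and the extension list is copied from the script; this version checks the URL scheme first, on the assumption that inline `data:` URIs should be skipped before any request is made.

```python
from urllib.parse import urljoin, urlparse

# Same extensions the script accepts.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def is_downloadable_image(page_url, src):
    """Return True if src resolves to an http(s) URL with an allowed extension.

    Hypothetical helper for illustration; not part of the original script.
    """
    if not src:
        return False
    parsed = urlparse(urljoin(page_url, src))
    # Inline data: URIs and other non-HTTP schemes are not fetched as files.
    if parsed.scheme not in ("http", "https"):
        return False
    # Compare the path only, so a query string such as ?v=2 is ignored.
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)
```

Calling it inside the loop, in place of the separate `if not src` and extension checks, keeps all skip rules in one function that can be tested without network access.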
Variations
- Use `os.path.splitext` on the URL path and compare the lowercased extension against a set of allowed extensions.
- Download images concurrently with a `ThreadPoolExecutor`, capping `max_workers`, to speed up scraping on large pages.
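Both variations can be combined in one sketch. The names `has_allowed_extension`, `fetch_one`, and `download_concurrently` are hypothetical, and `max_workers=4` is an arbitrary assumption. The fetch function is passed in as a parameter so the thread-pool logic can be tried without touching the network.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse

# Set membership replaces the tuple passed to str.endswith in the script.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def has_allowed_extension(url):
    """Check the extension of the URL path, ignoring case and query strings."""
    extension = os.path.splitext(urlparse(url).path)[1]
    return extension.lower() in ALLOWED_EXTENSIONS


def download_concurrently(urls, fetch_one, max_workers=4):
    """Run fetch_one(url) for each URL on a bounded pool of threads.

    Returns (succeeded, failed), where failed maps each URL to its exception.
    fetch_one is any callable; in the script it would wrap requests.get
    and the file write.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                succeeded.append(url)
            except Exception as exc:  # one failure must not stop the rest
                failed[url] = exc
    return succeeded, failed
```

Downloads finish in completion order rather than page order, and a small `max_workers` value keeps the load on the target server modest.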
Real-world use cases
- Backing up all product images from an e-commerce site during a catalog migration.
- Downloading reference images from a documentation page for offline use on a plane.
- Collecting all banner images from a blog site for a design audit and asset inventory.