Reference library

Automation & scripting

CLI tools, scheduled jobs, filesystem tasks, and glue scripts that save time.

6 matches

Sponsored Reserved space — layout preview until AdSense is connected

Build a Complete Web Scraper with Requests and BeautifulSoup in Python

Scrape multiple paginated pages from a website using Requests and BeautifulSoup, with retry logic, error handling, and CSV export.

web scraping requests beautifulsoup

Python

import requests
from bs4 import BeautifulSoup
import csv
import time
from typing import List, Dict, Optional

class WebScraper:
    def __init__(self, base_url: str, output_file: str = "scraped_data.csv"):
        self.base_url = base_url
        self.output_file = output_file
        self.session = requests.Session()…

15 0 Open

Automation & scripting medium

Convert HTML Tables to Excel Reports in Python

Convert HTML tables into formatted Excel reports using BeautifulSoup and Pandas with auto-adjusted column widths.

html excel beautifulsoup

Python

import pandas as pd
from bs4 import BeautifulSoup
from pathlib import Path

def html_table_to_excel(html_file: str, excel_file: str) -> None:
    """Convert HTML table to formatted Excel report."""
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()
    
    soup = BeautifulSoup(html_…

4 0 Open

Automation & scripting medium

Discover RSS Feeds From Any Website in Python

Scrape a website's HTML to automatically find all linked RSS or Atom feed URLs using requests, BeautifulSoup, and regex.

rss web-scraping beautifulsoup

Python

import requests
import re
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

def discover_rss_feeds(url):
    """Discover all RSS/Atom feeds linked from a given website."""
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; RSSDiscovery/1.0)'}
        response = requests.get(url…

3 0 Open

Automation & scripting medium

Download Images from a Web Page Automatically in Python

Scrape all images from a webpage, filter by extension, and save them to a local folder using requests and BeautifulSoup.

web-scraping requests beautifulsoup

Python

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import os

def download_images(url, output_folder="downloaded_images"):
    """Download all images from a given URL."""
    os.makedirs(output_folder, exist_ok=True)
    
    response = requests.get(url)
    response.raise_for_status()
    …

7 0 Open

Automation & scripting medium

Find Broken Image References Across a Website in Python

Crawl internal pages of a website, collect all image source URLs, then check each with HEAD requests to report any that return HTTP 4xx or connection errors.

web scraping crawling broken links

Python

import requests
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed

def find_all_links(base_url, max_pages=50):
    visited, to_visit = set(), {base_url}
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop()
 …

5 0 Open

Automation & scripting medium

How to Create a Link Graph Visualization for Any Website in Python

A Python script that crawls a website's internal links, builds a directed graph of parent-child URL relationships, and prints the graph to the console.

crawler graph visualization

Python

import requests
from bs4 import BeautifulSoup
from collections import defaultdict
from urllib.parse import urljoin, urlparse
import sys

def get_links(url, max_links=20):
    try:
        response = requests.get(url, timeout=5)
        soup = BeautifulSoup(response.text, 'html.parser')
        base_url = f"{urlparse(u…

7 0 Open

Browse by section

Each section groups closely related Python snippets.