Build a Complete Web Scraper with Requests and BeautifulSoup in Python

Scrape multiple paginated pages from a website using Requests and BeautifulSoup, with retry logic, error handling, and CSV export.

Medium Python 3.9+ Jun 27, 2026 Automation & scripting

web scraping requests beautifulsoup pagination csv error handling

Requires third-party packages — install first

pip install requests beautifulsoup4

Python code

import csv
import time
from typing import Dict, List, Optional
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


class WebScraper:
    def __init__(self, base_url: str, output_file: str = "scraped_data.csv"):
        self.base_url = base_url  # URL of the first page to scrape
        self.output_file = output_file
        # Reuse one session so connections are pooled across requests.
        self.session = requests.Session()

    def fetch_page(self, url: str, retries: int = 3) -> Optional[BeautifulSoup]:
        """Fetch and parse a page, retrying on network or HTTP errors."""
        for attempt in range(retries):
            try:
                response = self.session.get(url, timeout=10)
                response.raise_for_status()
                # Pass raw bytes so BeautifulSoup can detect the encoding itself.
                return BeautifulSoup(response.content, 'html.parser')
            except requests.RequestException as e:
                if attempt == retries - 1:
                    print(f"Error fetching {url}: {e}")
                else:
                    time.sleep(1)  # brief pause before the next attempt
        return None

    def parse_page(self, soup: BeautifulSoup) -> List[Dict[str, str]]:
        """Extract name, price, and rating from every product card."""
        # Selectors match the books.toscrape.com markup; adjust them for other sites.
        items = []
        for product in soup.select('article.product_pod'):
            link = product.select_one('h3 a')
            price = product.select_one('.price_color')
            rating = product.select_one('.star-rating')
            # The rating is stored as a second CSS class, e.g. "star-rating Three".
            stars = [c for c in rating.get('class', []) if c != 'star-rating'] if rating else []
            items.append({
                'name': link.get('title', link.get_text(strip=True)) if link else '',
                'price': price.get_text(strip=True) if price else '',
                'rating': stars[0] if stars else '',
            })
        return items

    def get_next_page(self, soup: BeautifulSoup, current_url: str) -> Optional[str]:
        """Return the absolute URL of the next page, or None on the last page."""
        next_link = soup.select_one('li.next a')
        if next_link and next_link.get('href'):
            # urljoin resolves both relative and absolute hrefs against the current page.
            return urljoin(current_url, next_link['href'])
        return None

    def scrape_all_pages(self) -> List[Dict[str, str]]:
        """Follow next-page links from base_url and collect all items."""
        all_items = []
        current_url = self.base_url
        page_num = 1

        while current_url:
            print(f"Scraping page {page_num}: {current_url}")
            soup = self.fetch_page(current_url)
            if not soup:
                break  # stop on a page that could not be fetched

            items = self.parse_page(soup)
            all_items.extend(items)
            print(f"Found {len(items)} items on page {page_num}")

            current_url = self.get_next_page(soup, current_url)
            page_num += 1
            time.sleep(0.5)  # be polite: pause between page requests

        return all_items

    def export_to_csv(self, data: List[Dict[str, str]]) -> None:
        """Write the collected items to output_file as CSV."""
        if not data:
            print("No data to export")
            return

        with open(self.output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=['name', 'price', 'rating'])
            writer.writeheader()
            writer.writerows(data)
        print(f"Exported {len(data)} items to {self.output_file}")


if __name__ == "__main__":
    scraper = WebScraper("http://books.toscrape.com/catalogue/page-1.html")
    all_data = scraper.scrape_all_pages()
    scraper.export_to_csv(all_data)

Output

stdout

Scraping page 1: http://books.toscrape.com/catalogue/page-1.html
Found 20 items on page 1
Scraping page 2: http://books.toscrape.com/catalogue/page-2.html
Found 20 items on page 2
...
Scraping page 50: http://books.toscrape.com/catalogue/page-50.html
Found 20 items on page 50
Exported 1000 items to scraped_data.csv

How it works

The script defines a WebScraper class that holds a persistent requests.Session, so connections are reused across requests. fetch_page retries up to 3 times with a 1-second delay between attempts and returns None if every attempt fails. parse_page uses BeautifulSoup's .select() to find the product cards and extracts the name, price, and rating from each one, falling back to an empty string when an element is missing. Pagination is handled by get_next_page, which looks for the next-page link and returns the URL of the following page, or None when there is no such link. scrape_all_pages follows these links until none remain, pausing 0.5 seconds between pages, and export_to_csv then writes the collected items to a CSV file using csv.DictWriter.
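
The retry behaviour described above can be looked at in isolation. The following is a minimal sketch rather than part of the script: call_with_retries and flaky are hypothetical names introduced for illustration, and the delay is a parameter so the example runs instantly without any network access.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], retries: int = 3, delay: float = 1.0) -> Optional[T]:
    """Call func up to `retries` times; return None if every attempt raises."""
    for attempt in range(retries):
        try:
            return func()
        except Exception as exc:  # a real scraper would catch requests.RequestException
            if attempt == retries - 1:
                print(f"Giving up after {retries} attempts: {exc}")
            else:
                time.sleep(delay)  # wait before the next attempt
    return None


# Hypothetical stand-in for a network call: fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


result = call_with_retries(flaky, retries=3, delay=0)
print(result, calls["count"])
```

Returning None instead of re-raising keeps the calling loop simple, at the cost of losing the exception details beyond the printed message.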

Common mistakes

Forgetting to set a User-Agent header on the session: some sites block or throttle requests that arrive with the default python-requests User-Agent.
Calling `.get_text(strip=True)` on the result of `find` or `select_one` without a fallback: a missing element returns `None` and raises `AttributeError`, so guard the call with a conditional or default.
Not adding a delay between requests (`time.sleep`), which may overload the server or trigger rate limiting.
Assuming the next-page link is always relative: resolve it against the current page URL with `urllib.parse.urljoin` so that both absolute and relative hrefs work.
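
To illustrate the last point, here is a small sketch of how `urllib.parse.urljoin` treats different kinds of `href` values compared with plain string concatenation. The href strings and the example.com URL are made up for illustration; only the starting URL comes from the script.

```python
from urllib.parse import urljoin

current_url = "http://books.toscrape.com/catalogue/page-1.html"

# A relative href replaces the last path segment of the current URL.
relative = urljoin(current_url, "page-2.html")

# A root-relative href replaces the whole path but keeps the host.
root_relative = urljoin(current_url, "/catalogue/page-3.html")

# An absolute href is returned unchanged.
absolute = urljoin(current_url, "http://example.com/list?page=2")

# Plain concatenation glues the href onto the file name and yields a broken URL.
naive = current_url + "page-2.html"

print(relative)
print(root_relative)
print(absolute)
print(naive)
```

Resolving against the URL of the page that contained the link, rather than the starting URL, also keeps the result correct if relative hrefs change from page to page.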

Variations

Use `aiohttp` with `asyncio` for asynchronous requests to scrape faster across many pages.
Replace CSV export with SQLite insertion using `sqlite3` for persistent storage with query capabilities.
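
The SQLite variation might look roughly like the sketch below. It assumes the same three keys per item as the CSV export; the function name export_to_sqlite, the table name items, and the sample rows are hypothetical choices, not part of the script.

```python
import sqlite3
from typing import Dict, List


def export_to_sqlite(data: List[Dict[str, str]], db_path: str = "scraped_data.db") -> int:
    """Insert scraped rows into an `items` table and return the table's row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an insert fails
            conn.execute(
                "CREATE TABLE IF NOT EXISTS items (name TEXT, price TEXT, rating TEXT)"
            )
            # Named placeholders are filled from the keys of each dict.
            conn.executemany(
                "INSERT INTO items (name, price, rating) VALUES (:name, :price, :rating)",
                data,
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
        return count
    finally:
        conn.close()


# Made-up sample rows; ":memory:" keeps the demo from writing a file.
sample = [
    {"name": "Example Book", "price": "10.00", "rating": "Three"},
    {"name": "Another Book", "price": "12.50", "rating": "Five"},
]
print(export_to_sqlite(sample, ":memory:"))
```

Prices are stored as text here to mirror the CSV output; converting them to numbers before insertion would make range queries easier.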

Real-world use cases

Collecting product listings and pricing data from e-commerce sites for competitive analysis.
Archiving job postings from a multi-page job board into a spreadsheet for offline review.
Monitoring changes in news headlines by scraping article titles across paginated archives daily.
