Scrape HTML Tables and Convert Them to CSV Using Beautiful Soup in Python

Scrape a Wikipedia table with Beautiful Soup and write the data to a CSV file using the csv module.

Medium Python 3.9+ Jun 27, 2026 Files & data

Requires third-party packages — install first
pip install requests beautifulsoup4

Python code

import requests
from bs4 import BeautifulSoup
import csv

url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
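# A timeout keeps the script from hanging on a stalled connection.
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors such as 404 or 403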
soup = BeautifulSoup(response.text, 'html.parser')

tables = soup.find_all('table', {'class': 'wikitable'})

# Index 2 needs at least three matching tables, so check the length first.
if len(tables) > 2:
    target_table = tables[2]
    rows = target_table.find_all('tr')

    with open('countries_gdp.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for row in rows:
            # Header cells (<th>) and data cells (<td>) are written alike.
            cols = row.find_all(['th', 'td'])
            cols = [col.get_text(strip=True) for col in cols]
            writer.writerow(cols)

    print("CSV file 'countries_gdp.csv' created successfully.")
else:
    print(f"Expected at least 3 tables with class 'wikitable', found {len(tables)}.")

Output

stdout
CSV file 'countries_gdp.csv' created successfully.

How it works

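The code sends an HTTP GET request to the target URL and parses the returned HTML with BeautifulSoup. It finds all <table> elements with the CSS class 'wikitable' and selects the third one (index 2), which held the GDP figures when this example was written; the position can change as the page is edited. For each <tr> row, every header or data cell (<th> or <td>) is read with get_text(strip=True), which strips leading and trailing whitespace from the cell's text, and the resulting list is written as one CSV row. Any footnote markers or annotations inside a cell are kept as part of its text. The newline='' argument in open() lets the csv module control line endings, which prevents blank rows between records on Windows. The resulting CSV file can be opened in spreadsheet applications.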
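The row-by-row extraction described above can be tried without a network request. The following is a minimal sketch, assuming a small hand-written table: `SAMPLE_HTML`, its placeholder values, and the helper name `table_to_csv_text` are made up for illustration and are not part of the original sample.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page; the values are placeholders.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Country</th><th>GDP</th></tr>
  <tr><td> United States </td><td>1,000</td></tr>
  <tr><td>China</td><td>900</td></tr>
</table>
"""


def table_to_csv_text(html: str) -> str:
    """Return the first table with class 'wikitable' in `html` as CSV text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable"})
    if table is None:
        raise ValueError("no table with class 'wikitable' found")

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in table.find_all("tr"):
        # Treat header and data cells the same way, as the main sample does.
        cells = row.find_all(["th", "td"])
        writer.writerow([cell.get_text(strip=True) for cell in cells])
    return buffer.getvalue()


csv_text = table_to_csv_text(SAMPLE_HTML)
print(csv_text)
```

Because the value 1,000 contains a comma, csv.writer wraps it in quotes, so it still reads back as a single field.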

Common mistakes

  • Using an incorrect table index; the target table may be at a different position if the page changes.
  • Forgetting to install the required packages: `pip install requests beautifulsoup4`.
  • Not handling missing tables gracefully; a non-empty `tables` list does not guarantee that index 2 exists, so check its length before indexing.
  • Omitting `newline=''` in `open()`, which can cause extra blank lines in the CSV on Windows.
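One way to avoid depending on a fixed table index is to pick the table by its header text instead. The sketch below is an illustration under assumptions, not the sample's own method: `find_table_by_header` is a hypothetical helper, and `SAMPLE_HTML` is made-up markup. On a real page, header cells may carry footnote markers or extra text, so an exact match like this may need loosening.

```python
from typing import Optional

from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup with two tables; only the second has a "GDP" header.
SAMPLE_HTML = """
<table class="wikitable"><tr><th>Year</th><th>Event</th></tr></table>
<table class="wikitable"><tr><th>Country</th><th>GDP</th></tr>
  <tr><td>Exampleland</td><td>123</td></tr></table>
"""


def find_table_by_header(soup: BeautifulSoup, header_text: str) -> Optional[Tag]:
    """Return the first wikitable whose <th> cells include `header_text`."""
    for table in soup.find_all("table", {"class": "wikitable"}):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None  # Caller decides how to report a missing table.


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_header(soup, "GDP")
if table is None:
    print("No matching table found.")
else:
    print(len(table.find_all("tr")), "rows found")
```

Returning None instead of raising keeps the "missing table" case explicit at the call site, in the same spirit as the check in the main sample.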

Variations

  1. Use `pandas.read_html()` to parse HTML tables directly into DataFrames, then save them with `DataFrame.to_csv()`; this needs pandas and an HTML parser such as lxml installed.
  2. Loop through all wikitable tables and save each to a separate CSV file.
  3. Filter rows based on a condition (e.g., countries above a certain GDP) before writing.
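The third variation, filtering before writing, might look like the sketch below. It assumes the rows have already been extracted as lists of strings and that the numeric column uses commas as thousands separators; the helper names `parse_number` and `filter_rows`, the column position, and the sample rows are all made up for illustration.

```python
import csv
import io


def parse_number(text):
    """Convert text such as '1,234' to a float; return None if it is not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def filter_rows(rows, column, minimum):
    """Keep the header row plus data rows whose `column` value is >= minimum."""
    if not rows:
        return []
    header, *data = rows
    kept = [header]
    for row in data:
        if column >= len(row):
            continue  # Skip short rows, e.g. rows affected by merged cells.
        value = parse_number(row[column])
        if value is not None and value >= minimum:
            kept.append(row)
    return kept


# Made-up rows standing in for scraped table data.
rows = [
    ["Country", "GDP"],
    ["Alpha", "1,500"],
    ["Beta", "800"],
    ["Gamma", "N/A"],
]
filtered = filter_rows(rows, column=1, minimum=1000)

# Write to an in-memory buffer here; a real script would write to a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(filtered)
print(buffer.getvalue())
```

Rows whose value cannot be parsed are dropped rather than raising an error; whether that is the right choice depends on how the data will be used.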

Real-world use cases

  • Extracting a list of country statistics from Wikipedia for a data visualization project.
  • Automating collection of sports leaderboard tables from a website for a report.
  • Gathering product pricing tables from a comparison page to analyze market trends.

Run locally

This sample needs third-party packages. Copy the code above, install the packages shown at the top, then run it in your own Python environment.
