Scrape HTML Tables and Convert Them to CSV Using Beautiful Soup in Python
Scrape a Wikipedia table with Beautiful Soup and write the data to a CSV file using the csv module.
pip install requests beautifulsoup4
Python code
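import csv

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
TABLE_INDEX = 2  # position of the GDP table among the page's 'wikitable' tables

# Placeholder User-Agent (Wikimedia asks clients to identify themselves);
# the timeout stops the script from hanging on a stalled connection
headers = {"User-Agent": "gdp-table-scraper/1.0 (example script)"}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop on HTTP errors such as 403 or 404

soup = BeautifulSoup(response.text, 'html.parser')
tables = soup.find_all('table', {'class': 'wikitable'})

# Check that the requested index exists, not merely that the list is non-empty
if len(tables) > TABLE_INDEX:
    target_table = tables[TABLE_INDEX]
    rows = target_table.find_all('tr')

    # newline='' lets the csv module control line endings
    with open('countries_gdp.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for row in rows:
            # Header (<th>) and data (<td>) cells, surrounding whitespace stripped
            cols = [col.get_text(strip=True) for col in row.find_all(['th', 'td'])]
            writer.writerow(cols)
    print("CSV file 'countries_gdp.csv' created successfully.")
else:
    print(f"Expected at least {TABLE_INDEX + 1} tables with class 'wikitable', found {len(tables)}.")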
Output
CSV file 'countries_gdp.csv' created successfully.
How it works
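The code sends an HTTP GET request to the target URL and parses the returned HTML with BeautifulSoup. It finds all <table> elements with the CSS class 'wikitable' and selects the third one (index 2), which was the GDP table when this example was written; Wikipedia pages are edited often, so confirm the index before relying on it. Each <tr> row is then processed: its header and data cells (<th> and <td>) are extracted with get_text(strip=True), which removes surrounding whitespace, and written to the CSV file as one row. The newline='' argument in open() lets the csv module write its own line endings; without it, text mode on Windows turns each row terminator into \r\r\n, which spreadsheet applications display as blank rows. The result is a CSV file that can be opened in spreadsheet applications.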
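To try the parsing step without a network request, the same logic can be run on a small inline HTML string. This is a minimal sketch, assuming `beautifulsoup4` is installed; the function name `table_to_csv_text` and the sample markup are hypothetical, and the CSV goes to an in-memory `io.StringIO` buffer instead of a file so the result can be printed directly.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_to_csv_text(html, table_index=0):
    """Return the chosen 'wikitable' table in `html` as CSV text."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", {"class": "wikitable"})
    # Guard the index itself, not just whether any table was found
    if table_index >= len(tables):
        raise IndexError(
            f"only {len(tables)} wikitable(s) found; index {table_index} requested"
        )
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in tables[table_index].find_all("tr"):
        # <th> and <td> cells in document order; strip=True trims whitespace
        writer.writerow([cell.get_text(strip=True) for cell in row.find_all(["th", "td"])])
    return buf.getvalue()


sample = """<table class="wikitable">
<tr><th>Rank</th><th>Country</th></tr>
<tr><td> 1 </td><td>United States</td></tr>
</table>"""

print(table_to_csv_text(sample), end="")
# Rank,Country
# 1,United States
```

Raising an `IndexError` with a descriptive message, rather than printing and continuing, is a design choice that makes a moved or missing table fail loudly in automated jobs.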
Common mistakes
- Hard-coding the wrong table index: the target table can move to a different position whenever the page is edited, so inspect the page before choosing an index.
- Forgetting to install the required packages: `pip install requests beautifulsoup4`.
- Not handling missing tables gracefully: checking that `tables` is non-empty is not enough before accessing `tables[2]`; check that the list is long enough (for example, `len(tables) > 2`).
- Omitting `newline=''` in `open()`, which can cause extra blank lines in the CSV on Windows.
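The `newline=''` pitfall is easiest to see on an in-memory stream. This stdlib-only sketch writes one row with `csv.writer` to `io.StringIO` objects created with different `newline` settings; using `newline="\r\n"` to imitate Windows text-mode translation is an assumption made so the effect can be reproduced on any platform, and `csv_text` is a hypothetical helper name.

```python
import csv
import io


def csv_text(newline):
    """Write one CSV row to an in-memory text stream configured with `newline`."""
    buf = io.StringIO(newline=newline)
    # csv.writer ends every row with "\r\n" (the default dialect's lineterminator)
    csv.writer(buf).writerow(["Country", "GDP"])
    return buf.getvalue()


# newline="" performs no translation, so the csv module's "\r\n" is kept as-is
print(repr(csv_text(newline="")))      # 'Country,GDP\r\n'
# newline="\r\n" translates each "\n" written into "\r\n", as Windows text mode
# does by default, so the row ends in "\r\r\n" (an extra blank line in spreadsheets)
print(repr(csv_text(newline="\r\n")))  # 'Country,GDP\r\r\n'
```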
Variations
- Use `pandas.read_html()` to directly parse HTML tables into DataFrames and then save as CSV.
- Loop through all wikitable tables and save each to a separate CSV file.
- Filter rows based on a condition (e.g., countries above a certain GDP) before writing.
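The second and third variations can be combined: write every wikitable to its own file and drop data rows that fail a condition. This is a sketch under stated assumptions: the page HTML is already in a string (for example, `response.text` from the main script), and the function name `save_wikitables`, the `table_{i}.csv` file names, and the `keep_row` predicate are hypothetical. Treating rows made only of `<th>` cells as headers is a heuristic that may not suit every Wikipedia table.

```python
import csv
from pathlib import Path

from bs4 import BeautifulSoup


def save_wikitables(html, out_dir, keep_row=None):
    """Write every 'wikitable' table in `html` to its own CSV file.

    keep_row: optional predicate taking a row's cell texts (list of str);
    data rows for which it returns False are skipped. Rows made only of
    <th> cells are treated as headers and always written (a heuristic).
    Returns the paths written, in table order.
    """
    soup = BeautifulSoup(html, "html.parser")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, table in enumerate(soup.find_all("table", {"class": "wikitable"})):
        path = out_dir / f"table_{i}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for row in table.find_all("tr"):
                cells = row.find_all(["th", "td"])
                values = [cell.get_text(strip=True) for cell in cells]
                is_header = bool(cells) and all(cell.name == "th" for cell in cells)
                if is_header or keep_row is None or keep_row(values):
                    writer.writerow(values)
        written.append(path)
    return written
```

For example, `save_wikitables(response.text, "wikitables", keep_row=lambda v: len(v) > 2 and v[2].replace(",", "").isdigit() and int(v[2].replace(",", "")) > 1_000_000)` would keep only data rows whose third cell is a comma-formatted integer above one million. Which column actually holds the GDP figure is an assumption to check against the live table.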
Real-world use cases
- Extracting a list of country statistics from Wikipedia for a data visualization project.
- Automating collection of sports leaderboard tables from a website for a report.
- Gathering product pricing tables from a comparison page to analyze market trends.