Convert HTML Tables to Excel Reports in Python
Convert HTML tables into formatted Excel reports using BeautifulSoup and Pandas with auto-adjusted column widths.
pip install pandas openpyxl beautifulsoup4
Python code
49 linesimport pandas as pd
from bs4 import BeautifulSoup
from pathlib import Path
def html_table_to_excel(html_file: str, excel_file: str) -> None:
"""Convert HTML table to formatted Excel report."""
with open(html_file, 'r', encoding='utf-8') as f:
html_content = f.read()
soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find('table')
if not table:
raise ValueError("No table found in HTML file")
# Extract table data
data = []
for row in table.find_all('tr'):
cols = row.find_all(['th', 'td'])
data.append([col.get_text(strip=True) for col in cols])
# Create DataFrame
df = pd.DataFrame(data[1:], columns=data[0]) if data else pd.DataFrame()
# Write to Excel with formatting
with pd.ExcelWriter(excel_file, engine='openpyxl') as writer:
df.to_excel(writer, index=False, sheet_name='Report')
# Auto-adjust column widths
worksheet = writer.sheets['Report']
for column in worksheet.columns:
max_length = max(len(str(cell.value or '')) for cell in column)
worksheet.column_dimensions[column[0].column_letter].width = max_length + 2
if __name__ == "__main__":
# Create sample HTML table
sample_html = Path("sample_table.html")
sample_html.write_text(
"<table><tr><th>Name</th><th>Age</th><th>City</th></tr>"
"<tr><td>Alice</td><td>30</td><td>New York</td></tr>"
"<tr><td>Bob</td><td>25</td><td>Los Angeles</td></tr>"
"<tr><td>Charlie</td><td>35</td><td>Chicago</td></tr></table>"
)
html_table_to_excel("sample_table.html", "report.xlsx")
print("Excel report generated: report.xlsx")
# Cleanup
sample_html.unlink()
Output
Excel report generated: report.xlsx
How it works
The function read_html works well for simple tables, but using BeautifulSoup gives you more control over parsing and error handling. The code first reads the file with open and utf-8 encoding, then uses BeautifulSoup to find the <table> tag. It extracts all rows and cells, creates a Pandas DataFrame (with the first row as column headers), and writes it to Excel with pd.ExcelWriter. The column widths are auto-adjusted by measuring the maximum string length in each column and adding a small padding.
Common mistakes
- Missing the `engine='openpyxl'` parameter when writing Excel files with formatting.
- Assuming the first row is always the header without verification.
- Forgetting to install `openpyxl` and `html.parser` dependencies.
- Not handling tables with merged cells or nested tags properly.
Variations
- Use `pd.read_html(html_file)` for simpler tables if you don't need custom parsing.
- Add `header` and `footer` formatting using `StyledDataFrame` for professional reports.
Real-world use cases
- Converting web-scraped HTML tables from e-commerce sites into spreadsheet reports for team analysis.
- Automating export of HTML tabular data from legacy internal tools into formatted Excel for stakeholders.
- Generating shareable Excel summaries from HTML datasets pulled from public data portals.
Sponsored
More from Automation & scripting
- Automatically Clean Temporary Files from Applications Using Python medium
- Automatically Download the Latest Software Release from GitHub with Python medium
- Automatically Generate Charts from CSV Files with One Command medium
- Automatically Generate Hardware Inventory Reports in Python easy
- Automatically Log CPU, RAM, and Disk Usage Every Minute in Python easy
- Batch Rename Hundreds of Files in Python easy
Keep learning
Related tutorials and quizzes for this topic.