Convert HTML Tables to Excel Reports in Python

Convert HTML tables into formatted Excel reports using BeautifulSoup and Pandas with auto-adjusted column widths.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting 2 views 0 copies

html excel beautifulsoup pandas automation

Requires third-party packages — install first

pip install pandas openpyxl beautifulsoup4

Python code

49 lines

Python 3.9+

import pandas as pd
from bs4 import BeautifulSoup
from pathlib import Path

def html_table_to_excel(html_file: str, excel_file: str) -> None:
    """Convert HTML table to formatted Excel report."""
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()
    
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find('table')
    
    if not table:
        raise ValueError("No table found in HTML file")
    
    # Extract table data
    data = []
    for row in table.find_all('tr'):
        cols = row.find_all(['th', 'td'])
        data.append([col.get_text(strip=True) for col in cols])
    
    # Create DataFrame
    df = pd.DataFrame(data[1:], columns=data[0]) if data else pd.DataFrame()
    
    # Write to Excel with formatting
    with pd.ExcelWriter(excel_file, engine='openpyxl') as writer:
        df.to_excel(writer, index=False, sheet_name='Report')
        
        # Auto-adjust column widths
        worksheet = writer.sheets['Report']
        for column in worksheet.columns:
            max_length = max(len(str(cell.value or '')) for cell in column)
            worksheet.column_dimensions[column[0].column_letter].width = max_length + 2

if __name__ == "__main__":
    # Create sample HTML table
    sample_html = Path("sample_table.html")
    sample_html.write_text(
        "<table><tr><th>Name</th><th>Age</th><th>City</th></tr>"
        "<tr><td>Alice</td><td>30</td><td>New York</td></tr>"
        "<tr><td>Bob</td><td>25</td><td>Los Angeles</td></tr>"
        "<tr><td>Charlie</td><td>35</td><td>Chicago</td></tr></table>"
    )
    
    html_table_to_excel("sample_table.html", "report.xlsx")
    print("Excel report generated: report.xlsx")
    
    # Cleanup
    sample_html.unlink()

Output

stdout

Excel report generated: report.xlsx

How it works

The function read_html works well for simple tables, but using BeautifulSoup gives you more control over parsing and error handling. The code first reads the file with open and utf-8 encoding, then uses BeautifulSoup to find the <table> tag. It extracts all rows and cells, creates a Pandas DataFrame (with the first row as column headers), and writes it to Excel with pd.ExcelWriter. The column widths are auto-adjusted by measuring the maximum string length in each column and adding a small padding.

Common mistakes

Missing the `engine='openpyxl'` parameter when writing Excel files with formatting.
Assuming the first row is always the header without verification.
Forgetting to install `openpyxl` and `html.parser` dependencies.
Not handling tables with merged cells or nested tags properly.

Variations

Use `pd.read_html(html_file)` for simpler tables if you don't need custom parsing.
Add `header` and `footer` formatting using `StyledDataFrame` for professional reports.

Real-world use cases

Converting web-scraped HTML tables from e-commerce sites into spreadsheet reports for team analysis.
Automating export of HTML tabular data from legacy internal tools into formatted Excel for stakeholders.
Generating shareable Excel summaries from HTML datasets pulled from public data portals.

Convert HTML Tables to Excel Reports in Python

Python code

Output

How it works

Common mistakes

Variations

Real-world use cases

More from Automation & scripting

Tutorials

Quizzes

Python code

Output

How it works

Common mistakes

Variations

Real-world use cases

More from Automation & scripting

Keep learning

Tutorials

Quizzes