Maintenance

Site is under maintenance — quizzes are still available.

Go to quizzes
Sponsored Reserved space — layout preview until AdSense is connected

Convert HTML Tables to Excel Reports in Python

Convert HTML tables into formatted Excel reports using BeautifulSoup and Pandas with auto-adjusted column widths.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting 2 views 0 copies

Requires third-party packages — install first
pip install pandas openpyxl beautifulsoup4

Python code

49 lines
Python 3.9+
import pandas as pd
from bs4 import BeautifulSoup
from pathlib import Path

def html_table_to_excel(html_file: str, excel_file: str) -> None:
    """Convert HTML table to formatted Excel report."""
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()
    
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find('table')
    
    if not table:
        raise ValueError("No table found in HTML file")
    
    # Extract table data
    data = []
    for row in table.find_all('tr'):
        cols = row.find_all(['th', 'td'])
        data.append([col.get_text(strip=True) for col in cols])
    
    # Create DataFrame
    df = pd.DataFrame(data[1:], columns=data[0]) if data else pd.DataFrame()
    
    # Write to Excel with formatting
    with pd.ExcelWriter(excel_file, engine='openpyxl') as writer:
        df.to_excel(writer, index=False, sheet_name='Report')
        
        # Auto-adjust column widths
        worksheet = writer.sheets['Report']
        for column in worksheet.columns:
            max_length = max(len(str(cell.value or '')) for cell in column)
            worksheet.column_dimensions[column[0].column_letter].width = max_length + 2

if __name__ == "__main__":
    # Create sample HTML table
    sample_html = Path("sample_table.html")
    sample_html.write_text(
        "<table><tr><th>Name</th><th>Age</th><th>City</th></tr>"
        "<tr><td>Alice</td><td>30</td><td>New York</td></tr>"
        "<tr><td>Bob</td><td>25</td><td>Los Angeles</td></tr>"
        "<tr><td>Charlie</td><td>35</td><td>Chicago</td></tr></table>"
    )
    
    html_table_to_excel("sample_table.html", "report.xlsx")
    print("Excel report generated: report.xlsx")
    
    # Cleanup
    sample_html.unlink()

Output

stdout
Excel report generated: report.xlsx

How it works

The function read_html works well for simple tables, but using BeautifulSoup gives you more control over parsing and error handling. The code first reads the file with open and utf-8 encoding, then uses BeautifulSoup to find the <table> tag. It extracts all rows and cells, creates a Pandas DataFrame (with the first row as column headers), and writes it to Excel with pd.ExcelWriter. The column widths are auto-adjusted by measuring the maximum string length in each column and adding a small padding.

Common mistakes

  • Missing the `engine='openpyxl'` parameter when writing Excel files with formatting.
  • Assuming the first row is always the header without verification.
  • Forgetting to install `openpyxl` and `html.parser` dependencies.
  • Not handling tables with merged cells or nested tags properly.

Variations

  1. Use `pd.read_html(html_file)` for simpler tables if you don't need custom parsing.
  2. Add `header` and `footer` formatting using `StyledDataFrame` for professional reports.

Real-world use cases

  • Converting web-scraped HTML tables from e-commerce sites into spreadsheet reports for team analysis.
  • Automating export of HTML tabular data from legacy internal tools into formatted Excel for stakeholders.
  • Generating shareable Excel summaries from HTML datasets pulled from public data portals.

Sponsored

Sponsored Reserved space — layout preview until AdSense is connected

Run locally

This sample needs third-party packages, so it cannot run in the browser IDE. Copy the code above, install the packages shown at the top, then run it in your own Python environment.

More from Automation & scripting

Related tutorials and quizzes for this topic.