Build a Website Accessibility Scanner Using Python
Scans a webpage for common accessibility issues such as missing alt text, headings, labels, and landmarks, using Python's built-in HTML parser and the requests library.
pip install requests
Python code
import requests
from html.parser import HTMLParser


class AccessibilityParser(HTMLParser):
    """Collects basic accessibility signals from start tags."""

    def __init__(self):
        super().__init__()
        self.images_without_alt = []
        self.missing_headings = True
        self.has_main_tag = False
        self.label_targets = set()  # ids referenced by <label for="...">
        self.inputs = []            # (id, name) of every <input> seen

    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        # Flag <img> with no alt attribute at all; alt="" (decorative) is accepted.
        if tag == 'img' and 'alt' not in attrs_dict:
            self.images_without_alt.append(attrs_dict.get('src', 'unknown'))
        if tag in ('h1', 'h2', 'h3', 'h4', 'h5', 'h6'):
            self.missing_headings = False
        if tag == 'main':
            self.has_main_tag = True
        if tag == 'label' and attrs_dict.get('for'):
            self.label_targets.add(attrs_dict['for'])
        if tag == 'input':
            self.inputs.append((attrs_dict.get('id'), attrs_dict.get('name', 'unknown')))

    @property
    def inputs_without_label(self):
        # Resolved after parsing, so a <label> that follows its <input> still counts.
        return [name for input_id, name in self.inputs
                if input_id not in self.label_targets]


def scan_url(url):
    # Only the network call can raise RequestException, so keep the try block narrow.
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        return f"{url}: Error scanning - {e}"

    parser = AccessibilityParser()
    parser.feed(response.text)
    parser.close()

    issues = []
    if parser.images_without_alt:
        issues.append(f"Missing alt text on {len(parser.images_without_alt)} images")
    if parser.missing_headings:
        issues.append("No heading tags (h1-h6) found")
    if parser.inputs_without_label:
        issues.append(f"{len(parser.inputs_without_label)} inputs missing associated labels")
    if not parser.has_main_tag:
        issues.append("No <main> landmark element found")

    if not issues:
        return f"{url}: No accessibility issues found"
    return f"{url}: Found accessibility issues:\n" + "\n".join(issues)


if __name__ == "__main__":
    test_url = "https://example.com"
    print(scan_url(test_url))
Output (illustrative; the issues listed depend on the markup of the page you scan)
https://example.com: Found accessibility issues:
Missing alt text on 1 images
No heading tags (h1-h6) found
No <main> landmark element found
How it works
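The HTMLParser class from html.parser parses HTML without any third-party parsing library. By subclassing it and overriding handle_starttag, you can inspect every opening tag and its attributes as the parser encounters it. The scanner checks four things: whether each <img> has an alt attribute, whether the page contains any heading tag (h1 to h6), whether a <main> landmark is present, and whether each <input> has an id that a <label> references through its for attribute. This approach is fast and lightweight, which suits quick audits of simple, server-rendered pages; because requests only fetches the raw HTML, content injected by JavaScript is never seen.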
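To see the callback mechanism in isolation, the following minimal sketch feeds an inline HTML string to a small HTMLParser subclass and records each start tag. The names TagLogger and log_tags and the sample markup are illustrative assumptions, not part of the scanner.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased.
        self.seen.append((tag, dict(attrs)))


def log_tags(html):
    """Parse an HTML string and return the recorded (tag, attributes) pairs."""
    parser = TagLogger()
    parser.feed(html)
    parser.close()
    return parser.seen


sample = '<main><H1>Title</H1><img src="logo.png"><img src="bar.png" alt=""></main>'
for tag, attrs in log_tags(sample):
    print(tag, attrs)
```

The first image has no alt key in its attribute dictionary, while the second has alt mapped to an empty string, which is the distinction the scanner's `'alt' not in attrs_dict` test relies on.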
Common mistakes
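- Forgetting to set a timeout on requests.get (there is none by default) or to catch request errors. Redirects are followed automatically for GET requests, so the page scanned may not be the URL you passed in.
- Flagging inputs that are labeled through aria-label or aria-labelledby, which are valid alternatives to a <label> element.
- Assuming every image needs descriptive alt text. Decorative images should carry an empty alt="" so screen readers skip them; only a missing alt attribute is an error.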
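The last two points can be sketched as small helper functions that work on the attribute dictionary built in handle_starttag. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the aria-labelledby check only tests that the attribute is non-empty, not that the referenced element exists.

```python
def input_has_accessible_name(attrs, label_targets):
    """attrs: dict of <input> attributes; label_targets: ids used in <label for="...">."""
    if (attrs.get('aria-label') or '').strip():
        return True
    # Assumption: a non-empty aria-labelledby is trusted without resolving its target.
    if (attrs.get('aria-labelledby') or '').strip():
        return True
    return attrs.get('id') in label_targets


def classify_alt(attrs):
    """Return 'missing', 'decorative' (empty alt), or 'described' for an <img>."""
    if 'alt' not in attrs:
        return 'missing'
    return 'described' if (attrs['alt'] or '').strip() else 'decorative'


print(input_has_accessible_name({'name': 'q', 'aria-label': 'Search'}, set()))
print(input_has_accessible_name({'id': 'email'}, {'email'}))
print(input_has_accessible_name({'name': 'phone'}, {'email'}))
print(classify_alt({'src': 'spacer.gif', 'alt': ''}))
```

Only the 'missing' category would be reported as an issue; 'decorative' images are intentional and should pass.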
Variations
- Use BeautifulSoup instead of HTMLParser for easier traversal.
- Add checks for color contrast by fetching computed styles via a headless browser.
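Once foreground and background colors are known (obtaining them from a headless browser is outside this sketch), the contrast check itself is plain arithmetic. The code below follows the WCAG 2.x definitions of relative luminance and contrast ratio; the function names are illustrative, and colors are assumed to be 8-bit sRGB tuples.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear value per WCAG 2.x."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) tuple with channels in 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Black text on a white background gives the maximum possible ratio.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0
```

WCAG level AA asks for a ratio of at least 4.5:1 for normal-size text, so a scanner could flag any text and background pair that falls below that threshold.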
Real-world use cases
- CI pipeline hook that blocks deployment if a page has missing alt text or headings.
- Scheduled nightly scan of company websites to generate accessibility compliance reports.
- Quick audit of a static site before sending it for WCAG review.
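For the CI use case, the scanner's findings have to become a process exit code. The following is a minimal sketch, assuming the scanner is adapted to return its list of issue strings rather than one formatted message; BLOCKING_PREFIXES and exit_code_for are hypothetical names, and the policy of which issues block a deployment is an illustrative choice.

```python
import sys

# Hypothetical policy: only issues starting with these prefixes fail the build.
BLOCKING_PREFIXES = ("Missing alt text", "No heading tags")


def exit_code_for(issues):
    """Return 1 if any issue is blocking, else 0 (the usual CI convention)."""
    blocking = [issue for issue in issues if issue.startswith(BLOCKING_PREFIXES)]
    for issue in blocking:
        print(f"BLOCKING: {issue}", file=sys.stderr)
    return 1 if blocking else 0


# Placeholder findings; in a real pipeline these would come from the scanner.
sample_issues = ["Missing alt text on 2 images", "No <main> landmark element found"]
print("exit code:", exit_code_for(sample_issues))
```

A pipeline script would pass the returned value to sys.exit so that the CI runner marks the step as failed when a blocking issue is present.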