Extract Every Open Graph and Social Media Meta Tag from Web Pages in Python

A Python script that fetches a webpage and extracts all Open Graph, Twitter Card, Facebook, and Article meta tags using the standard library HTML parser.

Medium | Python 3.9+ | Jun 28, 2026 | Automation & scripting

Python code

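from html.parser import HTMLParser
from urllib.request import urlopen

# Prefixes that mark Open Graph, Twitter Card, Facebook, and Article metadata
SOCIAL_PREFIXES = ('og:', 'twitter:', 'fb:', 'article:')

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attributes map to None
        if tag != 'meta':
            return
        attrs_dict = dict(attrs)
        # Open Graph uses property=, Twitter Cards conventionally use name=
        prop = (attrs_dict.get('property') or attrs_dict.get('name') or '').strip().lower()
        content = attrs_dict.get('content') or ''
        # Match on the prefix only, so a value that merely contains "og:" is not picked up
        if prop.startswith(SOCIAL_PREFIXES) and content:
            self.meta_tags.append((prop, content))

    def get_meta(self):
        return self.meta_tags

def extract_social_meta(url: str) -> list[tuple[str, str]]:
    try:
        with urlopen(url, timeout=5) as response:  # timeout avoids hanging on slow servers
            raw = response.read()
            # Prefer the charset from the Content-Type header, fall back to UTF-8
            charset = response.headers.get_content_charset() or 'utf-8'
        try:
            html = raw.decode(charset, errors='ignore')
        except LookupError:  # the header named a charset Python does not know
            html = raw.decode('utf-8', errors='ignore')
        extractor = MetaExtractor()
        extractor.feed(html)
        extractor.close()  # flush any markup still buffered by the parser
        return extractor.get_meta()
    except Exception as e:  # network, URL, and HTTP errors all end up here
        print(f"Error fetching {url}: {e}")
        return []

if __name__ == "__main__":
    test_url = "https://example.com"
    print(f"Extracting social meta tags from: {test_url}")
    results = extract_social_meta(test_url)
    for prop, content in results:
        print(f"{prop}: {content}")
    if not results:
        print("No OpenGraph or social media meta tags found.")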

Output

stdout
Extracting social meta tags from: https://example.com
No OpenGraph or social media meta tags found.

How it works

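The code subclasses HTMLParser and overrides handle_starttag, turning each <meta> tag's attributes into a dictionary. It reads the property attribute (used by Open Graph) or, failing that, the name attribute (used by Twitter Cards), and keeps the tag only if that value carries one of the og:, twitter:, fb:, or article: prefixes and the content attribute is non-empty. urlopen is called with a 5-second timeout so an unresponsive server cannot stall the script, and bytes that fail to decode are dropped (errors='ignore') instead of raising UnicodeDecodeError. feed() can accept markup in pieces, which would allow incremental parsing, but this script reads the entire response into memory first and passes it to feed() in a single call.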
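To see what feed() does with partial input, here is a small self-contained sketch. It re-implements a minimal parser (SocialMetaParser and parse_in_chunks are illustrative names, not part of the script above) and feeds a hard-coded page a few characters at a time, the way chunks of a streamed response might arrive. HTMLParser buffers a tag that is cut off at a chunk boundary until the rest shows up, so the result should match a single feed() call.

```python
from html.parser import HTMLParser


class SocialMetaParser(HTMLParser):
    """Collects (property-or-name, content) pairs for social meta tags."""

    PREFIXES = ('og:', 'twitter:', 'fb:', 'article:')

    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs_dict = dict(attrs)
        prop = (attrs_dict.get('property') or attrs_dict.get('name') or '').lower()
        content = attrs_dict.get('content') or ''
        if prop.startswith(self.PREFIXES) and content:
            self.meta_tags.append((prop, content))


def parse_in_chunks(html: str, chunk_size: int = 16) -> list:
    """Feed markup piece by piece, as a streamed response would arrive."""
    parser = SocialMetaParser()
    for start in range(0, len(html), chunk_size):
        # A <meta> tag may be split across chunks; HTMLParser keeps the
        # incomplete tail in its buffer until the next feed() call.
        parser.feed(html[start:start + chunk_size])
    parser.close()  # process anything still buffered
    return parser.meta_tags


page = (
    '<html><head>'
    '<meta property="og:title" content="Hello">'
    '<meta name="twitter:card" content="summary">'
    '<meta name="description" content="ignored">'
    '</head></html>'
)
print(parse_in_chunks(page, chunk_size=7))
# → [('og:title', 'Hello'), ('twitter:card', 'summary')]
```

To stream a real response, you could pass successive `response.read(8192)` results to feed() instead of slicing a string, but a multibyte UTF-8 character split across two reads would then need an incremental decoder such as `codecs.getincrementaldecoder('utf-8')`.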

Common mistakes

  • Forgetting that Open Graph uses 'property' while Twitter Cards use 'name' — the code handles both.
  • Not setting a timeout on urlopen, which can cause the script to hang on slow or unresponsive servers.
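  • Assuming every page is UTF-8. The charset in the Content-Type header is a better first guess, and `errors='ignore'` only prevents a crash by silently dropping bytes that do not decode.
  • Comparing attribute values case-sensitively. HTMLParser already lowercases tag and attribute names, but not attribute values, so `property="OG:Title"` only matches the `og:` prefix after you lowercase it.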
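The last two points are easy to check with a throwaway snippet. The sketch below (AttrRecorder and decode_body are illustrative names, not part of the original script) prints the attribute list HTMLParser hands to handle_starttag for an uppercase tag, then decodes bytes using the charset from a Content-Type value via the standard library's email.message.Message, the same base class behind `response.headers` in urllib. It uses errors='replace' rather than 'ignore' so undecodable bytes show up as U+FFFD instead of vanishing; that is a preference, not a requirement.

```python
from email.message import Message
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Records the attribute list HTMLParser passes for every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':  # tag names arrive already lowercased
            self.seen.append(attrs)


recorder = AttrRecorder()
recorder.feed('<META PROPERTY="OG:Title" CONTENT="Hi">')
recorder.close()
print(recorder.seen)  # → [[('property', 'OG:Title'), ('content', 'Hi')]]

# Attribute names were lowercased, the value was not: normalize it before matching.
prop = dict(recorder.seen[0])['property'].lower()
print(prop.startswith('og:'))  # → True


def decode_body(raw: bytes, content_type: str) -> str:
    """Decode using the charset in a Content-Type value, falling back to UTF-8."""
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset() or 'utf-8'  # returned lowercased
    try:
        return raw.decode(charset, errors='replace')
    except LookupError:  # charset name Python does not recognize
        return raw.decode('utf-8', errors='replace')


print(decode_body('Café'.encode('latin-1'), 'text/html; charset=ISO-8859-1'))  # → Café
```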

Variations

  1. Use BeautifulSoup with `soup.find_all('meta')` and attribute selectors for more flexible extraction.
  2. Add support for JSON-LD structured data by extracting `<script type="application/ld+json">` blocks.
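Variation 2 can stay within the standard library. This is a minimal sketch, assuming JSON-LD appears only inside `<script type="application/ld+json">` elements; JsonLdExtractor is an illustrative name. HTMLParser treats script contents as raw text, so the JSON arrives through handle_data and is parsed with json.loads when the closing tag is reached. Malformed blocks are skipped rather than aborting the whole page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            script_type = (dict(attrs).get('type') or '').strip().lower()
            if script_type == 'application/ld+json':
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several handle_data calls
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads(''.join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip a malformed block instead of failing the whole page


page = '''<html><head>
<script type="application/ld+json">{"@type": "Article", "headline": "Hello"}</script>
<script>var x = 1;</script>
</head></html>'''

extractor = JsonLdExtractor()
extractor.feed(page)
extractor.close()
print(extractor.blocks)  # → [{'@type': 'Article', 'headline': 'Hello'}]
```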

Real-world use cases

  • Building a link preview generator that fetches Open Graph tags to show rich snippets in chats or feeds.
  • Scraping competitor websites programmatically to analyze their social media meta tag strategy.
  • Validating that a new web page includes required Facebook and Twitter card meta tags before deployment.
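The first and third use cases both work on the (property, content) pairs that extract_social_meta returns. The sketch below is hypothetical: build_preview prefers Open Graph values and falls back to Twitter Card equivalents, and missing_tags reports which entries of a required list are absent. og:title, og:type, og:image, and og:url are the four properties the Open Graph protocol marks as required; adding twitter:card to REQUIRED_TAGS is an assumption about your own policy.

```python
def build_preview(tags: list[tuple[str, str]]) -> dict:
    """Pick preview fields, preferring Open Graph and falling back to Twitter Card tags."""
    # Keep the first occurrence of each property (pages may repeat og:image, for example)
    first = {}
    for prop, content in tags:
        first.setdefault(prop, content)

    def pick(*names):
        for name in names:
            if name in first:
                return first[name]
        return None

    return {
        'title': pick('og:title', 'twitter:title'),
        'description': pick('og:description', 'twitter:description'),
        'image': pick('og:image', 'twitter:image'),
        'url': pick('og:url'),
    }


# Hypothetical required set for a pre-deployment check; adjust to your own policy.
REQUIRED_TAGS = ('og:title', 'og:type', 'og:image', 'og:url', 'twitter:card')


def missing_tags(tags: list[tuple[str, str]], required=REQUIRED_TAGS) -> list[str]:
    """Return the required properties that do not appear in the extracted tags."""
    present = {prop for prop, _ in tags}
    return [name for name in required if name not in present]


sample = [
    ('og:title', 'Example page'),
    ('og:image', 'https://example.com/a.png'),
    ('og:image', 'https://example.com/b.png'),
    ('twitter:card', 'summary_large_image'),
    ('twitter:description', 'A short summary'),
]
print(build_preview(sample))
print(missing_tags(sample))  # → ['og:type', 'og:url']
```

To check a live page, you could pass the result of `extract_social_meta(url)` to either function.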
