Extract Every Open Graph and Social Media Meta Tag from Web Pages in Python

A Python script that fetches a webpage and extracts all Open Graph, Twitter Card, Facebook, and Article meta tags using the standard library HTML parser.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting

meta tags open graph twitter cards html parsing web scraping std library

Python code

from html.parser import HTMLParser
from urllib.request import urlopen

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta_tags = []
    
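    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag == 'meta':
            attrs_dict = dict(attrs)
            # Open Graph uses "property"; Twitter Cards use "name". Values may be None.
            prop = (attrs_dict.get('property') or attrs_dict.get('name') or '').strip()
            # Match the prefix only, so "blog:title" is not mistaken for an "og:" tag.
            if prop.lower().startswith(('og:', 'twitter:', 'fb:', 'article:')):
                content = attrs_dict.get('content') or ''
                if content:
                    self.meta_tags.append((prop, content))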
    
    def get_meta(self):
        return self.meta_tags

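def extract_social_meta(url: str) -> list:
    try:
        with urlopen(url, timeout=5) as response:
            # Use the charset from the Content-Type header when present, else UTF-8.
            charset = response.headers.get_content_charset() or 'utf-8'
            html = response.read().decode(charset, errors='ignore')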
        extractor = MetaExtractor()
        extractor.feed(html)
        return extractor.get_meta()
    except Exception as e:
        print(f"Error fetching {url}: {e}")
        return []

if __name__ == "__main__":
    test_url = "https://example.com"
    print(f"Extracting social meta tags from: {test_url}")
    results = extract_social_meta(test_url)
    for prop, content in results:
        print(f"{prop}: {content}")
    if not results:
        print("No OpenGraph or social media meta tags found.")

Output

stdout

Extracting social meta tags from: https://example.com
No OpenGraph or social media meta tags found.

How it works

The code subclasses HTMLParser and overrides handle_starttag, converting each <meta> tag's attribute list into a dictionary. It reads the property attribute, falling back to name, checks it for the og:, twitter:, fb:, and article: prefixes, and keeps only tags with non-empty content. urlopen is called with a 5-second timeout so a slow server cannot stall the script, and undecodable bytes are dropped (errors='ignore') so decoding never raises. The feed method can accept HTML in chunks, but this script reads the whole response into memory first and passes it in a single call, so it does not stream large pages.
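
To illustrate what feeding the parser in pieces looks like, here is a minimal sketch. The TagCollector class, the parse_in_chunks helper, and the chunk size of 16 are hypothetical names and values chosen for illustration; they are not part of the script above.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical minimal parser: records (property-or-name, content) per <meta> tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs_dict = dict(attrs)
            key = attrs_dict.get("property") or attrs_dict.get("name") or ""
            content = attrs_dict.get("content") or ""
            if key and content:
                self.found.append((key, content))


def parse_in_chunks(html: str, chunk_size: int = 16) -> list:
    """Feed the document piece by piece; incomplete tags are buffered between calls."""
    parser = TagCollector()
    for start in range(0, len(html), chunk_size):
        parser.feed(html[start:start + chunk_size])
    parser.close()  # process anything still buffered
    return parser.found


sample = (
    '<html><head>'
    '<meta property="og:title" content="Hello">'
    '<meta name="twitter:card" content="summary">'
    '</head></html>'
)
chunked_result = parse_in_chunks(sample)
print(chunked_result)
```

In a real fetch, the same loop could read fixed-size blocks from the response instead of slicing a string, though multi-byte characters split across blocks would then need an incremental decoder.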

Common mistakes

Forgetting that Open Graph uses 'property' while Twitter Cards use 'name'. The code handles both by checking each attribute.
Not setting a timeout on urlopen, which can cause the script to hang on slow or unresponsive servers.
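Assuming HTML is always UTF-8. `errors='ignore'` keeps decoding from raising, but it silently drops undecodable bytes, so pages in other encodings can come out garbled; reading the charset from the Content-Type header is more reliable.
Assuming mixed-case markup breaks the parser. HTMLParser lowercases tag and attribute names, so `<META PROPERTY=...>` still matches, but attribute values such as `OG:Title` keep their case and should be normalized before comparison.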
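
Two of these pitfalls are easy to check directly. The sketch below is illustrative only: AttrRecorder, PREFIXES, and is_social are hypothetical names, not part of the script above. It shows how HTMLParser reports attribute names and valueless attributes, and why a prefix test is safer than a substring test.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Hypothetical helper: records the raw attribute list of each <meta> tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.seen.append(attrs)


recorder = AttrRecorder()
recorder.feed('<META PROPERTY="OG:Title" CONTENT="Hi"><meta name content="x">')
recorder.close()

# Attribute names arrive lowercased; values keep their original case.
first_attrs = dict(recorder.seen[0])
# A valueless attribute such as a bare `name` is reported with the value None.
second_attrs = dict(recorder.seen[1])

PREFIXES = ("og:", "twitter:", "fb:", "article:")


def is_social(prop: str) -> bool:
    """Prefix match on a lowercased copy: "OG:Title" passes, "blog:title" does not."""
    return prop.lower().startswith(PREFIXES)


print(first_attrs, second_attrs)
print(is_social("OG:Title"), is_social("blog:title"))
```

Note that a substring test such as `'og:' in prop` would accept "blog:title", which is why the sketch uses startswith.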

Variations

Use BeautifulSoup with `soup.find_all('meta')` and attribute selectors for more flexible extraction.
Add support for JSON-LD structured data by extracting `<script type="application/ld+json">` blocks.
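
The JSON-LD variation can be sketched with the standard library alone. This is a rough outline under stated assumptions, not a complete implementation: JsonLdExtractor and its attributes are hypothetical names, and malformed blocks are simply skipped.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Hypothetical sketch: collects parsed JSON-LD blocks from <script> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of aborting the whole page


sample = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article", "headline": "Hi"}'
    '</script></head></html>'
)
jsonld = JsonLdExtractor()
jsonld.feed(sample)
jsonld.close()
print(jsonld.blocks)
```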

Real-world use cases

Building a link preview generator that fetches Open Graph tags to show rich snippets in chats or feeds.
Scraping competitor websites programmatically to analyze their social media meta tag strategy.
Validating that a new web page includes required Facebook and Twitter card meta tags before deployment.
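
As one example, the pre-deployment check could look roughly like the sketch below. It assumes input in the same shape that extract_social_meta returns, a list of (name, content) tuples. The missing_tags function and the REQUIRED list are hypothetical; the set of tags a team actually requires will differ.

```python
def missing_tags(found, required):
    """Return the required tag names absent from the extracted (name, content) pairs."""
    present = {name.lower() for name, _ in found}
    return [name for name in required if name.lower() not in present]


# Hypothetical required set; adjust to whatever a given platform or team expects.
REQUIRED = ["og:title", "og:image", "twitter:card"]

# Sample data standing in for the result of a real extraction.
extracted = [("og:title", "Launch post"), ("twitter:card", "summary")]
missing = missing_tags(extracted, REQUIRED)
print(missing)
```

A deployment script could fail the build whenever the returned list is non-empty.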
