Discover RSS Feeds From Any Website in Python

Scrape a website's HTML to automatically find all linked RSS or Atom feed URLs using requests, BeautifulSoup, and regex.

Medium | Python 3.9+ | Jun 28, 2026 | Automation & scripting

rss web-scraping beautifulsoup automation requests feed-discovery

Requires third-party packages — install first

pip install requests beautifulsoup4

Python code

import requests
import re
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

def discover_rss_feeds(url):
    """Discover all RSS/Atom feeds linked from a given website."""
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; RSSDiscovery/1.0)'}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []

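    # Resolve links against the final URL, in case the request was redirected
    # (e.g. http -> https or example.com -> www.example.com)
    base_url = response.url
    soup = BeautifulSoup(response.text, 'html.parser')
    feeds = set()

    # Find <link> tags whose type is application/rss+xml or application/atom+xml
    for link in soup.find_all('link', type=re.compile(r'application/(rss|atom)\+xml', re.I)):
        href = link.get('href')
        if href:
            feeds.add(urljoin(base_url, href))

    # Find <a> tags whose href ends in .rss or .xml, or contains /feed
    for a in soup.find_all('a', href=True):
        href = a['href']
        if re.search(r'\.(rss|xml)$', href, re.I) or '/feed' in href.lower():
            full_url = urljoin(base_url, href)
            # Keep only same-domain links to skip third-party widgets and embeds
            if urlparse(full_url).netloc == urlparse(base_url).netloc:
                feeds.add(full_url)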

    return sorted(feeds)

if __name__ == "__main__":
    # Example usage
    website = "https://news.ycombinator.com"
    feeds = discover_rss_feeds(website)
    if feeds:
        print(f"Found {len(feeds)} feed(s) on {website}:")
        for feed in feeds:
            print(f"  {feed}")
    else:
        print(f"No feeds discovered on {website}")

Output

stdout

Found 2 feed(s) on https://news.ycombinator.com:
  https://news.ycombinator.com/atom.xml
  https://news.ycombinator.com/rss

How it works

This script fetches a webpage with a descriptive User-Agent header and parses it with BeautifulSoup. It collects feed URLs from two sources: <link> tags whose type attribute matches application/rss+xml or application/atom+xml, and <a> tags whose href ends with .rss or .xml or contains /feed. All relative URLs are resolved to absolute URLs with urljoin. For <a> tags, only links on the same domain are kept to avoid external noise; <link> feed declarations are kept whatever their host. The result is a sorted, deduplicated list of discovered feeds.
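The URL handling is the easiest part to get subtly wrong, so here is a minimal, network-free sketch of just the urljoin + same-domain step. The function name `resolve_same_domain` and the sample hrefs are hypothetical, chosen for illustration; the logic mirrors what the script applies to <a> tags.

```python
from urllib.parse import urljoin, urlparse


def resolve_same_domain(base_url, hrefs):
    """Resolve each href against base_url and keep only same-host results.

    Mirrors the urljoin + netloc comparison applied to <a> tags above.
    """
    base_host = urlparse(base_url).netloc
    kept = set()  # a set deduplicates hrefs that resolve to the same URL
    for href in hrefs:
        absolute = urljoin(base_url, href)  # relative -> absolute
        if urlparse(absolute).netloc == base_host:  # exact host match only
            kept.add(absolute)
    return sorted(kept)


# Hypothetical hrefs, as they might appear in a page's <a> tags
hrefs = ["rss", "/feed", "https://news.ycombinator.com/rss",
         "https://example.org/feed.xml"]
print(resolve_same_domain("https://news.ycombinator.com/news", hrefs))
# → ['https://news.ycombinator.com/feed', 'https://news.ycombinator.com/rss']
```

Note that the netloc comparison is exact: www.example.com and example.com count as different hosts, and an explicit port (example.com:8080) also changes the netloc. Relative hrefs resolve against the page's directory, so "rss" on https://example.com/blog/ becomes https://example.com/blog/rss.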

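The script catches any requests.RequestException (connection errors, timeouts, and HTTP error statuses raised by raise_for_status()), prints the error, and returns an empty list. It also returns an empty list when no feeds are found, so a single failing site does not crash a batch or scheduled scan.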
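Because failures come back as an empty list rather than an exception, a batch scan can be a plain loop. The sketch below assumes a discovery function with the same contract as discover_rss_feeds; the name `scan_sites` and the `delay_seconds` parameter are hypothetical, and the demo uses stand-in data so it runs offline.

```python
import time
from typing import Callable, Dict, Iterable, List


def scan_sites(urls: Iterable[str],
               discover: Callable[[str], List[str]],
               delay_seconds: float = 1.0) -> Dict[str, List[str]]:
    """Run a feed-discovery function over many sites, one at a time.

    `discover` is expected to behave like discover_rss_feeds: return a list
    of feed URLs, and return [] rather than raise when a fetch fails.
    """
    results: Dict[str, List[str]] = {}
    for index, url in enumerate(urls):
        if index > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between sites to stay polite
        results[url] = discover(url)
    return results


# Offline demo with a stand-in discovery function (hypothetical data)
fake_results = {"https://a.example": ["https://a.example/rss"],
                "https://b.example": []}
print(scan_sites(fake_results, lambda u: fake_results[u], delay_seconds=0))
```

With the real function you would call scan_sites(list_of_homepages, discover_rss_feeds). Sites mapped to an empty list either failed to load or declared no feeds; the error message printed by discover_rss_feeds tells you which.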

Common mistakes

Forgetting to handle relative URLs with urljoin — leads to broken or incomplete feed links.
Not filtering by same domain — picks up external feed links from widgets or embeds.
Using an overly strict regex that misses feeds served with `.xml` or paths containing `/feed/`.
Skipping a custom User-Agent — some sites block scripts with default Python user agents.
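The href test in the script sits between too strict and too loose: `'/feed' in href` also matches /feedback, while `\.(rss|xml)$` fails as soon as a query string is appended. One possible tightening, sketched below under the assumption that feed, feeds, rss, and atom are the path segments worth matching, is to test only the parsed path and match whole segments. The name `looks_like_feed_href` is hypothetical.

```python
import re
from urllib.parse import urlparse

# Path ends in .rss or .xml (same rule as the original regex)
FEED_EXTENSION = re.compile(r"\.(rss|xml)$", re.I)
# A whole path segment named feed(s), rss, or atom (assumed list; extend as needed)
FEED_SEGMENT = re.compile(r"(^|/)(feeds?|rss|atom)(/|$)", re.I)


def looks_like_feed_href(href: str) -> bool:
    """Heuristic href check that ignores ?query and #fragment parts."""
    path = urlparse(href).path
    return bool(FEED_EXTENSION.search(path) or FEED_SEGMENT.search(path))


for href in ["/blog/feed/", "/rss.xml?format=full", "https://example.com/rss",
             "/feedback", "/about"]:
    print(href, looks_like_feed_href(href))
```

Here /feedback no longer matches, while /rss.xml?format=full and a bare /rss path now do. It is still a heuristic: /sitemap.xml passes the extension rule, so validating candidates by fetching them remains worthwhile.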

Variations

Use `feedparser` to validate discovered URLs by attempting to parse them as actual feeds.
Extend discovery to check common well-known paths like `/rss`, `/feed`, or `/atom.xml` even if not linked on the page.
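Both variations can be sketched together with the standard library alone. This is a stand-in for the feedparser approach, not feedparser itself: it treats a response as a feed if its XML root element is <rss>, an Atom <feed>, or an RSS 1.0 <rdf:RDF>. The path list, function names, and User-Agent string are assumptions for illustration; feedparser is more forgiving of malformed feeds than this strict XML check.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumed list of conventional feed locations; real sites vary.
COMMON_FEED_PATHS = ("/rss", "/feed", "/atom.xml", "/rss.xml", "/feed.xml")
FEED_ROOT_TAGS = {
    "rss",                                               # RSS 0.9x / 2.0
    "{http://www.w3.org/2005/Atom}feed",                 # Atom 1.0
    "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF",  # RSS 1.0 (RDF)
}


def candidate_feed_urls(site_url, paths=COMMON_FEED_PATHS):
    """Absolute URLs for well-known feed paths at the site's root."""
    return [urljoin(site_url, path) for path in paths]


def is_feed_document(body: bytes) -> bool:
    """Cheap stdlib check: is the XML root element an RSS or Atom root?

    For untrusted input the Python docs recommend defusedxml, since
    xml.etree does not guard against every malicious-XML attack.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not well-formed XML, e.g. an HTML error page
    return root.tag in FEED_ROOT_TAGS


def probe_feeds(site_url, timeout=10):
    """Fetch each candidate URL and keep the ones that parse as feeds."""
    found = []
    for url in candidate_feed_urls(site_url):
        request = urllib.request.Request(
            url, headers={"User-Agent": "Mozilla/5.0 (compatible; RSSDiscovery/1.0)"})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                body = response.read()
        except (OSError, ValueError, http.client.HTTPException):
            continue  # unreachable, HTTP error status, or malformed response
        if is_feed_document(body):
            found.append(url)
    return found
```

probe_feeds("https://example.com") returns whichever candidates respond with an RSS or Atom document. The same is_feed_document check can be applied to URLs returned by discover_rss_feeds to weed out false positives such as sitemaps.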

Real-world use cases

Building a content aggregator that automatically subscribes to feeds from bookmarked websites.
Monitoring competitor blogs or news sites by discovering and fetching their latest RSS feeds.
Automating podcast feed discovery from a list of show homepage URLs to populate a directory.
