Discover RSS Feeds From Any Website in Python

Scrape a website's HTML to automatically find all linked RSS or Atom feed URLs using requests, BeautifulSoup, and regex.

Medium | Python 3.9+ | Jun 28, 2026 | Automation & scripting

Requires third-party packages — install first
pip install requests beautifulsoup4

Python code

import requests
import re
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

def discover_rss_feeds(url):
    """Discover all RSS/Atom feeds linked from a given website."""
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (compatible; RSSDiscovery/1.0)'}
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []

    # Resolve links against the final URL, since the request may have been redirected
    base_url = response.url
    soup = BeautifulSoup(response.text, 'html.parser')
    feeds = set()

    # Find <link> tags with RSS/Atom types
    for link in soup.find_all('link', type=re.compile(r'application/(rss|atom)\+xml', re.I)):
        href = link.get('href')
        if href:
            feeds.add(urljoin(base_url, href))

    # Find <a> tags whose href ends in .rss or .xml, or contains /feed
    for a in soup.find_all('a', href=True):
        href = a['href']
        if re.search(r'\.(rss|xml)$', href, re.I) or '/feed' in href.lower():
            full_url = urljoin(base_url, href)
            # Keep only links on the same host as the fetched page
            if urlparse(full_url).netloc == urlparse(base_url).netloc:
                feeds.add(full_url)

    return sorted(feeds)

if __name__ == "__main__":
    # Example usage
    website = "https://news.ycombinator.com"
    feeds = discover_rss_feeds(website)
    if feeds:
        print(f"Found {len(feeds)} feed(s) on {website}:")
        for feed in feeds:
            print(f"  {feed}")
    else:
        print(f"No feeds discovered on {website}")

Output

Sample output (actual results depend on the site's current markup):
Found 2 feed(s) on https://news.ycombinator.com:
  https://news.ycombinator.com/rss
  https://news.ycombinator.com/atom.xml

How it works

This script fetches a webpage with a custom User-Agent header and parses it with BeautifulSoup. It collects feed URLs from two sources: <link> tags whose type attribute matches application/rss+xml or application/atom+xml, and <a> tags whose href ends with .rss or .xml or contains /feed. Relative URLs are resolved to absolute ones with urljoin. Links found in <a> tags are kept only if they stay on the same host as the page, which filters out external noise; feeds declared in <link> tags are kept regardless of host. The result is a sorted, deduplicated list of discovered feeds.

On a request failure (timeout, connection error, or HTTP error status), the script prints the error and returns an empty list, the same value it returns when no feeds are found. Callers can therefore loop over many sites without extra error handling, which suits batch or scheduled scanning.
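
The URL resolution and same-host filtering described above can be tried offline. The following is a minimal sketch using only the standard library; the helper name `resolve_same_host` and the `example.com` URLs are illustrative assumptions, not part of the original script.

```python
from urllib.parse import urljoin, urlparse

def resolve_same_host(base_url, hrefs):
    """Resolve each href against base_url and keep those on the same host.

    Hypothetical helper for illustration; it mirrors the <a>-tag filter
    in discover_rss_feeds but takes the hrefs as a plain list.
    """
    base_host = urlparse(base_url).netloc
    kept = []
    for href in hrefs:
        full_url = urljoin(base_url, href)  # relative -> absolute
        if urlparse(full_url).netloc == base_host:
            kept.append(full_url)
    return kept

base = "https://example.com/blog/"
hrefs = [
    "feed.xml",                    # relative to the current directory
    "/rss",                        # relative to the site root
    "https://other.org/feed",      # absolute URL on another host
    "//cdn.example.net/atom.xml",  # protocol-relative URL on another host
]
for url in resolve_same_host(base, hrefs):
    print(url)
# https://example.com/blog/feed.xml
# https://example.com/rss
```

The comparison is on the full `netloc`, so `www.example.com` and `example.com` count as different hosts. Whether that is desirable depends on the site being scanned.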

Common mistakes

  • Forgetting to handle relative URLs with urljoin — leads to broken or incomplete feed links.
  • Not filtering by same domain — picks up external feed links from widgets or embeds.
  • Using an overly strict regex that misses feeds served with `.xml` or paths containing `/feed/`.
  • Skipping a custom User-Agent — some sites block scripts with default Python user agents.
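
One way to loosen the pattern without matching everything is to test the URL's path rather than the raw href, so that a query string or fragment does not hide the extension. The sketch below is an assumption about what a more tolerant check could look like; the name `looks_like_feed`, the extra `.atom` extension, and the `/rss` and `/atom` path segments are illustrative additions, not part of the original script.

```python
import re
from urllib.parse import urlparse

# Matches a feed-like extension at the end of the path, or a path segment
# that is exactly "feed", "rss", or "atom" (so "/feedback" is not matched).
FEED_PATH = re.compile(r'\.(rss|xml|atom)$|/(feed|rss|atom)(/|$)', re.I)

def looks_like_feed(href):
    """Return True if the path part of href looks like a feed URL."""
    path = urlparse(href).path  # drops the query string and fragment
    return bool(FEED_PATH.search(path))

for href in ["/index.xml?format=rss", "/blog/feed/", "/feedback", "/about"]:
    print(href, looks_like_feed(href))
# /index.xml?format=rss True
# /blog/feed/ True
# /feedback False
# /about False
```

A path such as `/sitemap.xml` still passes this check, so a pattern like this only narrows the candidates. Confirming that a URL is a feed requires fetching and inspecting it.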

Variations

  1. Use `feedparser` to validate discovered URLs by attempting to parse them as actual feeds.
  2. Extend discovery to check common well-known paths like `/rss`, `/feed`, or `/atom.xml` even if not linked on the page.
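
Both variations can be sketched with the standard library alone. Note the substitution: instead of `feedparser`, this sketch uses `xml.etree.ElementTree` to check whether a document's root element is one of the usual feed roots, which is a much cruder test than a real feed parser performs. The function names and the path list are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Well-known paths named in the variation above; extend as needed.
COMMON_FEED_PATHS = ["/rss", "/feed", "/atom.xml"]

def candidate_feed_urls(site_url, paths=COMMON_FEED_PATHS):
    """Build absolute URLs for well-known feed paths on the site's host."""
    return [urljoin(site_url, path) for path in paths]

def looks_like_feed_document(text):
    """Return True if text parses as XML with an RSS, Atom, or RDF root."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Strip an XML namespace such as "{http://www.w3.org/2005/Atom}feed".
    tag = root.tag.rsplit('}', 1)[-1].lower()
    return tag in {"rss", "feed", "rdf"}

print(candidate_feed_urls("https://example.com/blog/post"))
print(looks_like_feed_document('<rss version="2.0"><channel/></rss>'))
print(looks_like_feed_document('<html><body>Not a feed</body></html>'))
```

In practice each candidate URL would be fetched with `requests.get` and the response body passed to the check. For untrusted input, a hardened XML parser may be preferable to `ElementTree`.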

Real-world use cases

  • Building a content aggregator that automatically subscribes to feeds from bookmarked websites.
  • Monitoring competitor blogs or news sites by discovering and fetching their latest RSS feeds.
  • Automating podcast feed discovery from a list of show homepage URLs to populate a directory.
