Find All Redirects on a Website in Python

Crawl a website from a starting URL, follow links within the same domain, and detect every HTTP redirect (301, 302, 303, 307, 308) using requests with redirects disabled.

Medium | Python 3.9+ | Jun 28, 2026 | Automation & scripting

Requires third-party packages; install them first:
pip install requests

Python code

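import requests
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

REDIRECT_CODES = (301, 302, 303, 307, 308)


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_redirects(start_url, max_pages=50):
    domain = urlparse(start_url).netloc
    visited = set()
    redirects = {}
    queue = deque([start_url])

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        # Mark before requesting so a URL that errors out is not retried.
        visited.add(url)
        try:
            # Don't follow redirects automatically, so 3xx responses stay visible.
            response = requests.get(url, allow_redirects=False, timeout=10)
        except requests.RequestException:
            continue

        is_html = "text/html" in response.headers.get("Content-Type", "")
        if response.status_code in REDIRECT_CODES:
            location = response.headers.get("Location")
            if not location:
                continue  # a 3xx without a Location header has no target to record
            # Location may be relative, so resolve it against the requested URL.
            target = urljoin(url, location)
            redirects[url] = target
            if urlparse(target).netloc == domain:
                queue.append(target)
        elif response.status_code == 200 and is_html:
            # response.links only parses the HTTP Link header, so read <a href> from the body.
            parser = LinkExtractor()
            parser.feed(response.text)
            for href in parser.hrefs:
                # Drop #fragments so /page and /page#top count as one URL.
                full_url, _ = urldefrag(urljoin(url, href))
                if urlparse(full_url).netloc == domain:
                    queue.append(full_url)
    return redirects


if __name__ == "__main__":
    start = "https://httpbin.org/redirect-to?url=https%3A%2F%2Fexample.com"
    results = find_redirects(start, max_pages=10)
    for source, target in results.items():
        print(f"{source} -> {target}")

Output

https://httpbin.org/redirect-to?url=https%3A%2F%2Fexample.com -> https://example.com

How it works

The find_redirects function crawls breadth-first, using a deque as the queue and a visited set so each URL is requested at most once; max_pages caps how many URLs are attempted. allow_redirects=False stops requests from following redirects automatically, so every 3xx response comes back as-is and its Location header can be read. Relative Location values are resolved with urljoin, and only targets on the start URL's host (compared via urlparse(...).netloc) are queued for crawling. Only HTTP 200 responses with an HTML Content-Type are scanned for further links: LinkExtractor, a small html.parser subclass, collects every <a href>, which is then resolved with urljoin and stripped of its #fragment. response.links is not used for this because it only parses the HTTP Link response header, not the page markup. Every other status, and any request that raises requests.RequestException, is skipped silently. The function returns a dict mapping each redirecting URL to its resolved target.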
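To see the BFS and redirect-detection logic without touching the network, the sketch below swaps the HTTP request for a lookup in a hypothetical in-memory site. `FAKE_SITE`, the `example.test` URLs, and the `crawl` helper are illustrative assumptions, not part of the original snippet; each entry maps a URL to its status code, its `Location` value (for redirects), and the links that page contains.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

REDIRECT_CODES = {301, 302, 303, 307, 308}

# Hypothetical site: url -> (status code, Location header or None, links on the page)
FAKE_SITE = {
    "https://example.test/": (200, None, ["/old", "/about"]),
    "https://example.test/old": (301, "/new", []),
    "https://example.test/new": (200, None, ["https://other.test/"]),
    "https://example.test/about": (302, "https://other.test/about", []),
}


def crawl(start_url, site, max_pages=50):
    """Same BFS as find_redirects, but reads responses from `site` instead of HTTP."""
    domain = urlparse(start_url).netloc
    visited, redirects = set(), {}
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in site:
            continue  # already crawled, or unknown to the fake site (like a failed request)
        visited.add(url)
        status, location, links = site[url]
        if status in REDIRECT_CODES and location:
            target = urljoin(url, location)  # resolve relative Location values
            redirects[url] = target
            candidates = [target]
        elif status == 200:
            candidates = [urljoin(url, link) for link in links]
        else:
            candidates = []
        # Only follow URLs on the same host as the start URL.
        queue.extend(u for u in candidates if urlparse(u).netloc == domain)
    return redirects


if __name__ == "__main__":
    for source, target in crawl("https://example.test/", FAKE_SITE).items():
        print(f"{source} -> {target}")
```

Both redirects are recorded, but only the same-host target (`/new`) is crawled further; the hop to `other.test` is reported and then left alone, mirroring the same-domain rule in `find_redirects`.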

Common mistakes

  • Forgetting to set `allow_redirects=False`. requests then follows redirects automatically, so the 3xx responses never reach the status-code check (they survive only in `response.history`).
  • Not using `urljoin` to resolve relative URLs found in `Location` headers or page links.
  • Relying on `response.links` to find page links. It parses only the HTTP `Link` response header, not `<a>` or `<link>` tags in the HTML, so most pages yield nothing; the response body has to be parsed instead.
  • Failing to limit crawling with `max_pages` or a visited set, leading to infinite loops or large uncontrolled crawls.
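The `urljoin` point is easy to check with the standard library alone. The sketch below resolves a few typical `Location` values against a made-up request URL (`https://example.com/blog/post` is an assumption); absolute values pass through unchanged, and a scheme-relative value picks up the request's scheme.

```python
from urllib.parse import urljoin, urlparse

REQUEST_URL = "https://example.com/blog/post"  # hypothetical URL that answered with a 3xx

# Typical Location header values a server might send
LOCATIONS = ["/new-post", "archive/post", "../about", "https://cdn.example.com/x", "//example.org/y"]

# Resolve each value against the request URL, as find_redirects does before comparing hosts
resolved = {loc: urljoin(REQUEST_URL, loc) for loc in LOCATIONS}

if __name__ == "__main__":
    for loc, absolute in resolved.items():
        print(f"{loc} -> {absolute}")
    # Without urljoin a root-relative value has no host, so a netloc check silently drops it
    print(repr(urlparse("/new-post").netloc))
```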

Variations

  1. Use `BeautifulSoup` to parse the response HTML and collect every `<a href>` (for example with `soup.find_all("a", href=True)`).
  2. Add support for crawling links from `<iframe>`, `<frame>`, or `sitemap.xml` files.

Real-world use cases

  • Auditing a website for broken or outdated redirect chains before a migration or SEO overhaul.
  • Monitoring a site's redirect structure to ensure no accidental redirect loops are introduced during deployment.
  • Enumerating all URL redirects in a web application for security penetration testing (e.g., open redirect detection).
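For the chain and loop use cases, the dict returned by `find_redirects` can be post-processed without any further requests. The hypothetical `redirect_chain` helper below follows source-to-target hops through that dict until it reaches a URL that is not itself a recorded redirect, and reports a loop if a URL repeats. It only sees redirects the crawl actually recorded, so a hop to another domain (which is not crawled) ends the chain; `SAMPLE_REDIRECTS` is made-up data.

```python
def redirect_chain(redirects, source):
    """Follow {source: target} hops from `source`; return (chain, is_loop)."""
    chain = [source]
    seen = {source}
    current = source
    # Keep hopping while the current URL is itself a recorded redirect.
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # a URL repeated, so the redirects form a loop
        seen.add(current)
    return chain, False


# Hypothetical output of find_redirects, for illustration only
SAMPLE_REDIRECTS = {
    "https://example.com/a": "https://example.com/b",
    "https://example.com/b": "https://example.com/c",
    "https://example.com/x": "https://example.com/y",
    "https://example.com/y": "https://example.com/x",
}

if __name__ == "__main__":
    for source in SAMPLE_REDIRECTS:
        chain, is_loop = redirect_chain(SAMPLE_REDIRECTS, source)
        if is_loop:
            print("LOOP: " + " -> ".join(chain))
        elif len(chain) > 2:
            print(f"CHAIN ({len(chain) - 1} hops): " + " -> ".join(chain))
```

Because every hop is checked against `seen`, the walk always terminates on a finite dict, so no separate hop limit is needed.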

Run locally

This sample needs third-party packages. Install them with the command at the top, then run the script in your own Python environment.