Find All Redirects on a Website in Python

Crawl a website breadth-first from a starting URL, stay within the same domain, and record each HTTP redirect (301, 302, 303, 307, 308) encountered, using requests with automatic redirect following disabled.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting

redirects crawling requests web scraping seo automation

Requires third-party packages — install first

pip install requests

Python code

import requests
from urllib.parse import urljoin, urlparse
from collections import deque

def find_redirects(start_url, max_pages=50):
    visited = set()
    redirects = {}  # source URL -> resolved redirect target
    queue = deque([start_url])

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        # Mark the URL before the request so a failing URL is not retried.
        visited.add(url)
        try:
            # allow_redirects=False returns the 3xx response itself
            # instead of silently following it.
            response = requests.get(url, allow_redirects=False, timeout=10)
        except requests.RequestException:
            continue
        if response.status_code in (301, 302, 303, 307, 308):
            location = response.headers.get('Location')
            if not location:
                continue  # redirect status without a target: nothing to record
            # Location may be relative, so resolve it against the current URL.
            redirect_url = urljoin(url, location)
            redirects[url] = redirect_url
            if urlparse(redirect_url).netloc == urlparse(start_url).netloc:
                queue.append(redirect_url)
        elif response.status_code == 200:
            # response.links exposes URLs from the HTTP Link header only;
            # <a> tags in the HTML body are not parsed here.
            for link in response.links.values():
                full_url = urljoin(url, link['url'])
                if urlparse(full_url).netloc == urlparse(start_url).netloc:
                    queue.append(full_url)
    return redirects

if __name__ == "__main__":
    start = "https://httpbin.org/redirect-to?url=https%3A%2F%2Fexample.com"
    results = find_redirects(start, max_pages=10)
    for source, target in results.items():
        print(f"{source} -> {target}")

Output

stdout

https://httpbin.org/redirect-to?url=https%3A%2F%2Fexample.com -> https://example.com

How it works

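The find_redirects function performs a breadth-first search (BFS) from start_url, using a deque as the queue and a visited set to avoid repeats, and stops once max_pages URLs have been visited. Passing allow_redirects=False stops requests from following redirects automatically, so the 3xx response itself is returned and the code can read its Location header. Because Location may be relative, urljoin resolves it against the requesting URL; the target is then queued only if it passes the same-domain check urlparse(redirect_url).netloc == urlparse(start_url).netloc. Only HTTP 200 responses are scanned for further links, and the scan uses response.links, which exposes URLs from the HTTP Link response header rather than <a> tags in the page body, so a page that sends no Link header contributes no new URLs. All other statuses and any request errors are silently skipped. The function returns a dict mapping each redirecting URL to its resolved target.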
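
The Location resolution and same-domain check described above can be tried without any network access. The following is a minimal sketch using only the standard library; the helper names `resolve_location` and `same_domain` and the example URLs are illustrative assumptions, not part of the snippet.

```python
from urllib.parse import urljoin, urlparse


def resolve_location(request_url, location):
    """Resolve a Location header value against the URL that returned it."""
    return urljoin(request_url, location)


def same_domain(url, start_url):
    """True when both URLs have an identical netloc (host plus optional port)."""
    return urlparse(url).netloc == urlparse(start_url).netloc


if __name__ == "__main__":
    base = "https://example.com/blog/post"  # hypothetical page that redirected
    print(resolve_location(base, "/new-post"))        # root-relative target
    print(resolve_location(base, "archive/2020"))     # path-relative target
    print(resolve_location(base, "https://other.example/landing"))  # absolute
    print(same_domain("https://example.com/a", base))
    print(same_domain("https://www.example.com/a", base))
```

Note that comparing netloc is an exact match, so under this check `www.example.com` and `example.com` count as different domains.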

Common mistakes

Forgetting to set `allow_redirects=False`, which lets requests follow redirects automatically so the crawler only sees the final response and never records the redirect.
Not using `urljoin` to resolve relative URLs found in `Location` headers or page links.
Assuming `response.links` extracts links from the page: it only parses the HTTP `Link` response header, so `<a>` tags in the HTML body are never discovered unless you parse the HTML yourself.
Failing to limit crawling with `max_pages` or a visited set, leading to infinite loops or large uncontrolled crawls.
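
To see what `response.links` actually returns, the sketch below builds a `requests.Response` by hand and sets a `Link` header on it, so no network access is needed. The helper names `build_response` and `header_link_urls` and the example URL are assumptions made for illustration.

```python
import requests


def header_link_urls(response):
    """Return the URLs that response.links exposes (from the Link header only)."""
    return [link["url"] for link in response.links.values()]


def build_response(link_header=None):
    """Build a bare Response by hand so the demo needs no network access."""
    response = requests.Response()
    response.status_code = 200
    if link_header is not None:
        response.headers["Link"] = link_header
    return response


if __name__ == "__main__":
    # A response that advertises a "next" page through the Link header.
    paginated = build_response('<https://example.com/page/2>; rel="next"')
    print(header_link_urls(paginated))

    # A response with no Link header yields no URLs, whatever its body holds.
    plain = build_response()
    print(header_link_urls(plain))
```

A typical HTML page sends no `Link` header, which is why a crawler relying on `response.links` alone tends to stop after the first page.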

Variations

Use `BeautifulSoup` to parse the response HTML for `<a>` tags and extract all links instead of relying on `response.links`.
Add support for crawling links from `<iframe>`, `<frame>`, or `sitemap.xml` files.
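
The HTML-parsing variation can be sketched as follows. To stay dependency-free, this sketch swaps BeautifulSoup for the standard library's `html.parser`; the names `LinkCollector` and `extract_links` are hypothetical, and the example treats `<a href>`, `<iframe src>`, and `<frame src>` as link sources.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <iframe>/<frame>."""

    # Which attribute carries the URL for each tag of interest.
    URL_ATTRIBUTE = {"a": "href", "iframe": "src", "frame": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRIBUTE.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, de-duplicated http(s) URLs found in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    absolute = []
    for raw in parser.links:
        # Resolve relative URLs and drop #fragments, which name the same page.
        url, _fragment = urldefrag(urljoin(base_url, raw))
        if url.startswith(("http://", "https://")) and url not in absolute:
            absolute.append(url)
    return absolute


if __name__ == "__main__":
    sample = (
        '<a href="/about">About</a>'
        '<a href="contact#form">Contact</a>'
        '<iframe src="https://cdn.example.net/widget"></iframe>'
        '<a href="mailto:team@example.com">Mail</a>'
    )
    for link in extract_links(sample, "https://example.com/docs/index.html"):
        print(link)
```

Inside `find_redirects`, the loop over `response.links.values()` could then iterate over `extract_links(response.text, url)` instead, keeping the same-domain check unchanged.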

Real-world use cases

Auditing a website for broken or outdated redirect chains before a migration or SEO overhaul.
Monitoring a site's redirect structure to ensure no accidental redirect loops are introduced during deployment.
Enumerating all URL redirects in a web application for security penetration testing (e.g., open redirect detection).
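
For the chain and loop audits mentioned above, the dict returned by `find_redirects` can be walked hop by hop. This is a minimal sketch under the assumption that the mapping has the shape `{source: target}`; the function name `follow_chain` and the sample URLs are illustrative only.

```python
def follow_chain(redirects, source):
    """Walk a {source: target} mapping from `source`.

    Returns (chain, is_loop): the list of URLs visited in order, and whether
    the walk came back to a URL it had already seen.
    """
    chain = [source]
    seen = {source}
    current = source
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # revisited a URL: redirect loop
        seen.add(current)
    return chain, False


if __name__ == "__main__":
    # Hypothetical crawl result containing one two-hop chain and one loop.
    sample = {
        "https://example.com/old": "https://example.com/interim",
        "https://example.com/interim": "https://example.com/new",
        "https://example.com/a": "https://example.com/b",
        "https://example.com/b": "https://example.com/a",
    }
    for start in ("https://example.com/old", "https://example.com/a"):
        chain, is_loop = follow_chain(sample, start)
        label = "LOOP" if is_loop else f"{len(chain) - 1} hop(s)"
        print(f"{label}: " + " -> ".join(chain))
```

Chains longer than one hop are candidates for collapsing into a single redirect, and any loop is worth fixing before deployment.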
