Extract All Links from Any Website in Python

Scrape a webpage and extract all absolute HTTP/HTTPS links using requests and regex.

Medium Python 3.9+ Jun 27, 2026 Automation & scripting

web-scraping links requests regex url-parsing

Requires third-party packages — install first

pip install requests

Python code

import re
import sys
from urllib.parse import urljoin

import requests


def extract_links(url, timeout=10):
    """Return a sorted list of unique absolute http(s) links found at url."""
    try:
        # A timeout keeps the call from hanging on an unresponsive server
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        # Report the failure on stderr instead of returning it as a fake "link"
        print(f"Error: {e}", file=sys.stderr)
        return []

    html = response.text
    # Find every quoted href value (this matches any tag, not only <a>)
    pattern = r'href=["\'](.*?)["\']'
    raw_links = re.findall(pattern, html, re.IGNORECASE)

    absolute_links = set()
    for link in raw_links:
        # Resolve relative links against the final URL (after any redirects)
        absolute = urljoin(response.url, link)
        if absolute.startswith(('http://', 'https://')):
            absolute_links.add(absolute)
    return sorted(absolute_links)


if __name__ == "__main__":
    links = extract_links("https://example.com")
    for link in links[:10]:  # Show first 10 links
        print(link)

Output

https://www.iana.org/domains/example

How it works

requests.get() fetches the page HTML. The regex href=["'](.*?)["'] captures the value of every quoted href attribute in any tag, so stylesheet <link> targets are picked up alongside anchors. urljoin() converts relative URLs (like /about) to absolute ones. The script keeps only URLs that start with http:// or https://, which drops mailto:, javascript:, and similar schemes, and it deduplicates the results with a set. Finally it returns the links sorted; the demo prints the first 10.
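The steps above can be tried without any network access by running them on a fixed HTML string. This is a minimal sketch: the helper name `links_from_html`, the sample markup, and the base URL are made up for illustration and are not part of the original snippet.

```python
import re
from urllib.parse import urljoin

# Same pattern as the main snippet, compiled once with IGNORECASE
HREF_PATTERN = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def links_from_html(html, base_url):
    """Return sorted, de-duplicated absolute http(s) links found in html."""
    found = set()
    for raw in HREF_PATTERN.findall(html):
        absolute = urljoin(base_url, raw)  # relative -> absolute
        if absolute.startswith(("http://", "https://")):
            found.add(absolute)  # the set drops duplicates
    return sorted(found)


# Hypothetical sample page: one relative link (twice), one absolute, one mailto
sample_html = '''
<a href="/about">About</a>
<a HREF="https://example.org/docs">Docs</a>
<a href="mailto:team@example.com">Mail</a>
<a href="/about">About again</a>
'''

print(links_from_html(sample_html, "https://example.com/blog/post"))
# ['https://example.com/about', 'https://example.org/docs']
```

The duplicate /about link collapses into one entry, and the mailto: link is dropped by the scheme check.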

Common mistakes

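Not using `urljoin`: relative links such as `/about` are unusable until they are resolved against the page's base URL.
Forgetting to handle `RequestException`: timeouts, DNS failures, and the HTTP error statuses raised by `raise_for_status()` will crash the script.
Using a case-sensitive regex, which misses uppercase `HREF` attributes.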
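Two of these mistakes are easy to reproduce on a small string. The sketch below uses a made-up HTML fragment and base URL to show what a case-sensitive pattern misses and what the links look like before and after `urljoin`.

```python
import re
from urllib.parse import urljoin

# Hypothetical fragment: one uppercase HREF, one lowercase relative link
html = '<A HREF="/pricing">Pricing</A> <a href="contact.html">Contact</a>'
pattern = r'href=["\'](.*?)["\']'

# Without re.IGNORECASE the uppercase HREF is silently skipped
case_sensitive = re.findall(pattern, html)
case_insensitive = re.findall(pattern, html, re.IGNORECASE)
print(case_sensitive)    # ['contact.html']
print(case_insensitive)  # ['/pricing', 'contact.html']

# The raw values cannot be requested as-is; urljoin resolves them
# against the page they were found on
base = "https://example.com/shop/index.html"
resolved = [urljoin(base, link) for link in case_insensitive]
print(resolved)
# ['https://example.com/pricing', 'https://example.com/shop/contact.html']
```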

Variations

Replace the regex with `BeautifulSoup` and `soup.find_all('a')` for more robust HTML parsing.
Fetch pages with `httpx.AsyncClient` to scrape multiple pages concurrently.
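If adding a dependency is not an option, a real HTML parser is also available in the standard library. The sketch below swaps in `html.parser.HTMLParser` rather than BeautifulSoup; the class name `AnchorCollector`, the helper `extract_anchor_links`, and the sample markup are illustrative assumptions. Unlike the regex, it only looks at `<a>` tags and also handles unquoted attribute values.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def extract_anchor_links(html, base_url):
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.links)


# Hypothetical sample: the <link> tag is ignored, the unquoted href is kept
sample = '<link href="/style.css"><a href="/about">About</a><a href=/unquoted>U</a>'
print(extract_anchor_links(sample, "https://example.com"))
# ['https://example.com/about', 'https://example.com/unquoted']
```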

Real-world use cases

Auditing internal links on a website to find broken or outdated URLs before launch.
Building a sitemap generator that crawls a domain and lists all reachable pages.
Monitoring competitor sites for new blog posts or product pages by extracting links daily.
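For the link-audit and sitemap use cases, a common first step is separating links that stay on the site from links that leave it. The following is a minimal sketch under the assumption that "internal" means an exact host match; the function name `split_internal_external` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def split_internal_external(links, site_url):
    """Split links into (internal, external) by comparing hostnames."""
    site_host = urlparse(site_url).netloc.lower()
    internal, external = [], []
    for link in links:
        host = urlparse(link).netloc.lower()
        # Exact match only: subdomains and "www." variants count as external
        if host == site_host:
            internal.append(link)
        else:
            external.append(link)
    return internal, external


# Hypothetical input, e.g. the list returned by a link extractor
links = [
    "https://example.com/about",
    "https://blog.example.com/post",
    "https://other.org/",
]
internal, external = split_internal_external(links, "https://example.com")
print(internal)  # ['https://example.com/about']
print(external)  # ['https://blog.example.com/post', 'https://other.org/']
```

Whether subdomains should count as internal depends on the site being audited, so the comparison is worth adjusting before relying on the result.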
