Extract All Links from Any Website in Python
Scrape a webpage and extract all absolute HTTP/HTTPS links using requests and regex.
pip install requests
Python code
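```python
import html
import re
import sys
from urllib.parse import urljoin

import requests


def extract_links(url):
    """Return a sorted list of unique absolute http(s) links found on the page."""
    try:
        # Without a timeout, an unresponsive server can block this call forever
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        # Report the failure on stderr instead of returning it as if it were a link
        print(f"Error: {e}", file=sys.stderr)
        return []

    # Find the value of every quoted href attribute (<a>, but also <link>, <area>, ...)
    pattern = r'href=["\'](.*?)["\']'
    raw_links = re.findall(pattern, response.text, re.IGNORECASE)

    base_url = response.url  # final URL after any redirects
    absolute_links = set()
    for link in raw_links:
        # Decode entities such as &amp;, then resolve relative links against the base URL
        absolute = urljoin(base_url, html.unescape(link.strip()))
        if absolute.startswith(("http://", "https://")):
            absolute_links.add(absolute)
    return sorted(absolute_links)


if __name__ == "__main__":
    links = extract_links("https://example.com")
    for link in links[:10]:  # Show first 10 links
        print(link)
```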
Output
https://www.iana.org/domains/example
How it works
`requests.get()` fetches the page HTML. The regex `href=["'](.*?)["']` captures the value of every quoted `href` attribute; it matches any tag that has an `href`, such as `<link>` stylesheets, not only `<a>` anchors. `urljoin()` resolves relative URLs (like `/about`) against the page URL. The script keeps only `http://` and `https://` links and deduplicates them with a set. Finally it returns the links sorted; the demo prints the first 10.
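Because `extract_links()` needs a live page, it can help to test the parsing steps on their own. The sketch below runs the same regex, `urljoin()` and scheme filter on a hard-coded HTML string. `links_from_html` and the sample markup are illustrative assumptions; the `html.unescape()` call decodes entities such as `&amp;` that often appear inside `href` values.

```python
import html
import re
from urllib.parse import urljoin

# Same pattern as extract_links(); matches quoted href values in any tag
HREF_PATTERN = r'href=["\'](.*?)["\']'


def links_from_html(page_html, base_url):
    """Apply the regex + urljoin + scheme filter to an HTML string (no network)."""
    absolute_links = set()
    for link in re.findall(HREF_PATTERN, page_html, re.IGNORECASE):
        absolute = urljoin(base_url, html.unescape(link.strip()))
        if absolute.startswith(("http://", "https://")):
            absolute_links.add(absolute)
    return sorted(absolute_links)


# Hypothetical sample page used only for illustration
sample = """
<a href="/about">About</a>
<A HREF='contact.html'>Contact</A>
<a href="https://other.org/page?a=1&amp;b=2">Partner</a>
<a href="mailto:team@example.com">Mail</a>
<link rel="stylesheet" href="/static/site.css">
<a href="/about">About again</a>
"""

for link in links_from_html(sample, "https://example.com/docs/"):
    print(link)
# https://example.com/about
# https://example.com/docs/contact.html
# https://example.com/static/site.css
# https://other.org/page?a=1&b=2
```

The stylesheet URL appears in the result because the regex cannot tell an `<a>` tag from a `<link>` tag, while the `mailto:` link is dropped by the scheme filter and the duplicate `/about` collapses into one entry.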
Common mistakes
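- Skipping `urljoin`: relative links such as `/about` or `../page` cannot be fetched until they are resolved against the base URL.
- Forgetting to handle `RequestException` or to pass a `timeout`: network errors crash the script, and an unresponsive server can hang it indefinitely.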
- Using a case-sensitive regex and missing uppercase `HREF`.
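Two of these mistakes are easy to reproduce on a small string. This is a minimal sketch; the page URL and the snippet are made-up examples.

```python
import re
from urllib.parse import urljoin

page_url = "https://example.com/blog/post.html"  # hypothetical page URL
snippet = '<a href="../about">About</a> <A HREF="/contact">Contact</A>'
pattern = r'href=["\'](.*?)["\']'

# Mistake 1: a case-sensitive search silently skips the uppercase HREF
case_sensitive = re.findall(pattern, snippet)
print(case_sensitive)  # ['../about']

# Fix: re.IGNORECASE matches href, HREF, Href, ...
raw = re.findall(pattern, snippet, re.IGNORECASE)
print(raw)  # ['../about', '/contact']

# Mistake 2: '../about' is not a fetchable URL on its own
# Fix: resolve every value against the URL of the page it came from
resolved = [urljoin(page_url, link) for link in raw]
print(resolved)  # ['https://example.com/about', 'https://example.com/contact']
```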
Variations
- Replace the regex with `BeautifulSoup` and `soup.find_all('a', href=True)` for more robust HTML parsing that also limits results to anchor tags.
- Fetch pages with `httpx.AsyncClient` and `asyncio` to scrape many pages concurrently.
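If you want parser-based extraction without adding a dependency, Python's built-in `html.parser` offers a similar approach. This is a swapped-in standard-library sketch, not the BeautifulSoup API; the `AnchorCollector` class and `anchor_links` function are hypothetical names. Unlike the regex, it only looks at `<a>` tags, and the parser already lowercases tag and attribute names and decodes entities in attribute values.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags only (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; entity refs in values are decoded
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def anchor_links(page_html, base_url):
    parser = AnchorCollector()
    parser.feed(page_html)
    parser.close()
    links = {urljoin(base_url, href) for href in parser.hrefs}
    # Keep only web links, as in extract_links()
    return sorted(link for link in links if link.startswith(("http://", "https://")))


sample = """
<A HREF="/about">About</A>
<a class="nav" href='docs/intro.html?x=1&amp;y=2'>Intro</a>
<link rel="stylesheet" href="/site.css">
<a name="top">No href here</a>
"""
print(anchor_links(sample, "https://example.com/"))
# ['https://example.com/about', 'https://example.com/docs/intro.html?x=1&y=2']
```

The stylesheet is ignored here because only `<a>` start tags are inspected, which is the main practical difference from the regex version.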
Real-world use cases
- Auditing internal links on a website to find broken or outdated URLs before launch.
- Building a sitemap generator that crawls a domain and lists all reachable pages.
- Monitoring competitor sites for new blog posts or product pages by extracting links daily.
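For the auditing and sitemap use cases, a common next step is to separate links that stay on the same site from those that leave it. A minimal sketch, assuming "same site" means an identical host name (so `www.example.com` and `example.com` count as different hosts); `split_internal_external` is a hypothetical helper, and its input could be the list returned by `extract_links()`.

```python
from urllib.parse import urldefrag, urlparse


def split_internal_external(page_url, links):
    """Partition absolute links into same-host and other-host lists (hypothetical helper)."""
    site_host = urlparse(page_url).netloc.lower()
    internal, external = set(), set()
    for link in links:
        # Drop #fragments so /about and /about#team count as one page
        clean, _fragment = urldefrag(link)
        if urlparse(clean).netloc.lower() == site_host:
            internal.add(clean)
        else:
            external.add(clean)
    return sorted(internal), sorted(external)


links = [
    "https://example.com/about",
    "https://example.com/about#team",
    "https://example.com/blog/",
    "https://www.iana.org/domains/example",
]
internal, external = split_internal_external("https://example.com/", links)
print(internal)  # ['https://example.com/about', 'https://example.com/blog/']
print(external)  # ['https://www.iana.org/domains/example']
```

A crawler for a sitemap would then queue only the `internal` list for the next round, while a link audit would request each URL and record the ones that fail.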