How to Create a Link Graph Visualization for Any Website in Python

A Python script that crawls a website's internal links, builds a directed graph of parent-child URL relationships, and prints the graph to the console.

Medium · Python 3.9+ · Jun 28, 2026 · Automation & scripting

Requires third-party packages — install first
pip install requests beautifulsoup4

Python code

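import sys
from collections import defaultdict
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def get_links(url, max_links=20):
    """Return up to max_links same-host links found on the page at url."""
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()  # treat 4xx/5xx responses as errors
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')
    host = urlparse(url).netloc
    links = []  # a list keeps the order in which links appear on the page
    for a_tag in soup.find_all('a', href=True):
        # Resolve relative hrefs against the page URL (not just the site root)
        # and drop any #fragment so in-page anchors do not create extra nodes.
        full_url, _ = urldefrag(urljoin(url, a_tag['href']))
        if urlparse(full_url).netloc == host and full_url not in links:
            links.append(full_url)
            if len(links) >= max_links:
                break
    return links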

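def build_graph(start_url, max_nodes=15):
    """Breadth-first crawl from start_url, returning {parent: [children]}."""
    graph = defaultdict(list)
    visited = set()
    to_visit = [start_url]  # FIFO queue: popping from the front gives breadth-first order
    while to_visit and len(visited) < max_nodes:
        current = to_visit.pop(0)
        if current in visited:
            continue
        visited.add(current)
        print(f"Crawling: {current}")
        links = get_links(current)
        # Follow at most 5 links per page to keep the crawl small.
        for link in links[:5]:
            # Links to already-crawled pages are skipped, so the result is a
            # tree-like view rather than the complete link graph.
            if link not in visited:
                graph[current].append(link)
                to_visit.append(link)
    return graph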

def print_graph(graph):
    print("\nLink Graph (parent -> child):")
    for parent, children in graph.items():
        for child in children:
            print(f"  {parent} -> {child}")

if __name__ == "__main__":
    start_url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com"
    graph = build_graph(start_url)
    print_graph(graph)

Output

stdout
Crawling: https://example.com
Crawling: https://example.com/page1
Crawling: https://example.com/page2
Crawling: https://example.com/subpage

Link Graph (parent -> child):
  https://example.com -> https://example.com/page1
  https://example.com -> https://example.com/page2
  https://example.com/page1 -> https://example.com/subpage

How it works

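The script uses requests to fetch each page and BeautifulSoup to parse its HTML. get_links collects every <a href> on the page, resolves it to an absolute URL with urljoin, and keeps only internal links, meaning those whose urlparse(...).netloc matches the host of the page being crawled (so subdomains count as external). build_graph performs a breadth-first crawl from the start URL: it takes pages from the front of a queue, records each one in a visited set so that no page is fetched twice, and follows at most 5 links per page until 15 pages have been crawled. Each internal link that has not been crawled yet is stored as a child of the current page in the graph dictionary, so links back to already-visited pages are not recorded. The printed output lists these parent -> child pairs, which approximate the site's navigational structure.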
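
The resolve-then-filter step can be tried in isolation, without any network access. This is a minimal sketch using only the standard library; the helper name internal_links and the sample URLs are invented for illustration and are not part of the script. It resolves each href against the full page URL, which is what makes a relative link such as team.html land in the right directory.

```python
from urllib.parse import urljoin, urlparse


def internal_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep those on the same host.

    Hypothetical helper for illustration; not part of the original script.
    """
    host = urlparse(page_url).netloc
    kept = []
    for href in hrefs:
        full_url = urljoin(page_url, href)  # handles relative and absolute hrefs
        if urlparse(full_url).netloc == host:
            kept.append(full_url)
    return kept


# Made-up hrefs: two internal, one external, one non-HTTP.
hrefs = ["/about", "team.html", "https://other.example.org/", "mailto:hi@example.com"]
result = internal_links("https://example.com/docs/index.html", hrefs)
print(result)
```

The external URL is dropped because its host differs, and the mailto: link is dropped because it has no host at all.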

Common mistakes

  • Not filtering internal links properly, causing external domains to pollute the graph
  • Crawling so fast that the target server is overwhelmed or blocks the crawler (the script has no rate limiting)
  • Ignoring duplicate URLs or URLs with trailing slashes that resolve to the same page
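
Two of the mistakes above, duplicate URLs and missing rate limiting, can be addressed with small helpers. The sketch below is one possible approach, not part of the script: normalize_url and Throttle are hypothetical names, and the normalization rules (ignore fragments, lowercase the host, treat a trailing slash as insignificant) are assumptions that do not hold on every site.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Return a canonical form of url for duplicate detection.

    Assumptions (not from the original script): the host compares
    case-insensitively, fragments are ignored, and a trailing slash on a
    non-root path points to the same page.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class Throttle:
    """Enforce a minimum delay, in seconds, between consecutive calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


print(normalize_url("https://Example.com/docs/#intro"))
print(normalize_url("https://example.com"))
```

In a crawler, normalize_url would be applied before a URL is compared against the visited set, and throttle.wait() would be called immediately before each request.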

Variations

  1. Use NetworkX to visualize the graph as an image (draw_networkx)
  2. Export the graph to a JSON or CSV file for further analysis
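
The export variation needs only the standard library. This is a minimal sketch, assuming the graph has the {parent: [children]} shape returned by build_graph; the function names graph_to_json and graph_to_csv and the sample data are illustrative.

```python
import csv
import io
import json


def graph_to_json(graph):
    """Serialize {parent: [children]} as a JSON string."""
    return json.dumps(graph, indent=2, sort_keys=True)


def graph_to_csv(graph):
    """Serialize the graph as a CSV edge list with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["parent", "child"])
    for parent, children in graph.items():
        for child in children:
            writer.writerow([parent, child])
    return buffer.getvalue()


# Sample data shaped like the return value of build_graph.
sample = {
    "https://example.com": ["https://example.com/page1", "https://example.com/page2"],
    "https://example.com/page1": ["https://example.com/subpage"],
}
print(graph_to_json(sample))
print(graph_to_csv(sample))
```

Both functions return strings, so the result can be written to a file or passed to another tool; the edge-list CSV is a format most spreadsheet and graph tools can import.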

Real-world use cases

  • Auditing a large website's internal linking structure for SEO improvements.
  • Mapping a wiki or documentation site to understand content connectivity.
  • Building a crawler that discovers broken links by checking if child pages return 404.
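
Two of these use cases reduce to short passes over the graph dictionary. The sketch below is an illustration under stated assumptions: inbound_link_counts and find_broken_links are hypothetical helpers, the status lookup is supplied by the caller (a fake one is used here so the example runs offline), and the counts cover only the edges the crawl actually recorded.

```python
from collections import Counter


def inbound_link_counts(graph):
    """Count how many recorded edges point at each URL."""
    counts = Counter()
    for children in graph.values():
        counts.update(children)
    return counts


def find_broken_links(graph, get_status):
    """Return (parent, child) edges whose child responds with HTTP 404.

    get_status is a caller-supplied function mapping a URL to a status code,
    for example: lambda url: requests.head(url, timeout=5).status_code
    (some servers reject HEAD requests, in which case GET is needed).
    """
    cache = {}  # check each child URL only once
    broken = []
    for parent, children in graph.items():
        for child in children:
            if child not in cache:
                cache[child] = get_status(child)
            if cache[child] == 404:
                broken.append((parent, child))
    return broken


# Made-up graph and status codes for an offline demonstration.
sample = {
    "https://example.com": ["https://example.com/page1", "https://example.com/page2"],
    "https://example.com/page1": ["https://example.com/page2", "https://example.com/subpage"],
}
fake_statuses = {"https://example.com/page2": 404}

counts = inbound_link_counts(sample)
broken = find_broken_links(sample, lambda url: fake_statuses.get(url, 200))
print(counts.most_common())
print(broken)
```

Pages with few inbound links are candidates for better internal linking, and each broken edge names the page that needs fixing as well as the dead target.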

Run locally

This sample needs third-party packages. Copy the code above into a file, install the packages shown at the top, then run it in your own Python environment. Pass the start URL as the first command-line argument; without one, the script crawls https://example.com.
