How to Create a Link Graph Visualization for Any Website in Python

A Python script that crawls a website's internal links, builds a directed graph of parent-child URL relationships, and prints the graph to the console.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting

crawler graph visualization web scraping beautifulsoup requests

Requires third-party packages — install first

pip install requests beautifulsoup4

Python code

import sys
from collections import defaultdict, deque
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def get_links(url, max_links=20):
    """Return up to max_links internal links found on the page at url."""
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()  # treat 4xx/5xx responses as errors
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')
    host = urlparse(url).netloc
    links = []  # a list keeps page order, so repeated runs follow the same links
    for a_tag in soup.find_all('a', href=True):
        # Resolve relative hrefs against the page URL and drop any #fragment.
        full_url = urldefrag(urljoin(url, a_tag['href'])).url
        # Keep only links on the same host, without duplicates.
        if urlparse(full_url).netloc == host and full_url not in links:
            links.append(full_url)
            if len(links) >= max_links:
                break
    return links


def build_graph(start_url, max_nodes=15):
    """Breadth-first crawl from start_url; returns {parent: [children]}."""
    graph = defaultdict(list)
    visited = set()
    to_visit = deque([start_url])
    while to_visit and len(visited) < max_nodes:
        current = to_visit.popleft()
        if current in visited:
            continue
        visited.add(current)
        print(f"Crawling: {current}")
        links = get_links(current)
        # Follow at most 5 links per page to keep the crawl small.
        for link in links[:5]:
            # Links back to pages that were already crawled are not recorded.
            if link not in visited:
                graph[current].append(link)
                to_visit.append(link)
    return graph


def print_graph(graph):
    """Print one 'parent -> child' line per edge."""
    print("\nLink Graph (parent -> child):")
    for parent, children in graph.items():
        for child in children:
            print(f"  {parent} -> {child}")


if __name__ == "__main__":
    # Usage: python script.py [start_url]
    start_url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com"
    graph = build_graph(start_url)
    print_graph(graph)

Output

stdout

Crawling: https://example.com
Crawling: https://example.com/page1
Crawling: https://example.com/page2
Crawling: https://example.com/subpage

Link Graph (parent -> child):
  https://example.com -> https://example.com/page1
  https://example.com -> https://example.com/page2
  https://example.com/page1 -> https://example.com/subpage

How it works

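The script uses requests to fetch each page and BeautifulSoup to parse its HTML. get_links resolves every href to an absolute URL and keeps only internal links, meaning links whose host (the netloc returned by urlparse) matches the host of the page being crawled; a subdomain such as blog.example.com therefore counts as external. build_graph performs a breadth-first crawl from the start URL, using a queue of pages to visit and a set of visited URLs to avoid loops. The crawl is deliberately small: it stops after 15 pages (max_nodes), follows at most 5 links per page, and get_links collects at most 20 links per page (max_links). Each followed link is stored as a child of the page it was found on. Links to pages that were already crawled are skipped, so the result is a subset of the site's full link graph. The printed parent -> child pairs give an outline of the site's navigational structure.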
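
The breadth-first crawl can be tried without touching the network. The following is a minimal sketch, not the script itself: FAKE_SITE is a hypothetical in-memory site, and crawl is a hypothetical helper that receives the link-fetching function as a parameter so that the queue and visited-set logic can be observed in isolation.

```python
from collections import defaultdict, deque

# Hypothetical site used only for illustration: page -> links found on it.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/a"],
    "/c": [],
}


def crawl(start, fetch_links, max_nodes=15):
    """Breadth-first crawl; fetch_links(page) returns that page's links."""
    graph = defaultdict(list)
    visited = set()
    queue = deque([start])
    order = []  # pages in the order they were crawled
    while queue and len(visited) < max_nodes:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        order.append(current)
        for link in fetch_links(current):
            # Skip links back to pages that were already crawled.
            if link not in visited:
                graph[current].append(link)
                queue.append(link)
    return order, dict(graph)


order, graph = crawl("/", lambda page: FAKE_SITE.get(page, []))
print(order)
print(graph)
```

In this sketch the link from /a back to / and the link from /b to /a never become edges, because their targets were already crawled when the links were found. The same rule applies in build_graph.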

Common mistakes

Not filtering internal links properly, causing external domains to pollute the graph
Using a crawl speed that overwhelms the target server or gets the crawler blocked (no rate limiting)
Ignoring duplicate URLs or URLs with trailing slashes that resolve to the same page
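
The first and third mistakes can be addressed with a small amount of URL handling from the standard library. This is a sketch under stated assumptions: normalize_url and is_internal are hypothetical helper names, and stripping the trailing slash assumes the server treats /docs and /docs/ as the same page, which is common but not guaranteed.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def normalize_url(page_url, href):
    """Resolve href against page_url and return one canonical spelling."""
    parts = urlparse(urljoin(page_url, href))
    # Assumption: "/docs/" and "/docs" are the same page; keep the root as "/".
    path = parts.path.rstrip("/") or "/"
    # Lowercase the host and drop the #fragment, which never reaches the server.
    return urlunparse(
        (parts.scheme, parts.netloc.lower(), path, parts.params, parts.query, "")
    )


def is_internal(page_url, candidate):
    """True if candidate is an http(s) URL on the same host as page_url."""
    target = urlparse(candidate)
    same_host = target.netloc.lower() == urlparse(page_url).netloc.lower()
    return target.scheme in ("http", "https") and same_host


page = "https://example.com/docs/"
print(normalize_url(page, "intro/#setup"))
print(normalize_url(page, "intro"))
print(is_internal(page, "https://other.org/"))
```

Both normalize_url calls print the same URL, so the two spellings would be stored as a single node. For the rate-limiting point, a time.sleep call between requests is the simplest option.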

Variations

Use NetworkX to visualize the graph as an image (draw_networkx)
Export the graph to a JSON or CSV file for further analysis
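
Both variations can be sketched on top of the dictionary that build_graph returns. The helper names graph_to_json, graph_to_csv and save_graph_image are hypothetical, as are the output file name and the drawing options. The image helper assumes networkx and matplotlib are installed (pip install networkx matplotlib) and imports them only when it is called.

```python
import csv
import io
import json


def graph_to_json(graph):
    """Serialize {parent: [children]} as pretty-printed JSON text."""
    return json.dumps(graph, indent=2, sort_keys=True)


def graph_to_csv(graph):
    """Serialize the graph as a two-column edge list (parent, child)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["parent", "child"])
    for parent, children in graph.items():
        for child in children:
            writer.writerow([parent, child])
    return buffer.getvalue()


def save_graph_image(graph, path="link_graph.png"):
    """Draw the graph to an image file with NetworkX and Matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    import networkx as nx

    g = nx.DiGraph()
    for parent, children in graph.items():
        for child in children:
            g.add_edge(parent, child)
    fig, ax = plt.subplots(figsize=(10, 8))
    layout = nx.spring_layout(g, seed=42)  # fixed seed for a stable layout
    nx.draw_networkx(g, pos=layout, ax=ax, node_size=300, font_size=6)
    ax.set_axis_off()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


# Sample data in the same shape that build_graph returns.
sample = {
    "https://example.com": [
        "https://example.com/page1",
        "https://example.com/page2",
    ],
}
print(graph_to_json(sample))
print(graph_to_csv(sample))
```

In the script, calling save_graph_image(graph) after build_graph(start_url) would write the image next to the script. The edge-list CSV is a format that most graph tools can import.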

Real-world use cases

Auditing a large website's internal linking structure for SEO improvements.
Mapping a wiki or documentation site to understand content connectivity.
Building a crawler that discovers broken links by checking if child pages return 404.
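
The broken-link use case could look roughly like this. It is a sketch, not part of the original script: find_broken_links and http_status are hypothetical helpers, and the status lookup is passed in as a function so that the logic can be tried with fake data. http_status uses a HEAD request, which some servers handle differently from GET.

```python
def find_broken_links(graph, get_status):
    """Return (parent, child) pairs whose child URL answers with 404."""
    broken = []
    cache = {}  # check each child URL only once
    for parent, children in graph.items():
        for child in children:
            if child not in cache:
                cache[child] = get_status(child)
            if cache[child] == 404:
                broken.append((parent, child))
    return broken


def http_status(url):
    """Status code for url via requests, or None on a network error."""
    import requests  # imported here so the demo below runs without it

    try:
        response = requests.head(url, timeout=5, allow_redirects=True)
        return response.status_code
    except requests.RequestException:
        return None


# Demo with made-up status codes instead of real requests.
sample = {
    "https://example.com": [
        "https://example.com/ok",
        "https://example.com/gone",
    ],
    "https://example.com/ok": ["https://example.com/gone"],
}
fake_status = {
    "https://example.com/ok": 200,
    "https://example.com/gone": 404,
}
broken = find_broken_links(sample, fake_status.get)
for parent, child in broken:
    print(f"{parent} links to missing page {child}")
```

Against a real site, find_broken_links(graph, http_status) would issue one request per distinct child URL.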
