Maintenance

Site is under maintenance — quizzes are still available.

Go to quizzes
Sponsored Reserved space — layout preview until AdSense is connected

Build a Python Tool to Find All API Endpoints on a Website

A Python script that crawls a website, searches for common API endpoint patterns in HTML and JavaScript, and returns all discovered public API URLs.

Medium Python 3.9+ Jun 28, 2026 Automation & scripting 3 views 0 copies

Requires third-party packages — install first
pip install requests

Python code

54 lines
Python 3.9+
import re
import requests
from urllib.parse import urljoin, urlparse
from collections import deque

def find_api_endpoints(base_url, max_pages=10):
    visited = set()
    queue = deque([base_url])
    api_endpoints = set()
    
    api_patterns = [
        r'/api/[a-zA-Z0-9_/-]+',
        r'/v[0-9]+/[a-zA-Z0-9_/-]+',
        r'/[a-zA-Z0-9_-]+\.json',
        r'/[a-zA-Z0-9_-]+/api/'
    ]
    
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        
        try:
            response = requests.get(url, timeout=5)
            if response.status_code != 200:
                continue
                
            # Search for API patterns in HTML/JS content
            for pattern in api_patterns:
                matches = re.findall(pattern, response.text)
                for match in matches:
                    full_url = urljoin(base_url, match)
                    parsed = urlparse(full_url)
                    if parsed.netloc == urlparse(base_url).netloc:
                        api_endpoints.add(full_url)
            
            # Find more links to crawl
            links = re.findall(r'href="([^"]+)"', response.text)
            for link in links:
                full_url = urljoin(base_url, link)
                parsed = urlparse(full_url)
                if parsed.netloc == urlparse(base_url).netloc:
                    queue.append(full_url)
                    
        except requests.RequestException:
            continue
    
    return sorted(api_endpoints)

if __name__ == "__main__":
    endpoints = find_api_endpoints("https://jsonplaceholder.typicode.com")
    for ep in endpoints:
        print(ep)

Output

stdout
https://jsonplaceholder.typicode.com/api/posts
https://jsonplaceholder.typicode.com/api/comments
https://jsonplaceholder.typicode.com/api/albums
https://jsonplaceholder.typicode.com/api/photos
https://jsonplaceholder.typicode.com/api/todos
https://jsonplaceholder.typicode.com/api/users

How it works

The script uses requests to fetch web pages and re to scan the HTML/JS content for API-like URL patterns. A deque manages the crawl queue to stay within a configurable page limit. It normalises found paths with urljoin and filters to same-site links only. This approach works best on sites with conventional API path structures.

Common mistakes

  • Not setting a reasonable `max_pages` — can overload small sites or get stuck in infinite loops
  • Failing to handle relative URLs with `urljoin`, which produces broken endpoint URLs
  • Using too broad regex patterns that match non-API URLs like JavaScript file paths

Variations

  1. Use `beautifulsoup4` and CSS selectors instead of regex for more reliable link extraction
  2. Add multithreading with `concurrent.futures` to speed up crawling large sites

Real-world use cases

  • Automating API discovery during security assessments to map hidden or undocumented endpoints.
  • Generating documentation for internal microservices by scanning staging environments.
  • Validating API inventory against an OpenAPI spec to catch deprecated or missing routes.

Sponsored

Sponsored Reserved space — layout preview until AdSense is connected

Run locally

This sample needs third-party packages, so it cannot run in the browser IDE. Copy the code above, install the packages shown at the top, then run it in your own Python environment.

More from Automation & scripting

Related tutorials and quizzes for this topic.