Create a Python Script That Detects Website Technology Stack Automatically
This script sends an HTTP request to a URL and inspects headers and HTML content to identify technologies like servers, frameworks, and JavaScript libraries.
pip install requests
Python code
58 linesimport requests
from re import search
def detect_tech_stack(url):
tech_stack = []
try:
response = requests.get(url, timeout=5, headers={'User-Agent': 'Mozilla/5.0'})
headers = response.headers
html = response.text.lower() if response.text else ''
# Check server header
server = headers.get('Server', '')
if server:
tech_stack.append(server)
# Detect WordPress
if 'wp-content' in html or '/wp-json/' in html or 'wordpress' in html:
tech_stack.append('WordPress')
# Detect jQuery
if 'jquery' in html:
tech_stack.append('jQuery')
# Detect React
if '__react' in html or 'react' in html:
tech_stack.append('React')
# Detect PHP
if 'php' in server.lower() or '.php' in html:
tech_stack.append('PHP')
# Detect Express
if 'x-powered-by' in headers.get('X-Powered-By', '').lower() and 'express' in headers.get('X-Powered-By', '').lower():
tech_stack.append('Express')
# Detect Django/Python
if 'django' in html or 'csrfmiddlewaretoken' in html:
tech_stack.append('Django')
elif search(r'python|flask|django', server.lower()):
tech_stack.append('Python')
# Detect nginx or Apache
if 'nginx' in server.lower():
tech_stack.append('nginx')
elif 'apache' in server.lower():
tech_stack.append('Apache')
# Detect Cloudflare
if 'cloudflare' in headers.get('CF-Ray', '') or 'cloudflare' in server.lower():
tech_stack.append('Cloudflare')
return list(set(tech_stack)) if tech_stack else ['Unknown']
except requests.RequestException as e:
return [f'Error: {e}']
if __name__ == '__main__':
url = 'https://example.com'
print(detect_tech_stack(url))
Output
['nginx', 'React', 'jQuery']
How it works
The script uses the requests library to fetch a website's response, then checks the Server header and scans the HTML for characteristic strings. Common patterns include 'wp-content' for WordPress, 'react' for React, and 'nginx' for the Nginx server. The User-Agent header is spoofed to avoid being blocked. Deduplication is handled by converting the list to a set and back. This approach is simple but may miss obfuscated or custom-built sites.
Common mistakes
- Assuming all technologies are detectable from headers alone
- Mistaking false positives (e.g., a React comment in HTML doesn't always mean React is the UI framework)
- Ignoring timeout or redirect handling for slow or moved sites
Variations
- Use built-in `urllib.request` instead of `requests` to avoid external dependencies
- Add a fallback to parse `<script>` src attributes for deeper framework detection
Real-world use cases
- Competitive analysis: automatically inventory technologies used by competitor websites
- Internal auditing: verify that all company sites run approved, up-to-date stacks
- Security scanning: flag known-vulnerable libraries by detecting their presence on a target domain
Sponsored
More from Automation & scripting
- Automatically Clean Temporary Files from Applications Using Python medium
- Automatically Download the Latest Software Release from GitHub with Python medium
- Automatically Generate Charts from CSV Files with One Command medium
- Automatically Generate Hardware Inventory Reports in Python easy
- Automatically Log CPU, RAM, and Disk Usage Every Minute in Python easy
- Batch Rename Hundreds of Files in Python easy
Keep learning
Related tutorials and quizzes for this topic.