Find Sensitive Information in Log Files with Python
Scan log files for emails, IP addresses, API keys, and passwords using regular expressions in Python.
Python code
46 linesimport re
import os
from pathlib import Path
def find_sensitive_info(log_path):
"""Scans log files for patterns like emails, IPs, API keys, and passwords."""
patterns = {
'Email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
'IP Address': r'\b(?:\d{1,3}\.){3}\d{1,3}\b',
'API Key': r'(?i)(?:api[_-]?key|apikey)\s*[:=]\s*[\'"]?[a-zA-Z0-9]{16,}[\'"]?',
'Password': r'(?i)(?:password|passwd|pwd)\s*[:=]\s*[\'"]?[^\s\'"]+[\'"]?'
}
findings = []
log_file = Path(log_path)
if not log_file.exists():
print(f"File {log_path} not found.")
return
with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:
for line_num, line in enumerate(f, 1):
for label, pattern in patterns.items():
matches = re.findall(pattern, line)
for match in matches:
findings.append((line_num, label, match))
if findings:
print(f"Sensitive information found in {log_path}:")
for line_num, label, value in findings:
print(f" Line {line_num}: [{label}] {value}")
else:
print(f"No sensitive information found in {log_path}.")
return findings
if __name__ == "__main__":
# Example usage with a sample log file
sample_log = "sample.log"
with open(sample_log, 'w') as f:
f.write("User email: john.doe@example.com\n")
f.write("Server IP: 192.168.1.1\n")
f.write("API key = sk-abc123def456ghi789\n")
f.write("password: super_secret_123\n")
f.write("Normal log entry at 2023-10-05\n")
find_sensitive_info(sample_log)
os.remove(sample_log) # Clean up test file
Output
Sensitive information found in sample.log:
Line 1: [Email] john.doe@example.com
Line 2: [IP Address] 192.168.1.1
Line 3: [API Key] sk-abc123def456ghi789
Line 4: [Password] super_secret_123
How it works
This script uses Python's re module to define patterns for common sensitive data like emails, IP addresses, API keys, and passwords. It reads a log file line by line, checking each line against all patterns and collecting matches with line numbers. The pathlib.Path ensures safe file handling and cross-platform path management. Writing matches to a list allows further processing or redaction. The example creates a temporary log for demonstration then cleans up.
Common mistakes
- Forgetting to handle errors for files with different encodings or binary content
- Using overly broad regex that flags false positives like version numbers as IPs
- Not masking or redacting matched values when outputting results
- Hardcoding file paths instead of accepting command-line arguments
Variations
- Load patterns from an external JSON or YAML config file for easy updates
- Use `os.walk` to recursively scan all `.log` files in a directory
Real-world use cases
- Auditing CI/CD build logs for accidentally committed credentials before archiving.
- Running a security scan on production log dumps to detect exposed API keys or passwords.
- Setting up a cron job that alerts the team whenever sensitive patterns appear in application logs.
Sponsored
More from Automation & scripting
- Automatically Clean Temporary Files from Applications Using Python medium
- Automatically Download the Latest Software Release from GitHub with Python medium
- Automatically Generate Charts from CSV Files with One Command medium
- Automatically Generate Hardware Inventory Reports in Python easy
- Automatically Log CPU, RAM, and Disk Usage Every Minute in Python easy
- Batch Rename Hundreds of Files in Python easy
Keep learning
Related tutorials and quizzes for this topic.