Find Sensitive Information in Log Files with Python

Scan log files for emails, IP addresses, API keys, and passwords using regular expressions in Python.

Medium | Python 3.9+ | Jun 28, 2026 | Automation & scripting

Python code

import re
import os
from pathlib import Path

def find_sensitive_info(log_path):
    """Scans log files for patterns like emails, IPs, API keys, and passwords."""
    patterns = {
        'Email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        'IP Address': r'\b(?:\d{1,3}\.){3}\d{1,3}\b',
        # Capture groups make re.findall return only the secret value, not the "key = " prefix
        'API Key': r'(?i)api[ _-]?key\s*[:=]\s*[\'"]?([a-zA-Z0-9_-]{16,})',
        'Password': r'(?i)(?:password|passwd|pwd)\s*[:=]\s*[\'"]?([^\s\'"]+)'
    }
    findings = []
    log_file = Path(log_path)
    
    if not log_file.exists():
        print(f"File {log_path} not found.")
        return []  # keep the return type consistent with the success path
    
    with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:
        for line_num, line in enumerate(f, 1):
            for label, pattern in patterns.items():
                matches = re.findall(pattern, line)
                for match in matches:
                    findings.append((line_num, label, match))
    
    if findings:
        print(f"Sensitive information found in {log_path}:")
        for line_num, label, value in findings:
            print(f"  Line {line_num}: [{label}] {value}")
    else:
        print(f"No sensitive information found in {log_path}.")
    
    return findings

if __name__ == "__main__":
    # Example usage with a sample log file
    sample_log = "sample.log"
    with open(sample_log, 'w') as f:
        f.write("User email: john.doe@example.com\n")
        f.write("Server IP: 192.168.1.1\n")
        f.write("API key = sk-abc123def456ghi789\n")
        f.write("password: super_secret_123\n")
        f.write("Normal log entry at 2023-10-05\n")
    find_sensitive_info(sample_log)
    os.remove(sample_log)  # Clean up test file

Output

stdout
Sensitive information found in sample.log:
  Line 1: [Email] john.doe@example.com
  Line 2: [IP Address] 192.168.1.1
  Line 3: [API Key] sk-abc123def456ghi789
  Line 4: [Password] super_secret_123

How it works

This script uses Python's re module to define one regular expression per category of sensitive data: emails, IP addresses, API keys, and passwords. It reads the log file line by line, runs every pattern against each line with re.findall, and appends a (line number, label, match) tuple to a list for each hit. pathlib.Path wraps the path so the script can check that the file exists before opening it. Because the findings are returned as a list, the caller can process them further, for example to redact values or build a report. The example writes a small sample log, scans it, and deletes it afterwards.
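
The redaction step mentioned above could look roughly like the following. This is a minimal sketch, not part of the original script: `mask_value` and `format_finding` are hypothetical helpers, and keeping the last four characters visible is an arbitrary choice.

```python
def mask_value(value, visible=4):
    """Hypothetical helper: keep the last `visible` characters and mask the rest."""
    if len(value) <= visible:
        # Very short values are masked completely
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def format_finding(line_num, label, value):
    """Format one (line_num, label, value) tuple without printing the full value."""
    return f"  Line {line_num}: [{label}] {mask_value(value)}"


# Only the last four characters of the password remain readable
print(format_finding(4, "Password", "super_secret_123"))
```

Swapping the print call in the report loop for something like `format_finding` would keep the report useful for locating a leak without copying the secret into yet another place.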

Common mistakes

  • Forgetting to handle files with unexpected encodings or binary content (the script passes errors='ignore', which silently drops undecodable bytes)
  • Using overly broad regex that flags false positives like version numbers as IPs
  • Not masking or redacting matched values when outputting results
  • Hardcoding file paths instead of accepting command-line arguments
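
Two of the mistakes above, over-broad IP matching and hardcoded paths, can be reduced with the standard library alone. The sketch below is one possible approach under stated assumptions: `valid_ips` and `parse_args` are hypothetical names, and the check only rejects candidates with out-of-range octets. A four-part version string such as 1.2.3.4 is still a valid IPv4 address and would still be reported.

```python
import argparse
import ipaddress
import re

# Same IP-shaped pattern as in the script above
IP_CANDIDATE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')


def valid_ips(line):
    """Return the IP-shaped matches in `line` that parse as real IPv4 addresses."""
    results = []
    for candidate in IP_CANDIDATE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # an octet is out of range, e.g. 999.300.1.2
        results.append(candidate)
    return results


def parse_args(argv=None):
    """Read the log path from the command line instead of hardcoding it."""
    parser = argparse.ArgumentParser(description="Scan a log file for sensitive data.")
    parser.add_argument("log_path", help="path to the log file to scan")
    return parser.parse_args(argv)
```

With this in place, the main block could call `parse_args().log_path` and pass the result to the scanning function.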

Variations

  1. Load patterns from an external JSON or YAML config file for easy updates
  2. Use `os.walk` to recursively scan all `.log` files in a directory
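
Both variations can be combined in a few lines. The following is a sketch only: the function names and the JSON layout (a flat object mapping each label to a regex string) are assumptions, not something the original script defines.

```python
import json
import os
import re


def load_patterns(config_path):
    """Load {label: regex} pairs from a JSON file and compile them."""
    with open(config_path, 'r', encoding='utf-8') as f:
        raw = json.load(f)
    return {label: re.compile(pattern) for label, pattern in raw.items()}


def iter_log_files(root_dir):
    """Yield the path of every .log file under root_dir, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.endswith('.log'):
                yield os.path.join(dirpath, name)


def scan_tree(root_dir, patterns):
    """Return (path, line_num, label, match) tuples for every hit under root_dir."""
    findings = []
    for path in iter_log_files(root_dir):
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            for line_num, line in enumerate(f, 1):
                for label, pattern in patterns.items():
                    for match in pattern.findall(line):
                        findings.append((path, line_num, label, match))
    return findings
```

A call such as `scan_tree("logs", load_patterns("patterns.json"))` would then scan a whole directory; both file names here are placeholders.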

Real-world use cases

  • Auditing CI/CD build logs for accidentally committed credentials before archiving.
  • Running a security scan on production log dumps to detect exposed API keys or passwords.
  • Setting up a cron job that alerts the team whenever sensitive patterns appear in application logs.
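
For the CI/CD and cron use cases, the scanner needs to signal its result through the process exit code so that a pipeline or wrapper script can react. The sketch below shows that idea only. It assumes a single simplified password pattern in place of the full pattern set, and the names `count_hits`, `main`, and `scan_gate.py` are hypothetical.

```python
import re
import sys

# Assumption: one simplified pattern stands in for the tutorial's full pattern set
PASSWORD = re.compile(r'(?i)(?:password|passwd|pwd)\s*[:=]\s*\S+')


def count_hits(lines):
    """Count the lines that contain a password-like assignment."""
    return sum(1 for line in lines if PASSWORD.search(line))


def main(argv):
    """Return 0 if the log is clean, 1 if something was found, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: scan_gate.py LOGFILE", file=sys.stderr)
        return 2
    with open(argv[1], 'r', encoding='utf-8', errors='ignore') as f:
        hits = count_hits(f)
    if hits:
        print(f"{hits} line(s) with sensitive patterns in {argv[1]}", file=sys.stderr)
        return 1
    return 0

# In a real script, finish with: sys.exit(main(sys.argv))
```

A CI step or cron wrapper can then treat a non-zero exit status as the trigger for failing the build or sending an alert.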
