Python Regex Made Simple: A Beginner's Guide to Regular Expressions

Learn Python regular expressions from scratch — master patterns, character classes, quantifiers, groups, anchors, and real-world text extraction with the re module.

June 2026 · 8 min read

Holy smokes, regex looks like someone fell asleep on their keyboard — but once you crack the code, it's one of Python's most powerful tools for slicing, dicing, and searching through text.

What's the Big Deal?

Regular expressions (regex) let you define patterns instead of exact matches. Want to find every email in a 10,000-line log file? Regex does it in one line. Need to validate that a user typed a valid phone number? Regex handles it without a dozen if statements.

Python's re module gives you this superpower, and mastering it separates "I can write Python" from "I can wrangle data with Python."

The Basics: Your First Pattern

import re

text = "My email is hello@example.com"
pattern = r"hello@example\.com"
match = re.search(pattern, text)

The r before the string is crucial — it tells Python to treat backslashes literally. Without it, \n becomes a newline instead of a literal backslash-n character.

re.search() scans the entire string for the first match. If it finds one, you get a match object. If not, None.
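Here is a minimal sketch of both outcomes, reusing the text from above. The second address is made up purely to force a miss.

```python
import re

text = "My email is hello@example.com"

# A successful search returns a match object.
match = re.search(r"hello@example\.com", text)
if match:
    found = match.group()       # the text that matched
    start, end = match.span()   # where it sits inside the string

# A failed search returns None, so check before calling .group().
missing = re.search(r"goodbye@example\.com", text)

print(found, start, end, missing)
```

Calling .group() on None raises AttributeError, which is why the `if match:` check is worth the extra line.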

Character Classes: The Power Moves

Instead of matching literal characters, you can match types of characters:

\d — any digit (0-9)
\w — any word character (letters, digits, underscore)
\s — any whitespace (space, tab, newline)
. — any character except newline

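Put them in square brackets for custom sets:

- [aeiou] — matches any lowercase vowel
- [0-9] — any ASCII digit (\d is slightly broader: on Python 3 strings it also matches digits from other scripts unless you pass re.ASCII)
- [^0-9] — NOT a digit (the ^ inside brackets means "not")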
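To see the classes in action, here is a small sketch using re.findall(), which returns every match as a list. The sample strings are invented for illustration.

```python
import re

text = "Order 66 shipped to Bay 7 on 2024-01-15"

digits = re.findall(r"\d", text)             # one list item per digit
word_chars = re.findall(r"\w", "hi_5!")      # letters, digits, underscore
spaces = re.findall(r"\s", "a b\tc")         # the space and the tab
vowels = re.findall(r"[aeiou]", "regex")     # custom set
non_digits = re.findall(r"[^0-9]", "a1b2")   # negated set

print("".join(digits), word_chars, spaces, vowels, non_digits)
```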

Quantifiers: How Many Times?

Patterns by themselves match once. Quantifiers let you say "how many":

* — zero or more times
+ — one or more times
? — zero or one time
{3} — exactly three times
{2,5} — two to five times

Now you can do real work:

# Find a phone number: three digits, dash, three digits, dash, four digits
pattern = r"\d{3}-\d{3}-\d{4}"

# Find an IP address (simplified)
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
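A quick sketch of those two patterns running against some made-up text. The last address is deliberately invalid to show why the IP pattern is called simplified: it checks digit counts, not the 0-255 range.

```python
import re

text = "Call 555-123-4567 from host 192.168.1.10 or 999.999.999.999"

phones = re.findall(r"\d{3}-\d{3}-\d{4}", text)
ips = re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", text)

# ? makes the "u" optional, so both spellings match.
spellings = re.findall(r"colou?r", "color colour")

print(phones, ips, spellings)
```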

Groups: Grab the Good Stuff

Parentheses () create capture groups — they let you extract parts of a match:

text = "Contact: (555) 123-4567"
pattern = r"\((\d{3})\) (\d{3}-\d{4})"
match = re.search(pattern, text)

if match:
    area_code = match.group(1)    # "555"
    number = match.group(2)       # "123-4567"
    full_match = match.group(0)   # "(555) 123-4567"

Groups also let you reuse matched text with backreferences like \1, which is handy for finding duplicates or patterns that repeat.
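As a rough sketch of the duplicate-finding idea, the pattern below captures a word and then requires the same word again via \1. The sample sentence is invented, and re.sub() (not covered above) is used only to show the cleanup step.

```python
import re

text = "This is is a test of the the parser"

# (\w+) captures a word; \1 must repeat exactly what group 1 captured.
doubled = re.findall(r"\b(\w+) \1\b", text)

# Replace each doubled word with a single copy.
cleaned = re.sub(r"\b(\w+) \1\b", r"\1", text)

print(doubled)
print(cleaned)
```

The \b anchors matter here: without them, the "is" at the end of "This" could pair up with the following "is".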

Anchors: Where in the String?

^ — start of string
$ — end of string
\b — word boundary (between word and non-word character)

These prevent partial matches. Want to ensure the whole string is a valid date?

pattern = r"^\d{4}-\d{2}-\d{2}$"  # Exactly "YYYY-MM-DD"

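Without ^ and $, the pattern would also match a date-shaped fragment inside a longer string, such as the middle of "backup-2024-01-15.tar". Keep in mind that the anchors check shape only: "2024-13-99" still matches, so validate the actual month and day separately (datetime.strptime works well for that).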
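A small sketch of the difference anchors make. The helper names looks_like_date and is_real_date are hypothetical, and handing the range check to datetime.strptime is one reasonable approach, not the only one.

```python
import re
from datetime import datetime

DATE_SHAPE = r"^\d{4}-\d{2}-\d{2}$"

def looks_like_date(s):
    """True if the whole string has the YYYY-MM-DD shape."""
    return re.search(DATE_SHAPE, s) is not None

def is_real_date(s):
    """Shape check first, then let datetime reject impossible dates."""
    if not looks_like_date(s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d")
    except ValueError:
        return False
    return True

# Unanchored, the same digits match inside a longer string.
unanchored = re.search(r"\d{4}-\d{2}-\d{2}", "backup-2024-01-15.tar")

print(looks_like_date("backup-2024-01-15.tar"), unanchored.group())
print(looks_like_date("2024-13-99"), is_real_date("2024-13-99"))
```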

The Three Main Functions

re.search(pattern, string) — finds first match anywhere
re.match(pattern, string) — finds match only at the start of string
re.findall(pattern, string) — returns a list of all non-overlapping matches (if the pattern contains capture groups, you get the group contents instead of the whole match)

text = "cats and dogs and cats and birds"
cats = re.findall(r"cats", text)  # ['cats', 'cats']

For advanced needs, re.finditer() yields match objects one at a time, which saves memory when a string contains many matches and gives you the position and groups of each one.
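Here is a side-by-side sketch of the functions on the same sentence, including what re.findall() returns once the pattern contains groups.

```python
import re

text = "cats and dogs and cats and birds"

anywhere = re.search(r"dogs", text)     # found, even mid-string
at_start = re.match(r"dogs", text)      # None: the string starts with "cats"
starts_ok = re.match(r"cats", text)     # found at position 0

# finditer gives match objects, so positions are available.
positions = [m.start() for m in re.finditer(r"cats", text)]

# With two groups, findall returns tuples of the captured parts.
pairs = re.findall(r"(\w+) and (\w+)", text)

print(anywhere.start(), at_start, starts_ok.group(), positions, pairs)
```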

Raw Strings: Your Best Friend

Always use raw strings (r"...") for your patterns. Here's why:

# Without raw string — nightmare
pattern = "\\d{3}-\\d{3}-\\d{4}"

# With raw string — sane
pattern = r"\d{3}-\d{3}-\d{4}"

Each backslash in a normal string needs to be escaped. Regex already uses backslashes heavily. Raw strings save your sanity.
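One way to convince yourself: the sketch below compares the two spellings and shows a classic trap, where "\b" in a normal string is a backspace character rather than the regex word boundary.

```python
import re

# Both spellings produce the same two-character pattern text.
same = "\\d" == r"\d"

# "\b" is ONE character (backspace); r"\b" is TWO (backslash, b).
plain_len = len("\b")
raw_len = len(r"\b")

# The non-raw pattern looks for literal backspaces and finds nothing.
broken = re.findall("\bcat\b", "cat concat")
working = re.findall(r"\bcat\b", "cat concat")

print(same, plain_len, raw_len, broken, working)
```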

Common Pitfalls

Greedy vs lazy matching: by default, * and + match as much as possible (greedy). Add ? to make them lazy:

text = "<h1>Title</h1><p>Body</p>"
greedy = re.findall(r"<.*>", text)   # ['<h1>Title</h1><p>Body</p>']
lazy = re.findall(r"<.*?>", text)    # ['<h1>', '</h1>', '<p>', '</p>']

Escaping special characters: If you want to match a literal . or *, put a backslash before it: \., \*.
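A short sketch of what goes wrong without escaping, plus re.escape(), which adds the backslashes for you when the text to match comes from a variable. The sample strings are invented.

```python
import re

text = "Price: 3.50 (was 3x50)"

loose = re.findall(r"3.50", text)    # . matches any character, so "3x50" sneaks in
strict = re.findall(r"3\.50", text)  # \. matches only a literal dot

# re.escape() backslash-escapes the special characters in a string.
needle = "1+1=2"
literal = re.findall(re.escape(needle), "1+1=2 and 11=2")
unescaped = re.findall(needle, "1+1=2 and 11=2")  # here + means "one or more 1s"

print(loose, strict, literal, unescaped)
```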

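Compiled patterns for reuse: If you're applying the same pattern many times, compile it once. The re module already caches recently used patterns, so the speedup is usually modest, but a compiled pattern gives you a named object you can pass around:

pattern = re.compile(r"\d{3}-\d{3}-\d{4}")
results = pattern.findall(big_text)  # big_text is a str, e.g. the contents of a file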
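As a sketch of the reuse pattern, the compiled object below is applied to several lines in a loop. The name PHONE and the sample lines are assumptions made for the example.

```python
import re

# Compile once, then call methods on the pattern object.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

lines = [
    "Alice: 555-123-4567",
    "Bob: no number on file",
    "Carol: 555-987-6543 (mobile)",
]

# Compiled patterns have the same search/match/findall/finditer methods.
numbers = [m.group() for line in lines if (m := PHONE.search(line))]

print(numbers)
```

The walrus operator (:=) needs Python 3.8 or later; a plain for loop with an if check does the same job on older versions.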

Real-World Example: Extracting Data from Logs

import re

log = """
ERROR 2024-01-15 10:23:45: Disk full on /dev/sda1
WARNING 2024-01-15 10:24:01: CPU usage at 92%
ERROR 2024-01-15 10:25:12: Connection timeout to database
"""

pattern = r"(ERROR|WARNING) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.+)"
for match in re.finditer(pattern, log):
    level, timestamp, message = match.groups()
    print(f"[{level}] {timestamp} - {message}")

Output:

[ERROR] 2024-01-15 10:23:45 - Disk full on /dev/sda1
[WARNING] 2024-01-15 10:24:01 - CPU usage at 92%
[ERROR] 2024-01-15 10:25:12 - Connection timeout to database
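Once the fields are captured, ordinary Python takes over. The sketch below reuses the same log and pattern to count entries per level and collect the error messages; using collections.Counter is just one convenient choice.

```python
import re
from collections import Counter

log = """
ERROR 2024-01-15 10:23:45: Disk full on /dev/sda1
WARNING 2024-01-15 10:24:01: CPU usage at 92%
ERROR 2024-01-15 10:25:12: Connection timeout to database
"""

pattern = re.compile(
    r"(ERROR|WARNING) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.+)"
)

# Group 1 is the level, group 3 is the message.
counts = Counter(m.group(1) for m in pattern.finditer(log))
errors = [m.group(3) for m in pattern.finditer(log) if m.group(1) == "ERROR"]

print(counts)
print(errors)
```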

When Not to Use Regex

Regex is powerful but not always the right tool:

- Parsing complex HTML/XML? Use BeautifulSoup or lxml
- Processing JSON? Use the json module
- Matching arbitrarily nested structures (like parentheses inside parentheses)? Standard regex can't do it, and Python's re has no recursion feature to work around that, so use a counter or a real parser
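For the nested-parentheses case, a few lines of plain Python usually beat any pattern. This is a minimal sketch of the counter approach; the function name is_balanced is hypothetical.

```python
def is_balanced(s):
    """Return True if every ( has a matching ) in the right order."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a closer appeared before its opener
                return False
    return depth == 0           # nothing left open

print(is_balanced("(a (b (c)))"))
print(is_balanced("(a (b)"))
print(is_balanced(")("))
```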

Quick Reference Cheat Sheet

Pattern	Matches
`\d`	Digit
`\w`	Word character (letter, digit, underscore)
`\s`	Whitespace
`.`	Any character (except newline)
`*`	Zero or more
`+`	One or more
`?`	Zero or one
`{n,m}`	Between n and m times
`^`	Start of string
`$`	End of string
`|`	OR (alternation)
`()`	Capturing group
`(?:)`	Non-capturing group
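The last three rows are easiest to understand together. In this sketch, the only difference between the two patterns is whether the group captures, and that changes what re.findall() hands back.

```python
import re

text = "gray grey"

# Capturing group: findall returns just the captured letter.
captured = re.findall(r"gr(a|e)y", text)

# Non-capturing group: same grouping, but findall returns the whole match.
whole = re.findall(r"gr(?:a|e)y", text)

print(captured, whole)
```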

Regex looks intimidating, but start with small patterns, test them one piece at a time, and soon you'll be matching anything — without the headache.
