Stop Wasting Memory: Why Python Generators Will Change How You Write Code
Learn how Python generators and lazy evaluation let you process massive datasets with minimal memory, using practical examples like infinite sequences and file processing pipelines.
June 2026 · 8 min read
You're building a tool to process a 10GB log file, and your laptop just froze. Sound familiar? The problem isn't your machine — it's how you think about data. Python generators and lazy evaluation are the mental shift that lets you handle massive datasets with the memory footprint of a postage stamp.
What's the Big Deal About Lazy Evaluation?
Let's start with a concrete example. Imagine you need to process every line in a 10GB file. The naive approach:
# This will crash or swap like crazy
with open("huge_log.txt") as f:
    all_lines = f.readlines()
for line in all_lines:
    process(line)
readlines() loads the entire file into memory. On a 10GB file, that's 10GB of RAM gone — plus Python's overhead. Instead, generators give you:
# Memory usage: one line at a time
def read_lines(filename):
    with open(filename) as f:
        for line in f:
            yield line
for line in read_lines("huge_log.txt"):
    process(line)
The magic? yield pauses execution, returns a value, and remembers where it left off. Each iteration pulls only what's needed — the rest of the file never occupies memory.
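One way to see this laziness is to record when each line is actually read. The sketch below is only an illustration: `read_lines_logged` and the `events` list are hypothetical names, and a three-line temporary file stands in for the huge log.

```python
import os
import tempfile

events = []  # Records the order in which things happen

def read_lines_logged(filename):
    """Same shape as read_lines, but notes each line as it is produced."""
    with open(filename) as f:
        for line in f:
            events.append(f"read {line.strip()}")
            yield line

# Stand-in data: a tiny temporary file instead of a 10GB log
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    path = tmp.name

gen = read_lines_logged(path)
events.append("generator created")  # Nothing has been read at this point

for line in gen:
    events.append(f"processed {line.strip()}")

os.remove(path)
print(events)
```

The "read" and "processed" entries alternate, which suggests that each line is handed over and handled before the next one is requested.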
The Generator Mechanics That Matter
A generator isn't just a function with yield — it's a state machine. Here's what Python does under the hood:
def counter(n):
    i = 0
    while i < n:
        yield i
        i += 1
gen = counter(3)
print(next(gen))  # 0: function runs until first yield
print(next(gen))  # 1: resumes after yield, continues loop
print(next(gen))  # 2
print(next(gen))  # Raises StopIteration: the generator is exhausted
Think of each next() call as: "Run the function until you hit yield, hand me that value, then freeze everything — local variables, execution point, everything."
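The frozen state can be inspected directly. The following sketch uses `inspect.getgeneratorstate` and the generator's `gi_frame` attribute from the standard library; the variable names `states` and `frame_locals` are just for illustration.

```python
import inspect

def counter(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen = counter(2)
states = [inspect.getgeneratorstate(gen)]      # Body has not started yet

next(gen)
states.append(inspect.getgeneratorstate(gen))  # Paused at a yield
frame_locals = dict(gen.gi_frame.f_locals)     # Snapshot of the frozen locals

next(gen)
try:
    next(gen)  # Loop condition fails, so the function returns
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(gen))

print(states)        # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
print(frame_locals)  # {'n': 2, 'i': 0}
```

A `for` loop does the same thing as these `next()` calls and catches the final `StopIteration` for you.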
Generator Expressions: The One-Liner Power Move
If you've used list comprehensions, generator expressions will feel familiar but dangerously powerful:
# List comprehension — computes all values upfront
squares = [x**2 for x in range(10_000_000)] # 80MB+ memory
# Generator expression — computes on demand
squares = (x**2 for x in range(10_000_000)) # Basically zero memory
# Usage is identical for iteration
for sq in squares:
    print(sq)
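Memory difference? Roughly 80MB for the list's pointer array alone (the integer objects add more on top) versus a generator object of a few hundred bytes at most, whatever the size of the range. Generator expressions use parentheses () instead of brackets [], and they're memory-efficient by default. The trade-off: you can only iterate once, you can't index into them, and len() doesn't work on them.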
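Both the size gap and the single-pass limitation are easy to check. This is a rough sketch, not a benchmark: `sys.getsizeof` reports only the container itself, exact byte counts vary by Python version, and `n` is kept much smaller than 10 million so the example runs quickly.

```python
import sys

n = 100_000  # Assumption: a smaller range than the article's, for speed

squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

list_bytes = sys.getsizeof(squares_list)  # Pointer array only, not the ints
gen_bytes = sys.getsizeof(squares_gen)    # Does not grow with n

print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

first_pass = sum(squares_gen)   # Consumes the generator
second_pass = sum(squares_gen)  # Already exhausted, so nothing is left

print(first_pass == sum(squares_list), second_pass)  # True 0
```

If you need a second pass, either rebuild the generator expression or materialize the values into a list once.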
Real-World Patterns: When Generators Save Your Day
1. Infinite Sequences
Lists can't be infinite. Generators can — they produce values on-the-fly forever:
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
fib = fibonacci()
first_20 = [next(fib) for _ in range(20)]  # [0, 1, 1, 2, 3, 5...]
No upper bound, no memory growth. You take what you need.
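For taking a bounded slice of an infinite generator, the standard library's `itertools.islice` and `itertools.takewhile` are a common alternative to calling `next()` in a comprehension. A short sketch, with `first_10` and `below_100` as illustrative names:

```python
from itertools import islice, takewhile

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take a fixed number of items
first_10 = list(islice(fibonacci(), 10))

# Take items until a condition stops holding
below_100 = list(takewhile(lambda x: x < 100, fibonacci()))

print(first_10)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Never call `list()` on the infinite generator itself; without a bound like the ones above it will run until memory is gone.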
2. Pipelining Data
Build processing chains that never create intermediate lists:
def read_ints(filepath):
    with open(filepath) as f:
        for line in f:
            yield int(line.strip())
def filter_even(stream):
    for x in stream:
        if x % 2 == 0:
            yield x
def multiply_by_10(stream):
    for x in stream:
        yield x * 10
# Pipeline: memory usage stays tiny
pipeline = multiply_by_10(filter_even(read_ints("numbers.txt")))
for result in pipeline:
    print(result)
Each generator in the chain processes one element and passes it forward. No full copies, no buffer bloat.
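To see the one-element-at-a-time flow, the sketch below logs what each stage does. It swaps the file reader for a hypothetical in-memory `source` generator and adds a `log` list, purely so the ordering is visible.

```python
log = []  # Order of operations across all stages

def source(values):
    """Stand-in for read_ints: yields from an in-memory list."""
    for v in values:
        log.append(f"source {v}")
        yield v

def filter_even(stream):
    for x in stream:
        if x % 2 == 0:
            log.append(f"keep {x}")
            yield x

def multiply_by_10(stream):
    for x in stream:
        log.append(f"scale {x}")
        yield x * 10

pipeline = multiply_by_10(filter_even(source([1, 2, 3, 4])))
results = list(pipeline)

print(results)  # [20, 40]
print(log)
```

The log shows 2 being sourced, kept, and scaled before 3 is even read, so no stage ever holds the full dataset.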
3. Lazy File Processing
The most common win: handling files too large for memory.
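def tail(filename, n=10, buffer_size=1024):
    """Read last n lines of a huge file without loading all of it."""
    if n <= 0:
        return []
    with open(filename, "rb") as f:  # Binary mode allows seeking backwards by bytes
        f.seek(0, 2)  # Seek to end
        pos = f.tell()
        data = b""
        # Read chunks backwards until the buffer holds more than n line breaks
        while pos > 0 and data.count(b"\n") <= n:
            step = min(buffer_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Assumes UTF-8 text; the first buffered line may be partial, so keep only the last n
    return [line.decode("utf-8") for line in data.splitlines()[-n:]]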
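When a single sequential pass over the file is acceptable, a bounded `collections.deque` gives a much shorter way to get the last lines. The sketch below is an alternative, not the seek-based approach above: it still reads the whole file once, but it relies on lazy line iteration and never holds more than `n` lines. `tail_simple` is a hypothetical name, and a temporary file stands in for a real log.

```python
import os
import tempfile
from collections import deque

def tail_simple(filename, n=10):
    """Return the last n lines while holding at most n lines in memory."""
    with open(filename) as f:
        # deque pulls lines lazily from f and drops the oldest beyond maxlen
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]

# Stand-in data: a small temporary file instead of a real huge log
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(100))
    path = tmp.name

last_three = tail_simple(path, n=3)
os.remove(path)
print(last_three)  # ['line 97', 'line 98', 'line 99']
```

This trades extra reading time for simplicity: memory stays flat, but every byte of the file is still read.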
When Not to Use Generators
Generators aren't free. They have overhead per next() call. For small datasets (under a few thousand items), a list is usually faster and simpler.
Key trade-offs:
| Scenario | Choose |
|---|---|
| Dataset fits in memory, accessed multiple times | List / tuple |
| Large dataset, single pass | Generator |
| Need random access by index | List / array |
| Infinite or unknown length | Generator |
| Debugging: need to inspect values | List (easier) |
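The per-item overhead on small inputs can be measured with `timeit`. This is a rough sketch rather than a rigorous benchmark: the input size and repeat count are arbitrary assumptions, and the timings depend on your machine and Python version, so no expected numbers are shown.

```python
import timeit

data = list(range(1_000))  # Assumption: a "small" dataset

def with_list():
    # Builds the full list first, then sums it
    return sum([x * 2 for x in data])

def with_generator():
    # Feeds sum() one value at a time
    return sum(x * 2 for x in data)

list_time = timeit.timeit(with_list, number=2_000)
gen_time = timeit.timeit(with_generator, number=2_000)

print(f"list comprehension:   {list_time:.3f}s")
print(f"generator expression: {gen_time:.3f}s")
```

Both functions return the same result; only the time and peak memory differ, so measure on your own data before picking one for speed.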
The yield from Shortcut
Python 3.3 introduced yield from (PEP 380) for delegating to sub-generators:
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item
nested = [1, [2, [3, 4], 5], 6]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 6]
Same logic as manual iteration, but cleaner and a bit faster.
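One thing yield from does that a plain inner loop cannot do as neatly is capture the sub-generator's return value. A small sketch, with `chunk_total` and `report` as hypothetical names:

```python
def chunk_total(values):
    """Sub-generator: yields each value, then returns their sum."""
    total = 0
    for v in values:
        yield v
        total += v
    return total  # Becomes the value of the `yield from` expression

def report(chunks):
    grand_total = 0
    for chunk in chunks:
        # Yields everything from the sub-generator, then receives its return value
        subtotal = yield from chunk_total(chunk)
        grand_total += subtotal
    yield f"total={grand_total}"

output = list(report([[1, 2], [3, 4, 5]]))
print(output)  # [1, 2, 3, 4, 5, 'total=15']
```

With a manual `for` loop you would have to catch `StopIteration` yourself and read its `value` attribute to get the same result.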
Generator Sending: Two-Way Communication
Generators can receive values too, using .send():
def running_average():
    total = 0
    count = 0
    avg = None
    while True:
        value = yield avg  # Receives value via send()
        if value is not None:
            total += value
            count += 1
            avg = total / count
avg_gen = running_average()
next(avg_gen)  # Initialize: run to first yield
print(avg_gen.send(10))  # 10.0
print(avg_gen.send(20))  # 15.0
print(avg_gen.send(30))  # 20.0
This send() protocol (PEP 342) is what Python's generator-based coroutines were built on, and async/await grew out of the same machinery, but that's a rabbit hole for another article.
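Forgetting the initial `next()` call is a common mistake, because sending a non-None value to a generator that has not started raises `TypeError`. One possible fix is a small decorator that does the priming for you. In the sketch below, `primed` is a hypothetical helper, not something from the standard library.

```python
import functools

def primed(gen_func):
    """Hypothetical helper: advance a new generator to its first yield."""
    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        next(gen)  # Run to the first yield so send() works right away
        return gen
    return wrapper

@primed
def running_average():
    total = 0
    count = 0
    avg = None
    while True:
        value = yield avg
        if value is not None:
            total += value
            count += 1
            avg = total / count

avg_gen = running_average()
results = [avg_gen.send(v) for v in (10, 20, 30)]
avg_gen.close()  # Stop the infinite loop cleanly
print(results)   # [10.0, 15.0, 20.0]

# Without priming, the first send() with a real value fails
raw = running_average.__wrapped__()  # The undecorated generator function
try:
    raw.send(10)
    unprimed_error = None
except TypeError as exc:
    unprimed_error = exc
print(type(unprimed_error).__name__)  # TypeError
```

Calling `close()` when you are done lets the generator clean up instead of staying suspended.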
The Bottom Line
Generators aren't about being clever; they're about working within constraints. Many large-scale data tools you've used (Spark's lazy transformations, database cursors, pandas' chunked readers) rely on lazy evaluation somewhere. Learning to think this way makes you a better programmer, not just a Python developer.
Next time you write [x for x in something], ask yourself: do I really need all of this right now? If not, switch to () and save your RAM for what matters.