Python
JSON in Python: Beyond json.loads() — Performance, Streaming, and Alternatives
Learn advanced JSON handling in Python: custom encoders, high-speed libraries like orjson and ujson, streaming for large files, and when to use MessagePack or Protobuf instead.
June 2026 · 8 min read · 1 views · 0 hearts
Advertisement
JSON in Python: More Than Just json.loads()
If you've written Python for more than a week, you've probably used json.dumps() and json.loads() without a second thought. They work. They're simple. But JSON processing in Python goes much deeper — and if you're doing anything beyond basic dictionary serialization, you're leaving performance and flexibility on the table.
Let's dig into what's actually happening when Python touches JSON, and how to do it smarter.
The Standard Library: json Under the Hood
Python's built-in json module is battle-tested and perfectly fine for 90% of use cases. But here's what most people don't realize:
json.loads() doesn't just parse text — it builds Python objects recursively. That means every nested dictionary, list, string, and number is allocated in memory. For small payloads, it's instant. For a 500MB JSON log file? That's when you feel the pain.
import json
# This is fine
data = json.loads('{"name": "Alice", "score": 42}')
# This might crash on a large file
with open("massive_log.json") as f:
data = json.load(f) # Tries to load everything into RAM
Custom Encoders and Decoders: Don't Fight the JSON Format
One common rookie mistake: storing Python datetime objects as strings, then manually parsing them back. Instead, subclass json.JSONEncoder and json.JSONDecoder.
from datetime import datetime
import json
class DateTimeEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime):
return obj.isoformat()
return super().default(obj)
class DateTimeDecoder(json.JSONDecoder):
def __init__(self):
super().__init__(object_hook=self.dict_to_object)
def dict_to_object(self, d):
for key, value in d.items():
try:
d[key] = datetime.fromisoformat(value)
except (TypeError, ValueError):
pass
return d
# Usage
original = {"timestamp": datetime.now()}
encoded = json.dumps(original, cls=DateTimeEncoder)
decoded = json.loads(encoded, cls=DateTimeDecoder)
This keeps serialization clean without ad-hoc string manipulation everywhere.
When json Isn't Enough: Performance Alternatives
orjson: The Speed Demon
If you're processing thousands of JSON payloads per second (think API responses, real-time data pipelines), the standard library starts showing its age. orjson is a Rust-based replacement that's typically 3-6x faster than Python's built-in json.
import orjson
# orjson returns bytes by default
data = orjson.dumps({"key": "value"}) # b'{"key":"value"}'
parsed = orjson.loads(data)
# For file I/O, it's a straight swap
with open("data.json", "wb") as f:
f.write(orjson.dumps({"hello": "world"}))
Caveats:
- No sort_keys parameter (it sorts alphabetically by default)
- default callbacks work differently — you'll need to handle custom types before serialization
- Only Python 3.8+
ujson: The Old Reliable
Before orjson existed, ujson (UltraJSON) was the go-to. It's still fast and widely compatible. Downside: it can silently mangle edge cases with NaN or Infinity.
import ujson
# Almost identical API to standard json
output = ujson.dumps({"data": [1, 2, 3]})
# ujson loads directly from file without full buffering
with open("large.json") as f:
parsed = ujson.load(f)
When to choose:
- Use orjson for raw speed and correctness with binary data
- Use ujson for compatibility with older Python versions
- Stick with standard json when you need maximum reliability or custom serialization hooks
Serialization Beyond JSON
JSON isn't always the right format. Here's when to consider alternatives:
| Format | Best For | Python Library |
|---|---|---|
| MessagePack | Compact binary, faster than JSON | msgpack |
| Protocol Buffers | Strict schemas, high performance | protobuf |
| YAML | Human-readable config files | PyYAML or ruamel.yaml |
| Pickle | Python-only interprocess communication | Built-in pickle |
Quick example with MessagePack:
import msgpack
# Binary serialization — smaller and faster than JSON
packed = msgpack.packb({"name": "Alice", "age": 30})
# packed is ~12 bytes vs ~27 bytes as JSON
unpacked = msgpack.unpackb(packed)
Pickle warning: Never unpickle data from untrusted sources. It can execute arbitrary code. Use JSON or MessagePack for external data.
Real-World Patterns That Save Headaches
Streaming Large JSON Lines (NDJSON)
If you're processing one JSON object per line (common in log files), don't load the whole file:
import ijson
# ijson streams JSON incrementally — no full parse
with open("giant_ndjson_file.json") as f:
objects = ijson.items(f, '') # '' means top-level items
for obj in objects:
process(obj) # Each object is parsed on-demand
This is a lifesaver when your JSON can't fit in memory.
Schema Validation with Pydantic
Instead of manually validating JSON structures, use Pydantic models. They also serialize/deserialize automatically:
from pydantic import BaseModel
from typing import List
from datetime import datetime
class User(BaseModel):
name: str
scores: List[int]
created_at: datetime
# Parse and validate in one step
raw = '{"name": "Bob", "scores": [95, 87, 92], "created_at": "2024-03-15T10:30:00"}'
user = User.model_validate_json(raw)
# user.name == "Bob", user.created_at is a datetime object
# Serialize back to JSON (with datetime handling built-in)
print(user.model_dump_json())
Caching Serialized Objects
If you repeatedly serialize the same data, cache the result:
from functools import lru_cache
import json
@lru_cache(maxsize=128)
def serialize_template(key, data_tuple):
# Data is passed as tuple to be hashable
return json.dumps(dict(data_tuple))
# Use it
cache_key = "user_profile_123"
profile_data = (("name", "Alice"), ("age", 30))
json_output = serialize_template(cache_key, profile_data)
The Bottom Line
JSON processing in Python isn't just about knowing json.loads(). It's about:
- Choosing the right tool — standard
jsonfor simplicity,orjsonfor speed, streaming parsers for memory management - Customizing serialization — datetime objects, enums, and custom classes don't need to be manual
- Knowing when JSON isn't the answer — MessagePack, Protobuf, or even plain binary can be better for performance-critical paths
Next time you reach for json.dumps(), ask yourself: "Is this the most efficient way for my use case?" The answer might surprise you.
Advertisement
Comments
Questions, corrections, and tips stay visible for everyone reading this page.
Join the discussion
No comments yet
Be the first to leave a note — it helps the next reader.