Python

JSON in Python: Beyond json.loads() — Performance, Streaming, and Alternatives

Learn advanced JSON handling in Python: custom encoders, high-speed libraries like orjson and ujson, streaming for large files, and when to use MessagePack or Protobuf instead.

June 2026 · 8 min read · 1 views · 0 hearts

Try in editor Tutorial catalog

JSON in Python: More Than Just `json.loads()`

If you've written Python for more than a week, you've probably used json.dumps() and json.loads() without a second thought. They work. They're simple. But JSON processing in Python goes much deeper — and if you're doing anything beyond basic dictionary serialization, you're leaving performance and flexibility on the table.

Let's dig into what's actually happening when Python touches JSON, and how to do it smarter.

The Standard Library: `json` Under the Hood

Python's built-in json module is battle-tested and perfectly fine for 90% of use cases. But here's what most people don't realize:

json.loads() doesn't just parse text — it builds Python objects recursively. That means every nested dictionary, list, string, and number is allocated in memory. For small payloads, it's instant. For a 500MB JSON log file? That's when you feel the pain.

import json

# This is fine
data = json.loads('{"name": "Alice", "score": 42}')

# This might crash on a large file
with open("massive_log.json") as f:
    data = json.load(f)  # Tries to load everything into RAM

Custom Encoders and Decoders: Don't Fight the JSON Format

One common rookie mistake: storing Python datetime objects as strings, then manually parsing them back. Instead, subclass json.JSONEncoder and json.JSONDecoder.

from datetime import datetime
import json

class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

class DateTimeDecoder(json.JSONDecoder):
    def __init__(self):
        super().__init__(object_hook=self.dict_to_object)

    def dict_to_object(self, d):
        for key, value in d.items():
            try:
                d[key] = datetime.fromisoformat(value)
            except (TypeError, ValueError):
                pass
        return d

# Usage
original = {"timestamp": datetime.now()}
encoded = json.dumps(original, cls=DateTimeEncoder)
decoded = json.loads(encoded, cls=DateTimeDecoder)

This keeps serialization clean without ad-hoc string manipulation everywhere.

When `json` Isn't Enough: Performance Alternatives

`orjson`: The Speed Demon

If you're processing thousands of JSON payloads per second (think API responses, real-time data pipelines), the standard library starts showing its age. orjson is a Rust-based replacement that's typically 3-6x faster than Python's built-in json.

import orjson

# orjson returns bytes by default
data = orjson.dumps({"key": "value"})  # b'{"key":"value"}'
parsed = orjson.loads(data)

# For file I/O, it's a straight swap
with open("data.json", "wb") as f:
    f.write(orjson.dumps({"hello": "world"}))

Caveats: - No sort_keys parameter (it sorts alphabetically by default) - default callbacks work differently — you'll need to handle custom types before serialization - Only Python 3.8+

`ujson`: The Old Reliable

Before orjson existed, ujson (UltraJSON) was the go-to. It's still fast and widely compatible. Downside: it can silently mangle edge cases with NaN or Infinity.

import ujson

# Almost identical API to standard json
output = ujson.dumps({"data": [1, 2, 3]})
# ujson loads directly from file without full buffering
with open("large.json") as f:
    parsed = ujson.load(f)

When to choose: - Use orjson for raw speed and correctness with binary data - Use ujson for compatibility with older Python versions - Stick with standard json when you need maximum reliability or custom serialization hooks

Serialization Beyond JSON

JSON isn't always the right format. Here's when to consider alternatives:

Format	Best For	Python Library
MessagePack	Compact binary, faster than JSON	`msgpack`
Protocol Buffers	Strict schemas, high performance	`protobuf`
YAML	Human-readable config files	`PyYAML` or `ruamel.yaml`
Pickle	Python-only interprocess communication	Built-in `pickle`

Quick example with MessagePack:

import msgpack

# Binary serialization — smaller and faster than JSON
packed = msgpack.packb({"name": "Alice", "age": 30})
# packed is ~12 bytes vs ~27 bytes as JSON
unpacked = msgpack.unpackb(packed)

Pickle warning: Never unpickle data from untrusted sources. It can execute arbitrary code. Use JSON or MessagePack for external data.

Real-World Patterns That Save Headaches

Streaming Large JSON Lines (NDJSON)

If you're processing one JSON object per line (common in log files), don't load the whole file:

import ijson

# ijson streams JSON incrementally — no full parse
with open("giant_ndjson_file.json") as f:
    objects = ijson.items(f, '')  # '' means top-level items
    for obj in objects:
        process(obj)  # Each object is parsed on-demand

This is a lifesaver when your JSON can't fit in memory.

Schema Validation with Pydantic

Instead of manually validating JSON structures, use Pydantic models. They also serialize/deserialize automatically:

from pydantic import BaseModel
from typing import List
from datetime import datetime

class User(BaseModel):
    name: str
    scores: List[int]
    created_at: datetime

# Parse and validate in one step
raw = '{"name": "Bob", "scores": [95, 87, 92], "created_at": "2024-03-15T10:30:00"}'
user = User.model_validate_json(raw)
# user.name == "Bob", user.created_at is a datetime object

# Serialize back to JSON (with datetime handling built-in)
print(user.model_dump_json())

Caching Serialized Objects

If you repeatedly serialize the same data, cache the result:

from functools import lru_cache
import json

@lru_cache(maxsize=128)
def serialize_template(key, data_tuple):
    # Data is passed as tuple to be hashable
    return json.dumps(dict(data_tuple))

# Use it
cache_key = "user_profile_123"
profile_data = (("name", "Alice"), ("age", 30))
json_output = serialize_template(cache_key, profile_data)

The Bottom Line

JSON processing in Python isn't just about knowing json.loads(). It's about:

Choosing the right tool — standard json for simplicity, orjson for speed, streaming parsers for memory management
Customizing serialization — datetime objects, enums, and custom classes don't need to be manual
Knowing when JSON isn't the answer — MessagePack, Protobuf, or even plain binary can be better for performance-critical paths

Next time you reach for json.dumps(), ask yourself: "Is this the most efficient way for my use case?" The answer might surprise you.

Comments

Questions, corrections, and tips stay visible for everyone reading this page.

0 in thread

Join the discussion

No comments yet

Be the first to leave a note — it helps the next reader.

JSON in Python: More Than Just json.loads()

The Standard Library: json Under the Hood

Custom Encoders and Decoders: Don't Fight the JSON Format

When json Isn't Enough: Performance Alternatives

orjson: The Speed Demon

ujson: The Old Reliable

Serialization Beyond JSON

Real-World Patterns That Save Headaches

Streaming Large JSON Lines (NDJSON)

Schema Validation with Pydantic

Caching Serialized Objects

The Bottom Line

Comments

Join the discussion

JSON in Python: More Than Just `json.loads()`

The Standard Library: `json` Under the Hood

When `json` Isn't Enough: Performance Alternatives

`orjson`: The Speed Demon

`ujson`: The Old Reliable