The Technical Evolution of Video Compression Standards

From H.120 to VVC and neural codecs, this article traces the engineering breakthroughs that made streaming 4K and 8K video possible, explaining how each standard halved bitrates while exploiting human vision.

July 2026

You’re streaming a 4K movie on a shaky mobile connection, and it looks flawless. That’s not magic—it’s the quiet triumph of decades of video compression engineering. Every pixel you see has been mathematically gutted, reassembled, and delivered in a fraction of its original size. Here’s how we got from clunky analog signals to AI-powered codecs that squeeze 8K video through a straw.

The Analog Era: Why We Needed Compression

Before digital video, analog signals like NTSC and PAL were inherently wasteful. They transmitted entire frames—line by line—with no regard for redundancy. A static news anchor’s background was sent in full, 30 times per second. The result? Massive bandwidth consumption and grainy, interference-prone pictures.

The first digital compression standards weren’t born from elegance—they were born from necessity. Satellite TV and early video conferencing needed to shove moving images through phone lines and radio waves. The solution was to stop sending everything and start sending only what changed.

H.120: The Awkward First Step

In 1984, the International Telecommunication Union (ITU) released H.120, the first digital video compression standard. It was a clunky hybrid: it used differential pulse-code modulation (DPCM) for intraframe compression and conditional replenishment for interframe differences. The result? Video at 1.544 or 2.048 Mbps, depending on the regional line rate, with picture quality that was barely watchable whenever anything moved.

H.120 was a proof of concept, not a product. It required expensive hardware and produced blocky artifacts. But it proved that digital compression could work—and that the real gains would come from exploiting how human vision actually works.
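
Conditional replenishment is simple enough to sketch in a few lines. The Python below is only an illustration of the idea, not H.120's actual bitstream: the function names, the threshold value, and the tiny frames are all made up for the example. The encoder sends just the pixels that changed noticeably, and the decoder patches them into its copy of the previous frame.

```python
def conditional_replenishment(prev, curr, threshold=8):
    """List (row, col, value) for every pixel that changed by more than threshold."""
    updates = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (old, new) in enumerate(zip(prev_row, curr_row)):
            if abs(new - old) > threshold:
                updates.append((r, c, new))
    return updates


def apply_updates(prev, updates):
    """Decoder side: start from the previous frame and patch in the changes."""
    frame = [row[:] for row in prev]
    for r, c, value in updates:
        frame[r][c] = value
    return frame


# Hypothetical 2x3 frames: one pixel changes a lot, one changes slightly.
prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 12, 10], [10, 90, 10]]
updates = conditional_replenishment(prev_frame, curr_frame)
decoded = apply_updates(prev_frame, updates)
print(len(updates), "of 6 pixels sent")
```

Note that the small change (10 to 12) is never sent, so the decoded frame is not identical to the input. That is the lossy trade-off in miniature.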

H.261: The Block-Based Revolution

The real breakthrough came in 1988 with H.261. This was the first standard to use a block-based hybrid coding approach—a framework that every major codec since has refined.

H.261 split each frame into 16x16 pixel macroblocks. For each block, the encoder decided: should I send this block fresh (intraframe), or just describe how it moved from the previous frame (interframe)? This motion-compensated prediction was the key insight. Instead of retransmitting a blue sky, you just say “the sky moved 3 pixels to the right.”
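
To make motion-compensated prediction concrete, here is a minimal sketch of exhaustive block matching using the sum of absolute differences (SAD). It is an illustration under simplifying assumptions: whole-pixel vectors, a tiny 8x8 frame, and hypothetical function names. Real encoders use faster search strategies and sub-pixel interpolation.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(ref, cur, top, left, size, search_range):
    """Find the (dy, dx) offset into ref that best predicts the block in cur."""
    target = get_block(cur, top, left, size)
    best, best_cost = (0, 0), sad(target, get_block(ref, top, left, size))
    height, width = len(ref), len(ref[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the reference frame
            cost = sad(target, get_block(ref, y, x, size))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


WIDTH, HEIGHT = 8, 8
ref_frame = [[y * WIDTH + x for x in range(WIDTH)] for y in range(HEIGHT)]
# Current frame: the same content shifted 3 pixels to the right (left edge zero-filled).
cur_frame = [[0] * 3 + row[:-3] for row in ref_frame]
vector, cost = motion_search(ref_frame, cur_frame, top=2, left=4, size=2, search_range=3)
print(vector, cost)
```

Because the content moved 3 pixels right, the best match for the block sits 3 pixels to the left in the reference frame, and the residual is zero. The encoder only has to transmit the vector.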

The standard also introduced the discrete cosine transform (DCT)—a mathematical trick that converts spatial pixel data into frequency coefficients. High-frequency details (sharp edges, noise) could be quantized more aggressively, while low-frequency information (smooth gradients) was preserved. This is where “lossy” compression gets its power: you throw away what the eye barely notices.
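
A small sketch shows why the transform helps. The code below applies a one-dimensional 8-point DCT to a smooth row of pixels and quantizes the coefficients with step sizes that grow with frequency. Everything here is illustrative: real codecs use a two-dimensional transform on blocks, and the sample values and step sizes are assumptions chosen for the example.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, s in enumerate(samples)))
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def quantize(coeffs, steps):
    return [round(c / q) for c, q in zip(coeffs, steps)]


def dequantize(levels, steps):
    return [level * q for level, q in zip(levels, steps)]


row = [100, 102, 104, 107, 110, 112, 113, 115]   # a smooth gradient (made-up values)
steps = [4, 8, 12, 16, 24, 32, 48, 64]           # coarser steps for higher frequencies
levels = quantize(dct_1d(row), steps)
restored = [round(v) for v in idct_1d(dequantize(levels, steps))]
print(levels)     # for this smooth input, the six highest-frequency levels are all 0
print(restored)
```

Long runs of zeros are cheap to entropy-code, which is where the bit savings come from. The restored row is close to the original but not identical.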

H.261 ran at p×64 kbps (where p ranged from 1 to 30), making it the backbone of early videoconferencing systems. It wasn’t pretty, but it worked.

MPEG-1 and MPEG-2: The Consumer Explosion

The Moving Picture Experts Group (MPEG) took H.261’s ideas and ran. MPEG-1 (1993) added bidirectional prediction—frames could look forward and backward in time. This allowed for more efficient compression of sequences with slow motion or repeated backgrounds. The result? VCD-quality video at 1.5 Mbps. It was good enough for The Lion King on a CD-ROM.
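
Bidirectional prediction can be illustrated with a fade. In the sketch below, the middle frame of a brightening scene is predicted two ways: from the past frame alone, and from the average of the past and future frames. The pixel values and function names are hypothetical, and real B-frames also apply motion vectors to each reference before averaging.

```python
def residual_energy(actual, predicted):
    """Sum of squared prediction errors; less energy means fewer bits to code."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def bidirectional_prediction(past, future):
    """Average the two references with rounding, as a simple B-frame predictor."""
    return [(p + f + 1) // 2 for p, f in zip(past, future)]


past_frame = [10, 20, 30]      # made-up pixel rows from a scene fading up
current_frame = [20, 30, 40]
future_frame = [30, 40, 50]

forward_only = residual_energy(current_frame, past_frame)
both_ways = residual_energy(current_frame,
                            bidirectional_prediction(past_frame, future_frame))
print(forward_only, both_ways)
```

For this idealized fade, the averaged prediction is exact while the forward-only prediction leaves a residual on every pixel.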

But the real game-changer was MPEG-2 (1995). It defined scalable coding profiles, which let a stream carry a base layer plus enhancement layers at higher resolution or quality. In principle, a single MPEG-2 stream could serve both a standard-definition TV and an HDTV set, depending on the decoder’s capability, although in practice broadcasters mostly sent separate streams.

MPEG-2 also added field coding for interlaced video, which was the dominant format for TV at the time. By treating odd and even scan lines separately, it avoided the combing artifacts that plagued earlier codecs. This standard powered DVD, digital satellite TV, and early HDTV broadcasts. It was the workhorse of the 1990s.

H.264/AVC: The Algorithm That Changed Everything

Released in 2003, H.264 (also known as AVC) was a major leap. It didn’t just improve compression; it redefined what was possible. H.264 could match MPEG-2’s quality at roughly half the bitrate. This wasn’t a tweak; it was a thorough reworking of the block-based approach.

Key innovations included:

Variable block sizes: Instead of fixed 16x16 macroblocks, H.264 could use blocks as small as 4x4. This allowed fine-grained motion estimation—a fast-moving object could be tracked with tiny blocks while static backgrounds used large ones.
Multiple reference frames: The encoder could look at several previous frames (not just the last one) to predict motion. This was crucial for scenes with repetitive motion, like a pendulum swinging.
In-loop deblocking filter: Block edges were smoothed inside the coding loop, so the filtered frames also served as references for later predictions, rather than being cleaned up only as a post-processing step. This dramatically improved visual quality at low bitrates.
Context-adaptive entropy coding: CABAC (Context-Adaptive Binary Arithmetic Coding) squeezed the final bitstream by spending fewer bits on more probable symbols, with probability estimates that adapt to the surrounding context. It was computationally expensive but delivered roughly 10-15% better compression than the standard’s simpler variable-length coding mode (CAVLC).
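
The intuition behind adaptive arithmetic coding fits in a short sketch. The code below is not CABAC: it has no binarization, no context selection, and no table-driven state machine. It only estimates the ideal cost in bits of a binary sequence when the coder's probability estimate adapts as it goes. The adaptation rate and the test sequences are assumptions for illustration.

```python
import math


def adaptive_cost_bits(bits, rate=0.05):
    """Ideal arithmetic-coding cost of a bit sequence under an adapting P(bit = 1)."""
    p_one = 0.5                      # start with no knowledge
    total = 0.0
    for b in bits:
        p = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p)       # probable symbols cost less than one bit
        p_one += rate * (b - p_one)  # nudge the estimate toward what was seen
    return total


skewed = [0] * 95 + [1] * 5          # e.g. a flag that is almost always 0
alternating = [0, 1] * 50            # no exploitable skew for this simple model
print(round(adaptive_cost_bits(skewed), 1), "bits for 100 skewed symbols")
print(round(adaptive_cost_bits(alternating), 1), "bits for 100 alternating symbols")
```

The skewed sequence costs far fewer than 100 bits because the model quickly learns that zeros are likely. A fixed one-bit-per-symbol code cannot do that.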

H.264 became the universal standard. It powered Blu-ray, YouTube, Netflix, Skype, and virtually every video camera from 2005 onward. Its secret wasn’t just better math—it was a profile system that allowed the same standard to scale from a low-power mobile phone encoder to a Hollywood-grade studio encoder.

H.265/HEVC: Halving the Bitrate

By 2013, 4K video was knocking on the door, and H.264 couldn’t keep up. The answer was High Efficiency Video Coding (HEVC), or H.265. Its headline promise: 50% bitrate reduction for the same perceptual quality.

HEVC achieved this through several key changes:

Larger coding tree units (CTUs): Instead of 16x16 macroblocks, HEVC used blocks up to 64x64. This was a godsend for 4K content—large uniform areas (like a blue sky) could be encoded as a single massive block, saving bits.
More flexible partitioning: CTUs could be recursively split into smaller blocks (down to 4x4) using a quadtree structure. This allowed the encoder to adapt to fine detail without wasting bits on flat regions.
Improved motion compensation: HEVC kept the quarter-pixel motion vectors that H.264 already had, but used longer, more accurate interpolation filters, and it added asymmetric motion partitions. A 64x64 block could be split into a 64x16 strip and a 64x48 remainder, for example, instead of only equal halves or quarters.
Sample adaptive offset (SAO): A second in-loop filter, applied after deblocking, that reduced banding and ringing artifacts by adding small offsets to reconstructed pixel values. It was a subtle but effective quality boost.
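
The recursive quadtree splitting described in the list above can be sketched as follows. This is a simplified illustration: it splits a block whenever its pixel range exceeds a threshold, whereas real HEVC encoders decide splits by comparing rate-distortion costs. The frame contents, threshold, and function names are assumptions.

```python
def block_range(frame, top, left, size):
    """Difference between the brightest and darkest pixel in a square block."""
    values = [frame[y][x] for y in range(top, top + size) for x in range(left, left + size)]
    return max(values) - min(values)


def quadtree_partition(frame, top, left, size, min_size, threshold):
    """Return leaf blocks as (top, left, size), splitting only where detail exists."""
    if size <= min_size or block_range(frame, top, left, size) <= threshold:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(frame, top + dy, left + dx,
                                         half, min_size, threshold)
    return leaves


# Hypothetical 16x16 frame: flat gray with one bright pixel near the top-left corner.
frame = [[50] * 16 for _ in range(16)]
frame[1][1] = 200
leaves = quadtree_partition(frame, 0, 0, size=16, min_size=4, threshold=10)
for leaf in leaves:
    print(leaf)
```

Only the quadrant containing the detail is subdivided. The three flat quadrants stay as single 8x8 blocks, which is the bit-saving behavior the larger CTUs are meant to enable.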

HEVC’s computational cost was steep—encoding 4K video could take hours on consumer hardware. But the payoff was undeniable. Netflix and YouTube adopted it for 4K streaming, and broadcasters used it to squeeze multiple HD channels into a single satellite transponder.

VP9 and AV1: The Open-Source Challenge

While MPEG and ITU developed royalty-bearing standards, Google launched VP9 in 2013 as a royalty-free alternative. It was designed specifically for web streaming, YouTube’s dominant use case. VP9 delivered compression efficiency in the same range as HEVC but without the licensing headaches. It became a default format on YouTube and is supported in Chrome and on Android.

But the real disruptor came in 2018: AV1, developed by the Alliance for Open Media (AOM), a consortium including Google, Mozilla, Microsoft, Amazon, and Netflix. AV1 was designed from the ground up to be royalty-free, and it accepted high computational cost in exchange for efficiency. Its compression efficiency is commonly reported as 20-30% better than HEVC’s, but at a staggering encoding cost: sometimes 100x slower than H.264.

AV1 introduced several novel tools:

Warped motion compensation: Instead of simple translation, AV1 could model affine transformations—rotation, scaling, and shearing. A spinning car wheel could be predicted with a single block, not dozens.
Compound prediction: Two motion vectors could be combined (averaged or weighted) to predict a single block. This was especially effective for transparent or semi-transparent objects.
Recursive block partitioning: AV1 used a 10-way partition tree (compared to HEVC’s 4-way), allowing for extremely fine-grained block shapes. A 128x128 block could be split into 4x4 sub-blocks if needed.
Film grain synthesis: Instead of encoding random noise, AV1 could store a film grain model and synthesize it at decode time. This saved massive bits for grainy content like classic movies.
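
Film grain synthesis is the easiest of these tools to sketch. The code below is a deliberately crude stand-in: it adds seeded Gaussian noise to a clean frame, whereas AV1's actual grain model is more elaborate. The function name, parameters, and values are hypothetical. The point is that the decoder can regenerate plausible grain from a handful of parameters instead of receiving every noisy pixel.

```python
import random


def synthesize_grain(clean, strength, seed):
    """Add reproducible pseudo-random grain to a clean 8-bit frame."""
    rng = random.Random(seed)   # same seed gives the same grain on every decode
    return [[min(255, max(0, round(p + rng.gauss(0, strength)))) for p in row]
            for row in clean]


clean_frame = [[128] * 8 for _ in range(8)]   # denoised frame the encoder would code
grainy = synthesize_grain(clean_frame, strength=4.0, seed=7)
print(grainy[0])
```

The synthesized grain does not match the original grain pixel for pixel. It only needs to look statistically similar, which is why it costs so few bits.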

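The cost? Software AV1 encoding is computationally brutal. A 10-minute 4K video can take hours to encode on a high-end CPU at the slowest, highest-quality settings. Hardware decoders are now common in GPUs and smartphones, and recent GPUs include real-time AV1 encoders, but fast encoding still gives up some of the efficiency gains.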

VVC/H.266: The Next Frontier

Just when you thought compression couldn’t get tighter, the Versatile Video Coding (VVC) standard (H.266) arrived in 2020. Its goal: 50% bitrate reduction over HEVC for the same perceptual quality. Stacked on HEVC’s own gain, that works out to roughly a quarter of the bitrate H.264 needs.

VVC’s bag of tricks includes:

Larger blocks: Coding tree units can be up to 128x128, partitioned by a quadtree with a nested multi-type tree that adds binary and ternary splits. The resulting rectangular blocks can follow the shapes in high-resolution content closely.
Intra-block copy: For screen content (like a video of a slideshow), VVC can copy pixel patterns from within the same frame—a huge win for remote desktop and game streaming.
Adaptive color transform: For 4:4:4 content, VVC can convert a block’s prediction residual into a YCgCo-style color space on the fly, improving compression for synthetic and RGB-sourced content.
Neural network-based tools: The core VVC decoding process contains no neural networks, but its companion metadata specification can signal an optional neural network-based post-filter that is trained to remove compression artifacts. It hints at the future.

The catch: VVC is even more computationally expensive than AV1. Real-time 8K encoding is still a server-farm problem. But for offline encoding (think Netflix’s content library), it’s a no-brainer.

The Perceptual Trick: What the Eye Doesn’t See

All modern codecs exploit the same biological loophole: human vision is lazy. We’re highly sensitive to luminance (brightness) changes but much less to color (chrominance) detail. That’s why nearly every consumer codec subsamples color, typically as 4:2:0: the two chroma planes are stored at half the luma resolution both horizontally and vertically, so each 2x2 block of luma samples shares a single pair of color samples.
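
In 4:2:0, each chroma plane is halved in both dimensions, so it holds a quarter as many samples as the luma plane. The sketch below shows one simple way to downsample a chroma plane (averaging each 2x2 block, which is an assumption, since encoders may use other filters) and counts the samples per frame. Function names are illustrative.

```python
def subsample_420(chroma):
    """Halve a chroma plane in both directions by averaging each 2x2 block."""
    height, width = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1]
              + chroma[y + 1][x] + chroma[y + 1][x + 1] + 2) // 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


def samples_per_frame(width, height):
    """Return (luma samples, total chroma samples) for a 4:2:0 frame."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)   # two planes, each quarter-size
    return luma, chroma


small_plane = subsample_420([[10, 20, 30, 40],
                             [10, 20, 30, 40]])
luma, chroma = samples_per_frame(1920, 1080)
print(small_plane)
print(luma + chroma, "samples instead of", 3 * luma)
```

A 4:2:0 frame carries half the samples of a full-resolution 4:4:4 frame before any actual compression has happened.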

We’re also terrible at detecting high-frequency noise in bright areas, and we’re slow to notice gradual changes. Codecs use psychovisual models to allocate bits where they matter most—sharp edges, faces, text—and skimp on flat, dark, or fast-moving regions.

The Bitrate Arms Race: From 64k to 8K

The evolution isn’t just about algorithms—it’s about the relentless push toward higher resolutions and lower bitrates. Here’s a rough timeline of what each generation enabled:

Standard	Year	Rough Bitrate (at Resolution)	Key Use Case
H.261	1988	1.5 Mbps for 352x288	Videoconferencing
MPEG-2	1995	4-8 Mbps for 720x480	DVD, satellite TV
H.264	2003	2-4 Mbps for 1080p	Blu-ray, YouTube, Netflix
H.265	2013	1-2 Mbps for 1080p	4K streaming, UHD Blu-ray
AV1	2018	0.7-1.6 Mbps for 1080p	Web streaming, YouTube
VVC	2020	0.5-1 Mbps for 1080p	8K broadcast, future VR

Notice the pattern: each major generation roughly halves the bitrate for the same quality. But the law of diminishing returns is real. The jump from H.264 to H.265 was about a 50% reduction, and H.265 to VVC is another 30-50% depending on content. Meanwhile, encoder complexity multiplies with every generation.
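
The compounding is easy to check with a little arithmetic. This sketch chains percentage reductions onto a starting bitrate. The 4 Mbps starting point and the reduction figures are the article's rough numbers, not measurements.

```python
def chained_bitrate(base_mbps, reductions):
    """Apply successive fractional bitrate reductions to a starting bitrate."""
    rate = base_mbps
    for reduction in reductions:
        rate *= 1.0 - reduction
    return rate


h264_mbps = 4.0
# H.264 -> H.265 (50% saving), then H.265 -> VVC (50% saving, the optimistic case)
best_case = chained_bitrate(h264_mbps, [0.5, 0.5])
# Same chain with only a 30% saving for the second step
cautious_case = chained_bitrate(h264_mbps, [0.5, 0.3])
print(best_case, cautious_case)
```

Two halvings leave a quarter of the original bitrate. With the more cautious 30% figure for the second step, VVC lands at about 35% of H.264's rate instead.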

The Hidden Complexity: Encoding vs. Decoding

Most people think of codecs as a single thing. In reality, there’s a massive asymmetry between encoding and decoding. Encoders are smart; decoders are dumb.

An encoder spends enormous computational effort searching for the best motion vectors, block partitions, and quantization parameters. It might try thousands of combinations per frame. A decoder just follows instructions: “copy block from frame 2, offset by (3, -5), apply this filter.” This asymmetry is by design—you encode once, but decode millions of times.
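
Much of that encoder effort goes into rate-distortion optimization: for each candidate way of coding a block, the encoder weighs distortion D against bits R using a cost of the form D + λR and keeps the cheapest. The sketch below uses invented candidate numbers and a hypothetical function name purely to show how the multiplier λ shifts the decision.

```python
def choose_mode(candidates, lam):
    """Pick the candidate with the lowest cost D + lam * R.

    candidates: list of (name, distortion, bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]


# Made-up measurements for one block: (mode, distortion, bits needed)
candidates = [
    ("intra", 40, 120),
    ("inter_16x16", 90, 20),
    ("inter_4x4", 35, 200),
]

print(choose_mode(candidates, lam=0.01))  # bits are cheap: favor lowest distortion
print(choose_mode(candidates, lam=0.1))   # balanced
print(choose_mode(candidates, lam=1.0))   # bits are expensive: favor the cheapest mode
```

The decoder never runs this search. It simply reads which mode won and reconstructs the block, which is the asymmetry in a nutshell.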

This is why hardware acceleration matters. A modern GPU can decode 8K AV1 in real time, but encoding the same stream might require a server farm. The industry is now building dedicated AI encoders that use neural networks to predict optimal encoding parameters, cutting encoding time from hours to minutes.

The Perceptual Frontier: Why 50% Isn’t the Limit

You might think we’re approaching the Shannon limit, or more precisely the rate-distortion bound: the theoretical minimum bitrate for a given level of distortion. But we’re not even close to the limit that matters. The reason is that perceptual quality is not the same as mathematical fidelity.

Traditional codecs minimize mean squared error (MSE), which is equivalent to maximizing peak signal-to-noise ratio (PSNR). But these metrics don’t match human perception. A slightly blurry face is more annoying than a slightly blurry wall. A flickering edge is more distracting than a uniform color shift.
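
A short example shows the blind spot. Below, two very different distortions of the same made-up 16-pixel patch (a faint shift on every pixel versus one badly wrong pixel) produce exactly the same MSE and therefore the same PSNR. The function names and pixel values are illustrative.

```python
import math


def mse(original, distorted):
    """Mean squared error between two equally long pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)


def psnr(original, distorted, peak=255):
    """Peak signal-to-noise ratio in decibels (infinite for identical inputs)."""
    error = mse(original, distorted)
    return float("inf") if error == 0 else 10 * math.log10(peak ** 2 / error)


original = [100] * 16
uniform_shift = [102] * 16          # every pixel off by 2
single_glitch = [108] + [100] * 15  # one pixel off by 8, the rest perfect

print(round(psnr(original, uniform_shift), 2))
print(round(psnr(original, single_glitch), 2))
```

Both versions score identically, yet a viewer would likely judge them differently. That gap is what perceptual metrics try to close.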

Modern codecs are moving toward perceptually optimized encoding. Netflix’s Video Multi-Method Assessment Fusion (VMAF) is a machine learning model that predicts human opinion scores. Encoders can now optimize for VMAF instead of PSNR, allocating bits to regions where humans actually look—faces, text, motion boundaries.

The AI Revolution: Neural Networks in the Loop

The next frontier isn’t a new standard—it’s neural network-based compression. Instead of hand-crafted transforms and motion models, deep learning can learn optimal representations from data.

End-to-end learned compression systems (like the image codecs of Ballé and colleagues at Google, or WaveOne’s ELF-VC for video) replace the entire encoder-decoder pipeline with neural networks. They learn to map raw pixels to a compact latent space, then reconstruct them. Some published results report compression on par with, or better than, the best hand-crafted codecs for certain content types.

But there’s a catch: these models are content-specific. A network trained on nature documentaries might fail on anime. And they’re computationally expensive—decoding a single frame can require billions of floating-point operations. Hardware acceleration is still in its infancy.

The Practical Trade-Offs

Choosing a codec isn’t just about compression ratios. Here’s what engineers actually weigh:

Encoding latency: For live streaming (sports, news), you need sub-second encoding. H.264 with hardware acceleration wins here. Real-time AV1 encoding exists in recent hardware but is far less widely deployed.
Decoding complexity: A 4K H.265 stream can be decoded on a five-year-old smartphone. AV1? Not so much. Hardware decoders are now common, but software decoding still drains battery.
Licensing: H.264 and H.265 are patent-encumbered. AV1 and VP9 are designed to be royalty-free. This is a major reason the web is moving to AV1: no one wants to negotiate royalties with multiple patent pools.
Grain handling: Film grain is a nightmare for codecs. It looks like noise, so encoders try to smooth it out, destroying the cinematic look. AV1’s film grain synthesis is a clever workaround: encode the clean image, then add synthetic grain at decode time.

The Future: Neural Compression and Beyond

We’re entering an era where codecs are no longer designed only by committee; parts of them are learned by neural networks. MPEG’s Video Coding for Machines (VCM) effort is exploring compression for machines, not humans. Think self-driving cars sending compressed video to a cloud server for analysis. The codec doesn’t need to look good to a human; it needs to preserve semantic information like lane markings and pedestrians.

Meanwhile, end-to-end learned compression is closing the gap with hand-crafted codecs. Recent research neural video codecs have reported compression efficiency comparable to VVC on certain content, with the added benefit of being adaptable to specific tasks (like face recognition or object detection).

The catch? These neural codecs are brittle. They fail on content they weren’t trained on—a nature documentary might look great, but a screencast of a spreadsheet could be a blocky mess. Hybrid approaches (neural tools inside traditional codecs) are the likely near-term future.

The Practical Reality: What You Should Use Today

If you’re building a video pipeline, here’s the pragmatic advice:

For maximum compatibility: H.264. Every device made in the last 15 years supports it. It’s the lingua franca of video.
For 4K streaming: H.265 (HEVC) is the safe bet. Hardware decoders are in every modern GPU and smartphone. But watch out for patent licensing—it’s a minefield.
For the web: AV1. YouTube, Netflix, and Facebook are all-in. The encoding cost is high, but the royalty-free nature and 20-30% bitrate savings over H.265 make it the future.
For archival or 8K: VVC. It’s the most efficient standard, but you’ll need serious hardware. Think of it as the “lossless” of lossy—you’re paying in compute, not in quality.

The Endless Arms Race

Video compression is a story of diminishing returns. The first codecs gave us 10x compression. The latest give us 1000x. But each new standard requires exponentially more compute. The next leap won’t come from better math—it will come from learned priors and content-adaptive encoding.
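
To put those ratios in perspective, here is a back-of-the-envelope calculation of what uncompressed video costs. The 12 bits per pixel figure assumes 8-bit 4:2:0 video, and the 15 Mbps streaming bitrate is a hypothetical value chosen for illustration, not a measured one.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed bitrate in bits per second (12 bits/pixel = 8-bit 4:2:0)."""
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps, stream_bps):
    return raw_bps / stream_bps


raw_4k60 = raw_bitrate(3840, 2160, 60)
ratio = compression_ratio(raw_4k60, 15_000_000)   # assumed 15 Mbps 4K stream
print(f"raw: {raw_4k60 / 1e9:.2f} Gbps, ratio: {ratio:.0f}:1")
```

Under these assumptions, raw 4K at 60 frames per second needs nearly 6 Gbps, so even a generous streaming bitrate implies a compression ratio in the hundreds.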

Imagine a codec that knows it’s encoding a talking head and allocates bits to the eyes and mouth, while aggressively compressing the background. Or a codec that recognizes a sports scene and preserves ball trajectories. That’s where we’re headed: compression that understands what it’s compressing.

The technical evolution of video compression isn’t just about squeezing bits—it’s about understanding what humans actually see. And as we push toward 16K, VR, and holographic displays, the only constant is that we’ll always need to send less than we think.
