SVT-AV1 1.0 Performance Improvements

The release of SVT-AV1 version 1.0 marked a major milestone for the open-source AV1 video encoder: it brought faster encoding, lower memory usage, and retuned presets. This article covers the most significant performance improvements in SVT-AV1 v1.0 (exposed in FFmpeg as libsvtav1) and shows how architectural optimizations, AVX2/AVX-512 SIMD code, and adjusted speed/quality trade-offs made AV1 encoding faster and practical for real-time and production workloads.

Refined Preset Architecture and Speedups

SVT-AV1 v1.0 overhauled its preset system, which this article describes as ranging from 0 (slowest, highest compression efficiency) to 13 (fastest), to offer a better trade-off between encoding speed and visual quality.

High-Speed Presets: Presets 9 through 13 were optimized for real-time and faster-than-real-time encoding. Preset 10, in particular, became a popular sweet spot, offering near-real-time 4K encoding and high-speed 1080p encoding with only a small loss in compression efficiency.
Low-Speed Presets: Presets 0 through 3, which target maximum quality and archival encoding, received deep architectural optimizations. These changes reduced encoding times at these quality levels by as much as 30 to 40 percent compared to older versions, while maintaining comparable VMAF (Video Multimethod Assessment Fusion) scores.
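
As a rough illustration of how a preset is selected in practice, the sketch below builds (but does not run) an FFmpeg command line for a libsvtav1 encode. It assumes an FFmpeg build with libsvtav1 enabled that accepts the -preset and -crf options; the file names, the CRF value of 35, and the helper name build_ffmpeg_cmd are hypothetical, and the 0 to 13 range simply mirrors the range described above.

```python
import shlex

# Preset range as described in this article (0 = slowest, 13 = fastest).
# Treat it as an assumption: the valid range depends on the encoder version.
MIN_PRESET, MAX_PRESET = 0, 13


def build_ffmpeg_cmd(src, dst, preset=10, crf=35):
    """Return an ffmpeg argument list for a libsvtav1 encode (does not run it)."""
    if not MIN_PRESET <= preset <= MAX_PRESET:
        raise ValueError(
            f"preset must be in {MIN_PRESET}..{MAX_PRESET}, got {preset}"
        )
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libsvtav1",      # FFmpeg's wrapper around SVT-AV1
        "-preset", str(preset),   # higher number = faster, less efficient
        "-crf", str(crf),         # constant-quality target (illustrative value)
        "-an",                    # drop audio to keep the example video-only
        dst,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for a hypothetical input file.
    print(shlex.join(build_ffmpeg_cmd("input.y4m", "output.mkv", preset=10)))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting issues.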

Comprehensive Assembly and SIMD Optimizations

To maximize hardware utilization, SVT-AV1 v1.0 added extensive AVX2 and AVX-512 SIMD optimizations to its hottest code paths.

Motion Estimation and SAD/SSD: The encoder’s Sum of Absolute Differences (SAD) and Sum of Squared Differences (SSD) kernels were rewritten to use modern CPU vector instructions. This significantly accelerated motion estimation, which is typically one of the most compute-intensive stages of video encoding because these costs are evaluated for many candidate blocks per frame.
Intra-Prediction and Transforms: Hand-optimized routines were added for the block-copy, intra-prediction, and forward/inverse transform stages. This directly reduced the CPU cycles spent per frame on AMD and Intel processors that support these instruction sets.
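
To make the SAD and SSD costs concrete, here is a minimal scalar reference sketch of both metrics and of an exhaustive block search that uses SAD as its cost. It is not SVT-AV1's implementation: the function names are hypothetical, frames are plain lists of rows, and the real encoder uses vectorized kernels and far more selective search strategies. The sketch only shows the arithmetic that the SIMD code accelerates.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur_block, ref_frame, top, left, size, radius):
    """Try every offset within +/- radius and return (cost, dy, dx) of the best.

    (top, left) is the block's position in the current frame; candidates that
    would fall outside the reference frame are skipped.
    """
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur_block, crop(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Each candidate costs size * size subtractions, and the search evaluates up to (2 * radius + 1) squared candidates per block, which is why processing many pixels per instruction pays off so quickly.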

Dramatic Reduction in Memory Footprint

One of the biggest hurdles with early AV1 encoders was their large random-access memory (RAM) requirement. Version 1.0 addressed this by optimizing memory allocation patterns and restructuring how frames are buffered.

Lower RAM Overhead: SVT-AV1 v1.0 reduced memory usage by up to 50% in certain multi-threaded encoding scenarios.
Multi-Instance Friendly: By lowering the per-thread memory footprint, system administrators and cloud providers could run more parallel encoding instances on a single server without running out of system memory.
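
The capacity effect described above is simple arithmetic, sketched below. All figures are hypothetical and are not measurements of SVT-AV1: the point is only that halving the per-instance footprint roughly doubles the number of instances that fit in the same RAM budget.

```python
def max_instances(total_ram_gib, per_instance_gib, reserved_gib=0.0):
    """Number of encoder instances that fit in RAM after reserving headroom."""
    if per_instance_gib <= 0:
        raise ValueError("per_instance_gib must be positive")
    usable = total_ram_gib - reserved_gib
    return max(0, int(usable // per_instance_gib))


if __name__ == "__main__":
    # Hypothetical server: 64 GiB total, 8 GiB kept free for the OS.
    before = max_instances(64, 4.0, reserved_gib=8)  # 4 GiB per instance
    after = max_instances(64, 2.0, reserved_gib=8)   # after a 50% reduction
    print(before, after)
```

In practice the limit may be CPU cores rather than memory, so a calculation like this gives an upper bound on instance count, not a guarantee of throughput.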

Improved Multi-Threading and Scaling

Older versions of SVT-AV1 struggled to scale efficiently across high-core-count processors, such as dual-socket servers or CPUs with 64+ threads. Version 1.0 addressed this by redesigning the threading framework.

Reduced Thread Synchronization Overhead: By optimizing row-based multi-threading and tile-parallelism, the encoder minimized the idle CPU time spent waiting for thread synchronization.
Better CPU Utilization: The encoder achieved near-linear scaling on modern high-core-count CPUs, meaning that throughput grows almost in proportion to the number of threads, so high-end server hardware stays busy during dense transcoding workloads.
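
One way to check a scaling claim like this on your own hardware is to time the same encode at different thread counts and compute the speedup and parallel efficiency. The sketch below shows those two formulas, plus Amdahl's law as a reminder of why synchronization and other serial work cap the achievable speedup. The function names are hypothetical and no timings here come from SVT-AV1.

```python
def speedup(t_single, t_parallel):
    """How many times faster the parallel run is than the single-thread run."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, threads):
    """Speedup per thread; 1.0 means perfectly linear scaling."""
    return speedup(t_single, t_parallel) / threads


def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical timings in seconds: 640 s on 1 thread, 12.5 s on 64 threads.
    print(efficiency(640.0, 12.5, 64))
    # With 5% serial work, 64 threads cannot reach a 20x speedup.
    print(amdahl_speedup(0.95, 64))
```

Reducing time spent waiting on synchronization effectively raises the parallel fraction, which is what moves measured efficiency closer to 1.0.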