SVT-AV1 Preset 0 vs 13: Computational Differences
This article explores the stark computational differences between Preset 0 and Preset 13 in the SVT-AV1 (Scalable Video Technology for AV1) encoder. It examines how these polar opposite configurations affect CPU utilization, encoding tools, search algorithms, and overall processing speed, helping video engineers understand the trade-offs between maximum compression efficiency and real-time execution.
The SVT-AV1 Preset Spectrum
SVT-AV1 uses a numerical preset system, ranging from 0 to 13 in the releases discussed here, to manage the trade-off between encoding speed and compression efficiency. The exact range and the tools attached to each preset have shifted between releases, so the descriptions below should be read as general tendencies rather than a fixed specification. Preset 0 targets the highest compression efficiency at the cost of extreme computational complexity. Conversely, Preset 13 targets maximum throughput, discarding complex compression tools to reach real-time encoding speeds even on modest hardware.
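To make the two extremes concrete, the sketch below builds command lines for the `SvtAv1EncApp` sample application at each preset. The helper name `svt_av1_command`, the file names, and the CRF value are illustrative assumptions; the `-i`, `-b`, `--preset`, and `--crf` options should be checked against the `--help` output of the installed build.

```python
import shlex

def svt_av1_command(src, dst, preset, crf=35):
    """Build an argv list for SvtAv1EncApp (hypothetical helper, not part of SVT-AV1)."""
    if not 0 <= preset <= 13:
        raise ValueError("this article only covers presets 0 through 13")
    # File names and the CRF default are placeholders chosen for illustration.
    return ["SvtAv1EncApp", "-i", src, "-b", dst,
            "--preset", str(preset), "--crf", str(crf)]

slow = svt_av1_command("input.y4m", "preset0.ivf", preset=0)
fast = svt_av1_command("input.y4m", "preset13.ivf", preset=13)

# Print the commands instead of running them, so the sketch works without the encoder.
print(shlex.join(slow))
print(shlex.join(fast))
```

Encoding the same clip with both commands and comparing wall-clock time and output size is the simplest way to observe the trade-off described above.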
Preset 0: Exhaustive Search and Maximum Complexity
Preset 0 is computationally intensive and is generally reserved for academic research, archival purposes, or reference testing. It enables almost every advanced coding tool defined in the AV1 specification and pushes the encoder toward near-exhaustive searches over partitions, prediction modes, and transform choices.
Block Partitioning
AV1 allows superblocks (the AV1 counterpart of HEVC's coding tree blocks) of up to 128x128 pixels, which can be recursively split down to 4x4 blocks using ten partition types: no split, square, horizontal, vertical, four T-shaped splits, and two four-way strip splits. In Preset 0, the encoder evaluates nearly every possible partition combination. It uses brute-force Rate-Distortion Optimization (RDO) to calculate the cost of each candidate split, so the number of cost evaluations grows rapidly with every additional level of recursion.
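The recursion can be illustrated with a toy quadtree search. This is a minimal sketch under strong assumptions: only square quad splits are tried (AV1 has more partition shapes), the cost model is a flat-prediction squared error plus a fixed header cost, and the names `leaf_cost`, `best_partition`, `lam`, and `header_bits` are invented for illustration rather than taken from SVT-AV1.

```python
def leaf_cost(img, x, y, size, lam=50.0, header_bits=8):
    """Toy RD cost of coding one block with a flat (mean) prediction."""
    pixels = [img[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(pixels) / len(pixels)
    distortion = sum((p - mean) ** 2 for p in pixels)
    return distortion + lam * header_bits

def best_partition(img, x, y, size, min_size, stats):
    """Return the lowest toy cost for a block: code it whole or split it in four."""
    stats["evaluated"] += 1
    cost = leaf_cost(img, x, y, size)
    if size > min_size:
        half = size // 2
        split = sum(best_partition(img, x + dx, y + dy, half, min_size, stats)
                    for dy in (0, half) for dx in (0, half))
        cost = min(cost, split)
    return cost

# 16x16 test image: bright top-left quadrant, black elsewhere.
img = [[200 if (x < 8 and y < 8) else 0 for x in range(16)] for y in range(16)]

deep_stats = {"evaluated": 0}
deep = best_partition(img, 0, 0, 16, min_size=4, stats=deep_stats)

shallow_stats = {"evaluated": 0}
shallow = best_partition(img, 0, 0, 16, min_size=16, stats=shallow_stats)

print(deep, deep_stats["evaluated"])        # 1600.0 21
print(shallow, shallow_stats["evaluated"])  # 1920400.0 1
```

Allowing the search to reach 4x4 blocks costs 21 block evaluations instead of 1 in this toy case, but finds a far cheaper partition. A real encoder multiplies that count by the number of partition shapes and prediction modes tried at each node.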
Motion Estimation (ME)
Preset 0 uses a near-exhaustive motion estimation search. It searches across a large number of reference frames (both past and future) using wide search windows. It refines motion vectors to sub-pixel precision (AV1 supports up to eighth-pixel accuracy) and evaluates global motion models such as translation, rotation-zoom, and affine warps. Global motion parameters are estimated per reference frame and then offered to each block as a candidate, rather than being recomputed for every block.
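A full-search block matcher shows where the cost comes from. The sketch below is a generic integer-pel search using the sum of absolute differences (SAD); it assumes a single reference frame and omits sub-pixel refinement, and the function names and synthetic frames are illustrative, not SVT-AV1 code.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, bx, by, n, radius):
    """Test every integer displacement within +/-radius; return (mv, sad, checks)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost, checks = None, float("inf"), 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            checks += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, checks

# Synthetic frames: the current frame is the reference shifted by (2, 1).
ref = [[10 * x + y for x in range(16)] for y in range(16)]
cur = [[10 * (x + 2) + (y + 1) for x in range(16)] for y in range(16)]

wide = full_search(cur, ref, bx=6, by=6, n=4, radius=4)
narrow = full_search(cur, ref, bx=6, by=6, n=4, radius=1)
print(wide)    # ((2, 1), 0, 81)
print(narrow)  # ((1, 1), 160, 9)
```

The wide window finds the exact displacement at the price of 81 SAD evaluations for one 4x4 block and one reference frame; the narrow window needs 9 but misses the true motion. Every extra reference frame and every sub-pixel refinement stage multiplies that work.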
Rate-Distortion Optimization (RDO)
RDO is typically among the most computationally expensive parts of modern video encoding. In Preset 0, SVT-AV1 performs full RDO for mode decisions, intra-prediction directions, and transform types. For each candidate, the encoder transforms and quantizes the residual, reconstructs the block exactly as a decoder would in order to measure the true distortion, and counts the bits required, repeating this process for hundreds of permutations per block.
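The loop described above minimizes a Lagrangian cost J = D + lambda * R, where D is the distortion, R the bit count, and lambda the trade-off weight. The sketch below applies that idea to a toy case: choosing a quantizer step for a handful of coefficients. The bit-count formula is a crude stand-in for an entropy coder, and all names are assumptions made for illustration.

```python
def rd_cost(coeffs, step, lam):
    """Quantize, reconstruct, and return (J, distortion, bits) for one candidate step."""
    levels = [round(c / step) for c in coeffs]
    recon = [lv * step for lv in levels]          # what a decoder would rebuild
    distortion = sum((c - r) ** 2 for c, r in zip(coeffs, recon))
    # Assumed bit model: 1 bit per coefficient plus 2 bits per magnitude bit.
    bits = sum(1 + 2 * abs(lv).bit_length() for lv in levels)
    return distortion + lam * bits, distortion, bits

def choose_step(coeffs, steps, lam):
    """Full RDO over the candidate list: evaluate every step and keep the cheapest."""
    results = {s: rd_cost(coeffs, s, lam) for s in steps}
    best = min(results, key=lambda s: results[s][0])
    return best, results

coeffs = [40, -12, 5, 3, -1, 0, 0, 0]
steps = [1, 4, 16, 64]

for lam in (0, 10, 100):
    best, results = choose_step(coeffs, steps, lam)
    print(lam, best, results[best])
```

With lambda set to 0 the finest step wins because only distortion matters; as lambda grows, coarser steps win because bits become expensive. The cost of full RDO is that every candidate requires its own quantize-and-reconstruct pass.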
In-Loop Filtering
Preset 0 enables all three of AV1’s in-loop filters with maximum analysis depth:
* Deblocking Filter (DF): Smooths block boundaries.
* Constrained Directional Enhancement Filter (CDEF): Reduces ringing artifacts while preserving directional edges.
* Loop Restoration Filter (LR): Uses Wiener and Self-Guided restoration filters with highly exhaustive coefficient searches.
Preset 13: Heuristic-Driven Speed
Preset 13 is designed for high-speed streaming, screen sharing, and real-time communication. To achieve this speed, the encoder bypasses the vast majority of AV1’s advanced features, relying on fast heuristics and early-termination algorithms.
Block Partitioning
Instead of evaluating the entire partition tree, Preset 13 restricts partitioning. It avoids deep recursive splits (often keeping block sizes larger and uniform) and relies on quick spatial heuristics to guess the best partition size without running RDO calculations on multiple options.
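One common kind of spatial heuristic is a variance test: flat blocks are kept whole and only busy blocks are split. The sketch below uses that rule as a stand-in; the source does not say which heuristics SVT-AV1 actually applies, and the threshold, minimum size, and function names are assumptions.

```python
def block_variance(img, x, y, size):
    """Population variance of the pixel values inside one square block."""
    pixels = [img[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def heuristic_partition(img, x, y, size, min_size, threshold):
    """Split only when the block looks busy; no rate-distortion cost is computed."""
    if size <= min_size or block_variance(img, x, y, size) < threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += heuristic_partition(img, x + dx, y + dy, half, min_size, threshold)
    return leaves

# 16x16 test images: one with a bright top-left quadrant, one completely flat.
edge = [[200 if (x < 8 and y < 8) else 0 for x in range(16)] for y in range(16)]
flat = [[128] * 16 for _ in range(16)]

print(heuristic_partition(edge, 0, 0, 16, min_size=8, threshold=100.0))
print(heuristic_partition(flat, 0, 0, 16, min_size=8, threshold=100.0))
```

Each block is inspected once and the decision is final, so the work grows with the number of leaves rather than with the number of candidate partitions. The price is that a poor threshold yields a poor partition with no RD check to catch it.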
Motion Estimation (ME)
Motion estimation is stripped down to the bare minimum. The encoder restricts the search to a single reference frame and uses a highly localized, small search window. Sub-pel refinement is either bypassed or highly simplified, and complex global motion estimation is completely disabled.
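A localized search with early termination can be sketched as a greedy small-diamond step search: start at the zero vector, test the four neighbours, move to the best one, and stop as soon as the match is good enough. This is a generic textbook technique used here for illustration, not SVT-AV1's actual search; `max_steps`, `good_enough`, and the synthetic frames are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def small_diamond_search(cur, ref, bx, by, n, max_steps=4, good_enough=0):
    """Greedy local search from (0, 0); returns (mv, sad, checks)."""
    h, w = len(ref), len(ref[0])

    def in_bounds(dx, dy):
        return (0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h)

    mv = (0, 0)
    best = sad(cur, ref, bx, by, 0, 0, n)
    checks = 1
    for _ in range(max_steps):
        if best <= good_enough:
            break                      # early termination: match is good enough
        cx, cy = mv                    # centre of this step
        improved = False
        for ddx, ddy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (cx + ddx, cy + ddy)
            if not in_bounds(*cand):
                continue
            checks += 1
            cost = sad(cur, ref, bx, by, cand[0], cand[1], n)
            if cost < best:
                best, mv, improved = cost, cand, True
        if not improved:
            break                      # local minimum reached
    return mv, best, checks

# Synthetic frames: the current frame is the reference shifted by (2, 1).
ref = [[10 * x + y for x in range(16)] for y in range(16)]
cur = [[10 * (x + 2) + (y + 1) for x in range(16)] for y in range(16)]

print(small_diamond_search(cur, ref, bx=6, by=6, n=4))               # ((2, 1), 0, 13)
print(small_diamond_search(cur, ref, bx=6, by=6, n=4, max_steps=1))  # ((1, 0), 176, 5)
```

On this smooth synthetic content the local search reaches the true vector after 13 SAD evaluations. On real video the same search can stall in a local minimum, which is part of the efficiency loss a fast preset accepts.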
Rate-Distortion Optimization (RDO)
Full RDO is largely deactivated in Preset 13. Instead, the encoder uses fast cost estimators: approximations of bit-rate and distortion derived from cheap measurements such as prediction-residual sums, which avoid the transform, quantization, and reconstruction steps of a full encode-and-decode loop. Mode decisions are made quickly from local spatial and temporal gradients, with few candidates ever receiving a full evaluation.
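A minimal example of such an estimator is to rank intra modes by the SAD of their prediction residual alone, with no transform, quantization, or reconstruction. The three simplified modes and the helper names below are assumptions for illustration; AV1 defines many more intra modes, and the source does not specify SVT-AV1's estimator.

```python
def predict(mode, top, left, n):
    """Build an n x n prediction from the row above and the column to the left."""
    if mode == "V":                       # copy the top row downwards
        return [list(top) for _ in range(n)]
    if mode == "H":                       # copy the left column across
        return [[left[j]] * n for j in range(n)]
    dc = (sum(top) + sum(left)) // (2 * n)  # flat prediction from the neighbour mean
    return [[dc] * n for _ in range(n)]

def fast_mode_decision(block, top, left):
    """Pick the mode with the smallest residual SAD; nothing is encoded or decoded."""
    n = len(block)
    costs = {}
    for mode in ("DC", "V", "H"):
        pred = predict(mode, top, left, n)
        costs[mode] = sum(abs(block[j][i] - pred[j][i])
                          for j in range(n) for i in range(n))
    return min(costs, key=costs.get), costs

# A 4x4 block of vertical stripes whose top neighbours continue the stripes.
block = [[10, 50, 90, 130] for _ in range(4)]
top = [10, 50, 90, 130]
left = [10, 10, 10, 10]

mode, costs = fast_mode_decision(block, top, left)
print(mode, costs)  # V {'DC': 720, 'V': 0, 'H': 960}
```

The estimate costs one pass of subtractions per mode. It ignores how the residual would actually quantize and entropy-code, so it can rank modes differently from full RDO; that inaccuracy is the compression-efficiency price of the speedup.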
In-Loop Filtering
In-loop filters are either disabled or run in their fastest, lowest-complexity configurations. Loop Restoration is bypassed, and the CDEF search is limited to a fraction of its candidate filter strengths, which reduces cache pressure and memory bandwidth usage.
Computational Comparison Summary
| Computational Metric | Preset 0 | Preset 13 |
|---|---|---|
| Primary Use Case | Archival / Research | Real-time Streaming |
| CPU Cycle Demand | Extremely high (orders of magnitude more cycles per pixel than Preset 13) | Low (small, near-constant work per pixel) |
| Partition Search | Full recursive depth (128x128 to 4x4) | Restricted, heuristic-driven sizes |
| RDO Complexity | Exhaustive (Full encode/decode loops) | Fast Cost Estimation (Heuristic approximations) |
| Reference Frames | Multi-reference (Maximum allowed) | Single reference / minimal |
| In-Loop Filters | Full DF, CDEF, and Loop Restoration | Disabled or highly simplified |
| Memory Bandwidth | High (Massive reference cache/search areas) | Low (Localized cache access) |