Motion Search Algorithms in SVT-AV1

This article provides a technical overview of the motion search and motion estimation algorithms implemented in the SVT-AV1 (Scalable Video Technology for AV1) encoder. It explains how the encoder combines hierarchical motion estimation, full-pixel search patterns, and sub-pixel refinement to balance compression efficiency against encoding speed.

Hierarchical Motion Estimation (HME)

SVT-AV1 relies heavily on Hierarchical Motion Estimation (HME) to accelerate motion search. Rather than searching only the original frame, HME estimates motion over several resolution layers, typically three: 1/16 resolution (downsampled by 4 in each dimension), 1/4 resolution (downsampled by 2 in each dimension), and full resolution.

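The encoder first searches the most heavily downscaled pictures, where each candidate compares far fewer samples and a small search window covers a large full-resolution displacement (a ±2-sample window at 1/16 resolution corresponds to ±8 pixels at full resolution). The motion vectors found at these lower-resolution stages are then scaled up to the next layer's sampling grid (multiplied by 2 in each dimension between adjacent layers) and used as starting points for the next, higher-resolution stage, which then only needs a small refinement window around them.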
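The coarse-to-fine flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not SVT-AV1's implementation: pictures are lists of rows of 8-bit samples, each pyramid level is built by averaging 2×2 blocks, the matching cost is SAD, every level runs a small exhaustive ±2 search, and the function names (`downsample2`, `block_sad`, `refine`, `hme`) are hypothetical.

```python
# Minimal HME sketch (hypothetical helper names; not SVT-AV1 code).
# Pictures are lists of rows of 8-bit ints; the matching cost is SAD.

def downsample2(img):
    """Halve each dimension by averaging 2x2 blocks (1/4 of the samples)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1] + 2) // 4
             for x in range(w)] for y in range(h)]


def block_sad(cur, ref, bx, by, bw, bh, mv):
    """SAD of the current block against the reference block displaced by mv,
    or None if the displaced block leaves the reference picture."""
    x0, y0 = bx + mv[0], by + mv[1]
    if x0 < 0 or y0 < 0 or x0 + bw > len(ref[0]) or y0 + bh > len(ref):
        return None
    return sum(abs(cur[by + y][bx + x] - ref[y0 + y][x0 + x])
               for y in range(bh) for x in range(bw))


def refine(cur, ref, bx, by, bw, bh, center, r):
    """Exhaustive search in a +/-r window around center; keeps center on ties."""
    best, best_cost = center, block_sad(cur, ref, bx, by, bw, bh, center)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            mv = (center[0] + dx, center[1] + dy)
            c = block_sad(cur, ref, bx, by, bw, bh, mv)
            if c is not None and (best_cost is None or c < best_cost):
                best, best_cost = mv, c
    return best


def hme(cur, ref, bx, by, bw, bh, levels=3, r=2):
    """Coarse-to-fine search; returns the full-resolution integer MV."""
    cur_pyr, ref_pyr = [cur], [ref]
    for _ in range(levels - 1):  # level k is downsampled by 2**k per dimension
        cur_pyr.append(downsample2(cur_pyr[-1]))
        ref_pyr.append(downsample2(ref_pyr[-1]))
    mv = (0, 0)
    for k in range(levels - 1, -1, -1):  # coarsest level first
        s = 2 ** k
        mv = refine(cur_pyr[k], ref_pyr[k], bx // s, by // s,
                    max(1, bw // s), max(1, bh // s), mv, r)
        if k > 0:
            mv = (2 * mv[0], 2 * mv[1])  # scale to the next finer grid
    return mv
```

For example, if the content of `cur` is `ref` displaced by (8, 4) pixels, `hme(cur, ref, 16, 16, 16, 16)` should return `(8, 4)` after three 25-candidate searches, whereas a single full-resolution window would need a ±8 range, or 289 candidates, to cover the same displacement.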

Full-Pixel Motion Search Algorithms

At different stages of HME and the final full-resolution Motion Estimation (ME) process, SVT-AV1 implements several classic and optimized full-pixel search algorithms:

Full Search (Exhaustive Search): This method evaluates every candidate position within a predefined search window, (2R+1)² candidates for a range of ±R pixels. It is guaranteed to find the best motion vector within that window for the chosen matching cost (for example, the sum of absolute differences, SAD), but it is computationally expensive. SVT-AV1 restricts its use to high-quality/slow presets where compression efficiency is prioritized over encoding speed.
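Diamond Search (DS): A fast search that repeatedly evaluates a large diamond pattern (the center plus eight surrounding points) and moves its center to the lowest-cost point, stopping when the center itself is best; a final small diamond pattern (the center and its four nearest neighbors) refines the result. It is efficient for slow to moderate motion and is widely used in the faster SVT-AV1 presets to sharply reduce the number of evaluated candidates, at the risk of stopping in a local minimum of the cost surface.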
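Hexagon Search (HEX): This algorithm uses a hexagonal pattern of six points around the center, so each move to a new center requires evaluating only three new points, followed by a small final pattern for refinement. It is slightly more robust than Diamond Search for complex or fast motion while remaining significantly faster than Full Search.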
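To make the difference between the three strategies concrete, here is a small Python sketch that runs each one against an arbitrary cost function of the motion vector (in a real encoder, the block-matching cost of that candidate). The point sets follow the classic diamond search and hexagon-based search formulations; whether SVT-AV1 uses exactly these patterns, stopping rules, and window clamping is an assumption, and all names are hypothetical.

```python
# Full-pel search strategies over a generic cost(mv) callable (hypothetical names).
# Patterns follow the classic diamond search (DS) and hexagon-based search (HEXBS)
# formulations, assumed here for illustration.

LARGE_DIAMOND = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
                 (1, 1), (1, -1), (-1, 1), (-1, -1)]
SMALL_DIAMOND = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
HEXAGON = [(0, 0), (2, 0), (-2, 0), (1, 2), (-1, 2), (1, -2), (-1, -2)]


def full_search(cost, center, r):
    """Evaluate every candidate in a +/-r window; returns (best_mv, evaluations)."""
    cands = [(center[0] + dx, center[1] + dy)
             for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return min(cands, key=cost), len(cands)


def pattern_search(cost, center, large, r):
    """Move the large pattern to its lowest-cost point until the center wins,
    then refine once with the small diamond. Candidates stay within +/-r of the
    start. Returns (best_mv, number of distinct points evaluated)."""
    cache = {}

    def c(mv):
        if mv not in cache:  # each position is evaluated at most once
            cache[mv] = cost(mv)
        return cache[mv]

    def inside(mv):
        return abs(mv[0] - center[0]) <= r and abs(mv[1] - center[1]) <= r

    best = center
    while True:
        # The current center is listed first, so min() keeps it on ties; the
        # center only moves on a strict improvement, so the loop terminates.
        nxt = min((m for m in ((best[0] + dx, best[1] + dy) for dx, dy in large)
                   if inside(m)), key=c)
        if nxt == best:
            break
        best = nxt
    best = min((m for m in ((best[0] + dx, best[1] + dy) for dx, dy in SMALL_DIAMOND)
                if inside(m)), key=c)
    return best, len(cache)


def diamond_search(cost, center, r):
    return pattern_search(cost, center, LARGE_DIAMOND, r)


def hexagon_search(cost, center, r):
    return pattern_search(cost, center, HEXAGON, r)
```

On a smooth bowl-shaped cost with its minimum at (5, -3), all three functions return (5, -3) from a (0, 0) start, but the two pattern searches evaluate far fewer distinct points than the 289 of a ±8 full search. On real SAD surfaces, which can have several local minima, the pattern searches may stop at a suboptimal vector; that is the speed/quality trade-off the presets control.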

Sub-Pixel (Fractional) Motion Search

Because real motion rarely aligns exactly with integer pixel positions, SVT-AV1 performs fractional-pixel (sub-pel) motion search to increase precision. This step runs after the full-pixel search has found the best integer-pixel motion vector.

Half-Pel (1/2-pixel) Refinement: The encoder interpolates the reference picture at the half-pixel positions surrounding the best integer-pixel match and checks whether any of them gives a lower matching cost.
Quarter-Pel (1/4-pixel) Refinement: The search is then refined around the best half-pel position at 1/4-pixel resolution. Sub-pixel sample values are computed with AV1's interpolation filters (such as the 8-tap regular, smooth, or sharp filters). AV1 also supports 1/8-pixel motion vectors when high-precision motion vectors are enabled.
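The two-stage refinement can be sketched in Python as follows. Motion vectors are kept in quarter-pel units, and each candidate is predicted with a separable 8-tap filter. The kernel coefficients are assumed to match the regular-filter phases of libaom's AV1 sub-pel filter table and should be checked against the AV1 specification; the sketch also uses only the regular filter, SAD as the cost, edge clamping for out-of-picture samples, and hypothetical function names. It illustrates the technique, not SVT-AV1's code.

```python
# Half-pel then quarter-pel refinement sketch (hypothetical names; not SVT-AV1 code).
# ASSUMPTION: regular-filter phases 0, 1/4, 1/2, 3/4 pel, believed to match
# libaom's AV1 table; each kernel sums to 128 (7-bit precision).
KERNELS = {
    0: (0, 0, 0, 128, 0, 0, 0, 0),
    1: (0, 2, -14, 110, 38, -10, 2, 0),
    2: (0, 2, -14, 76, 76, -14, 2, 0),
    3: (0, 2, -10, 38, 110, -14, 2, 0),
}


def sample(ref, x, y):
    """Reference sample with edge clamping (a stand-in for border extension)."""
    return ref[min(max(y, 0), len(ref) - 1)][min(max(x, 0), len(ref[0]) - 1)]


def predict(ref, bx, by, bw, bh, mvq):
    """Motion-compensated block for an MV (mvx, mvy) in quarter-pel units."""
    ix, fx = divmod(mvq[0], 4)  # integer offset and quarter-pel phase
    iy, fy = divmod(mvq[1], 4)
    kx, ky = KERNELS[fx], KERNELS[fy]
    out = []
    for y in range(bh):
        row = []
        for x in range(bw):
            acc = 0
            for j in range(8):  # vertical taps at rows -3..+4
                ry = by + y + iy + j - 3
                h = sum(kx[i] * sample(ref, bx + x + ix + i - 3, ry) for i in range(8))
                acc += ky[j] * h
            # Divide by 128 * 128 with rounding, then clip to 8 bits.
            row.append(min(255, max(0, (acc + (1 << 13)) >> 14)))
        out.append(row)
    return out


def subpel_refine(cur, ref, bx, by, bw, bh, full_mv):
    """Refine an integer MV to quarter-pel; returns the MV in quarter-pel units."""
    block = [row[bx:bx + bw] for row in cur[by:by + bh]]

    def cost(mvq):
        pred = predict(ref, bx, by, bw, bh, mvq)
        return sum(abs(a - b) for ra, rb in zip(block, pred) for a, b in zip(ra, rb))

    best = (4 * full_mv[0], 4 * full_mv[1])  # full-pel MV in quarter-pel units
    for step in (2, 1):  # 2 = half-pel stage, 1 = quarter-pel stage
        cands = [best] + [(best[0] + dx * step, best[1] + dy * step)
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
        best = min(cands, key=cost)  # center listed first, so it wins ties
    return best
```

For example, for a smooth test picture whose content is displaced by (3.5, 1.25) pixels, starting from the full-pel vector (4, 1) should yield `(14, 5)` in quarter-pel units: the half-pel stage corrects the horizontal half-pixel offset, and the quarter-pel stage settles the vertical component at 1.25.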

Bipredictive Motion Search

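For B-frames (in AV1, inter frames that use compound prediction from two reference frames), SVT-AV1 performs bipredictive motion search. It looks for a pair of motion vectors pointing into two different reference frames (typically one in the past and one in the future), and the prediction is formed by combining the two motion-compensated blocks, most simply by averaging them. Because the best pair is not necessarily the two independently best vectors, SVT-AV1 refines the pair iteratively to find the combination that minimizes the prediction error, which improves the exploitation of temporal redundancy.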
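Here is a minimal Python sketch of one common way to perform this iterative refinement: start from the best vector found independently in each reference, then alternately re-search one vector against the compound (averaged) prediction while holding the other fixed, until neither changes. The alternating scheme, integer-pel vectors, SAD cost, rounded average, and all function names are assumptions for illustration; SVT-AV1's actual refinement may differ.

```python
# Iterative bipredictive search sketch (hypothetical names; not SVT-AV1 code).
# Pictures are lists of rows of 8-bit ints; MVs are integer-pel (dx, dy).

def fetch(ref, bx, by, bw, bh, mv):
    """Reference block displaced by mv; coordinates are clamped to the picture."""
    h, w = len(ref), len(ref[0])
    return [[ref[min(max(by + mv[1] + y, 0), h - 1)][min(max(bx + mv[0] + x, 0), w - 1)]
             for x in range(bw)] for y in range(bh)]


def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def average(p0, p1):
    """Compound prediction: rounded mean of the two single-reference predictions."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


def window(center, r):
    """Candidates in a +/-r window, center first so it wins ties in min()."""
    return [center] + [(center[0] + dx, center[1] + dy)
                       for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                       if (dx, dy) != (0, 0)]


def bipred_search(cur, ref0, ref1, bx, by, bw, bh, r=3, refine_r=1, max_iters=4):
    """Returns (mv0, mv1, compound_sad) for the block at (bx, by)."""
    block = [row[bx:bx + bw] for row in cur[by:by + bh]]
    get0 = lambda mv: fetch(ref0, bx, by, bw, bh, mv)
    get1 = lambda mv: fetch(ref1, bx, by, bw, bh, mv)
    # 1) Independent single-reference searches give the starting pair.
    mv0 = min(window((0, 0), r), key=lambda mv: sad(block, get0(mv)))
    mv1 = min(window((0, 0), r), key=lambda mv: sad(block, get1(mv)))
    # 2) Alternate: re-search one MV against the compound cost, the other fixed.
    for _ in range(max_iters):
        p1 = get1(mv1)
        new0 = min(window(mv0, refine_r), key=lambda mv: sad(block, average(get0(mv), p1)))
        p0 = get0(new0)
        new1 = min(window(mv1, refine_r), key=lambda mv: sad(block, average(p0, get1(mv))))
        if (new0, new1) == (mv0, mv1):
            break  # converged: neither MV changed
        mv0, mv1 = new0, new1
    return mv0, mv1, sad(block, average(get0(mv0), get1(mv1)))
```

Each refinement step can only keep or lower the compound cost, so the loop converges, and `max_iters` just bounds the work. When the independently best vectors already form the best pair, the first pass leaves them unchanged and the loop exits immediately.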