Motion Search Algorithms in SVT-AV1

This article provides a technical overview of the motion search and estimation algorithms implemented in the SVT-AV1 (Scalable Video Technology for AV1) encoder. It explores how the encoder uses hierarchical motion estimation, full-pixel search patterns, sub-pixel refinement, and bipredictive search to balance compression efficiency against encoding speed.

Hierarchical Motion Estimation (HME)

SVT-AV1 relies heavily on Hierarchical Motion Estimation (HME) to accelerate the search process. Rather than searching only the original frame, HME performs motion estimation across multiple resolution layers, typically three: sixteenth resolution (width and height each divided by 4), quarter resolution (width and height each divided by 2), and full resolution.

The encoder first searches for motion vectors in the most heavily downscaled representation of the frames. Because a given displacement spans far fewer samples at that scale, a small search window covers a large range of motion at low cost. The motion vectors found at each lower-resolution stage are then scaled up and used as initial predictors (starting points) for the next, higher-resolution search stage, which only needs to refine them within a small window.
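The coarse-to-fine flow described above can be illustrated with a minimal sketch. It is not SVT-AV1's code: the 2x2 averaging filter, the SAD cost, the search radii, and every function name (downscale2, sad, window_search, hme) are assumptions made for illustration only.

```python
def downscale2(frame):
    """Halve width and height by averaging each 2x2 group of samples."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1] + 2) // 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, bx, by, mv, size):
    """Sum of absolute differences, or None if the reference block leaves the frame."""
    rx, ry = bx + mv[0], by + mv[1]
    if rx < 0 or ry < 0 or rx + size > len(ref[0]) or ry + size > len(ref):
        return None
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def window_search(cur, ref, bx, by, size, center, radius):
    """Test every integer MV within `radius` of `center`; keep the lowest SAD."""
    best_mv, best_cost = center, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv = (center[0] + dx, center[1] + dy)
            cost = sad(cur, ref, bx, by, mv, size)
            if cost is not None and (best_cost is None or cost < best_cost):
                best_mv, best_cost = mv, cost
    return best_mv, best_cost


def hme(cur, ref, bx, by, size, radii=(2, 1, 1)):
    """Coarse-to-fine search over 1/4, 1/2 and full scale (per dimension)."""
    cur_pyr, ref_pyr = [cur], [ref]
    for _ in range(2):  # build the two downscaled layers
        cur_pyr.append(downscale2(cur_pyr[-1]))
        ref_pyr.append(downscale2(ref_pyr[-1]))
    mv, cost = (0, 0), None
    for level, radius in zip((2, 1, 0), radii):
        scale = 1 << level
        mv, cost = window_search(cur_pyr[level], ref_pyr[level],
                                 bx // scale, by // scale, size // scale,
                                 mv, radius)
        if level > 0:
            mv = (mv[0] * 2, mv[1] * 2)  # rescale the MV for the next, finer layer
    return mv, cost
```

With the assumed radii of (2, 1, 1) and a 16x16 block, the coarsest layer tests 25 candidates on 4x4 samples yet already spans displacements of up to 8 full-resolution pixels in each direction. An exhaustive full-resolution search over the same range would test 289 candidates on 16x16 samples.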

Full-Pixel Motion Search Algorithms

At the different stages of HME and in the final full-resolution Motion Estimation (ME) process, SVT-AV1 implements several classic and optimized full-pixel search algorithms, which evaluate candidate motion vectors at integer pixel positions only.

Sub-Pixel Motion Search

Because physical motion in video does not always align with integer pixel boundaries, SVT-AV1 implements fractional-pixel (sub-pel) motion search; AV1 motion vectors can carry up to 1/8-pixel precision. This refinement runs after the full-pixel search has identified the best integer-pixel motion vector, and it evaluates interpolated positions around that vector.
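The refinement step can be sketched as follows, under simplifying assumptions. Bilinear interpolation stands in for the longer interpolation filters a production encoder would use, the cost is a plain SAD, and the half, quarter, then eighth-pel step schedule and all function names are hypothetical rather than taken from SVT-AV1.

```python
def sample(ref, x8, y8):
    """Bilinear sample of `ref` at (x8 / 8, y8 / 8); coordinates are in 1/8-pel units.

    Assumes the position lies inside the frame (no negative coordinates).
    """
    x0, fx = divmod(x8, 8)
    y0, fy = divmod(y8, 8)
    x1 = min(x0 + 1, len(ref[0]) - 1)
    y1 = min(y0 + 1, len(ref) - 1)
    top = ref[y0][x0] * (8 - fx) + ref[y0][x1] * fx
    bot = ref[y1][x0] * (8 - fx) + ref[y1][x1] * fx
    return (top * (8 - fy) + bot * fy) / 64.0


def subpel_cost(cur, ref, bx, by, size, mv8):
    """SAD between the current block and the interpolated reference block."""
    return sum(abs(cur[by + j][bx + i] -
                   sample(ref, (bx + i) * 8 + mv8[0], (by + j) * 8 + mv8[1]))
               for j in range(size) for i in range(size))


def subpel_refine(cur, ref, bx, by, size, int_mv, steps=(4, 2, 1)):
    """Refine an integer MV at 1/2, 1/4, then 1/8 pel; returns the MV in 1/8-pel units."""
    best = (int_mv[0] * 8, int_mv[1] * 8)
    best_cost = subpel_cost(cur, ref, bx, by, size, best)
    for step in steps:
        center = best  # each pass searches the 8 neighbours of the previous winner
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand = (center[0] + dx, center[1] + dy)
                cost = subpel_cost(cur, ref, bx, by, size, cand)
                if cost < best_cost:
                    best, best_cost = cand, cost
    return best, best_cost
```

A returned vector of (2, 0), for example, means a displacement of one quarter pixel horizontally. Each pass halves the step size, so only 8 new positions are interpolated per precision level instead of every fractional position around the integer vector.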

Bipredictive Motion Search

For bi-directionally predicted frames (B-frames; AV1 refers to this mode as compound prediction), SVT-AV1 implements bipredictive motion search. This algorithm looks for a pair of motion vectors pointing to two different reference frames (typically one in the past and one in the future), whose predictions are combined into a single prediction for the block. SVT-AV1 uses iterative refinement techniques to find the best combination of the two motion vectors, so that the combined prediction removes as much temporal redundancy as possible.
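One generic way to refine two motion vectors jointly is alternating refinement: hold one prediction fixed, search a small window for the other, then swap roles until the cost stops improving. The sketch below illustrates that idea only. The source does not specify SVT-AV1's exact procedure, so the rounded-average combination, the SAD cost, the window radius, and all function names are assumptions.

```python
def block(frame, bx, by, mv, size):
    """Return the size x size block at (bx, by) displaced by mv, or None if out of frame."""
    x, y = bx + mv[0], by + mv[1]
    if x < 0 or y < 0 or x + size > len(frame[0]) or y + size > len(frame):
        return None
    return [row[x:x + size] for row in frame[y:y + size]]


def bipred_sad(cur_blk, blk0, blk1):
    """SAD between the current block and the rounded average of two predictions."""
    return sum(abs(c - ((a + b + 1) >> 1))
               for rc, ra, rb in zip(cur_blk, blk0, blk1)
               for c, a, b in zip(rc, ra, rb))


def refine_one(cur_blk, ref, other_blk, bx, by, size, mv, radius):
    """Search a window around `mv` in `ref` while the other prediction stays fixed."""
    best_mv, best_cost = mv, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = (mv[0] + dx, mv[1] + dy)
            blk = block(ref, bx, by, cand, size)
            if blk is None:
                continue
            cost = bipred_sad(cur_blk, blk, other_blk)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = cand, cost
    return best_mv, best_cost


def bipred_search(cur, ref0, ref1, bx, by, size, mv0, mv1, radius=1, max_iters=4):
    """Alternately refine mv0 and mv1; the starting MVs must point inside the frames."""
    cur_blk = block(cur, bx, by, (0, 0), size)
    best_cost = bipred_sad(cur_blk, block(ref0, bx, by, mv0, size),
                           block(ref1, bx, by, mv1, size))
    for _ in range(max_iters):
        new_mv0, _ = refine_one(cur_blk, ref0, block(ref1, bx, by, mv1, size),
                                bx, by, size, mv0, radius)
        new_mv1, cost = refine_one(cur_blk, ref1, block(ref0, bx, by, new_mv0, size),
                                   bx, by, size, mv1, radius)
        if cost >= best_cost:  # no further improvement: stop iterating
            break
        mv0, mv1, best_cost = new_mv0, new_mv1, cost
    return mv0, mv1, best_cost
```

Alternating refinement of this kind never increases the cost from one iteration to the next, but it can settle in a local minimum, so the quality of the two starting vectors (for example, the best single-reference vectors) matters.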