SVT-AV1 Thread Pinning on Dual-Socket Servers

This article explains how the SVT-AV1 (libsvtav1) encoder manages thread pinning and core allocation on dual-socket systems to maximize encoding efficiency. It covers the challenges of Non-Uniform Memory Access (NUMA) in multi-socket environments and details the specific parameters and architectural strategies SVT-AV1 uses to pin threads, restrict core usage, and avoid inter-socket latency bottlenecks.

On dual-socket servers, physical memory is partitioned into distinct NUMA nodes, typically one per socket. When a thread running on CPU Socket A accesses memory physically attached to CPU Socket B, the request crosses an inter-socket interconnect such as Intel UPI or AMD Infinity Fabric, which adds latency and offers less bandwidth than a local access. This matters for libsvtav1 because it uses a highly structured, multi-stage threading model: tasks such as motion estimation, mode decision, and entropy coding run as parallel pipeline stages that continually hand picture data to one another. Without strict core allocation, the operating system scheduler may migrate these threads across sockets, or place cooperating stages on different sockets, degrading performance.
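Before pinning anything, it helps to confirm which logical CPUs belong to which NUMA node. The following is a minimal sketch, assuming a Linux host that exposes the standard sysfs layout under /sys/devices/system/node; the helper names parse_cpulist and numa_topology are illustrative and not part of SVT-AV1.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-3,8-11' into individual CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes report an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its logical CPUs; empty if no node data exists."""
    topology: dict[int, list[int]] = {}
    for cpulist in Path(root).glob("node[0-9]*/cpulist"):
        node_id = int(cpulist.parent.name[len("node"):])
        topology[node_id] = parse_cpulist(cpulist.read_text())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node {node}: {len(cpus)} logical CPUs -> {cpus}")
```

On a typical dual-socket machine this prints two nodes. With hyper-threading enabled, each node usually lists two separate CPU ranges, which is why CPU ids on one socket are often not contiguous.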

SVT-AV1 provides built-in parameters to control resource allocation directly without relying solely on external OS utilities. The primary controls are:

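--ss (Socket Selection): This parameter binds the encoder's threads to a specific CPU socket. For example, setting --ss 0 restricts execution to the first socket (typically NUMA node 0). Because operating systems such as Linux allocate memory on the node of the thread that first touches it by default, the encoder's buffers then tend to stay on that socket as well, which avoids most inter-socket communication overhead.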
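--lp (Logical Processors): This parameter sets how many logical processors the encoder instance targets, which keeps it from over-saturating the host CPU. Treat the value as a sizing target rather than an exact thread count: the encoder still creates its own set of pipeline threads, and the precise meaning of --lp has changed between SVT-AV1 releases, so check the parameter documentation for the version in use.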

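When --ss is configured, SVT-AV1 internally uses OS-specific affinity APIs (the sched_setaffinity / pthread_setaffinity_np family on Linux, and affinity-mask functions such as SetThreadGroupAffinity or SetProcessAffinityMask on Windows) to pin its threads to the logical cores belonging exclusively to the targeted socket. The exact call used depends on the platform and encoder version.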
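The underlying mechanism can be tried from user space. The sketch below uses Python's os.sched_setaffinity, a thin wrapper over the Linux system call of the same name, to restrict the calling thread to a chosen CPU set. It is not SVT-AV1's own code, and the function name pin_current_process is hypothetical. In practice the CPU set would be the logical CPUs of one socket.

```python
import os


def pin_current_process(cpus: set[int]) -> set[int]:
    """Restrict the caller to `cpus` and return the affinity now in effect."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    allowed = os.sched_getaffinity(0)
    target = set(cpus) & allowed  # never request CPUs outside the current mask
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    # pid 0 means the calling thread; threads and child processes created
    # afterwards inherit the mask, which is how a launcher can confine an encoder.
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    after = pin_current_process({min(before)})
    print(f"affinity before: {sorted(before)}")
    print(f"affinity after:  {sorted(after)}")
```

Affinity constrains where threads run, not where memory lives. Memory locality follows only because pages are normally allocated on the node of the thread that first touches them.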

While SVT-AV1 can scale across multiple sockets, doing so often introduces synchronization overhead and cross-socket memory traffic, so returns diminish on CPUs with high core counts. A common deployment strategy for dual-socket servers is therefore “multi-instance encoding.” Instead of running a single SVT-AV1 process spanning both sockets, administrators run parallel encoding instances on separate inputs, such as different files or different segments of one file. By using --ss 0 for the first instance and --ss 1 for the second, each encoder runs within its local NUMA node. This keeps each instance's working set in socket-local cache and memory, avoids saturating the inter-socket link, and typically yields higher aggregate throughput than a single instance spanning both sockets.
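A multi-instance launcher along these lines could be sketched as follows. It assumes the SvtAv1EncApp binary is on PATH, that each input is an independent file, and that output names are formed by appending ".ivf". The function names and the round-robin split are illustrative choices rather than anything prescribed by SVT-AV1.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def assign_jobs(sources: list[str], sockets: tuple[int, ...] = (0, 1)) -> dict[int, list[str]]:
    """Distribute input files round-robin across the given sockets."""
    plan: dict[int, list[str]] = {s: [] for s in sockets}
    for i, src in enumerate(sources):
        plan[sockets[i % len(sockets)]].append(src)
    return plan


def encode_queue(socket: int, sources: list[str], lp: int) -> list[int]:
    """Encode one socket's files sequentially; return each run's exit code."""
    codes = []
    for src in sources:
        cmd = ["SvtAv1EncApp", "-i", src, "-b", src + ".ivf",
               "--ss", str(socket), "--lp", str(lp)]
        codes.append(subprocess.run(cmd).returncode)
    return codes


def encode_all(sources: list[str], lp_per_socket: int) -> dict[int, list[int]]:
    """Run one encoder queue per socket in parallel and collect exit codes."""
    plan = assign_jobs(sources)
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = {s: pool.submit(encode_queue, s, files, lp_per_socket)
                   for s, files in plan.items()}
        return {s: f.result() for s, f in futures.items()}
```

Each socket runs at most one encoder at a time, so the two queues never compete for the same cores. Round-robin assignment is the simplest split; if inputs vary widely in length, a shared work queue that hands the next file to whichever socket finishes first would balance the load better.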