SVT-AV1 Thread Pinning on Dual-Socket Servers
This article explains how the SVT-AV1 (libsvtav1)
encoder manages thread pinning and core allocation on dual-socket
systems to maximize encoding efficiency. It covers the challenges of
Non-Uniform Memory Access (NUMA) in multi-socket environments and
details the specific parameters and architectural strategies SVT-AV1
uses to pin threads, restrict core usage, and avoid inter-socket latency
bottlenecks.
On dual-socket servers, each CPU socket has its own locally attached
memory, and the system typically exposes each socket together with its
memory as a distinct NUMA node. When threads running on CPU Socket A
access memory physically attached to CPU Socket B, the request crosses
an interconnect such as Intel UPI or AMD Infinity Fabric and incurs
higher latency than a local access. This matters for libsvtav1 because
it uses a highly structured, multi-stage threading model that divides
tasks such as motion estimation, mode decision, and entropy coding into
parallel pipeline stages that pass data to one another. Without strict
core allocation, the operating system scheduler may migrate these
threads across sockets, degrading performance.
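Before choosing any pinning strategy, it helps to see which logical CPUs belong to which NUMA node. The following is a minimal Python sketch for Linux, assuming the kernel exposes the usual /sys/devices/system/node/nodeN/cpulist files. The helper names parse_cpulist and numa_topology are illustrative and are not part of SVT-AV1.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (for example a memory-only node) yields no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]}, or an empty dict if no NUMA nodes are exposed."""
    topology = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        cpulist_file = os.path.join(path, "cpulist")
        if match and os.path.isfile(cpulist_file):
            with open(cpulist_file) as handle:
                topology[int(match.group(1))] = parse_cpulist(handle.read())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"NUMA node {node}: {len(cpus)} logical CPUs -> {cpus}")
```

On a typical dual-socket machine this prints two nodes, although BIOS settings such as sub-NUMA clustering can expose more than one node per socket.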
SVT-AV1 provides built-in parameters to control resource allocation directly without relying solely on external OS utilities. The primary controls are:
--ss (Socket Selection): This parameter binds the encoder process to a specific CPU socket. For example, setting --ss 0 restricts execution and memory allocation to the first socket (NUMA node 0), eliminating inter-socket communication overhead.
--lp (Logical Processors): This parameter sets the number of logical processors the encoder instance may use, preventing the encoder from over-saturating the host CPU.
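As a rough illustration of how these two parameters combine on a command line, the Python sketch below assembles an argument list. It assumes the standalone SvtAv1EncApp binary with its -i (input) and -b (output bitstream) options; the function name build_encoder_command and the file names are hypothetical, and accepted value ranges can differ between SVT-AV1 releases.

```python
import shlex


def build_encoder_command(source, output, socket=None, logical_processors=None,
                          binary="SvtAv1EncApp"):
    """Assemble an encoder argument list with optional --ss and --lp settings.

    Assumption: the standalone SvtAv1EncApp front end is used. The socket
    check only allows 0 or 1 because this article covers dual-socket hosts.
    """
    if socket is not None and socket not in (0, 1):
        raise ValueError("socket must be 0 or 1 on a dual-socket server")
    if logical_processors is not None and logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")

    command = [binary, "-i", source, "-b", output]
    if socket is not None:
        command += ["--ss", str(socket)]
    if logical_processors is not None:
        command += ["--lp", str(logical_processors)]
    return command


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    cmd = build_encoder_command("input.y4m", "output.ivf",
                                socket=0, logical_processors=16)
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed to subprocess without shell quoting issues.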
When --ss is configured, SVT-AV1 internally uses OS-specific affinity
APIs (for example sched_setaffinity on Linux or SetProcessAffinityMask
on Windows) to pin its threads to the logical cores that belong
exclusively to the targeted socket.
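To make the affinity mechanism concrete, the sketch below calls the Linux affinity interface through Python's os.sched_setaffinity wrapper. It only illustrates what an affinity call does to the caller; it is not SVT-AV1's internal code, and the helper name pin_current_process is hypothetical.

```python
import os


def pin_current_process(cpus):
    """Restrict the caller to the given CPU ids and return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, cpus)  # pid 0 targets the caller
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    # Illustrative choice: keep only the first half of the currently allowed CPUs.
    first_half = set(allowed[: max(1, len(allowed) // 2)])
    print("before:", allowed)
    print("after: ", sorted(pin_current_process(first_half)))
```

Once the affinity set is narrowed, the scheduler will not run the caller on CPUs outside it, which is the property that prevents cross-socket migration.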
While SVT-AV1 can scale across multiple sockets, doing so often
introduces synchronization overhead, so returns diminish on CPUs with
high core counts. The most effective deployment strategy for dual-socket
servers is usually “multi-instance encoding.” Instead of running a
single SVT-AV1 process spanning both sockets, administrators run two
encoding instances in parallel. By using --ss 0 for the first instance
and --ss 1 for the second, each encoder runs isolated within its local
NUMA node. This keeps each instance's threads and working set on one
socket, avoids saturating the inter-socket link, and typically yields
higher aggregate throughput than a single instance spanning both
sockets.
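The multi-instance approach can be sketched as a small Python launcher that assigns inputs to sockets in round-robin order and runs one encoder per socket at a time. This is a sketch under stated assumptions: the SvtAv1EncApp binary name, its -i and -b options, the ".ivf" output naming, and the helper names plan_instances and run_in_waves are all illustrative choices, not something SVT-AV1 prescribes.

```python
import subprocess


def plan_instances(sources, sockets=(0, 1), lp_per_instance=None,
                   binary="SvtAv1EncApp"):
    """Assign sources to sockets round-robin and return a list of (socket, command)."""
    plan = []
    for index, source in enumerate(sources):
        socket = sockets[index % len(sockets)]
        # Assumption: output is written next to the input with an .ivf suffix.
        command = [binary, "-i", source, "-b", source + ".ivf", "--ss", str(socket)]
        if lp_per_instance is not None:
            command += ["--lp", str(lp_per_instance)]
        plan.append((socket, command))
    return plan


def run_in_waves(plan, width):
    """Run `width` encoders at a time and return their exit codes in plan order."""
    exit_codes = []
    for start in range(0, len(plan), width):
        wave = [subprocess.Popen(command) for _, command in plan[start:start + width]]
        exit_codes.extend(process.wait() for process in wave)
    return exit_codes


if __name__ == "__main__":
    # Hypothetical inputs; each wave holds one instance per socket.
    jobs = plan_instances(["clip_a.y4m", "clip_b.y4m"], lp_per_instance=32)
    for socket, command in jobs:
        print(f"socket {socket}: {' '.join(command)}")
    # Uncomment to actually start the encoders:
    # print(run_in_waves(jobs, width=2))
```

Because the plan alternates sockets, a wave width equal to the number of sockets places exactly one encoder on each NUMA node at any time. A wave only finishes when its slowest encoder does, so a production scheduler would more likely keep a separate queue per socket.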