How is Netflix trying to achieve 800 Gb/s data transfer?



In recent years the number of users of video streaming services has grown, and these services are reported to account for the majority of Internet traffic. Netflix, one of the world's largest video streaming services, has released slides explaining the techniques it has devised to send large amounts of data to users all over the world every day.

2022-Streaming-Summit-Netflix.pdf
(PDF file) http://nabstreamingsummit.com/wp-content/uploads/2022/05/2022-Streaming-Summit-Netflix.pdf

Netflix has come a long way toward building a system that can transmit 800 Gb/s of video data from a single server.



In Netflix's workload, videos are served as 'static media files': every title is pre-encoded for all codecs and bitrates, so the server only has to deliver files that already exist.



The main steps in improving data transmission explained in the slides are 'Asynchronous Sendfile (2014)', 'Kernel TLS (2016)', 'NUMA (2019)', and 'Inline Hardware (NIC) Kernel TLS (2022)'.



First, 'Asynchronous Sendfile', which started in 2014. sendfile is a system call, available on Unix-like systems such as FreeBSD and Linux, that sends the contents of a file to a socket from inside the kernel. Because Netflix serves static media files, it can use sendfile, which eliminates the need to copy data between the kernel and user space.



With sendfile, data moves from the disk into kernel memory and from there to the network card, without the host CPU having to copy it through the application.
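A minimal Python sketch of the sendfile idea, under stated assumptions: the helper name `send_static_file`, the temporary file, and the local socket pair are hypothetical stand-ins for a media file and a client connection, and this is not Netflix's server code. `socket.sendfile()` uses the operating system's sendfile call where one is available and falls back to ordinary sends otherwise.

```python
import os
import socket
import tempfile


def send_static_file(sock: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `sock`; return the bytes sent."""
    with open(path, "rb") as f:
        # Where the platform supports it, this maps to os.sendfile(), so the
        # payload is not read into a Python-level buffer first.
        return sock.sendfile(f)


if __name__ == "__main__":
    # Hypothetical demo data: a small temp file plays the media file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    server, client = socket.socketpair()
    try:
        sent = send_static_file(server, tmp.name)
        print("bytes handed to the socket:", sent)
    finally:
        server.close()
        client.close()
        os.unlink(tmp.name)
```

The application never touches the file contents here; it only passes a file object and a socket to the call.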



However, if the nginx worker process (nginx being the web server) blocks while waiting for a disk read, it cannot respond to other requests, and other file transfers stall. Netflix addresses this with an asynchronous sendfile: the call appends a not-yet-filled buffer to the TCP socket buffer and returns immediately, and TCP is told the data is ready to send once the disk read completes.
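The buffer handling described above can be pictured with a toy model. This is only an illustrative sketch: `PendingBuffer`, `ToySocketBuffer`, and their methods are hypothetical names, and the real mechanism lives inside the kernel rather than in Python. The model assumes that queued data must leave in order, so a buffer whose disk read finishes early still waits behind earlier ones.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingBuffer:
    """Placeholder queued on the socket before its disk read has finished."""
    data: bytes = b""
    ready: bool = False


class ToySocketBuffer:
    """Hypothetical model of a TCP send queue; not a real kernel structure."""

    def __init__(self):
        self.queue = deque()  # buffers waiting to be sent, in order
        self.sent = []        # chunks "put on the wire", in order

    def sendfile_async(self) -> PendingBuffer:
        # Queue an unfilled buffer and return at once, so the caller
        # (the web server worker) is free to handle other requests.
        buf = PendingBuffer()
        self.queue.append(buf)
        return buf

    def disk_read_done(self, buf: PendingBuffer, data: bytes) -> None:
        # Completion step: fill the buffer, mark it ready, notify "TCP".
        buf.data = data
        buf.ready = True
        self._transmit_ready()

    def _transmit_ready(self) -> None:
        # Send from the head of the queue and stop at the first buffer
        # still waiting on the disk, which preserves byte order.
        while self.queue and self.queue[0].ready:
            self.sent.append(self.queue.popleft().data)


if __name__ == "__main__":
    sock = ToySocketBuffer()
    first, second = sock.sendfile_async(), sock.sendfile_async()
    sock.disk_read_done(second, b"chunk-2")  # finishes early, waits its turn
    sock.disk_read_done(first, b"chunk-1")
    print(sock.sent)  # [b'chunk-1', b'chunk-2']
```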



Subsequently, in 2016, 'kernel TLS (kTLS)' was introduced.



TLS is a protocol for encrypting traffic. If data read from disk into memory has to be handed to the application on the host CPU, encrypted there, written back to memory, and only then sent to the network card, the extra copies make it impossible to send large amounts of data quickly.



Therefore, Netflix says, implementing TLS encryption in the kernel greatly reduces copying between user space and kernel space and improves performance. Note that the TLS handshake itself is still performed by the application in user space, on the host CPU.
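As an illustration of how an application can ask its TLS library to hand bulk encryption to the kernel, here is a hedged Python sketch. It assumes Python 3.12 or later built against OpenSSL 3.0 or later, where the `ssl.OP_ENABLE_KTLS` option exists; on older builds the sketch simply leaves the option off. The function names are hypothetical, and whether kTLS is actually used also depends on the operating system and the negotiated cipher. It says nothing about how Netflix configures its own servers.

```python
import ssl


def make_ktls_context() -> ssl.SSLContext:
    """Create a server-side TLS context that requests kernel TLS if possible."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # OP_ENABLE_KTLS is only defined on Python 3.12+ with OpenSSL 3.0+;
    # getattr() makes the sketch a no-op elsewhere.
    ctx.options |= getattr(ssl, "OP_ENABLE_KTLS", 0)
    return ctx


def ktls_requested(ctx: ssl.SSLContext) -> bool:
    """Report whether the kTLS option is set on this context."""
    flag = getattr(ssl, "OP_ENABLE_KTLS", 0)
    return bool(flag) and bool(ctx.options & flag)


if __name__ == "__main__":
    context = make_ktls_context()
    print("kTLS requested:", ktls_requested(context))
```

The handshake still runs in the library in user space; the option only concerns the encryption of the data stream afterwards.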



In 2019, optimizations for NUMA (Non-Uniform Memory Access, also called Non-Uniform Memory Architecture) were adopted.



In earlier multi-CPU systems, every core had equal access to memory: each core could reach all memory and I/O devices directly and at the same cost, but ...



In NUMA, access is not uniform: the machine is divided into local zones called NUMA domains or NUMA nodes, and a core reaches the memory and devices of its own node faster than those of another node.



If data is exchanged across NUMA nodes, the interconnect fabric between them saturates and cannot sustain the load ...



Netflix says it keeps everything involved in serving a request on the NUMA node where the content is stored, so that the data does not cross NUMA nodes.
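One building block of keeping work on a single NUMA node is pinning a process to that node's CPUs. The sketch below is a Linux-only illustration with hypothetical function names: it reads the node's CPU list from `/sys/devices/system/node/` and calls `os.sched_setaffinity`. It assumes nothing about Netflix's own implementation, and CPU pinning alone does not place memory, disks, or NICs on the same node.

```python
import os


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        # A bare number like '5' has no upper bound, so reuse the lower one.
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


def pin_to_numa_node(node: int) -> set:
    """Restrict the current process to the CPUs of one NUMA node (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # 0 means "the calling process"
    return cpus


if __name__ == "__main__":
    print(sorted(parse_cpulist("0-3,8-11")))  # [0, 1, 2, 3, 8, 9, 10, 11]
```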



In 2022, 'inline hardware (NIC, network interface card) kernel TLS' was introduced.



In the conventional approach using sendfile and kTLS, the encryption itself is performed by the host CPU; the idea here is to perform it in the NIC instead.



Since 2016, Netflix has been in talks with Mellanox, an Israeli semiconductor company serving the data center market, and continued working toward a commercial high-performance NIC even after Mellanox was acquired by NVIDIA in 2020. With the resulting commercial NIC, the ConnectX-6 Dx, Netflix says it has realized the 'NIC kTLS' approach, in which data does not pass through the CPU for encryption.



The 800 Gb/s prototype built this way consists of a Dell R7525 with 2 AMD EPYC 7713 CPUs (64 cores / 128 threads each), 3 xGMI links between the sockets, 512 GB of RAM, 4 Mellanox ConnectX-6 Dx NICs (8 ports of 100 GbE in total), and 16 Intel NVMe drives (14 TB, PCIe Gen4 x4).



After trial and error with the prototype, Netflix was able to reach up to 720 Gb/s.
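The relationship between the prototype's network ports and the reported throughput is simple arithmetic. The short sketch below works it through; the constant and function names are hypothetical, the figures are the ones quoted above, and the two ports per NIC follow from 8 ports spread over 4 NICs.

```python
NIC_COUNT = 4
PORTS_PER_NIC = 2          # 8 ports in total across 4 NICs
PORT_SPEED_GBPS = 100      # 100 GbE per port
ACHIEVED_GBPS = 720        # throughput reported for the prototype


def line_rate_gbps(nics: int, ports_per_nic: int, port_speed: int) -> int:
    """Aggregate network capacity in gigabits per second."""
    return nics * ports_per_nic * port_speed


def gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


if __name__ == "__main__":
    capacity = line_rate_gbps(NIC_COUNT, PORTS_PER_NIC, PORT_SPEED_GBPS)
    print(capacity)                               # 800
    print(ACHIEVED_GBPS / capacity)               # 0.9
    print(gigabytes_per_second(ACHIEVED_GBPS))    # 90.0
```

In other words, 720 Gb/s is 90% of the 800 Gb/s the eight ports can carry, or about 90 GB of video data per second.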



The slides also drew attention on the social news site Hacker News, where one commenter wrote, 'I love technical content like this. Not only is it incredibly interesting and informative,' adding that it also serves as a counterpoint to the claim, popular on forums like Hacker News, that Netflix has no need for thousands of engineers: the work is much more difficult than it looks.

Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf] | Hacker News
https://news.ycombinator.com/item?id=32519881

in Software,   Web Service,   Hardware, Posted by log1h_ik