Alibaba Cloud researchers unveil technology to interconnect 15,000 GPUs in AI data centers, abandoning NVIDIA technology in favor of Ethernet



Ennan Zhai, an engineer and researcher at Alibaba Cloud, has published a research paper on interconnecting GPUs in data centers for LLM training over Ethernet. The approach is notable in that it uses Ethernet rather than NVLink, NVIDIA's GPU interconnect protocol.

Alibaba HPN: A Data Center Network for Large Language Model Training
(PDF file) https://ennanzhai.github.io/pub/sigcomm24-hpn.pdf

Alibaba Cloud ditches Nvidia's interconnect in favor of Ethernet — the tech giant uses its own High Performance Network to connect 15,000 GPUs inside data center | Tom's Hardware
https://www.tomshardware.com/tech-industry/alibaba-cloud-ditches-nvidias-interconnect-in-favor-of-ethernet-tech-giant-uses-own-high-performance-network-to-connect-15000-gpus-inside-data-center

Alibaba Cloud reveals datacenter design and homebrew network • The Register
https://www.theregister.com/2024/06/27/alibaba_network_datacenter_designs_revealed/

According to the paper published by Zhai and his research team, typical cloud computing workloads generate continuous data flows of less than 10 Gbps.

AI training workloads, by contrast, regularly generate bursts of up to 400 Gbps. When such traffic is distributed with Equal-Cost Multi-Path (ECMP) routing, a common data center load balancing method, it can cause hash polarization, leaving paths unevenly loaded and significantly reducing the bandwidth that is actually usable.
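
To illustrate why per-flow hashing struggles with this traffic pattern, the minimal Python sketch below (not from the paper; the uplink count, flow tuples, and hash choice are arbitrary assumptions) assigns a handful of 400 Gbps elephant flows to equal-cost uplinks the way ECMP does. Because the assignment depends only on a hash of each flow's header, several flows can land on the same uplink while others stay empty; hash polarization makes this worse when successive switch tiers apply the same hash function.

```python
import hashlib

# Hypothetical topology: 8 equal-cost uplinks between two switch tiers.
NUM_UPLINKS = 8
FLOW_RATE_GBPS = 400  # one bursty training flow, per the figures in the article

def ecmp_uplink(five_tuple, num_uplinks):
    """Pick an uplink by hashing the flow's 5-tuple, as ECMP does per flow."""
    digest = hashlib.md5(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_uplinks

# Eight synchronized elephant flows with made-up addresses and ports.
flows = [(f"10.0.0.{i}", "10.0.1.1", 40000 + i, 4791, "UDP") for i in range(8)]

load = [0] * NUM_UPLINKS
for flow in flows:
    load[ecmp_uplink(flow, NUM_UPLINKS)] += FLOW_RATE_GBPS

print("per-uplink load (Gbps):", load)
# With so few flows, some uplinks typically end up carrying 800+ Gbps while
# others carry nothing, even though total demand equals total capacity.
```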



To get around this problem, Zhai and his team built their own high-performance network (HPN) in Alibaba's data centers. The HPN uses a two-tier, dual-plane architecture that reduces reliance on ECMP, avoids hash polarization, and precisely selects network paths capable of carrying elephant flows (very large traffic flows).
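
The paper's design goes well beyond a short example, but the benefit of choosing paths explicitly rather than hashing can be sketched as below: a hypothetical greedy scheduler (for illustration only, not Alibaba's algorithm) pins each elephant flow to the currently least-loaded path.

```python
def place_elephant_flows(flow_rates_gbps, num_paths):
    """Greedy illustration: pin each large flow to the least-loaded path."""
    load = [0] * num_paths
    for rate in sorted(flow_rates_gbps, reverse=True):
        target = min(range(num_paths), key=lambda p: load[p])
        load[target] += rate
    return load

# Eight 400 Gbps flows over eight paths balance perfectly, [400, 400, ..., 400],
# whereas the ECMP sketch above frequently stacks several of them on one path.
print(place_elephant_flows([400] * 8, 8))
```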

With this HPN, each host in Alibaba's AI data center has eight GPUs and nine network interface cards (NICs), each NIC providing 400 Gbps; eight of the NICs carry GPU training traffic, giving every host 3.2 Tbps of network bandwidth. With 1,875 such hosts, the data center interconnects 15,000 GPUs.
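
These figures follow from simple arithmetic, assuming (as described above) that eight of a host's nine NICs carry GPU traffic:

```python
HOSTS = 1875
GPUS_PER_HOST = 8
GPU_NICS_PER_HOST = 8   # assumption from the paper: the ninth NIC serves other traffic
NIC_RATE_GBPS = 400

print(HOSTS * GPUS_PER_HOST)                      # 15000 GPUs in total
print(GPU_NICS_PER_HOST * NIC_RATE_GBPS / 1000)   # 3.2 Tbps of bandwidth per host
```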

Two aspects of Alibaba's research are drawing particular attention. The first is that it uses Ethernet instead of NVIDIA's NVLink for the interconnect between hosts.

The research team explained that they chose Ethernet to avoid vendor lock-in and to harness the power of the whole Ethernet Alliance for faster evolution. The Register, an IT news site, points out that this supports the claims of vendors who are moving away from NVIDIA.

The choice also benefits competitors such as AMD that are trying to catch up with NVIDIA in the data center space, and it could allow data center operators to avoid NVIDIA's technology and reduce the cost of building data centers.



The second point is that Alibaba chose a single-chip switch with 51.2 Tbps of capacity rather than a multi-chip switch, because multi-chip switches are more prone to instability and failure. Single-chip switches have a drawback of their own, however: they run hot and shut down if the chip temperature exceeds 105 degrees Celsius.

To keep the chip below 105 degrees Celsius, Alibaba developed its own cooling solution based on a vapor chamber (VC) heat sink. By optimizing the wick structure inside the vapor chamber and placing more wick columns at the center of the chip, heat is dissipated more efficiently, the paper explains.

Alibaba's HPN had already been in operation for eight months at the time of the paper's publication. Zhai and his team plan to present the technology at the Special Interest Group on Data Communication (SIGCOMM) conference in Sydney in August 2024.

in Hardware, Posted by log1l_ks