What is memory bandwidth in ultra-high performance servers such as those used in data centers?



If memory bandwidth is slowing down the applications you run, choosing your chips wisely can help you build a better system. The Next Platform, an overseas technology publication, examines how expensive, high-performance memory bandwidth affects application performance.

Building The Perfect Memory Bandwidth Beast
https://www.nextplatform.com/2023/01/24/building-the-perfect-memory-bandwidth-beast/

'POWER10', the next-generation cloud processor announced by IBM in 2020, achieves large memory bandwidth. In 2019, IBM announced a server built around a POWER10 machine and the 'Open Memory Interface (OMI)', a high-speed interface that supports multiple protocols. Intel reveals that IBM's POWER10 processors are capable of supporting a variety of technologies.

POWER10 has a bandwidth of about 320GB/s per core and memory capacities ranging from 256GB to 4TB. Optimized processors also reduce the number of memory modules by a factor of 4 and deliver DDR4 capacities from 128GB to 512GB per core, and switching to DDR5 memory can boost bandwidth to 800GB/s. In addition, the POWER10 processor called 'Cirrus' is said to have a maximum memory bandwidth of 256GB/s per core and a sustained memory bandwidth of 120GB/s per core.



The image below is a graph showing the performance improvement on general purpose sockets for POWER9 and POWER10 processors. 'POWER10 Memory Streaming' is a dual-chip module, so unlike other single-chip sockets, it can be made even faster by adjusting the clock speed.



The IBM Power E1050, a rack server released by IBM, is equipped with up to four POWER10 dual-chip modules for a total of 96 cores. It supports DIMMs and can achieve a maximum bandwidth of 1.6TB/s.

It also seems that the bandwidth per core can be doubled by reducing the number of cores, and that memory bandwidth can be expanded further by switching to DDR5 memory or Compute Express Link (CXL) memory.

The IBM Power E1050 is not cheap, but it is said to be a better choice than waiting for high-performance integrated CPU/GPU data center chips like AMD's Instinct MI300 or NVIDIA's Grace Hopper. Although these chips have high memory bandwidth per core, their memory capacity is limited, so they can only run smaller programs than an IBM Power E1050 equipped with POWER10 or a system built on 'Sapphire Rapids' announced by Intel.

In addition, it has been pointed out that AMD and NVIDIA's high-performance chips tend to generate heat, and as a result, the speed of DRAM and HBM has to be lowered, so the expected memory bandwidth may not be reached.

The Next Platform cites Intel's Sapphire Rapids as the most suitable CPU for building a system with ideal memory bandwidth.



Sapphire Rapids is a processor that can simultaneously support high-bandwidth HBM2e memory and DDR5 memory. Some Sapphire Rapids products support multiple HBM2e stacks, while others support 8-way NUMA configurations.

The regular Sapphire Rapids Xeon SP has 8 DDR5 memory channels, with a maximum capacity of 2TB when using 1 DIMM per channel at a memory speed of 4.8GT/s. With two DIMMs per channel the maximum capacity increases to 4TB, but the memory speed is said to drop to 4.4GT/s.
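The capacity-versus-speed trade-off in the paragraph above can be worked through numerically. This is a minimal sketch, not a vendor formula: it treats the quoted memory speeds as transfer rates (4.8 and 4.4 GT/s), assumes a 64-bit (8-byte) data path per DDR5 channel, and infers a 256GB DIMM size from the article's 2TB figure. The constant and function names are illustrative.

```python
# Capacity versus peak bandwidth for 1 and 2 DIMMs per channel (DPC).
CHANNELS = 8              # DDR5 channels per socket, from the article
BYTES_PER_TRANSFER = 8    # assumption: 64-bit data path per channel
DIMM_GB = 256             # assumption: inferred from 2 TB / 8 channels


def capacity_tb(dimms_per_channel):
    """Total DRAM capacity per socket in TB (1 TB = 1024 GB here)."""
    return CHANNELS * dimms_per_channel * DIMM_GB / 1024


def peak_bandwidth_gbs(rate_gts):
    """Theoretical peak bandwidth per socket in GB/s."""
    return CHANNELS * rate_gts * BYTES_PER_TRANSFER


# DIMMs per channel -> transfer rate in GT/s, as quoted in the article
CONFIGS = {1: 4.8, 2: 4.4}

for dpc, rate in CONFIGS.items():
    print(f"{dpc} DPC: {capacity_tb(dpc):.0f} TB, "
          f"{peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

Under these assumptions, doubling capacity from 2TB to 4TB lowers the theoretical peak from 307.2GB/s to 281.6GB/s per socket, a reduction of roughly 8%.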

The 60-core Sapphire Rapids Xeon SP-8490H runs at 1.9GHz, and because the socket's memory bandwidth is shared among 60 cores, the bandwidth per core is narrow at 5.1GB/s. On the other hand, the 16-core Sapphire Rapids Xeon SP-8444H runs at a higher 2.9GHz, and its bandwidth per core is 19.2GB/s.

If you want to further increase the memory bandwidth per core, changing to Sapphire Rapids Xeon SP-6434 will increase the operating frequency to 3.7GHz and expand the bandwidth per core to 38.4GB/s.
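The per-core figures quoted for these three models are consistent with dividing one socket's peak DDR5 bandwidth evenly across its cores. The sketch below reproduces that arithmetic, assuming 8 channels at 4.8 GT/s with 8 bytes per transfer. The core counts of 60 and 16 come from the article, while 8 cores for the SP-6434 is an assumption, as are the names used here.

```python
# Peak DDR5 bandwidth per socket: channels * GT/s * bytes per transfer.
SOCKET_PEAK_GBS = 8 * 4.8 * 8

# Core counts: 60 and 16 are from the article; 8 is an assumption.
CORES = {"SP-8490H": 60, "SP-8444H": 16, "SP-6434": 8}


def bandwidth_per_core(cores, socket_gbs=SOCKET_PEAK_GBS):
    """Share of the socket's peak memory bandwidth per core, in GB/s."""
    return socket_gbs / cores


for model, cores in CORES.items():
    print(f"{model}: {bandwidth_per_core(cores):.1f} GB/s per core")
```

The calculation shows that the difference between models comes from how many cores share the same eight channels, not from the CPU clock speed.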

The Sapphire Rapids Max series CPU has 56 cores, and its four HBM2e stacks provide a memory capacity of 64GB and a bandwidth of 1.23TB/s, for a memory bandwidth of 22GB/s per core. Another model is said to run 32 cores against the same 1.23TB/s, resulting in 38GB/s of memory bandwidth per core.
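The same division can be applied to the HBM2e figures, and it is worth applying to capacity as well as bandwidth. This is a rough sketch that assumes 1TB/s = 1000GB/s and splits both resources evenly across cores; the names are illustrative.

```python
# HBM2e figures quoted in the article for the Max series CPU.
HBM_BANDWIDTH_GBS = 1230.0   # 1.23 TB/s across four stacks
HBM_CAPACITY_GB = 64.0       # total across four stacks


def per_core(total, cores):
    """Evenly divide a socket-level resource across cores."""
    return total / cores


for cores in (56, 32):
    bw = per_core(HBM_BANDWIDTH_GBS, cores)
    cap = per_core(HBM_CAPACITY_GB, cores)
    print(f"{cores} cores: {bw:.1f} GB/s and {cap:.2f} GB of HBM per core")
```

Each core gets only about 1.1GB to 2GB of HBM, which illustrates why high bandwidth per core has to be weighed against limited capacity.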

In addition, Sapphire Rapids Max series CPUs can achieve a high memory bandwidth of 13.912TB/s in total and 217.4GB/s per core by adding DDR5 memory and CXL memory. It is also said that higher performance can be achieved by interconnecting processors via NUMA.

Sapphire Rapids is suitable not only for building servers that require high memory bandwidth but also for high-performance computing and AI machine learning acceleration. However, it comes at a huge cost, so the approach using Sapphire Rapids is considered impractical for AI training.



In chips like AMD's Instinct MI300 and NVIDIA's Grace Hopper as well, the balance between GPU cores and HBM memory bandwidth is important for proper usage.

"Balancing computation, memory bandwidth, and memory capacity may be more important than chopping the memory of an ultra-high-performance CPU processor into pieces and distributing it to many CPUs," says The Next Platform.

in Hardware, Posted by log1r_ut