A low-latency data transfer system that enables external expansion of graphics card VRAM has been developed



When a graphics board is used for computation such as image generation AI or large language models, 'VRAM capacity' matters as well as 'GPU processing performance'. Although it is basically impossible to add VRAM to a graphics board, hardware manufacturer Panmnesia has developed a system called 'CXL-GPU' that enables external memory expansion for a graphics board.

CXL-GPU
(PDF file) https://panmnesia.com/uploads/panmnesia-CXL-GPU.pdf

GPUs can now use PCIe-attached memory or SSDs to boost VRAM capacity — Panmnesia's CXL IP claims double-digit nanosecond latency | Tom's Hardware
https://www.tomshardware.com/pc-components/gpus/gpus-get-a-boost-from-pcie-attached-memory-that-boosts-capacity-and-delivers-double-digit-nanosecond-latency-ssds-can-also-be-used-to-expand-gpu-memory-capacity-via-panmnesias-cxl-ip

When running AI-related processing on a graphics board, insufficient VRAM causes problems such as 'the model data cannot be loaded, so processing cannot begin' and 'memory swapping occurs and slows processing down'. For this reason, AI-related processing calls for a graphics board with a large VRAM capacity. Even then, model sizes keep growing, so a board with ample VRAM today may run short within a few years.
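To make the capacity problem concrete, here is a minimal sketch of how one might estimate the VRAM needed just to hold a model's weights. The 7-billion-parameter model and the 16 GB card are hypothetical figures chosen for illustration, not numbers from Panmnesia, and the estimate ignores activations, caches, and framework overhead, which add to the real requirement.

```python
# Rough estimate of the VRAM needed to hold model weights only.
# All concrete figures here are illustrative assumptions, not from the article.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_gb(num_params: int, dtype: str) -> float:
    """Return the size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_vram(num_params: int, dtype: str, vram_gb: float) -> bool:
    """True if the weights alone fit; real workloads need extra headroom."""
    return weight_footprint_gb(num_params, dtype) <= vram_gb


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7-billion-parameter model
    card_gb = 16.0          # hypothetical graphics board
    for dtype in BYTES_PER_PARAM:
        size = weight_footprint_gb(params, dtype)
        ok = fits_in_vram(params, dtype, card_gb)
        print(f"{dtype}: {size:.1f} GB of weights, fits in {card_gb:.0f} GB: {ok}")
```

Under these assumptions the same model fits at 16-bit precision but not at 32-bit precision, which is the kind of shortfall that external memory expansion is meant to address.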

DRAM and SSDs can easily be added later, but it is basically impossible to add VRAM to a graphics board. To address this, Panmnesia has developed 'CXL-GPU', a system that allows external memory to be added to a graphics board as VRAM.



With CXL-GPU, you can increase the VRAM available to a graphics card by having DRAM or SSDs recognized as VRAM. In Panmnesia's example, DRAM is installed in a PCIe adapter and recognized as VRAM through CXL-GPU.



A technology called 'Unified Virtual Memory (UVM)' already lets a GPU treat DRAM as VRAM, but UVM suffers from high latency and tends to reduce processing performance. Panmnesia claims that CXL-GPU's latency is only double-digit nanoseconds, which limits the performance degradation.



Panmnesia's testing has confirmed that a system using CXL-GPU achieves 3.23x faster kernel execution than a system using UVM.
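As a quick way to read that figure, the sketch below converts a speedup factor into the share of execution time saved. It assumes '3.23x faster' means the UVM execution time divided by the CXL-GPU execution time. The 100 ms baseline is a hypothetical value for illustration, not a measurement from Panmnesia.

```python
# Convert a reported speedup factor into time saved.
# Assumes speedup = (baseline time) / (new time); the baseline is hypothetical.


def time_reduction(speedup: float) -> float:
    """Fraction of execution time saved for a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def new_runtime_ms(baseline_ms: float, speedup: float) -> float:
    """Execution time after applying the speedup to a baseline time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_ms / speedup


if __name__ == "__main__":
    speedup = 3.23       # figure reported in the article
    baseline_ms = 100.0  # hypothetical kernel time under UVM
    print(f"time saved: {time_reduction(speedup):.1%}")
    print(f"runtime: {new_runtime_ms(baseline_ms, speedup):.2f} ms")
```

Read this way, a 3.23x speedup corresponds to roughly a 69% reduction in kernel execution time.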



At the time of writing, it is unclear whether or not CXL-GPU will be commercially available.

in Hardware, Posted by log1o_hf