What is the performance of the NPU installed in Intel's 14th generation SoC 'Core Ultra' that emphasizes AI performance?



'Core Ultra', the 14th generation Core platform laptop processor officially announced by Intel in December 2023, is based on the Meteor Lake architecture announced in September 2023. Core Ultra includes a neural processing unit (NPU) specialized for AI processing, and the tech site Chips and Cheese has published an analysis of this NPU.

Intel Meteor Lake's NPU – Chips and Cheese
https://chipsandcheese.com/2024/04/22/intel-meteor-lakes-npu/



The NPU installed in Core Ultra is called 'NPU 3720'. The NPU 3720 has two neural compute engine (NCE) tiles, which together can perform 4096 INT8 multiply-accumulate (MAC) operations per cycle. The clock speed of the NPU 3720 is relatively low at 1.16 GHz, but throughput can reach up to 9.5 TOPS.
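The 9.5 TOPS figure can be reproduced from the MAC count and clock speed quoted above. The following is a minimal sketch, assuming the usual convention that one MAC counts as two operations (a multiply and an add); the function name `peak_tops` is hypothetical and used only for illustration.

```python
# Figures quoted in the article for the NPU 3720.
MACS_PER_CYCLE_INT8 = 4096  # both NCE tiles combined
CLOCK_GHZ = 1.16

# Assumption: one MAC is counted as two operations (multiply + add).
OPS_PER_MAC = 2


def peak_tops(macs_per_cycle: int, clock_ghz: float,
              ops_per_mac: int = OPS_PER_MAC) -> float:
    """Theoretical peak throughput in tera-operations per second."""
    ops_per_second = macs_per_cycle * ops_per_mac * clock_ghz * 1e9
    return ops_per_second / 1e12


int8_peak = peak_tops(MACS_PER_CYCLE_INT8, CLOCK_GHZ)
print(f"INT8 peak: {int8_peak:.2f} TOPS")
```

This is a theoretical ceiling that assumes every MAC unit is busy on every cycle, so measured throughput on real workloads will be lower.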



Like a GPU, the NPU appears to the host as a PCIe device and operates on commands it receives from the host. Instead of building a custom command processor, Intel uses a 32-bit microcontroller running a real-time operating system to control the NPU.

Additionally, the NPU usage can be monitored separately from the CPU and GPU in Task Manager.



Each NCE tile also has 2 MB of software-managed SRAM. Because this SRAM is not a cache, data can be read from it directly without tag comparison or virtual address translation; in exchange, the compiler and software must explicitly move data into the SRAM.



The MAC array in the NCE tile is further divided into up to 512 MAC Processing Engines (MPEs), each capable of four INT8 multiply-accumulate operations per cycle, with MAC for 16-bit floating-point numbers (FP16) performed at half the rate of INT8.
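As a quick consistency check, the per-MPE figures can be multiplied out to the whole NPU. This is a rough sketch that assumes 512 MPEs per tile, the two tiles and 1.16 GHz clock mentioned earlier in the article, and two floating-point operations per MAC; the variable names are illustrative only.

```python
# Figures from the article; the per-tile reading of "512 MPEs" is an assumption.
TILES = 2
MPES_PER_TILE = 512
INT8_MACS_PER_MPE = 4
CLOCK_GHZ = 1.16

# INT8 MACs per cycle, per tile and for the whole NPU.
int8_macs_per_tile = MPES_PER_TILE * INT8_MACS_PER_MPE
int8_macs_total = TILES * int8_macs_per_tile

# FP16 runs at half the INT8 rate.
fp16_macs_total = int8_macs_total // 2

# Assumption: one MAC counts as two floating-point operations.
fp16_peak_tflops = fp16_macs_total * 2 * CLOCK_GHZ * 1e9 / 1e12

print(f"INT8 MACs per cycle: {int8_macs_per_tile} per tile, {int8_macs_total} total")
print(f"FP16 MACs per cycle: {fp16_macs_total} total")
print(f"FP16 theoretical peak: {fp16_peak_tflops:.2f} TFLOPS")
```

Under these assumptions the total matches the 4096 INT8 MACs per cycle quoted for the NPU 3720, and the FP16 ceiling comes out at roughly half the INT8 figure.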

Chips and Cheese graphed the compute performance of the NPU, CPU, and integrated graphics in Core Ultra. The NPU peaks at 1349.39 GFLOPS at a matrix size of 4096, but beyond that size it falls behind the integrated graphics again.



The NPU also trails discrete GPUs in matrix multiplication throughput. According to Chips and Cheese's graph, the NPU 3720 does not reach the performance of the 'GTX 1080', a GPU released in 2016, let alone the 'RX 6900 XT'.



On the other hand, the NPU can access memory relatively quickly, with lower latency than the integrated graphics.



Still, in a comparison of image generation speed using Stable Diffusion, AMD's RX 6900 XT performed best, followed by the integrated graphics on Core Ultra and then the NPU. Chips and Cheese remarks that 'running Stable Diffusion on the NPU is frustrating.'



Chips and Cheese explains that the NPU on Core Ultra is 'aiming to improve performance and reduce power consumption in machine learning workloads, focusing on INT8 and FP16.' On the other hand, it notes, 'Accelerators using NPUs inherently lack flexibility in general-purpose computing and may not be able to run specific machine learning models. Therefore, designing a custom accelerator requires a software ecosystem built to run specific machine learning models.'

The site continued, 'For some machine learning models, the use of an NPU can reduce power consumption, but it does not necessarily improve performance. Indeed, the power of Core Ultra integrated graphics can reach up to 20W, while the power of the NPU rarely exceeds 7W. However, in exchange for high power consumption, integrated graphics can provide users with performance and flexibility that exceeds that of an NPU, and unless you are trying to run machine learning workloads, you will experience an order of magnitude higher performance with integrated graphics.' It also criticized the branding: 'It is true that NPUs may be useful in certain situations, but I think it is wrong to label it as an "AI PC."'

Finally, Chips and Cheese stated, 'Over the last 15 years, GPU performance has improved dramatically. Today, GPU-based computing has reached a point where it is reasonably usable. I hope that a similar evolution will occur for NPUs.'

in Software, Hardware, Posted by log1r_ut