Arm announces next-generation AI platform 'CSS for Client' and Armv9.2 architecture CPU core 'Cortex-X925' with 46% improvement in AI performance from the previous generation



Arm has announced Arm Compute Subsystems (CSS) for Client, a computing platform for smartphones and tablets, and revealed that the platform adds the Cortex-X925 and Cortex-A725 CPU cores, both based on the Armv9.2 architecture.

Arm CSS for Client: The Compute Platform for AI-powered Consumer Experiences - Arm Newsroom

https://newsroom.arm.com/blog/arm-css-for-client-platform

New Armv9 CPUs for Accelerating AI on Mobile and Beyond - Arm Newsroom
https://newsroom.arm.com/blog/armv9-cpus-consumer-devices

Arm Unveils 2024 CPU Core Designs, Cortex X925, A725 and A520: Arm v9.2 Redefined for 3nm
https://www.anandtech.com/show/21399/arm-unveils-2024-cpu-core-designs-cortex-x925-a725-and-a520-arm-v9-2-redefined-for-3nm-

CSS for Client is a comprehensive platform that integrates hardware such as second-generation Armv9.2 architecture CPU cores and fifth-generation GPU cores, a comprehensive reference software stack for Android, the AI framework KleidiAI, the image processing library KleidiCV, and a robust tool environment through Arm Performance Studio to optimize the performance and efficiency of client devices.



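The CSS for Client lineup covered in this article includes the Cortex-X925, Cortex-A725, and Cortex-A520 CPU cores, the Immortalis-G925 GPU, and the DSU-120 DynamIQ Shared Unit.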



The Cortex-X925, designed with the Armv9.2 architecture, is a prime CPU core that was codenamed 'Blackhawk' during development. Arm says it delivers 36% higher single-threaded performance than top-class 2023 smartphones and 46% higher AI performance than the previous-generation Cortex-X4.



The Armv9.2 architecture is said to have been improved to maximize IPC (instructions per cycle). This enhancement allows cores to execute more instructions simultaneously, improving utilization of execution units and increasing overall throughput.

To support this wider instruction path, Arm has doubled the size of the instruction window, which reduces program and system stalls and improves the efficiency of the execution pipeline. In addition, Arm has doubled the bandwidth of the L1 instruction cache and similarly increased the size of the L1 instruction TLB. These enhancements allow cores to fetch and decode instructions more quickly, minimizing latency and maximizing performance.
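As a rough illustration of why IPC matters, single-threaded throughput is often modeled as IPC multiplied by clock frequency. The following is a minimal sketch under that simplified model; the function names and all input values are hypothetical placeholders chosen for illustration, not figures published by Arm.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Simplified model: throughput = IPC x clock frequency."""
    return ipc * clock_hz


def relative_gain(ipc_old: float, clock_old: float,
                  ipc_new: float, clock_new: float) -> float:
    """Fractional throughput gain of the new core over the old one."""
    old = instructions_per_second(ipc_old, clock_old)
    new = instructions_per_second(ipc_new, clock_new)
    return new / old - 1.0


# Hypothetical inputs: IPC rises from 4.0 to 5.0 at an unchanged 3.0 GHz clock.
gain = relative_gain(4.0, 3.0e9, 5.0, 3.0e9)
print(f"Throughput gain: {gain:.0%}")
```

In this model, an IPC increase raises throughput even when the clock speed stays the same, which is why a wider core can be faster without a higher frequency. Real workloads also depend on cache behavior, branch prediction, and memory latency, which the model ignores.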



The Cortex-A725 was announced as a high-performance CPU core.



Arm claims that the Cortex-A725 delivers 35% better performance efficiency and 25% better power efficiency than the Cortex-A720. In addition, the L2 cache has been increased to 1MB, which reduces latency and improves performance, especially in applications that require fast data retrieval.



In addition, the high-efficiency core, the Cortex-A520, has been updated for the 3nm node in CSS for Client, improving power efficiency by 15% compared to the Cortex-A520 on the previous-generation platform, TCS23.

The GPU is the Immortalis-G925, an evolution of the 5th generation GPU architecture from the previous generation Immortalis-G720, with 14 cores and 4MB of L2 cache, delivering 37% improved graphics performance.

The DynamIQ Shared Unit (DSU), which is responsible for power management of CSS for Client, is the same DSU-120 as the TCS23, but it has new performance, efficiency and low power modes, includes enhancements for consumer devices, and maintains the option to scale up to 14 cores. These improvements are said to reduce power consumption by 50% for typical workloads and reduce cache miss power by 60% across the CPU cluster, improving battery life for consumer devices.



According to Arm, CSS for Client is designed with AI in mind. Compared to the previous-generation Cortex-X4, the time to first token is 42% faster when running the large language model Llama 3 and 46% faster when running Phi-3.



Compared to the previous generation platform TCS23, CSS for Client is said to be 59% faster in AI inference on the CPU and 36% faster in AI inference on the GPU. In addition, Arm claims that by using two Cortex-X925s, AI inference on the CPU is 2.7 times faster than the TCS23 configuration with one Cortex-X4.
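The figures above mix "percent faster" and "times faster", so it can help to put them on one scale. The following is a minimal sketch that assumes "N% faster" means throughput is (1 + N/100) times the baseline; this interpretation and the helper names are assumptions, since the article does not state how Arm defines the metric.

```python
def percent_to_multiplier(percent_faster: float) -> float:
    """Convert 'N% faster' into a throughput multiplier (assumed meaning)."""
    return 1.0 + percent_faster / 100.0


def multiplier_to_percent(multiplier: float) -> float:
    """Convert 'M times faster' into the equivalent 'N% faster'."""
    return (multiplier - 1.0) * 100.0


def relative_time(percent_faster: float) -> float:
    """Run time relative to the baseline, under the throughput assumption."""
    return 1.0 / percent_to_multiplier(percent_faster)


# Figures quoted in the article, all relative to TCS23.
cpu_multiplier = percent_to_multiplier(59)       # CPU AI inference
gpu_multiplier = percent_to_multiplier(36)       # GPU AI inference
dual_x925_percent = multiplier_to_percent(2.7)   # two Cortex-X925 cores

print(f"CPU inference: {cpu_multiplier:.2f}x, "
      f"relative time {relative_time(59):.2f}")
print(f"GPU inference: {gpu_multiplier:.2f}x")
print(f"Two Cortex-X925 cores: {dual_x925_percent:.0f}% faster")
```

Under this assumption, "59% faster" and "2.7 times faster" correspond to multipliers of 1.59 and 2.7 respectively. If a vendor instead means that run time falls by N%, the multipliers would be larger, so the stated interpretation matters when comparing claims.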



Arm cites camera image processing to blur the background of photos and add a realistic bokeh effect as a use case for CSS for Client, which is specialized for AI. Compared to TCS23, CSS for Client has 24% improved performance in AI processing to add bokeh to photos, Arm says, allowing you to enjoy faster and smoother bokeh effects in photos and videos without sacrificing battery life.



According to Arm, the physical implementation of CSS for Client can achieve clock speeds of over 3.6GHz and provide optimal power, performance and area metrics at the 3nm node. Regarding the 3nm node, Arm said, 'TSMC and Samsung's 3nm processes are the primary choices for manufacturing the core cluster for CSS for Client.'

in Hardware, Posted by log1i_yk