Apple is working on miniaturizing the Gemini model so it can run on the iPhone and power the new Siri.



Apple is expected to soon announce a 'completely redesigned Siri.' For this new Siri, Apple has signed a multi-year agreement to use Google's generative AI, Gemini, and The Information reports that Apple is trying to miniaturize Gemini so that it can run on iPhones.

Apple to Renew Push for AI That Runs on Devices, Instead of the Cloud — The Information
https://www.theinformation.com/articles/apple-renew-push-ai-runs-devices-instead-cloud

Apple working to cram massive Gemini model into iPhone to power new Siri - Ars Technica
https://arstechnica.com/ai/2026/05/apple-reportedly-trying-to-distill-googles-multi-trillion-parameter-gemini-ai-to-run-on-iphone/

Apple has long touted on-device AI as better for privacy. Although Apple is aiming to miniaturize Gemini, it is difficult to process everything solely on the device, so some processing will apparently still need to be done on cloud servers. The Gemini-powered version of Siri will therefore run both on the device and in the cloud, which is at odds with Apple's long-held commitment to 'local AI for privacy protection.'

Every new chip announcement heavily promotes the chip as optimized for AI. Apple, too, highlights upgrades to the 'Neural Engine,' the dedicated processor (NPU) in Apple Silicon designed specifically for AI and machine learning. As a result, it's easy to assume that high-performance AI models can be processed on smartphones. However, the GPU in most smartphones can process more AI tokens than a dedicated NPU. NPUs like the Neural Engine are instead designed to run contextual AI features efficiently, and while they can speed up AI processing on the device, they don't have access to enough memory (RAM) to hold massive AI models.

AI models that run on smartphones are very small, with at most a few billion parameters. In contrast, Google's latest Gemini has trillions of parameters, making it far too large to run directly on an iPhone. Also, AI models on devices are quantized to operate with low precision, which speeds up processing but affects the accuracy of token generation. Because of these factors combined, AI running directly on smartphones can sometimes produce poor output compared to cloud-based AI.
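To make the size gap concrete, the following is a rough sketch of how much memory is needed just to store a model's weights at different precisions, plus a toy round trip through 8-bit quantization to show where the precision loss comes from. The parameter counts (3 billion for an on-device model, 1 trillion for a cloud model), the function names, and the sample weights are illustrative assumptions, not figures published by Apple or Google. The estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bits per parameter.
# All parameter counts and sample values are illustrative assumptions.

BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Memory needed to store the weights alone, in GB (10^9 bytes)."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 1e9


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    # Fall back to a scale of 1.0 if every value is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    small_model = 3_000_000_000       # hypothetical on-device model
    large_model = 1_000_000_000_000   # hypothetical cloud-scale model

    for name, params in [("small", small_model), ("large", large_model)]:
        for precision in BITS_PER_PARAM:
            gb = weight_memory_gb(params, precision)
            print(f"{name:>5} model @ {precision}: {gb:,.1f} GB")

    # Quantization trades precision for size: values no longer round-trip exactly.
    weights = [0.8123, -0.0457, 0.3301, -1.0]
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

Under these assumptions, the small model needs about 6 GB at 16-bit precision and 1.5 GB at 4-bit, while the trillion-parameter model still needs around 500 GB even at 4-bit, which is far beyond the RAM of any smartphone.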



Google offers Gemini Nano, a model optimized for mobile devices. However, Gemini Nano is designed to provide contextual features such as Magic Cue and voice summarization, whereas Siri is a conversational assistant that carries out various tasks when spoken to. Technology media outlet Ars Technica points out that this is a completely different experience and requires a different AI model. Furthermore, Google does not process Gemini conversations locally on Android; they are always sent to the cloud.

According to reports, Apple has been optimizing Gemini since signing its agreement with Google. Ars Technica explains that this optimization is 'a process in which a small, resource-efficient AI model learns to mimic a large, expensive AI model.' With enough time, it's possible to remove less important elements from the AI model while ensuring that critical functions are transferred. This may allow Siri to handle some tasks locally, but 'the introduction of cloud components will likely be unavoidable,' The Information reports.
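The process described above, in which a small model learns to mimic a large one, is commonly called knowledge distillation. The following is a minimal sketch of the general idea, not Apple's or Google's actual training procedure: a 'student' is scored on how closely its output distribution matches a 'teacher' over the same candidate tokens. The logits, the temperature value, and the function names are all made up for illustration.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; higher temperature softens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL divergence from the teacher's softened distribution to the student's.

    A value of 0 means the student reproduces the teacher exactly. Training
    setups often also scale this by temperature squared and mix it with a
    regular label loss; that is omitted here. Assumes logits moderate enough
    that no student probability underflows to zero.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q) for p, q in zip(teacher, student) if p > 0
    )


if __name__ == "__main__":
    # Hypothetical scores over four candidate next tokens.
    teacher = [4.0, 1.5, 0.2, -1.0]
    untrained_student = [0.1, 0.3, 0.2, 0.0]
    trained_student = [3.8, 1.4, 0.3, -0.9]

    print("untrained:", distillation_loss(untrained_student, teacher))
    print("trained:  ", distillation_loss(trained_student, teacher))
```

In a real training loop the student's parameters would be updated to push this loss toward zero across a large set of prompts; the sketch only shows the quantity being minimized.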

Apple operates 'Private Cloud Compute,' an AI processing server focused on privacy protection, for its personal AI, Apple Intelligence. However, The Information reports that running Google's massive Gemini on this Private Cloud Compute is proving extremely difficult.

Apple explains its security measures for 'Private Cloud Compute,' the AI processing server used for its assistant AI 'Apple Intelligence' - GIGAZINE



It had been reported previously that the new Siri processing would be performed on servers other than Apple's Private Cloud Compute.

The new Siri chatbot may run on Google's servers, not Apple's - GIGAZINE



According to the latest reports, Gemini will not run on Google's servers; instead, Apple has reportedly entered into an agreement to use NVIDIA's Confidential Computing to run Gemini. NVIDIA's Confidential Computing keeps data encrypted while it is processed on GPUs in the cloud, allowing Apple to argue that it still prioritizes user privacy.

Apple is scheduled to hold its annual developer conference, WWDC26, starting on June 8, 2026, where it is expected to announce various new features, including a new Siri and a new operating system.

Apple's annual developer conference, WWDC26, is scheduled to take place the week of June 8, 2026, with the keynote address to be held on June 8 - GIGAZINE



in AI, Software, Smartphone, Posted by logu_ii