Apple aims to run large-scale language models locally on the iPhone



Apple researchers have published a paper titled 'LLM in a flash: Efficient Large Language Model Inference with Limited Memory' on the preprint server arXiv. The paper presents 'a solution that paves the way for efficient large language model (LLM) inference on devices with limited memory,' i.e., a technique for running LLMs on devices such as the iPhone. Apple is believed to be aiming to run LLMs on the iPhone in the future.

[2312.11514] LLM in a flash: Efficient Large Language Model Inference with Limited Memory

https://arxiv.org/abs/2312.11514



Paper page - LLM in a flash: Efficient Large Language Model Inference with Limited Memory
https://huggingface.co/papers/2312.11514



Apple Develops Breakthrough Method for Running LLMs on iPhones - MacRumors
https://www.macrumors.com/2023/12/21/apple-ai-researchers-run-llms-iphones/



In their paper, the researchers note that mobile devices such as smartphones have far more flash storage than the DRAM traditionally used to run LLMs. They aim to maximize flash memory throughput using two techniques: 'windowing' and 'row-column bundling.'

'Windowing' means the model reuses data it has already processed rather than loading fresh data for every token: only the weights for newly activated neurons are fetched from flash, which cuts down on constant memory transfers and makes inference faster and smoother. 'Row-column bundling,' meanwhile, stores related rows and columns of the weight matrices together so they can be read in larger contiguous chunks, matching the sequential access pattern at which flash memory excels.
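To make the two ideas concrete, here is a minimal, illustrative Python sketch. The names (FlashStore, WindowedCache, step) are hypothetical and not from Apple's implementation; the sketch only mirrors the behavior described above: bundled contiguous reads, and a sliding window that loads only newly activated weights.

```python
from collections import deque

import numpy as np


class FlashStore:
    """Stands in for flash storage. For each FFN neuron, the corresponding
    up-projection column and down-projection row are stored back to back
    ("bundled"), so one sequential read fetches both."""

    def __init__(self, up_proj, down_proj):
        # up_proj: (d_model, d_ffn), down_proj: (d_ffn, d_model)
        self.bundles = [
            np.concatenate([up_proj[:, i], down_proj[i, :]])
            for i in range(up_proj.shape[1])
        ]

    def read(self, neuron_id):
        # One contiguous read returns both halves of the neuron's weights.
        return self.bundles[neuron_id]


class WindowedCache:
    """Keeps weights for neurons activated in the last `window` tokens in
    DRAM, loading only newly activated neurons from flash at each step."""

    def __init__(self, store, window=5):
        self.store = store
        self.history = deque(maxlen=window)  # active neuron sets of recent tokens
        self.dram = {}                       # neuron_id -> cached weights

    def step(self, active_neurons):
        """Advance one token; return how many neurons had to be read from
        flash (the incremental I/O cost of this token)."""
        new = [n for n in active_neurons if n not in self.dram]
        for n in new:
            self.dram[n] = self.store.read(n)
        self.history.append(set(active_neurons))
        # Evict neurons no longer active anywhere in the window.
        live = set().union(*self.history)
        for n in list(self.dram):
            if n not in live:
                del self.dram[n]
        return len(new)
```

Because consecutive tokens tend to activate heavily overlapping sets of neurons, each call to step typically fetches only a small delta from flash, which is where the savings come from.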



According to Apple's research team, combining windowing and row-column bundling makes it possible to run AI models up to twice the size of the available DRAM. Compared with naive loading, inference is said to be 4 to 5 times faster on a CPU and 20 to 25 times faster on a GPU.
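As a rough illustration of the 'twice the available DRAM' figure, consider the back-of-the-envelope check below; the device and model numbers are assumptions for the example, not from the paper.

```python
# Back-of-the-envelope check of the "up to 2x DRAM" claim.
# All figures here are illustrative assumptions, not from Apple's paper.
dram_gb = 8                    # DRAM of a hypothetical flagship phone
params_billions = 7            # a 7-billion-parameter model
bytes_per_param = 2            # fp16 weights
model_gb = params_billions * bytes_per_param   # 14 GB of weights
print(model_gb <= 2 * dram_gb)                 # True -> within the 2x bound
```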

Google, meanwhile, is already close to running LLMs, which normally operate in data centers, locally on mobile devices such as smartphones. The company has announced that 'Gemini Nano,' the smallest model in its multimodal Gemini family, will come to the Pixel 8 Pro and run on the device itself rather than in the cloud.

Local LLM 'Gemini Nano,' which runs on the smartphone instead of in the cloud, now operates on the Pixel 8 Pro, powering Gboard's Smart Reply and enhanced automatic summaries in the Recorder app - GIGAZINE



Apple, on the other hand, has shipped the virtual assistant Siri on its devices, including the iPhone, since 2011. Unlike today's chatbots such as ChatGPT, Bing Chat, and Gemini, however, Siri does not generate human-like conversation; it is merely an assistant tool operated by voice input.

Apple was said to be 'lagging behind Google and Microsoft in AI technology,' but in 2023 it was reported that the company had already built its own LLM, called 'Ajax,' and was developing its own chatbot AI, internally dubbed 'Apple GPT.' According to the Apple-focused news site MacRumors, Ajax runs with 200 billion parameters and is designed to be comparable to OpenAI's GPT-3 and GPT-4.

Is Apple developing its own large-scale language model and chatbot AI 'Apple GPT'? - GIGAZINE



In November 2023, CEO Tim Cook said, 'We are conducting research on generative AI, and the time will come when we unveil a product centered around it,' confirming that Apple has begun developing generative AI.

Apple CEO Tim Cook once again says, 'We are working responsibly to develop generative AI' - GIGAZINE



Apple's generative AI efforts could eventually be incorporated into Siri. As early as October 2023, it was reported that the company's software engineering group would build AI functionality into iOS 18 and apply LLM-based text generation to Siri and the Messages app. Integration of generative AI into development tools such as Xcode is also under consideration, with plans for a coding assistant that auto-completes code as it is written, similar to Microsoft's GitHub Copilot.

In addition, Jeff Pu, an analyst at Haitong International Securities, predicts that Apple will ship generative AI features for the iPhone and iPad in iOS 18, expected around the second half of 2024. According to Pu, Apple had already procured several hundred AI servers as of October 2023 and plans to build more in 2024.



The newly published paper can be read as a step toward running LLMs on the iPhone. 'This breakthrough is particularly important for bringing advanced LLMs to resource-limited environments and expanding their applicability and accessibility,' the research team said. 'This design paves the way for effective LLM inference on memory-limited devices.'

in Mobile, Software, Posted by log1i_yk