No GPU required / 'GGML', a library that runs chat AI on an ordinary home PC with 16 GB of memory, is under active development, and a demo running speech-recognition AI on a Raspberry Pi has already appeared
Chat AI like the one used in ChatGPT normally requires powerful GPUs and large amounts of memory, but the machine learning library 'GGML' aims to run such models on an ordinary home PC with around 16 GB of memory, no GPU required.
ggml.ai
http://ggml.ai/
ggerganov/ggml: Tensor library for machine learning
https://github.com/ggerganov/ggml
The features of GGML are as follows (a short sketch of the API appears after the list).
・Written in C
・16-bit float support
・Integer quantization support (4-bit, 5-bit, and 8-bit)
・Automatic differentiation
・Built-in 'ADAM' and 'L-BFGS' optimizers
・Optimized for Apple Silicon
・Uses AVX and AVX2 intrinsics on x86 architectures
・Web support via WebAssembly and WASM SIMD
・No third-party dependencies
・Zero memory allocations during runtime
・Guided language output support
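To give a sense of the API behind these features, here is a minimal sketch of defining and evaluating a compute graph for f(x) = a·x² + b, loosely based on the example in the ggml README. The project is evolving quickly, so function names and signatures may differ between versions.

```c
#include "ggml.h"
#include <stdio.h>

int main(void) {
    // ggml works out of a fixed memory pool allocated up front;
    // graph evaluation itself performs no allocations
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,   // 16 MB pool
        .mem_buffer = NULL,               // let ggml allocate it
    };
    struct ggml_context * ctx = ggml_init(params);

    // x is marked as a parameter, i.e. a variable to differentiate against
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    ggml_set_param(ctx, x);

    struct ggml_tensor * a  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
    struct ggml_tensor * f  = ggml_add(ctx, ggml_mul(ctx, a, x2), b); // f = a*x^2 + b

    // build the forward computation graph, then set values and evaluate
    struct ggml_cgraph gf = ggml_build_forward(f);

    ggml_set_f32(x, 2.0f);
    ggml_set_f32(a, 3.0f);
    ggml_set_f32(b, 4.0f);

    ggml_graph_compute_with_ctx(ctx, &gf, /*n_threads=*/1);

    printf("f = %f\n", ggml_get_f32_1d(f, 0));   // 3*2^2 + 4 = 16
    ggml_free(ctx);
    return 0;
}
```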
The GGML code is publicly available on GitHub, though the repository notes in bold, 'Please note that this project is under development.'
Although GGML is a work in progress, several demos have already been published. For example, the video below shows commands being entered by voice using GGML and whisper.cpp. That alone would be an ordinary sight, but impressively, it is running on a Raspberry Pi, a tiny single-board computer.
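To give a sense of what such a demo involves, transcription through whisper.cpp comes down to a handful of C calls. The sketch below assumes the 2023-era whisper.cpp API, a placeholder model path, and audio that has already been loaded as 16 kHz mono 32-bit float samples.

```c
#include "whisper.h"
#include <stdio.h>

// Minimal transcription sketch against the 2023-era whisper.cpp C API.
// "models/ggml-tiny.en.bin" is a placeholder path; pcmf32/n_samples are
// assumed to hold 16 kHz mono audio as 32-bit floats.
int transcribe(const float * pcmf32, int n_samples) {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-tiny.en.bin");
    if (!ctx) return 1;

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.n_threads = 4;   // e.g. the Raspberry Pi 4 has 4 cores

    if (whisper_full(ctx, wparams, pcmf32, n_samples) == 0) {
        // print the recognized text segment by segment
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s", whisper_full_get_segment_text(ctx, i));
        }
        printf("\n");
    }

    whisper_free(ctx);
    return 0;
}
```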
In addition, there is a demo of four instances of LLaMA with 13 billion parameters (13B) running simultaneously with Whisper on an Apple M1 Pro, demonstrating how lightweight the library is.
Running the LLaMA model with 7 billion parameters (7B) on an Apple M2 Max yields about 40 tokens per second, which is quite fast; this matches the 25 ms/token figure in the table below, since 1000 ms ÷ 25 ms/token = 40 tokens/second.
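As a concrete illustration of what processing tokens involves, here is a minimal greedy-decoding sketch against the early-2023 C API of llama.cpp, the sibling project that runs LLaMA on top of GGML. The API has changed repeatedly since then, and the model path is a placeholder, so treat this as an outline rather than a definitive implementation.

```c
#include "llama.h"
#include <stdio.h>

// Greedy decoding sketch using the early-2023 llama.cpp C API.
// "models/7B/ggml-model-q4_0.bin" is a placeholder path.
int main(void) {
    struct llama_context_params params = llama_context_default_params();
    struct llama_context * ctx =
        llama_init_from_file("models/7B/ggml-model-q4_0.bin", params);
    if (!ctx) return 1;

    // tokenize the prompt
    llama_token tokens[64];
    int n = llama_tokenize(ctx, "The quick brown fox", tokens, 64, /*add_bos=*/true);

    int n_past = 0;
    for (int i = 0; i < 32; ++i) {                 // generate up to 32 tokens
        llama_eval(ctx, tokens, n, n_past, /*n_threads=*/8);
        n_past += n;

        // pick the highest-scoring next token (greedy sampling)
        const float * logits = llama_get_logits(ctx);
        int best = 0;
        for (int t = 1; t < llama_n_vocab(ctx); ++t) {
            if (logits[t] > logits[best]) best = t;
        }

        printf("%s", llama_token_to_str(ctx, best));
        tokens[0] = best;                          // feed the token back in
        n = 1;
    }

    llama_free(ctx);
    return 0;
}
```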
Other test results are as follows.
| Model | Machine | Result |
| --- | --- | --- |
| Whisper Small Encoder | M1 Pro, 7 CPU threads | 600 ms/run |
| Whisper Small Encoder | M1 Pro, ANE via Core ML | 200 ms/run |
| 7B LLaMA (4-bit quantization) | M1 Pro, 8 CPU threads | 43 ms/token |
| 13B LLaMA (4-bit quantization) | M1 Pro, 8 CPU threads | 73 ms/token |
| 7B LLaMA (4-bit quantization) | M2 Max GPU | 25 ms/token |
| 13B LLaMA (4-bit quantization) | M2 Max GPU | 42 ms/token |
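The table also hints at why 16 GB of memory suffices: 4-bit quantization stores each weight in roughly half a byte. The back-of-the-envelope estimate below is our own illustration rather than a figure from the project; actual GGML files are somewhat larger, since each block of quantized weights also stores scaling factors.

```c
#include <stdio.h>

int main(void) {
    // Rough weight sizes under 4-bit quantization: 0.5 bytes per parameter.
    const double bytes_per_param = 4.0 / 8.0;

    printf("7B  @ 4-bit: ~%.1f GB\n",  7e9 * bytes_per_param / 1e9);  // ~3.5 GB
    printf("13B @ 4-bit: ~%.1f GB\n", 13e9 * bytes_per_param / 1e9);  // ~6.5 GB
    // Both fit comfortably in a 16 GB home PC, leaving room for
    // activations and the operating system.
    return 0;
}
```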
GGML is provided under the MIT license and is free for anyone to use. The development team is also actively recruiting contributors, stating that 'writing code and improving the library will be the greatest support.'
The editorial team also tried to see whether it would actually work, but following the steps in the documentation produced an error during the build, and we were unable to proceed.
in Software, Posted by log1d_ts