'llm.c', a tool for training large language models in pure C without PyTorch or Python, is released



Large language models (LLMs), the core of today's AI, are mostly trained with PyTorch and Python, but a tool called 'llm.c' has been released that implements this training in C alone. It is not yet optimized, and conventional methods are still faster, but it can train GPT-2 in about 1,000 lines of clean code.

GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA
https://github.com/karpathy/llm.c


The author, Andrej Karpathy, was a founding member of OpenAI and previously served as director of AI at Tesla.

With llm.c, large language models can be trained without the 245MB PyTorch or the 107MB CPython. Karpathy used it to write CPU training code for GPT-2, an ancestor of today's large language models, in only about 1,000 lines while keeping dependencies to a minimum.




The code is available on GitHub. All the memory it needs is allocated once at the start, so memory usage stays constant throughout training. Because no Python libraries are used, the forward and backward passes of every individual layer are implemented by hand.




According to Karpathy, connecting the layers meant writing code while making sure every pointer and tensor offset was arranged correctly, which he called a very tedious and masochistic task.
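One way to keep memory fixed for the whole run, and to see where the pointer and tensor-offset bookkeeping comes from, is to allocate every parameter tensor in a single block and hand out pointers into it. The sketch below assumes a toy model with only four tensors; the `Params` struct, its field names, and `alloc_and_point` are hypothetical simplifications, not llm.c's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_TENSORS 4

/* Hypothetical parameter set for a toy model: token embeddings,
   position embeddings, and one layernorm's weight and bias. */
typedef struct {
    float *wte;  /* (vocab_size, channels) */
    float *wpe;  /* (max_seq_len, channels) */
    float *ln_w; /* (channels) */
    float *ln_b; /* (channels) */
} Params;

/* Allocate one contiguous buffer for all tensors and point each field at
   its offset inside it. Returns the buffer (caller frees it) or NULL. */
float *alloc_and_point(Params *p, size_t vocab_size, size_t max_seq_len, size_t channels) {
    size_t sizes[NUM_TENSORS] = {
        vocab_size * channels,
        max_seq_len * channels,
        channels,
        channels,
    };
    size_t total = 0;
    for (int i = 0; i < NUM_TENSORS; i++) total += sizes[i];

    float *memory = malloc(total * sizeof(float));
    if (memory == NULL) return NULL;

    /* Fields must be listed in the same order as sizes[]; getting this
       order wrong is exactly the kind of offset bug that is easy to make. */
    float **fields[NUM_TENSORS] = { &p->wte, &p->wpe, &p->ln_w, &p->ln_b };
    float *cursor = memory;
    for (int i = 0; i < NUM_TENSORS; i++) {
        *fields[i] = cursor;
        cursor += sizes[i];
    }
    assert(cursor == memory + total); /* every element accounted for */
    return memory;
}
```

A full GPT-2 model has many more tensors per layer, but the idea carries over: one `malloc` up front, then only pointer arithmetic, so nothing is allocated or freed inside the training loop.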




At the time of writing, only the CPU training code was available, but Karpathy said he was also working on training code for CUDA. He expects that once the code is ported to CUDA and made more efficient, it will be able to train at about the same speed as PyTorch without the heavy dependencies.




Future plans include lowering the precision from fp32 to fp16 and supporting modern architectures such as Llama 2, Mistral, and Gemma. Karpathy also said that once the project is in a more stable state, he plans to release a video that builds the code in detail from scratch.
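Part of the appeal of moving from fp32 to fp16 is simple arithmetic: fp32 stores each value in 4 bytes and fp16 in 2, so the same tensors take half the memory. The sketch below computes that for the weights alone, assuming roughly 124 million parameters, the size usually quoted for the smallest GPT-2; the helper names are hypothetical, and fp16 values are held in `uint16_t` because plain C has no portable half-precision arithmetic type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Element sizes: fp32 is a float; fp16 is stored as a raw 16-bit pattern. */
#define FP32_BYTES sizeof(float)
#define FP16_BYTES sizeof(uint16_t)

/* Bytes needed to hold num_params values of the given element size. */
size_t tensor_bytes(size_t num_params, size_t elem_bytes) {
    return num_params * elem_bytes;
}

/* Print the weight-storage cost at both precisions for num_params values. */
void report_precision_savings(size_t num_params) {
    size_t fp32 = tensor_bytes(num_params, FP32_BYTES);
    size_t fp16 = tensor_bytes(num_params, FP16_BYTES);
    assert(fp16 * 2 == fp32); /* halving the precision halves the storage */
    printf("%zu params: fp32 %zu bytes, fp16 %zu bytes\n", num_params, fp32, fp16);
}
```

For 124 million parameters this gives 496,000,000 bytes at fp32 versus 248,000,000 bytes at fp16. Gradients and other per-parameter buffers would shrink by the same factor only if they were also stored in 16 bits.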

in Software, Posted by log1d_ts