Fine-tuned versions of Meta's coding support AI 'Code Llama' reportedly outperform OpenAI's GPT-4



'Code Llama', released in August 2023 by Meta, which operates Facebook, Instagram, and other services, is a coding support AI that generates program code from text input. It has been reported that fine-tuned versions of 'CodeLlama-34B' and 'CodeLlama-34B-Python', when tested on OpenAI's test dataset 'HumanEval', scored higher than OpenAI's large language model 'GPT-4'.

Phind: AI Search Engine and Pair Programmer
https://www.phind.com/blog/code-llama-beats-gpt4



Code Llama is a version of Llama 2, which Meta released in July 2023, that has been further trained on code-specific datasets. In addition to continuing existing code, it can generate code from natural language input and generate explanations of code.

Meta releases commercially available coding support AI 'Code Llama', free release with the same license as Llama 2 - GIGAZINE



Phind, an AI development company, fine-tuned Code Llama and tested the results.

Specifically, CodeLlama-34B and CodeLlama-34B-Python were each fine-tuned on a dataset of about 80,000 programming problems and solutions. 'DeepSpeed ZeRO-3' and 'Flash Attention 2' were used for the fine-tuning, which ran on 32 NVIDIA A100 Tensor Core GPUs. According to Phind, the training took about three hours.

Additionally, Phind applied a decontamination process to its data to ensure the validity of the results. According to Phind, for each evaluation example longer than 50 characters, three 50-character substrings were randomly sampled and checked, while examples of 50 characters or fewer were checked in their entirety.
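The sampling step described above can be sketched in a few lines of Python. This is only an illustration, not Phind's actual code: the function names are hypothetical, and the matching rule (an evaluation example counts as contaminated if any sampled substring appears verbatim in a training example) is an assumption, since the article does not spell it out.

```python
import random


def sample_substrings(example, length=50, k=3, rng=None):
    """Return k random substrings of `length` characters from `example`.

    Examples of `length` characters or fewer are returned whole.
    """
    if rng is None:
        rng = random.Random()
    if len(example) <= length:
        return [example]
    last_start = len(example) - length
    starts = [rng.randint(0, last_start) for _ in range(k)]
    return [example[s:s + length] for s in starts]


def is_contaminated(eval_example, training_examples, rng=None):
    """Assumed rule: flag the evaluation example if any sampled
    substring occurs verbatim inside any training example."""
    probes = sample_substrings(eval_example, rng=rng)
    return any(probe in train for probe in probes for train in training_examples)
```

Because the substrings are sampled at random, a check like this can miss partial overlaps, so it is best read as a rough filter rather than proof that no leakage occurred.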



When the fine-tuned CodeLlama-34B and CodeLlama-34B-Python were tested on HumanEval, the fine-tuned CodeLlama-34B reportedly solved 67.6% of the problems and the fine-tuned CodeLlama-34B-Python solved 69.5%, compared with 67% for GPT-4.
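To put these percentages in perspective, the short sketch below converts them into approximate problem counts. It assumes the publicly available HumanEval set of 164 problems and a single scored attempt per problem. Neither detail is stated in the article, and the function names are hypothetical.

```python
# Assumption: the public HumanEval benchmark contains 164 problems.
HUMANEVAL_PROBLEMS = 164


def pass_rate(solved, total=HUMANEVAL_PROBLEMS):
    """Percentage of problems solved."""
    return 100.0 * solved / total


def implied_solved(percent, total=HUMANEVAL_PROBLEMS):
    """Nearest whole number of solved problems implied by a reported percentage."""
    return round(percent / 100.0 * total)


reported = {
    "GPT-4": 67.0,
    "fine-tuned CodeLlama-34B": 67.6,
    "fine-tuned CodeLlama-34B-Python": 69.5,
}
for name, percent in reported.items():
    print(f"{name}: about {implied_solved(percent)} of {HUMANEVAL_PROBLEMS} problems")
```

Under these assumptions, the gap between GPT-4 and the fine-tuned CodeLlama-34B comes down to roughly one problem, so the difference between those two scores is small in absolute terms.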

Phind has published the fine-tuned CodeLlama-34B and CodeLlama-34B-Python models used in the tests on Hugging Face, so that users can verify the results independently and to strengthen the open source community. Phind also states, 'The model released this time is just the beginning, and an improved version with further tests will be released soon.'

in Software, Posted by log1r_ut