Microsoft publishes 'DeepSpeed-Chat', which can train large-scale language models like those used for ChatGPT up to 15 times faster and at lower cost than conventional systems



It is reported that chat AIs such as 'ChatGPT', provided by OpenAI, can perform summarization, coding, translation, and other tasks with accuracy surpassing that of human experts. However, the lack of an end-to-end pipeline for the reinforcement learning from human feedback (RLHF) needed to train such models has made it difficult to train state-of-the-art chat AI. With 'DeepSpeed-Chat', announced by Microsoft, anyone can create a ChatGPT-like model.

DeepSpeed/blogs/deepspeed-chat/japanese at master · microsoft/DeepSpeed · GitHub
https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat/japanese



Until now, there was no pipeline that could perform RLHF, which is necessary for training a model like ChatGPT, both easily and efficiently. Training such an AI model also requires multiple expensive GPUs, putting it out of reach for most developers. Moreover, even with GPUs available, conventional software could reportedly extract less than 5% of the hardware's potential performance, making it impossible to train models with hundreds of billions of parameters easily, quickly, and at low cost.

To address this, Microsoft announced 'DeepSpeed-Chat', a framework that aims to let developers build chat AI at a much lower cost.




DeepSpeed-Chat provides scripts for executing the three steps used to build InstructGPT, the basis of ChatGPT: supervised fine-tuning, reward model fine-tuning, and RLHF training. By running them, users can generate their own ChatGPT-like model. An inference API is also provided for testing the trained model in conversational form.
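The repository's README shows a single launcher script that runs all three steps end to end. The sketch below, driven from Python via subprocess, follows those documented commands; the model names, flags, and output path are taken from the README around the time of the announcement and may differ in current versions.

```python
# Minimal sketch of running DeepSpeed-Chat's three-step pipeline, based on
# the commands in the repository's README; flags and paths may vary by version.
import subprocess

# Steps 1-3 (SFT, reward model fine-tuning, RLHF) via the top-level launcher:
subprocess.run(
    [
        "python", "train.py",
        "--actor-model", "facebook/opt-13b",    # base model for steps 1 and 3
        "--reward-model", "facebook/opt-350m",  # smaller model for step 2
        "--deployment-type", "single_node",     # or single_gpu / multi_node
    ],
    check=True,
)

# Test the trained model conversationally (output path is illustrative):
subprocess.run(
    ["python", "chat.py", "--path", "output/actor-models/13b"],
    check=True,
)
```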

Furthermore, the 'DeepSpeed-RLHF pipeline' built into DeepSpeed-Chat carries out supervised fine-tuning, reward model fine-tuning, and RLHF training, and offers 'data abstraction' and 'blending' functions to help researchers and developers train their own RLHF models from multiple data resources. Data abstraction creates an abstracted dataset that unifies the formats of different datasets, while the blending function appropriately fuses multiple datasets and splits them across the three training phases, such as supervised fine-tuning, as sketched below.
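To make the idea concrete, here is a small illustrative sketch. The class and function names are hypothetical, not DeepSpeed-Chat's actual API; it only shows what abstracting heterogeneous datasets into one schema and then blending and splitting them across the three training phases could look like.

```python
# Illustrative sketch only: names here are hypothetical, not DeepSpeed-Chat's
# real API. It demonstrates dataset abstraction plus blending/splitting.
import random
from dataclasses import dataclass

@dataclass
class AbstractSample:
    prompt: str
    chosen: str         # preferred response (used for SFT and reward training)
    rejected: str = ""  # dispreferred response (used for reward training)

def to_abstract(raw: dict, source: str) -> AbstractSample:
    """Unify differently formatted datasets into one abstract schema."""
    if source == "rm-static":
        return AbstractSample(raw["prompt"], raw["chosen"], raw["rejected"])
    if source == "qa-pairs":
        return AbstractSample(raw["question"], raw["answer"])
    raise ValueError(f"unknown source: {source}")

def blend_and_split(datasets, split=(0.2, 0.4, 0.4), seed=0):
    """Fuse multiple abstracted datasets, then split the pool across the
    three phases: SFT, reward-model fine-tuning, and RLHF training."""
    pool = [sample for ds in datasets for sample in ds]
    random.Random(seed).shuffle(pool)
    a = int(len(pool) * split[0])
    b = a + int(len(pool) * split[1])
    return pool[:a], pool[a:b], pool[b:]  # sft, reward, rlhf
```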



In addition, to execute training with the DeepSpeed-RLHF pipeline quickly and at low cost on a wide range of hardware, DeepSpeed-Chat includes the 'DeepSpeed hybrid engine', which unifies the inference and training systems that DeepSpeed has released so far, such as ZeRO, into a single engine.
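In DeepSpeed, such features are typically switched on through the config dictionary passed to deepspeed.initialize. The sketch below shows roughly what a config combining ZeRO stage 3 with the hybrid engine looks like; the 'hybrid_engine' field names follow the schema described around the DeepSpeed-Chat release and should be checked against current documentation.

```python
# Sketch of a DeepSpeed config combining ZeRO-3 training with the hybrid
# engine used for fast generation during RLHF. Field names and values are
# illustrative; verify against DeepSpeed's current config documentation.
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,               # partition params, grads, optimizer states
    },
    "hybrid_engine": {
        "enabled": True,          # switch between training and inference modes
        "max_out_tokens": 512,    # generation length during RLHF rollouts
        "inference_tp_size": 1,   # tensor parallelism for inference
    },
}

# model = ...  # an actor model, e.g. a Hugging Face causal LM
# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
# engine.eval()  -> hybrid engine serves fast generation for RLHF rollouts
# engine.train() -> the same engine switches back to ZeRO-3 training
```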



Using DeepSpeed-Chat with the DeepSpeed hybrid engine, training the 'OPT-13B' model on 64 NVIDIA A100 data-center GPUs on Microsoft Azure completes in about 7.5 hours, at a cost of $1,920 (about 250,000 yen). For the 'BLOOM' model, training completes in about 20 hours for $5,120 (about 680,000 yen). These figures show that training can be far faster and cheaper than with existing RLHF systems.
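A quick sanity check of these figures, using only the numbers quoted above: both runs work out to the same implied per-GPU-hour price, so they are internally consistent.

```python
# Both quoted runs imply the same price per A100 GPU-hour on Azure.
gpus = 64
for hours, usd in [(7.5, 1920), (20, 5120)]:
    gpu_hours = gpus * hours
    print(f"{gpu_hours:.0f} GPU-hours -> ${usd / gpu_hours:.2f} per A100-hour")
# 480 GPU-hours -> $4.00 per A100-hour
# 1280 GPU-hours -> $4.00 per A100-hour
```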

DeepSpeed-Chat can also train and run inference on large-scale models with billions to trillions of parameters, and is said to support training and inference even in environments with limited GPU resources.
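In DeepSpeed, limited-GPU scenarios are usually handled with ZeRO-3 offloading, which moves parameters and optimizer states out to CPU memory. A minimal config sketch, with illustrative values:

```python
# Sketch: ZeRO-3 with CPU offload lets large models fit on machines with
# limited GPU memory, trading speed for capacity. Values are illustrative.
ds_config_low_resource = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
```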

Commenters on Hacker News state, 'DeepSpeed-Chat doesn't make it easy to reproduce GPT-4, but it can definitely overcome some major hurdles to reproduction.' It is also noted that Microsoft, which has invested $10 billion (about 1.3 trillion yen) in research to incorporate ChatGPT-like functions into its products, is making DeepSpeed, the project behind DeepSpeed-Chat, available free of charge.

The source code and other materials for DeepSpeed-Chat are published on GitHub.

GitHub - microsoft/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://github.com/microsoft/DeepSpeed/

in Software, Posted by log1r_ut