China's Alibaba releases chat AI 'Qwen-72B' and Japanese voice input compatible 'Qwen-Audio' as open source
Alibaba, a major Chinese IT company, has announced that it has open sourced 'Qwen-72B', a Transformer-based large-scale language model boasting 72 billion parameters, and 'Qwen-Audio', a multimodal version of Qwen that also supports Japanese voice input.
Qwen/README_EN.md at main · QwenLM/Qwen · GitHub
https://github.com/QwenLM/Qwen/blob/main/README_EN.md
Qwen/Qwen-72B-Chat · Hugging Face
https://huggingface.co/Qwen/Qwen-72B-Chat
GitHub - QwenLM/Qwen-Audio: The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
https://github.com/QwenLM/Qwen-Audio
Binyuan Hui, who researches natural language processing (NLP) at Alibaba DAMO Academy, Alibaba Group's cutting-edge research institute, posted on social media: "We are proud to present our sincere open-source works, Qwen-72B and Qwen-1.8B!"
We are proud to present our sincere open-source works: Qwen-72B and Qwen-1.8B! Including Base, Chat and Quantized versions!
— Binyuan Hui (@huybery) November 30, 2023
Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a… pic.twitter.com/VUAMbh83At
Qwen-72B is a large-scale language model trained on more than 3 trillion tokens. In a series of benchmarks covering natural language understanding, numerical processing, coding, and more, including MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, and BBH, it outperformed Meta's LLaMA2-70B, a model of the same size, and it also outperformed OpenAI's GPT-3.5 in 7 out of 10 tasks.
Users who have tried it out report that it handles Japanese without difficulty.
Inference speed of Qwen-72B was compared with Stablelm-base-70B
— AI????Satoshi⏩ (@AiXsatoshi) November 30, 2023
Qwen has a vocab size of 152,000
Stablelm is Llama2 series, 32,000
Qwen was fast at inference even in Japanese pic.twitter.com/ft97X3UHmS
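The speed difference in Japanese is plausibly a vocabulary effect: with roughly 152,000 tokens in its vocabulary, Qwen encodes a Japanese sentence into far fewer tokens than a Llama 2-style 32,000-token vocabulary, so there is simply less to generate per sentence. A rough way to check this yourself is to count tokens with each tokenizer; the sketch below assumes the Hugging Face model IDs 'Qwen/Qwen-72B-Chat' and 'meta-llama/Llama-2-70b-hf' (the latter is a gated repository that requires access approval).

```python
from transformers import AutoTokenizer

# Count how many tokens the same Japanese sentence costs under each vocabulary.
text = "地球上には金はどれくらい存在するのでしょうか？"  # "How much gold is there on Earth?"

qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")  # gated repo; assumes access

print("Qwen tokens:   ", len(qwen_tok.encode(text)))
print("Llama 2 tokens:", len(llama_tok.encode(text)))
```

Fewer tokens per sentence means fewer decoding steps, which is consistent with the faster Japanese inference reported in the tweet above.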
However, some point out that it is not easy to run locally because of the large amount of GPU memory required (a loading sketch follows the tweet below).
Qwen-72B-Chat requires at least 144GB GPU memory to run on bf16/fp16 (e.g. 2xA100-80G or 5xV100-32G) and at least 48GB GPU memory to run on int4 (e.g. 1xA100-80G or 2xV100-32G).
— Yousan (@ayousanz) November 30, 2023
It's impossible... https://t.co/8hrpCG47Sv
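For reference, the Qwen README shows the model being loaded through Hugging Face transformers with trust_remote_code enabled. The sketch below follows that pattern for the Int4-quantized checkpoint ('Qwen/Qwen-72B-Chat-Int4' on Hugging Face), the variant that fits in roughly 48GB of GPU memory; treat it as an outline rather than a tested recipe, and per the README the quantized weights also need the auto-gptq and optimum packages installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load the Int4-quantized Qwen-72B-Chat and ask it a question in Japanese.
# trust_remote_code=True is needed because Qwen ships its own modeling/tokenizer code.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B-Chat-Int4",
    device_map="auto",          # spread the layers across available GPUs
    trust_remote_code=True,
).eval()

# The custom code exposes a chat() helper that also returns the running conversation history.
response, history = model.chat(tokenizer, "地球上には金はどれくらいありますか？", history=None)
print(response)
```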
In addition to Qwen-72B, the Qwen family also includes 'Qwen-1.8B', 'Qwen-7B', and 'Qwen-14B', with different parameter sizes. A demo that runs Qwen-72B in the browser is also available, although it requires registering an account on a Chinese-language site.
Qwen-72B-Chat Demo (通义千问-72B-对话-Demo) · ModelScope Studios
https://modelscope.cn/studios/qwen/Qwen-72B-Chat-Demo/summary
Hui also announced that 'Qwen-Audio', a model that supports voice input, has been open sourced as well.
Thanks to all the enthusiasm, let's add some fuel to the fire! We've further open-sourced Qwen-Audio, including Base and Chat, as well as the demo!
https://t.co/J8ACjUO4xE
Base:
Chat: https://t.co/p0RzfAvYxG
Demo: https://t.co/F1b30iHy5c https://t.co/B2sf4xsc7I — Binyuan Hui (@huybery) November 30, 2023
Qwen-Audio combines Qwen-7B with OpenAI's audio encoder Whisper-large-v2, and is said to outperform existing state-of-the-art (SoTA) models in all tasks (a usage sketch follows the tweet below).
Making audio a first-class citizen in LLMs: Qwen Audio
— Vaibhav (VB) Srivastav (@reach_vb) November 30, 2023
Using a Multi-Task Training Framework, Qwen Audio - Combines OpenAI's Whisper large v2 (Audio encoder) with Qwen 7B LM to train on over 30 audio tasks jointly.
Tasks ranging from Speech Recognition to Music Captioning… pic.twitter.com/7gzKAV6rfv
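The Qwen-Audio repository documents a chat-style interface built on the same transformers custom code, in which an audio file reference and a text instruction are packed into a single query. A minimal sketch along those lines follows; 'sample.wav' is a placeholder path, and the helper names are taken from the README's example, so check the repository if they have changed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: ask Qwen-Audio-Chat a question about a local audio file,
# following the usage pattern shown in the Qwen-Audio README.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-Audio-Chat", device_map="auto", trust_remote_code=True
).eval()

# Build a query that mixes an audio reference with a text instruction.
query = tokenizer.from_list_format([
    {"audio": "sample.wav"},                      # placeholder: path or URL to your audio
    {"text": "What is being said in this audio clip?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```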
You can try the Qwen-Audio demo on Hugging Face without registering.
First, I entered a question in Japanese text.
The model then output an answer.
Since it also supports audio, I tried uploading an audio file: the opening of the video I created for a previous article.
Click 'Submit' to send it.
The content of the audio file was 'How much gold is there on Earth?' For some reason it was recognized as song lyrics, so I couldn't hold a conversation about the content, but the transcription itself appears to be highly accurate.
It was also able to recognize Elon Musk's English speech.
Qwen-Audio-Chat reportedly supports speech understanding in Chinese, English, Japanese, Korean, German, Spanish, and Italian.