Amazon plans to release the conversational data set 'Topical Chat' consisting of more than 4 million words to the public



Amazon, which is actively engaged in research and development of speech recognition technology such as its AI assistant Alexa, has announced that it will release to the public a conversational data set consisting of more than 4 million words.

Topical Chat Dataset Helps Researchers Address Hard Challenges in Natural Conversation: Alexa Blogs

Topical Chat was originally developed for the Alexa Prize, a competition for university teams hosted by Amazon, and the participating teams will have access to Topical Chat and an extended version of the data set. The teams are scheduled to begin development in earnest in September 2019, and the general release of Topical Chat will follow shortly after that.

According to Amazon senior researcher Dilek Hakkani-Tur, Topical Chat consists of more than 4.1 million words and more than 210,000 utterances, and the conversations were collected from crowd workers rather than from interactions with Amazon Alexa users. The topics and knowledge in the conversations were chosen by the individual crowd workers, and no particular organization or structure was imposed on them when the data set was created.


“Topical Chat supports high-quality, reproducible research and is the largest social-conversation and knowledge data set publicly available to the research community,” said Hakkani-Tur. “It lets us tackle hard challenges in natural conversation that conventional data sets cannot address, such as conversational transitions, knowledge selection, and the interweaving of facts and opinions, and move on to the next step: next-generation knowledge-grounded neural response generation systems.”

in Software, Posted by log1i_yk