OpenAI reportedly offers news media $1 million to $5 million a year in license fees to use their content for AI training



OpenAI, which develops generative AI such as ChatGPT, has been criticized for using news media content for AI training, and some media outlets have filed lawsuits. In response, OpenAI is reportedly in talks to pay news media license fees of between $1 million (approximately 145 million yen) and $5 million (approximately 724 million yen) a year.

OpenAI Offers Publishers as Little as $1 Million a Year — The Information

https://www.theinformation.com/articles/openai-offers-publishers-as-little-as-1-million-a-year



OpenAI In Talks With Dozens of Publishers to License Content - Bloomberg
https://www.bloomberg.com/news/articles/2024-01-04/openai-in-talks-with-dozens-of-publishers-to-license-content

OpenAI's news publisher deals reportedly top out at $5 million a year - The Verge
https://www.theverge.com/2024/1/4/24025409/openai-training-data-lowball-nyt-ai-copyright

OpenAI offering media outlets as little as $1 million to use news articles for AI models: report
https://nypost.com/2024/01/04/business/openai-offering-media-outlets-as-little-as-1-million-to-use-news-articles-for-ai-models-report/

2023 was a year of remarkable advances in generative AI such as OpenAI's ChatGPT, Google's Bard, and Microsoft's Copilot. These generative AIs are built on large language models (LLMs), which are known to be trained on a wide variety of data available on the Internet. However, the inclusion of books by famous authors in datasets used for AI training has become a problem, and some have called for strict action from regulatory authorities.

Google, which is developing the chat AI Bard, initially announced that it would 'scrape everything published online for AI purposes' in order to promote the development of AI tools. In response to backlash from content creators, however, it has announced an option that lets site owners prevent their websites from being used to train generative AI.

Google announces option to prevent websites from being used to train generative AI, some point out that it is already too late - GIGAZINE



News organizations and other media outlets oppose the use of their content to train AI. The New York Times, the third-largest daily newspaper by circulation in the United States, is not only blocking crawlers to prevent its content from being used to train generative AI, but is also suing OpenAI and Microsoft for copyright infringement.

Major daily newspaper The New York Times sues OpenAI and Microsoft for copyright infringement - GIGAZINE



The New York Times claims that GPT-4, the LLM underlying generative AI such as OpenAI's ChatGPT and Microsoft's Copilot, was trained on New York Times content, and that as a result 'AI output imitates The New York Times' style of expression' and 'AI is now producing content that directly competes with The New York Times.'

In fact, one of the documents The New York Times submitted to the court as evidence shows that a text about 2019 Pulitzer Prize winners output by GPT-4 (left), the LLM underlying ChatGPT, is an almost exact imitation of a New York Times article (right). The red text marks the parts taken directly from the New York Times article, and it is obvious at a glance that the output is close to a complete copy.



In response to this backlash, it was also reported that OpenAI is 'discussing prices and conditions for licensing content' with major American media outlets. According to this report, OpenAI appears to be in discussions over licensing agreements with Gannett, a major American newspaper company and publisher of USA Today; News Corp, publisher of The Wall Street Journal; and IAC, operator of The Daily Beast. Microsoft, OpenAI's largest investor, reportedly also took part in the talks between OpenAI and the media.

Inside the News Industry's Uneasy Negotiations With OpenAI - The New York Times
https://www.nytimes.com/2023/12/29/business/media/media-openai-chatgpt.html

In addition, some media companies already allow OpenAI to use their content in exchange for payment. Axel Springer, the German media giant that owns outlets such as Politico and Business Insider, signed a deal with OpenAI in December 2023 that allows ChatGPT to obtain data directly from Politico and Business Insider. The Associated Press has also signed an agreement allowing OpenAI to train AI models on its news articles.




Now, The Information has reported that OpenAI is offering license fees of $1 million to $5 million a year to use news media content for AI training. The Verge, which picked up this report, said, 'This report is one of the first indicators of how much money OpenAI plans to invest in data used for AI training.'

A similar example of technology companies paying for content is Facebook's News tab, introduced in 2019. Meta reportedly paid media outlets up to $3 million a year (approximately 435 million yen) in licensing fees for the news articles shown in this tab. In addition, Google has agreed to pay a total of 100 million Canadian dollars (approximately 11 billion yen) annually to Canadian news organizations under the 'Online News Act,' which requires companies that distribute news in Canada to pay usage fees to news organizations. Based on these cases, The Verge points out that a license fee of '$1 million to $5 million' is roughly in line with existing deals. On the other hand, some Reddit users argued that the license fees offered to the media are too low.

Google agrees with Canadian government to pay $100 million a year to resume news distribution - GIGAZINE



Additionally, it has been reported that OpenAI's annual sales have reached $1.6 billion (approximately 230 billion yen), with monthly sales peaking at $130 million (approximately 19 billion yen). OpenAI's annual sales in 2022 were only $28 million (about 4 billion yen), so sales increased about 58 times compared to the previous year. OpenAI's annual revenue in 2024 is expected to reach $5 billion (approximately 720 billion yen), so it is easy to imagine that the licensing fees OpenAI plans to pay to media will not hurt the company much.
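As a rough check on the scale involved, the short sketch below compares the reported per-publisher license fees with the revenue figures quoted above. It uses only the rounded dollar amounts from this article; the variable and function names are illustrative, and because the number of publishers OpenAI would pay has not been reported, no total cost is calculated.

```python
# Figures as quoted in the article (rounded, in US dollars).
revenue_2022 = 28_000_000                # reported 2022 annual sales
revenue_2023 = 1_600_000_000             # reported current annual sales
projected_revenue_2024 = 5_000_000_000   # expected 2024 annual revenue
license_fee_low = 1_000_000              # low end of the reported yearly offer
license_fee_high = 5_000_000             # high end of the reported yearly offer


def fee_share(fee: float, revenue: float) -> float:
    """Return a single license fee as a percentage of revenue."""
    return 100 * fee / revenue


# Year-over-year growth multiple based on the rounded figures.
growth_multiple = revenue_2023 / revenue_2022

print(f"Growth multiple from 2022: {growth_multiple:.1f}x")
print(f"Low fee vs. current sales: {fee_share(license_fee_low, revenue_2023):.4f}%")
print(f"High fee vs. current sales: {fee_share(license_fee_high, revenue_2023):.4f}%")
print(f"High fee vs. 2024 estimate: {fee_share(license_fee_high, projected_revenue_2024):.4f}%")
```

With these rounded inputs, the growth multiple comes out in the high 50s, and even a single fee at the top of the reported range is well under 1% of the reported annual sales.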

It has been reported that Apple has also discussed multi-year contracts worth more than $50 million (approximately 7.24 billion yen) with multiple media outlets to train AI using news media content.

It has been revealed that Apple has discussed multi-year contracts worth more than $50 million with various media outlets to train generative AI on news articles - GIGAZINE



in Software, Posted by logu_ii