Aug 06, 2024 11:00:00

NVIDIA reportedly collected a human lifetime's worth of video per day to train AI

Leaked internal documents and chats from NVIDIA show that the company's team was considering downloading 80 years' worth of YouTube and Netflix videos per day to train AI. YouTube's terms of service clearly prohibit downloading videos without the copyright holder's permission, but it is unclear whether the team knew it was violating those terms or had special permission.

Leaked Documents Show Nvidia Scraping 'A Human Lifetime' of Videos Per Day to Train AI

https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/

According to internal Slack chats, emails and documents obtained by 404 Media, NVIDIA instructed employees to scrape videos from YouTube, Netflix and other sources to train AI models for its Omniverse 3D world generator and its self-driving car system. The project is internally named 'Cosmos', and its results have not yet been made public. It is separate from NVIDIA's existing cloud service called 'Cosmos Deep Learning'.

Slack messages in a channel set up by NVIDIA for the project show employees using the open source YouTube video downloader yt-dlp in combination with virtual machines that refresh IP addresses to avoid being blocked by YouTube. Employees were trying to download videos from a variety of sources, including Netflix, but were primarily focused on YouTube, and project managers were discussing using 20 to 30 virtual machines from Amazon Web Services to download 80 years' worth of videos per day, 404 Media reported.

When employees raised questions about the legal and ethical aspects of using copyrighted content to train AI models, NVIDIA managers allegedly said they had 'permission from the highest levels of the company to use the content.'

An NVIDIA spokesperson told 404 Media, 'We respect the rights of all content creators and are confident that our models and research efforts fully comply with the letter and spirit of copyright law. Copyright law protects certain expressions, but it does not protect facts, ideas, data or information. You are free to learn facts, ideas, data or information from other sources and use them to create your own expression. There is also the concept of 'fair use' which protects the ability to use a work for purposes such as training an AI model.'

Asked for comment about NVIDIA's use of YouTube videos as training data for its models, a Google spokesperson told 404 Media that 'our previous statements still stand' and pointed to a Bloomberg article in which YouTube CEO Neal Mohan commented on using YouTube videos to train AI models.

YouTube CEO says 'Using AI for training is against the rules' and 'What's important is that creators succeed on YouTube' - GIGAZINE

A Netflix spokesperson told 404 Media that Netflix has no agreement with NVIDIA for content ingestion and that the platform's terms of service do not allow scraping.


Aug 06, 2024 11:00:00 in Software, Posted by log1p_kr