OpenAI to disclose ChatGPT training data to select parties in copyright lawsuit, under strict security measures that bar internet access and recording devices



In a lawsuit brought against OpenAI by three authors, including Sarah Silverman, who allege that their books were used to train an AI without their consent, the plaintiffs have been allowed to inspect the training materials under tight security.

OpenAI Training Data to Be Inspected in Sarah Silverman Copyright Case

https://www.hollywoodreporter.com/business/business-news/openai-training-data-inspected-authors-copyright-case-1236011291/

In 2023, prominent authors, including comedian and author Sarah Silverman, sued OpenAI and Meta for copyright infringement. However, in the early stages of the case against OpenAI, the court dismissed most of the authors' claims, leaving the plaintiffs in a difficult position.

OpenAI wins almost complete victory in first half of copyright infringement lawsuit against ChatGPT, most of claims by three authors dismissed - GIGAZINE



The claims brought by Silverman and the other authors fall into six broad categories: 'direct copyright infringement,' 'indirect copyright infringement,' 'violation of the Digital Millennium Copyright Act (DMCA),' 'violation of the California Unfair Competition Law (UCL),' 'negligence,' and 'unjust enrichment.' Of these, the four claims other than 'violation of the UCL' and 'direct copyright infringement' were dismissed in February 2024.

In response, the authors argued that training an AI on copyrighted works without permission constitutes an unfair business practice prohibited by California state law.

However, in a July 2024 ruling, Judge Araceli Martínez-Olguín of the U.S. District Court for the Northern District of California also dismissed the UCL claim, on the grounds that the allegedly infringed works are copyrighted works, so such claims are preempted by federal copyright law and cannot be brought under state law.

This leaves Silverman and her co-plaintiffs with only one remaining claim in their lawsuit against OpenAI: direct copyright infringement.

A court filing on September 24 then revealed that Silverman and the other plaintiffs had agreed with OpenAI on a procedure for inspecting the training data. OpenAI had not previously made ChatGPT's training data public.



Under the agreement, the training dataset will be viewed on secure computers in OpenAI's San Francisco offices that have no access to the internet or other networks.

In addition, anyone reviewing the data must first sign a non-disclosure agreement, show identification, and sign their name on a visitor's register.

The use of electronic devices is strictly restricted during the inspection: no PCs, cell phones, cameras, or other recording devices are allowed. OpenAI permits limited use of a provided computer to take notes, but at the end of each day these notes are to be copied to a separate device by the plaintiffs' counsel in the presence of a company-designated representative. Copying any part of the training data itself is not permitted.

'The examining parties' attorneys and experts may take handwritten or electronic notes on any note-taking computer provided, but may not copy the training data itself into their notes,' the filing states.

in Software, Posted by log1l_ks