OpenAI reveals that it mistakenly erased potential evidence in copyright lawsuit



OpenAI is facing a copyright infringement lawsuit from the New York Times, a major American daily newspaper, which claims that OpenAI used its content to train generative AI models. However, in a court filing on November 20, 2024, OpenAI revealed that it had accidentally erased data that could have served as evidence in the case.

gov.uscourts.nysd.612697.328.1.pdf
(PDF file) https://docs.google.com/viewerng/viewer?url=https://storage.courtlistener.com/recap/gov.uscourts.nysd.612697/gov.uscourts.nysd.612697.328.1.pdf

gov.uscourts.nysd.612697.210.2.pdf
(PDF file) https://storage.courtlistener.com/recap/gov.uscourts.nysd.612697/gov.uscourts.nysd.612697.210.2.pdf

OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit | TechCrunch
https://techcrunch.com/2024/11/20/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit/



OpenAI accidentally erases potential evidence in training data lawsuit - The Verge

https://www.theverge.com/2024/11/21/24302606/openai-erases-evidence-in-training-data-lawsuit

In December 2023, the New York Times sued OpenAI and Microsoft for copyright infringement, saying, 'The large language models that power generative AI such as ChatGPT and Copilot are trained on New York Times content, enabling them to output content that mimics The New York Times' style of expression and directly competes with The New York Times.' It added, 'As a result, The New York Times' relationship with its readers has been damaged, and not only have sources of revenue such as subscription fees, licensing fees, advertising revenue, and affiliate income been lost, but the provision of high-quality journalism has been threatened.'

Major daily newspaper New York Times sues OpenAI and Microsoft for copyright infringement - GIGAZINE



As the litigation with The New York Times proceeds, OpenAI has provided the Times with two virtual machines on which it can search the AI training data for its copyrighted content. According to OpenAI, since November 1, 2024, New York Times lawyers have spent more than 150 hours working with experts to search the training data.

However, in a document filed in the U.S. District Court for the Southern District of New York on November 20, 2024, OpenAI reported that 'on November 14, 2024, our engineers erased all New York Times search data stored on one of our virtual machines.'

OpenAI immediately restored the data, but the recovered data had serious problems with its folder structure and file names. OpenAI said, 'The recovered data makes it difficult to properly understand how the New York Times articles were used to build OpenAI's AI models.' In addition, OpenAI has not disclosed the cause of the accident or details of the erased data.



OpenAI described the data deletion as a 'glitch,' and The New York Times said, 'There is no evidence that the deletion was intentional.' Nevertheless, The New York Times complained, 'The data that was recovered is largely unusable, and a week's worth of work by experts and lawyers will have to be redone. The data deletion will force us to spend countless hours recreating our work from scratch.'

At the time of writing, OpenAI's public relations representative had not commented.

in Software, Posted by log1r_ut