
OpenAI Accidentally Deletes Data in NY Times Copyright Lawsuit

Image: An abstract courtroom scene representing the AI copyright lawsuit, with a glowing computer interface and fragmented data symbolizing the accidental deletion, alongside newspaper icons and a balance scale.

Image Source: ChatGPT-4o

In an ongoing lawsuit filed by The New York Times and Daily News against OpenAI for allegedly using their copyrighted content to train its AI models without permission, OpenAI engineers reportedly deleted key evidence stored on a virtual machine.

The issue arose after OpenAI agreed to provide virtual machines so that lawyers for the plaintiffs could search its training datasets for their copyrighted material. According to a letter filed in the U.S. District Court for the Southern District of New York, the plaintiffs' attorneys and experts had spent more than 150 hours since November 1 searching OpenAI's datasets before the deletion occurred.

The Mistake and Its Consequences

On November 14, OpenAI engineers erased all search data from one of the virtual machines. Although the company attempted to recover the deleted files and succeeded in retrieving most of the data, the folder structure and file names were permanently lost. This rendered the recovered data unusable for determining where copyrighted articles from the plaintiffs were used in OpenAI’s model training.

The plaintiffs' legal team wrote in the letter that "an entire week's worth" of their experts' and lawyers' work had to be redone as a result. While acknowledging that the deletion was unintentional, they argue the incident shows why OpenAI itself is best positioned to search its own datasets for potentially infringing material.

OpenAI’s Defense in the Copyright Debate

OpenAI has not confirmed whether its AI systems were trained specifically on content from The New York Times or Daily News. It maintains that training AI models on publicly available data is covered by fair use, a legal doctrine that permits limited use of copyrighted material without permission under certain conditions.

Despite its stance, OpenAI has recently struck licensing agreements with a number of publishers, including:

  • Associated Press

  • Business Insider owner Axel Springer

  • Financial Times

  • People parent company Dotdash Meredith

  • News Corp

The terms of these agreements remain confidential, but reports suggest that OpenAI is paying significant sums, with Dotdash reportedly receiving at least $16 million annually.

Looking Ahead

The data deletion incident, while accidental, adds a layer of complexity to the lawsuit. It underscores the challenges plaintiffs face in gathering evidence from proprietary AI systems, even when granted limited access.

As debates about fair use and AI training practices continue, this case could set a precedent for how courts handle disputes over copyrighted content used in machine learning. OpenAI’s growing number of licensing deals suggests a possible shift in its approach, but how this aligns with legal outcomes remains to be seen.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.