Judge Orders OpenAI to Hand Over 20 Million Anonymized ChatGPT Chats in Copyright Fight

On January 5, 2026, District Judge Sidney H. Stein of the Southern District of New York affirmed a November discovery order issued by Magistrate Judge Ona T. Wang, ruling that OpenAI’s proposed privacy safeguards adequately balance user confidentiality concerns with the relevance of anonymized ChatGPT conversation logs to ongoing copyright infringement claims.

The decision arises from litigation led by The New York Times and other major publishers, who allege that ChatGPT’s responses reflect the unauthorized use of copyrighted material during both training and deployment. As part of discovery, the plaintiffs sought access to a large sample of user chats to assess whether the system reproduces protected content or was trained on copyrighted works without permission.

According to the court, the 20 million conversations represent a random sample of complete user logs that OpenAI has de-identified. Judge Stein agreed with Magistrate Judge Wang that existing privacy protections and a strict protective order sufficiently reduce risks to users, making the production appropriate and enforceable.

OpenAI opposed the order, arguing that producing the dataset, roughly 0.5 percent of preserved logs, would be unduly burdensome and raise privacy concerns, particularly because, by the company’s account, more than 99.99 percent of the conversations are irrelevant to the dispute. The court rejected those arguments, stating that federal discovery rules do not require plaintiffs to pursue the least burdensome method when the requested materials are relevant and proportional.

The company has also characterized the order as an unprecedented intrusion into user privacy, warning that it would expose private conversations unrelated to the lawsuit. OpenAI filed motions seeking to stay or reconsider the ruling and mounted a public campaign emphasizing its privacy commitments, but those efforts have so far failed to persuade the court.

Plaintiffs argue the data is critical to determining whether ChatGPT outputs demonstrate the regurgitation of copyrighted material and to challenging OpenAI’s assertions about fair use and the sources of its training data.

Legal analysts say the ruling could mark an important precedent at the intersection of artificial intelligence, privacy, and copyright law, signaling that courts may favor expansive discovery access over technology companies’ efforts to tightly control user data in intellectual property disputes.

CyberDefender