AI Logs in Discovery: Court Orders Production of 20 Million ChatGPT Conversations

Nov 13, 2025

A discovery dispute in In re: OpenAI, Inc., Copyright Infringement Litigation has raised significant confidentiality concerns for professionals using generative AI. A U.S. Magistrate Judge ordered OpenAI to produce 20 million anonymized consumer ChatGPT logs, prompting the company to ask the court to reconsider what it calls a “dangerous precedent” (Letter, p. 1).

The order, and the subsequent arguments over it, highlight the tension between broad discovery in copyright cases and the privacy expectations of AI users, including lawyers and inventors.

The Order to Compel

On November 7, 2025, Magistrate Judge Ona T. Wang granted the News Plaintiffs’ motion to compel the production of “20 million retained, anonymized consumer ChatGPT output logs” (Order, p. 1).

The court’s reasoning was twofold. First, it found that OpenAI “failed to explain how its consumers’ privacy rights are not adequately protected” by the existing protective order or by OpenAI’s “exhaustive de-identification” of the logs (Order, p. 1).

Second, the court pointed to OpenAI’s previous reliance on rulings from Concord Music Group, Inc. v. Anthropic PBC. Judge Wang noted that OpenAI “fails to explain why Judge Keulen’s subsequent order directing production of the entire 5-million record sample... is not similarly instructive here” (Order, p. 2).

OpenAI’s Request for Reconsideration

On November 12, 2025, OpenAI filed a letter asking the Court to reconsider, arguing the order forces the production of a “massive trove of irrelevant personal user conversations” (Letter, p. 1).

OpenAI’s letter outlines several key objections:

Relevance and Proportionality: The company argues that the production is not proportional to the needs of the case, stating that “plaintiffs concede [more than 99.99%]... have nothing to do with this case” (Letter, p. 1). OpenAI characterized the request as a “speculative fishing expedition” that risks exposing data from “lawyers, doctors, therapists, and even journalists” (Letter, p. 1).
Limits of De-Identification: OpenAI directly challenged the court’s reliance on anonymization. The company noted that the court “did not acknowledge OpenAI’s sworn witness declaration explaining that the de-identification process is not intended to remove information that is non-identifying but may nonetheless be private” (Letter, p. 2).
Misapplication of Precedent: The company argued it “never had the opportunity” to explain why the Concord case is inapplicable (Letter, p. 2). OpenAI claims that, unlike in this case, “Anthropic had affirmatively proposed wholesale production... without any apparent concern for the privacy implications” (Letter, p. 3). Therefore, the Concord order was about the mechanism of an “already agreed-upon production,” not a contested order to compel (Letter, p. 3).
Nature of the Data: OpenAI also distinguished the data itself. The logs in Concord were “prompt-output pairs,” whereas the data at issue here are “complete conversations,” which are “much more likely to expose private information” (Letter, p. 3).

Balancing Discovery with Confidentiality

For IP owners and patent practitioners, this order brings abstract fears about AI confidentiality into sharp focus.

From the plaintiffs’ perspective, access to a large, random sample of outputs is necessary to prove their case at scale. They need to show whether the AI models reproduce copyrighted content systematically, rather than only in isolated instances that OpenAI might claim the plaintiffs manufactured.

The challenges and risks for non-party AI users, however, are substantial.

Exposure of Sensitive Information: The inclusion of “lawyers” and “financial analysts” in OpenAI’s list of affected users is specific (Letter, p. 1). It raises the possibility that privileged communications, trade secrets, or invention-related brainstorming sessions could be swept into the production.
Inadequate Anonymization: The core of the risk is OpenAI’s admission that “private” information may survive de-identification (Letter, p. 2). While a user’s name may be stripped, a “complete conversation” (Letter, p. 3) could easily reveal confidential business strategy, technical details of an unpatented invention, or legal analysis of a pending case.
The ‘Mosaic Effect’ and AI-Powered De-Anonymization: The court’s reliance on “exhaustive de-identification” (Order, p. 1) may not account for modern data analysis capabilities. The ‘mosaic effect’ describes a well-known privacy risk in which individual, non-identifying data points are combined to reveal a specific person or organization. In this context, a party receiving 20 million logs could potentially use its own AI-powered tools to cross-reference seemingly innocuous details within the “complete conversations” (Letter, p. 3). What a human reviewer cannot identify, a machine-learning model might: by connecting conversational patterns, technical jargon, or project-specific details to publicly available information, it could link anonymized logs back to a specific company or individual.
A Chilling Effect: If discovery orders can routinely pull in millions of user logs, even when more than 99.99% are irrelevant, professionals may become hesitant to use ChatGPT or similar AI tools for substantive work. The fear is that any “private thoughts and confidential business information” (Letter, p. 1) entered into a chat interface could one day be reviewed by opposing counsel in unrelated litigation.
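
To make the mosaic effect concrete, here is a toy Python sketch. It is purely illustrative and not drawn from the Order or the Letter: the organizations, the attributes, and the "clues" supposedly gleaned from a de-identified conversation are all invented. It only shows how details that identify no one individually can narrow a pool of candidates once they are combined with public information.

```python
from typing import Dict, List, Tuple

# Hypothetical public records (e.g. company profiles). All values are invented.
PUBLIC_PROFILES: List[Dict[str, str]] = [
    {"name": "Org A", "sector": "biotech", "city": "Boston", "product": "gene therapy"},
    {"name": "Org B", "sector": "biotech", "city": "Boston", "product": "diagnostics"},
    {"name": "Org C", "sector": "biotech", "city": "Austin", "product": "gene therapy"},
    {"name": "Org D", "sector": "fintech", "city": "Boston", "product": "payments"},
]

# Hypothetical details mentioned in a de-identified conversation.
# None of them names anyone on its own.
CLUES: List[Tuple[str, str]] = [
    ("sector", "biotech"),
    ("city", "Boston"),
    ("product", "gene therapy"),
]


def narrow_candidates(
    profiles: List[Dict[str, str]], clues: List[Tuple[str, str]]
) -> List[List[str]]:
    """Apply each clue in order and record which candidates remain after it."""
    remaining = list(profiles)
    history: List[List[str]] = []
    for field, value in clues:
        remaining = [p for p in remaining if p.get(field) == value]
        history.append([p["name"] for p in remaining])
    return history


steps = narrow_candidates(PUBLIC_PROFILES, CLUES)
for (field, value), names in zip(CLUES, steps):
    print(f"{field}={value}: {len(names)} candidate(s) remain -> {names}")
```

In this invented example, three unremarkable details are enough to leave a single candidate. Real re-identification would involve far larger and messier data, so the sketch should be read as an intuition aid rather than a measure of actual risk.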

The outcome of OpenAI’s request for reconsideration will be watched closely. It may set a critical standard for how courts balance the scope of e-discovery with the unique privacy risks generated by large-scale AI models.

In the meantime, be sure that your AI usage policies and vendor agreements account for court orders like this one.

Disclaimer: This is provided for informational purposes only and does not constitute legal or financial advice. To the extent there are any opinions in this article, they are the author’s alone and do not represent the beliefs of his firm or clients. The strategies expressed are pure speculation based on publicly available information. The information expressed is subject to change at any time and should be checked for completeness, accuracy and current applicability. For advice, consult a suitably licensed attorney and/or patent professional.
