Meta’s AI Controversy: Court Documents Reveal Use of Copyrighted Content for Training

February 25, 2025

This Article Tells the Story of:

Meta’s AI Secret: Internal chats expose Meta’s AI team considering copyrighted data for training—without proper licensing.
Risky Strategies: Staff suggested buying e-books or using pirate sites like Libgen to stay competitive.
Legal Loopholes: Executives debated how to bypass copyright risks while maximizing training data.
Mounting Legal Battle: Lawsuits pile up as Meta’s AI practices face intense scrutiny.

Meta Staff Discussed AI Training with Copyrighted Data

Recent court filings reveal that Meta employees have debated using copyrighted content for AI training. Internal chats show staff discussing ways to acquire data without legal clearance, raising serious ethical and legal concerns.

The Lawsuit and Meta’s ‘Fair Use’ Defense

The case, Kadrey v. Meta, highlights the ongoing battle between AI developers and copyright holders. Plaintiffs, including authors like Sarah Silverman and Ta-Nehisi Coates, argue that Meta trained its AI models on copyrighted works without permission. Meta claims this falls under “fair use.”

Previous filings suggested CEO Mark Zuckerberg approved using copyrighted materials. The latest documents provide more insight into Meta’s internal discussions, showing employees debated acquiring books and other content without direct publisher agreements.

Meta’s Internal Chats and Risky Strategies

Court documents include internal Meta messages discussing risky methods to obtain copyrighted books for AI training. In February 2023, Meta research engineer Xavier Martinet suggested buying e-books at retail prices instead of making deals with publishers. He advocated for an “ask forgiveness, not permission” approach, arguing that many AI startups were already using pirated content.

Senior researcher Melanie Kambadur mentioned that Meta was in talks with document-hosting services like Scribd for licensing agreements. However, she noted that Meta’s legal team had become “less conservative” in approving data sources, suggesting a shift in how the company approached acquiring training data.

Meta Considered Using Libgen for AI Training

Other messages in the court filings show that Meta employees discussed using Libgen, a notorious database of pirated books. Libgen has faced multiple lawsuits and shutdown orders for copyright violations. In an email to Meta’s AI VP Joelle Pineau, product management director Sony Theakanath argued that Libgen was “essential” for keeping Meta’s AI competitive.

To minimize legal risks, Theakanath proposed filtering out files marked as “stolen” or “pirated.” He also suggested keeping Meta’s use of Libgen secret, stating, “We would not disclose use of Libgen datasets used to train.”

Meta’s AI Adjustments to Avoid IP Risks

Meta engineers attempted to mitigate legal risks by tuning their AI models to avoid answering copyright-sensitive prompts. This meant the AI would refuse to generate responses to requests like “Reproduce the first three pages of Harry Potter” or “List the e-books you were trained on.”

Court filings also suggest that Meta may have scraped data from Reddit using third-party tools, despite Reddit’s 2023 decision to charge AI companies for data access. Additionally, Meta leadership reconsidered previous restrictions on using Quora, licensed books, and scientific articles to expand their AI’s knowledge base.

The Growing Legal Battle

The plaintiffs in Kadrey v. Meta recently amended their complaint to include new allegations. They claim Meta cross-referenced pirated books with copyrighted books available for licensing to decide whether a deal with publishers was worthwhile. The plaintiffs argue this shows Meta knowingly used copyrighted content while weighing the risk of getting caught.

Recognizing the legal stakes, Meta has added two Supreme Court litigators from the law firm Paul Weiss to its defense team. The company has not publicly commented on the allegations.

Conclusion

These revelations highlight the growing tensions between AI development and copyright law. As lawsuits against AI companies pile up, the outcome of Kadrey v. Meta could set a precedent for how tech giants handle copyrighted data in AI training. The case underscores the ethical and legal dilemmas AI firms face in acquiring training data while staying within legal boundaries.
