This Article Covers:
- Meta’s AI Secret: Internal chats expose Meta’s AI team considering copyrighted data for training—without proper licensing.
- Risky Strategies: Staff suggested buying e-books or using pirate sites like Libgen to stay competitive.
- Legal Loopholes: Executives debated how to bypass copyright risks while maximizing training data.
- Mounting Legal Battle: Lawsuits pile up as Meta’s AI practices face intense scrutiny.
Meta Staff Discussed AI Training with Copyrighted Data
Recent court filings reveal that Meta employees have debated using copyrighted content for AI training. Internal chats show staff discussing ways to acquire data without legal clearance, raising serious ethical and legal concerns.
The Lawsuit and Meta’s ‘Fair Use’ Defense
The case, Kadrey v. Meta, highlights the ongoing battle between AI developers and copyright holders. Plaintiffs, including authors like Sarah Silverman and Ta-Nehisi Coates, argue that Meta trained its AI models on copyrighted works without permission. Meta claims this falls under “fair use.”
Previous filings suggested CEO Mark Zuckerberg approved using copyrighted materials. The latest documents provide more insight into Meta’s internal discussions, showing employees debated acquiring books and other content without direct publisher agreements.
Meta’s Internal Chats and Risky Strategies
Court documents include internal Meta messages discussing risky methods to obtain copyrighted books for AI training. In February 2023, Meta research engineer Xavier Martinet suggested buying e-books at retail prices instead of making deals with publishers. He advocated for an “ask forgiveness, not permission” approach, arguing that many AI startups were already using pirated content.
Senior researcher Melanie Kambadur mentioned that Meta was in talks with document-hosting services like Scribd for licensing agreements. However, she noted that Meta’s legal team had become “less conservative” in approving data sources, suggesting a shift in corporate policy toward acquiring training data.
Meta Considered Using Libgen for AI Training
Another exchange in the court filings shows that Meta employees discussed using Libgen, a notorious database of pirated books. Libgen has faced multiple lawsuits and shutdown orders for copyright violations. In an email to Meta’s AI VP Joelle Pineau, product management director Sony Theakanath argued that Libgen was “essential” for keeping Meta’s AI competitive.
To minimize legal risks, Theakanath proposed filtering out files marked as “stolen” or “pirated.” He also suggested keeping Meta’s use of Libgen secret, stating, “We would not disclose use of Libgen datasets used to train.”
Meta’s AI Adjustments to Avoid IP Risks
Meta engineers attempted to mitigate legal risks by tuning their AI models to avoid answering copyright-sensitive prompts. This meant the AI would refuse to generate responses to requests like “Reproduce the first three pages of Harry Potter” or “List the e-books you were trained on.”
Court filings also suggest that Meta may have scraped data from Reddit using third-party tools, despite Reddit’s 2023 decision to charge AI companies for data access. Additionally, Meta leadership reconsidered earlier restrictions on using Quora content, licensed books, and scientific articles, with the aim of expanding the knowledge base of its AI models.
The Growing Legal Battle
The plaintiffs in Kadrey v. Meta recently amended their complaint to include new allegations. They claim Meta cross-referenced pirated books with copyrighted books available for licensing to decide whether a deal with publishers was worthwhile. According to the plaintiffs, this shows that Meta knowingly used copyrighted content while weighing the risk of getting caught.
Recognizing the legal stakes, Meta has added two Supreme Court litigators from the law firm Paul Weiss to its defense team. The company has not publicly commented on the allegations.
Conclusion
These revelations highlight the growing tensions between AI development and copyright law. As lawsuits against AI companies pile up, the outcome of Kadrey v. Meta could set a precedent for how tech giants handle copyrighted data in AI training. The case underscores the ethical and legal dilemmas AI firms face in acquiring enough training data without infringing copyright.