- Unauthorized AI Training: Plaintiffs allege that Meta trained its Llama AI models using LibGen, a repository of pirated e-books and articles, with direct approval from CEO Mark Zuckerberg, despite internal warnings about legal risks.
- Concealment Tactics: Court filings accuse Meta of stripping copyright information and torrenting files to obscure its use of copyrighted materials, effectively bypassing lawful acquisition methods.
- Legal and Ethical Implications: The case challenges Meta’s reliance on the “fair use” defense, raising concerns about the ethical boundaries of AI training and the potential for setting new legal precedents.
Meta is under fire for allegedly training its Llama AI models on copyrighted materials without permission, as revealed in the ongoing lawsuit Kadrey v. Meta. Plaintiffs, including notable authors, claim that Meta CEO Mark Zuckerberg approved the use of LibGen, a repository of pirated e-books and articles, despite internal concerns. LibGen has faced multiple lawsuits for copyright infringement, and Meta employees reportedly flagged its use as risky, suggesting it might damage the company’s reputation with regulators.
Newly unsealed court documents detail that Meta’s AI team sought Zuckerberg’s approval to use LibGen for training, despite acknowledging the dataset contained pirated material. Internal memos show that Meta employees referred to LibGen as a “data set we know to be pirated” and flagged potential legal and ethical risks. The documents also allege that Meta justified its actions under the U.S. “fair use” doctrine, reasoning that training AI models was sufficiently transformative, though many creators contest this defense.
The lawsuit includes accusations that Meta deliberately concealed its use of copyrighted materials. According to the filing, engineers removed copyright markers and acknowledgments from the LibGen dataset, potentially to prevent AI-generated outputs from exposing the source material. This practice, the plaintiffs argue, reflects an attempt to obscure the infringement rather than a mere technical necessity for training.
The filing further alleges that Meta accessed LibGen via torrenting, a peer-to-peer process that typically involves redistributing (seeding) files to other users while downloading. Plaintiffs claim this amounted to additional copyright violations and criticize Meta’s decision to bypass lawful acquisition methods, such as purchasing or licensing the materials. Internal emails reveal that Meta’s leadership, including Ahmad Al-Dahle, supported these actions, even as some employees expressed legal concerns.
While the case currently focuses on Meta’s earlier Llama models, it raises broader questions about the ethical and legal boundaries of AI training. Courts have previously sided with AI developers in similar copyright cases, citing insufficient evidence that model outputs infringed specific works. However, the allegations of intentional misconduct and efforts to avoid scrutiny paint a troubling picture of Meta’s practices. As the case unfolds, it could set significant precedents for AI and copyright law.