Meta is under fire again over the training data behind its Llama AI model. Major publishers including Elsevier, Hachette Book Group, Macmillan Publishers, and McGraw Hill, together with author Scott Turow, have filed a sweeping lawsuit against Meta Platforms and CEO Mark Zuckerberg. The suit claims Meta used millions of illegally obtained books and articles to train its AI systems.
The
lawsuit was filed Monday in federal court in New York, marking a new escalation in the fight over AI training data. The publishers accuse Meta of large-scale copyright infringement, including downloading books via torrent sites, scraping the web, and reproducing protected material for AI training.
Publishers allege “massive copyright infringement”
The 65-page complaint outlines how Meta allegedly copied “millions of copyrighted works” to build its Llama models—ranging from novels and textbooks to scientific papers and educational materials.
According to the publishers, Meta tapped illegal torrents and datasets from pirate sites to rapidly amass text, then copied that data multiple times while training different AI models.
The plaintiffs also say Meta deliberately sidestepped licensing models. The filing claims Meta initially explored official licenses for books and publications, but later walked away at Zuckerberg’s direction.
Llama at the center of a new AI copyright fight
The case zeroes in on Llama, Meta’s family of large language models, which underpin AI features across Facebook, Instagram, WhatsApp, and Meta’s assistants.
Publishers argue Llama can generate text closely resembling existing books, produce summaries of protected works, and even mimic authors’ styles—putting it in direct competition with human writers and publishers.
They call it a fundamental threat to the publishing industry, claiming AI firms profit from creative works without permission or payment to creators.
Rising pressure on the AI industry
This isn’t an isolated case. Tech giants worldwide face mounting legal challenges over the datasets used to build generative AI. OpenAI, Microsoft, and Anthropic have also been sued by authors, artists, and media companies.
But this case could hit harder. It’s the first time several major international publishers have jointly taken on an AI company of Meta’s size—raising both political and economic stakes for the sector.
The complaint targets not just AI training, but also how Meta allegedly gathered data. Publishers accuse the company of distributing via torrents, removing copyright information, and intentionally bypassing licensing frameworks.
Meta defends its use of training data
Meta has previously argued that training AI models can fall under “fair use” in U.S. copyright law. Tech companies say AI systems learn patterns from data rather than storing direct copies of entire books.
Critics counter that modern AI increasingly reproduces full passages, writing styles, and summaries—shifting the debate from technical use to economic impact.
In the coming months, we’ll see how aggressively publishers push the case—and whether Meta chooses a settlement or a prolonged legal battle.