Meta will face a group of US authors in court on Thursday in one of the first major legal tests of whether tech companies can train powerful artificial intelligence models using copyrighted materials.
The case, brought by about a dozen authors including Ta-Nehisi Coates and Richard Kadrey, centers on the $1.4tn social media giant's use of LibGen, a so-called shadow library of millions of books, academic articles and comics.
The ruling could have a major impact on the fierce copyright battle between artists and AI groups, and the case is one of a surge of lawsuits around the world alleging that technology groups are using content without permission.
Microsoft, OpenAI and Anthropic face similar legal challenges over the data used to train the large language models behind popular AI chatbots such as ChatGPT and Claude.
“AI models have been trained on hundreds of thousands, if not millions, of books downloaded from well-known pirate sites, and this was no accident,” said Mary Rasenberger, chief executive of the Authors Guild. “Authors should have received license fees for that.”
Meta argues that training LLMs on copyrighted materials is “fair use” when it serves the development of transformative technology, even if the materials come from a pirated database. LibGen hosts much of its content without the permission of the rights holders. In its legal filing, Meta argues that “use is fair regardless of how it is acquired.”
According to court filings, the US tech giant held early discussions with book publishers to explore options for licensing works to train its models. The plaintiffs argue that Meta abandoned these talks because the works were available through LibGen, depriving authors of compensation and control.
In discovery filings, Meta said, “Once you get a license for a single book (sic), you cannot lean towards a fair use strategy.” Meta argues in its defense that there was no market for licensing such works for this purpose.
However, emails unearthed during the court's discovery process suggest that Meta employees saw this as a legal gray area, and the plaintiffs say the emails appear to show staff discussing ways to avoid scrutiny when using LibGen.
In one email from last January, Joelle Pineau, head of Meta's AI research lab FAIR, who recently announced she is leaving the company, recommended the use of the LibGen dataset.
In a subsequent email, Sony Theakanath, a director of product at Meta, said, “We will never publicly disclose that we have trained on LibGen.” The email included a subheading, “Legal Risks”, with the risks or details beneath it redacted, and another subheading, “Policy Risks”, which included “Copyright and IP.” The email suggested mitigations, such as “deleting data marked pirated/theft.”
The case comes as Meta pours billions of dollars into becoming an “AI leader” and developing its Llama models to compete with OpenAI, Microsoft, Google and Elon Musk's xAI.
“There is an incredible amount of uncertainty right now,” said Chris Mammen, a partner at law firm Womble Bond Dickinson, stressing that copyright cases could take years to reach a conclusion.
“It’s very important to resolve these things. Things continue to happen around the world at the fierce pace at which technology and our economy are developing,” he added.
Another dispute in the suit concerns the method the plaintiffs claim Meta used to obtain the LibGen database, known as torrenting.
Court documents state that Meta torrented the works but sought to limit their onward distribution. However, no guarantee has been provided that this was completely prevented, and some evidence related to outbound data has been removed, according to information from the discovery process.
“Meta has developed transformative open source AI models that promote incredible innovation, productivity and creativity for individuals and businesses. This involves the fair use of copyrighted materials,” Meta said in a statement. “We oppose the plaintiffs' claims, and the full record tells a different story. We will continue to defend ourselves vigorously and protect the development of GenAI for the benefit of all.”