Meta accused of training its AI using pirated content from torrents


A new day, a new controversy around artificial intelligence. This time, Meta has been accused of using pirated content from torrents to train its large language model (LLM) Llama, which powers Meta AI. The case was one of the first copyright lawsuits filed against a tech company for training AI.

Documents reveal that Meta AI was trained with pirated content

As reported by Wired, Meta was hit with a lawsuit in 2023 for allegedly training Llama, the company’s LLM, with pirated content. The case became known as “Kadrey et al. v. Meta Platforms” and was filed by novelists Richard Kadrey and Christopher Golden, who claimed that Meta used copyrighted content without authorization.

Until now, Meta had handed over documents with redacted information to the court, but Judge Vince Chhabria of the United States District Court for the Northern District of California ordered that the original documents should be made public – and that’s what happened.

The documents reveal conversations between Meta employees about Meta AI and Llama. In one of the conversations, an engineer says that “torrenting from a [Meta-owned] corporate laptop doesn’t feel right,” which supports the allegation that the company used pirated content to train its AI. Another conversation suggests that “MZ” (Mark Zuckerberg) authorized the use of pirated material.

Evidence suggests that Meta used content from LibGen, a huge library of pirated books, magazines and academic articles. LibGen was created in Russia in 2008 and has been hit by multiple copyright lawsuits since then, even though no one knows who actually operates the “piracy hub.” Meta also reportedly used content from other “shadow libraries” for AI training.

The company argues that it used public materials under the legal doctrine of “fair use,” which allows copyrighted content to be used without permission in certain circumstances that are assessed case by case. Meta also claims that it’s just “using text to statistically model language and generate original expression.”

What about Apple Intelligence?

This is not the first time that big tech companies have been accused of training AI models with copyrighted content. Last year, an investigation revealed that the OpenELM model created by Apple included subtitles from more than 170,000 YouTube videos.

Although at first this led people to believe that Apple was using copyrighted content to train Apple Intelligence, the company later explained that OpenELM was an open-source model created for research purposes and that its dataset is not used to power Apple Intelligence.

According to Apple, its AI features available on iOS and macOS are trained “on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler.”

It’s worth noting that many large publishers, such as The New York Times and The Atlantic, have chosen not to make their content available for Apple Intelligence training.
