New Meta Emails Reveal That the Company Downloaded 81.7 TB of Copyrighted Books via BitTorrent to Train Its AI Models

Authored by xatakaon.com and submitted by Appropriate_Snow2112

The ongoing Kadrey v. Meta Platforms, Inc. lawsuit accuses the tech giant of using copyrighted materials to train its artificial intelligence models. A few months ago, it was revealed that Meta CEO Mark Zuckerberg authorized the use of pirated books. New evidence recently emerged to support these claims.

Unsealed emails. Appendix A of the case includes several emails from Meta employees that reveal a significant number of downloads of copyrighted books. One employee named Melanie Kambadur expressed her refusal to participate in this kind of data collection in October 2022.

“Torrenting from a corporate laptop doesn’t feel right,” Nikolay Bashlykov, a Meta engineer responsible for this data collection, said in an April 2023 message. He added that the company needed to be cautious about the IP address from which they downloaded the materials.

Meta knew the risks. In September 2023, Bashlykov cautioned that torrenting could lead to “seeding,” which “could be legally not OK.” These internal discussions suggest that Meta recognized this type of activity as unlawful, according to authors who have sued the company.

Covering its tracks. In an internal message, Meta researcher Frank Zhang said that the company took measures to avoid using its own servers when downloading the data set. The goal was to prevent anyone from tracing the seeding and downloading back to the company.

81.7 TB of data. According to Ars Technica, new evidence indicates that Meta downloaded at least 81.7 TB of data from several libraries that offered copyrighted books via torrents. A recent document from the ongoing legal process revealed that at least 35.7 TB were downloaded from sites like Z-Library or LibGen (which was shut down in the summer).

Meta seeks to dismiss the allegations. The company has filed a motion to dismiss these claims. Meta argues there’s no evidence that any of its employees downloaded books via torrents or that Meta distributed them. Xataka has contacted the company for comment on the case and will update this post if we receive a response.

Plundering the Internet. This issue highlights the questionable practices that AI companies employ to train their models. It happened with Google, which updated its privacy policy in 2023 to say that it’ll “use publicly available information to help train Google’s AI models.” It’s also evident with OpenAI, which used millions of texts, many of them copyrighted, to train ChatGPT. Perplexity recently came under scrutiny for bypassing the “rules of the Internet” to feed its AI model.

Internet theft is being normalized. What’s remarkable is that as companies increasingly skirt the rules and violate copyright, this behavior is starting to be seen as normal. There seems to be little time for outrage, and people often treat it as an accepted practice so they can continue with their business.

Is this really “fair use”? Many companies rely on the concept of “fair use,” which allows for limited use of protected material without requiring permission. While copyright infringement lawsuits are emerging in the world of generative AI, they often seem to take a backseat as these large companies continue to thrive.

thebudman_420 on February 7th, 2025 at 22:47 UTC »

Why is it that crime is only for everyone except the officials who had this done?

Why don't they go to jail and prison too?

If only people lower and less rich can get in trouble we have an unjust law. Legal for the rich illegal for you.

Criminal for you legal for the rich.

EUeXfC6NFejEtN on February 7th, 2025 at 22:27 UTC »

They used the books in a questionable way and now it turns out that they didn't even pay to acquire the books?

Pretty darned sure that's not a copyright violation. Pretty sure that's straight up theft.

NoirVPN on February 7th, 2025 at 22:23 UTC »

so piracy is legal if you are rich and above the law.