Meta’s Piracy: Are They Hoarding More Terabytes of Books?

TECH NEWS – Mark Zuckerberg’s company used books to train artificial intelligence models, but Meta didn’t exactly have legal access to the content…

 

A copyright lawsuit has been filed against Meta for using authors’ work to train large language models (LLMs). Dozens of emails, allegedly between Meta employees, indicate that books were pirated in bulk to train the company’s AI models, and that the downloaded torrents were then seeded. In January, court documents revealed that Meta obtained AI training data from LibGen, a large file-sharing database that contains everything from news articles and paywalled academic papers to books.

Meta is accused of downloading more than 80 terabytes of data from LibGen and another “shadow library” called Z-Library. 80 TB is 80 thousand (!) gigabytes. That’s a lot. This is piracy on a perhaps unprecedented scale. The company emails document Meta’s decision to take copyrighted works it knew were pirated and use them without permission, despite clear ethical concerns. In one email submitted as evidence, a purported Meta employee warns, to no avail, that using pirated material should be beyond the company’s ethical threshold, then adds that LibGen and similar databases are basically like The Pirate Bay: they distribute copyrighted content and are infringing it.

Many emails mention concerns about using LibGen. One Meta researcher suggested that a VPN was the only way to access it, and also joked that torrenting from a company laptop didn’t seem acceptable. So Meta went into stealth mode, hiding the activity by downloading and seeding the torrents outside of Facebook’s official servers. According to the plaintiffs, this correspondence suggests that Meta executives up to and including Mark Zuckerberg knew that the company was using pirated material to train its AI models. It has also emerged that Meta employees believed OpenAI was using LibGen for its own models, and described the situation as a kind of arms race that pushed them to resort to the same source.

If Meta is found liable, how much will it have to pay in damages? And why is the Internet Archive (archive.org) not allowed to lend books as a digital library?

Source: PC Gamer, Ars Technica, Wired, CourtListener

theGeek is here since 2019.
