Reddit sues AI company Perplexity and others for 'industrial-scale' scraping of user comments



Social media platform Reddit sued the artificial intelligence company Perplexity AI and three other entities on Oct 22, alleging their involvement in an "industrial-scale, unlawful" economy to "scrape" the comments of millions of Reddit users for commercial gain.

Reddit's lawsuit in a New York federal court takes aim at San Francisco-based Perplexity, maker of an AI chatbot and "answer engine" that competes with Google, ChatGPT and others in online search.

Also named in the lawsuit are Lithuanian data-scraping company Oxylabs UAB, a web domain called AWMProxy that Reddit describes as a "former Russian botnet", and Texas-based startup SerpApi.

It is Reddit's second such lawsuit, following its suit against another major AI company, Anthropic, in June.

But the lawsuit filed on Oct 22 differs in that it confronts not just an AI company but also the lesser-known services the AI industry relies on to acquire the online writings needed to train AI chatbots.

"Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it's one of the largest and most dynamic collections of human conversation ever created," said Ben Lee, Reddit's chief legal officer, in a statement Wednesday.

Perplexity said it has not yet received the lawsuit but "will always fight vigorously for users' rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest."

Oxylabs and SerpApi didn't immediately respond to requests for comment Wednesday. AWMProxy could not immediately be reached for comment.

Reddit compares the companies it is suing to "would-be bank robbers" who can't get into the bank vault, so they break into the armoured truck instead. The lawsuit alleges they are evading Reddit's own anti-scraping measures while also "circumventing Google's controls and scraping Reddit content directly from Google's search engine results".

Lee said that because they're unable to scrape Reddit directly, "they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself."

Much like its lawsuit against Anthropic, maker of the chatbot Claude, Reddit claims that Perplexity has accessed Reddit’s content despite being asked not to do so.

The Anthropic case was initially filed in California Superior Court but was later moved to federal court and has a hearing scheduled for January.

Along with digitized books and news articles, websites such as Wikipedia and Reddit are deep troves of written materials that can help teach an AI assistant the patterns of human language.

Reddit has previously entered licensing agreements with Google, OpenAI and other companies that are paying to be able to train their AI systems on the public commentary of Reddit's more than 100 million daily users.

The licensing deals helped the 20-year-old online platform raise money ahead of its Wall Street debut as a publicly traded company last year. – AP
