How AI scraper bots are putting Wikipedia under strain


For more than a year, the Wikimedia Foundation, which publishes the online encyclopedia Wikipedia, has seen a surge in traffic with the rise of AI web-scraping bots. This increase in network traffic poses major infrastructure and cost management issues.

The Wikimedia Foundation is a non-profit organisation that manages Wikipedia and other projects related to free knowledge. It is highlighting the growing impact of web crawlers on its projects, particularly Wikipedia. These bots are automated programs that mass-retrieve freely licensed articles, images and videos in order to train different generative artificial intelligence models.

Since January 2024, Wikimedia has seen a 50% increase in the bandwidth used to download multimedia content from its servers. This increase is mainly attributed to these scraper bots, and now represents a significant burden on the foundation's operations.

For example, when former US President Jimmy Carter died last December, his English-language Wikipedia page received more than 2.8 million views in one day, which is high but manageable. At the same time, however, numerous bots also "read" in full a 1.5-hour video of his 1980 presidential debate with Ronald Reagan, doubling the usual network traffic and saturating access to the servers. For users, this meant a significant slowdown in page loading. It shows that, in certain situations, Wikimedia can be significantly affected by the activity of these bots.

The foundation emphasises the need for new mechanisms to manage this influx of traffic. One idea is to regulate bot-generated traffic, for example by limiting the number of requests per second a bot can send to a site's servers, or by imposing a minimum delay between successive requests to avoid congestion.
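Purely as an illustration, and not a description of Wikimedia's actual infrastructure, a minimal sketch of this kind of per-client limit might look like the following in Python; the client identifier, the cap of 10 requests per second and the minimum delay between requests are hypothetical values chosen for the example.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Illustrative per-client rate limiter: at most `max_requests`
    per rolling `window` seconds, plus a minimum delay between requests."""

    def __init__(self, max_requests=10, window=1.0, min_delay=0.05):
        self.max_requests = max_requests   # hypothetical cap: 10 requests per second
        self.window = window               # rolling window, in seconds
        self.min_delay = min_delay         # minimum gap between two requests
        self.history = defaultdict(deque)  # client id -> recent request timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        timestamps = self.history[client_id]

        # Drop timestamps that have fallen outside the rolling window.
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()

        # Reject if the client is over its per-window budget, or is sending
        # requests closer together than the minimum delay allows.
        if len(timestamps) >= self.max_requests:
            return False
        if timestamps and now - timestamps[-1] < self.min_delay:
            return False

        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = RateLimiter(max_requests=10, window=1.0, min_delay=0.05)
    results = []
    # Simulate a burst of 15 requests from one hypothetical crawler,
    # spaced just above the minimum delay.
    for _ in range(15):
        results.append(limiter.allow("example-crawler"))
        time.sleep(0.06)
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

Run as a script, this sketch allows the first 10 requests in the one-second window and rejects the remaining 5, which is the kind of throttling behaviour the paragraph above describes.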

It could also be necessary to develop algorithms capable of differentiating real visitors from bots, or even to charge companies that make massive use of its data for access to its services.
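Again as an illustration only, since the foundation has not described how such a classifier would work, a crude heuristic for separating likely bots from human visitors might combine a User-Agent check with the observed request rate; the patterns and the threshold of 120 requests per minute below are hypothetical.

```python
import re

# Hypothetical substrings commonly seen in crawler User-Agent strings.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)


def looks_like_bot(user_agent: str, requests_last_minute: int,
                   rate_threshold: int = 120) -> bool:
    """Flag a client as a likely bot if its User-Agent matches known
    crawler patterns or its request rate exceeds a hypothetical threshold."""
    if not user_agent or BOT_UA_PATTERN.search(user_agent):
        return True
    return requests_last_minute > rate_threshold


if __name__ == "__main__":
    print(looks_like_bot("ExampleAIBot/1.0 (+https://example.com/bot)", 5))   # True
    print(looks_like_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", 30))    # False
    print(looks_like_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", 500))   # True
```

In practice, real classifiers would weigh many more signals, but the sketch shows the basic idea of combining declared identity with behaviour.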

In any case, the foundation will need to negotiate directly with AI companies soon to ensure that the development of their models does not degrade the quality of service of Wikipedia and other websites. – AFP Relaxnews
