- Wikipedia is experiencing traffic overload caused by AI bots ignoring access rules.
- Crawlers extract content to train models, overwhelming servers and displacing human users.
- Free software projects are also affected by increased traffic and associated costs.
- New measures and agreements between open platforms and AI companies are being considered to ensure the sustainability of the digital ecosystem.

In recent months, digital platforms devoted to the free sharing of knowledge have begun to show signs of strain under the growing activity of artificial intelligence crawlers. Services like Wikipedia are experiencing unprecedented pressure on their infrastructure, generated not by a genuine increase in human users but by the tireless activity of bots focused on capturing data to feed generative AI models.
These crawlers, often camouflaged or not clearly identified, massively collect texts, images, videos, and other public material available on the web, with the aim of improving the training of language models and visual content generation systems.
Wikipedia and the cost of being open
The Wikimedia Foundation, which maintains Wikipedia and related projects, has announced that since the beginning of 2024, traffic on its servers has increased by 50%. This increase is driven not by spontaneous reader interest but by bots systematically scanning the available content. In fact, it estimates that about two-thirds of the traffic hitting its most expensive data centers comes from these automated tools.
The problem is compounded by the fact that many of these bots ignore the guidelines set out in the 'robots.txt' file, which is traditionally used to mark which parts of a website may or may not be visited by machines. This disregard for the rules has stretched Wikimedia's resources, hindering normal user access and degrading the service's overall performance.
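As a point of reference, honoring the convention is trivial: a compliant crawler fetches 'robots.txt' and checks each URL against it before downloading anything. A minimal sketch using only Python's standard library (the bot name "ExampleAIBot" is a hypothetical placeholder) might look like this:

```python
# Minimal sketch of a well-behaved crawler honoring robots.txt,
# using only the Python standard library.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://en.wikipedia.org/robots.txt")
rp.read()  # fetch and parse the site's exclusion rules

url = "https://en.wikipedia.org/wiki/Special:Export/Main_Page"
if rp.can_fetch("ExampleAIBot", url):
    print("robots.txt allows fetching", url)
else:
    # A compliant bot simply stops here; the bots in question do not.
    print("robots.txt disallows fetching", url)
```

The rules are purely advisory: nothing technically prevents a crawler from skipping this check, which is exactly the behavior Wikimedia is describing.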
“The content is open, but keeping it available is expensive,” the organization explains. Hosting, serving, and protecting millions of articles and files isn't free, even though anyone can access them without paying.
The problem extends to other corners of the free ecosystem
It's not just Wikipedia that is suffering the effects of indiscriminate data harvesting by AI bots. Free software communities and independent developers are also affected. Sites hosting technical documentation, code libraries, or open source tools report sudden traffic spikes that are often impossible to absorb without financial consequences.
Engineer Gergely Orosz, for example, saw one of his projects multiply its bandwidth consumption sevenfold in a matter of weeks, generating unexpected overage charges that he had to cover out of his own pocket.
To counteract this, developers like Xe Iaso have created tools such as Anubis, a reverse proxy that forces visitors to a website to pass a short test before accessing the content. The goal is to filter out bots, which generally fail these tests, and prioritize human access. However, these methods have limited effectiveness, as AI crawlers continually evolve to avoid such obstacles, using techniques such as residential IP addresses and frequent identity changes.
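One common form of such a test is a proof-of-work challenge: the visitor must burn a little CPU time before the page is served, which is negligible for one human but costly for a crawler hitting millions of URLs. The following is a rough illustration of that general idea, not Anubis's actual code; the difficulty value is chosen arbitrarily:

```python
# Illustrative proof-of-work challenge/verify loop. A server issues a
# random challenge; the client must find a nonce whose SHA-256 hash
# starts with a fixed number of zero hex digits.
import hashlib
import os

DIFFICULTY = 4  # leading zero hex digits required; real tools tune this

def issue_challenge() -> str:
    return os.urandom(16).hex()

def solve(challenge: str) -> int:
    # Cheap for a single interactive visitor, expensive at crawler scale.
    nonce = 0
    while not hashlib.sha256(f"{challenge}{nonce}".encode()) \
            .hexdigest().startswith("0" * DIFFICULTY):
        nonce += 1
    return nonce

def verify(challenge: str, nonce: int) -> bool:
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = issue_challenge()
nonce = solve(challenge)         # in practice, this runs in the visitor's browser
assert verify(challenge, nonce)  # the server grants access only after this check
```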
From defense to offense: traps for bots
Some developers have adopted more proactive strategies. Tools such as Nepenthes or AI Labyrinth, the latter powered by services like Cloudflare, are designed to lure bots into a maze of fake or irrelevant content. This way, crawlers waste resources scraping worthless information while legitimate systems carry less of the burden.
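To make the idea concrete, here is a deliberately simple sketch of such a tarpit (illustrative only, not how Nepenthes or AI Labyrinth are actually built): a tiny web server on which every page is procedurally generated filler linking to yet more generated pages, so a crawler that follows links never runs out of junk to download.

```python
# Minimal "labyrinth" tarpit sketch: every page serves filler text plus
# links to further generated pages, trapping link-following crawlers.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

class LabyrinthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed randomness with the path so revisited URLs look consistent.
        rng = random.Random(self.path)
        links = "".join(
            f'<a href="/maze/{rng.getrandbits(32):08x}">more</a> '
            for _ in range(10)
        )
        filler = " ".join(
            rng.choice(["lorem", "ipsum", "dolor", "sit"]) for _ in range(200)
        )
        body = f"<html><body><p>{filler}</p>{links}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), LabyrinthHandler).serve_forever()
```

A real deployment would only route suspected bots into the maze, leaving human visitors on the genuine site.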
The dilemma of the free web and AI models
This situation exposes an underlying conflict: the openness of the Internet, which made the development of artificial intelligence possible, now threatens the viability of the very digital spaces that feed that AI. Big tech companies make huge profits by training their models on free content, yet they rarely contribute to the maintenance of the infrastructure that makes it possible.
The affected foundations and communities insist that a new pact for digital coexistence is needed. It should include at least the following aspects:
- Financial contributions from AI companies to the platforms they use as a data source.
- Implementation of specific APIs to access content in a regulated, scalable and sustainable manner (see the sketch after this list).
- Scrupulous observance of bot exclusion rules, such as 'robots.txt', which many tools currently ignore.
- Attribution of reused content, so that the value of the original contributors is recognized.
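On the API point, the practical difference is small but meaningful: instead of anonymously scraping article pages, a client uses a documented endpoint and identifies itself. A minimal sketch against Wikipedia's public MediaWiki API follows; the bot name and contact address are hypothetical placeholders:

```python
# Minimal sketch of regulated access: query a documented API endpoint
# with an identifying User-Agent instead of scraping rendered pages.
import json
from urllib.request import Request, urlopen

url = (
    "https://en.wikipedia.org/w/api.php"
    "?action=query&prop=extracts&exintro=1&explaintext=1"
    "&titles=Web%20crawler&format=json"
)
req = Request(url, headers={
    # Wikimedia asks automated clients to identify themselves like this.
    "User-Agent": "ExampleAIBot/0.1 (https://example.org/bot; bot@example.org)"
})
with urlopen(req) as resp:
    data = json.load(resp)

for page in data["query"]["pages"].values():
    print(page["title"], "->", page.get("extract", "")[:120])
```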
Wikimedia and others urge action
Beyond individual initiatives, the Wikimedia Foundation is advocating coordinated measures to keep its infrastructure from collapsing. Platforms like Stack Overflow have already begun charging for automated access to their content, and others may follow suit if the situation doesn't improve.
The excessive pressure that AI bots exert on volunteer-run, non-profit projects may end up accelerating the closure or restriction of free access to much of the knowledge online: a paradoxical outcome, considering that these sources have been key to the advancement of the very technology that now threatens their existence.
The current challenge is to find a model for the responsible use of open digital resources, one that ensures the sustainability of both AI models and the collaborative knowledge network that supports them.
If a fair balance between exploitation and collaboration is not achieved, the web ecosystem that fueled the greatest advances in AI could also become one of its main victims.

