Bot Traffic and the Crawler Problem

The Numbers

Per the Imperva 2025 Bad Bot Report:

Bad bots are used for data scraping, fraud, credential stuffing, and overwhelming servers. Their growth is driven by generative AI tools that make bot deployment faster, cheaper, and accessible to people with minimal technical skill.

AI Crawlers

AI training crawlers operate at an extractive crawl-to-referral ratio:

These crawlers consume bandwidth that independent publishers pay for and return nothing, and a significant fraction of them disregard the standard opt-out mechanism, robots.txt, entirely.
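As a rough illustration of what a crawl-to-referral ratio measures, the sketch below divides the number of requests a crawler made by the number of visitors it referred back. The function name and the counts are hypothetical, chosen only to show the arithmetic; they are not figures from any report.

```python
def crawl_to_referral_ratio(crawler_requests: int, referral_visits: int) -> float:
    """Pages crawled per visitor referred back to the site.

    A higher value means the crawler takes more and returns less.
    """
    if referral_visits == 0:
        # The crawler sent no visitors at all: the ratio is unbounded.
        return float("inf")
    return crawler_requests / referral_visits


# Hypothetical counts, e.g. tallied from one month of server logs.
ratio = crawl_to_referral_ratio(crawler_requests=5000, referral_visits=2)
print(f"{ratio:.0f} pages crawled per referred visitor")
```

In practice the two counts would come from server logs: requests grouped by crawler user agent, and visits grouped by the Referer header.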

Why robots.txt Is Not Enough

The worst offenders do not honour robots.txt in good faith. Maintaining a blocklist is an arms race: new crawlers appear faster than blocks can be added. For independent publishers with limited time, this is not a viable solution.
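A minimal sketch of the blocklist problem, using Python's standard `urllib.robotparser`. The robots.txt content, the example URL, and the `NewCrawler/1.0` user agent are assumptions made for illustration. The check shown here is what a well-behaved crawler runs on its own side; nothing forces a crawler to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of two known AI crawlers by name.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.org/posts/hello"  # placeholder URL

# A crawler named in the blocklist is asked to stay away.
print(is_allowed("GPTBot", URL))

# A crawler that appeared after the list was written falls through to "*".
print(is_allowed("NewCrawler/1.0", URL))
```

The first call reports the listed crawler as disallowed, while the unlisted one is allowed until someone notices it and adds a rule. Either result is only a request: a crawler acting in bad faith can skip the check altogether.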

Real Mitigations

Intentional Apathy

One position: don't block AI crawlers. The Good Web is built on openness; attempting to restrict crawlers selectively is an unwinnable arms race, and legitimate crawlers (the Wayback Machine, search engines) use the same mechanisms. See IndieWeb Principles on the balance between openness and protection.
