Bot Traffic and the Crawler Problem

The Numbers

Per the Imperva 2025 Bad Bot Report:

  • Automated traffic surpassed human-generated activity in 2024
  • 51% of all web traffic is now bots
  • Good bots (search crawlers, etc.): 14%
  • Bad bots: 37% — six consecutive years of growth

Bad bots are used for data scraping, fraud, credential stuffing, and overwhelming servers. Their growth is driven by generative AI tools that make bot deployment faster, cheaper, and accessible to people with minimal technical skill.

AI Crawlers

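AI training crawlers take far more than they give back. The crawl-to-referral ratio (pages crawled per visitor referred back to the site) makes this concrete:

  • Anthropic's Claude crawler peaked at approximately 500,000:1, i.e. about 500,000 pages crawled for every visitor sent back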
  • 13.26% of AI bot requests actively ignored robots.txt directives in Q2 2025

These crawlers consume bandwidth that independent publishers pay for and send almost no visitors back, and more than one in eight of their requests disregards the standard opt-out mechanism entirely.
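To measure this on your own site, the ratio is simple arithmetic over two counts from your logs. A minimal sketch in Python; the function name and the sample counts are hypothetical, chosen only to reproduce the 500,000:1 figure above, and are not measured data:

```python
def crawl_to_referral_ratio(crawler_requests: int, referred_visits: int) -> float:
    """Pages crawled per visitor referred back.

    Returns infinity when the crawler sent no visitors back at all.
    """
    if referred_visits == 0:
        return float("inf")
    return crawler_requests / referred_visits


# Hypothetical monthly counts for one crawler (illustrative, not measured).
ratio = crawl_to_referral_ratio(crawler_requests=1_500_000, referred_visits=3)
print(f"{ratio:,.0f}:1")  # prints 500,000:1
```

How you attribute requests and referrals to a given crawler (user agent string, referrer header) is left open here; both can be spoofed, so treat the result as an estimate.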

Why robots.txt Is Not Enough

The worst offenders do not honour robots.txt, which is a voluntary convention with no enforcement behind it. Maintaining a blocklist is an arms race: new crawlers appear faster than blocks can be added. For independent publishers, this is not a viable solution.
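The reason is structural: the robots.txt check runs on the crawler's side, not yours. A small sketch using Python's standard urllib.robotparser shows what a well-behaved crawler does before fetching. The user agent names and the URL are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that opts one (hypothetical) AI crawler out entirely.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """The check a compliant crawler performs before requesting a page."""
    return parser.can_fetch(user_agent, url)


print(polite_fetch_allowed("ExampleAIBot", "https://example.org/post"))  # False
print(polite_fetch_allowed("SomeSearchBot", "https://example.org/post"))  # True
```

Nothing on the server enforces the False result. A crawler that skips this check, or announces a different user agent, is served the page like any other client.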

Real Mitigations

  • DDoS protection / CDN — absorbs volumetric bot traffic before it reaches your origin server. See Deflect (for independent/civil society sites) and CDN options
  • Rate limiting — limit requests per IP per time window at the reverse proxy layer (Caddy, nginx)
  • Honeypot paths — serve fake endpoints that only bots visit; block IPs that hit them
  • Netlify bandwidth — if on a free tier, bot traffic can exhaust your bandwidth allocation before human visitors are served. 106.8 GB in a single month is not unusual for a site with any visibility
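
The rate limiting and honeypot items above can be sketched together. In practice this logic usually lives in the reverse proxy; the Python below is only an illustration of the idea, assuming a per-IP sliding window and a hypothetical /trap/ honeypot prefix. The class name, limits, and paths are all assumptions, not taken from any particular tool:

```python
import time
from collections import defaultdict, deque


class BotGate:
    """Per-IP sliding-window rate limit plus honeypot-triggered blocking."""

    def __init__(self, max_requests=60, window_seconds=60.0,
                 honeypot_paths=("/trap/",)):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.honeypot_paths = tuple(honeypot_paths)
        self.recent = defaultdict(deque)  # ip -> timestamps of allowed requests
        self.blocked = set()              # IPs that touched a honeypot path

    def allow(self, ip, path, now=None):
        """Return True if the request should be served."""
        now = time.monotonic() if now is None else now

        if ip in self.blocked:
            return False

        # Honeypot: no human-facing page links here, so a hit marks a bot.
        if path.startswith(self.honeypot_paths):
            self.blocked.add(ip)
            return False

        # Drop timestamps that have aged out of the window.
        hits = self.recent[ip]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False

        hits.append(now)
        return True


# Usage with explicit timestamps so the behaviour is easy to follow.
gate = BotGate(max_requests=2, window_seconds=10.0)
print(gate.allow("203.0.113.7", "/", now=0.0))   # True
print(gate.allow("203.0.113.7", "/", now=1.0))   # True
print(gate.allow("203.0.113.7", "/", now=2.0))   # False, over the limit
print(gate.allow("203.0.113.7", "/", now=11.0))  # True, window has moved on
print(gate.allow("198.51.100.9", "/trap/x", now=0.0))  # False, honeypot hit
print(gate.allow("198.51.100.9", "/", now=1.0))        # False, now blocked
```

Two caveats on the sketch: the block set never expires, which a real deployment would probably want to bound, and a honeypot path should likely be disallowed in robots.txt so that crawlers which do honour it are never caught.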

Intentional Apathy

One position: don't block AI crawlers. The Good Web is built on openness; attempting to restrict crawlers selectively is an unwinnable arms race, and legitimate crawlers (Wayback Machine, search engines) rely on the same mechanisms. See IndieWeb Principles on the balance between openness and protection.
