---
title: "Guests That Should Behave"
description: "Bots are modern guests. Most are welcome. But when they arrive in disguise with a real browser and spike your analytics, hospitality has limits. A Cloudflare story."
url: "https://hoeijmakers.net/guests-that-should-behave/"
date: 2026-04-19
updated: 2026-05-10
author: "Rob Hoeijmakers"
site: "hoeijmakers.net"
language: "en"
tags: ["AI in Practice"]
---

# Guests That Should Behave

The traffic spikes in [Plausible](https://hoeijmakers.net/cookie-free-analytics/), my web analytics tool, made no sense. Peak after peak, no referral source, no pattern I recognised. Bots, clearly, but the kind that arrive carrying a real browser, behaving like a human long enough to slip past lightweight analytics. Not a security incident. More like guests who don't knock.

That framing stuck with me as I worked through the fix. Bots are guests. Most of them are welcome. The question is which ones, and on what terms.

## Welcome and unwanted

The web has always had crawlers. Search engines, archivers, feed readers: automated visitors that make the open web function. I have no objection to those. What changed over the past year or two is volume and intent. By mid-2025, crawling for AI model training accounted for nearly 80% of all AI bot activity on Cloudflare's network. Many of those crawlers identify themselves honestly. Some don't, cycling through residential IP addresses and real browsers to blend in.

The ones showing up in my Plausible dashboard were the second kind. Uninvited guests who arrive in disguise, consume resources, and distort the picture. Cloudflare's bot protection handles exactly this. It uses signature matching, heuristics, machine learning, and behavioural analysis to tell the two kinds apart, and it maintains a Verified Bot directory so that legitimate crawlers like Googlebot are allowed through. I enabled the AI scrapers toggle. The spikes stopped.

## The Workers detour

Getting this working on a [Ghost](https://hoeijmakers.net/ghost-blog/) site took a step I hadn't anticipated. Cloudflare is not a simple product for an individual publisher. There are layers, and the terminology assumes familiarity with infrastructure concepts most bloggers don't carry day to day. I worked through the configuration with Claude, which made the complexity navigable. A few exchanges, some trial and error, and it was done.

The interesting byproduct came when Cloudflare suggested tightening up robots.txt. That small nudge opened a second door. I'd been serving llms.txt as a Ghost page, which worked, but felt like a workaround. With a Cloudflare Worker already in place, I wrote a proper script to serve it as a real txt file and referenced it from robots.txt. A hack became infrastructure.
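
A Worker of that kind could look roughly like the sketch below. It is an illustration, not the exact script running on this site: the `LLMS_TXT` body is a placeholder, and the names `TEXT_FILES`, `textFileFor`, and `worker` are hypothetical. It answers `/llms.txt` itself as plain text and passes every other request through to the origin.

```typescript
// Hypothetical llms.txt body; the real file's contents are not shown in this post.
const LLMS_TXT = `# hoeijmakers.net

> Placeholder summary of the site for AI systems.

## Articles

- [What llms.txt can do for your website](https://hoeijmakers.net/llms-txt/)
`;

// Paths the Worker answers itself, mapped to their plain-text bodies.
const TEXT_FILES = new Map<string, string>([["/llms.txt", LLMS_TXT]]);

// Pure lookup, kept separate so it can be checked without a network.
function textFileFor(pathname: string): string | undefined {
  return TEXT_FILES.get(pathname);
}

const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    const body = textFileFor(pathname);

    if (body === undefined) {
      // Everything else goes to the origin (here: Ghost) untouched.
      return fetch(request);
    }

    // Serve the file as real text instead of a rendered HTML page.
    return new Response(body, {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  },
};

// In a Worker module, this object would be the default export:
// export default worker;
```

The reference from robots.txt can be as simple as a comment line pointing at `/llms.txt`. As far as I know robots.txt has no dedicated directive for it, so that line is a signpost for readers rather than a rule.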

[Llms.txt](https://hoeijmakers.net/llms-txt/) is an emerging convention for telling AI systems what to pay attention to on a site. Where robots.txt sets access rules, llms.txt provides semantic guidance: what's here, what matters, how to use it. Think of it as a house guide left out for guests who will never ring the bell.

## Hospitality, restated

The distinction that matters is not bots versus humans. It's guests who identify themselves and follow the rules versus those who arrive in disguise and behave badly. The first kind are welcome. The second kind get turned away at the edge, before they reach the door.

llms.txt is the other side of that same logic. For the guests who do behave, who are reading the site to retrieve, to understand, to cite: here is what I've made, here is how to find it, here is what it's for. Blocking the bad actors and signalling clearly to the good ones. Two tools, one underlying idea.

The web's implicit contract, that content is public and humans read it, is being rewritten. I'm fine with that. I just want the new guests to knock first.

### Further reading

- [What llms.txt can do for your website](https://hoeijmakers.net/llms-txt/)
- [The End of Google Search (as we know it)](https://hoeijmakers.net/the-end-of-google-search-as-we-know-it/)
- [When Bots Become Readers: Publishing in the Age of AI Crawlers](https://hoeijmakers.net/when-bots-become-readers/)