# Rob Hoeijmakers — full content index
# https://hoeijmakers.net/llms-full.txt
# Source: RSS feed (most recent 15 posts)

## The Detective and the Swarm

URL: https://hoeijmakers.net/the-detective-and-the-swarm/
Date: 2026-05-09

At 07:43 UTC this morning, something hit my site. Not a flood in the security sense: no alarms, no WAF triggers, nothing in the human analytics. Just a number that shouldn't have been that high, on a chart I check more out of habit than worry. Four hundred requests in sixty seconds. Then back to normal.

My human-facing analytics saw nothing. That absence is itself a clue.

### Two layers

Most publishers have one view of their traffic: the analytics dashboard. It shows pageviews, sessions, referrers, the countries their readers come from. It tells them what humans did.

What it doesn't show is everything else. And everything else, it turns out, is interesting.

I run a Cloudflare Worker that logs every request to a D1 database before passing it along. Every request: human, bot, crawler, scraper, agent. The Worker tries to classify each one, matching user-agent strings against a database of known bot signatures. What it can't classify, it logs as unknown.

That second layer is where the detective work happens.

### Reading the evidence

The 07:43 spike broke down like this: 202 of those 400 requests were for .md paths. Paths like /when-bots-become-readers.md, /web-traffic-and-the-rise-of-llms.md, /measuring-traffic-machines-bots.md. The Worker classified 192 of them as human, because the user-agent strings looked like browsers: Chrome 138, Firefox 115, Edge 114. Perfectly formed, perfectly plausible.

But one user-agent hit 47 different .md paths in sixty seconds. Another hit 37. Both from the same two browser fingerprints, distributed across 25 countries.

Then the robots.txt requests: 116 of them in the same minute. That's a preflight pattern, a swarm checking the rules before it reads the content.

Chrome 114 hasn't been a current browser for a long time. Neither has Firefox 115. These are frozen strings, a signature of bot infrastructure that picks a browser version and pins it, never updating. The User Agent (UA) looks human. The behaviour doesn't.

The conclusion assembled itself: a distributed scraper, running across a proxy network or botnet, using spoofed browser identities to avoid classification. Evasive, coordinated, and genuinely clever.

### The twist

Here's where the detective story gets uncomfortable.

Those .md endpoints don't exist by default in Ghost. I added them. A few months ago, as an experiment: serve each post as clean Markdown alongside the HTML version, reference them in llms.txt, see what happens. The idea was to make the content easier for AI systems to consume. Structured, clean, no JavaScript noise.

The scrapers found them almost immediately. So did legitimate AI users, ChatGPT-User and OAI-SearchBot among them, reading the same paths through the same door.

I set the bait. They smelled it.

### The idée fixe

The reflex response to a traffic spike like this is defensive. Block the IPs, rate-limit the endpoint, add a CAPTCHA, harden the WAF. There is an entire industry built around that reflex.

It rests on a premise worth examining: that keeping machines out is possible, and that it is worth the effort.

Neither is quite true. A scraper that can distribute across 25 countries and rotate frozen browser UAs is not stopped by a robots.txt entry or a Cloudflare rule. It routes around friction the way water routes around a stone.
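As a point of reference, the kind of logging Worker described under "Two layers" can be sketched in a few dozen lines. Everything below is illustrative: the table name, the columns, and the signature list are assumptions, not this site's actual code.

```ts
// Minimal sketch of a classify-and-log Worker (illustrative; not the production code).
export interface Env {
  DB: D1Database; // D1 binding holding a hypothetical `requests` table
}

// A handful of signatures; a real list would be much longer and kept in the database.
const SIGNATURES: Record<string, RegExp> = {
  "ai-crawler": /GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|PerplexityBot/i,
  "seo-crawler": /AhrefsBot|SemrushBot|MJ12bot/i,
  "search-crawler": /Googlebot|Bingbot/i,
};

function classify(ua: string): string {
  for (const [label, pattern] of Object.entries(SIGNATURES)) {
    if (pattern.test(ua)) return label;
  }
  // Browser-shaped strings get a provisional "human"; everything else stays "unknown".
  return /Mozilla\/5\.0/.test(ua) ? "human" : "unknown";
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);
    const ua = request.headers.get("user-agent") ?? "";
    const country = (request.cf?.country as string | undefined) ?? "XX";

    // Log asynchronously so the visitor never waits on the database write.
    ctx.waitUntil(
      env.DB.prepare(
        "INSERT INTO requests (ts, path, user_agent, country, classification) VALUES (?, ?, ?, ?, ?)"
      )
        .bind(Date.now(), url.pathname, ua, country, classify(ua))
        .run()
    );

    // Pass the request through untouched; the logging layer stays invisible.
    return fetch(request);
  },
};
```

A table like that, whatever its exact shape, is what makes the forensic questions in this post answerable after the fact.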
And the content, once published on the public web, is going to be consumed by machine pipelines whether or not you make it easy.

The more interesting question is what you can learn by watching it happen.

The spike told me that my Markdown experiment is working, in the sense that it is attracting exactly the traffic it was designed for. It told me that the machine layer of the web is active, distributed, and more sophisticated than most people assume. It told me that the gap between what human analytics show and what the full request log shows is where the real picture lives.

Blocking that traffic would have closed the window. Watching it left it open.

### Signal in the noise

The thing about running a second logging layer is that most of what it catches is unremarkable. Googlebots, Bingbots, ChatGPT-User ticking through recent posts, the usual crawl of SEO tools and RSS readers. Noise.

But the noise is the baseline. Without it, the 07:43 spike is invisible. With it, you can ask: what's different about this minute? Why these paths? Why this many countries? Why frozen UAs?

The detective work is in the filtering, not the blocking.

🗒️ The .md endpoints at hoeijmakers.net are intentional. Each post is available as clean Markdown alongside the HTML version. The llms.txt file indexes them. This is an ongoing experiment in machine-readable publishing.

Related:
- I Thought I Was Optimising for Speed
- Thirty Years of Caching, Sorted in an Afternoon
- My Visitors Are Not All Human. That Is Fine.
- Guests That Should Behave
- Markdown, the WD-40 of Digital Information

---

## Markdown, the WD-40 of Digital Information

URL: https://hoeijmakers.net/markdown-the-wd-40-of-digital-information/
Date: 2026-05-09

There is a can of WD-40 in almost every workshop in the world. It doesn't do any one thing brilliantly. It loosens, lubricates, displaces moisture, protects against rust. It works on hinges, bolts, zippers, bike chains, garden tools. The reason it's everywhere is not that it's the best at anything. It's that it's useful for almost everything, costs almost nothing, and leaves no mess.

Markdown is the WD-40 of digital information.

### Just enough structure

John Gruber published the original Markdown specification in 2004. The idea was simple: a plain text format that reads naturally as text but converts cleanly to HTML. A # becomes a heading. **bold** becomes bold. A blank line between paragraphs is a paragraph break. No tags, no toolbars, no proprietary file format.

Twenty years later, the spec has barely changed. The software around it has been rebuilt dozens of times. The platforms it feeds into have come and gone. Markdown itself just kept working.

That persistence is worth examining. It's not inertia. Rich Text Format persisted for decades out of inertia, dragging its bloated markup through every Office suite until it quietly died. Markdown persisted because it kept finding new uses its creator didn't anticipate.

The reason is the sweet spot it occupies: between content and meaning.

### Content and meaning

Most formats choose a side.

HTML carries meaning but not content: the tags tell you what something is, but the structure is invisible to casual reading. A human can read raw HTML, but it requires effort. The format is for browsers, not people.

A Word document carries content but obscures meaning: it looks right on screen, but extract the text and the structure collapses. Headings become large bold text. Tables become chaos. The meaning was in the rendering, not the file.

Markdown carries both, lightly.
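A toy example makes that machine-side legibility concrete. The raw string below reads as ordinary text to a person; three one-line patterns are enough for a machine to recover its structure. The snippet is illustrative only, not part of any real pipeline.

```ts
// Illustrative only: how little code a machine needs to recover structure from Markdown.
const raw = [
  "# Caching, finally sorted",
  "",
  "A post about **edge caching** and machine readers.",
  "",
  "- consistent headers",
  "- predictable expiry",
].join("\n");

const headings = [...raw.matchAll(/^#{1,6}\s+(.+)$/gm)].map((m) => m[1]); // ["Caching, finally sorted"]
const emphasis = [...raw.matchAll(/\*\*([^*]+)\*\*/g)].map((m) => m[1]); // ["edge caching"]
const listItems = [...raw.matchAll(/^-\s+(.+)$/gm)].map((m) => m[1]); // ["consistent headers", "predictable expiry"]

console.log({ headings, emphasis, listItems });
```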
A heading is marked as a heading, not just styled to look like one. Emphasis is marked as emphasis. A list is a list. But the markup is so minimal that a human reading the raw file sees structure without syntax noise. The # before a heading reads almost like an underline. The ** around a word barely interrupts the flow.

That dual legibility is the trick. A human reads it as text. A machine reads it as structured data. Neither has to compromise.

### Where it keeps showing up

GitHub made Markdown the default format for READMEs in 2009. Not because it was mandated, but because developers were already using it in plain text files and it rendered cleanly. Stack Overflow adopted it for questions and answers. Notion built an entire productivity platform around it. Obsidian turned it into a knowledge management system. Slack, Discord, WhatsApp: all support some dialect of Markdown for message formatting.

Every time a new context needed lightweight structure, Markdown was already there.

AI systems followed the same logic. Training data in Markdown is cleaner than HTML: no scripts, no navigation chrome, no cookie banners embedded in the text. The structure survives the extraction. A heading in Markdown is still a heading after ingestion. That's not true of most formats.

When AI assistants started browsing the web and retrieving content for users, Markdown became the preferred output format for responses. Not because it was specified. Because it was the format that travelled best between systems.

### A deliberate experiment

A few weeks ago I added something to this site: a Markdown copy of every post, served alongside the HTML version. A path like /this-post.md returns the same content as /this-post/, but as clean Markdown text. I also added an llms.txt file indexing all of them, a signal to AI systems that structured content is available here.

The experiment was modest. Serve the content in the format machines prefer, see what happens.

What happened was immediate. AI crawlers found the .md paths within days. ChatGPT-User, OAI-SearchBot, and others started reading through them systematically. And when a swarm of scrapers hit the site recently, 202 of their 400 requests in a single minute were for .md paths. They didn't find those endpoints by accident. They went looking for them, the way you go looking for the WD-40 when something needs loosening.

The Markdown layer attracted exactly the traffic it was designed for. The format did the work.

### Why slim formats win

RTF tried to carry everything. SGML tried to define everything. Both are essentially dead for everyday use. The formats that survive are the ones that carry just enough.

Markdown carries hierarchy, emphasis, links, code blocks, and lists. It leaves everything else alone. That restraint is what makes it composable: you can feed it into a static site generator, a documentation tool, an AI pipeline, a note-taking app, a chat interface. It doesn't arrive with requirements. It arrives ready.

The web itself works this way. HTTP is a slim protocol. HTML, at its core, is a slim format. The richness came later, layered on top. The slim foundation is why it all still runs.

Markdown found its sweet spot and stayed there. Two audiences, one format, twenty years, no mess.

That's not luck.
That's the point.

Related:
- Using markdown for blogging
- Markdown in Google Docs: A Practical Step Forward
- The Detective and the Swarm

---

## The shift from x86 to ARM is about power, not just performance

URL: https://hoeijmakers.net/x86-to-arm/
Date: 2026-05-08

The new iMac is about half the weight of the one it replaced. I noted that in an earlier post and left it there, a physical curiosity. What I did not fully grasp at the time was that the weight difference is not a design choice. It is a consequence of a decision made in 1985 by a British company called Acorn Computers.

A short YouTube video made it click.

### Two languages

Every processor speaks a language. The architecture defines that language: what instructions the chip understands, how it processes them, how it talks to the rest of the machine. x86 and ARM are simply two different languages. Same job, very different dialects.

x86 goes back to 1978. Intel's 8086 chip started a family tree that runs unbroken to every Windows PC, every AMD Ryzen, every server rack powering Google and Netflix today. Forty-five years of backward compatibility, each generation carrying the weight of the previous one.

ARM came seven years later, from a different starting point. Acorn's engineers wanted a chip that was simpler, faster, and used less power. They called it the Acorn RISC Machine. The insight was almost contrarian: instead of building chips that could handle complex instructions in a single step, strip the instruction set down to simple instructions executed quickly and cheaply. Most programs, it turned out, never needed the complexity anyway.

### Where the weight goes

x86 chips are CISC: complex instruction set computing. A single instruction can do multiple things at once. That capability requires elaborate decoding hardware, transistors devoted not to computing but to figuring out what they have been asked to compute. At billions of operations per second, that overhead is not trivial.

ARM is RISC: reduced instruction set computing. Fixed-length instructions, a simpler decoder, more transistors left over for actual work. The result: ARM chips do less per instruction but do it faster, cooler, and on far less power. A typical x86 desktop chip draws between 65 and 250 watts. A phone-grade ARM chip draws 1 to 15.

That difference is why every smartphone runs ARM. You cannot run an x86 chip on a small battery. It would drain in an hour and burn your hand. And it is why the new iMac is half the weight of the old one. Apple Silicon is ARM. The lightness is not aesthetic. It is physics.

### The same logic, at scale

What surprised me, reading further, is that the efficiency argument does not stop at the desk.

Amazon Web Services has been deploying its own ARM-based Graviton processors in data centers, claiming up to 40% better price-performance compared to equivalent x86 options. At data center scale, lower power consumption translates directly into lower electricity costs. The logic that makes a laptop last eighteen hours on a charge also makes a server hall cheaper to run.

The cloud is not a single machine. It is tens of thousands of them. ARM's advantage compounds accordingly.

### Convergence

The cleaner story would be: ARM wins, x86 fades. The actual story is more interesting.

Intel has added efficiency cores to its processors, smaller and simpler, borrowing directly from RISC thinking. ARM chips like Apple's M-series are no longer just efficiency champions; they are closing the gap on raw performance.
The two architectures are converging, each absorbing what the other does well.

x86 still leads in PC gaming, high-end creative workstations, and the entrenched server ecosystem. That will not shift quickly. Forty-five years of software compatibility is not something you dissolve in a product cycle.

But the direction is clear. As energy efficiency becomes a constraint, at the battery level and at the grid level, ARM's fundamental design advantage becomes more relevant, not less.

### The lighter machine

In the first post about this iMac, I wrote that local computers are becoming lighter, replaceable instruments. The ARM architecture is part of why that is true in a literal sense. Less heat, no fan required, a thinner chassis, half the weight.

I assumed there was still a clear distinction between phone chips and Mac chips. Apple Silicon in a Mac meant M-series: ARM as the foundation, but scaled up, redesigned, a different member of the family. The iPhone stayed in one category, the Mac in another.

The MacBook Neo ended that story. It is the first Mac to use an A-series chip, the same family found in the iPhone, rather than the M-series chips in other Macs. Reports suggest the first production run used binned A18 Pro chips, originally intended for the iPhone 16 Pro. Apple is shipping a laptop powered by recycled phone silicon. And it sells for $599.

The A18 Pro is not a compromise chip. It runs at M3-to-M4 class performance for single-threaded work.

The boundary between phone and computer, already blurry at the architecture level, has now dissolved at the product level too. What started as a design principle in 1985, do more with less, has worked its way from pocket to desk to data center. The lighter machine was never just about weight.

💡 The A and M series, briefly: Apple makes two families of ARM chips. The A-series powers iPhones and iPads: compact, optimised for a small battery, tuned for mobile workloads. The M-series powers Macs: the same ARM foundation, but scaled up with more CPU and GPU cores, more memory bandwidth, and support for larger pools of unified memory. Think of them as the same language spoken at different volumes. The MacBook Neo is the first Mac to use an A-series chip rather than an M.

Further reading:
- A lighter computer for a different era
- From Silicon to Intelligence: Understanding the Hardware Behind AI

---

## I Thought I Was Optimising for Speed

URL: https://hoeijmakers.net/i-thought-i-was-optimising-for-speed/
Date: 2026-05-05

I built a caching layer for my blog. The goal was performance. What I got, alongside that, was a clearer picture of my audience than I had ever had before.

That was not the plan.

### What the numbers used to show me

Before, I had Plausible. Clean dashboard, daily visitors, top pages. Reassuring in the way that summaries are reassuring: they tell you something is happening without telling you much about what.

A blog attracts a certain kind of traffic. Some of it is people reading. Some of it is search indexers doing their rounds. Some of it is AI systems pulling content for retrieval pipelines. Some of it is automated probes checking whether your server has vulnerabilities. All of this shows up as traffic. Almost none of it is distinguishable in a standard analytics view.

### The crowd that arrived

When the caching article hit Hacker News, thousands of requests came in within the hour. I had not planned for that.
What I had planned for was having instruments in place when it happened.

Every request that reaches the site now gets a label: human reader, AI crawler, SEO crawler, security scanner, unknown. Those labels feed a dashboard. So when the spike came, I could see what it was made of.

Human readers, identifiable by dwell time and page depth: people who had seen the link, clicked, and stayed. RSS subscribers, a category I had underestimated: readers who had deliberately chosen to follow the feed, receiving the post through their reader of choice, a habit many had written off as extinct but which showed up clearly in the logs. Federation too: the post propagating through ActivityPub, reaching people on Mastodon and other decentralised platforms who had never visited the site directly but were reading through their own network. Indirect human readers, choosing deliberately, just through different pipes.

Then the other kind of traffic. A vulnerability scanner from Singapore, arriving six minutes into the spike, probing paths that do not exist on Ghost, looking for webshells planted on other servers at other times. A residential proxy swarm with a 96 percent bounce rate and nine seconds average visit duration, arriving and leaving without reading anything. Five categories of arrival, all at once, all labelled.

📒 AI crawler vs AI user: An AI crawler is an automated system that fetches your content to train a model or build a retrieval index. It reads your page so that, later, a person using an AI assistant might benefit from what you wrote. An AI user is a person interacting with a model in real time, where the model retrieves your content to answer a question. The crawler visits your site. The AI user may never come near it, but still encounters your writing.

### What seeing changes

The RSS readers and the federated followers were the most useful surprise. They represent deliberate readership: people who opted in, who receive content through infrastructure they control, who are in some ways a more committed audience than someone who clicked a social link and stayed for thirty seconds. They were always there. I just had not looked.

The AI crawlers are also readers, of a kind. They do not render pages. They pull content and pass it on to systems that answer questions on behalf of people. Optimising for them means thinking about caching and headers and response consistency. It is a different kind of publishing, running in parallel with the one I was already doing.

The scanner was useful in a different way. Once it had a name and a fingerprint, it was manageable. A firewall rule catches it before it reaches the Worker. Named things can be handled. Anonymous noise cannot.

Cloudflare security rules now block certain probing attempts. It feels good, claiming some agency back.

### What I would encourage

You do not need a custom Cloudflare setup to start seeing more clearly. But it is worth asking what your current analytics are hiding.

RSS still works. Federation works. If you publish without offering a feed or an ActivityPub endpoint, you are not reachable by a meaningful slice of deliberate readers who prefer to control their own reading environment.

Your content is already being read by AI systems, whether you have prepared for that or not. How it is cached, how consistently it is served, what headers it carries: these are not performance details. They are decisions about how your writing moves through the world.

And the gap between “I have analytics” and “I understand my audience” is wider than it looks.
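For the technically inclined, the full-log side of that gap is one query away. A sketch, assuming the same hypothetical `requests` table as in the Worker sketch earlier in this file, not this site's actual schema:

```ts
// Sketch: the "second layer" view, aggregated per label for the past 24 hours.
interface Env {
  DB: D1Database;
}

export async function trafficBreakdown(env: Env) {
  const since = Date.now() - 24 * 60 * 60 * 1000;
  const { results } = await env.DB.prepare(
    "SELECT classification, COUNT(*) AS hits FROM requests WHERE ts > ? GROUP BY classification ORDER BY hits DESC"
  )
    .bind(since)
    .all();
  return results; // e.g. [{ classification: "ai-crawler", hits: 1843 }, ...]
}
```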
The analytics show you what your platform was designed to measure. The actual audience is larger, stranger, and more varied than that.

I thought I was optimising for speed. I ended up learning who I was writing for.

Further reading:
- My Visitors Are Not All Human. That Is Fine.
- Thirty Years of Caching, Sorted in an Afternoon
- When Bots Become Readers: Publishing in the Age of AI Crawlers

---

## Thirty Years of Caching, Sorted in an Afternoon

URL: https://hoeijmakers.net/thirty-years-of-caching-sorted-in-an-afternoon/
Date: 2026-05-03

I have a jar of screws on my workbench. For years, I would fish through it looking for the right size, usually not finding it.

Last week I sorted them, by type, by thread, by length. I used ChatGPT to help: photographed a handful, asked what I was looking at, got the taxonomy straight. Once I could name them, I could organise them. You can only sort what you understand.

HTTP caching was my jar of screws.

### Thirty years of fog

I have been building for the web since the early nineties. Caching was always there, somewhere in the background, doing something. I knew enough to be aware of it, not enough to actually control it. Cache-Control headers, TTL values, edge behaviour, the difference between what a CDN caches and what a browser holds, what gets invalidated when and why. Every time I approached it seriously, I ran into a wall of context I did not quite have.

The documentation exists. The concepts are not secret. But caching is one of those domains where the gap between understanding the vocabulary and being able to apply it correctly is surprisingly wide. I would read, nod, implement something plausible, and move on with lingering doubt.

This year, working with Claude, that changed. The parallel is closer than it sounds. I had the pieces in front of me for years. What I was missing was someone to explain what I was looking at.

### New instruments

We went through the whole thing together. What my Cloudflare Workers were actually doing. What headers were being sent and why. What a browser would cache versus what the edge would cache. Where the inconsistencies were. What a coherent strategy would look like for a site like mine: a moderate personal blog with a global readership, running on Ghost, served through Cloudflare.

It took an afternoon. Not because the subject got simpler, but because I had, for the first time, an instrument that could hold the full complexity with me. Ask a question, get an answer calibrated to my exact setup, follow a thread, revise, implement, check. The back-and-forth that used to require either a specialist or weeks of trial and error compressed into something manageable.

The result was a caching strategy I actually understand. Headers that mean what I intend. Edge behaviour that is consistent. Rules I can read back and explain.

### The audience had already changed

The reason it finally felt urgent was not vanity metrics or pagespeed scores. It was a shift in who was actually reading.

Human visitors are still there. But a growing share of traffic to a site like mine now comes from crawlers: search indexers, AI training pipelines, retrieval systems that serve content to agents rather than browsers. These systems do not render pages. They do not wait for JavaScript. They send a request, receive a response, and move on. For them, caching is not a convenience. It is the primary mechanism that determines cost, latency, and reliability of access.

If you care about how your content moves through the world now, including through AI systems, you have to care about caching.
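What "headers that mean what I intend" comes down to in practice is a short Cache-Control policy. A minimal sketch, with illustrative values rather than this site's actual configuration:

```ts
// Illustrative cache policy for a mostly-static blog behind Cloudflare (not the actual config).
export default {
  async fetch(request: Request): Promise<Response> {
    const origin = await fetch(request); // response from Ghost
    const headers = new Headers(origin.headers);

    // Browsers may hold a page for five minutes, the edge for an hour,
    // and stale copies may be served while a fresh one is fetched in the background.
    headers.set(
      "Cache-Control",
      "public, max-age=300, s-maxage=3600, stale-while-revalidate=86400"
    );

    return new Response(origin.body, {
      status: origin.status,
      statusText: origin.statusText,
      headers,
    });
  },
};
```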
That care is not about performance optimisation for human browsers; it is about infrastructure for machine readership.

That reframing changed what I was optimising for. HTML cached at the edge, globally, with consistent headers and predictable expiry. Not because I expect a person in Singapore to shave 200ms off their pageload, but because the next request for that page is more likely to come from a retrieval system than a browser, and the request after that, and the one after that.

Cloudflare tiered caching on top of a Ghost blog.

### What the tools made possible

The caching itself is not new. The concepts are decades old. What changed is that I could finally see the system clearly enough to shape it. With the right instrument, a domain that had been opaque for thirty years became workable in a single session.

That is not a small thing. There are other jars of screws on the workbench.

🗒️ The caching strategy described here was implemented using Cloudflare Workers, Cache Rules, and D1 for request logging. A public cache-stats dashboard shows the current breakdown of traffic by type: human, AI crawler, SEO crawler, and unknown.

Further reading:
- My Visitors Are Not All Human. That Is Fine.
- When Bots Become Readers: Publishing in the Age of AI Crawlers
- The End of Google Search (as we know it)

---

## My Visitors Are Not All Human. That Is Fine.

URL: https://hoeijmakers.net/my-visitors-are-not-all-human-that-is-fine/
Date: 2026-04-29

My site had a busy week. My analytics tool showed good numbers. Then I looked at the raw traffic logs and started counting visitors that weren't people.

Not a problem. A discovery.

For the past few months I've been running a custom traffic layer on this site, built on top of Cloudflare, which sits between my visitors and my server and sees everything that arrives. I wrote a small classifier that sorts incoming traffic into categories: human readers, crawlers from search engines, crawlers from AI companies, monitoring tools, and a catch-all for things I can't identify. On any given day, humans are a minority.

That sentence used to feel alarming when I first framed it to myself. Now it just feels accurate. If you're curious, the live dashboard is public: hoeijmakers.net/bot-stats.

### The old mental model

The web was built around a simple distinction. Humans used browsers. Bots sent raw HTTP requests, usually announced themselves honestly, and showed up cleanly in your server logs with a recognisable name.

That distinction is largely gone.

Modern automation often runs a real browser, the same engine that Chrome uses, and behaves accordingly: it loads the page, executes the code that runs on it, triggers the analytics beacon that tells my dashboard someone visited, scrolls, waits, and moves on. Add a residential proxy network, which routes traffic through actual household internet connections rather than servers in a datacentre, and the request arriving at my site looks indistinguishable from a person in Rotterdam reading an article over their home wifi.

A visit is no longer proof of a visitor.

### Borrowed identities

The residential proxy industry routes automated traffic through pools of real consumer internet connections. Millions of them. A client pays to use them; the destination website sees a local address, an ordinary internet provider, nothing suspicious. It sees, in other words, a person.

This infrastructure serves legitimate purposes: checking whether an ad actually appeared in a specific country, testing how a website looks from different locations, monitoring competitor prices. It also serves extractive ones.
The point is that it is now a normal part of web traffic, running quietly beneath the surface of every publisher's analytics.

🗒️ Residential proxies route automated traffic through real household internet connections, making requests appear local and human. The same infrastructure is used for ad verification, AI training data collection, competitive intelligence, and sometimes fraud.

### What I actually see

My traffic dashboard shows patterns rather than proof. No single signal identifies a visitor as a script rather than a person. Combinations do.

Sessions that load a page and vanish, with no onward movement, no return visit, no sign of reading. Visitors arriving from a wide spread of countries with identical behaviour. Entry pages that cluster unnaturally: always the homepage, never a deep link from a real referrer. Requests arriving at suspiciously regular intervals.

One such peak in my analytics turned out to be a residential proxy network; some of these bots run a real browser and fire off the JavaScript beacon.

Some of this traffic is fine. Search crawlers indexing my writing are doing something useful. Crawlers from AI companies reading my articles are, in some sense, a new kind of audience. I've added a small index file to the site specifically to help them navigate it more cleanly.

Some of it I decided not to serve. Amazon's infrastructure was generating substantial crawl traffic with no plausible return. I blocked it. Not in outrage: as a quiet editorial decision. My content, my choice about who reads it.

That act of blocking felt unexpectedly good. Not because of what it prevented, but because of what it clarified. I have a position on this. I can act on it.

Related post: Guests That Should Behave. Bots are modern guests. Most are welcome. But when they arrive in disguise with a real browser and spike your analytics, hospitality has limits. A Cloudflare story.

### The logfile returns

There is a generation of web publishers who learned to read raw server logs before dashboards existed. They knew their traffic intimately. Dashboard analytics, when it arrived, abstracted most of that away. Traffic became a number, then a graph, then a metric.

Infrastructure like Cloudflare is pulling some of that texture back. It sits in front of the server and sees things browser-based analytics cannot: how fast requests arrive, whether IP addresses rotate, whether a visitor ever fetches images or just the HTML. The signal is different from what a dashboard shows.

I am not claiming logs are more truthful than dashboards. I am noting that they are different, and that the difference is currently interesting.

### What audience means now

The web is no longer a medium where humans visit websites. It is a medium where humans, search crawlers, AI crawlers, monitoring scripts, and commercial intelligence systems all arrive through the same door. Some are reading. Some are indexing. Some are training models on what they find. Some are checking whether an ad appeared. Some I cannot classify at all.

My analytics dashboard shows me one version of this. My traffic logs show me another. The honest position is that both are partial.

What I have, with a little tooling and some curiosity, is a better picture than I had six months ago.
And the ability to make choices about it.

That is enough.

Further reading:
- Flying Blind: Measuring Traffic When Your Readers Are Machines
- When Bots Become Readers: Publishing in the Age of AI Crawlers

---

## When the Platform Becomes the Policy

URL: https://hoeijmakers.net/when-the-platform-becomes-the-policy/
Date: 2026-04-26

The Dutch state secretary for Digital Economy and Sovereignty now communicates with a small group of civil servants via a European messaging app. Which one? Her spokesperson won't say. The initiative is, as they put it, "still in its infancy." But the announcement itself is what matters. The Dutch government has concluded that its dependency on WhatsApp and Signal is a risk worth managing.

That is late, in European terms. France has been running Tchap for government use since 2019. The Bundeswehr has had its own messaging infrastructure for years. Belgium launched Beam for 750,000 civil servants and military personnel last month. But the direction is now consistent across EU member states, and the reasons have shifted.

### Beyond Privacy

This is no longer primarily about privacy ideology. It is about geopolitics. With the US increasingly willing to use technology as a lever of power, the question "can we access our own data tomorrow?" has moved from paranoid to prudent. A January survey by SWG found that 59% of Europeans already consider a US block on digital services a realistic scenario; 86% consider it at least plausible. Governments are drawing the obvious conclusions.

What is less obvious is that the same logic applies to businesses, just more quietly.

### The AI Access Dispute

While governments look for messaging alternatives, the European Commission is fighting a separate but related battle. It has now sent Meta a second "charge sheet" over the fees Meta wants to impose on external AI providers, including OpenAI and Anthropic, for access to the WhatsApp Business platform.

The sequence is telling. Meta first banned third-party AI providers outright. When the EU challenged that, Meta introduced a pricing model: 5 to 13 cents per message for external AI chatbots. Meta AI, its own assistant, pays nothing. The Commission's position is that this is the same outcome in different legal clothing: a structure designed to foreclose competition in favour of Meta's own AI product.

For most businesses running a WhatsApp chatbot today, the direct impact is limited. What changes if Meta prevails is the competitive landscape for AI assistants at scale. If a platform can effectively exclude rival AI providers, the choice of assistant is no longer yours to make.

⚖️ The EC charge sheet covers Meta's terms for external AI providers on WhatsApp Business, not the costs businesses already pay via BSPs like 360dialog or Twilio. Those conversation fees are a separate layer. The regulatory dispute is upstream, about who gets platform access at all.

### Platform Dependency as Risk

WhatsApp reaches over 85% of the Dutch population. For businesses, that makes it close to mandatory infrastructure. The problem with mandatory infrastructure is that you stop thinking about it as a choice.

A pricing change, a policy update, an account suspension: these arrive without negotiation. What the Dutch government is framing as a sovereignty question is, for businesses, a more familiar concept. It is supply chain risk, applied to communication infrastructure.
When you build your entire customer contact layer on a single platform, you have created a single point of failure you did not consciously design.

The government pilot is small and tentative. But it reflects something that has been accumulating for a while: the recognition that neutral infrastructure is a fiction. WhatsApp is a platform with commercial interests, regulatory entanglements, geopolitical exposure, and an AI strategy that is now under formal legal challenge. Those are not abstract conditions. They are conditions that shape what you can build on top of it.

Further reading:
- Why WhatsApp Channels Trigger the Digital Services Act
- The EuroStack explained
- From Shopping Cart to Server Rack: How Lidl Builds a European Cloud

---

## GPT Image 2 Changed My Mind on AI Visuals

URL: https://hoeijmakers.net/gpt-image-2-changed-my-mind-on-ai-visuals/
Date: 2026-04-25

I had, without quite deciding to, written ChatGPT off for images.

The earlier output felt like a tool optimised for fantasy illustrations, product mockups, and memes. Useful for some things. Not for the way I work. My interest in visualisation has always been functional: diagrams, schematics, process maps. Images that explain something rather than represent something. For that category, AI image generation had consistently disappointed.

Then last week I needed to understand how a washing machine actually works internally. The inner drum, the outer tub, the bearings, the door seal, where the shaft exits the housing. Text descriptions get you some of the way. A diagram would get you further. So I asked ChatGPT Images 2.0, running on the new GPT Image 2 model, to generate one.

What came back made me revise my assessment.

### What arrived

The output was not a pretty picture of a washing machine. It was a coherent technical infographic: multiple views of the same machine, labelled components, readable typography, front and rear logic, the drive belt and motor in correct relation to each other. It carried layout hierarchy. It had editorial polish. It looked less like image generation and more like a service manual that had not yet been published.

The surprise was specific. A washing machine is a good test precisely because it has no aesthetics to fall back on. The image is either mechanically correct and useful, or it is not. But the deeper surprise was repeatability. This was not a lucky render. The model had reasoned about what a washing machine actually is before generating anything, and that showed in every label, every cross-section, every spatial relationship. You could ask again and get something coherent. That is new.

### Why this category matters

The images I care about are not decorative. Every organisation explains something: how a process works, how a system fails, what happens in a given sequence. For years, turning that complexity into a clear visual required time, a specialist, and budget. The bottleneck was not the knowledge, it was the production.

GPT Image 2 is built around the idea that the model thinks before it draws, reasoning about what an explanation actually requires before generating anything. Text inside images, historically the biggest weakness of these models, has improved substantially: labels, signage, UI elements, correct spelling and consistent spacing.

The result is a model that does not just render, it plans. And because it understands the subject before it draws, the output is repeatable: ask the same question twice and you get something coherent both times, not two unrelated lucky guesses. That is the shift.
Not better art. Better explanation, on demand.

💡 GPT Image 2 launched on April 21, 2026. It introduced thinking mode, web search during generation, and the ability to produce up to eight consistent images from a single prompt. Available to all ChatGPT users; extended outputs require a paid plan.

### The category I had dismissed

Many people still assess AI image tools by the use cases that dominated the first wave: avatars, memes, synthetic photography, visual gimmicks. That assessment is not wrong, it is just stale.

The more consequential category is knowledge visuals: images that help a professional understand something faster than prose alone. Technical diagrams, process flows, onboarding schematics, infrastructure maps. The kind of thing that organisations need constantly and produce slowly.

If that gap is now measurably shorter, the operational implication is significant. Not because designers are replaced, but because explanation becomes cheap enough to happen more often.

I only wanted to know how my washing machine works. The diagram that came back raised a different question: what else is now explainable in minutes that previously required a budget line?

Further reading:
- From text to diagrams: working with Mermaid
- From Ideas to Precision: Why I Use Mermaid with LLMs

---

## Dressing up an AI Model in a Harness

URL: https://hoeijmakers.net/harness-ai-model/
Date: 2026-04-24

A friend who is an engineer used the word harness. We were talking about the infrastructure around a model: the tools it can call, the memory it can access, the context it receives. The harness, he said, is what makes the model function as an agent.

The word landed wrong.

### Restraint

A harness restrains. It channels force that would otherwise go somewhere you don't want it. The equestrian harness, the safety harness, the climbing harness: all of them are about control, about keeping something contained and directed. There is an adversarial assumption built into the metaphor, a relationship between the wearer and the harness that is fundamentally one of managed risk.

That is not what the infrastructure around an AI model does. It equips. It gives the model access to tools, continuity across sessions, a defined role within a specific context. The harness, in this sense, is less like a restraint and more like a workbench: the thing that makes work possible. You would not say a surgeon's instruments harness the surgeon.

⚔️ In AI development, a harness refers to the surrounding infrastructure that makes a model operational in a specific context: the tools it can call, the memory and session state it can access, the APIs it connects to, and the logic that governs how it responds. The harness is what turns a raw model into a working agent.

### Standard Language

Ethan Mollick uses the same term in his newsletter, framing AI as three interlinked concepts: models, apps, and harnesses. His harnesses are the tools an AI can use and how the model is hooked up to them. The definition is functional and clear, and it reflects how the term is already settling into common use among practitioners.

Which makes it worth pausing on. The words that become standard in a new field carry assumptions forward. "Harness" inherits from a tradition where powerful things need to be held. "Infrastructure" is more neutral. "Toolkit" points in a different direction again.
None of them are wrong exactly, but they weight the thinking differently.

Related link: "Sign of the future: GPT-5.5. One impressive step on the curve", One Useful Thing, Ethan Mollick.

### The Wrong Question

The distinction matters because the word shapes the thinking. If the surrounding infrastructure is a harness, the model is something to be tamed, a force that needs to be channelled before it can be trusted. That framing puts the engineering problem in the wrong place. The question becomes: how do we constrain it? When the more interesting question is: how do we equip it?

My friend's use of the word was entirely competent. He knows what the infrastructure does. But the equestrian image kept surfacing: the horse, strong and fast, made useful by what is strapped around it. It is a working metaphor, in the sense that it functions. It just describes a different relationship than the one I want to be building toward.

The model is not a horse. The infrastructure is not a harness. We are still looking for the right words.

Further reading:
- Can Today’s AI Agents Survive Their Own Runtime?
- When AI Moves Into Your Working Environment
- Word of the year: model
- What coding with AI feels like now

---

## Built in, not bolted on

URL: https://hoeijmakers.net/built-in-not-bolted-on/
Date: 2026-04-24

A chatbot surfaces above an old document repository. A copilot attaches itself to fragmented data. A smart search layer sits on top of years of accumulated disorder. The AI works, up to a point. Then it runs into the same wall every time: the information underneath it.

### The problem underneath

When people talk about AI disappointing them, they usually mean the model. The output is generic, the answers are wrong, the assistant keeps missing context. But the model is often not the issue. What it has to work with is.

Most organisational knowledge still lives in SharePoint folders with inconsistent naming, email threads where decisions disappear, documents that exist in four versions with no clear indication of which is current. Add a copilot to that environment and you get a faster, more articulate reflection of the same confusion. The AI amplifies what is already there. If the structure is weak, the results are weak.

### Architecture first

Built-in AI assumes that information has structure, that documents carry provenance, that workflows are legible enough for a system to participate in them. Not a large IT replacement project. A different starting point.

Knowledge work organisations that are getting real results from AI tend to share one thing: they treated the information architecture as part of the AI investment, not a precondition someone else would handle. That means version control that is actually used, consistent tagging, clear ownership of documents, workflows that do not rely on institutional memory to navigate.

A well-structured knowledge base is useful long before any AI touches it. The AI just makes the gap visible faster.

### The compliance accelerant

In regulated sectors, the pressure is sharper. When an adviser needs to reconstruct a recommendation from three years ago, or a regulator asks which version of a policy was in force on a given date, the question is not whether the AI can answer it. The question is whether the information environment can support the answer at all.

AI governance and information governance are converging.
Organisations that have separated the two, running AI initiatives on top of legacy information infrastructure, will encounter this collision eventually.

💡 WiseWare structures organisational knowledge into governed memory objects: decisions, policies, commitments, evidence. Source-backed, human-reviewed, auditable. Built in Amsterdam. wiseware.nl

### The question that shifts

For a while, the distinguishing question was "do you have AI?" Now it is closer to "is your organisation built to work with it?" A copilot on top of a well-structured environment is a genuine productivity multiplier. The same copilot on top of a fragmented one is a better interface to the same old problem.

The architecture was always the work. AI just made the cost of skipping it more visible.

Further reading:
- When Git Grows Up
- When AI Moves Into Your Working Environment
- The Workspace You Already Have
- AI literacy: from definition to practice

---

## The Workspace You Already Have

URL: https://hoeijmakers.net/the-workspace-you-already-have/
Date: 2026-04-24

My daily setup involves three AI systems, a shared drive, and a folder structure that only makes sense to me. It works. It also wouldn't survive a compliance audit, a colleague handoff, or a question about where the data goes.

### The Informal Layer

Most knowledge workers operating with AI have built something like this without naming it. Conversations in Claude for research, Gemini inside Google Workspace for documents, ChatGPT for a second opinion. Files land in Drive or on a local disk. Context lives in chat history, or in memory, or in neither. The result is a personal productivity layer that is genuinely powerful and entirely ungoverned.

The document management industry has spent decades building the governed version: structured repositories, metadata, audit trails, version control, retention policies. SharePoint. OpenText. M-Files. The category is called DMS, document management system, and the market is around ten billion euros and growing fast. What it has never managed to build is the experience of actually thinking with your documents, of asking a question and getting a useful answer rather than a search result.

### Two Directions

AI is now arriving from both sides. The DMS vendors are bolting intelligence onto their repositories: auto-classification, semantic search, generative summaries. The AI vendors are adding structure to their chat interfaces: folders, projects, persistent memory, file search. Both are converging on the same space, the AI-native knowledge workspace, from opposite ends.

The difference is where they start. Legacy DMS starts from governance and adds intelligence. AI-native tools start from intelligence and add governance. That ordering matters more than it sounds. A system designed around compliance and filing tends to feel like compliance and filing, even after AI is layered on. A system designed around thinking and conversation tends to feel like that, even after folders and access controls are added.

### The European Gap

There is a third variable that rarely features in the US-centric DMS conversation: where the data lives and who controls the models. For European practitioners and organisations, working with AI means trusting that client conversations, internal documents, and strategic thinking are not being used to train someone else's model in another jurisdiction.
The major platforms offer reassurances, but the architecture is American, the models are American, and the terms of service reflect that.

🇪🇺 The EU AI Act and GDPR together create real obligations around data residency and model governance. For many European organisations, "where is my data" is not a compliance question. It is a strategic one.

This is where the category gets interesting for smaller European practices. The improvised setup, Claude plus Gemini plus Drive, is functional but not sovereign. An EU-hosted workspace that supports multiple models, keeps data in Europe, and organises knowledge the way teams actually work is a different proposition. Not a DMS in the legacy sense, and not just a chat interface. Something in between, and perhaps more useful than either.

🇳🇱 WiseWare.chat is one of the first products positioning explicitly in this space. Built in Amsterdam, EU-hosted, the product structures company knowledge into governed memory objects: decisions, policies, commitments, evidence. Source-backed, human-reviewed, auditable. Early product, coherent direction. One to watch.

What that category ends up being called is still open. The tools are arriving before the language has.

Further reading:
- Built in, not bolted on
- When Git Grows Up

---

## When Git Grows Up

URL: https://hoeijmakers.net/when-git-grows-up/
Date: 2026-04-24

The problem with git was never git. It was the assumption that everyone working with content could think like a developer.

Version control is a sound idea for knowledge. Track changes, branch, merge, roll back. For organisations managing legal, financial, or medical content at scale, that discipline is exactly what is needed. The tooling just never matched the people. Authors are not engineers. They do not want to commit, push, or resolve merge conflicts. They want to write, review, and publish with confidence that nothing disappears.

### The interface problem

What changes when you add agents to a git-based knowledge system is the interface between the system and the person using it. Branching becomes a guided act rather than a technical one. An agent can explain what a branch means in the context of a document review, prompt the right action, and flag when something looks inconsistent with the rest of the knowledge base.

Editorial standards are where this becomes interesting. In most content organisations, style guides and governance rules live in a wiki page or shared drive folder that someone remembers to check. Complying with them is a personal responsibility, which means compliance is uneven. When an agent carries those standards into the editing environment itself, the guidance becomes active. It surfaces at the moment it is relevant, not when someone remembers to look it up.

### Governance as the use case

This matters more in some sectors than others. For organisations where content carries legal weight, where a document version can be cited in a court or audit, the distance between "we have a process" and "the process is enforced" is significant. Version control closes part of that gap. Active editorial guidance closes more of it.

The clients who need this are not looking for a productivity tool. They are managing risk. The question they are asking is not whether their teams can write faster, but whether the organisation can demonstrate that what was published, when, and under what review, is traceable and defensible.

### European footing

For organisations in regulated European industries, where content is held, and by whom, is not a neutral question.
A knowledge platform with European hosting and ISO 27001 certification in progress is not a marketing differentiator. It is a procurement requirement waiting to be formalised.

The certification path matters here in a specific way. For public sector and governance-heavy clients, ISO 27001 is the entry point to a longer chain: certification enables insurance, insurance enables enterprise contracts, enterprise contracts carry SLA penalties that in turn require the certification to hold. Once you commit to this path, you commit to maintaining it. The liability is real and personal. That is precisely why most smaller technology companies avoid it, and why those that do pursue it can credibly operate at a different scale.

The combination of git-based versioning, agent-assisted authoring, and European infrastructure with serious certification underway is a specific answer to a specific problem. It is not the most obvious stack. But for the organisations that need it, it fits in a way that general-purpose tools do not.

💡 WiseWare is a platform for AI-assisted knowledge management at scale, with European hosting and ISO 27001 certification in progress. Schmuki resells and co-delivers WiseWare for governance-heavy organisations in legal, financial, and medical sectors.

Further reading:
- Built in, not bolted on
- The Workspace You Already Have

---

## Delegating Past Your Own Ceiling

URL: https://hoeijmakers.net/delegating-past-your-own-ceiling/
Date: 2026-04-19

This one goes a level deeper than usual. If you run a self-hosted blog or manage your own web infrastructure, stay with it. The payoff is real.

Cloudflare, the network and security layer that sits in front of most of this blog, has been part of my setup since the beginning. DNS, CDN, basic security. It is the layer in front of Ghost, my publishing platform, that I configured once and mostly left alone. Not because there was nothing more to do, but because the gap between what Cloudflare can do and what I could confidently operate was wide enough to leave alone.

That gap closed recently. Not because Cloudflare got simpler, but because I stopped being the one operating it.

### The ceiling

Every tool has a capability ceiling for any given user. For most practitioners running a Ghost site, Cloudflare's ceiling sits somewhere around DNS and caching. The dashboard is capable but not intuitive. Workers, R2, analytics at the edge: these are real features with real value, but they require a mental model that takes time to build and maintain. Most people stay at the surface. I did too.

What Claude Code changed is where that ceiling sits. I can now ask it to read my zone configuration, deploy a Worker, or update a content file, and it does. Not because I learned Cloudflare more deeply, but because I delegated past the point where I would normally stop.

### What delegation made possible

I asked Claude Code to generate an llms-full.txt from my RSS feed, a complete text inventory for AI systems. It did. Then I asked whether it could produce a Markdown version of every page, served directly from the network layer without touching Ghost. It could. It wrote a Cloudflare Worker, a small script that runs at the network edge and intercepts requests before they reach the server.

That Worker intercepts any request ending in .md, fetches the page content, converts it to Markdown, and returns that instead. Ghost never sees the request.
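A stripped-down sketch of that idea, to make the mechanics concrete. This is illustrative rather than the Worker actually running here; in particular, the HTML-to-Markdown step is reduced to a naive placeholder:

```ts
// Minimal sketch of the ".md alternate" Worker (illustrative, not the production code).
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Only intercept paths ending in .md; everything else passes through to Ghost.
    if (!url.pathname.endsWith(".md")) {
      return fetch(request);
    }

    // Fetch the HTML version of the same post from the origin.
    const htmlUrl = new URL(url.pathname.replace(/\.md$/, "/"), url.origin);
    const page = await fetch(htmlUrl.toString());
    if (!page.ok) {
      return new Response("Not found", { status: 404 });
    }

    const markdown = htmlToMarkdown(await page.text());
    return new Response(markdown, {
      headers: { "content-type": "text/markdown; charset=utf-8" },
    });
  },
};

// Naive placeholder for the conversion step: keeps headings and paragraph text,
// drops everything else. A real Worker would use a proper converter.
function htmlToMarkdown(html: string): string {
  return html
    .replace(/<h1[^>]*>(.*?)<\/h1>/gis, "# $1\n\n")
    .replace(/<h2[^>]*>(.*?)<\/h2>/gis, "## $1\n\n")
    .replace(/<p[^>]*>(.*?)<\/p>/gis, "$1\n\n")
    .replace(/<[^>]+>/g, "")
    .trim();
}
```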
Each page now also carries a rel="alternate" link in its HTML head, so machines that know to look will find the alternate format automatically.

Example: https://hoeijmakers.net/delegating-past-your-own-ceiling.md

The Worker code was readable. I could follow it. I would not have written it myself, and I would not have attempted the feature without Claude Code as operator. That's the point: the outcome wasn't just the same task with less friction. It was a capability I didn't have before, sitting just above the ceiling I previously accepted.

### Ghost and Cloudflare, divided cleanly

Ghost owns the content and the publishing workflow. Cloudflare owns the delivery layer and everything Ghost doesn't cover. Claude Code connects them. None of these systems needed to know about the others. The Worker intercepts requests and responds differently based on the URL. Ghost keeps doing what it does.

This division feels stable. It also suggests something worth sitting with: the limiting factor in most small-scale web infrastructure isn't the tools. The tools are capable. The limiting factor is the operator's ceiling, the point where complexity exceeds confidence and you stop reaching for what's there.

Delegation shifts that line. Not permanently, not universally, but enough to matter.

Further reading:
- Guests That Should Behave
- What llms.txt can do for your website
- When Bots Become Readers: Publishing in the Age of AI Crawlers

---

## Guests That Should Behave

URL: https://hoeijmakers.net/guests-that-should-behave/
Date: 2026-04-19

The traffic spikes in Plausible (web analytics) made no sense. Peak after peak, no referral source, no pattern I recognised. Bots, clearly, but the kind that arrive carrying a real browser, behaving like a human long enough to slip past lightweight analytics. Not a security incident. More like guests who don't knock.

That framing stuck with me as I worked through the fix. Bots are guests. Most of them are welcome. The question is which ones, and on what terms.

### Welcome and unwanted

The web has always had crawlers. Search engines, archivers, feed readers: automated visitors that make the open web function. I have no objection to those. What changed over the past year or two is volume and intent. By mid-2025, crawling for AI model training accounted for nearly 80% of all AI bot activity on Cloudflare's network. Many of those crawlers identify themselves honestly. Some don't, cycling through residential IP addresses and real browsers to blend in.

The ones showing up in my Plausible dashboard were the second kind. Uninvited guests who arrive in disguise, consume resources, and distort the picture. Cloudflare's bot protection handles exactly this: using signature matching, heuristics, machine learning, and behavioural analysis to distinguish the two, while maintaining a Verified Bot directory for legitimate crawlers like Googlebot that are allowed through. I enabled the AI scrapers toggle. The spikes stopped.

### The Workers detour

Getting this working on a Ghost site took a step I hadn't anticipated. Cloudflare is not a simple product for an individual publisher. There are layers, and the terminology assumes familiarity with infrastructure concepts most bloggers don't carry day to day. I worked through the configuration with Claude, which made the complexity navigable. A few exchanges, some trial and error, and it was done.

The interesting byproduct came when Cloudflare suggested tightening up robots.txt. That small nudge opened a second door.
I'd been serving llms.txt as a Ghost page, which worked, but felt like a workaround. With a Cloudflare Worker already in place, I wrote a proper script to serve it as a real txt file and referenced it from robots.txt. A hack became infrastructure.

📎 llms.txt is an emerging convention for telling AI systems what to pay attention to on a site. Where robots.txt sets access rules, llms.txt provides semantic guidance: what's here, what matters, how to use it. Think of it as a house guide left out for guests who will never ring the bell.

### Hospitality, restated

The distinction that matters is not bots versus humans. It's guests who identify themselves and follow the rules versus those who arrive in disguise and behave badly. The first kind are welcome. The second kind get turned away at the edge, before they reach the door.

llms.txt is the other side of that same logic. For the guests who do behave, who are reading the site to retrieve, to understand, to cite: here is what I've made, here is how to find it, here is what it's for. Blocking the bad actors and signalling clearly to the good ones. Two tools, one underlying idea.

The web's implicit contract, that content is public and humans read it, is being rewritten. I'm fine with that. I just want the new guests to knock first.

Further reading:
- What llms.txt can do for your website
- The End of Google Search (as we know it)
- When Bots Become Readers: Publishing in the Age of AI Crawlers

---

## Bildung and the Function Problem

URL: https://hoeijmakers.net/bildung-and-the-function-problem/
Date: 2026-04-17

Wilhelm von Humboldt is one of those figures I knew only by association: the brother, the geographical namesake, the name on the university. The actual ideas, I had missed.

A piece from the Cosmos Institute put Humboldt's concept of Bildung in front of me. The argument is simple and uncomfortable: education shaped by Humboldt was about forming a whole person, not training a function. The modern professional world, by contrast, has spent a century doing the opposite. We have been very good at turning people into functions.

### The Bildung idea

Bildung is hard to translate, which is part of why it stayed in German. It means something like self-cultivation through engagement with the world: art, science, language, culture, other people. German has a second word for the other model: Ausbildung, vocational training, formation toward a function. The prefix aus signals completion, extraction, being shaped into a finished product. One word for becoming a person. One word for becoming a role.

The goal of Bildung is not competence in a domain. The goal is a richer, more capacious person. Humboldt believed the university existed to develop that person, not to fill a role.

This is not naive idealism. Humboldt was a state official. He designed the University of Berlin in 1810 with this philosophy embedded in the structure: research and teaching together, students as active participants, knowledge pursued for its own sake. The model spread across Europe and shaped what we call research universities.

### The function trap

What the Cosmos Institute piece points at, and what I find genuinely interesting, is that AI is now exposing a tension that was always there. When knowledge work was scarce and complex, being a good function was valuable. You knew the law, the process, the domain. That knowledge was your identity, your leverage, your job security.

AI is eroding the scarcity.
Not all of it, and not evenly, but the trend is clear enough that the question becomes: what remains when the function is cheaper to automate than to hire?

The Humboldt answer, if taken seriously, is that what remains is the person. The capacity to make judgements, to hold complexity, to bring a formed perspective to an ambiguous situation. None of those come from function-training. They come from exactly the kind of broad, messy, integrative engagement that Bildung describes and modern professional education systematically devalues.

### What I do not know

I want to be honest about the limits of what I am saying here. I have not read Humboldt; I have encountered the idea. I am not a philosopher of education or a historian of the university. What I can say is that the frame feels generative: it names something I have been sensing without having the word for it.

The risk of reaching for Humboldt now is that it becomes a nostalgic move, a way of dressing up the resistance to change in the language of classical education. The universities Humboldt influenced are not obviously doing Bildung today. They are credentialing machines with research departments attached.

The more interesting question is whether the AI transition creates conditions where something like Bildung becomes practically useful again, not as philosophy, but as a competitive model for how people and organisations develop. Whether that is inspiration or a warning, I genuinely do not know.

📎 The Cosmos Institute piece that triggered this: "You Are Not a Function"

Further reading:
- Vision, Judgement, Creativity: Reclaiming Agency in the Age of AI
- The Thermostat Effect: Why AI Feels Distant but Works Deep