👋 Hey {{first_name|there}},
Your APIs are about to get hammered by callers that don't read docs, don't respect backoff hints, and never close the tab. Here's how to keep your system standing without locking out your real users.
Why this matters / where it hurts
A few months ago, a team I know noticed something weird in their traffic dashboards. Request volume to the catalog API had jumped about 40% over two weeks. No new feature launch, no marketing push, nothing seasonal. Just this steady, almost mechanical climb in authenticated calls to search and detail endpoints. And it was spread evenly across the day. No peaks at lunchtime, no dips overnight. Just... flat and relentless.
Turns out it was an AI agent. A partner had quietly shipped an LLM-powered assistant that pulled product data to answer customer questions. The agent had valid API keys, so it was technically authorized. But the thing was calling search in tight loops, firing off dozens of near-identical queries within seconds because the LLM's reasoning chain kept re-fetching data it had already seen two steps earlier. The rate limiter, built for human browsing patterns, didn't even blink. By the time the team actually caught it, the catalog database was running hot enough to drag down checkout latency for real customers.
I'm not pulling this from a conference talk or a thought-leadership blog. This is the shape of traffic heading your way, if it hasn't already arrived. AI agents, built by your org, by partners, or by someone scraping your public endpoints, behave nothing like human users. They don't stop to read. They don't get tired at 2 a.m. They retry aggressively because the orchestration framework said to. And most of your current API design, rate limiting, and monitoring? It assumes the thing on the other end is a person clicking buttons in a browser. We covered in Lesson #37 how cascading retries from your own services can take a system down. Now picture that same retry pressure, except it's coming from code you didn't write and can't patch.
🧭 The shift
From: "Our API consumers are applications built by developers who read our docs and follow our conventions."
To: "Our consumers increasingly include autonomous agents that discover behavior through trial and error, at machine speed, with no human watching."
This changes what "well-behaved client" even means. When a human developer integrates your API, they've probably read the rate limit section at least once. They handle 429s, they build in some backoff, maybe because they got burned on a previous project. That's the normal case.
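That "normal case" client usually looks something like the sketch below. It's a minimal illustration, not any particular SDK: `send`, `call_with_backoff`, `backoff_delay`, and `retry_after_seconds` are hypothetical names, and `send` stands in for whatever HTTP client you use, returning a status code and a headers dict. It honors a numeric Retry-After on a 429 and only falls back to capped exponential backoff with full jitter when the server gives no hint.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical transport: returns (status_code, headers). Swap in your real HTTP client.
# Headers are assumed to use the exact key "Retry-After"; real clients match case-insensitively.
Send = Callable[[], Tuple[int, Mapping[str, str]]]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Capped exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After in its delay-seconds form; the HTTP-date form is ignored in this sketch."""
    value = headers.get("Retry-After", "").strip()
    return float(value) if value.isdigit() else None


def call_with_backoff(
    send: Send, max_attempts: int = 5, sleep: Callable[[float], None] = time.sleep
) -> int:
    """Call `send`, retrying 429s and 5xx politely. Returns the last status code seen."""
    status = 0
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 and status < 500:
            return status  # success, or a client error that retrying won't fix
        if attempt == max_attempts - 1:
            break  # out of attempts: give up instead of sleeping one last time
        # Prefer the server's hint; fall back to jittered backoff only when there isn't one.
        hint = retry_after_seconds(headers) if status == 429 else None
        sleep(hint if hint is not None else backoff_delay(attempt))
    return status
```

The exact numbers don't matter much. What matters is that this client reads the server's hint before it guesses, and gives up after a bounded number of attempts.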
An AI agent's integration code might have been generated by an LLM that saw something vaguely similar in its training data. The agent doesn't understand your domain model. It understands tokens in and tokens out. If the orchestration logic decides the previous response was incomplete, it'll call the same endpoint again. And again. Seventeen times if that's what the loop produces. Return a generic 500? It retries harder, because "retry on any 5xx" is the pattern the LLM absorbed from countless retry tutorials.
The answer isn't to block agents outright. For most public-facing systems, that ship sailed a while ago. It's to design your APIs so they degrade gracefully under non-human traffic while staying responsive for the humans.
Treat non-human identity as a first-class concept in your auth layer. Not a tag someone adds to a spreadsheet after the incident.
Build rate limiting around behavioral signatures, not just raw request counts. Thirty calls to the same endpoint in ten seconds with slight parameter tweaks looks nothing like a user browsing, and your limiter should know that.
Make error responses machine-legible. If an agent can't extract a Retry-After value from your 429, it's going to guess, and it'll probably guess wrong.
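Here's a minimal sketch of how those three ideas can fit together, assuming a single-process, in-memory store (a real deployment would keep this state somewhere shared across instances). Everything in it is hypothetical and illustrative: the `Principal` kinds, the `POLICIES` thresholds, the `BehavioralLimiter` class, and the `retry_after_seconds` body field. The identity kind is assumed to be recorded when the credential is issued. The limiter tracks both raw calls per endpoint and repeats of the same normalized query, and when it rejects a call it returns a Retry-After header plus an RFC 9457-style problem-details body.

```python
import json
import math
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass(frozen=True)
class Principal:
    id: str
    kind: str  # "human", "agent", or "service": set when the credential is issued


@dataclass(frozen=True)
class Policy:
    window_s: float      # sliding-window length in seconds
    max_calls: int       # calls to one endpoint allowed per window
    max_near_dupes: int  # repeats of the same normalized query allowed per window


# Illustrative thresholds only; derive real ones from your own traffic.
POLICIES = {
    "human": Policy(window_s=10, max_calls=30, max_near_dupes=10),
    "agent": Policy(window_s=10, max_calls=15, max_near_dupes=3),
    "service": Policy(window_s=10, max_calls=100, max_near_dupes=50),
}


def fingerprint(params: Dict[str, str]) -> str:
    """Normalize params so "Red Shoes" and "  red  shoes " count as the same query."""
    normalized = {k.lower(): " ".join(str(v).lower().split()) for k, v in params.items()}
    return json.dumps(normalized, sort_keys=True)


class BehavioralLimiter:
    """In-memory, single-process sketch; production state would live in a shared store."""

    def __init__(self) -> None:
        # (principal id, endpoint) -> time-ordered (timestamp, fingerprint) pairs
        self._events: Dict[Tuple[str, str], Deque[Tuple[float, str]]] = defaultdict(deque)

    def check(
        self, who: Principal, endpoint: str, params: Dict[str, str], now: float
    ) -> Optional[dict]:
        """Return None to allow the call, or a 429 response description to reject it."""
        policy = POLICIES.get(who.kind, POLICIES["agent"])  # unknown kinds get the strict policy
        events = self._events[(who.id, endpoint)]
        while events and now - events[0][0] >= policy.window_s:
            events.popleft()  # drop calls that have slid out of the window

        fp = fingerprint(params)
        over_count = len(events) >= policy.max_calls
        over_dupes = sum(1 for _, seen in events if seen == fp) >= policy.max_near_dupes
        if not (over_count or over_dupes):
            events.append((now, fp))  # only allowed calls are recorded
            return None

        # A slot frees up when the oldest blocking event leaves the window. Events are
        # time-ordered, so the oldest matching fingerprint is never older than events[0].
        if over_dupes:
            oldest = next((t for t, seen in events if seen == fp), now)
        else:
            oldest = events[0][0]
        retry_after = max(1, math.ceil(oldest + policy.window_s - now))
        detail = (
            "Repeated near-identical query; reuse the earlier response."
            if over_dupes
            else "Too many calls to this endpoint."
        )
        return {
            "status": 429,
            "headers": {
                "Retry-After": str(retry_after),
                "Content-Type": "application/problem+json",
            },
            "body": {  # RFC 9457 problem details
                "type": "about:blank",
                "title": "Too Many Requests",
                "status": 429,
                "detail": detail,
                "retry_after_seconds": retry_after,  # hypothetical extension member
            },
        }
```

In the catalog story, raw volume isn't what this would catch first. It's the near-duplicate check: a loop that keeps re-running the same search trips the agent policy after three repeats, while a human gets far more slack for the same pattern. Rejected calls aren't recorded here, so the penalty doesn't compound; whether it should is a policy choice.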
🚀 Want the full architecture roadmap?
If you found this useful and you're not subscribed yet, I built something that might be worth your time. It's a free 5-day email crash course designed specifically for developers moving into architecture roles. One lesson per day, short enough to read over coffee, practical enough to apply the same week.
It covers the foundational shifts that most developers don't get taught: how to think in tradeoffs instead of "best practices," how to communicate technical decisions to non-technical stakeholders, and how to spot the architectural problems that don't show up until production traffic hits. Basically, the stuff I wish someone had walked me through when I made that transition myself.
No fluff, no upsell at the end. Just five days of focused, experience-based lessons.