👋 Hey there,

We shipped event payloads that were too thin, fixed it by making them too fat, and learned the hard way that what goes in the event is a coupling decision, not a producer convenience. Here's the decision sheet I wish I'd had.

Why this matters / where it hurts

Design review, late afternoon, the kind where everyone has one foot out the door. Someone said "just put everything in the event, it's easier." I remember thinking there was something off about that, and I remember not saying anything, because the meeting was already running long and I wanted to leave. So we shipped it.

A few weeks later the platform team pinged our channel. Storage costs on one topic had grown by something they politely called "noticeable." Most of what was sitting in those partitions was duplicate user data nobody on the consumer side actually read. Then legal asked, separately, why personal data was showing up in every consumer's log retention. That conversation went how you'd expect.

The annoying part: we'd over-corrected to get there. The original design had been the opposite extreme, a thin notification event that forced every consumer to call back to the source service for details. That had been melting the user service under fan-out queries. So we'd swung the other way without really thinking about the middle.

In Lesson #35 we covered publishing events reliably with the outbox pattern. That matters. What you publish matters more, and it's the decision most teams skip past because the payload shape feels obvious in the moment. It isn't. It connects directly to Lesson #30 on data contracts, which is about evolving payloads safely once they're in the wild and you can't take them back.

🧭 The shift

From: Put everything in the event so consumers don't have to ask twice.
To: Include what the consumer needs for its next decision. Let it fetch the rest if it cares.

The mistake is treating payload shape as a producer-side convenience. It isn't. Consumers pay for it in compute and coupling, and your ops budget pays for it in storage and throughput. Producer convenience is the smallest of those costs, and somehow it's almost always the one that wins the meeting.

Thin events shove load back onto the source service through callback queries. Sometimes that's fine. Sometimes it's a slow self-inflicted DDoS. Fat events go the other direction and push cost into storage, log retention, and schema evolution pain. Different failure mode, equally annoying once it shows up on a dashboard.
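
To make the two failure modes concrete, here is a rough sketch of the same event in three shapes. The event name `OrderPlaced`, every field name, and the sample user record are hypothetical, made up for illustration. Only the thin-versus-fat distinction comes from the discussion above.

```python
import json

# Hypothetical user record owned by the source service (illustrative data only).
USER = {
    "userId": "u-123",
    "email": "ada@example.com",
    "displayName": "Ada",
    "shippingAddress": {"line1": "1 Example St", "city": "Springfield", "zip": "00000"},
    "preferences": {"newsletter": True, "locale": "en-US"},
}

# Stable identifiers that every variant carries.
IDS = {"eventType": "OrderPlaced", "orderId": "o-456", "userId": "u-123", "tenantId": "t-1"}

# Thin: identifiers only. Every consumer must call back for anything else.
thin_event = dict(IDS)

# Fat: the whole user record rides along, personal data included.
fat_event = {**IDS, "orderTotalCents": 4200, "currency": "USD", "user": USER}

# Middle: identifiers plus only what the consumer needs for its next decision.
middle_event = {**IDS, "orderTotalCents": 4200, "currency": "USD"}


def size_bytes(event: dict) -> int:
    """Serialized size of one event as compact UTF-8 JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def carries_personal_data(event: dict) -> bool:
    """True if the embedded user record (email, address) is in the payload."""
    return "user" in event


if __name__ == "__main__":
    for name, event in [("thin", thin_event), ("middle", middle_event), ("fat", fat_event)]:
        print(name, size_bytes(event), "bytes, personal data:", carries_personal_data(event))
```

The per-event difference looks small. Multiplied by topic volume and retention, it is the storage bill, and the embedded `user` record is what ends up in every consumer's logs.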

A few defaults I now hold firmly:

  • Stable identifiers (userId, orderId, tenantId) go in every event. No exceptions worth arguing about.

  • Mutable state goes in only when you've actually measured that consumers use it. Guessing here is how bills run away from you.

  • Schema versions are explicit, and mismatches fail loud. The alternative is silent rot for six months.
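
A minimal sketch of how a consumer might enforce those defaults. It assumes JSON-style events with a `schemaVersion` field and an optional `state` object; the names `Event`, `parse_event`, `SchemaVersionMismatch`, and the supported version number are hypothetical, not taken from any particular library.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping

# Hypothetical: the schema versions this consumer knows how to read.
SUPPORTED_SCHEMA_VERSIONS = frozenset({2})
REQUIRED_IDS = ("userId", "orderId", "tenantId")


class SchemaVersionMismatch(Exception):
    """The event carries a schema version this consumer does not support."""


@dataclass(frozen=True)
class Event:
    schema_version: int
    # Stable identifiers: present in every event.
    user_id: str
    order_id: str
    tenant_id: str
    # Mutable state: only fields that consumers have been measured to read.
    state: Mapping[str, Any] = field(default_factory=dict)


def parse_event(raw: Mapping[str, Any]) -> Event:
    """Validate the envelope and fail loudly instead of guessing."""
    version = raw.get("schemaVersion")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise SchemaVersionMismatch(
            f"unsupported schemaVersion {version!r}; "
            f"supported: {sorted(SUPPORTED_SCHEMA_VERSIONS)}"
        )
    missing = [key for key in REQUIRED_IDS if not raw.get(key)]
    if missing:
        raise ValueError(f"event is missing stable identifiers: {missing}")
    return Event(
        schema_version=version,
        user_id=raw["userId"],
        order_id=raw["orderId"],
        tenant_id=raw["tenantId"],
        state=dict(raw.get("state", {})),
    )
```

An event with an unknown or missing version raises `SchemaVersionMismatch` at the edge of the consumer, so the mismatch shows up as an error on day one instead of as quietly wrong data months later.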
