👋 Hey {{first_name|there}},
Why this matters / where it hurts
You add one field to an event and nothing crashes. At first. A week later a consumer’s dashboard is blank and no one knows why. The schema drift was “small” so it did not get a plan. Incidents at odd hours follow.
I have shipped changes like that. It felt harmless, maybe even obvious. Then a silent cast error or a missing default turned into a support ticket. We fixed the code and moved on. Until the next time.
This lesson is the calm version. Not perfect. Practical. A data contract writes down what an event is, who owns it, which changes are allowed, and how consumers validate at the edge. It turns schema changes into a routine you can rehearse.
🧭 Mindset shift
From: “Producers publish JSON and consumers cope”
To: “Events have contracts, and evolution follows a small set of rules”
Why it matters
Events live longer than their authors intended. Without a contract, every consumer re-discovers the shape and invents their own fixes. With a contract, you get two simple habits that prevent most drift.
Two rules to start
Producers own the schema and publish a validation policy that runs at the edge
Changes follow expand first and contract later with a deprecation date
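Here is a minimal sketch of the second rule, seen from the producer side. The field names (total, total_amount) and the contract date are made up for illustration. During the overlap the producer emits both fields. After the date it emits only the new one.

```python
from datetime import date

# Hypothetical names and date, for illustration only.
OLD_FIELD = "total"
NEW_FIELD = "total_amount"
CONTRACT_DATE = date(2025, 6, 30)  # the announced deprecation date


def build_payload(amount: float, today: date) -> dict:
    """Expand first: always emit the new field.
    Contract later: stop emitting the old field once the date has passed."""
    payload = {NEW_FIELD: amount}
    if today < CONTRACT_DATE:
        # Overlap window: old consumers still find the field they expect.
        payload[OLD_FIELD] = amount
    return payload
```

The date lives in one place, so extending the overlap is a one-line change you can review.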
🧰 Tool of the week: Data Contract Canvas
Keep this one page next to each event. Fill it once. Update it when you evolve the schema.
Event name and owner
Canonical name, domain, and a person or team on the hook.
Purpose and consumers
What decision this event enables, which services read it, and any external sinks.
Schema snapshot
Fields, types, required vs optional. Include examples. Keep it short and real.
Identity and ordering
Primary key, idempotency key, and ordering strategy. Partition key if you have one.
Allowed changes
Additive only, defaulted fields, enum additions. List breaking changes you will not do.
Validation gates
Publish JSON Schema or Protobuf with version. Enforce at producer tests and at consumer ingress. Decide fail closed or fail open for each consumer.
Compatibility policy
Expand now. Announce. Hold two versions. Contract after the deprecation date.
Backfill and replays
Plan for historical parity if the new field matters. How to backfill. How to replay safely.
Observability
Dashboards for volume, schema errors by field, dedupe hits, and poison queue counts. Add one alert for schema mismatch spikes.
Rollback and quarantine
How to stop emitting the new field, how to route bad messages to quarantine. Include the command and the owner.
Cadence
Review this canvas quarterly. Archive old versions with dates.
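The Observability row asks for schema errors by field and one alert on mismatch spikes. A rough sketch of both, using in-process counters. The function names and the 3x threshold are my assumptions. In a real setup you would export these numbers to whatever metrics system you already run.

```python
from collections import Counter

# Hypothetical in-process counter; stands in for a real metrics client.
schema_errors_by_field: Counter = Counter()


def record_schema_errors(fields_with_errors: list[str]) -> None:
    """Count one error per offending field so the dashboard can break it down."""
    for field in fields_with_errors:
        schema_errors_by_field[field] += 1


def is_mismatch_spike(errors_this_window: int, baseline: float, factor: float = 3.0) -> bool:
    """Flag a spike when the current window exceeds the baseline by `factor`.
    The default factor of 3 is an assumption, tune it to your traffic."""
    return errors_this_window > baseline * factor
```

Counting by field matters more than counting in total. It tells you which change caused the drift.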
🔍 Example: “PriceChanged” event
Scope
Catalog publishes PriceChanged when merch updates base price or currency.
Context
Producers write to Kafka. Consumers include checkout, promo engine, and search indexer.
Step-by-step using the canvas
Owner: Catalog team, DRI Ana
Schema v1: product_id string, price number, currency string
Identity: event_id UUID, idempotency key is product_id + updated_at
Allowed changes: additive fields with defaults, enum additions for currency
Validation gates: JSON Schema v1 enforced in producer tests, consumer side gate drops non-conforming messages to a quarantine topic
Compatibility: add list_price as optional with default null, announce deprecation of price in favor of effective_price in v2 later
Backfill: write a one-time job to populate list_price from current catalog for hot SKUs
Observability: panel shows events per minute, schema errors, quarantine depth, dedupe hits
Rollback: feature flag on producer to stop emitting list_price, consumer reads continue
Cadence: review in 90 days, remove price only after all consumers confirm v2
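The consumer-side gate from the steps above can be sketched in a few lines. This is a hand-rolled type check standing in for a real JSON Schema validator, and plain lists stand in for the downstream handler and the quarantine topic. The helper names are mine, not part of any library.

```python
from typing import Any

NUMBER = (int, float)

# PriceChanged v1 fields, plus the additive optional field from the expand step.
REQUIRED = {"product_id": str, "price": NUMBER, "currency": str}
OPTIONAL = {"list_price": NUMBER}  # nullable, defaults to None


def _is_type(value: Any, expected: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, expected) and not isinstance(value, bool)


def schema_errors(event: dict[str, Any]) -> list[str]:
    """Return a list of 'field: reason' strings. Empty means the event conforms."""
    errors = []
    for field, expected in REQUIRED.items():
        if field not in event:
            errors.append(f"{field}: missing")
        elif not _is_type(event[field], expected):
            errors.append(f"{field}: wrong type")
    for field, expected in OPTIONAL.items():
        value = event.get(field)
        if value is not None and not _is_type(value, expected):
            errors.append(f"{field}: wrong type")
    return errors


def gate(event: dict[str, Any], accepted: list, quarantine: list) -> None:
    """Route conforming events onward and everything else to quarantine."""
    errors = schema_errors(event)
    if errors:
        # Keep the original payload and the reasons so someone can replay it later.
        quarantine.append({"event": event, "errors": errors})
        return
    # Apply the default for the new optional field so handlers never see it missing.
    accepted.append({"list_price": None, **event})
```

Unknown extra fields pass through untouched. That is deliberate, because additive changes are on the allowed list.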
What success looks like
Checkout and search continue to work. Quarantine remains empty. Dedupe hits are steady. After the overlap period, all consumers read v2 without code hotfixes.
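Dedupe hits come from the idempotency key in the canvas, product_id + updated_at. A small sketch of how a consumer might count them. The Deduper class is hypothetical, and the in-memory set is an assumption that only works for a single process. A real consumer would likely use a store with expiry.

```python
from typing import Any


def idempotency_key(event: dict[str, Any]) -> str:
    # Key from the canvas: product_id + updated_at.
    return f'{event["product_id"]}|{event["updated_at"]}'


class Deduper:
    """Tracks seen keys and counts duplicates for the dashboard."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.hits = 0

    def is_new(self, event: dict[str, Any]) -> bool:
        key = idempotency_key(event)
        if key in self.seen:
            self.hits += 1  # a replay or a redelivery
            return False
        self.seen.add(key)
        return True
```

A steady hit count is normal with at-least-once delivery. A sudden jump usually means a replay or a producer retry loop.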
Small confession
If a long-tail consumer is late to upgrade, I sometimes extend the overlap window. It is a tradeoff. I write it down and set a new date.
✅ Do this / avoid this
Do
Publish a schema with examples and a clear owner
Validate at producer tests and at consumer ingress
Evolve with expand, announce, contract on a date
Default new fields so old consumers do not crash
Track schema errors and quarantine depth on a dashboard
Avoid
Sneaking breaking type changes such as number to string
Renaming fields without an overlap plan
Free-form payloads that vary by feature flag
Silent drops without quarantine or metrics
One-off consumer patches that bypass the contract
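Most of the Avoid list can be caught before merge. Below is a rough sketch of a check that compares two schema versions and reports breaking changes. The dict layout for a schema is my own simplification, not JSON Schema or Protobuf, and it does not detect renames as such. A rename shows up as a removed field.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two schema snapshots of the form
    {field_name: {"type": str, "required": bool}} and list breaking changes."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(
                f"type change on {name}: {spec['type']} -> {new[name]['type']}"
            )
    for name, spec in new.items():
        # New optional fields are additive and allowed. New required ones are not.
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems


V1 = {
    "product_id": {"type": "string", "required": True},
    "price": {"type": "number", "required": True},
    "currency": {"type": "string", "required": True},
}
# Additive change: optional list_price. This should pass.
V1_EXPANDED = {**V1, "list_price": {"type": "number", "required": False}}
# Sneaky change: price becomes a string. This should fail.
V1_BROKEN = {**V1, "price": {"type": "string", "required": True}}
```

Run it in CI against the last published snapshot and fail the build on a non-empty list.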
🧪 Mini challenge
Goal: give one critical event a real contract today.
Pick one event that multiple services consume
Fill the Data Contract Canvas in 15 minutes
Generate a JSON Schema or Protobuf from the snapshot
Add a consumer-side gate that logs and quarantines mismatches
Add one dashboard tile for schema errors and one for quarantine depth
Emit a test message with a missing optional field and confirm the gate behavior
Reply with one sentence on what surprised you.
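If you want a starting point for the schema step, here is what a JSON Schema for the PriceChanged snapshot could look like, built as a Python dict and dumped to JSON. Treat the choices as assumptions: I picked draft 2020-12, left additionalProperties open so additive fields pass, and modeled list_price as a nullable number.

```python
import json

# Sketch of a JSON Schema for PriceChanged v1 plus the optional list_price field.
PRICE_CHANGED_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "PriceChanged",
    "type": "object",
    "required": ["product_id", "price", "currency"],
    "properties": {
        "product_id": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        # Optional and nullable, so old producers and old consumers keep working.
        "list_price": {"type": ["number", "null"], "default": None},
    },
    # Additive fields are allowed by the contract, so do not reject unknown keys.
    "additionalProperties": True,
}

schema_json = json.dumps(PRICE_CHANGED_SCHEMA, indent=2)
```

Commit the resulting file next to the canvas so the schema and the contract page change together.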
🎯 Action step for this week
Inventory the top five events by impact and list owners
Add a contract page and schema file to each event’s repo or docs
Wire producer tests to validate against the schema
Deploy ingress validation for two high-value consumers
Define expand and contract dates for the next planned change
Review dashboards with product and support so everyone sees the same picture
By end of the week, aim to have two events validated at both producer and consumer edges with dashboards live.
👋 Wrapping up
Producers own the shape. Consumers validate at the edge.
Evolve with expand first, contract later, on a date.
Default new fields. Quarantine the outliers.
Verify with dashboards, not vibes.
If you liked this, you will probably enjoy my free 5-day email course, “From Developer to Architect.”
Five short lessons on mindset, tradeoffs, and communication you can use at work this week.
https://www.techarchitectinsights.com/c/from-dev-to-architect-5-day-email-crash-course
I would love your input. What tool around event evolution would help you most right now?
Hit reply and tell me in one sentence.
Thanks for reading.
See you next week,
Bogdan Colța
Tech Architect Insights