👋 Hey there,

Why this matters / where it hurts

You add one field to an event and nothing crashes. At first. A week later a consumer’s dashboard is blank and no one knows why. The schema drift was “small” so it did not get a plan. Incidents at odd hours follow.

I have shipped changes like that. It felt harmless, maybe even obvious. Then a silent cast error or a missing default turned into a support ticket. We fixed the code and moved on. Until the next time.

This lesson is the calm version. Not perfect. Practical. A data contract writes down what an event is, who owns it, which changes are allowed, and how consumers validate at the edge. It turns schema changes into a routine you can rehearse.

🧭 Mindset shift

From: “Producers publish JSON and consumers cope”
To: “Events have contracts, and evolution follows a small set of rules”

Why it matters
Events live longer than their authors intended. Without a contract, every consumer rediscovers the shape and invents its own fixes. With a contract, you get two simple habits that prevent most drift.

Two rules to start

  • Producers own the schema and publish a validation policy that runs at the edge

  • Changes follow expand first, contract later: add the new shape alongside the old one, then remove the old one on a published deprecation date

🧰 Tool of the week: Data Contract Canvas

Keep this one page next to each event. Fill it once. Update it when you evolve the schema.

  1. Event name and owner
    Canonical name, domain, and a person or team on the hook.

  2. Purpose and consumers
    What decision this event enables, which services read it, and any external sinks.

  3. Schema snapshot
    Fields, types, required vs optional. Include examples. Keep it short and real.

  4. Identity and ordering
    Primary key, idempotency key, and ordering strategy. Partition key if you have one.

  5. Allowed changes
    Additive only, defaulted fields, enum additions (as long as consumers tolerate unknown values). List the breaking changes you will not make.

  6. Validation gates
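    Publish a versioned JSON Schema or Protobuf definition. Enforce it in producer tests and at consumer ingress. For each consumer, decide whether it fails closed (rejects or quarantines bad messages) or fails open (accepts them and logs the mismatch).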

  7. Compatibility policy
    Expand now. Announce. Hold two versions. Contract after the deprecation date.

  8. Backfill and replays
    Plan for historical parity if the new field matters. How to backfill. How to replay safely.

  9. Observability
    Dashboards for volume, schema errors by field, dedupe hits, and poison queue counts. Add one alert for schema mismatch spikes.

  10. Rollback and quarantine
    How to stop emitting the new field, how to route bad messages to quarantine. Include the command and the owner.

  11. Cadence
    Review this canvas quarterly. Archive old versions with dates.
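If you would rather keep the canvas next to the code than in a doc, it can be sketched as a small record. This is a minimal sketch, not a standard format: the class names, the `remove_after` field, and the `UserSignedUp` event are all made up for illustration. The one piece of logic it carries is item 7, contracting only after the deprecation date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class FieldSpec:
    name: str
    type: str
    required: bool = True
    # Date after which the field may be removed (the "contract" step).
    remove_after: Optional[date] = None


@dataclass
class DataContract:
    event: str
    owner: str
    version: int
    consumers: List[str]
    fields: List[FieldSpec] = field(default_factory=list)

    def removable_fields(self, today: date) -> List[str]:
        """Fields whose deprecation date has passed and can be contracted."""
        return [
            f.name for f in self.fields
            if f.remove_after is not None and today > f.remove_after
        ]


# Hypothetical event, purely for illustration.
contract = DataContract(
    event="UserSignedUp",
    owner="identity-team",
    version=2,
    consumers=["billing", "email-service"],
    fields=[
        FieldSpec("user_id", "string"),
        FieldSpec("plan", "string", required=False),
        FieldSpec("legacy_plan_code", "string", required=False,
                  remove_after=date(2025, 6, 30)),
    ],
)

print(contract.removable_fields(date(2025, 7, 1)))
```

A quarterly review can then start from `removable_fields(today)` instead of from memory.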

🔍 Example: “PriceChanged” event

Scope
Catalog publishes PriceChanged when merch updates base price or currency.

Context
Producers write to Kafka. Consumers include checkout, promo engine, and search indexer.

Step-by-step using the canvas

  • Owner: Catalog team, DRI Ana

  • Schema v1: product_id string, price number, currency string

  • Identity: event_id UUID, idempotency key is product_id + updated_at

  • Allowed changes: additive fields with defaults, enum additions for currency

  • Validation gates: JSON Schema v1 enforced in producer tests; a consumer-side gate routes non-conforming messages to a quarantine topic

  • Compatibility: add list_price now as optional with default null; later, announce that v2 deprecates price in favor of effective_price

  • Backfill: write a one-time job to populate list_price from current catalog for hot SKUs

  • Observability: panel shows events per minute, schema errors, quarantine depth, dedupe hits

  • Rollback: a feature flag on the producer stops emitting list_price; consumers keep reading because the field is optional

  • Cadence: review in 90 days, remove price only after all consumers confirm v2
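To make the consumer side of the steps above concrete, here is a rough sketch of an ingress gate for PriceChanged. It is in-memory and uses only the standard library, so treat it as an illustration rather than how any real consumer is built. Assumptions on my part: the class and function names, the plain list standing in for a quarantine topic, and `updated_at` travelling on the event as a string (the idempotency key needs it, but the v1 snapshot does not list it).

```python
from typing import Any, Dict, List

Number = (int, float)

# v1 fields from the example. updated_at is assumed to travel with the event
# because the idempotency key uses it; the v1 snapshot does not list it.
REQUIRED = {"product_id": str, "price": Number, "currency": str, "updated_at": str}
# Expand step: list_price is optional and defaults to None (null).
OPTIONAL_DEFAULTS = {"list_price": None}


def validate(event: Dict[str, Any]) -> List[str]:
    """Return a list of schema errors; an empty list means the event conforms."""
    errors = []
    for name, expected in REQUIRED.items():
        if name not in event:
            errors.append(f"{name}: missing")
        elif isinstance(event[name], bool) or not isinstance(event[name], expected):
            # bool is rejected explicitly because Python treats it as an int.
            errors.append(f"{name}: wrong type {type(event[name]).__name__}")
    list_price = event.get("list_price")
    if list_price is not None and (
        isinstance(list_price, bool) or not isinstance(list_price, Number)
    ):
        errors.append(f"list_price: wrong type {type(list_price).__name__}")
    return errors


class PriceChangedIngress:
    """In-memory stand-in for a consumer gate; a real one would sit on Kafka."""

    def __init__(self) -> None:
        self.accepted: List[Dict[str, Any]] = []
        self.quarantine: List[Dict[str, Any]] = []
        self.seen_keys = set()
        self.dedupe_hits = 0

    def handle(self, event: Dict[str, Any]) -> str:
        errors = validate(event)
        if errors:
            # Quarantine with the reasons attached, never drop silently.
            self.quarantine.append({"event": event, "errors": errors})
            return "quarantined"
        key = (event["product_id"], event["updated_at"])
        if key in self.seen_keys:
            self.dedupe_hits += 1
            return "duplicate"
        self.seen_keys.add(key)
        # Fill defaults so downstream code can always read list_price.
        self.accepted.append({**OPTIONAL_DEFAULTS, **event})
        return "accepted"


gate = PriceChangedIngress()
v1_event = {"product_id": "sku-1", "price": 19.99, "currency": "EUR",
            "updated_at": "2024-05-01T10:00:00Z"}
print(gate.handle(v1_event))                        # accepted
print(gate.handle(v1_event))                        # duplicate
print(gate.handle({**v1_event, "price": "19.99"}))  # quarantined
```

The three counters on the gate (accepted, quarantine length, dedupe hits) map directly onto the observability panel in the example.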

What success looks like
Checkout and search continue to work. Quarantine remains empty. Dedupe hits are steady. After the overlap period, all consumers read v2 without code hotfixes.

Small confession
If a long-tail consumer is late to upgrade, I sometimes extend the overlap window. It is a tradeoff. I write it down and set a new date.

Do this / avoid this

Do

  • Publish a schema with examples and a clear owner

  • Validate at producer tests and at consumer ingress

  • Evolve with expand, announce, contract on a date

  • Default new fields so old consumers do not crash

  • Track schema errors and quarantine depth on a dashboard

Avoid

  • Sneaking in breaking type changes, such as number to string

  • Renaming fields without an overlap plan

  • Free-form payloads that vary by feature flag

  • Silent drops without quarantine or metrics

  • One-off consumer patches that bypass the contract
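Several of the "avoid" items can be caught mechanically before a change ships. Below is a minimal sketch of a compatibility check that compares two schema snapshots. The snapshot format (a dict of field name to type, required flag, and optional default) and the function name are my own simplification, not JSON Schema or Protobuf, so adapt the idea rather than the code.

```python
from typing import Any, Dict, List

Schema = Dict[str, Dict[str, Any]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the differences between two snapshots that would break the contract."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            # A rename shows up here too: the old name disappears.
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(
                f"type change on {name}: {spec['type']} -> {new[name]['type']}"
            )
    for name, spec in new.items():
        if name not in old and spec.get("required", False) and "default" not in spec:
            problems.append(f"new required field without default: {name}")
    return problems


v1 = {
    "product_id": {"type": "string", "required": True},
    "price": {"type": "number", "required": True},
    "currency": {"type": "string", "required": True},
}
# Expand step from the example: an optional field with a default.
v2_expand = {**v1, "list_price": {"type": "number", "required": False, "default": None}}
# The kind of change to avoid: number silently becomes string.
v2_bad = {**v1, "price": {"type": "string", "required": True}}

print(breaking_changes(v1, v2_expand))  # []
print(breaking_changes(v1, v2_bad))     # ['type change on price: number -> string']
```

Run in CI against the previous snapshot, a non-empty result is a prompt to write an overlap plan, not necessarily a hard stop.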

🧪 Mini challenge

Goal: give one critical event a real contract today.

  • Pick one event that multiple services consume

  • Fill in the Data Contract Canvas in 15 minutes

  • Generate a JSON Schema or Protobuf from the snapshot

  • Add a consumer-side gate that logs and quarantines mismatches

  • Add one dashboard tile for schema errors and one for quarantine depth

  • Emit a test message with a missing optional field and confirm the gate behavior
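If you want a starting point for the gate and the two dashboard tiles, here is a small sketch. Everything in it is hypothetical: the order event, the field names, and the in-memory counter and list that stand in for real metrics and a real quarantine topic. It only shows the shape of "log, count by field, quarantine", plus the last check from the list.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("ingress-gate")

# Hypothetical snapshot: field name -> (expected type, required?)
SCHEMA = {
    "order_id": (str, True),
    "total": (float, True),
    "coupon": (str, False),  # optional
}

schema_errors = Counter()  # feeds the "schema errors by field" tile
quarantine = []            # its length is the "quarantine depth" tile


def gate(message: dict) -> bool:
    """Return True if the message passes; otherwise log, count, and quarantine it."""
    bad_fields = []
    for name, (expected, required) in SCHEMA.items():
        if name not in message:
            if required:
                bad_fields.append(name)
        elif not isinstance(message[name], expected):
            bad_fields.append(name)
    if bad_fields:
        schema_errors.update(bad_fields)
        quarantine.append(message)
        log.warning("schema mismatch on %s", bad_fields)
        return False
    return True


# Last step of the challenge: a missing optional field should pass.
assert gate({"order_id": "o-1", "total": 42.0}) is True
# A missing required field should be counted and quarantined.
assert gate({"order_id": "o-2"}) is False
print(dict(schema_errors), len(quarantine))  # {'total': 1} 1
```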

Reply with one sentence on what surprised you.

🎯 Action step for this week

  • Inventory the top five events by impact and list owners

  • Add a contract page and schema file to each event’s repo or docs

  • Wire producer tests to validate against the schema

  • Deploy ingress validation for two high-value consumers

  • Define expand and contract dates for the next planned change

  • Review dashboards with product and support so everyone sees the same picture

By the end of the week, aim to have two events validated at both producer and consumer edges with dashboards live.

👋 Wrapping up

  • Producers own the shape. Consumers validate at the edge.

  • Evolve with expand first, contract later, on a date.

  • Default new fields. Quarantine the outliers.

  • Verify with dashboards, not vibes.

If you liked this, you will probably enjoy my free 5-day email course, “From Developer to Architect.”
Five short lessons on mindset, tradeoffs, and communication you can use at work this week.
https://www.techarchitectinsights.com/c/from-dev-to-architect-5-day-email-crash-course

I would love your input. What tool around event evolution would help you most right now?
Hit reply and tell me in one sentence.

Thanks for reading.

See you next week,
Bogdan Colța
Tech Architect Insights
