👋 Hey there,

“Just retry it” is not a strategy

Most incidents end with the same advice: “Retry the failed requests.”
Sounds simple. Until your system double-charges a customer, deducts inventory twice, or sends 50 welcome emails.

Here’s the truth:

Retries are inevitable. Idempotency is optional.
Great systems make retries safe by design.

Idempotency is what turns reversible decisions, canary releases, and chaos testing from risky theory into calm practice. Let’s make it a default, not an afterthought.

🧭 The Mindset Shift

From: “We’ll retry and hope it works.”
To: “Every effect can be replayed without harm.”

Architects design operations as carefully as features. They assume:

  • Networks flap.

  • Clients time out after the server succeeds.

  • Queues redeliver.

  • Humans click twice.

When you expect duplication, you stop fearing it, and you enable the team to ship with confidence.


🧰 Tool: The Idempotency Checklist

Use this when designing APIs, jobs, or message handlers. If you can’t check these boxes, retries are dangerous.

  1. Idempotency Key (Request Identity)

    • Every side-effecting request carries a unique key (e.g., an Idempotency-Key or X-Request-Id header).

    • The key is generated by the client (or an upstream caller) and stays stable across retries.

    • The server stores the key with the final outcome (status, response body, affected IDs, timestamp).

  2. Dedupe Window

    • Define how long keys are kept (e.g., 24–72h or a business-specific horizon).

    • Storage choices: hot cache (Redis) + durable store (DB) for recovery.

  3. Exactly-Once Illusion, Safely

    • Accept that most infrastructure delivers at least once.

    • Emulate exactly-once by deduping at the write boundary (unique constraints, upserts, “insert-if-not-exists” with the key).

  4. Operation Semantics

    • Prefer PUT (replace) over POST (create) when an operation is naturally idempotent.

    • For POST creates, have the client supply a token so a repeated POST returns the same created resource.

  5. Data Model Guardrails

    • Add a unique index on the business key or the idempotency key to block duplicates at the database layer.

    • For counters/balances, use compare-and-swap or versioned writes to avoid double increments.

  6. Outbound Effects

    • E-mail/SMS/webhooks: record each send in an outbound ledger keyed by the idempotency key; suppress duplicates.

    • Payment captures: idempotency keys are mandatory; reconcile against the provider’s response.

  7. Idempotent Handlers

    • Message consumers check a processed-messages table (the inbox pattern) before applying effects.

    • Keep handlers idempotent by default: replaying messages shouldn’t create new side effects.

  8. Observability

    • Dashboards showing dedupe rates, duplicate attempts blocked, and key expiry backlog.

    • Traces carry the idempotency key across hops.

If you can tick these, you can retry freely.
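A minimal sketch of items 1, 2, 3 and 5, assuming Python and the standard-library `sqlite3` module as a stand-in for the real database. The table names, `create_order` and `purge_expired` are hypothetical. The key row and the effect commit in one transaction, so a retry either replays the stored outcome or loses the race on the primary key and rolls back.

```python
import json
import sqlite3
import time


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            sku TEXT NOT NULL,
            qty INTEGER NOT NULL
        );
        -- The PRIMARY KEY is the write-boundary guard: one row per idempotency key.
        CREATE TABLE IF NOT EXISTS idempotency_keys (
            idempotency_key TEXT PRIMARY KEY,
            status INTEGER NOT NULL,
            response TEXT NOT NULL,
            created_at REAL NOT NULL
        );
        """
    )


def _stored_outcome(conn, key):
    row = conn.execute(
        "SELECT status, response FROM idempotency_keys WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


def create_order(conn, key, sku, qty):
    """Create an order at most once per idempotency key; retries get the original outcome."""
    stored = _stored_outcome(conn, key)
    if stored is not None:
        return stored  # replay: no second effect
    try:
        with conn:  # order row and key row commit (or roll back) together
            cur = conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            response = {"order_id": cur.lastrowid, "sku": sku, "qty": qty}
            conn.execute(
                "INSERT INTO idempotency_keys VALUES (?, ?, ?, ?)",
                (key, 201, json.dumps(response), time.time()),
            )
        return 201, response
    except sqlite3.IntegrityError:
        # A concurrent duplicate committed this key first; our order insert was rolled back.
        return _stored_outcome(conn, key)


def purge_expired(conn, ttl_seconds, now=None):
    """Dedupe window: forget keys older than ttl_seconds; returns how many were removed."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency_keys WHERE created_at < ?", (now - ttl_seconds,)
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
init_db(conn)
first = create_order(conn, "key-123", "book", 1)
retry = create_order(conn, "key-123", "book", 1)  # e.g. the client timed out and retried
```

The read before the write keeps the common retry path cheap, but correctness rests on the primary key, which is why the `IntegrityError` branch exists. The TTL passed to `purge_expired` is where the business-specific dedupe window would go.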

🔎 How Idempotency Shows Up Across the Stack

Edge/API

  • Accept the key, validate presence/shape, and echo it back in responses.

  • For a repeated key with the same payload, return the original status and body; for the same key with a different payload, return 409 Conflict (the IETF Idempotency-Key header draft suggests 422 for this case).
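A single-threaded sketch of those rules, assuming an in-memory dict in place of the dedupe store and a SHA-256 fingerprint of the canonical JSON body to detect a reused key with a different payload. `handle_post`, `create_user` and `StoredResponse` are illustrative names, not a framework API; a real service would also claim the key atomically in a durable store.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredResponse:
    fingerprint: str  # hash of the payload that first used this key
    status: int
    body: dict


_store: dict = {}  # hypothetical stand-in for the Redis/DB dedupe store


def fingerprint(payload: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so equal payloads hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle_post(headers: dict, payload: dict, create):
    """Return (status, body, response_headers); `create(payload)` performs the real effect."""
    key = headers.get("Idempotency-Key")
    if not key:
        return 400, {"error": "Idempotency-Key header is required"}, {}
    echo = {"Idempotency-Key": key}  # echo the key back so clients can correlate
    fp = fingerprint(payload)
    stored = _store.get(key)
    if stored is not None:
        if stored.fingerprint != fp:
            return 409, {"error": "key reused with a different payload"}, echo
        return stored.status, stored.body, echo  # same key + same payload: replay
    status, body = create(payload)
    _store[key] = StoredResponse(fp, status, body)
    return status, body, echo


calls = []


def create_user(payload):
    calls.append(payload)  # the side effect we must not repeat
    return 201, {"user_id": len(calls), **payload}


first = handle_post({"Idempotency-Key": "k1"}, {"email": "a@example.com"}, create_user)
replay = handle_post({"Idempotency-Key": "k1"}, {"email": "a@example.com"}, create_user)
conflict = handle_post({"Idempotency-Key": "k1"}, {"email": "b@example.com"}, create_user)
```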

Services/Workers

  • Check the “processed”/“effects” table before applying.

  • Make handlers idempotent by default: read current state → apply change only if not already applied.
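Here is one way to implement that check for a queue consumer, sketched with Python's `sqlite3` module. The `processed_messages` and `stock` tables and `handle_reserve` are hypothetical. Recording the message id and applying the effect in one transaction is the core of the inbox pattern: a redelivery fails on the primary key before the effect runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
    CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL);
    INSERT INTO stock VALUES ('widget', 10);
    """
)


def handle_reserve(conn, message_id, sku, qty):
    """Apply a stock reservation once per message_id; return False for a redelivery."""
    try:
        with conn:
            # Record the message first: a redelivery violates the PRIMARY KEY here,
            # so the stock update below never runs for it.
            conn.execute("INSERT INTO processed_messages VALUES (?)", (message_id,))
            conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
        return True
    except sqlite3.IntegrityError:
        return False  # already applied: acknowledge without a new side effect


# The broker redelivers the same message three times.
outcomes = [handle_reserve(conn, "msg-1", "widget", 2) for _ in range(3)]
```

The handler returns normally for a duplicate, so the consumer can still acknowledge the message and stop the broker from redelivering it.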

Database/Storage

  • Enforce uniqueness at the table that owns the effect.

  • Prefer upserts with deterministic conflict rules (e.g., “first write wins” keyed by idempotency_key).
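A sketch of “first write wins” using SQLite's `INSERT ... ON CONFLICT ... DO NOTHING` upsert (available since SQLite 3.24) through Python's `sqlite3`. The `payments` table and `record_payment` are hypothetical; the unique constraint lives on the table that owns the effect, and every caller reads back whichever row won.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        idempotency_key TEXT NOT NULL UNIQUE,  -- uniqueness on the table that owns the effect
        amount_cents INTEGER NOT NULL
    )
    """
)


def record_payment(conn, key, amount_cents):
    """First write wins: a repeated key is ignored and the caller sees the original row."""
    with conn:
        conn.execute(
            "INSERT INTO payments (idempotency_key, amount_cents) VALUES (?, ?) "
            "ON CONFLICT(idempotency_key) DO NOTHING",
            (key, amount_cents),
        )
    return conn.execute(
        "SELECT id, amount_cents FROM payments WHERE idempotency_key = ?", (key,)
    ).fetchone()


original = record_payment(conn, "pay-1", 500)
duplicate = record_payment(conn, "pay-1", 999)  # late write with a different amount is ignored
```

Silently keeping the first amount is the deterministic conflict rule; at the API edge, a repeated key with a different payload would normally be rejected before it reaches this table.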

Outbound Integrations

  • Store a send ledger (what, to whom, provider id).

  • If a retry arrives, short-circuit and return the recorded provider response.
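A sketch of the send ledger, assuming a hypothetical `FakeEmailProvider` in place of a real email API and an in-memory dict in place of a ledger table. It leaves one window open: if the process dies after `provider.send` but before the ledger write, a retry sends again, which is why the outbox pattern and provider-side idempotency keys still matter.

```python
import itertools


class FakeEmailProvider:
    """Hypothetical provider stand-in: every call is a real, billable send."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.sent = []

    def send(self, to, template):
        self.sent.append((to, template))
        return f"msg-{next(self._ids)}"


class SendLedger:
    def __init__(self):
        self.entries = {}  # send_key -> (what, to_whom, provider_id)

    def send_once(self, provider, send_key, to, template):
        if send_key in self.entries:
            return self.entries[send_key][2]  # short-circuit: recorded provider id
        provider_id = provider.send(to, template)
        self.entries[send_key] = (template, to, provider_id)
        return provider_id


provider, ledger = FakeEmailProvider(), SendLedger()
first = ledger.send_once(provider, "welcome:42", "ana@example.com", "welcome")
retry = ledger.send_once(provider, "welcome:42", "ana@example.com", "welcome")
```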

⚠️ Edge Cases & Gotchas (and how to handle them)

Near-simultaneous duplicates

  • Two retries land at the same time with the same key.

  • Fix: atomic write + unique index; one wins, the other reads the stored outcome.

Success after client timeout

  • Client times out and retries; server already succeeded.

  • Fix: key replay returns original 200/201 + same body; no second effect.

Out-of-order deliveries

  • A late retry arrives after a newer state exists.

  • Fix: versioned writes / sequence tokens; operations that don’t match the current version become no-ops.
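One way to implement versioned writes is a compare-and-swap `UPDATE ... WHERE version = ?`, sketched here with Python's `sqlite3`. The `accounts` table and `set_status` are hypothetical. The caller sends the version it last saw; a late retry still carries the old version, matches zero rows, and becomes a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES (1, 'active', 1)")
conn.commit()


def set_status(conn, account_id, new_status, expected_version):
    """Versioned write: succeeds only if the row is still at expected_version."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_status, account_id, expected_version),
        )
    return cur.rowcount == 1  # 0 rows changed: stale or repeated write, a no-op


suspended = set_status(conn, 1, "suspended", expected_version=1)   # version 1 -> 2
reactivated = set_status(conn, 1, "active", expected_version=2)    # version 2 -> 3
late_retry = set_status(conn, 1, "suspended", expected_version=1)  # arrives late: ignored
```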

TTL too short

  • Key expires before late retries; duplicates slip through.

  • Fix: base the TTL on real business windows (chargebacks, daily batch cycles), not an arbitrary default.

Fan-out side effects

  • One operation triggers many downstream actions.

  • Fix: make each saga step idempotent on its own; each step carries its own step-id key.
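A sketch of per-step keys, assuming each step key is derived deterministically as `<operation_key>:<step>` and that completed steps are recorded (an in-memory dict here, a durable table in practice). The step names and functions are hypothetical, and compensations are left out. A crash mid-saga followed by a full retry runs each step exactly once.

```python
completed = {}  # step key -> recorded result (a durable table in a real system)
effects = []    # what actually happened downstream

STEPS = ("reserve_stock", "charge_card", "send_receipt")


def step_key(operation_key, step):
    # Deterministic per-step key derived from the parent operation's key.
    return f"{operation_key}:{step}"


def run_step(operation_key, step):
    key = step_key(operation_key, step)
    if key in completed:
        return completed[key]  # already done on an earlier attempt
    effects.append(step)  # stand-in for the real downstream call
    completed[key] = f"{step}:ok"
    return completed[key]


def run_saga(operation_key, crash_before=None):
    for step in STEPS:
        if step == crash_before:
            raise RuntimeError(f"crashed before {step}")
        run_step(operation_key, step)


try:
    run_saga("order-42", crash_before="send_receipt")
except RuntimeError:
    pass
run_saga("order-42")  # the whole saga is retried from the top
```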

Partial failures

  • Local write succeeded; outbound email failed.

  • Fix: outbox pattern. Commit the intention in the same transaction as the write; a relay delivers it reliably later, deduped by key.
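A compact outbox sketch with Python's `sqlite3`, where the `users` and `outbox` tables, `sign_up`, `relay_once` and the receiver-side `delivered` dict are all hypothetical. The relay simulates the classic failure: the email is delivered but the acknowledgement is lost, so the row is sent again and the receiver's dedupe on `event_key` absorbs the repeat.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE outbox (
        event_key TEXT PRIMARY KEY,       -- dedupe key carried to the receiver
        payload TEXT NOT NULL,
        sent INTEGER NOT NULL DEFAULT 0
    );
    """
)


def sign_up(email):
    with conn:  # the user row and the email intention commit atomically
        user_id = conn.execute("INSERT INTO users (email) VALUES (?)", (email,)).lastrowid
        conn.execute(
            "INSERT INTO outbox (event_key, payload) VALUES (?, ?)",
            (f"welcome:{user_id}", json.dumps({"to": email, "template": "welcome"})),
        )
    return user_id


delivered = {}  # receiver side: dedupes on event_key


def relay_once(send):
    """Deliver pending outbox rows; a crash before the UPDATE means the row is sent again."""
    pending = conn.execute("SELECT event_key, payload FROM outbox WHERE sent = 0").fetchall()
    for event_key, payload in pending:
        send(event_key, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE event_key = ?", (event_key,))


attempts = []


def flaky_send(event_key, payload):
    attempts.append(event_key)
    delivered.setdefault(event_key, payload)  # the receiver ignores repeats of a key
    if len(attempts) == 1:
        raise ConnectionError("ack lost")  # delivered, but the relay never hears back


sign_up("ana@example.com")
try:
    relay_once(flaky_send)
except ConnectionError:
    pass
relay_once(flaky_send)
```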

Testing Idempotency Without Drama

Property-style tests

  • Given the same key + payload, N identical calls → one effect, one outcome.
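A dependency-free version of that property check, assuming a hypothetical `CounterService` as the system under test and the standard `random` module to generate cases; a property-based testing library such as Hypothesis could generate the inputs instead.

```python
import random
import uuid


class CounterService:
    """Hypothetical service under test: adds `amount` once per idempotency key."""

    def __init__(self):
        self.total = 0
        self._outcomes = {}

    def add(self, key, amount):
        if key not in self._outcomes:
            self.total += amount
            self._outcomes[key] = {"total": self.total}
        return self._outcomes[key]


def check_replays_are_harmless(trials=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        service = CounterService()
        key, amount = str(uuid.uuid4()), rng.randint(1, 100)
        n = rng.randint(1, 10)  # N identical calls
        outcomes = [service.add(key, amount) for _ in range(n)]
        assert service.total == amount                  # one effect
        assert all(o == outcomes[0] for o in outcomes)  # one outcome
    return True
```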

Chaos retries

  • In staging, inject timeouts and 5xx, then auto-replay with the same key.

  • Assert: no duplicate rows, no double external calls, identical response bodies.
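A sketch of a chaos-retry harness, assuming a hypothetical `ChaosProxy` that either fails before the handler runs (a 503) or runs it and then drops the reply (the success-after-timeout case). The client replays with the same key until it gets an answer; the fault rate, attempt limit and seed are arbitrary illustrative values.

```python
import random


class ChaosProxy:
    """Injects faults around a handler: fail before the effect, or lose the reply after it."""

    def __init__(self, handler, rng, fault_rate=0.5):
        self.handler, self.rng, self.fault_rate = handler, rng, fault_rate

    def call(self, key, payload):
        roll = self.rng.random()
        if roll < self.fault_rate / 2:
            raise ConnectionError("503 before processing")
        response = self.handler(key, payload)
        if roll < self.fault_rate:
            raise TimeoutError("timeout after the server succeeded")
        return response


orders, responses = [], {}


def create_order(key, payload):
    # System under test: dedupes on the key and stores the original response.
    if key not in responses:
        orders.append(payload)
        responses[key] = {"order_id": len(orders), **payload}
    return responses[key]


def call_with_replay(proxy, key, payload, max_attempts=20):
    for _ in range(max_attempts):
        try:
            return proxy.call(key, payload)
        except (ConnectionError, TimeoutError):
            continue  # auto-replay with the SAME key
    raise RuntimeError(f"{key}: gave up after {max_attempts} attempts")


proxy = ChaosProxy(create_order, random.Random(7))
results = [call_with_replay(proxy, f"key-{i}", {"sku": "book"}) for i in range(50)]
```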

Out-of-order and duplication

  • Feed the same event multiple times and shuffle the order.

  • Assert: end state is correct; dedupe counters increment; no extra side effects.
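A sketch of the shuffle-and-duplicate test, assuming hypothetical inventory events whose deltas simply add up, so delivery order should not change the end state once duplicates are dropped.

```python
import random

# Hypothetical inventory events: (event_id, sku, delta). Additive deltas commute,
# so any delivery order should produce the same end state once duplicates are dropped.
EVENTS = [("e1", "widget", 10), ("e2", "widget", -3), ("e3", "gadget", 5), ("e4", "widget", -2)]


class InventoryConsumer:
    def __init__(self):
        self.stock = {}
        self.processed = set()
        self.duplicates_blocked = 0  # the dedupe counter a dashboard would chart

    def handle(self, event):
        event_id, sku, delta = event
        if event_id in self.processed:
            self.duplicates_blocked += 1
            return
        self.processed.add(event_id)
        self.stock[sku] = self.stock.get(sku, 0) + delta


def replay_shuffled(events, copies, seed):
    # Deliver every event `copies` times, in a seeded random order.
    stream = [event for event in events for _ in range(copies)]
    random.Random(seed).shuffle(stream)
    consumer = InventoryConsumer()
    for event in stream:
        consumer.handle(event)
    return consumer


runs = [replay_shuffled(EVENTS, copies=3, seed=s) for s in range(20)]
```

For updates that do not commute (a status change, a price), order matters, and a versioned-write check is what keeps a shuffled stream correct.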

Cold-start replays

  • Clear app caches (keep dedupe store) and replay a window.

  • Assert: reprocess is safe; performance is acceptable; invariants hold.

Observability checks

  • Every hop logs the idempotency_key; dashboards show replay ratio and duplicate-blocked trends.

🧪 Mini Case: Signup → Email → CRM → Billing

  • Signup: client sends POST /signup with Idempotency-Key: <uuid>.

    • DB unique index on (email) prevents duplicates; a mapping table ties <uuid> → user_id.

    • A duplicate POST returns the original response (same status and user payload).

  • Welcome Email: outbound table keyed by send_key = 'welcome:' + user_id.

    • If the key is present, skip the send and report “already sent” with the original provider id.

  • CRM: webhook consumer checks inbox (webhook_id unique).

    • Duplicate deliveries become no-ops; for distinct updates to the same record, last write wins.

  • Billing: card capture uses the provider idempotency key <user_id>:<plan>:<day>.

    • Late retries return the same charge ID; no double-charge.

Result: you can replay the entire signup flow safely, without fear of duplicates.
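A condensed, in-memory sketch of that flow, assuming a hypothetical `SignupFlow` class with dicts standing in for the unique index on email, the key-to-user mapping table, the email ledger and the payment provider's idempotency store. The CRM webhook step is left out for brevity.

```python
import uuid
from datetime import date


class SignupFlow:
    """In-memory stand-ins for the DB, email ledger and payment provider (all hypothetical)."""

    def __init__(self):
        self.users_by_email = {}  # unique index on email
        self.key_to_user = {}     # mapping table: Idempotency-Key -> user_id
        self.email_ledger = {}    # send_key -> provider message id
        self.charges = {}         # provider side: idempotency key -> charge id

    def signup(self, idem_key, email, plan, today):
        user_id = self.key_to_user.get(idem_key)
        if user_id is None:
            # Unique on email: an existing email maps to the existing user.
            user_id = self.users_by_email.setdefault(email, len(self.users_by_email) + 1)
            self.key_to_user[idem_key] = user_id
        send_key = f"welcome:{user_id}"
        if send_key not in self.email_ledger:
            self.email_ledger[send_key] = f"msg-{len(self.email_ledger) + 1}"
        charge_key = f"{user_id}:{plan}:{today.isoformat()}"  # <user_id>:<plan>:<day>
        if charge_key not in self.charges:
            self.charges[charge_key] = f"ch-{len(self.charges) + 1}"
        return {
            "user_id": user_id,
            "email_id": self.email_ledger[send_key],
            "charge_id": self.charges[charge_key],
        }


flow = SignupFlow()
key = str(uuid.uuid4())
first = flow.signup(key, "ana@example.com", "pro", date(2024, 5, 1))
replay = flow.signup(key, "ana@example.com", "pro", date(2024, 5, 1))  # replay the whole flow
```

One detail the day-based billing key hides: compute `today` in a fixed time zone such as UTC, or a retry that straddles midnight produces a new key and a second charge.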

🛠 Patterns to Keep Handy

  • Client tokens for resource creation: repeat creates return the same resource.

  • Outbox/Inbox: transactional intention + deduped delivery.

  • Natural vs synthetic keys: use business keys where stable (e.g., monthly invoice), UUIDs elsewhere.

  • Logical tombstones: mark records as deleted instead of removing them, so deletions are safe to repeat.

  • Compensations are idempotent too: refunds/cancellations can be retried without double effects.

🚫 Anti-Patterns to Avoid

  • Stateless “idempotency” that relies on timing.

  • Side effects hidden in reads.

  • No uniqueness at the write boundary.

  • Keys only at the edge (not propagated).

  • Infinite retries without backoff, jitter, or circuit breakers (a self-inflicted DDoS).
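The fix for the last anti-pattern, sketched as bounded retries with capped exponential backoff and “full jitter” (a random wait between zero and the current cap). The function name, defaults and retryable exception types are assumptions; a circuit breaker would sit around this loop and is not shown. The idempotency key is created once, outside the loop, so every attempt is the same logical request.

```python
import random
import time
import uuid


def retry_with_backoff(call, *, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retryable=(ConnectionError, TimeoutError),
                       sleep=time.sleep, rand=random.random):
    """Call `call(idempotency_key)` until it succeeds, reusing ONE key across attempts.

    Waits use capped exponential backoff with full jitter:
    a random delay in [0, min(max_delay, base_delay * 2**attempt)).
    """
    key = str(uuid.uuid4())  # generated once, so every attempt is the same logical request
    for attempt in range(max_attempts):
        try:
            return call(key)
        except retryable:
            if attempt == max_attempts - 1:
                raise  # bounded: surface the failure instead of retrying forever
            sleep(rand() * min(max_delay, base_delay * 2 ** attempt))


keys_seen, delays = [], []


def flaky(key):
    keys_seen.append(key)
    if len(keys_seen) < 3:
        raise ConnectionError("503")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)  # record waits instead of sleeping
```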

📔 What Good Architects Do Differently

  • They treat idempotency as table stakes, not a premium feature.

  • They push the dedupe guarantee to the boundary where the effect occurs (DB/payment/outbox), not just at the gateway.

  • They instrument idempotency: keys in logs, traces, and metrics.

  • They empower on-call to replay confidently because the system is built for it.

Mini challenge (this week)

Pick one side-effecting flow (payments, inventory, user signup).

  1. Add/propagate a stable idempotency key.

  2. Enforce a unique constraint or upsert at the effect boundary.

  3. Store and replay the original response for duplicate keys.

  4. Add a dashboard tile: “duplicate attempts blocked (24h).”

Small change, huge safety.

👋 Wrapping Up

Retries are inevitable. The difference between chaos and calm is idempotency.

Design every effect so it can be replayed without harm.
Propagate keys. Guard the write. Log the outcome.
Then roll back, reprocess, and recover, without fear.

Thanks for reading.

See you next week,
Bogdan Colța
Tech Architect Insights
