👋 Hey {{first_name|there}},

Stop “Launch and Pray”

You’ve probably lived this: big refactor, new provider, shiny rewrite. Staging looks fine, synthetic tests pass, confidence feels high. You flip traffic, and immediately discover the thing that only fails with real users, real data skew, real chaos. Cue the incident channel.

There’s a calmer way:

Shadow traffic and dual-run let you prove a replacement in parallel, using the exact traffic patterns and edge cases your system actually sees, without risking users.

This is the natural next step after our recent lessons:

  • Reversibility (Lesson #18): design the exit before the entry.

  • Idempotency (Lesson #19): make retries and replays safe.

  • Backpressure (Lesson #20): keep the core path alive under stress.

  • SLOs & Error Budgets (Lesson #21): ship with guardrails.

Shadow + dual-run ties them together: you validate in production conditions while your kill switch stays within reach.

Let’s make reliability a number your team can actually use.

🧭 The Mindset Shift

From: “Launch and see what breaks.”
To: “Prove it in parallel and cut over deliberately.”

And also:

  • From “staging truth” → to “production truth.”

  • From “correctness on paper” → to “behavior parity under reality.”

  • From “we think it’s faster” → to “we measured it against the SLO you care about.”

Dual-run isn’t an extra ceremony; it’s how you trade unknown risk for measured evidence.

🎯 Want to learn how to design systems that make sense, not just work?

If this resonated, the new version of my free 5-Day Crash Course – From Developer to Architect will take you deeper into:

  • Mindset Shift - From task finisher to system shaper

  • Design for Change - Build for today, adapt for tomorrow

  • Tradeoff Thinking - Decide with context, not dogma

  • Architecture = Communication - Align minds, not just modules

  • Lead Without the Title - Influence decisions before you’re promoted

It’s 5 short, focused lessons designed for busy engineers, and it’s free.

Now let’s continue.

🧰 Tool: The Dual-Run Checklist

Use this once for each replacement (service, algorithm, data store, provider). It’s intentionally pragmatic, so you’ll actually run it.

1) Define the decision you’re making

  • Success criteria: What must be true to cut over? (e.g., ≤0.5% output deviation on core scenarios; p95 latency ≤ old system + 10%; availability ≥ SLO.)

  • Scope: Which endpoints, tenants, or flows are in scope? Start narrow.

Write it down: “Cutover when deviation < 0.5% for 7 consecutive days and p95 ≤ +10%.”
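Criteria like these are easy to encode as a tiny go/no-go check so the decision stays objective. A minimal sketch in Python, assuming you already collect daily parity metrics somewhere (the names here are illustrative, not from any particular tool):

```python
from dataclasses import dataclass

@dataclass
class DailyParity:
    deviation_pct: float   # output deviation vs. the old system, in percent
    p95_new_ms: float      # p95 latency of the new path
    p95_old_ms: float      # p95 latency of the old path

def ready_to_cut(history: list[DailyParity],
                 max_deviation_pct: float = 0.5,
                 max_p95_overhead: float = 0.10,
                 required_days: int = 7) -> bool:
    """Cut over only when the last N consecutive days ALL pass the criteria."""
    if len(history) < required_days:
        return False
    window = history[-required_days:]
    return all(
        d.deviation_pct < max_deviation_pct
        and d.p95_new_ms <= d.p95_old_ms * (1 + max_p95_overhead)
        for d in window
    )
```

The point is less the code than the discipline: the thresholds live in one place, and “are we ready?” has a yes/no answer.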

2) Choose your traffic source (tee point)

  • Edge/gateway tee: Mirror requests at the gateway to the new system. Good for HTTP APIs.

  • Producer tee: Publish the same events to both old and new consumers (Kafka/Kinesis fan-out).

  • Span/sidecar tee: Service-mesh/agent duplicates calls without code changes.

Rule: Tee where you preserve the same inputs the old system saw (headers, auth context, locale, AB buckets).
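In code, a gateway tee can be as small as a fire-and-forget mirror that forwards a copy of the same inputs and swallows every shadow error. A hedged sketch (your real tee will likely live in a gateway or mesh config, not application code):

```python
import threading

def handle_with_shadow(request: dict, primary, shadow) -> dict:
    """Serve from the primary system; mirror the SAME inputs
    (headers, auth context, locale, AB bucket) to the shadow path."""
    def mirror():
        try:
            shadow(dict(request))  # copy, so the shadow can't mutate the live request
        except Exception:
            pass  # shadow failures must never affect real users
    threading.Thread(target=mirror, daemon=True).start()
    return primary(request)
```

Note the two invariants: the shadow sees the same inputs the old system saw, and nothing the shadow does can slow down or break the primary response.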

3) Control sample & stickiness

  • Sampling: Start at 1–5% of total traffic; grow gradually.

  • Stickiness: Keep related requests for a session or entity consistent (same user → same cohort) to avoid bias.

  • Exclusions: Remove obviously risky tenants or VIPs initially.
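Stickiness falls out naturally if you sample by hashing the entity ID instead of rolling dice per request. A minimal sketch (function and parameter names are my own, for illustration):

```python
import hashlib

def in_shadow_cohort(entity_id: str, sample_pct: float,
                     excluded: frozenset = frozenset()) -> bool:
    """Deterministic, sticky sampling: the same user/tenant always lands
    in the same cohort, so related requests stay consistent."""
    if entity_id in excluded:        # e.g. VIPs or risky tenants, excluded initially
        return False
    digest = hashlib.sha256(entity_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000   # stable bucket in 0..9999
    return bucket < sample_pct * 100        # e.g. 1% -> buckets 0..99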

4) Make the new path read-only

  • Block writes to external systems (email, payments) from the shadow path.

  • Sandbox third-party calls (or stub them).

  • Route side effects to a black-hole sink or a safe audit topic.

If you must exercise writes (e.g., storage), use isolated replicas and idempotent, reversible operations with strict egress controls.
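One simple way to enforce read-only behavior is to inject a sink object on the shadow path that records what it *would* have done instead of doing it. A sketch under those assumptions (the client interface here is hypothetical):

```python
class AuditSink:
    """Black-hole sink for the shadow path: records intended side effects
    instead of executing them. Same interface as the live client."""
    def __init__(self):
        self.events = []

    def send_email(self, to: str, subject: str) -> None:
        self.events.append(("email", to, subject))

    def charge(self, customer: str, amount_cents: int) -> None:
        self.events.append(("charge", customer, amount_cents))

def make_side_effects(is_shadow: bool, live_client):
    """Hard-block real egress on the shadow path at construction time."""
    return AuditSink() if is_shadow else live_client
```

Choosing the implementation at construction time, rather than sprinkling `if is_shadow` checks through the code, makes it much harder for a new side effect to leak out of the shadow path.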

5) Protect data & privacy

  • Mask or tokenize PII in the mirrored stream where lawful.

  • Partition logs/metrics for shadow traffic.

  • Access control: Observability for shadows follows least-privilege.

6) Decide how you’ll compare outputs (diffing strategy)

  • Exact match (binary): IDs, status codes, flags, deterministic decisions.

  • Tolerance window (numeric): scores, prices, totals (e.g., within ±0.1).

  • Semantic parity (ranking/top-K): overlap %, Kendall tau, business KPI equivalence.

Store (correlation_id, old_output, new_output, diff) for analysis.
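The three diffing strategies can each be a small function that returns `None` on parity and a structured diff otherwise, which makes the diff store trivial to populate. A sketch (thresholds and names are illustrative):

```python
def diff_exact(old, new):
    """Binary match: IDs, status codes, flags, deterministic decisions."""
    return None if old == new else {"kind": "exact", "old": old, "new": new}

def diff_tolerance(old: float, new: float, tol: float = 0.1):
    """Numeric match within a tolerance window: scores, prices, totals."""
    return None if abs(old - new) <= tol else {"kind": "tolerance", "delta": new - old}

def diff_top_k(old_ids: list, new_ids: list, k: int = 3, min_overlap: float = 0.66):
    """Semantic parity for rankings: overlap of the top-K results."""
    overlap = len(set(old_ids[:k]) & set(new_ids[:k])) / k
    return None if overlap >= min_overlap else {"kind": "top_k", "overlap": overlap}

def record(store: list, correlation_id: str, old, new, diff) -> None:
    """Append (correlation_id, old_output, new_output, diff) for analysis."""
    store.append((correlation_id, old, new, diff))
```

In practice the store is a table or topic rather than a list, but the shape of each row is exactly the tuple above.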

7) Observe what matters (SLO-aligned)

  • Error parity: % mismatches by category (schema, validation, 4xx/5xx).

  • Latency parity: p50/p95/p99 deltas for the new path vs old.

  • Throughput headroom: CPU/mem/GC; connection pools; queue depth.

  • Business KPIs: conversion, fraud, authorization rate (tracked in shadow as simulated metrics only).

Build a single dashboard with parity cards + a “go/no-go” banner.

8) Run on real-time and golden data

  • Golden set: Curated, nasty cases (edge locales, Unicode, leap days, max payloads, hot SKUs).

  • Real-time: Live stream exposes unknowns (seasonality, tenant quirks, traffic spikes).

Cut only after the new path passes both.

9) Decide and script the cutover

  • Toggle location: feature flag, routing rule, DNS/LB weight, or consumer offset switch.

  • Blast radius plan: 1% → 5% → 25% → 100% with hold times and health checks at each step.

  • Rollback: exact command + condition (breach of SLO/latency/diff). Practice once.
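The ramp-and-rollback loop is worth scripting rather than clicking through by hand. A minimal sketch, assuming you have a weight-setting toggle and a health check wired to your SLO/latency/diff thresholds (both are hypothetical callables here):

```python
import time

RAMP = [1, 5, 25, 100]            # percent of live traffic at each step
HOLD_SECONDS = 24 * 3600          # hold time before each health check

def run_cutover(set_weight, healthy, hold_seconds: int = HOLD_SECONDS) -> bool:
    """Ramp traffic step by step; roll back to 0% on any health breach."""
    for pct in RAMP:
        set_weight(pct)
        time.sleep(hold_seconds)  # hold, then evaluate SLO / latency / diff health
        if not healthy():
            set_weight(0)         # the rehearsed rollback: exactly one command
            return False
    return True
```

Because rollback is a single `set_weight(0)`, the “practice once” advice above becomes cheap: run the script in staging with a short hold time and a forced health failure.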

10) Clean up the scaffolding

  • Retire the tee and shadow observers after stability.

  • Archive diffs for a postmortem doc (“what surprised us”).

  • Remove flags and dead configs to keep the system simple.

📔 Patterns that make dual-run smooth

  • Correlation IDs everywhere. Thread a single ID through teeing, new path, and diff store so analysis isn’t guesswork.

  • Seed determinism. Fix random seeds for tests; freeze time for golden replays.

  • Clock sanity. NTP skew makes TTLs and signatures fail; verify time alignment.

  • Eventual consistency-aware diffs. Compare after the same stabilization window on both sides.

  • Cost controls. Shadowing can be expensive; sample aggressively and auto-pause off-peak.

  • Business guardrails. If a new path is cheaper but subtly shifts a KPI (e.g., authorization rate vs. fraud), instrument that as a parity metric too.

Concrete example #1: Replacing the payment capture service

  • Decision: Move from Provider A → Provider B.

  • Tee point: At the payments gateway, mirror authorize and capture calls to Provider B (sandbox).

  • Read-only: B’s real money calls are replaced with sandbox + audit log; no real charges.

  • Diffing: Compare auth outcomes (approve/decline), reason codes, and expected fees within tolerance.

  • SLO: Authorization success rate ≥ A; p95 latency within +10%.

  • Cutover: 1% of tenants with low–medium volume → hold 24h → 5% → 25% → 100%.

  • Rollback: Single feature flag; auto-rollback if auth success dips >0.5% for 15 minutes.

Result: Evidence-based switch with zero customer risk.

Concrete example #2: Ranking model dual-run

  • Decision: New search ranking model.

  • Tee: Mirror top-X queries to the new model.

  • Diff: Measure rank correlation, top-3 overlap, and simulated business click-through.

  • Guardrails: If semantic parity stalls below the threshold for key categories, don’t cut.

  • Cutover: Canary on low-risk categories; measure live CTR lift; expand gradually.

Result: You launch with proof that the new model is actually better, not just different.
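The semantic-parity metrics in this example (top-K overlap and rank correlation) are small enough to sketch directly. A hedged version, hand-rolling Kendall tau over the items both rankings share (in production you’d likely reach for a stats library instead):

```python
from itertools import combinations

def top_k_overlap(old: list, new: list, k: int = 3) -> float:
    """Fraction of the old top-K that also appears in the new top-K."""
    return len(set(old[:k]) & set(new[:k])) / k

def kendall_tau(old: list, new: list) -> float:
    """Kendall rank correlation over items present in both rankings:
    +1.0 = identical order, -1.0 = fully reversed."""
    shared = [x for x in old if x in new]
    pos = {item: i for i, item in enumerate(new)}
    concordant = discordant = 0
    for a, b in combinations(shared, 2):   # 'a' ranked above 'b' in the old list
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 1.0
```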

💭 Common pitfalls (and safer alternatives)

  • Pitfall: Shadow path accidentally sends emails/charges.
    Fix: Hard block egress to production side-effect systems; route to sink or sandbox.

  • Pitfall: Sampling bias hides the scary cases.
    Fix: Combine random sampling with targeted golden sets (tenants, locales, big payloads).

  • Pitfall: Chasing bit-for-bit parity on non-deterministic outputs.
    Fix: Use tolerance or semantic diffs; document accepted variance.

  • Pitfall: Declaring victory after a day.
    Fix: Run across peak cycles (day of week, month-end) and partner outages.

  • Pitfall: Diff store without privacy controls.
    Fix: Mask PII; access-scope shadow logs.

  • Pitfall: No owner for go/no-go.
    Fix: Name a DRI and set objective thresholds before you start.

🔎 Mini challenge (45 minutes)

  1. Pick a candidate (endpoint or consumer) that scares you to ship.

  2. Write one sentence of success criteria: “Cut when deviation < X for Y days; p95 ≤ +Z%.”

  3. Choose a tee point and start at 1% shadow sampling, read-only.

  4. Implement a single parity check (exact or tolerance) and log (corr_id, old, new, diff).

  5. Add a tiny parity card to your dashboard. Share a screenshot with your team.

You’ll be shocked how quickly “we feel good” becomes “we have evidence.”

Action step (this week)

  • Identify one risky change on your roadmap (provider swap, data-store move, algorithm change).

  • Run the Dual-Run Checklist end-to-end with a narrow scope.

  • Schedule a cut rehearsal (toggle + rollback) in staging.

  • Put the go/no-go thresholds into your team’s runbook and pin the parity dashboard.

Do this once, and your team will insist on it next time.

👋 Wrapping Up

Don’t test a new system on your users’ patience.
Prove it with shadow traffic and dual-run, on your terms:

  • Mirror real inputs.

  • Compare outputs with the right parity metric.

  • Watch SLO-aligned deltas.

  • Cut over gradually with rollback rehearsed.

That’s how launches become boring. And boring is beautiful.

Thanks for reading.

See you next week,
Bogdan Colța
Tech Architect Insights
