👋 Hey {{first_name|there}},

Why this matters / where it hurts

You add a filter and a sort to a “simple” list endpoint. Traffic is fine at first. Then users hit it in waves, and p95 jumps from 120 ms to 2.8 s. The ORM looks clean. The plan is not. A missing composite index forces a scan and an ugly sort. Caches hide it until they do not.
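Here is a minimal sketch of that failure mode, using SQLite as a stand-in and a hypothetical `orders` table (your database and schema will differ, but the shape of the plan is the same): a filter plus a sort with no matching composite index forces a scan and a separate sort step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT,
        created_at TEXT
    )
""")

query = (
    "SELECT id FROM orders "
    "WHERE status = 'open' "
    "ORDER BY created_at DESC"
)

# No matching index: the planner scans the table, then builds a
# temporary B-tree just to satisfy ORDER BY.
plan_before = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

# One composite index that matches both the predicate and the sort
# removes the scan and the sort step.
conn.execute(
    "CREATE INDEX orders_status_created ON orders (status, created_at)"
)
plan_after = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
```

The exact plan strings vary by engine and version, but the before/after difference (scan plus temp sort vs. a single index search) is what you are looking for.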

I have shipped that kind of change. It looked harmless. Later, we found slow dashboards, a surprise CPU bill, and a nightly batch that never finished. The query was not evil. It was unplanned.

Treat queries and indexes as part of the design. Tie them to an SLO. Check the plan before you merge. Give yourself an escape hatch if the plan flips in production.

🧭 Mindset shift

From: “Write the query, fix slow ones later.”
To: “Design the access path up front. Guard the plan and the SLO.”

Why it matters
Latency is mostly the access path. If the database can reach rows by an index that matches your predicates and sort, you win. If it must scan and sort, you pay. Plans also change over time due to stats and data skew. Treat plans as contracts you observe and, when needed, pin or guide.
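One way to treat the plan as a contract is to assert on it in CI. This is a hedged sketch, not a production harness: the helper names and the `events` schema are hypothetical, and SQLite stands in for your real database, where you would run the same check against production-like data.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the planner's step descriptions for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def assert_plan_uses_index(conn, sql, index_name):
    """Fail loudly if the hot query stops using its index or regains a sort."""
    details = plan_details(conn, sql)
    assert any(index_name in d for d in details), f"plan regressed: {details}"
    assert not any("TEMP B-TREE" in d for d in details), f"unexpected sort: {details}"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, tenant_id INTEGER, ts TEXT)"
)
conn.execute("CREATE INDEX events_tenant_ts ON events (tenant_id, ts)")

# Guard the hot path: if a future migration drops or reshapes the index,
# this check flips red before the p95 does.
assert_plan_uses_index(
    conn,
    "SELECT id FROM events WHERE tenant_id = 7 ORDER BY ts",
    "events_tenant_ts",
)
```

The point is not the specific assertion. It is that a plan flip becomes a failed check you see at merge time, not a pager alert.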

Two keys

  • Design to the SLO and the user path, not to a vague “fast enough.”

  • Index for how you filter and order, then verify with an explain plan on production-like data.
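Column order inside the composite index is part of the second key: equality predicates go before the sort column. A sketch under the same SQLite stand-in, with a hypothetical `tickets` table, comparing the two orderings for `WHERE status = ? ORDER BY created_at`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY,
        status TEXT,
        created_at TEXT
    )
""")
sql = "SELECT id FROM tickets WHERE status = 'open' ORDER BY created_at"

# Sort column first: the index cannot be seeked by status, so the
# planner typically walks every entry and filters along the way.
conn.execute("CREATE INDEX bad_idx ON tickets (created_at, status)")
scan_plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Equality column first: the planner seeks straight to status = 'open'
# and reads rows already in created_at order. No scan, no sort.
conn.execute("CREATE INDEX good_idx ON tickets (status, created_at)")
seek_plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]
```

Same two columns, very different plans. Verify on production-like data, because row counts and skew can change which index the planner prefers.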
