👋 Hey {{first_name|there}},
Your delivery pipeline is slow, and everyone has a theory. Here's how to find the actual constraint in two days, not three months.
💡 Why this matters / where it hurts
You've been in this meeting. Someone pulls up a velocity chart that's been trending down for two quarters. A VP asks, "Why are we slow?" and suddenly everyone has an answer. The backend lead says it's code review. The PM says it's unclear requirements. The DevOps engineer says it's flaky tests. They're all probably right about something. But they can't all be the bottleneck.
Here's the thing about delivery speed: it's governed by exactly one constraint at a time. Not five. Not the general vibe. One queue, somewhere in your pipeline, where work piles up and waits. Everything upstream of it overproduces. Everything downstream of it starves. And until you find that one point, every optimization you make somewhere else is theater.
Last week in Lesson #44 on environment drift and containerization, we talked about environments that silently diverge. This week, we're zooming out: when the whole pipeline feels stuck, where do you actually look first?
🧭 The shift
From: "Delivery is slow because of many things. Let's improve everything a little."
To: "Delivery is slow because of one thing right now. Find it, fix it, then find the next one."
This is Eliyahu Goldratt's Theory of Constraints applied to software delivery. The idea is simple but counterintuitive: a system can only move as fast as its slowest point. Improving a step that isn't the bottleneck produces zero throughput gain. You just build up more inventory (PRs, tickets, builds) waiting in front of the actual constraint.
Most teams skip the diagnosis and jump straight to solutions. They add more CI runners when the real problem is that PRs sit in review for three days. They hire more developers when the constraint is a single manual QA gate. The audit forces you to look at the queues before you touch anything.
Optimize the constraint first. Everything else can wait.
Measure queue depth and wait time, not activity. Busy doesn't mean flowing.
Reassess after every fix. The bottleneck will shift, and that's expected.
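Goldratt's claim can be sketched in a few lines of Python. The stage capacities below are invented for illustration; the point is that system throughput is the minimum over stages, so improving any non-constraint stage changes nothing:

```python
# Toy model of a delivery pipeline: each stage's capacity in work items per day.
# Numbers are illustrative, not from a real team.
stages = {"code": 6, "review": 2, "ci": 5, "deploy": 4}

def throughput(stages):
    """A pipeline flows only as fast as its slowest stage."""
    return min(stages.values())

print(throughput(stages))   # 2 -- review is the constraint

# Doubling CI capacity (a non-constraint) gains nothing:
stages["ci"] = 10
print(throughput(stages))   # still 2

# Relieving the constraint is the only change that moves throughput:
stages["review"] = 4
print(throughput(stages))   # 4
```

That middle step is the "theater" from above: real effort, measurable local improvement, zero change in what ships.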
📘 New: The Career Guide got an upgrade
I just finished a major update to the From Developer to Architect career guide. It now includes a self-assessment rubric, a week-by-week 90-day growth plan, architecture artifact templates, and interview prep frameworks. If you're actively working toward a Staff, Tech Lead, or Architect role, this is the structured roadmap.
Free download here: https://www.techarchitectinsights.com/from-developer-to-architect-free-career-guide
🧰 Tool of the week: The 48-Hour Bottleneck Audit Checklist
Bottleneck Audit Checklist: Find your delivery constraint in one focused pass.
PR age distribution - Pull the median and p90 age of open PRs right now. If the median age is over 24 hours, code review is likely your constraint. Check if it's a people problem (too few reviewers) or a process problem (PRs too large to review quickly).
Review-to-merge lag - Measure the gap between "first review comment" and "merge." If reviews happen fast but merges don't, you have an approval bottleneck: too many required approvers, or approval authority is concentrated in one or two people.
Build and CI duration - Record wall-clock time from push to green build. If it's over 15 minutes, developers batch their pushes and context-switch away. That batching cascades into larger PRs, which worsens review time. Note whether failures are legitimate or flaky.
Deployment frequency vs. capacity - Count deployments per week. Compare to how often the team could deploy if someone asked. A big gap here signals fear, not a technical limitation. Check for heavyweight change approval processes or missing rollback confidence.
Handoff queues - Map every point where work moves from one person or team to another. Measure how long items sit in each handoff queue. The longest queue is your most likely constraint. Common offenders: QA handoff, security review, architecture sign-off.
Work-in-progress count - Count tickets currently "in progress" across the team. Divide by the number of developers. If the ratio is above 2, the team is thrashing. High WIP is both a symptom and a cause: it means the constraint downstream is starving, and people upstream are starting new work instead of helping clear the jam.
Escaped-defect cycle time - When a bug is found in staging or production, how long does the fix take to reach production? If the hotfix cycle time is dramatically faster than the normal feature cycle time, your normal pipeline has ceremony that isn't adding safety, just delay.
🔍 In practice: The team that thought they needed more developers
Scenario: A platform team of eight engineers was delivering about one feature per sprint. Management's instinct was to grow the team to twelve. The tech lead pushed back and asked for 48 hours to run the audit first.
Scope: The audit covered everything from "ticket moves to In Progress" to "code is running in production." Nothing before planning, nothing after deploy.
Context: Eight engineers, monorepo, CI on GitHub Actions, deploy via Argo CD. Two required approvers per PR.
Step 1 - PR age: Median PR age was 3.2 days. That's where the first flag went up.
Step 2 - Review lag: The first review usually came within 4 hours. But the second approval took another 2 days. Only three people had approval rights for the core module.
Step 3 - CI time: 11 minutes. Not great, but not the bottleneck.
Step 4 - Deploy frequency: The team could deploy daily but actually deployed twice a week. This was downstream of the review constraint, though, so it wasn't the binding limit.
Step 5 - Handoff queues: No formal QA gate. Handoffs were minimal.
The tradeoff we accepted: We expanded approval rights to five people, knowing that two of them were less familiar with the legacy corners of the codebase. We accepted a short-term uptick in review comments per PR as the cost of unblocking flow.
Result: Median PR age dropped from 3.2 days to 18 hours within two weeks. Sprint throughput went from one feature to roughly two and a half. No new hires needed.
✅ Do this / ❌ Avoid this
Do this:
Start with the queues. Look at where work is waiting, not where people are busy.
Timebox the audit to 48 hours. Longer than that, and it becomes a project instead of a diagnosis.
Fix the constraint, then re-measure. The bottleneck will move, and that's progress.
Avoid this:
Don't optimize CI speed when PRs are aging for days in review. You're polishing the wrong pipe.
Don't add headcount as the first response to slow delivery. More people pushing into a constrained pipeline makes the pile bigger, not faster.
Don't try to fix everything at once. Parallel improvements at non-constraints produce no throughput gain and burn goodwill.
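The headcount warning is just Little's Law, the standard queueing identity: average time in the system equals work in progress divided by throughput. A toy calculation (numbers invented for illustration) shows why pushing more work into a constrained pipeline makes everything slower:

```python
def avg_cycle_time(wip, throughput_per_week):
    """Little's Law: average time in system = WIP / throughput."""
    return wip / throughput_per_week

# A constrained pipeline finishing 3 items/week with 12 items in flight:
print(avg_cycle_time(12, 3))   # 4.0 weeks per item

# New hires start more work, but the constraint still caps throughput at 3:
print(avg_cycle_time(20, 3))   # ~6.7 weeks -- the pile got bigger, not faster
```

The only lever that shortens cycle time without cutting WIP is raising throughput at the constraint itself, which is exactly what the audit is for.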
🎯 This week's move
Pull your team's current PR age distribution. Median and p90. Write the numbers down.
Map the handoff points in your pipeline and estimate wait time at each one. Even rough numbers work.
Identify the single longest queue. That's your candidate constraint.
Propose one specific change that directly addresses that queue. Not a process overhaul. One targeted fix.
By the end of this week, aim to have a one-page document that says "Our current delivery constraint is [X], average wait time is [Y], and the proposed fix is [Z]." Share it with your team lead.
👋 Wrapping up
Your pipeline has one constraint. Not five, not "general tech debt," not "culture." One queue where work piles up right now.
Find that queue. Fix it. Then find the next one.
That's the job. Not optimizing everything. Optimizing the thing that actually matters this week.
⭐ Good place to start
I just organized all the past lessons into four learning paths. If you've missed any or want to send a colleague a structured starting point, here's the page.
Thanks for reading.
See you next week,
Bogdan Colța
Tech Architect Insights