👋 Hey {{first_name|there}},
Why this matters / where it hurts
So you migrated to microservices. It felt like real progress at the time: smaller repos, smaller deploys, and teams could finally move independently.
And then reality showed up uninvited.
Deployments started getting slower, not faster. That "small change" you wanted to ship? Turns out it needs three different services, two teams in a Zoom room, and a rollout plan that reads like a military operation. Incidents stopped looking like bugs and started looking like coordination failures. Oh, and latency? It's been creeping up because every request now takes a grand tour of your entire architecture.
For a while, I blamed the tooling. The CI pipeline is slow. Kubernetes is too complex. Our observability setup is a mess. And yeah, sometimes that's actually the problem, but not nearly often enough to explain what I kept seeing.
The real culprit is usually much simpler: you split your system into services, but the system still behaves like one giant unit. Only now all that coupling happens over the network instead of in-process. That's the distributed monolith. And honestly? It's often worse than the original monolith because you kept all the coupling and added retries, timeouts, partial failures, and version drift on top.
We're going to diagnose the coupling with a straightforward cohesion analysis, then merge the chattiest parts back together before you go splitting things up again, this time with a better rule.
🧭 Mindset shift
From: "Microservices are about nouns. Customer Service, Product Service, Order Service."
To: "Microservices are about cohesion. What changes together should run together."
Here's what happens when you split by nouns: you end up slicing a single behavior across multiple services. Checkout becomes Order + Payment + Inventory + Pricing + Promotions. On paper, each one looks clean and focused. In reality, that behavior is now a distributed transaction with network calls at every turn.
Cohesion is a much better lens. Put together the things that change together, deploy together, and fail together. Split where the change rates and operational concerns actually differ.
Two rules that'll keep you honest:
If a single user action requires 4-10 synchronous service calls, you didn't gain modularity. You just relocated the coupling.
If two services have to ship in lockstep to deliver one feature safely, they're not really separate services yet.
🧰 Tool of the week: Service Cohesion Analysis Sheet
Think of this as a one-page decision sheet for any "problem area" flow. You can run through it during a design review or as a retro after a particularly painful release.
Pick one user behavior
Name a single flow, not an entire domain. Something like "Place order," or "Upgrade plan," or "Generate invoice PDF."
Draw the call chain
List out the synchronous hops for the critical path. Count the total hops and note any fan-out. This is your baseline.
Measure chatty coupling
For each service pair in the flow, write down how many calls happen per request and what data gets passed around. If two services are exchanging 3 or more calls per request, flag it.
Check transactional boundaries
Write down where the system actually needs atomicity. If you're relying on "do A, then B, then compensate if something breaks" for core money or state changes, flag it.
Score change-rate alignment
For each service, note how often it changes relative to the others. If two services are changing together most weeks, score them as strongly coupled.
Score ownership coupling
Who owns each service? If delivering one feature requires 2 or more teams to coordinate every single time, score that boundary as weak.
Identify merge candidates
Pick the 1-2 highest-pain boundaries. These are usually the chatty pairs that share atomicity concerns and change at the same rate.
Decide on the corrective move
Choose one of these:
Merge services into one deployable unit
Keep services separate, but change the interaction to async events
Introduce a facade that owns the behavior and hides the internal calls
State your expected outcome in one line, like "reduce checkout hop count from 7 to 3."
Define success signals
Pick 3 metrics you'll watch for two weeks. Things like: deployment steps reduced, p95 latency improved, incident rate down, fewer coordinated releases, fewer cross-service rollbacks.
Add a guardrail
Write one rule that prevents you from recreating the distributed monolith. For example: "No new synchronous calls in checkout without a hop budget review."
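The counting parts of the sheet are easy to keep honest in code. Here's a minimal sketch in Python, with made-up service names, call counts, and thresholds, of how you might represent one flow and flag the hop count, the chatty pairs, and the hop budget from your trace data. Treat it as a starting point, not a tool.

```python
from collections import Counter
from dataclasses import dataclass

# One synchronous call on the critical path of a single flow.
# All names and numbers below are made up; fill them in from your own traces.
@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    calls_per_request: int

checkout_flow = [
    CallEdge("gateway", "cart", 1),
    CallEdge("cart", "pricing", 2),
    CallEdge("pricing", "promotions", 2),
    CallEdge("promotions", "catalog", 1),
    CallEdge("cart", "order", 1),
    CallEdge("order", "inventory", 1),
    CallEdge("order", "payment", 1),
]

CHATTY_THRESHOLD = 3  # flag pairs exchanging 3+ calls per request
HOP_BUDGET = 3        # the guardrail from the last step of the sheet

def total_hops(flow):
    """Total synchronous calls on the critical path."""
    return sum(edge.calls_per_request for edge in flow)

def chatty_pairs(flow):
    """Service pairs at or above the chatty threshold."""
    per_pair = Counter()
    for edge in flow:
        per_pair[(edge.caller, edge.callee)] += edge.calls_per_request
    return {pair: n for pair, n in per_pair.items() if n >= CHATTY_THRESHOLD}

hops = total_hops(checkout_flow)
print(f"hops: {hops} (budget: {HOP_BUDGET})")
print(f"chatty pairs: {chatty_pairs(checkout_flow)}")
if hops > HOP_BUDGET:
    print("over budget: consider a merge, an async edge, or a facade")
```

Run something like this during design reviews and the hop budget stops being a suggestion.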
🔍 Example: Checkout split by nouns
Scope:
The behavior we're looking at is "Place order."
Context/architecture:
We've got these services: Customer, Cart, Pricing, Promotions, Inventory, Order, Payment. The UI calls an API gateway that fans out to all of them.
Step-by-step using the sheet:
Call chain: gateway → Cart → Pricing → Promotions → Inventory → Order → Payment → Order. That's 7 hops, with a loop back at the end.
Chatty coupling: Pricing calls Promotions twice. Promotions calls the Catalog. Order calls both Inventory and Payment, then re-reads its own state. Multiple service pairs are exchanging several calls per request.
Transaction boundaries: "Place order" really needs atomicity around inventory reservation, the payment intent, and order state. We have compensations in place, but they're brittle when retries get involved.
Change-rate alignment: Pricing, Promotions, and Cart all change together every sprint because product experiments touch all three at once.
Ownership coupling: The Checkout team owns Cart. Pricing belongs to a different team. Promotions is owned by a third team. Releases require constant coordination between all three.
Merge candidates: Cart + Pricing + Promotions form one cohesive unit for the checkout behavior. Order + Inventory reservation is another natural grouping.
Corrective move: Merge Cart, Pricing, and Promotions into a single "Checkout" service that owns the whole behavior and produces an OrderDraft. Keep Inventory and Payment as external dependencies with clear, stable contracts (there's a rough sketch of this right after the walkthrough).
Expected outcome: Reduce hop count from 7 to 3. Reduce coordinated releases from 3 teams down to 1 for checkout experiments.
Success signals: p95 checkout API latency, number of services touched per feature, rollback frequency, and incidents tagged with "cross-service coordination."
Guardrail: The hop budget for checkout is 3. Any new synchronous dependency needs a review.
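To make that corrective move concrete, here's a rough sketch of what the merged boundary could look like. Everything in it (the class, the OrderDraft fields, the client interfaces) is invented for illustration; the point is that cart, pricing, and promotions become in-process modules, and only Inventory and Payment remain network hops.

```python
from dataclasses import dataclass, field

@dataclass
class OrderDraft:
    # The one artifact Checkout hands off to Order/Payment.
    customer_id: str
    line_items: list = field(default_factory=list)
    subtotal: float = 0.0
    discount: float = 0.0
    total: float = 0.0

class CheckoutService:
    """Owns the "Place order" behavior end to end.

    Cart, pricing, and promotions logic lives here as plain modules, so the
    hops they used to cost are now function calls. Inventory and Payment
    stay external, behind small clients with stable contracts.
    """

    def __init__(self, inventory_client, payment_client):
        self.inventory = inventory_client  # external dependency, one hop
        self.payment = payment_client      # external dependency, one hop

    def place_order(self, customer_id: str, cart_items: list) -> OrderDraft:
        draft = OrderDraft(customer_id=customer_id, line_items=cart_items)

        # Formerly Cart -> Pricing -> Promotions over the network; now local.
        draft.subtotal = sum(i["unit_price"] * i["qty"] for i in cart_items)
        draft.discount = self._apply_promotions(customer_id, draft.subtotal)
        draft.total = draft.subtotal - draft.discount

        # With gateway -> checkout, these are the only synchronous hops left.
        self.inventory.reserve(cart_items)
        self.payment.create_intent(customer_id, draft.total)
        return draft

    def _apply_promotions(self, customer_id: str, subtotal: float) -> float:
        # Promotion rules live next to the pricing they modify.
        return 0.0  # placeholder for real rules
```

Three hops on the critical path, one pipeline, one accountable team.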
What success looks like:
Checkout can deploy independently again. Most changes ship with one service and one pipeline. Failures isolate better because the behavior has a single, accountable owner.
Small confession:
Merging services feels like going backwards. It's not. It's paying down the debt from a bad split so you can split again later, this time without lying to yourself about the boundaries.
✅ Do this / avoid this
Do:
Split by behavior and change rate, not by nouns
Budget synchronous hops per critical flow
Merge chatty, lockstep services back into one deployable unit
Use async events where you don't need immediate consistency (see the small sketch after these lists)
Align ownership to the boundary: one behavior, one accountable team
Avoid:
Using "Customer Service, Product Service" as your default template
Distributed transactions for core flows without a clear compensating strategy
Adding services just to reduce code size while increasing coordination costs
Treating network calls as if they were in-process function calls
Shipping features that require 3 services to change in the same week
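On the async-events point from the Do list: here's a minimal sketch of the idea, with a made-up event and a stand-in publish function. Keep the calls the user is actually waiting on synchronous; everything else (receipts, loyalty points, analytics) reacts to an event and stops costing you a hop.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class OrderPlaced:
    order_id: str
    customer_id: str
    total: float
    placed_at: str

def publish(topic: str, event) -> None:
    # Stand-in for your real broker client (Kafka, SNS, whatever you run).
    print(f"publish {topic}: {json.dumps(asdict(event))}")

def complete_checkout(order_id: str, customer_id: str, total: float) -> None:
    # Inventory and payment stay synchronous: the user is waiting on them.
    # Receipt emails, loyalty points, and analytics consume this event instead.
    publish("orders.placed", OrderPlaced(
        order_id=order_id,
        customer_id=customer_id,
        total=total,
        placed_at=datetime.now(timezone.utc).isoformat(),
    ))

complete_checkout("ord-123", "cus-456", 91.80)
```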
🧪 Mini challenge
Goal: Identify one merge candidate in 45 minutes.
Pick one painful flow, the one that makes releases feel slow
Write out the synchronous hop chain and count the hops
Flag any service pair with 3 or more calls per request
Ask yourself one question: do these services change together most weeks?
Pick one boundary to fix and choose your move (merge, async, or facade)
Write down one success metric and one guardrail rule
Hit reply and tell me the flow and the hop count. One sentence is enough.
🎯 Action step for this week
Choose the top two flows that cause coordinated releases
Run a Service Cohesion Analysis Sheet for each one with the owning teams in the room
Decide on one corrective move and put it on the roadmap with an owner and a date
Add a hop budget review to your design review process for critical paths
Track one simple metric: "services touched per feature" for your top product area
By the end of this week, aim to have one merge candidate approved and scheduled, with success signals clearly defined.
👋 Wrapping up
Noun-based splits often just create chatty coupling over the network.
High hop counts usually mean you moved the monolith around instead of actually removing it.
Merge first when services change and fail together. Then you can split again with a better rule.
Measure success by fewer coordinated releases and stable p95 latency, not by how many services you have.
⭐ Most read issues (good place to start)
If you’re new here, these are the five issues readers keep coming back to:
Hit reply and tell me your biggest challenge with microservices.
Happy New Year! 🎉
As we close out 2025, I want to say thank you for being here and for letting me share these lessons with you. I hope you're spending these last days of the year with the people who matter most, your family, your friends, the ones who remind you there's more to life than deployment pipelines and service meshes.
Here's to 2026. A year for building systems that scale, yes, but also for building businesses that grow, teams that thrive, and making the kind of impact that actually matters. Whether you're preparing to scale up, level up, or just ship something you're proud of, I'm excited to be on this journey with you.
Take care of yourself and the people you love. I'll see you in the new year with more practical lessons and hopefully a few wins to celebrate together.
Cheers to what's ahead,
Bogdan Colța
Tech Architect Insights