👋 Hey {{first_name|there}},

Your architecture has a cost shape, whether or not anyone has drawn it. This week, the review card I use to find the line items nobody assigned to anyone, plus the story of the architect who used something like it to save his company $140,000 a year on a Wednesday afternoon.

Why this matters / where it hurts

I watched an architect save his company about $140,000 a year by reading the AWS bill line by line, on a Wednesday afternoon, while the rest of his team was in a planning meeting. He brought a printed copy. I had never seen anyone do that before, and I think I will remember it for a while. He had circled one line: NAT gateway egress, a little over $11,000 a month. He could not explain which decision had produced that number, and nobody else in the room could either.

That bill had been arriving for a year and a half. The architect of the system had changed twice during that time. The team had shipped four major features. Cost was on nobody's review checklist, partly because nobody owned it, and partly because architecture documents do not have a cost section. They have boxes and arrows. Cost lives in a different document, with a different audience, and the people designing the system rarely read the document that the people paying for it are reading. The gap between them is where most of the money quietly disappears, in my experience.

In Lesson #21 on SLOs and error budgets, we covered how reliability becomes a number you can spend down. Cost works the same way. Most teams I have seen have just not done the work to give it a number, and so it floats around the design unowned, like an undeclared dependency.

🧭 The shift

From: Cost is a finance problem, or a platform team problem, or a thing you clean up once a quarter.
To: Cost is a quality attribute. It belongs in the architecture review, on the same page as latency and reliability.

Reliability gets reviewed because it has SLOs and somebody on the hook for the p95. Cost usually gets reviewed once a quarter, after the bill arrives, by people who were not in the room when the architecture was decided. That timing alone is most of the problem. The team that wrote the cost into the system is not the team that finds it later.

The teams I have seen handle this well treat cost the way they treat latency. Every review answers, in writing, how much the proposed change will cost in steady state, what the per-request and per-customer impact looks like, and what the cleanup plan is if the assumptions turn out to be wrong. The teams that do not do this end up with one engineer reading the bill on a Wednesday afternoon. That is a fine fallback. The architect on the next system might not be that thorough.

A few defaults I now hold pretty firmly:

  • Treat cost as a non-functional requirement on every design doc, alongside reliability and performance. One paragraph minimum, with a number in it.

  • Tag everything you provision with a service dimension, a team dimension, and a customer or segment dimension. If a resource is not tagged, nobody is going to notice when it doubles in size.

  • Read the cloud bill once a month, with the architects in the room, not just finance. Half an hour. The first one is awkward. The third one starts paying for itself.
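The tagging default is easy to audit mechanically. Below is a minimal sketch, assuming you can export each resource's tags as a dictionary from your provider's tagging or inventory tooling. The tag keys `service`, `team`, and `customer` and the function `missing_tags` are illustrative names, not a provider API.

```python
# Hypothetical tag keys: rename them to match your own tagging policy.
REQUIRED_TAGS = ("service", "team", "customer")


def missing_tags(resources):
    """Return {resource_id: [missing keys]} for every resource lacking a required tag.

    `resources` maps a resource ID to its tag dictionary, for example as
    exported from your provider's tagging or inventory tooling. An empty
    tag value counts as missing.
    """
    report = {}
    for resource_id, tags in resources.items():
        missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
        if missing:
            report[resource_id] = missing
    return report


if __name__ == "__main__":
    # Made-up inventory for illustration.
    inventory = {
        "i-0abc": {"service": "ingest", "team": "data", "customer": "acme"},
        "i-0def": {"service": "query", "team": "data"},
        "vol-123": {},
    }
    for resource_id, keys in missing_tags(inventory).items():
        print(f"{resource_id}: missing {', '.join(keys)}")
```

Resources shared across customers, such as a common database, still need some value. A convention like `customer=shared` keeps them visible instead of silently untagged.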

📘 Free Career Guide

The From Developer to Architect career guide just got a major update. It now includes a self-assessment rubric, a 90-day growth plan, architecture artifact templates, and interview prep frameworks. If you are working toward a Staff, Tech Lead, or Architect role, this is the structured roadmap.

🧰 Tool of the week: Cost Architecture Review Card

Find the spend that nobody assigned to anyone.

For each dimension, ask the question and answer honestly. Count the dimensions where the answer shows real pressure. Score at the bottom.

  1. Spend attribution. Can you, in under five minutes today, answer "what does this service cost" and "what does this customer cost"? If the answer requires a finance person and a spreadsheet, that is pressure. Most teams I have worked with discover their tagging story is patchy the first time someone actually tries to use it.

  2. Unit economics in the design doc. Pick three features your team shipped in the last quarter. For each, find the written estimate of cost per request or cost per customer that was made before the feature shipped. Two or three of them: clean. One or none: pressure. Almost every team lands in pressure on this one the first time. It is a habit nobody has built yet.

  3. Idle and orphaned resources. Open the console. Look for staging environments older than six months with no clear owner, snapshots that have outlived the project, and scheduled jobs whose triggers no longer fire. If you find more than one or two, count it as pressure.

  4. Egress and inter-service traffic. Look at your top three line items. If any of them are NAT gateway, cross-AZ, or cross-region transfer, and your architecture diagram does not show those flows explicitly, that is pressure. Compare what is in the diagram to what is on the bill. They should match. They usually do not.

  5. Storage tiering and retention. How much of your storage spend is on data older than 90 days that nobody has read? If you do not know, that is pressure. Pick a sample bucket and check the access logs before the next review.

  6. Top-customer profitability. For your top five customers by infrastructure cost, do you know whether they are profitable at the per-customer level? If you have not run that analysis in the last twelve months, that is pressure. I have seen this dimension move slowly until it suddenly does not.

  7. Cost in the review checklist. Pull the last three architecture review documents. Count the ones with a cost section that includes a number, not a sentence. Two or three: clean. One or none: pressure. Of all seven dimensions, this is usually the easiest one to fix.
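Once costs carry tags, dimensions 1 and 6 come down to the same grouping query. Here is a minimal sketch, assuming billing line items as dictionaries with a `cost` and a `tags` map, plus monthly revenue per customer from another system. `cost_by` and `unprofitable_customers` are hypothetical helpers, and the figures are made up. It only counts directly tagged spend, so a real per-customer analysis would also need a rule for splitting shared costs.

```python
from collections import defaultdict


def cost_by(line_items, dimension):
    """Sum cost per value of one tag dimension.

    Spend without that tag lands under "UNTAGGED"; the size of that bucket
    is a rough measure of how patchy the tagging is.
    """
    totals = defaultdict(float)
    for item in line_items:
        totals[item["tags"].get(dimension, "UNTAGGED")] += item["cost"]
    return dict(totals)


def unprofitable_customers(line_items, revenue):
    """Customers whose directly attributed infrastructure cost exceeds their revenue.

    Untagged (shared) spend is ignored here, so this understates cost.
    """
    per_customer = cost_by(line_items, "customer")
    return sorted(
        customer
        for customer, cost in per_customer.items()
        if customer in revenue and cost > revenue[customer]
    )


if __name__ == "__main__":
    # Made-up monthly figures for illustration.
    items = [
        {"cost": 1200.0, "tags": {"service": "query", "customer": "acme"}},
        {"cost": 300.0, "tags": {"service": "ingest", "customer": "globex"}},
        {"cost": 500.0, "tags": {"service": "query"}},
    ]
    revenue = {"acme": 1000.0, "globex": 900.0}
    print(cost_by(items, "service"))   # {'query': 1700.0, 'ingest': 300.0}
    print(cost_by(items, "customer"))  # {'acme': 1200.0, 'globex': 300.0, 'UNTAGGED': 500.0}
    print(unprofitable_customers(items, revenue))  # ['acme']
```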

Scoring: 0 to 2 dimensions under pressure means your cost discipline is roughly in place; run this card once a quarter. 3 to 4 means you have real cost debt: pick the highest-pressure dimension and fix it before the next review cycle. 5 or more, stop. Schedule the afternoon and read the bill the way the architect in the opening story did.
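If you want to keep the card's results alongside your review notes, the scoring rule fits in a few lines. This is a minimal sketch: the dimension keys and `score_card` are illustrative names, and the thresholds are the card's own.

```python
def score_card(pressure):
    """Turn the seven card answers into the card's verdict.

    `pressure` maps each dimension name to True if it is under pressure.
    """
    if len(pressure) != 7:
        raise ValueError("answer all seven dimensions")
    count = sum(1 for under_pressure in pressure.values() if under_pressure)
    if count <= 2:
        verdict = "roughly in place: run the card again next quarter"
    elif count <= 4:
        verdict = "real cost debt: fix the highest-pressure dimension first"
    else:
        verdict = "stop: schedule the afternoon and read the bill"
    return count, verdict


if __name__ == "__main__":
    # The B2B analytics case study that follows, scoring its "sort of"
    # dimension as not under pressure.
    answers = {
        "spend_attribution": True,
        "unit_economics": True,
        "idle_resources": True,
        "egress": True,
        "storage_tiering": True,
        "top_customer_profitability": False,
        "cost_in_review": True,
    }
    print(score_card(answers))  # (6, 'stop: schedule the afternoon and read the bill')
```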

🔍 In practice: A B2B analytics SaaS that almost stopped being profitable

Scenario: A 25-engineer team running a B2B analytics product for about 30 enterprise customers. AWS bill grew 40% year over year. Customer count grew 15%. The CFO started asking gentle questions. The principal engineer ran the card.

  • Scope: AWS spend only. Third-party SaaS and API costs are out of scope for this pass.

  • Context: Four services, single region, AWS native, no dedicated platform engineer.

  • Spend attribution: Tagged by service, not by customer. Per-customer cost was a guess. Pressure.

  • Unit economics: The last four shipped features had no cost estimate in the design doc. The "advanced filters" feature was driving 18% of new query cost on its own. Nobody had predicted this. Pressure.

  • Idle resources: Three staging environments belonging to engineers who had left, two RDS snapshots from 2024, and one Lambda triggered by a CloudWatch event that no longer fired. About $2,400 a month combined. Pressure.

  • Egress: Cross-AZ traffic between materialized view writers and read replicas was the second-largest line item. Not on the diagram. Pressure.

  • Storage: A telemetry pipeline retained raw events for 365 days because the original spec said so. About 70% of it was untouched after 14 days. Pressure.

  • Top-customer profitability: Three customers were costing more in infrastructure than they were paying. One was a household name kept for marketing reasons. Two were on legacy plans grandfathered from 2022. The team knew this in the abstract but had never made a deliberate call. Pressure, sort of.

  • Cost in design review: Five recent docs. Zero cost sections. Pressure.

Six dimensions under pressure, plus a soft seventh. They stopped and ran a focused review.

The thing they got wrong: they assumed one of the orphaned environments was safe to kill. It was running a nightly job that fed a sales dashboard nobody had documented. They caught it the next morning, brought it back inside an hour, and added the dashboard to the catalog with a clear owner. Embarrassing in the standup. Not the worst lesson to learn that way.
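The idle-resource check from dimension 3 can be scripted, with this story's lesson built in: the script proposes and a human decides. A minimal sketch, assuming an inventory where each resource carries a creation date and an `owner` value, and a list of people still on the team. The field names and `cleanup_candidates` are hypothetical, and 180 days stands in for the card's six months.

```python
from datetime import date, timedelta


def cleanup_candidates(resources, active_owners, today, max_age_days=180):
    """List resources older than `max_age_days` whose owner is missing or has left.

    This only produces a review list. It deletes nothing, because an
    "orphaned" resource may still feed something undocumented.
    """
    cutoff = today - timedelta(days=max_age_days)
    flagged = []
    for resource in resources:
        owner = resource.get("owner")
        orphaned = owner is None or owner not in active_owners
        if resource["created"] < cutoff and orphaned:
            flagged.append(resource["id"])
    return flagged


if __name__ == "__main__":
    # Made-up inventory for illustration.
    inventory = [
        {"id": "staging-alice", "created": date(2024, 1, 10), "owner": "alice"},
        {"id": "staging-bob", "created": date(2025, 5, 1), "owner": "bob"},
        {"id": "staging-legacy", "created": date(2023, 11, 2), "owner": None},
    ]
    print(cleanup_candidates(inventory, {"bob"}, today=date(2025, 6, 1)))
    # ['staging-alice', 'staging-legacy']
```

Before removing anything on the list, check for downstream consumers: scheduled jobs, dashboards, or other services reading its output.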

The tradeoff they accepted: the customer profitability problem did not get solved in the same quarter. The CRO had real reasons: renewal timing, contract terms, and the brand value of one logo. The architecture team accepted that the card had surfaced the problem honestly and that solving it belonged in a different forum. That conversation is still going.

Result: about $96,000 in annualized run-rate savings in the first quarter. Most of it came from killing orphaned environments, replacing the 365-day retention policy with 90 days hot plus 12 months in S3 Standard-Infrequent Access (S3 IA), and moving the materialized view writers into the same AZ as the readers. They also caught one architecture decision in flight, a planned multi-region replication, and chose a cheaper async approach before it shipped. The follow-on quarters added more, but the first cut was the one that got attention.
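A retention change like this one usually ends up as a few lines of configuration. Here is a minimal sketch, assuming the raw telemetry events land in one S3 bucket under a single prefix. The `raw-events/` prefix and rule ID are hypothetical, and "90 days hot plus 12 months in S3 IA" is read as S3 Standard for 90 days, then Standard-IA, then expiry 365 days later. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` accepts.

```python
import json


def telemetry_lifecycle(prefix="raw-events/", hot_days=90, ia_days=365):
    """Build an S3 lifecycle configuration: S3 Standard for `hot_days`,
    then Standard-IA for `ia_days`, then expiry."""
    return {
        "Rules": [
            {
                "ID": "telemetry-tiering",          # hypothetical rule name
                "Filter": {"Prefix": prefix},       # only the raw event objects
                "Status": "Enabled",
                "Transitions": [{"Days": hot_days, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": hot_days + ia_days},  # days since creation
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(telemetry_lifecycle(), indent=2))
```

Standard-IA bills each object as at least 128 KB and for at least 30 days, so very small per-event objects are worth batching before they are tiered.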

Do this / Avoid this

Do this:

  • Add a cost section to the architecture review template. Require a number, not a sentence.

  • Tag every resource with a service, a team, and a customer dimension at provisioning time. Retroactive tagging projects rarely finish.

  • Read the bill once a month with the architects in the room.

Avoid this:

  • Treating cost as a quarterly cleanup. By the time the cleanup happens, the architecture has hardened around the spend.

  • Letting "we will optimize later" stand as a design decision. Later is a bill.

  • Optimizing the wrong dimension. As long as the top three line items make up more than a third of the bill, a 30% cut on them beats a 10% cut spread across everything.
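To see how the two strategies compare on your own bill, here is a quick sketch with made-up monthly numbers. `compare_cuts` is a hypothetical helper, and its 30% and 10% defaults come from the bullet above.

```python
def compare_cuts(line_item_costs, top_n=3, targeted_cut=0.30, flat_cut=0.10):
    """Savings from cutting the top `top_n` items by `targeted_cut`
    versus cutting every item by `flat_cut`."""
    ranked = sorted(line_item_costs, reverse=True)
    targeted = sum(ranked[:top_n]) * targeted_cut
    flat = sum(ranked) * flat_cut
    return targeted, flat


if __name__ == "__main__":
    # Made-up monthly line items, in dollars.
    bill = [11000, 6000, 4000, 2000, 1500, 1000, 500]
    targeted, flat = compare_cuts(bill)
    print(f"targeted: ${targeted:,.0f}, flat: ${flat:,.0f}")
    # targeted: $6,300, flat: $2,600
```

On a flat bill where no line dominates, the result flips. That flip is the reason to look at the distribution before picking a strategy.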

🎯 This week's move

Pull your latest cloud bill. Find the top three line items by spend. For each, write one sentence answering two questions: which architecture decision produced this cost, and which person on the team can name it as theirs. If you cannot answer either question for any of the three, you have your starting point. Bring those three lines to your next architecture review. Watch what changes when cost has a face in the room.

By the end of this week, aim to: have your top three cloud line items mapped to specific architecture decisions and named owners.
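If the bill is long, a few lines of script will surface the top three. Below is a minimal sketch, assuming you have exported line-item billing data to CSV (on AWS, for example, a Cost and Usage Report or a Cost Explorer download). Column names differ between export formats, so the grouping and cost columns are parameters. The sample rows and usage-type names are made up. Grouping by usage type rather than by service tends to surface items like NAT gateway traffic that a service-level view lumps together.

```python
import csv
import io
from collections import defaultdict


def top_line_items(rows, key_field, cost_field, n=3):
    """Sum cost per line item (grouped by `key_field`) and return the n largest."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += float(row[cost_field])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def worksheet(top_items):
    """One line per item, with the two questions left blank for the team."""
    return [
        f"{name} (${cost:,.2f}): decision = ?, owner = ?"
        for name, cost in top_items
    ]


if __name__ == "__main__":
    # Made-up export; real column names depend on how you export the bill.
    sample = io.StringIO(
        "usage_type,cost\n"
        "NatGateway-Bytes,6200.50\n"
        "DataTransfer-Regional-Bytes,3100.00\n"
        "NatGateway-Bytes,4900.00\n"
        "TimedStorage-ByteHrs,2500.25\n"
        "BoxUsage:m5.xlarge,1800.00\n"
    )
    rows = csv.DictReader(sample)
    for line in worksheet(top_line_items(rows, "usage_type", "cost")):
        print(line)
```

The blanks are the point. Filling in "decision" and "owner" is the conversation to bring to the review.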

👋 Wrapping up

Cost is a quality attribute, the same as latency and reliability. Architecture decisions write the bill. Make sure somebody reads it back to the architects, ideally before it has been arriving for a year and a half.

Most of the cost in your architecture is sitting in a line item nobody has read out loud yet. Read it out loud.

Want the full path to architect?

This issue covers one slice of the broader transition. If you want the structured roadmap, the From Dev to Architect 5-day email course walks through the foundations: systems thinking, decision-making under uncertainty, communicating tradeoffs, and the operational instincts that separate senior developers from architects.

Hit reply and tell me in one sentence: what is the largest line item on your current cloud bill, and can you name the architecture decision that produced it?

Help a friend think like an architect

Know someone making the jump from developer to architect? Forward this email or share your personal link. When they subscribe, you unlock rewards.

🔗 Your referral link: {{rp_refer_url}}

📊 You've referred {{rp_num_referrals}} so far.
Next unlock: {{rp_next_milestone_name}} referrals → {{rp_num_referrals_until_next_milestone}}

View your referral dashboard

P.S. I’m still working on two new rewards. If there’s something you are interested in, let me know 😉

⭐ Good place to start

I just organized the lessons into four learning paths. If you've missed any or want to send a colleague a structured starting point, here's the page.

Thanks for reading.

See you next week,
Bogdan Colța
Tech Architect Insights
