Your AI Stack Needs a Control Plane
I’ve been away for a while. A health scare last year forced me to step back from work and writing for the better part of 12 months. I shared the full story on LinkedIn a few months ago. I’ve been back for a few months, feeling strong, building again, and yes, starting to write again.
To say that a lot has changed in the last 6-9 months would be an understatement. Today, every company I talk to is building agents, shipping copilots, and integrating LLMs into production systems. The speed of adoption has been staggering. But here’s what I keep seeing: the demo works, the CEO is excited, the board is bought in — and then it hits production, and nobody can explain the invoice.
I’ve written before about why CTOs should care about gross margins and cost-to-serve. At Krux, we built a culture of frugality around AWS costs. CTS was a first-class citizen — a product feature, not an afterthought. We could tell you exactly what it cost to serve each customer and why. We used Spot instances before most people knew they existed. We redesigned core algorithms to make fewer passes over data. Every engineer on my team cared about margins, not just the platform team.
That playbook assumed a world where your cost-to-serve was primarily cloud infrastructure: compute, storage, bandwidth. Predictable. Optimizable. Measurable.
LLMs broke that model.
The New Cost-to-Serve Problem
Unlike SaaS, where you benefit from economies of scale (cost-to-serve grows sublinearly with the number of customers), an AI-native application's costs scale at least linearly with customers and usage — there's no flattening curve. Moreover, these expensive LLM API calls are also your least observable. A single LLM call can cost a dollar. Multiply that across features, users, and experiments, and suddenly AI spend is one of the fastest-growing line items on the P&L — and nobody knows which team, feature, or experiment is driving it.
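The back-of-envelope arithmetic makes the contrast concrete. All dollar figures below are illustrative assumptions, not measured rates from any provider:

```python
# Illustrative cost-to-serve arithmetic. The fixed/marginal/per-call
# numbers are made up for the sketch -- plug in your own.

def saas_cost_to_serve(customers: int, fixed_infra: float = 5_000.0,
                       marginal: float = 2.0) -> float:
    """Classic SaaS: a mostly fixed platform cost plus a small marginal
    cost per customer, so per-customer cost falls as you grow."""
    return fixed_infra + marginal * customers

def ai_native_cost_to_serve(customers: int, calls_per_customer: int = 500,
                            cost_per_call: float = 0.05) -> float:
    """AI-native: every customer interaction triggers metered LLM calls,
    so total cost scales with usage and per-customer cost stays flat."""
    return customers * calls_per_customer * cost_per_call

for n in (100, 1_000, 10_000):
    print(f"{n:>6} customers: SaaS ${saas_cost_to_serve(n):>9,.0f}  "
          f"AI-native ${ai_native_cost_to_serve(n):>9,.0f}")
```

With these toy numbers, SaaS per-customer cost drops from $52 to $2.50 as you grow 100x, while the AI-native per-customer cost sits at $25 no matter the scale. Scale doesn't save you; only per-call discipline does.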
At Krux, I could look at our AWS bill and trace every dollar back to a processing job, a customer, a product decision. With LLM calls? Total darkness.
It gets worse. Most teams hardcode a model (“Anthropic just shipped Claude Sonnet 4.6, and we’re using it everywhere!”) and apply it to every task regardless of complexity. Classification that returns one of five labels? Sonnet. Extracting a date from an email? Sonnet. Summarizing a paragraph? Sonnet. It works, but it’s like using a Ferrari to go get groceries.
And when that single provider goes down, is slow, or hits a rate limit? Your application stops working. No failover, no fallback, no recourse.
This isn’t a hypothetical. Across the super{set} portfolio, we kept running into the same three gaps:
The attribution gap. You can see aggregate spend on your provider dashboard. You can’t see which feature, workflow, or customer is driving it. Knowing you spent $2,400 on Anthropic last month doesn’t tell you what to do differently.
The optimization gap. You suspect a cheaper model could handle 80% of your calls. But you have no way to test that hypothesis with your actual production data and prompts. You’re guessing based on benchmarks, not evidence.
The resilience gap. Single provider, single point of failure. No automatic failover, no policy enforcement.
We built observability for everything else — databases, microservices, API endpoints. Your LLM calls deserve the same.
Introducing Majordomo
So we built Majordomo: an open-source control plane for your AI stack. It gives you visibility and control over every LLM call: cost attribution, multi-provider routing, cascade failover, and the data you need to actually optimize model selection.
Majordomo is two composable projects, each usable independently or together:
The Gateway is a lightweight API proxy written in Go. Deploy it once, point your existing LLM calls at it, and you get automatic cost calculation, full request/response logging, and custom metadata tagging. Tag every call with the feature, user, team, or experiment that triggered it. No SDK changes, no code refactoring — one header added to your existing HTTP calls.
The LLM Library is a Python async client that gives you a unified interface across providers (OpenAI, Anthropic, Gemini, DeepSeek, Cohere) with per-request cost tracking baked into every response object. It also gives you cascade failover: define a priority list of provider-model pairs, and if your primary errors out, Majordomo automatically tries the next one.
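The cascade idea itself is simple enough to sketch in a few lines. This is a minimal illustration of the pattern, not the Majordomo library's actual API — the function names and shapes here are assumptions:

```python
import asyncio

# Cascade failover, sketched: try each (provider, model) pair in priority
# order until one succeeds. Not the Majordomo API -- just the pattern.

async def call_with_cascade(prompt, cascade, call_fn):
    """cascade: ordered list of (provider, model) pairs.
    call_fn: async function performing a single provider request."""
    errors = []
    for provider, model in cascade:
        try:
            return await call_fn(provider, model, prompt)
        except Exception as exc:  # rate limit, timeout, outage...
            errors.append((provider, model, repr(exc)))
    raise RuntimeError(f"All providers failed: {errors}")

# Demo with a fake call function in which the primary provider is "down".
async def fake_call(provider, model, prompt):
    if provider == "anthropic":
        raise TimeoutError("provider outage")
    return f"{provider}/{model}: ok"

cascade = [("anthropic", "claude-sonnet-4-6"), ("openai", "gpt-4o-mini")]
result = asyncio.run(call_with_cascade("hello", cascade, fake_call))
print(result)  # falls through to the second pair
```

When the primary errors out, the request lands on the next pair in the list instead of surfacing a 5xx to your user — the "no failover, no fallback" failure mode disappears.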
The real payoff is what you can do once you have this data. Majordomo includes a replay tool that lets you take logged production requests, replay them against a different model, and compare outputs — including an LLM-as-judge for semantic equivalence. If Haiku agrees with Sonnet 98% of the time on your document classification task, you switch and save 90% on that feature’s LLM costs. If it doesn’t, you know exactly where and why. You’re testing with your actual data, not someone else’s benchmark.
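The switch decision reduces to simple arithmetic once the judge has rendered its verdicts. A sketch of that decision step — not the Majordomo replay tool itself, and the prices are illustrative assumptions, not real rate cards:

```python
# Given LLM-as-judge verdicts from replaying logged requests against a
# cheaper model, decide whether the switch clears your quality bar.

def switch_recommendation(verdicts, cheap_cost, expensive_cost,
                          threshold=0.95):
    """verdicts: list of bools, one per replayed request, True when the
    judge scored the cheap model's output semantically equivalent."""
    agreement = sum(verdicts) / len(verdicts)
    savings = 1 - cheap_cost / expensive_cost
    return {
        "agreement": agreement,
        "savings_pct": round(savings * 100),
        "switch": agreement >= threshold,
    }

# 98 of 100 replayed classification requests judged equivalent;
# per-token prices below are made-up numbers for the sketch.
print(switch_recommendation([True] * 98 + [False] * 2,
                            cheap_cost=0.25, expensive_cost=3.00))
```

The valuable part isn't the arithmetic — it's that the verdicts come from your own logged production traffic, so the agreement number means something for your workload specifically.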
Everything is open source under the MIT license. Details, docs, and code are at majordomo.superset.com. For the full technical walkthrough, read Your AI Stack Needs a Control Plane on the Majordomo blog.
Why Open Source, and Why Now
Majordomo is an open-source project from super{set}, not a company. We built it because we needed it across our own portfolio, and we think every team building with LLMs needs something like it. The operational gap is too wide and too universal to keep proprietary.
But Majordomo is just one piece of the puzzle. A control plane gives you visibility and cost discipline for your model operations. It doesn’t solve the harder enterprise-grade AI challenges: how you architect your data layer, how you build a knowledge graph that compounds over time, how you handle governance and auditability in regulated environments, how you manage the boundary between AI autonomy and human oversight.
Those are the topics I’ll be tackling in the next posts in this series. Stay tuned.