
Build vs. Rent: When Proprietary AI Infrastructure Becomes A Strategic Asset

Most companies should rent AI capability until they hit a specific volume and data threshold. Here is the math, the inflection points, and the failure modes I see most often.

There is a question I get from operators every week: should we build our own AI stack, or should we just call OpenAI? Most of the time the answer is rent. The default for ninety percent of companies should be to rent capability from a frontier provider, wrap it in a thin orchestration layer, and ship. Building your own infrastructure before you have a reason to is one of the most expensive forms of theater in modern software.

But there is a real inflection point where renting starts to bleed margin, signal, and optionality. The teams that recognize it and execute pull away from competitors that keep paying the per-token tax forever. The teams that misread it waste eighteen months and a senior engineering org rebuilding what GPT already does, and end up worse on every dimension that matters.

This is the framework I use to decide.

The default is rent

If you are not yet at product-market fit, or you are below roughly a hundred thousand model calls per day, or your use case is something a frontier model handles well out of the box, you should be renting. Period. The cost of a senior ML engineer for one year is roughly four hundred thousand dollars all in. The cost of two of them, plus a research engineer, plus the GPU bill, plus the data work, plus the eval system, is north of two million dollars per year before you ship anything that beats GPT-4-class output.

For two million dollars you can buy a lot of inference. At current pricing on a frontier API, that budget covers somewhere in the neighborhood of fifty to a hundred million high-quality completions a year, depending on token mix. If your business cannot generate measurable revenue with a hundred million completions, the bottleneck is not your AI stack. The bottleneck is product or distribution.
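To make the arithmetic concrete, here is a back-of-envelope version in Python. The per-token prices and the token mix are illustrative assumptions, not quoted rates; plug in your provider's current list pricing before leaning on the output.

```python
# Back-of-envelope: how many completions does $2M/year of frontier API spend buy?
# Prices and token mix below are ASSUMPTIONS for illustration, not quoted rates.

ANNUAL_BUDGET_USD = 2_000_000

# Assumed frontier-API pricing, USD per million tokens.
PRICE_IN_PER_MTOK = 3.00
PRICE_OUT_PER_MTOK = 15.00

# Assumed token mix for one "high-quality completion."
TOKENS_IN = 2_000   # prompt plus retrieved context
TOKENS_OUT = 800    # generated output

cost_per_completion = (
    TOKENS_IN / 1e6 * PRICE_IN_PER_MTOK
    + TOKENS_OUT / 1e6 * PRICE_OUT_PER_MTOK
)
completions_per_year = ANNUAL_BUDGET_USD / cost_per_completion

print(f"cost per completion: ${cost_per_completion:.4f}")
print(f"completions per year: {completions_per_year:,.0f}")
# With these assumptions: ~$0.018 per completion, ~111M completions a year.
# Heavier prompts or pricier models push that toward the 50M end of the range.
```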

Renting also buys you something engineers undervalue: the right to change your mind. The provider market is still reshuffling. Frontier capability per dollar has dropped roughly an order of magnitude every eighteen months for the last three years. The team that bet the company on a fine-tuned 13B model in 2023 is now stuck maintaining it while their competitor swaps the underlying model quarterly and rides the curve.

Why renting hits a ceiling

Renting hits a wall in a small number of specific places. Almost always, it caps out for one of three reasons.

The first is unit economics. When inference cost crosses about thirty percent of contribution margin per transaction, you have a problem. At that point every percentage point of conversion you add gets clawed back by the inference bill. I have watched two companies grow revenue thirty percent year over year while gross margin compressed by twelve points over the same window. That is not a strategy. That is a treadmill.
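A stylized version of that compression, with made-up round numbers rather than either company's actual figures:

```python
# Illustrative only: how a fast-growing inference bill compresses gross margin
# even as revenue grows. All figures are invented round numbers.

def gross_margin_pct(revenue, cogs_other, inference_cost):
    """Gross margin as a percentage of revenue."""
    return (revenue - cogs_other - inference_cost) / revenue * 100

# Year 1: $10M revenue, $2.5M non-AI COGS, $1.0M inference.
y1 = gross_margin_pct(10_000_000, 2_500_000, 1_000_000)

# Year 2: revenue grows 30%, non-AI COGS scales with revenue,
# but inference grows much faster (heavier usage per user).
y2 = gross_margin_pct(13_000_000, 3_250_000, 2_860_000)

print(f"year 1 gross margin: {y1:.1f}%")   # 65.0%
print(f"year 2 gross margin: {y2:.1f}%")   # 53.0%
print(f"compression: {y1 - y2:.1f} points while revenue grew 30%")
```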

The second is latency. Frontier APIs are getting faster, but a round trip through someone else's data center, plus their queue, plus their token streaming will rarely come in under 400 to 600 milliseconds end to end for a non-trivial completion. If your product is interactive (voice, real-time bidding, in-game agents, in-call assist), that is a ceiling you cannot break by paying more. You break it by owning the inference path.

The third is the data flywheel. If your product generates proprietary signal (labeled outcomes, expert corrections, structured behavioral data that nobody else has), then every prompt you send to a vendor without feeding the result back into a system you own is a prompt you are paying for and getting nothing back from. The asymmetry compounds. After eighteen months, a competitor with the same product but a closed-loop training system is operating on a model that knows things yours never will.

Three signals it is time to build

I look for three signals. Any one of them, on its own, is not enough. Two of them in combination should put you on a build path within six months.

Signal one: a real data moat. You are sitting on millions of high-quality outcome-labeled examples that no public dataset contains. Not chat logs. Not scraped pages. Outcome data — the kind that closes the loop between a decision and a result. If you have this, you have an asset that compounds in ways a vendor cannot match.

Signal two: a binding latency requirement. Your product breaks above a specific p95 latency, and you are at or near it on a frontier API. If your spec is sub-200ms p95 and the network alone eats 80ms, you are not going to fix that with prompt engineering.

Signal three: a unit economics gap. Inference cost as a percentage of contribution margin is rising and projected to cross your gross margin floor within twelve to eighteen months. That is a quantitative trigger, not a vibe.
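A minimal sketch of that trigger, with the starting share, the trend, and the floor as assumed inputs you would replace with your own:

```python
# Minimal sketch of signal three: project inference cost as a share of
# contribution margin and find the quarter it crosses your floor.
# All three inputs are assumptions; plug in your own numbers.

INFERENCE_SHARE_TODAY = 0.22      # inference cost / contribution margin, today
SHARE_GROWTH_PER_QUARTER = 0.02   # trending up two points a quarter
FLOOR = 0.30                      # the share at which unit economics break

share = INFERENCE_SHARE_TODAY
for quarter in range(1, 13):      # look ahead three years
    share += SHARE_GROWTH_PER_QUARTER
    if share >= FLOOR:
        print(f"trigger crossed in quarter {quarter} ({share:.0%})")
        break
else:
    print("no crossing within 12 quarters; keep renting")
```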

The right time to build is when you can write a one-page memo with the numbers, and the numbers force the decision. If the memo requires hand-waving, you are not there yet.

What "build" actually means in 2026

The biggest mistake I see is conflating "build AI infrastructure" with "train a foundation model." For almost every company on earth, training your own foundation model is malpractice. It is also not what "build" means anymore.

Build, in 2026, means three concrete things.

Own the inference path. That means hosting open-weight models — Llama, Qwen, Mistral, DeepSeek, whatever the current frontier of open weights looks like when you read this — on your own GPU fleet or on rented but dedicated capacity. It means controlling the runtime, the batching, the quantization choices, the KV cache strategy. It means the round trip from your user to the model is yours, including the latency, and you can profile and optimize every step. In my experience you can typically get to roughly thirty to forty percent of frontier-API cost at comparable quality on a well-tuned vLLM or SGLang deployment, with p50 latency under 150ms for a 70B-class model on H100s.
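As a sketch of what owning that path looks like, here is a minimal vLLM deployment using its offline Python API. The model name, parallelism, and cache settings are assumptions for a 70B-class model on a four-GPU H100 node, not a tuned production config.

```python
# Minimal vLLM deployment sketch: open weights, your GPUs, your knobs.
# Model choice and settings are assumptions, not a tuned production config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # assumed open-weight model
    tensor_parallel_size=4,        # shard weights across the 4 GPUs
    gpu_memory_utilization=0.90,   # leave headroom; the rest feeds the KV cache
    max_model_len=8192,            # cap context to keep the KV cache dense
)

params = SamplingParams(temperature=0.2, max_tokens=512)

# vLLM does continuous batching internally; hand it requests in bulk.
outputs = llm.generate(
    ["Summarize this intake note for the care team: ..."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```

In production you would more likely run vLLM's OpenAI-compatible server behind your own gateway rather than the offline API, but the point stands either way: batching, quantization, and KV cache policy become knobs you control and can profile.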

Own the eval system. Off-the-shelf evals are noise. Build a golden set drawn from your own production traffic, label it with the people who actually understand the failure modes, and run every model change against it before it ships. Treat the eval system like CI. If it does not block deploys, it does not exist.
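A minimal sketch of evals-as-CI, assuming a JSONL golden set; call_model is a hypothetical stand-in for your inference client, and the scoring rule is a crude placeholder, since real scoring is task-specific.

```python
# Evals-as-CI sketch: score the candidate model on a golden set and exit
# nonzero on regression, so a failing eval blocks the deploy like a failing
# test. call_model() and passes() are hypothetical stand-ins.
import json
import sys

BASELINE_PASS_RATE = 0.91  # pinned from the last model that shipped

def call_model(prompt: str) -> str:
    """Hypothetical stand-in: route to the model build under test."""
    return ""  # replace with your inference client

def passes(expected: str, actual: str) -> bool:
    """Crude placeholder scoring; real scoring is task-specific."""
    return expected.strip().lower() in actual.strip().lower()

def main() -> int:
    with open("golden_set.jsonl") as f:  # one {"prompt", "expected"} per line
        cases = [json.loads(line) for line in f]
    hits = sum(passes(c["expected"], call_model(c["prompt"])) for c in cases)
    rate = hits / len(cases)
    print(f"pass rate: {rate:.1%} on {len(cases)} golden cases")
    if rate < BASELINE_PASS_RATE:
        print("regression against baseline; blocking deploy")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```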

Own the data flywheel. Capture every interaction, every correction, every downstream outcome. Feed that into structured training datasets and into RAG indexes that get refreshed continuously. The flywheel is what compounds. Without it you are just self-hosting somebody else's model and paying yourself the margin instead of paying the vendor.
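A minimal sketch of the capture side: one append-only record per interaction, with slots for the correction and the outcome that arrive later. The schema and field names are illustrative, not a prescription.

```python
# Flywheel capture sketch: append-only interaction records that downstream
# jobs turn into SFT pairs and RAG refreshes. Schema is illustrative.
import json
import time
import uuid
from dataclasses import asdict, dataclass, field

@dataclass
class InteractionRecord:
    prompt: str
    model_version: str
    response: str
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    ts: float = field(default_factory=time.time)
    expert_correction: str | None = None  # filled in when a human fixes the output
    outcome_label: str | None = None      # filled in when the downstream result lands

def log_interaction(rec: InteractionRecord, path: str = "flywheel.jsonl") -> None:
    """Append one record; downstream jobs close the loop from this log."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(rec)) + "\n")

log_interaction(InteractionRecord(
    prompt="Triage this intake note: ...",
    model_version="ft-70b-2026-02",
    response="Route to cardiology; flag for same-day review.",
))
```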

Notice what is not on this list. You do not need to pretrain a model from scratch. You do not need to invent a new architecture. You do not need a research team publishing papers. You need an applied engineering team that knows how to ship inference at scale and how to close the data loop.

A case sketch

A consumer health platform I worked with was spending roughly $180,000 a month on a frontier API for a workflow that ran across a few million sessions a month. Inference was already 22 percent of contribution margin and trending up two points a quarter as usage grew. They had four years of expert-labeled clinical decisioning data — a real moat, untouched by any public model.

We did not rebuild the world. We took an open-weight 70B model, fine-tuned it on their data, and stood it up on a rented but dedicated four-node H100 cluster. Inference cost dropped to roughly $52,000 a month at higher quality on their domain-specific evals. The fine-tuned model outperformed the frontier API on three of the five tasks that mattered, because it had seen something the frontier model had not. p95 latency went from 720ms to 190ms. The eval harness, which we built before we touched the model, caught two regressions that would have shipped under the old API setup.

The total project took eight months and paid back inside six on the cost line alone. The bigger win was that they now had a system that improved every week from production traffic, and a competitor on the same stack would have to start from zero.
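The payback arithmetic those numbers imply, with the build cost itself left as the unknown it was in the engagement:

```python
# Payback arithmetic from the case above. The build cost was not stated;
# a six-month payback puts an upper bound on it.

api_cost_per_month = 180_000
self_host_cost_per_month = 52_000

monthly_savings = api_cost_per_month - self_host_cost_per_month  # $128,000
payback_months = 6
implied_build_cost_ceiling = monthly_savings * payback_months    # $768,000

print(f"monthly savings: ${monthly_savings:,}")
print(f"implied all-in build cost ceiling: ${implied_build_cost_ceiling:,}")
```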

What most companies get wrong

The failure mode I see most often is treating the build decision as a status purchase. A founder reads that the cool companies have proprietary models, hires a head of ML, and burns a year before realizing they had no data moat to begin with. The model they trained is worse than a frontier API at twice the cost. The team is demoralized. The eval system was an afterthought.

Second-most common: building too narrowly. Teams stand up a model and skip the data flywheel. Six months in, they have a frozen artifact that ages out the moment a new frontier model lands. The build only works if it compounds.

Third: confusing "build" with "self-host an open-weight model and call it done." Self-hosting Llama with no fine-tuning, no domain-specific eval, and no data loop is just a more expensive way to rent. You inherit the operational burden without earning any of the moat.

The right pattern is a phased one. Rent until the numbers force the move. When they do, build narrowly and deeply: own the inference path, own the evals, own the data loop. Do not rebuild what frontier providers do better than you ever will. Do build the parts that are specific to your product and your data, because that is where the durable advantage actually lives.

The companies that get this right end up with a stack that gets cheaper, faster, and smarter every quarter. The companies that get it wrong end up with either a per-token tax that scales linearly with success, or a science project that scales with nothing.

Pick your moment. Then commit.

Ajit Samuel is a New York City-based founder and operator. He architects, ships, and operates production AI, agentic systems, real-time data platforms, advertising technology, and growth infrastructure. ajitsamuel.com.