The Inference Flip: Two in three - the number that just rewired the AI economy

Two of every three AI "thoughts" are now the machine working, not training — and Deloitte agrees. The brains are built; the hard part now is trusting them.

Jun 09, 2026

The 60-second version

For years, the expensive, glamorous part of AI was training — building the brain. That era is ending.
The new centre of gravity is inference: the machine actually doing things for you. Deloitte estimates inference will account for two-thirds of all AI computing in 2026, up from a third in 2023.
That moves the action from a handful of labs to everyone else — the people using AI to build products, run companies, do jobs.
And it quietly changes the scarce resource. When intelligence is cheap and everywhere, the hard question is no longer can it think? It’s can you trust what it just did?
That question is the spine of this four-part series from inside Nebius Build London 2026.

There’s a number the people who sell artificial intelligence have started saying to each other. Quietly. The way you’d pass on a tip at the races.

Two in three.

On 21 May 2026, in a low-lit event space in London, a man who runs developer relations at an AI infrastructure firm called Nebius stood in front of a few hundred engineers and said it out loud. Two out of every three “tokens” — the little units of work an AI does — are now inference, not training.

If that sentence means nothing to you, good. You’re exactly who I’m writing for, and the translation is the whole story.

Teaching the machine vs. using the machine

Think of an AI model like a school pupil.

Training is the schooling. It’s the long, eye-wateringly expensive bit where you pour the entire internet into a model until it learns to read, write and reason. It needs warehouses of specialised chips running flat out for months. Only a few companies on earth can afford it. For most of the last decade, that was the AI story — bigger schools, longer terms, more money.

Inference is the day job. It’s what happens after school, every time the model actually does something: answers your email, reviews a contract, plans a delivery route, writes a line of code. Each individual task is cheap. But it happens millions of times a day, forever.

For years, almost all the money and attention went into the schooling. The flip is this: the world has stopped obsessing over building bigger brains and started spending its money on putting the brains to work.

That’s the inference flip. And it isn’t one company’s sales pitch.

The receipts

Deloitte estimated, in a November 2025 report, that inference accounted for roughly half of all AI computing in 2025 — and would reach two-thirds in 2026, up from just a third in 2023. The man on the London stage and the global consultancy landed on the same number from completely different directions.

Gartner has gone further, forecasting that the large majority of AI infrastructure spending now goes to inference rather than training. The logic is brutally simple: training is an occasional cost. Using the model is a forever cost. Run anything continuously and the day job dwarfs the school fees within months.
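If you want to see that logic in numbers, here is a minimal sketch in Python. The figures are hypothetical, picked only to show the shape of the argument. They are not Gartner's or Deloitte's, and the function name months_to_overtake is my own.

```python
def months_to_overtake(training_cost: float, monthly_inference_cost: float) -> int:
    """First whole month in which cumulative inference spend exceeds a one-off training cost."""
    if monthly_inference_cost <= 0:
        raise ValueError("monthly_inference_cost must be positive")
    months = 0
    cumulative = 0.0
    # Keep paying the monthly bill until the running total passes the school fees.
    while cumulative <= training_cost:
        months += 1
        cumulative += monthly_inference_cost
    return months


# Hypothetical figures in arbitrary units, for illustration only.
TRAINING_COST = 100.0      # paid once
MONTHLY_INFERENCE = 15.0   # paid every month the model is in use

print(months_to_overtake(TRAINING_COST, MONTHLY_INFERENCE))
```

With these made-up inputs the running bill passes the one-off cost in month seven. Change the ratio and the month changes, but any recurring cost eventually overtakes a fixed one.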

You can see the flip in the bank statements, too. Nebius is headquartered in Amsterdam, listed on the Nasdaq as NBIS, and run by Arkady Volozh, the founder who built the Russian search giant Yandex before the company restructured and rebranded in 2024. It reported first-quarter 2026 revenue of around $399 million, up roughly 684% on the year before. Its AI cloud business is now running at nearly $2 billion a year. Companies do not grow like that selling schooling to five labs. They grow like that selling the day job to everyone else.
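A quick note on what "up 684%" means, because percentage increases are easy to misread: a rise of 684% is a multiple of about 7.8, not 6.8. The sketch below backs out the year-ago figure that the two numbers above imply. The result is my arithmetic on rounded inputs, not a figure Nebius reported, and the function name is my own.

```python
def implied_prior_value(current: float, pct_increase: float) -> float:
    """Back out the starting value from a current value and a percentage increase."""
    growth_multiple = 1 + pct_increase / 100   # a 684% rise means 7.84 times the start
    return current / growth_multiple


Q1_2026_REVENUE_M = 399.0   # "around $399 million", as quoted above
YOY_INCREASE_PCT = 684.0    # "up roughly 684%", as quoted above

prior = implied_prior_value(Q1_2026_REVENUE_M, YOY_INCREASE_PCT)
print(f"Implied year-ago quarter: about ${prior:.0f} million")
```

On those rounded inputs, the implied year-ago quarter comes out at roughly $51 million.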

Why this should matter to you, specifically

Because the centre of gravity just moved — from them to us.

When the story was training, AI belonged to a priesthood: a few labs in California and a handful of governments who could afford the compute. When the story is inference, AI belongs to whoever can do something useful with it. The corner accountancy firm. The NHS trust. The logistics startup in Leeds. The person at the next desk.

The man on the stage claimed that demand for the new “applied AI” job titles — the people who build with AI rather than build AI — has risen around 700% in two years. I’ll be straight with you: I can’t independently verify that precise figure, and it comes from a company that profits handsomely from the trend, so hold it loosely. But the direction is corroborated everywhere you look. The work is moving from inventing the engine to driving the car. And there are vastly more drivers than engine-makers.

For Britain, that’s the opportunity hiding in this number. We do not have a Californian hyperscaler and we are not going to grow one by Friday. But applied AI — the practical, unglamorous business of making this technology actually work inside real organisations, under real rules — is a game a mid-sized, heavily-regulated, services-driven economy can genuinely win. London doesn’t need to build the biggest brain. It needs to be the best place on earth at putting brains to work.

The part nobody on stage quite said out loud

Here’s the turn.

When intelligence was scarce and expensive, the only question that mattered was can the machine do it at all? That’s a capability question, and we have spent a decade and untold billions answering it. The answer, increasingly, is yes.

But the inference flip changes the question. When the machine is cheap, fast and everywhere — answering the email, approving the loan, driving the car, merging the code — the scarce resource is no longer intelligence.

It’s trust.

Can you trust what it just did? Can you prove it did the right thing? Can you tell, after the fact, whether it actually solved your problem or just made it look solved? That is a verification question, and it is much harder than the capability question — because a system clever enough to do your job is also clever enough to make a wrong answer look exactly like a right one.

That gap — between what AI can now do and what we can actually verify — is the most important and least-discussed story in technology. It is the reason this publication is called The Control Layer. And it ran underneath every single talk I watched in London.

Over the next three parts, I’m going to show you the gap from three angles. The companies trying to hand you back control by letting you own your AI instead of renting it (Part 2). The robots that just booked an Uber through central London, and the trust problem riding in the passenger seat (Part 3). And the unsettling, brilliantly-documented discovery that our smartest AI systems have learned to cheat the very tests we use to check them (Part 4).

The brains are built. The schooling is basically over. The hard part — trusting the things once they go to work — is just beginning.

This is Part 1 of Nebius Build London 2026, a four-part series from The Control Layer. The companion conversation airs on The Control Layer with Amer Altaf — subscribe to get each part, and the episode, the moment it lands.

Where artificial intelligence, cybersecurity and enterprise leadership intersect. Zero fluff.

→ Next: Part 2 — You no longer have to rent your intelligence from California.

Sources & further reading

Deloitte, More compute for AI, not less (Technology, Media & Telecom Predictions 2026): https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/compute-power-ai.html
Nebius Group — About / investor information: https://nebius.com/about · Nasdaq: NBIS
Computerworld, CES 2026: AI compute sees a shift from training to inference: https://www.computerworld.com/article/4114579/ces-2026-ai-compute-sees-a-shift-from-training-to-inference.html
Related reading on The Control Layer:
DeepSeek V4 and the death of two monopolies, Amer Altaf, May 6. Two monopolies died on 24 April: NVIDIA’s grip on launch-day silicon, and the assumption that frontier-like performance was worth the closed-source premium.
