DeepSeek V4 and the death of two monopolies
Two monopolies died on 24 April: NVIDIA’s grip on launch-day silicon, and the assumption that frontier-like performance was worth the closed-source premium.
On the morning of 24 April 2026, a Chinese laboratory most British boardrooms still cannot pronounce correctly released a 1.6-trillion-parameter language model under the MIT licence, posted the technical paper to Hugging Face, and walked off whistling.[1] By the end of the same day, four separate Chinese chipmakers — Huawei Ascend, Cambricon, Hygon Information,
and Moore Threads — announced that DeepSeek V4 was already running, in production, on their hardware.[2] Within seventy-two hours, the independent benchmark roundups had begun to settle, and a picture emerged that was harder to dismiss than the headlines made it sound. On most of the evaluations that matter for enterprise procurement, V4 Pro was at parity with GPT-5.4 and Claude Opus 4.6 — performance that had been considered exceptional, frontier-grade, the bleeding edge of the field, four weeks earlier.[3]
Two sentences. Read them twice.
For roughly a decade, “Day 0 launch support” — the privilege of waking up on the morning a major model drops and finding it already optimised for your silicon — was a courtesy NVIDIA extended to itself. Every other chip vendor on earth queued up behind, hoping to ship a passable inference path within a few months. For roughly a decade, frontier-class performance had also been a privilege. The bleeding edge was where the value lived; everything below the bleeding edge was where one settled, with apologies to the budget. That is the assumption that has just gone, along with the queue.
Jensen Huang reportedly described the demonstration of V4 running on Huawei chips as “a disaster”.[4] The choice of word is interesting. Not a setback. Not a competitive challenge. A disaster. You do not reach for that vocabulary unless something load-bearing has just snapped — and what snapped was, in fact, two things at once.
The first thing to break was the deployment economics of frontier-class inference: NVIDIA’s quiet, durable, decade-long monopoly on launch-day silicon support. The second thing to break, less reported but more consequential for almost every enterprise reading this, was the assumption that the closed-source frontier was the only place to get usable, mature, mid-market-grade artificial intelligence. Because as of 24 April, frontier-class capability — capability the field considered exceptional one month earlier — has come down the pricing curve by roughly an order of magnitude, and it has done so under an MIT licence that lets you fine-tune the weights for your own organisation’s specialist needs. Closed APIs cannot match that. Not by design. Not by any feature roadmap that does not require a fundamental change to the closed-source business model.
This piece is the analytical note for both. What V4 actually is. What it actually does. What silicon it actually runs on. Why the absolute bleeding edge no longer answers most procurement questions. And why the open-weight option, with weights you can fine-tune for the specific shape of your organisation, has just become the default architecture for serious enterprise AI in a way that almost nobody on a procurement committee has yet sat down to think through.
What DeepSeek V4 actually is
Two checkpoints went up on Hugging Face on launch day.[5]
V4 Pro is a 1.6-trillion-parameter mixture-of-experts model with 49 billion active parameters per token. V4 Flash is the lightweight sibling — 284 billion total, 13 billion active. Both ship with a one-million-token context window. Both are released under an MIT licence, which means any organisation with the inference budget can take the weights, modify them, host them on its own infrastructure, fine-tune them on its own data, and deploy them commercially without phoning anyone for permission.
A mixture-of-experts model, for readers who have been reasonably busy with their actual jobs, is a bit like the British civil service in the Yes Minister sense: a vast institution containing many specialised departments, of which only a small number are doing any work on any given question. The genius is not that the institution is small — it is enormous — but that the routing is efficient. You activate the experts you need, leave the rest dormant, and the bill arrives only for the bit that actually answered the question. Nobody outside computer science thought this was an architecture worth getting excited about until DeepSeek made it the only architecture that mattered.
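For the technically inclined, the routing mechanism fits in a few lines. What follows is a toy numpy sketch of top-k expert routing with deliberately made-up sizes; it is not DeepSeek’s actual router, only the shape of the trick that lets a 1.6-trillion-parameter institution bill you for 49 billion parameters of work per token.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2  # toy sizes; V4 Pro's real figures are far larger

# Each "expert" here is just a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # learned routing weights

def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts; the rest stay dormant."""
    logits = token @ router
    top = np.argsort(logits)[-top_k:]  # pick the k most relevant experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen k
    # Only top_k of n_experts weight matrices are touched, so compute cost
    # scales with *active* parameters, not total parameters.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.standard_normal(d_model)).shape)  # (64,) produced by 2 of 16 experts
```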
The V4 paper is a sober technical document containing two genuine engineering innovations.[6] The first is the Hybrid Attention Architecture: a combination of Compressed Sparse Attention and Heavily Compressed Attention that, in plain English, allows the model to look at very long documents without melting the GPU. At a one-million-token context, V4 Pro requires only 27 per cent of the per-token inference computation of its predecessor V3.2, and 10 per cent of the key-value cache memory. V4 Flash drops these further still — 10 per cent of the FLOPs and 7 per cent of the cache.[7]
The second is Manifold-Constrained Hyper-Connections, a phrase that achieves the impossible feat of being even less marketable than its acronym (mHC). What it does is keep the signal stable as it propagates through the deep stack of a 1.6-trillion-parameter network. This matters because most attempts to scale MoE architectures have failed not on capability but on stability — the network either stops learning or starts learning the wrong things, in ways that look fine on the metrics dashboard until they don’t. mHC is, on the evidence so far, a real solution to that problem.[6]
V4 was trained on more than 32 trillion tokens, using a relatively new optimiser called Muon, with mixed-precision training (FP4 for the MoE experts, FP8 for the rest of the parameters). These are the technical details a senior infrastructure engineer would want to see; everyone else can take it on trust that the engineering is serious.[6]
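To make the attention numbers concrete, here is the back-of-envelope arithmetic those ratios imply. The 1,000 GB baseline below is purely illustrative (the paper does not publish V3.2’s absolute cache size); only the percentages come from the technical report.

```python
# Apply the V4 paper's reported ratios [7] to a hypothetical V3.2 baseline.
baseline_kv_gb = 1000.0  # illustrative V3.2 KV cache at a one-million-token context
baseline_flops = 1.0     # normalise V3.2 per-token inference FLOPs to 1.0

v4_pro = {"flops": 0.27 * baseline_flops, "kv_gb": 0.10 * baseline_kv_gb}
v4_flash = {"flops": 0.10 * baseline_flops, "kv_gb": 0.07 * baseline_kv_gb}

for name, m in (("V4 Pro", v4_pro), ("V4 Flash", v4_flash)):
    print(f"{name}: {m['flops']:.0%} of V3.2 FLOPs, "
          f"{m['kv_gb']:.0f} GB KV cache vs {baseline_kv_gb:.0f} GB")
```

The point of the exercise: at a million tokens of context, the cache is the bill, and V4’s architecture cuts the cache by an order of magnitude.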
Two frontiers, two stories — and the one that matters for procurement
There are two honest answers to the question “how does V4 Pro compare to the Western frontier?” They are answers to two different underlying questions, both of which matter, and the one with implications for the line-by-line of an enterprise IT budget is not the one that has dominated the headlines.
The first answer is DeepSeek’s own. The V4 technical paper benchmarks V4-Pro-Max against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro — the frontier models OpenAI, Anthropic, and Google had shipped at the point V4 was being engineered. Those models, on 23 April 2026, were the absolute bleeding edge of large language modelling. The state of the art. The reason every Chief Information Officer in the City was being walked into the same enterprise pitch about why the closed frontier was the only credible deployment path. On those comparators, the picture V4 draws is striking.[8]
On MMLU-Pro, the standard knowledge-and-reasoning benchmark, V4 Pro scores 87.5 — exactly tied with GPT-5.4 at 87.5 and within touching distance of Opus 4.6. On SWE-bench Verified, the most widely cited software engineering benchmark, V4-Pro-Max scores 80.6, against Opus 4.6 Max at 80.8 and Gemini 3.1 Pro at 80.6 — a three-way tie within a single percentage point. On MCPAtlas Public, the tool-use and agent-coordination evaluation, V4-Pro-Max scores 73.6 to Opus 4.6 Max’s 73.8: again, parity. On LiveCodeBench, V4 Pro reaches 93.5 — ahead of Gemini 3.1 Pro at 91.7 and Claude Opus 4.6 at 88.8. On Codeforces, the live competitive programming rating, V4 Pro registers 3206 against GPT-5.4 at 3168. Ahead of GPT-5.4 on Codeforces. The model the world’s leading AI laboratory shipped, in 2026, as its single most expensive coding asset.
DeepSeek’s one-line summary of where V4-Pro-Max sits, in the paper itself: it “beats GPT-5.2 and Gemini-3.0-Pro on reasoning, trails GPT-5.4 / Gemini-3.1-Pro by approximately three to six months, and on internal agent evaluations surpasses Claude Sonnet 4.5 while approaching Opus 4.5.”[8] Three to six months. That is the manufacturer’s claim, in a paper that any of the labs being compared can refute, and so far none of them have.
The second answer is CAISI’s. The US Center for AI Standards and Innovation, the NIST evaluator that produces the most rigorous independent benchmarks of frontier models, ran V4 Pro on nine held-out and uncontaminated benchmarks — including the ARC-AGI-2 semi-private dataset and the agency’s internal PortBench software-engineering test. CAISI’s framing in May was that V4 “performs similarly to GPT-5, which was released about eight months ago,” and that V4 “is the most capable PRC model to date across the domains we evaluated.”[9]
Both framings are true. They answer two different questions.
Where does V4 sit relative to the absolute bleeding edge today? CAISI’s answer: performing at the level of GPT-5, a model released roughly eight months earlier, and therefore a clear step behind the current GPT-5.5.
Where did V4 land relative to the frontier it was actually engineered against — the frontier the entire industry called state-of-the-art four weeks before V4 was released?
DeepSeek’s answer, supported by the independent benchmark roundups: at parity with GPT-5.4 and Opus 4.6, ahead on competitive coding, behind only on the hardest agentic and graduate-level reasoning tasks where the latest GPT-5.5 and Opus 4.7 have pulled clear.[10]
The procurement-relevant question is not “which model holds today’s marginal lead?” It is “what level of capability does my workload actually require, and what is the price gap between getting that level from the closed frontier versus getting it from the open-weight option?”
If the Premier League framing helps: V4 Pro is sitting where last season’s title-winning side finished — on the same points total, with the same trophy cabinet, in the same tier of the table. The current title-winners (GPT-5.5 and Opus 4.7) have pulled a few points clear at the top of this season’s table. Nobody is suggesting V4 Pro be relegated. The transfer market — every Chief Information Officer with an inference budget — has just acquired a new option who plays at trophy level for a fraction of the wage bill, and who, unlike the rented stars at the top of the table, can be coached into the specific tactical system the manager actually wants to play.
The capability that won last season’s title has, four weeks later, become the affordable mid-tier capability of this season. That is the speed of commoditisation in the open-weight frontier, and it is the part of the V4 launch that has not yet been priced into a single Western enterprise IT roadmap I have seen this quarter.
The training stack you have probably been told wrong about
This is the part where I owe the reader a correction, because the headline most British technology coverage led with — DeepSeek V4 trained entirely on Huawei chips — is not what happened.
DeepSeek’s own disclosure, confirmed by independent reporting from The Register and ChinaTalk, is that the V4 training run used a hybrid stack: NVIDIA H800 GPUs and Huawei Ascend 910C accelerators.[11] The technical paper itself only states, in passing, that the team validated its expert-parallel scheme on both NVIDIA and Ascend platforms. Independent analysis — and the laws of physics governing large-scale pre-training stability — strongly suggests that the main pre-training run, where stability and scale matter most, relied on the mature NVIDIA infrastructure. Huawei silicon was used in the post-training and reinforcement learning stages, where stability is more forgiving and the workload maps more comfortably onto the Ascend architecture.[12]
The model that has been described to British boardrooms as the first frontier model trained fully on Chinese chips is, in fact, the first frontier model whose training run included Chinese chips at all. That is a genuine achievement, but it is a different achievement, and the difference matters when a Chief Information Officer is making a procurement decision about whether the Chinese AI stack is now a credible end-to-end alternative to the American one.
The honest answer, in May 2026, is not yet, and not for training.
The honest answer for inference — for the deployment side, where almost every enterprise dollar is actually spent — is more interesting. Because that is the factor that has actually shifted.
The day the queue dissolved (the first monopoly that died)
When DeepSeek released V4 on 24 April, four Chinese chipmakers had production-ready inference paths live by the end of the same trading day.[2]
Huawei Ascend’s full product line — A2, A3, and the new 950 — supported both V4 Pro and V4 Flash from launch. The Ascend 950 specifically was demonstrated running V4 Pro using fused kernels and multi-stream parallelism, with combined quantisation, at high throughput and low latency. Cambricon completed Day 0 adaptation through the open-source vLLM inference framework. Hygon’s Deep Computing Unit had its own inference pathway live and the supporting code published. Moore Threads, the smallest of the four, hit launch-day compatibility on its own GPU architecture.
Reread that paragraph. Then consider what it would have looked like in 2025: NVIDIA optimisation in the morning, AMD support a few weeks later, Intel hopefully by the end of the quarter, everyone else when they could manage. Day 0 collective adaptation by the entire Chinese domestic chip ecosystem — Huawei Ascend, Cambricon, Hygon, Moore Threads, all live, all production-grade, all on the same morning — has never happened before for any frontier-class model on any non-NVIDIA hardware anywhere on earth.
This is the factor that has changed. It is not that DeepSeek V4 has reached parity with the latest GPT-5.5 (it has not). It is not that Huawei Ascend can train a 1.6-trillion-parameter model from cold (it cannot, yet). It is that the deployment economics of frontier-class inference are no longer NVIDIA-monopolised. There is now a Chinese stack — model, framework, silicon, optimisation, ecosystem — that runs end-to-end without American hardware on the inference side.
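For readers who want to see what a production inference path looks like at the framework level, here is a minimal self-hosted sketch through the open-source vLLM framework, the same route Cambricon’s Day 0 adaptation went through. The checkpoint name follows the Hugging Face model card; hardware-specific vLLM builds (including the vendors’ adapted forks) vary, so treat this as the framework-level shape only.

```python
# Minimal self-hosted inference via vLLM; checkpoint name per the V4 model card.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-V4-Flash")  # Flash fits far smaller clusters than Pro

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(
    ["Summarise the following board minutes in five bullet points: ..."],
    params,
)
print(outputs[0].outputs[0].text)
```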
For a UK or European board that has spent the last three years writing capital expenditure forecasts predicated on NVIDIA H100 and Blackwell allocations being the unavoidable bottleneck of any sovereign AI deployment, this is not an academic shift. The Ascend 950 supernodes — Huawei’s data-centre-scale clustered configuration — are scheduled for volume shipment in the second half of 2026.[13] If those ship on time and at the prices Huawei is briefing, the entire NVIDIA-or-nothing logic of European cloud procurement, which Part 3 of the Four Chokepoints series argued was the factor Washington was attempting to weaponise via the MATCH Act, has just acquired its first credible alternative.[14]
That alternative comes, of course, with its own jurisdiction problem. A European bank that cannot lawfully run customer data through a US cloud provider under the CLOUD Act is not going to find Huawei a more comfortable jurisdiction. But the procurement question is no longer NVIDIA or unavailable. It is now NVIDIA or Huawei or unavailable — and the introduction of a third option, even a politically uncomfortable one, fundamentally changes the negotiating posture of every conversation a CIO is now having with a US hyperscaler about pricing, terms, and capacity allocation.
Jensen Huang’s “disaster” reaction is, when you put the calendar on it, the rational reaction of a chief executive who has just watched his company’s structural pricing power degrade in a single news cycle. NVIDIA’s commercial position is not collapsing — at the GTC keynote in San Jose on 16 March, Huang himself raised guidance for combined Blackwell and Vera Rubin orders to $1 trillion through 2027, double the $500 billion through-2026 figure he gave at GTC 2025.[15] But the launch-day monopoly has gone, and once a monopoly goes, it does not come back without a structurally different policy intervention. Doubling the order book and losing the launch-day monopoly are not contradictory data points — they are the same data point. NVIDIA’s position is tighter at the top, more contested at the edges, and the edges are where margin compression eventually shows up.
The inference economics: Aldi has come for the Champagne aisle (the second monopoly that died)
The pricing is the part of the V4 launch that should have triggered an immediate procurement review at every UK and European organisation running enterprise AI workloads at scale.
DeepSeek V4 Flash, the lightweight commercial variant, costs $0.14 per million input tokens and $0.28 per million output tokens. The Pro variant, at full list price, is $1.74 per million input cache-miss tokens and $3.48 per million output tokens; until 5 May 2026, the company is running a 75 per cent introductory discount, putting the effective rate at $0.435 input and $0.87 output.[16] Cache hits — where the application is sending a substantially repeated prompt prefix — drop the effective input cost to $0.03 per million.
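A worked example makes the rate card concrete. The traffic volumes below are hypothetical, chosen only to show the order of magnitude; the per-token rates are the ones listed above.

```python
# Monthly bill at the listed rates; volumes are hypothetical.
def monthly_cost(input_m_tokens: float, output_m_tokens: float,
                 in_rate: float, out_rate: float) -> float:
    """USD cost for a month's traffic, volumes in millions of tokens."""
    return input_m_tokens * in_rate + output_m_tokens * out_rate

in_m, out_m = 5_000, 1_000  # e.g. 5bn input and 1bn output tokens per month

print(f"V4 Flash:            ${monthly_cost(in_m, out_m, 0.14, 0.28):>10,.0f}")
print(f"V4 Pro (discounted): ${monthly_cost(in_m, out_m, 0.435, 0.87):>10,.0f}")
print(f"V4 Pro (list):       ${monthly_cost(in_m, out_m, 1.74, 3.48):>10,.0f}")
```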
Compared with Claude Opus 4.6, V4 Pro is roughly 11.6 times cheaper on input tokens and approximately 28.7 times cheaper on output tokens.[17] Compared with the latest GPT-5.5 ($30 per million output) or Opus 4.7 ($25 per million output), V4 Pro is roughly seven to nine times cheaper on output. V4 Flash is somewhere between 90 and 100 times cheaper than the closed-source frontier on the same metric.[18] CAISI’s own assessment confirms that V4 was more cost-efficient than GPT-5.4 mini — OpenAI’s most cost-competitive contemporary model — on five of seven benchmarks tested.[9]
For a long time, the running joke about Chinese open-weight models was that they were the Aldi of foundation AI: cheaper, no-frills, fine for the tea-and-toast portion of the workload but obviously something different from the Waitrose ready meal you would put on the table when the in-laws were visiting. As of 24 April, that joke has stopped working. V4 Pro is on the same shelf, in the same aisle, blind-tasted within a percentage point of the branded option, and the price label still says eleven and a half quid less for the same bottle. Aldi has come for the Champagne aisle, and the Champagne is no longer obviously winning the blind taste test.
The complication is that the frontier premium is, in fact, justified by the small subset of workloads where V4 Pro genuinely lags — Terminal-Bench 2.0 agentic engineering, the hardest GPQA Diamond reasoning, the orchestration scaffolding (OpenAI’s Symphony specification, Anthropic’s Model Context Protocol, AWS Bedrock Managed Agents) that the open-weight ecosystem is still assembling. The frontier premium is real. The question for any organisation running production AI workloads is no longer whether the premium exists, but whether the workload actually requires it.
For the ninety per cent of UK enterprise use cases I have seen quoted in procurement requests this year — document summarisation, customer correspondence drafting, structured data extraction, internal search, code completion on bounded codebases, regulatory compliance review, contract analysis, internal Q&A across institutional knowledge — V4 Flash at $0.14/$0.28, or V4 Pro at $0.435/$0.87, is overwhelmingly the rational choice. Not because it is the bleeding edge. Because it is good enough, on the workload that actually pays the bills, at a price that makes the bleeding edge look like an extravagance an audit committee will eventually ask awkward questions about.
That sentence — good enough, on the workload that actually pays the bills, at a price that makes the bleeding edge look extravagant — is the procurement question that most UK and European boards have not yet asked themselves out loud. The release of DeepSeek V4 is the moment they will have to.
Off-the-rack frontier, or Savile Row tailoring?
There is a structural advantage the open-weight option holds over any closed-source API, and it is the part of the V4 release that most cost-comparison articles have undersold. Capability the field considered exceptional one month ago is now available not just at a fraction of the cost, but as weights you actually own. Weights you can fine-tune.
The closed-source frontier — GPT-5.5, Opus 4.7, Gemini 3.1 Pro — is access to bleeding-edge intelligence through a metered API, sitting in someone else’s data centre, in someone else’s jurisdiction, accessed under terms of service that the provider can revise on thirty days’ notice. It is the off-the-rack suit. The cut is excellent. The cloth is the best on the market this season. And it has been built to fit nobody in particular and almost everybody on average.
For a generic workload — summarise this document, draft a polite reply, extract these data fields — off-the-rack is fine. The cut works. Nobody is going to notice you bought the model in your inbox rather than commissioned it.
For a specialist workload, off-the-rack stops being fine. A barrister’s chambers needs a model that has actually read the case law in its practice area, follows its preferred citation conventions, and drafts to the conventions of English legal drafting rather than American. A diagnostic imaging startup needs a model fine-tuned on the specific modality and pathology classes the radiologists actually report. A defence contractor working on classified material needs a model whose fine-tuning data never crosses a jurisdictional boundary that an export control lawyer will subsequently regret. A mid-market manufacturing business with thirty years of operational documentation needs a model that has read its own thirty years of operational documentation, not the average of every CNC manual on the open web.
These are Savile Row workloads. They need bespoke. They need a tailor who has measured the specific shoulders.
The closed APIs offer fine-tuning, to varying and grudging degrees. None of them offer it in a way that gives the customer ownership of the tailored garment. Your fine-tuning data goes to OpenAI’s servers. The fine-tuned model that emerges still lives on OpenAI’s infrastructure, billable per token, accessible only through the same metered API. The base model can change underneath you with thirty days’ notice, and your fine-tuned variant inherits whatever the underlying provider decides next quarter. You have rented a tailored suit from a rental shop. You did not commission it.
V4 Pro under the MIT licence is a different proposition. The weights are yours. You fine-tune on your own infrastructure, on your own data, against your own evaluation set, and the resulting model is a thing you own that does not change unless you change it. The classified material does not leave your jurisdiction. The base model does not get deprecated by a vendor’s product roadmap. The audit trail of what the model was trained on, on what date, and who has access to the weights is internal to the organisation and reviewable by your own compliance function.
That is a structural advantage no closed API can match, because the closed-source business model depends on the weights staying on the provider’s side of the API boundary. The choice is not between high-quality model with fine-tuning and low-quality model without. It is between bleeding-edge model rented through an API where the tailoring is also rented, and near-frontier model owned outright, where the tailoring is owned and reviewable too.
For any specialist enterprise workload — legal, medical, financial regulatory, defence, scientific research, regulated mid-market — the second option has just become the rational default. The first option survives only for the workloads where the marginal capability gap between V4 Pro and Opus 4.7 demonstrably matters more than the ability to tailor to your specific organisational shape. That is a smaller set of workloads than the procurement function has been told.
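For the engineers in the room, owning the tailoring looks something like the sketch below, using the standard Hugging Face transformers and peft libraries to attach a LoRA adapter. The checkpoint name follows the model card; the rank and target module names are illustrative placeholders, not a recommended recipe.

```python
# A minimal LoRA fine-tuning sketch on infrastructure you control.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")

config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])  # illustrative
model = get_peft_model(base, config)  # only the small adapter matrices train

model.print_trainable_parameters()  # typically well under 1% of total weights
# ...train on your own documents with any standard Trainer loop, then:
model.save_pretrained("./v4-flash-chambers-adapter")  # the adapter never leaves your estate
```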
The procurement memo nobody has yet written
The honest version of the procurement question, for a UK Chief Information Officer reading this in May 2026, runs something like this.
For workloads that do not require the absolute bleeding edge — and that is most workloads — the rational architecture is now a tiered one: open-weight inference (V4 Flash, V4 Pro, or successors) on commodity infrastructure for the bulk of throughput, with selective routing to the closed-source frontier for the small subset of queries that demonstrably benefit from the additional capability. Anyone telling you the workload is all frontier-grade is either selling you the frontier or has not done the workload analysis.
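A sketch of that tiered routing, to make it concrete. The trigger heuristic and endpoint names are placeholders; a production router would use a trained classifier or explicit task metadata rather than keyword matching.

```python
# Open-weight by default, closed frontier by exception. Endpoint names are placeholders.
FRONTIER_TRIGGERS = ("autonomous", "multi-step plan", "orchestrate across")

def route(prompt: str) -> str:
    """Send the bulk of throughput to self-hosted V4; escalate the hard agentic tail."""
    needs_frontier = any(t in prompt.lower() for t in FRONTIER_TRIGGERS)
    return "closed-frontier-api" if needs_frontier else "self-hosted-v4"

for p in ("Extract the invoice fields from this document.",
          "Autonomous migration: orchestrate across three repositories."):
    print(f"{route(p):22} <- {p}")
```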
For specialist workloads — and most enterprise AI is specialist once you look closely — the rational architecture is no longer call the closed frontier and live with the off-the-rack fit. It is take an open-weight model at near-frontier capability, fine-tune it on your own data, and deploy it on infrastructure you control. V4 Pro under the MIT licence has just made that approach not only technically viable but commercially compelling. The fine-tuning advantage is the part of the V4 release that the inference-cost story has overshadowed, and it is the part that compounds.
For sovereign deployment specifically, V4 Pro under the MIT licence can be self-hosted on infrastructure under the operator’s own jurisdiction. This does not solve every sovereignty problem — the model was trained partly on the Chinese chip stack, the underlying training data is opaque, and the geopolitical question of using a Chinese-origin model in regulated UK sectors is its own conversation — but it does fundamentally change the architecture of the discussion. Self-hosted, fine-tuned, frontier-class inference, on UK infrastructure, with weights you control and an inference path that does not depend on a US hyperscaler’s API rate limit, is now technically and commercially viable in a way it was not in 2025.
For the agentic workloads where V4 Pro genuinely lags — autonomous multi-step engineering tasks, complex tool-use orchestration, anything where Terminal-Bench 2.0 is a relevant proxy — the closed-source frontier is still the rational choice, and the frontier premium still has to be paid. But the addressable market for that premium has just contracted by, on a back-of-envelope calculation, somewhere between sixty and eighty per cent of total enterprise AI spend.
That contraction is the precise reason Western hyperscaler share prices wobbled in late April even as the four largest US hyperscalers — Amazon at approximately $200 billion, Microsoft at $190 billion, Alphabet at $180–$190 billion, and Meta at $125–$145 billion — pushed combined 2026 infrastructure capital expenditure to a record total in the region of $700 billion to $725 billion.[19] The investor question that broke through the earnings calls was not will the capacity be built? It was will the revenue per token sustain the build at the pace announced? DeepSeek V4 is the data point that turned that question from theoretical to immediate.
The prediction, made to be falsified
Within twelve months — by May 2027 — at least two of the three leading Western frontier API providers (OpenAI, Anthropic, Google) will publish per-million-token output prices at least 50 per cent below their May 2026 levels, with at least one earnings call explicitly citing open-weight competition as a contributing factor.
The signals to watch: pricing-page revisions on openai.com, anthropic.com, and ai.google.dev; volume-tier renegotiations leaked to The Information or Bloomberg; quarterly earnings call commentary referencing “competitive pressure on inference economics” or similar language. The most precise leading indicator will be in the Microsoft Azure OpenAI Service pricing schedule, where reductions in the underlying API rate flow through to enterprise contract renewals on a roughly two-quarter lag.
The prediction is falsifiable. If by May 2027 the listed per-million-token output rates for GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro have not fallen by at least 50 per cent from the May 2026 baseline on at least two of the three platforms, the prediction is wrong. Quiet enterprise discounts that do not appear in published rate cards do not satisfy the prediction; volume tier rebates do not satisfy the prediction; the price drop must appear in the public list price.
If it does, the inference-economics moat will have been visibly eroded by open-weight competition within a single year of V4’s release. If it does not, the closed-source frontier will have proved more economically defensible than this analysis implies, and I will publish the correction.
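For those keeping score programmatically, the falsification test reduces to a few lines. The May 2026 baselines are the output list prices cited in this piece (Gemini’s is not cited here, so it is left blank); the May 2027 values are placeholders to be filled in from the public rate cards next year.

```python
# Falsification check for the May 2027 prediction ($/M output tokens).
baseline_2026 = {"GPT-5.5": 30.0, "Claude Opus 4.7": 25.0, "Gemini 3.1 Pro": None}
observed_2027 = {"GPT-5.5": None, "Claude Opus 4.7": None, "Gemini 3.1 Pro": None}

def prediction_holds(baseline: dict, observed: dict,
                     cut: float = 0.5, required: int = 2) -> bool:
    """True if at least `required` platforms halved their public list price."""
    halved = [
        observed[k] <= (1 - cut) * baseline[k]
        for k in baseline
        if baseline[k] is not None and observed[k] is not None
    ]
    return sum(halved) >= required
```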
The bottom line
Three things need to be on the procurement agenda this fortnight, and all of them are uncomfortable.
The first is the rate card audit. Every UK and European organisation currently paying for frontier API inference at GPT-5.5 or Opus 4.7 list prices needs to know which of those workloads are genuinely frontier-grade and which are paying the frontier premium for the comfort of the frontier brand. Most procurement functions cannot answer this question today because they have never had to. They will have to now.
The second is the chip stack audit. Anyone whose sovereign AI strategy is currently predicated on NVIDIA being the only viable frontier-class inference silicon needs to spend the next quarter understanding what changes, in their architecture and in their commercial leverage, if Huawei Ascend 950 supernodes ship on time and at the prices Huawei is briefing. The answer may be nothing changes for us, because of the China jurisdiction problem. That is a defensible answer. It is not a defensible answer to not have considered the question.
The third is the architecture audit. The release of V4 Pro under the MIT licence, with the option to fine-tune on your organisation’s own data, on infrastructure your organisation controls, is the moment self-hosted, specialist-tuned, near-frontier inference became a genuine architectural option for serious enterprise deployments. The decision to use it or not use it is not a technology decision. It is a sovereignty, jurisdiction, fine-tuning, and risk-tolerance decision, and it now has to be taken consciously rather than defaulted on.
DeepSeek V4 is not the Sputnik moment its champions claim. It is not the catch-up moment its detractors will dismiss it as next month. It is something more useful and less dramatic than either. It is the specific date — 24 April 2026 — on which two assumptions underpinning Western enterprise AI strategy stopped being true at the same time.
The first assumption was that NVIDIA’s launch-day grip on frontier-class deployment silicon was a fact of nature. It was not. It was a commercial position, and the commercial position has just been contested.
The second assumption was that the closed-source frontier was the only credible deployment path for serious enterprise AI, and that the bleeding edge was where the value lived. It was not. The capability the field considered state-of-the-art four weeks before V4 launched is now available at a fraction of the cost, with weights you can fine-tune for the specific shape of your own organisation, on infrastructure you control. The bleeding edge has its place. Most workloads do not require it. Most workloads require good enough, tailored to the specific job, deployed where you can audit it.
You cannot build a sovereign AI strategy on a single supplier’s silicon. That has, quietly, this fortnight, become an operational fact rather than an aspiration.
You cannot build a serious enterprise AI strategy on the assumption that the bleeding edge is the answer to most of your questions, either. Most of your questions are bespoke. Most bespoke questions deserve a tailored answer. And tailoring, this fortnight, became an open-weight conversation.
The next piece is on the open-weight fine-tuning question UK regulators have not yet asked.
Subscribe to The Control Layer to get the analytical thread continued — one piece a week, free, in the same register. From Amer Altaf, Managing Editor.
Amer Altaf is Founder and CEO of Arkava,
a UK and European sovereign AI agentic automation business, and Managing Editor of The Control Layer, the publication where he tracks the convergence of cyber security, technology sovereignty, and geopolitics. A techUK member, he contributes to industry engagement on UK technology sovereignty policy. He is currently writing on cloud security for Oxford University Press’s Expert Essentials series.
References
[1]: DeepSeek AI. DeepSeek V4 Preview Release. DeepSeek API Documentation, 24 April 2026. Technical paper: DeepSeek V4: Towards Highly Efficient Million-Token Context Intelligence, posted to Hugging Face on the same date.
[2]: TrendForce. “Huawei Ascend, Cambricon and Hygon Completed Day 0 Adaptation to DeepSeek-V4.” 29 April 2026. Corroborated by Digital Quotient India, “Huawei Ascend, Cambricon, and Hygon complete Day 0 adaptation to DeepSeek-V4,” and Cryptopolitan, “DeepSeek adds vision as China’s chip supply chain shows it can finally keep pace,” April 2026.
[3]: Performance framing per the DeepSeek V4 technical paper and independent benchmark roundups. The DeepSeek paper benchmarks V4-Pro-Max against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro and characterises the position as “trailing GPT-5.4 / Gemini-3.1-Pro by approximately three to six months.” See [8] for benchmark detail and [9] for the CAISI evaluation against the more recent GPT-5.5 frontier.
[4]: Reported by 36Kr (EU edition), “Jensen Huang Labels It a ‘Disaster’: DeepSeek Runs Successfully on Huawei Chips,” April 2026. Quote attribution should be treated as second-hand reporting; no direct NVIDIA press confirmation has been issued at time of writing.
[5]: Hugging Face. deepseek-ai/DeepSeek-V4-Pro and deepseek-ai/DeepSeek-V4-Flash model cards, both released 24 April 2026 under MIT licence. Hugging Face technical blog: “DeepSeek-V4: a million-token context that agents can actually use.”
[6]: DeepSeek AI. DeepSeek V4 Technical Report (PDF), released 24 April 2026 via Hugging Face. Architecture details: Hybrid Attention (Compressed Sparse Attention + Heavily Compressed Attention), Manifold-Constrained Hyper-Connections, Muon optimiser, mixed precision (FP4 for MoE experts, FP8 for remaining parameters), pre-trained on more than 32 trillion tokens.
[7]: DeepSeek V4 Technical Report (above) and Hugging Face technical blog. At one-million-token context, V4 Pro requires 27 per cent of single-token inference FLOPs and 10 per cent of KV cache memory compared with V3.2. V4 Flash requires 10 per cent of FLOPs and 7 per cent of KV cache.
[8]: Benchmark figures taken from the DeepSeek V4 technical paper and corroborated by independent comparison reporting. MMLU-Pro: V4-Pro 87.5, GPT-5.4 87.5. SWE-bench Verified: V4-Pro-Max 80.6, Opus 4.6 Max 80.8, Gemini 3.1 Pro 80.6. MCPAtlas Public: V4-Pro-Max 73.6, Opus 4.6 Max 73.8. LiveCodeBench: V4-Pro 93.5, Gemini 3.1 Pro 91.7, Opus 4.6 88.8. Codeforces rating: V4-Pro 3206, GPT-5.4 3168. Terminal-Bench 2.0: V4-Pro-Max 67.9, Gemini 3.1 Pro 68.5, GPT-5.4 xHigh 75.1. Sources: Hugging Face technical report summary at https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/discussions/129; OfficeChai, “DeepSeek Releases V4-Pro & V4-Flash, Delivers GPT 5.4 & Opus 4.6-Level Performance At Fraction Of The Price”; NxCode, “DeepSeek V4 vs Claude Opus 4.6 vs GPT-5.4: AI Coding Model Comparison”; Verdent, “DeepSeek V4 vs Claude Opus 4.6 vs GPT-5.5 for Agentic Coding”; Funda AI, “DeepSeek V4 vs Claude vs GPT-5.4: A 38-Task Benchmark”; Simon Willison, “DeepSeek V4 — almost on the frontier, a fraction of the price”.
[9]: U.S. National Institute of Standards and Technology, Center for AI Standards and Innovation (CAISI). “CAISI Evaluation of DeepSeek V4 Pro.” May 2026. CAISI’s evaluation included nine benchmarks across cyber, software engineering, natural sciences, abstract reasoning, and mathematics, including two held-out and uncontaminated benchmarks: ARC-AGI-2 (semi-private dataset) and CAISI’s internal PortBench software-engineering evaluation. CAISI’s framing: V4 “performs similarly to GPT-5, which was released about eight months ago” and is “the most capable PRC model to date across the domains we evaluated.”
[10]: Galaxy.ai composite-score comparison, “Claude Opus 4.6 vs DeepSeek V4 Pro.” Composite weighted-average benchmark scores: Claude Opus 4.7 8.72, Claude Opus 4.6 (Thinking) 8.72, DeepSeek V4 Pro 8.27, Claude Opus 4.6 (standard) 8.17. V4 Pro sits between standard Opus 4.6 and Opus 4.6 with extended-thinking mode enabled.
[11]: The Register. “DeepSeek’s new models offer big inference cost savings.” 24 April 2026. ChinaTalk Media. “DeepSeek V4.” Both report DeepSeek’s disclosure that V4 training used both NVIDIA H800 GPUs and Huawei Ascend 910C accelerators.
[12]: AIproem. “Part 2: What DeepSeek V4 means for Huawei and Nvidia.” The China Academy. “Why DeepSeek V4 Hasn’t Fully Cut Ties with Nvidia.” The DeepSeek V4 paper itself states only that the team “validated its fine-grained EP scheme on both NVIDIA GPUs and Ascend NPU platforms.”
[13]: Fortune. “DeepSeek unveils V4 model, with rock-bottom prices and close integration with Huawei’s chips.” 24 April 2026. DeepSeek pricing guidance suggests V4-Pro inference costs are scheduled to fall further once Ascend 950 supernodes reach volume shipment in the second half of 2026.
[14]: Altaf, Amer. “The equipment chokehold: ASML, the MATCH Act, and the end of the allied exemption.” The Control Layer, 28 April 2026. Part 3 of the Four Chokepoints series.
[15]: CNBC. “Nvidia GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through ‘27.” 16 March 2026. TechCrunch. “Jensen Huang just put Nvidia’s Blackwell and Vera Rubin sales projections into the $1 trillion stratosphere.” 16 March 2026. Data Center Knowledge. “GTC 2026: Nvidia Unveils Vera Rubin AI Platform, Eyes $1T by 2027.” The $1 trillion through-2027 figure represents a doubling of the $500 billion through-end-of-2026 guidance issued at GTC 2025.
[16]: DeepSeek AI. Models & Pricing (USD pricing schedule). V4 Flash: $0.14 per million input tokens, $0.28 per million output. V4 Pro list price: $1.74 per million cache-miss input, $3.48 per million output. Discount of 75 per cent on V4 Pro running until 5 May 2026, putting effective rates at $0.435 / $0.87. Cache-hit input: $0.03 per million.
[17]: Galaxy.ai cost-comparison data, “Claude Opus 4.6 vs DeepSeek V4 Pro.” Anthropic Claude Opus 4.6 list pricing implies V4 Pro is approximately 11.6× cheaper on input tokens and approximately 28.7× cheaper on output tokens at full V4 Pro list price.
[18]: VentureBeat. “DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5.” April 2026. Output token comparison: GPT-5.5 listed at approximately $30 per million output tokens; Claude Opus 4.7 at approximately $25 per million output tokens. V4 Pro at $3.48 per million output is approximately 7–9× cheaper than the latest closed frontier; V4 Flash at $0.28 per million output is approximately 90–100× cheaper.
[19]: Hyperscaler 2026 capital expenditure figures aggregated from Q1 2026 earnings disclosures: Microsoft Corporation, Q3 FY2026 earnings call, 29 April 2026, raising calendar-2026 capex guidance to approximately $190 billion (CNBC, “Microsoft calls for $190 billion in 2026 capital spending on soaring memory prices”); Alphabet Inc., Q1 2026 earnings, 29 April 2026, raising full-year 2026 capex guidance to $180–$190 billion from $175–$185 billion (CNBC, “Alphabet (GOOGL) Q1 2026 earnings”); Meta Platforms Inc., Q1 2026 earnings, 29 April 2026, raising 2026 capex guidance to $125–$145 billion (CNBC, “Meta Q1 2026 earnings report”); Amazon.com Inc., Q1 2026 earnings, 29 April 2026, with CEO Andy Jassy committing approximately $200 billion in calendar-2026 capital expenditure (CNBC, “Amazon (AMZN) Q1 earnings report 2026”). Combined Big Four total reported by Tom’s Hardware as a record $725 billion; OfficeChai cites $715 billion.