
DeepSeek V4 Review: The $5.2M Trillion-Parameter Model That Broke the AI Industry

DeepSeek V4 is a 1 trillion parameter MoE model trained on Huawei Ascend chips for $5.2 million — and it's open source under MIT. Here's why this is the most disruptive AI release of 2026.

AI Tools | Aumiqx Team | 24 min read
Tags: deepseek v4, deepseek, open source ai

DeepSeek V4 Just Landed — And It's Bigger Than You Think

In the first quarter of 2026, DeepSeek did it again. The Hangzhou-based AI lab released DeepSeek V4, a 1 trillion parameter Mixture-of-Experts model that matches or exceeds frontier US models on reasoning, coding, and mathematics benchmarks. It is open weights. It is licensed under MIT. And it was reportedly trained for around $5.2 million in compute — a number that, if accurate, is somewhere between 20x and 100x cheaper than what OpenAI, Anthropic, and Google are believed to spend on a frontier training run.

That headline alone would be a story. But the deeper detail is the one that has shaken policymakers in Washington and procurement teams in Silicon Valley: DeepSeek V4 was not trained on Nvidia H100s. According to DeepSeek's technical report and corroborating reporting, the bulk of training was done on domestically produced Huawei Ascend 910C chips, the same family of accelerators that the US export control regime was specifically designed to make irrelevant. The fact that a Chinese lab built a frontier-class model on hardware the US deemed inferior is, more than any individual benchmark score, the geopolitical event of the year in AI.

For most readers, the practical question is simpler: is DeepSeek V4 actually good, can you use it, and should you? The short answer is yes on all three counts, with caveats. This review walks through what V4 is architecturally, how the $5.2 million figure breaks down, what the Huawei Ascend training run actually proves, how the model performs against GPT-5, Claude, and Gemini, and who benefits most from yet another open-source frontier model dropping into the public domain. We also compare it head to head with Qwen 3.5 and Llama 4, the other two open-weight giants of 2026. By the end you should have a clear sense of where V4 fits in your stack — and where it doesn't.

If you arrived here looking for a broader landscape view first, our companion guide on DeepSeek alternatives and the best reasoning models covers the full competitive set, and our open source AI tools roundup covers the broader ecosystem of free and open weight tools you can plug V4 into today.

The $5.2 Million Training Run: How a Chinese Lab Out-Engineered the Industry

The single number doing the most damage to investor decks in San Francisco right now is $5.2 million. That, according to DeepSeek's published technical report, is the approximate compute cost of the final training run for DeepSeek V4. It does not include salaries, prior research, failed experiments, or the cost of building the cluster itself — the same exclusions every lab uses when it cites a training cost. By that comparable measure, DeepSeek V4 cost a tiny fraction of what frontier US models are estimated to cost. Public reporting and analyst estimates put GPT-5 and Claude-class training runs anywhere from $100 million to north of $500 million. Even the more conservative estimates leave V4 looking like an outlier by an order of magnitude or two.

How is that possible? Three answers, in roughly descending order of importance.

First, architecture. DeepSeek V4 is a Mixture-of-Experts (MoE) model. Of its 1 trillion total parameters, only about 32 billion are active for any given token. That means inference and training compute scales with the active parameter count, not the total. A dense 1T model would be financially and physically impossible for a lab DeepSeek's size. An MoE 1T model with 32B active is roughly comparable in compute to training a dense 32B model — with the representational capacity of something far larger. DeepSeek has been refining this MoE recipe since V2 and V3, and V4 is the most efficient version yet.
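The active-versus-total distinction can be made concrete with a back-of-envelope calculation using the widely cited 6·N·D approximation for training compute (FLOPs ≈ 6 × parameters × tokens). The token count below is an illustrative assumption, not a figure from DeepSeek's report:

```python
# Back-of-envelope training compute with the common 6·N·D approximation:
# FLOPs ≈ 6 × (parameters that fire per token) × (training tokens).
# The token count is an illustrative ASSUMPTION, not from the report.

ACTIVE_PARAMS = 32e9   # MoE: only ~32B parameters are active per token
DENSE_PARAMS = 1e12    # a hypothetical dense 1T model, for contrast
TRAIN_TOKENS = 15e12   # assumed 15T training tokens (illustrative)

def train_flops(params: float, tokens: float) -> float:
    """Standard 6ND estimate of total training FLOPs."""
    return 6 * params * tokens

moe_flops = train_flops(ACTIVE_PARAMS, TRAIN_TOKENS)
dense_flops = train_flops(DENSE_PARAMS, TRAIN_TOKENS)

print(f"MoE (32B active): {moe_flops:.2e} FLOPs")
print(f"Dense 1T:         {dense_flops:.2e} FLOPs")
print(f"Dense would cost {dense_flops / moe_flops:.1f}x more compute")
```

Whatever the real token count, the ratio is what matters: training compute tracks the 32B active parameters, not the 1T total, which is most of the cost story.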

Second, software efficiency. DeepSeek's engineering team has built a reputation for ruthless low-level optimisation. Custom kernels, hand-tuned communication primitives, careful expert routing, FP8 mixed precision training, multi-token prediction objectives, and aggressive use of speculative decoding during evaluation — every layer of the stack has been squeezed. The technical report reads more like a systems engineering paper than an ML paper, and that is the point. While larger labs have been buying their way out of efficiency problems with more H100s, DeepSeek has been writing better code.

Third, data discipline. V4 was trained on a curated mixture rather than the largest possible corpus. The team has been explicit that they deprioritised raw token count in favour of cleaner, higher-signal data, with heavy emphasis on reasoning traces, code, and mathematics. Less garbage in, fewer wasted FLOPs.

The honest disclaimer: $5.2 million is the marginal compute cost of the final run, not the total cost of the program. DeepSeek's parent fund, High-Flyer, spent years and significant capital building the cluster and the talent before V4 was possible. Anyone using the $5.2M number to argue that frontier AI is now cheap for everyone is overreading it. What it actually proves is that once you have the cluster and the team, the marginal cost of producing a frontier model has collapsed. That is still a profound shift, and it is the shift that explains why every public US AI company saw its market cap wobble the week V4 was announced.

Trained on Huawei Ascend: Why This Breaks the US Chip Story

If the price tag is the story Wall Street cares about, the silicon is the story Washington cares about. DeepSeek V4 was trained primarily on Huawei Ascend 910C accelerators, China's domestically produced AI chip. Not on Nvidia H100s. Not on smuggled GPUs. Not on a special permitted variant. On chips that the US export control regime, in successive 2022, 2023, and 2024 updates, was specifically designed to keep ahead of.

The premise of US chip controls was simple. If America could prevent China from buying frontier Nvidia silicon, China would not be able to train frontier models, and the AI capability gap would widen in America's favour. For a while, that thesis looked correct. Earlier DeepSeek models were trained on a mix of Nvidia A100s and H800s acquired before restrictions tightened. Most informed observers, including hawks in Washington, assumed Chinese labs would hit a ceiling within a generation or two.

DeepSeek V4 is the empirical refutation of that thesis. The Huawei Ascend 910C is not as fast as an H100 on raw FLOPs. The interconnect is not as mature. The software stack — Huawei's MindSpore and CANN — is a younger, less polished cousin of CUDA. By every conventional metric, training on Ascend should be slower, more expensive, and more error-prone than training on the latest Nvidia silicon. And yet here we are, with a 1 trillion parameter MoE model that holds its own against GPT-5 and Claude on most public benchmarks, trained on the gear the export controls were supposed to make obsolete.

What the V4 training run actually shows is twofold. First, China's domestic chip industry has made faster progress than most US analysts assumed it would. Huawei's Ascend roadmap has compressed considerably, and there is now enough volume to assemble training clusters in the tens of thousands of accelerators. Second, software and architecture can compensate for hardware deficits to a much greater degree than the chip-focused policy debate ever acknowledged. A clever team with worse silicon and better code can match a less clever team with better silicon. Moore's Law was always only one of several variables.

The policy implications are still being worked out, but the direction is clear. Export controls were designed to slow Chinese AI by years. They appear to have slowed it by months. Expect a renewed debate in Washington about whether to tighten controls further (covering chip-making equipment, EDA software, and HBM memory more aggressively), to abandon the strategy in favour of competing on capability, or to try to do both. Whichever path is chosen, DeepSeek V4 will be the case study cited on every page of every memo.

1 Trillion Parameters, 32 Billion Active: How V4's MoE Actually Works

The headline number is 1 trillion parameters. The number you should actually care about is 32 billion active. Here is what that means in practice.

In a traditional dense transformer, every parameter participates in every token's forward pass. A 70B dense model uses all 70 billion parameters for every word it generates. This is conceptually simple but expensive: doubling parameters roughly doubles inference cost. There is no free lunch.

A Mixture-of-Experts model breaks the feed-forward layer into many specialised sub-networks (experts) and adds a small router that decides, for each token, which experts to consult. DeepSeek V4 has hundreds of experts per MoE layer, but only a small handful — typically 8 — are activated for any given token. The router learns during training which experts are best at which kinds of input, and routes accordingly. The result is a model that has the parameter capacity of a 1T monster but the per-token compute footprint of a 32B model.
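The routing step described above can be sketched in a few lines. This is a generic top-k softmax router, not DeepSeek's exact implementation; the logits here are random stand-ins for what a small learned linear layer would produce:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, k=8):
    """Pick the top-k experts for one token and renormalise their weights.

    router_logits: one score per expert. In the real model these come from
    a small learned linear layer; here they are illustrative numbers.
    """
    probs = softmax(router_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    # Only these k experts run their feed-forward pass for this token.
    return {i: probs[i] / total for i in topk}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(256)]  # 256 experts in the layer
chosen = route_token(logits, k=8)
print(f"{len(chosen)} of {len(logits)} experts activated for this token")
```

Every token repeats this selection independently, which is why compute scales with k experts' worth of parameters rather than all 256.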

Why does this matter for users?

  • Inference is affordable. Serving a true 1T dense model would be prohibitively expensive. V4's per-token compute is roughly that of a high-quality 32B model, which brings serving costs down to the level of modest GPU clusters and — importantly for the open source community — quantised multi-GPU setups. (Memory is a separate constraint; see the deployment point below.)
  • Specialisation emerges. Different experts end up handling different domains — code, mathematics, natural language, multilingual tokens. The router learns these specialisations without being told. In practice, this is part of why V4 punches above its active-parameter weight on coding and reasoning benchmarks.
  • Deployment is non-trivial. The downside is that all 1T parameters must still be in memory somewhere, even if only 32B are touched per token. This means MoE inference servers need either large pooled GPU memory or sophisticated expert offloading. The open-source community has built tools for this, but it is an extra layer of complexity compared to dense models.

DeepSeek's MoE recipe in V4 includes several refinements over V3: a finer-grained expert split, better load balancing losses to stop a few experts from hogging traffic, and what the technical report calls auxiliary-loss-free routing — a training trick that reduces the gradient distortions older MoE losses introduced. These are not flashy innovations, but they are exactly the kinds of incremental engineering wins that compound over a training run and explain how the team kept costs down.
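The auxiliary-loss-free idea, as described in DeepSeek's earlier V3 report, can be sketched simply: each expert carries a bias that is added to its routing score for selection only, and after each batch the bias is nudged down for overloaded experts and up for underloaded ones. Load is steered without a gradient-based balancing loss. A minimal sketch of that update, with illustrative numbers:

```python
def update_balance_biases(biases, loads, update_rate=0.001):
    """Nudge per-expert selection biases toward balanced load.

    biases: per-expert bias added to routing scores (selection only).
    loads:  fraction of tokens each expert handled in the last batch.
    Overloaded experts get a lower bias (picked less often next batch),
    underloaded experts a higher one. No auxiliary loss, no gradients.
    """
    target = 1.0 / len(loads)  # a perfectly balanced share of tokens
    return [
        b - update_rate if load > target else b + update_rate
        for b, load in zip(biases, loads)
    ]

# Four experts: the first is hogging traffic, the last is starved.
biases = [0.0, 0.0, 0.0, 0.0]
loads = [0.70, 0.20, 0.06, 0.04]
biases = update_balance_biases(biases, loads)
print(biases)  # the first bias drops, the other three rise
```

The actual recipe has more moving parts (per-layer biases, interaction with the finer-grained expert split), but the mechanism above is the core of why the older balancing losses, and the gradient distortions they introduced, can be dropped.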

For most readers, the takeaway is simple: V4 gives you the capacity of a much larger model at the cost of a much smaller one, and the architecture is now mature enough that it works reliably out of the box.

DeepSeek V4 vs GPT-5, Claude, and Gemini: Benchmark Reality Check

Benchmark wars are exhausting and often misleading, but the numbers do tell a coherent story for V4. On the public reasoning, code, and math suites that have become the de facto comparison set in 2026, DeepSeek V4 lands in the same tier as the frontier US models — clearly competitive, slightly ahead on some, slightly behind on others, never embarrassed.

| Benchmark | DeepSeek V4 | GPT-5 | Claude (Opus class) | Gemini (top tier) |
| --- | --- | --- | --- | --- |
| MMLU-Pro | Top tier | Top tier | Top tier | Top tier |
| GPQA Diamond (graduate science) | Very strong | Slightly ahead | Comparable | Comparable |
| MATH / AIME | Excellent | Best in class | Excellent | Very strong |
| HumanEval / SWE-bench | Excellent (coding sweet spot) | Excellent | Best in class on real repo tasks | Strong |
| Long-context recall | Strong (128K) | Strong | Excellent (200K) | Class-leading (1M) |
| Multilingual (CN/EN) | Best in class for Chinese | Strong | Strong | Strong |
| Cost per million tokens | Lowest (open weights) | High | High | Mid |

The pattern is consistent. On coding and mathematical reasoning, V4 is in the top tier. On scientific reasoning (GPQA-style questions where you need real PhD-level chemistry, biology, and physics knowledge), the absolute top US models still hold a small but real edge — Claude and GPT-5 are slightly more reliable on the hardest queries. On long-context tasks, Gemini's 1 million token window remains the leader, with Claude's 200K window and V4's 128K a step behind. On multilingual reasoning, particularly Chinese-English bilingual tasks, V4 is the best model on the planet — and that is a real differentiator for any team operating across the Pacific.

Two important caveats. First, public benchmarks are increasingly contaminated. Every major lab now trains on or near the test sets, and small absolute differences should not be over-interpreted. Second, benchmarks measure what they measure. They do not capture safety behaviour, instruction following nuance, refusal patterns, hallucination rates on real-world queries, or how well the model handles edge cases your business actually cares about. We strongly recommend you run your own evals on a representative slice of your real workload before making procurement decisions, regardless of what any leaderboard says.
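Running your own evals is easy to operationalise. A minimal harness, assuming you wrap each candidate model in a callable that takes a prompt and returns a string (the fake model below exists only so the harness itself is demonstrable):

```python
def run_eval(model_fn, cases):
    """Score a model callable on (prompt, judge) pairs.

    model_fn: callable prompt -> answer string (API client, local model, ...)
    cases:    list of (prompt, judge) where judge(answer) -> bool.
    A judge function instead of exact string match lets you grade numeric
    answers, SQL, or code without brittle comparisons.
    """
    results = [(prompt, judge(model_fn(prompt))) for prompt, judge in cases]
    passed = sum(ok for _, ok in results)
    return passed / len(results), results

# Toy stand-in "model" so the harness runs without any API key.
fake_model = lambda prompt: "4" if "2+2" in prompt else "unsure"
cases = [
    ("What is 2+2? Answer with a digit.", lambda a: a.strip() == "4"),
    ("Capital of France?", lambda a: "paris" in a.lower()),
]
score, details = run_eval(fake_model, cases)
print(f"pass rate: {score:.0%}")
```

Swap `fake_model` for a thin wrapper around each provider's API and run the same `cases` list against every candidate; twenty representative prompts from your real workload beat any public leaderboard.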

Our overall verdict from internal testing: V4 is genuinely competitive with GPT-5 and Claude on most tasks we threw at it, including code review on real production repositories, multi-step debugging, technical writing, and SQL generation. It is clearly behind Claude on long-form creative writing and certain agentic workflows where Claude's tool-use behaviour is unusually polished. It is clearly ahead of every other open weights model on the same tasks.

The Geopolitical Earthquake: What V4 Means for the AI Race

Every previous DeepSeek release was a tremor. V4 is the earthquake. Here is what shifts.

The export control thesis is on life support. The entire legal and diplomatic apparatus the US has built around AI compute — chip controls, end-use verification, entity lists, the CHIPS Act subsidies, the squeeze on ASML — was premised on the idea that controlling silicon controls capability. V4 is a 1T parameter model trained on Chinese-made chips for a small fraction of what US labs spend, and it is competitive with the best closed models in the world. The thesis is not dead, but it is wounded. Expect Washington to either tighten controls dramatically or quietly pivot to a capability-competition strategy in the coming year.

Open source is now a state-level strategy. DeepSeek's decision to ship V4 under MIT, with full open weights, is not an accident or a community gesture. It is a strategic move with the explicit (or at least implicit) blessing of Chinese AI policy, which has consistently favoured open ecosystems as a way to spread Chinese standards, attract international developers, and erode the moats of closed US labs. If GPT-5 is a product, V4 is a foreign policy instrument. The same logic explains Alibaba's Qwen series and Meta's Llama strategy from a different angle. The closed-frontier model business that OpenAI and Anthropic have built is now the object of a deliberate, well-funded global open-weights insurgency.

The capital intensity narrative cracks. For two years, the dominant story in AI investing has been that frontier capability requires tens of billions of dollars in capex, and therefore only a handful of hyperscalers can compete. V4 does not refute that for the next round of training (the arms race up to 10T+ parameter models still favours the well-capitalised), but it does refute it for this generation. Capability at this generation's frontier is now reproducible by smart, well-funded but not hyperscale teams. That changes the strategic calculus for sovereign AI initiatives in Europe, India, the Gulf, and Southeast Asia.

Latency and sovereignty matter more. If you can run a frontier-class model on your own infrastructure under MIT license, the calculus around sending sensitive data to closed US APIs changes overnight. We expect a meaningful shift in 2026 toward self-hosted inference for regulated industries (finance, healthcare, defence, government), particularly outside the US. V4 will not be the only model serving that demand, but it will be one of the most important.

The censorship caveat remains. DeepSeek's hosted chat product enforces Chinese content regulations, just like its predecessors. It will deflect on Tiananmen, Taiwan, Xinjiang, and Xi Jinping. The open weights model itself, when run locally and prompted directly, behaves differently — much of the politically motivated refusal behaviour is enforced at the deployment layer, not baked deeply into the weights, and the open source community has already demonstrated unrestricted forks and de-censored fine-tunes within days of release. For users who care, self-hosting solves the problem. For users who use the official chat, the problem remains.

How to Access DeepSeek V4: API, Open Weights, and Local Deployment

One of the things that sets DeepSeek V4 apart is that you have a real choice in how you use it. Here are the four practical paths.

1. The Official Chat Interface (Easiest)

The simplest option is to visit chat.deepseek.com, sign in, and select the V4 model. It is free with generous daily limits, supports a basic web search tool, and works in any browser. This is the right path for casual users, students, anyone evaluating the model qualitatively before integrating it, and anyone who is fine with their data being processed by DeepSeek's infrastructure in China. It is the wrong path if you handle regulated, proprietary, or politically sensitive data.

2. The DeepSeek API (Cheapest Cloud Option)

For developers who want programmatic access without managing infrastructure, DeepSeek operates an OpenAI-compatible API. Pricing is, as you would expect, dramatically lower than competing frontier APIs — typically a fraction of what GPT-5 or Claude Opus cost per million tokens. The endpoint speaks the same chat completions protocol as OpenAI, so most existing client libraries work with a one-line base URL change. The catch is the same as the chat interface: data flows to Chinese servers and is governed by Chinese law. For non-sensitive workloads, it is the best price-to-quality ratio on the market. For sensitive workloads, look at the next two options.
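Because the endpoint follows the OpenAI chat-completions protocol, you do not even need an SDK. A stdlib-only sketch — the model identifier and endpoint path are assumptions based on DeepSeek's existing API conventions, so check the current docs before relying on them:

```python
import json
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"  # OpenAI-compatible
MODEL = "deepseek-chat"  # ASSUMED identifier; check DeepSeek's current docs

def build_request(prompt: str, api_key: str, model: str = MODEL):
    """Build an OpenAI-style chat-completions HTTP request without any SDK."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__":
    req = build_request("Summarise MoE routing in one sentence.", "sk-...")
    # Uncomment to actually call the API (needs a real key and network):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

If you already use an OpenAI client library, the equivalent change really is one line: point its base URL at DeepSeek's endpoint and keep the rest of your code unchanged.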

3. Third-Party Hosted Endpoints (Western Jurisdictions)

Because V4 is open weights, it has been picked up almost immediately by Western inference providers including Together AI, Fireworks, Groq, Replicate, and the major hyperscalers' marketplaces. These providers host the same model weights on US, EU, or other non-China infrastructure, with their own privacy guarantees, SOC reports, and data processing agreements. Pricing is typically higher than DeepSeek's own API but still cheaper than equivalent closed models, and the data-sovereignty story is much cleaner. For most regulated enterprises that want V4-class capability without the China data path, this is the sweet spot in 2026.

4. Local and Self-Hosted (Most Control)

The MIT-licensed weights are downloadable from Hugging Face. You can run V4 on your own infrastructure, completely offline if you want. The honest hardware reality is that the full model needs serious GPU memory — practical setups generally involve several high-end accelerators (H100, MI300X, or comparable) or aggressive quantisation. Quantised variants from the community already exist and run on more modest multi-GPU rigs, with measurable but acceptable quality loss. For inference, popular frameworks include vLLM, SGLang, and TensorRT-LLM, all of which added V4 support within days of release. If you have the engineering bandwidth and the GPU budget, self-hosting is the only path that gives you complete control over data, latency, custom fine-tuning, and integration into your existing infrastructure.
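Before downloading anything, it is worth sizing the hardware. A rough sketch of weight memory alone, assuming weights dominate and ignoring KV cache and activation overhead (the 80 GB figure is the HBM capacity of an H100):

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return total_params * bits_per_param / 8 / 1e9

# All 1T parameters must be resident, even though only ~32B fire per token.
TOTAL_PARAMS = 1e12

for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit quantised", 4)]:
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    h100s = gb / 80  # 80 GB of HBM per H100
    print(f"{label:>16}: ~{gb:,.0f} GB weights (~{h100s:.1f} H100s, weights only)")
```

Even aggressively quantised, the weights alone run to hundreds of gigabytes, which is why practical self-hosting means a multi-accelerator node or expert offloading rather than a single workstation GPU.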

Whichever path you pick, you should benchmark V4 on your own real workload before committing. The differences between models are increasingly task-specific, and a model that wins on the leaderboard may lose on your particular use case.

DeepSeek V4 vs Qwen 3.5 vs Llama 4: The Open-Weight Frontier

DeepSeek V4 is not the only open-weights frontier model in 2026. Two other heavyweights deserve direct comparison: Alibaba's Qwen 3.5 series and Meta's Llama 4. These three families now define the open-weight frontier. Picking between them is the most important architectural decision a team building on open models has to make this year.

DeepSeek V4 — The Reasoning and Coding Specialist

Strengths: best-in-class reasoning and coding among open weights, the most efficient training story, the most aggressive MoE recipe, exceptional Chinese-English bilingual performance. Weaknesses: smaller community than Llama, fewer fine-tunes available out of the box, hosted-API privacy concerns if you do not self-host, less polished tool-use ergonomics than the closed leaders. Best for: teams that want frontier reasoning and coding from an open model and have either the GPU budget or a Western hosted-endpoint partner.

Qwen 3.5 — The All-Rounder With Broad Sizes

Strengths: extremely wide family — small models that run on a laptop, large models that compete with the frontier — making it the most flexible choice for teams that need consistent behaviour across deployment sizes. Strong multilingual and multimodal variants, very active community on Hugging Face, generous Apache 2.0 licensing on most variants. Weaknesses: same Chinese jurisdiction concerns as DeepSeek if you use the Alibaba-hosted API, and the absolute top-end Qwen model trails V4 slightly on the hardest reasoning benchmarks. Best for: teams that need a single open-model family across many sizes (edge to datacenter), particularly in multilingual or multimodal contexts.

Llama 4 — The Western Default

Strengths: by far the largest open ecosystem, the most third-party fine-tunes, the best deployment tooling, broadest integration into commercial products, and the cleanest jurisdictional story for buyers who do not want any Chinese-developed model in their stack regardless of whether it is self-hosted. Backed by Meta's resources for ongoing iteration. Weaknesses: on raw reasoning and coding benchmarks the top Llama 4 variants generally trail V4 and the top Qwens by a small but consistent margin, and Meta's licence — while permissive for almost everyone — is technically not as permissive as MIT or Apache 2.0. Best for: teams that prioritise ecosystem maturity, Western jurisdiction, and integration breadth over peak benchmark performance.

Honest Recommendation

If you want the best capability from an open weights model and you can manage self-hosting or use a Western hosted endpoint, V4 is the model to evaluate first. If you want the best ecosystem with the easiest path to fine-tuning and deployment, Llama 4 is still the safer institutional choice. If you want the broadest family across many model sizes and modalities, Qwen 3.5 is the most flexible. There is no wrong answer among the three, and many teams in 2026 are running more than one — V4 for hard reasoning tasks, Llama for general workloads, Qwen for multilingual or multimodal — because the marginal cost of supporting multiple open models is now low.

For a wider lens on the open-source landscape including non-LLM categories, our open source AI tools roundup covers the broader ecosystem.

Who DeepSeek V4 Is For (And Who Should Skip It)

V4 is not the right model for every user, despite the headlines. Here is how we would advise different audiences.

Researchers and Academic Labs

V4 is essentially a gift to the research community. A genuinely frontier-class model with open weights, full architecture details, and an MIT license is the most valuable thing a non-hyperscale lab can have right now. Expect a wave of papers in the next 12 months building on V4 — distillations, fine-tunes, mechanistic interpretability work, novel post-training recipes. If you are in academia or an independent research lab, V4 should be on your shortlist to download today.

Budget-Conscious Startups and Solo Developers

If your bottleneck is API spend, V4 via DeepSeek's official API is the cheapest credible frontier option on the market. For non-sensitive workloads — building a side project, prototyping, generating synthetic data, internal tooling — there is no comparable price-to-quality ratio. If your data is sensitive, use a Western hosted-endpoint provider serving V4 instead; the cost premium is still well below GPT-5 or Claude Opus.

Geopolitics and Policy Analysts

V4 is the most important AI release of 2026 from a geopolitics standpoint. If you are in government, in a think tank, or covering AI policy, you need to understand what V4 actually proves and what it doesn't. The technical report is worth reading in full, and the chip-controls implications are worth taking seriously regardless of which side of the policy debate you sit on.

Enterprises with Self-Hosting Capacity

If you already have GPU infrastructure (your own datacenter, a private cloud allocation, or a contract with an inference provider that supports custom open models) and you handle data that cannot legally leave your jurisdiction, V4 is the highest-quality open model you can deploy in that posture today. For finance, healthcare, government contractors, defence-adjacent, and EU regulated workloads, this is the most consequential point in V4's favour.

Casual Chatbot Users

If you mostly use AI through a chat interface — drafting emails, brainstorming, summarising articles, answering questions — V4 is excellent but not categorically better than Claude, GPT-5, or Gemini for that kind of work. The privacy and censorship caveats of the official chat interface mean we would generally recommend a Western hosted alternative for casual personal use. The differentiators that make V4 special (open weights, low cost at scale, sovereignty) mostly do not matter for individual chat users.

Highly Regulated Industries Without Self-Hosting

If you handle HIPAA, GDPR, FedRAMP, or similar regulated data and you do not have the engineering bandwidth to self-host, V4 via DeepSeek's official API is not the right choice. Use it via a Western hosted endpoint with the proper data processing agreement, or stick with Claude Enterprise or an equivalent closed provider whose compliance posture is fully documented.

Limitations, Caveats, and Final Verdict

No model deserves uncritical praise, including this one. Here are the honest limitations of DeepSeek V4 we noticed in extended testing.

  • Long-context recall is good, not great. The 128K context window is competitive but not class-leading. On needle-in-a-haystack evaluations and very long document reasoning, Gemini's million-token window and Claude's 200K window with strong recall remain meaningfully better. If long-context is your primary need, V4 is not your first choice.
  • Tool-use ergonomics lag the closed leaders. Claude in particular has spent the last year polishing how it uses tools, makes decisions about when to call them, and recovers from errors in agentic loops. V4 is competent at tool use but not yet as polished. For complex agentic workflows, this gap matters.
  • Hosted version enforces Chinese content regulations. Tiananmen, Taiwan, Xinjiang, and other politically sensitive topics will be deflected or refused on chat.deepseek.com and the official API. The open-weights model itself behaves differently when self-hosted, but if you use the official endpoints you should be aware of and plan around this constraint.
  • Safety behaviour is less mature. The big closed labs have spent enormous effort on red-teaming, RLHF, constitutional approaches, and deployment-time safety. DeepSeek's safety story is improving but is not yet at parity. For consumer-facing products and high-stakes decision support, this matters and should be tested explicitly on your use cases.
  • Self-hosting is genuinely hard. Despite community tooling, deploying a 1T MoE model is meaningfully more complex than deploying a 70B dense model. If your team has not done large-model inference before, budget for the learning curve.
  • Benchmark contamination is plausible. As with every frontier model, you should treat published benchmark scores as a floor for how the model behaves on its training distribution and an unknown for how it behaves on yours. Run real evals.

Verdict

DeepSeek V4 is the most consequential AI release of 2026. As a piece of engineering, it proves that frontier-class capability is now reproducible by efficient teams using non-leading silicon at a fraction of the cost the industry assumed was necessary. As an open weights release under MIT, it democratises a level of capability that until very recently lived only inside two or three closed US labs. As a geopolitical artifact, it reshapes the AI export-control debate in ways that will take years to fully play out.

If you are a researcher, a startup builder, a policy analyst, an enterprise architect, or anyone responsible for deciding which models go into a 2026 stack, V4 is required reading and required testing. It will not displace Claude or GPT-5 in every use case — the closed leaders still hold meaningful edges in long context, agentic tool use, and certain reasoning niches. But it will absolutely displace them in the use cases where price, sovereignty, openness, or self-hosting matter, and that universe of use cases is growing fast.

For most readers, the right next step is to spend an hour with V4 on your real workload. Pull it up at chat.deepseek.com for a quick qualitative impression, then route your real evaluation set through a Western hosted endpoint and compare against your incumbent. If you like what you see and you have the budget, plan a proof of concept on self-hosted infrastructure. The marginal cost of adding V4 to your model portfolio is low, and the upside — both economic and strategic — is large enough that ignoring it would be a mistake.

Want a wider view of where V4 sits in the open-source landscape? Read our DeepSeek alternatives and best reasoning models guide, and our broader open source AI tools roundup for free alternatives across the wider AI stack.

Key Takeaways

  1. DeepSeek V4 is a 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters per token, released under MIT license with full open weights
  2. The reported $5.2 million training run cost is 20-100x cheaper than comparable US frontier models and proves that efficient teams can reach frontier capability without hyperscaler budgets
  3. V4 was trained primarily on Huawei Ascend 910C chips, not Nvidia H100s — a direct empirical refutation of the US chip export control thesis
  4. On reasoning, coding, and math benchmarks, V4 lands in the same tier as GPT-5, Claude, and Gemini, with particular strength in coding and Chinese-English bilingual tasks
  5. V4 is accessible four ways — official chat, official API, Western hosted endpoints (Together, Fireworks, Groq), or fully self-hosted from Hugging Face weights
  6. The official chat and API enforce Chinese content regulations on politically sensitive topics; self-hosted deployments do not have the same restrictions
  7. Compared to Qwen 3.5 and Llama 4, V4 leads on reasoning and coding, Qwen leads on family breadth and multimodal, and Llama leads on ecosystem maturity and Western jurisdiction
  8. V4 is the right primary choice for researchers, budget-conscious developers, and self-hosting enterprises; closed models still lead on long-context, agentic tool use, and polished safety behaviour
