Breaking the Quadratic Curse
How DeepSeek Rewrites the Rules of AI
On December 1, 2025, DeepSeek released V3.2, and its Speciale variant reported gold-medal-level performance on both the International Mathematical Olympiad and the International Olympiad in Informatics. Only about 8% of human participants achieve gold at the IMO, a competition that demands deep insight, creativity, and rigour. The underlying model, DeepSeek-V3, cost just $5.5 million to train using 2,000 GPUs, a small fraction of the computational resources American competitors typically deploy. The performance gap with American frontier models has effectively closed; the cost gap has not.

This disparity points to something deeper than mere frugality. The AI industry has long been constrained by what engineers call the quadratic curse: the standard Transformer architecture requires each token to compute its relationship with every preceding token, meaning that when sequence length doubles, computational cost quadruples. The implications are severe. Data centres, whose growth is increasingly driven by AI workloads, already consume about 415 terawatt-hours annually, roughly 1.5% of global electricity, with projections suggesting this will more than double to around 945 terawatt-hours by 2030. American AI labs have largely addressed the quadratic constraint by scaling up: more GPUs, larger data centres, greater power consumption. DeepSeek chose a different path: algorithmic innovation.
Breaking the Quadratic Curse
To understand why this matters, consider how standard attention mechanisms work. Imagine reading a 500-page book, but every time you encounter a new sentence, you must re-read every previous sentence to understand its context. This is essentially what conventional Transformers do, and it explains why they excel at understanding but struggle to scale. The computational burden grows quadratically with context length, which is why processing long documents or maintaining extended conversations has traditionally required enormous resources. A model handling 100,000 tokens must perform roughly four times the attention work of one handling 50,000, not twice.
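The scaling claim above can be checked with a few lines of arithmetic. This is a minimal sketch that only counts query-key pairs under causal attention; the function name is illustrative, and it assumes each token attends to itself as well as to every earlier token.

```python
def causal_attention_pairs(seq_len: int) -> int:
    """Count (query, key) pairs when every token attends to itself
    and to all tokens before it (assumption: self-attention included)."""
    return seq_len * (seq_len + 1) // 2


short_pairs = causal_attention_pairs(50_000)
long_pairs = causal_attention_pairs(100_000)
ratio = long_pairs / short_pairs

print(f"50,000 tokens:  {short_pairs:,} pairs")
print(f"100,000 tokens: {long_pairs:,} pairs")
print(f"Doubling the context multiplies the work by about {ratio:.2f}")
```

The pair count is only a proxy for cost, but it shows why doubling the context roughly quadruples the attention work rather than doubling it.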
DeepSeek Sparse Attention (DSA) fundamentally reimagines this process. The mechanism operates in two stages. First, a lightning indexer rapidly scans the context to identify which previous tokens are most relevant to the current query. Think of it as consulting a book’s index rather than reading cover to cover. The vLLM team, which implemented DSA for production deployment, noted that the indexer uses a quantisation scheme that is “new and different” from standard approaches.
In the second stage, the system performs full attention computation only on the 2,048 most relevant tokens: a selective focus rather than exhaustive review. This reduces the complexity of the main attention computation from O(L²) to O(Lk), where L is the sequence length and k is the fixed number of selected tokens. According to technical analyses, DSA achieves approximately 50% reduction in computational cost for long-context tasks while preserving model performance.
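The two stages described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not DeepSeek's implementation: the function names are hypothetical, the "indexer" is an ordinary dot-product score that reuses the attention keys, and k is 2 rather than 2,048. The real indexer's scoring function and quantisation are not described in this article.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(index_query, index_keys, k):
    """Stage 1 (stand-in for the indexer): score every earlier token
    cheaply and keep the positions of the k highest scores."""
    scores = [dot(index_query, key) for key in index_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Stage 2: ordinary scaled softmax attention, restricted to the
    selected positions instead of the whole context."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, selected))
            for d in range(dim)]


# Toy context of four tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
query = [1.0, 0.0]

selected = select_top_k(query, keys, k=2)  # assumption: indexer reuses the keys
output = sparse_attention(query, keys, values, selected)
print("selected positions:", selected)
print("attention output:", output)
```

In this sketch the selection step still looks at every earlier token, but with a much cheaper score than full attention; the expensive softmax-weighted sum then touches only k positions, which is where the O(Lk) term comes from.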
The efficiency gains are substantial, and they do not come from attention alone. DeepSeek-V3’s training cost of roughly $5.5 million on about 2,000 GPUs also reflects its Mixture-of-Experts architecture: the model has 671 billion total parameters, yet only 37 billion activate per token. Specialised sub-networks engage only when needed, further reducing computational overhead.
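To make the Mixture-of-Experts idea concrete, here is a minimal sketch of generic top-k routing, together with the active-parameter fraction implied by the figures above. The router, the toy scalar experts, and the choice of two active experts are illustrative assumptions; DeepSeek's actual routing scheme is not described in this article.

```python
import math

TOTAL_PARAMS_B = 671   # total parameters in billions, from the article
ACTIVE_PARAMS_B = 37   # parameters active per token in billions, from the article
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def route(gate_scores, top_k):
    """Pick the top_k highest-scoring experts and normalise their
    weights with a softmax over the chosen scores only."""
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:top_k]
    peak = max(gate_scores[i] for i in chosen)
    exps = {i: math.exp(gate_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_scores, top_k=2):
    """Run only the selected experts and blend their outputs."""
    weights = route(gate_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four hypothetical experts, each a simple scalar function.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_scores = [0.1, 2.0, 2.0, -1.0]  # hypothetical router output for one token

result = moe_forward(3.0, experts, gate_scores)
print(f"active fraction of parameters: {active_fraction:.1%}")
print(f"output from the two selected experts: {result}")
```

Only the selected experts run for a given token, so compute per token tracks the active parameters (about 5.5% of the total here) rather than the full parameter count.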
IBM Fellow Kush Varshney, analysing DeepSeek’s approach, observed that the models demonstrate genuine reasoning capability—what he termed metacognition, or “thinking about thinking.” Rather than producing answers without explanation, the system verifies its outputs through step-by-step analysis. DeepSeek’s training methodology combined chain-of-thought reasoning with reinforcement learning, enabling the model to “discover the best way to think on its own” through correctness rewards rather than relying solely on human-labelled training data.
Gold Medal Proof
Yet efficiency alone proves little if capability suffers. The competition results suggest it has not.
DeepSeek-V3.2-Speciale, a variant optimised for extreme reasoning tasks, achieved gold-medal performance on the 2025 International Mathematical Olympiad, a distinction reached by only about 8% of human participants. The IMO is considered the world’s most prestigious mathematics competition precisely because its problems demand deep insight, creativity, and rigour; they cannot be solved through pattern matching or rote calculation. The model reached the same gold-medal level on the International Olympiad in Informatics, and on Codeforces, the competitive programming platform, it attained a rating of 2701, placing it in the Grandmaster tier.
In direct benchmark comparisons with closed-source competitors, the pattern holds. On AIME 2025, V3.2-Speciale scored 96.0% against GPT-5 High’s 94.6%. On HMMT 2025, another elite mathematics competition, it reached 99.2%—surpassing Gemini 3 Pro’s 97.5%. On multilingual software engineering tasks, the gap widened further: 70.2% versus GPT-5’s 55.3%. These are not marginal differences.
What makes this significant extends beyond the numbers. DeepSeek’s V3.2 is the first open-source model to achieve gold-medal-level performance on the IMO. This breaks the assumption, common in industry discourse, that frontier capabilities must remain proprietary. The model also introduces Thinking in Tool-Use, a mechanism that maintains reasoning continuity across tool calls. Traditional models tend to “forget” their reasoning chain when invoking external tools; V3.2’s reasoning persists across interactions, enabling practical task execution—software development, research automation, data analysis—rather than conversation alone.
The Global South’s New Frontier
DeepSeek has not kept these capabilities locked away.
The model is released under an MIT licence with complete weights available on Hugging Face, permitting unrestricted commercial use, modification, and redistribution. This openness matters because the global landscape of AI access has shifted. Alibaba’s Qwen ecosystem now hosts over 100,000 derivative models, surpassing Meta’s Llama community to become the world’s largest open-source AI ecosystem. Chinese AI models have achieved significant market traction—DeepSeek commands a 24% share on OpenRouter, a major AI model marketplace, ranking second only to Google.
The cost implications are immediate and practical. DeepSeek’s API charges $0.28 per million input tokens and $0.42 per million output tokens. For a workload of 100,000 input tokens and 100,000 output tokens, DeepSeek costs approximately $0.07 compared to $1.13 with GPT-5, making it roughly sixteen times cheaper. What once required national-level resources is now accessible within realistic budgets for universities, startups, and government agencies across the developing world.
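The arithmetic behind that comparison is easy to reproduce. The sketch below uses only the figures quoted above, assuming the workload means 100,000 input tokens plus 100,000 output tokens; the GPT-5 total is taken as quoted rather than derived from a price list, and the function name is illustrative.

```python
DEEPSEEK_INPUT_PER_M = 0.28    # USD per million input tokens, from the article
DEEPSEEK_OUTPUT_PER_M = 0.42   # USD per million output tokens, from the article
GPT5_QUOTED_TOTAL = 1.13       # USD for the same workload, as quoted in the article


def api_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in USD for a workload priced per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


deepseek_cost = api_cost(100_000, 100_000,
                         DEEPSEEK_INPUT_PER_M, DEEPSEEK_OUTPUT_PER_M)
cost_ratio = GPT5_QUOTED_TOTAL / deepseek_cost

print(f"DeepSeek cost: ${deepseek_cost:.2f}")
print(f"GPT-5 is about {cost_ratio:.0f}x more expensive for this workload")
```

Because output tokens are priced higher than input tokens, the ratio shifts with the mix of input and output; the sixteen-fold figure applies to this even split.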
Nations are responding accordingly. India has announced plans to develop language models based on DeepSeek technology, with initial applications focused on agriculture and climate adaptation—domains where local knowledge and linguistic specificity are essential, and where dependence on American AI providers poses both practical and strategic concerns. South Korea has accelerated its national AI infrastructure programme. Brazil’s 2025 BRICS presidency has prioritised AI governance, exploring partnerships that could counter existing technology dependencies through coordinated infrastructure development and shared standards.
Yet clear-eyed assessment requires acknowledging limitations. As one Brazilian technology leader observed, “open-source cannot address the challenge of building sovereign infrastructure critical to local and national development.” Chips, servers, operating systems, and cloud platforms remain controlled primarily by Western corporations. The European Union’s experience is instructive: despite implementing strict data protection regulations through GDPR and requiring data localisation, American companies still control much of the underlying infrastructure that processes that data. Open-source models represent a significant step toward digital sovereignty—but not the final one.
The Value of Engineering
Prabir Purkayastha, in Knowledge as Commons, argues that “rapid technological change, from pharmaceuticals to electronics, should be an opportunity to deliver quicker cures, affordable access, and global cooperation.” The book challenges the patents and intellectual property restrictions that make knowledge artificially scarce, noting how such systems systematically disadvantage developing nations. Knowledge, Purkayastha observes, is an unlimited resource that is rationed as if it were finite, while genuinely finite resources like water are often treated as infinite.
DeepSeek’s trajectory illustrates a related point: engineering progress is not a lesser form of innovation. When American AI development encounters the constraints of the quadratic curse—data centres already consuming 4% of U.S. electricity with demand expected to more than double by 2030, computational costs escalating faster than capability gains—algorithmic efficiency becomes not merely convenient but necessary. Without engineering breakthroughs, even scientific advances reach practical ceilings.

The two approaches are not mutually exclusive. Scaling laws retain predictive power for certain capabilities; DeepSeek itself uses a 671-billion-parameter model. But efficiency innovations extend what is achievable within resource constraints. For nations and developers without access to tens of thousands of high-end GPUs, the efficiency-first path offers something the scale-first path cannot: participation.
DeepSeek has demonstrated that reaching frontier capability does not require frontier capital. The quadratic curse, long treated as an immutable constraint of the Transformer architecture, has been circumvented through careful algorithm design. And by releasing this work openly under a permissive licence, the company has extended the possibility—though not the guarantee—of digital sovereignty to those previously excluded from the conversation.
The hardware dependencies remain. The path forward requires building not just on open-source models but toward genuinely sovereign infrastructure. Yet for nations and developers who have long watched AI development from the outside, a door has opened. What they choose to build with this access will shape the next chapter of the technology’s global trajectory.

