That ‘cheap’ open-source AI model is actually burning through your compute budget

August 15, 2025


A comprehensive new study has revealed that open-source artificial intelligence models consume significantly more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies.

The research, conducted by AI firm Nous Research, found that open-weight models use between 1.5 and 4 times more tokens (the basic units of AI computation) than closed models like those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times more tokens.

Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark https://t.co/b1e1rJx6vZ

We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (up to… pic.twitter.com/LY1083won8

Nous Research (@NousResearch), August 14, 2025

“Open weight models use 1.5–4× more tokens than closed ones (up to 10× for simple knowledge questions), making them sometimes more expensive per query despite lower per‑token costs,” the researchers wrote in their report published Wednesday.

The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token to run, the study suggests this advantage can be “easily offset if they require more tokens to reason about a given problem.”
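The offset the researchers describe comes down to simple arithmetic: the cost of a query is roughly the number of tokens generated multiplied by the price per token. The Python sketch below illustrates that relationship. The prices, token counts, and function names are all hypothetical, chosen only to show the calculation; none of them come from the Nous Research report.

```python
def cost_per_query(completion_tokens: int, price_per_million_tokens: float) -> float:
    """Dollar cost of the tokens generated for one query."""
    return completion_tokens * price_per_million_tokens / 1_000_000


def break_even_token_ratio(open_price: float, closed_price: float) -> float:
    """Token multiple at which an open model costs the same per query as a closed one."""
    return closed_price / open_price


# Hypothetical figures for illustration only (not taken from the study).
closed_price = 3.00    # dollars per million output tokens
open_price = 1.00      # dollars per million output tokens
closed_tokens = 500    # tokens the closed model generates for a query
open_tokens = 2_000    # the open model generates 4x as many tokens

closed_cost = cost_per_query(closed_tokens, closed_price)
open_cost = cost_per_query(open_tokens, open_price)
token_ratio = open_tokens / closed_tokens
break_even = break_even_token_ratio(open_price, closed_price)

print(f"closed: ${closed_cost:.4f} per query, open: ${open_cost:.4f} per query")
print(f"open model uses {token_ratio:.1f}x the tokens; break-even is {break_even:.1f}x")
```

Under these assumed numbers, the open model is three times cheaper per token but uses four times as many tokens, so it ends up costing more per query. The per-token discount only pays off while the token multiple stays below the price ratio.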

The real cost of AI: Why ‘cheaper’ models may break your budget

The research examined 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency,” meaning how many computational units models use relative to the complexity of their solutions, a metric that has received little systematic study despite its significant cost implications.

“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “While hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem.”

Open-source AI models use up to 12 times more computational resources than the most efficient closed models for basic knowledge questions. (Credit: Nous Research)

The inefficiency is particularly pronounced for Large Reasoning Models (LRMs), which use extended “chains of thought” to solve complex problems. These models, designed to think through problems step-by-step, can consume thousands of tokens pondering simple questions that should require minimal computation.

For basic knowledge questions like “What is the capital of Australia?” the study found that reasoning models spend “hundreds of tokens pondering simple knowledge questions” that could be answered in a single word.

Which AI models actually deliver bang for your buck

The research revealed stark differences between model providers. OpenAI’s models, particularly its o4-mini and newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found OpenAI models “stand out for extreme token efficiency in math problems,” using up to three times fewer tokens than other commercial models.

Among open-source options, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “the most token efficient open weight model across all domains,” while newer models from companies like Magistral showed “exceptionally high token usage” as outliers.

The efficiency gap varied significantly by task type. While open models used roughly twice as many tokens for mathematical and logic problems, the difference ballooned for simple knowledge questions, where extended reasoning should be unnecessary.

OpenAI’s latest models achieve the lowest costs for simple questions, while some open-source alternatives can cost significantly more despite lower per-token pricing. (Credit: Nous Research)

What enterprise leaders need to know about AI computing costs

The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements for real-world tasks.

“The better token efficiency of closed weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.

The study also revealed that closed-source model providers appear to be actively optimizing for efficiency. “Closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models have “increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance.”

The computational overhead varies dramatically between AI providers, with some models using over 1,000 tokens for internal reasoning on simple tasks. (Credit: Nous Research)

How researchers cracked the code on AI efficiency measurement

The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models don’t reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.

To address this, researchers used completion tokens (the total computational units billed for each query) as a proxy for reasoning effort. They discovered that “most recent closed source models will not share their raw reasoning traces” and instead “use smaller language models to transcribe the chain of thought into summaries or compressed representations.”
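Because completion tokens are the billed quantity, they can be compared across models even when the reasoning trace itself is hidden. The Python sketch below shows one plausible way to turn per-query completion-token counts into a relative-usage figure per task type. It is not the study’s actual evaluation code: the model names, token counts, and helper functions are hypothetical, and the numbers were picked only to echo the rough magnitudes described in the article.

```python
from statistics import mean

# Hypothetical completion-token counts per query, keyed by model and task type.
# These are illustrative values, not measurements from the Nous Research report.
completion_tokens = {
    "closed-model-a": {"knowledge": [40, 60], "math": [900, 1100]},
    "open-model-b": {"knowledge": [400, 600], "math": [1800, 2200]},
}


def mean_tokens(model: str, task: str) -> float:
    """Average completion tokens a model spent on one task type."""
    return mean(completion_tokens[model][task])


def relative_usage(model: str, baseline: str, task: str) -> float:
    """How many times more tokens `model` used than `baseline` on a task type."""
    return mean_tokens(model, task) / mean_tokens(baseline, task)


for task in ("knowledge", "math"):
    ratio = relative_usage("open-model-b", "closed-model-a", task)
    print(f"{task}: open model uses {ratio:.1f}x the baseline's completion tokens")
```

Splitting the comparison by task type matters here, since a single average across all queries would hide the much larger gap on simple knowledge questions.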

The study’s methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering variables in mathematical competition problems from the American Invitational Mathematics Examination (AIME).

Different AI models show varying relationships between computation and output, with some providers compressing reasoning traces while others provide full details. (Credit: Nous Research)

The future of AI efficiency: What’s coming next

The researchers suggest that token efficiency should become a primary optimization target alongside accuracy for future model development. “A more densified CoT will also allow for more efficient context usage and may counter context degradation during challenging reasoning tasks,” they wrote.

The release of OpenAI’s open-source gpt-oss models, which demonstrate state-of-the-art efficiency with “freely accessible CoT,” could serve as a reference point for optimizing other open-source models.

The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the findings. As the AI industry races toward more powerful reasoning capabilities, this study suggests that the real competition may not be about who can build the smartest AI, but who can build the most efficient one.

After all, in a world where every token counts, the most wasteful models may find themselves priced out of the market, regardless of how well they can think.
