Cerebras Systems, an AI hardware startup that has been steadily challenging Nvidia’s dominance in the artificial intelligence market, announced Tuesday a significant expansion of its data center footprint and two major enterprise partnerships that position the company to become the leading provider of high-speed AI inference services.
The company will add six new AI data centers across North America and Europe, increasing its inference capacity twentyfold to over 40 million tokens per second. The expansion includes facilities in Dallas, Minneapolis, Oklahoma City, Montreal, New York, and France, with 85% of the total capacity located in the United States.
“This year, our goal is to really satisfy all the demand and all the new demand we anticipate will come online as a result of new models like Llama 4 and new DeepSeek models,” said James Wang, director of product marketing at Cerebras, in an interview with VentureBeat. “This is our big growth initiative this year to satisfy almost unlimited demand we’re seeing across the board for inference tokens.”
The data center expansion represents the company’s ambitious bet that the market for high-speed AI inference — the process where trained AI models generate outputs for real-world applications — will grow dramatically as companies seek faster alternatives to GPU-based solutions from Nvidia.
Strategic partnerships that bring high-speed AI to developers and financial analysts
Alongside the infrastructure expansion, Cerebras announced partnerships with Hugging Face, the popular AI developer platform, and AlphaSense, a market intelligence platform widely used in the financial services industry.
The Hugging Face integration will allow its 5 million developers to access Cerebras Inference with a single click, without having to sign up for Cerebras separately. This represents a major distribution channel for Cerebras, particularly for developers working with open-source models like Llama 3.3 70B.
“Hugging Face is kind of the GitHub of AI and the center of all open source AI development,” Wang explained. “The integration is super nice and native. You just appear in their inference providers list. You just check the box and then you can use Cerebras right away.”
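For developers, the workflow Wang describes maps onto Hugging Face’s inference-providers API. Below is a minimal sketch of what such a call might look like, assuming a recent huggingface_hub release with provider routing; the token and the prompt are placeholders, not details from the announcement.

```python
# Minimal sketch: routing a chat completion to Cerebras through Hugging Face's
# inference providers. Assumes huggingface_hub >= 0.28 and a valid HF token.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cerebras",   # select Cerebras from the inference providers list
    api_key="hf_xxx",      # placeholder; substitute your own Hugging Face token
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # open-source model cited in the article
    messages=[{"role": "user", "content": "Explain wafer-scale inference in one sentence."}],
)
print(completion.choices[0].message.content)
```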
The AlphaSense partnership represents a significant enterprise customer win, with the financial intelligence platform switching from what Wang described as a “global, top three closed-source AI model vendor” to Cerebras. The company, which serves roughly 85% of Fortune 100 companies, is using Cerebras to accelerate its AI-powered search capabilities for market intelligence.
“This is a tremendous customer win and a very large contract for us,” Wang said. “We speed them up by 10x, so what used to take five seconds or longer basically becomes instant on Cerebras.”
How Cerebras is winning the race for AI inference speed as reasoning models slow things down
Cerebras has been positioning itself as a specialist in high-speed inference, claiming its Wafer-Scale Engine (WSE-3) processor can run AI models 10 to 70 times faster than GPU-based solutions. This speed advantage has become increasingly valuable as AI models evolve toward more complex reasoning capabilities.
“If you listen to Jensen’s remarks, reasoning is the next big thing, even according to Nvidia,” Wang said, referring to Nvidia CEO Jensen Huang. “But what he’s not telling you is that reasoning makes the whole thing run 10 times slower because the model has to think and generate a bunch of internal monologue before it gives you the final answer.”
This slowdown creates an opportunity for Cerebras, whose specialized hardware is designed to accelerate these more complex AI workloads. The company has already secured high-profile customers including Perplexity AI and Mistral AI, who use Cerebras to power their AI search and assistant products, respectively.
“We help Perplexity become the world’s fastest AI search engine. This just isn’t possible otherwise,” Wang said. “We help Mistral achieve the same feat. Now they have a reason for people to subscribe to Le Chat Pro, whereas before, your model is probably not at the same cutting-edge level as GPT-4.”
The compelling economics behind Cerebras’ challenge to OpenAI and Nvidia
Cerebras is betting that the combination of speed and cost will make its inference services attractive even to companies already using leading models like GPT-4.
Wang pointed out that Meta’s Llama 3.3 70B, an open-source model that Cerebras has optimized for its hardware, now scores the same on intelligence tests as OpenAI’s GPT-4, while costing significantly less to run.
“Anyone who’s using GPT-4 today can just move to Llama 3.3 70B as a drop-in replacement,” he explained. “The price for GPT-4 is [about] $4.40 in blended terms. And Llama 3.3 is like 60 cents. We’re about 60 cents, right? So you reduce cost by almost an order of magnitude. And if you use Cerebras, you increase speed by another order of magnitude.”
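Wang’s “blended” figure is consistent with a weighted average of input and output token prices. A back-of-the-envelope check, assuming a common 3:1 input-to-output token mix; the per-direction GPT-4-class rates below are assumptions for illustration, not figures from the article.

```python
# Rough sketch of the quoted cost gap. The 3:1 input/output blend and the
# per-direction GPT-4-class rates are assumptions for illustration only.
gpt4_input, gpt4_output = 2.50, 10.00         # $ per million tokens (assumed)
blended_gpt4 = (3 * gpt4_input + gpt4_output) / 4
blended_llama = 0.60                          # $ per million tokens, per the article

print(f"GPT-4-class blended price: ${blended_gpt4:.2f}/M tokens")   # ~$4.38, near Wang's ~$4.40
print(f"Price ratio: {blended_gpt4 / blended_llama:.1f}x cheaper")  # ~7.3x, "almost an order of magnitude"
```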
Inside Cerebras’ tornado-proof data centers built for AI resilience
The company is making substantial investments in resilient infrastructure as part of its expansion. Its Oklahoma City facility, scheduled to come online in June 2025, is designed to withstand extreme weather events.
“Oklahoma, as you know, is kind of a tornado zone. So this data center actually is rated and designed to be fully resistant to tornadoes and seismic activity,” Wang said. “It will withstand the strongest tornado ever recorded on record. If that thing just goes through, this thing will just keep sending Llama tokens to developers.”
The Oklahoma City facility, operated in partnership with Scale Datacenter, will house over 300 Cerebras CS-3 systems and features triple redundant power stations and custom water-cooling solutions specifically designed for Cerebras’ wafer-scale systems.
From skepticism to market leadership: How Cerebras is proving its value
The expansion and partnerships announced today represent a significant milestone for Cerebras, which has been working to prove itself in an AI hardware market dominated by Nvidia.
“I think what was reasonable skepticism about customer uptake, maybe when we first launched, I think that’s now fully put to bed, just given the number of logos we have,” Wang said.
The company is targeting three specific areas where fast inference provides the most value: real-time voice and video processing, reasoning models, and coding applications.
“Coding is one of these kind of in-between reasoning and regular Q&A that takes maybe 30 seconds to a minute to generate all the code,” Wang explained. “Speed is directly proportional to developer productivity. So having speed there matters.”
By focusing on high-speed inference rather than competing across all AI workloads, Cerebras has found a niche where it can claim leadership over even the largest cloud providers.
“Nobody generally competes against AWS and Azure on their scale. Obviously we don’t reach full scale like them, but to be able to replicate a key segment… on the high-speed inference front, we will have more capacity than them,” Wang said.
Why Cerebras’ US-centric expansion matters for AI sovereignty and future workloads
The expansion comes at a time when the AI industry is increasingly focused on inference capabilities, as companies move from experimenting with generative AI to deploying it in production applications where speed and cost-efficiency are critical.
With 85% of its inference capacity located in the United States, Cerebras is also positioning itself as a key player in advancing domestic AI infrastructure at a time when technological sovereignty has become a national priority.
“Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale and efficiency – these new global datacenters will serve as the backbone for the next wave of AI innovation,” said Dhiraj Mallick, COO of Cerebras Systems, in the company’s announcement.
As reasoning models like DeepSeek R1 and OpenAI’s o3 become more prevalent, the demand for faster inference solutions is likely to grow. These models, which can take minutes to generate answers on traditional hardware, operate near-instantaneously on Cerebras systems, according to the company.
For technical decision makers evaluating AI infrastructure options, Cerebras’ expansion represents a significant new alternative to GPU-based solutions, particularly for applications where response time is critical to user experience.
Whether the company can truly challenge Nvidia’s dominance in the broader AI hardware market remains to be seen, but its focus on high-speed inference and substantial infrastructure investment demonstrates a clear strategy to carve out a valuable segment of the rapidly evolving AI landscape.