Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions.
Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.
“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told iinfoai, describing MathGen’s early work.
OpenAI’s models are far from perfect today: the company’s latest AI systems still hallucinate, and its agents struggle with complex tasks.
But its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a math competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and ultimately power the general-purpose agents that the company has always dreamed of building.
ChatGPT was a happy accident, a low-key research preview turned viral consumer business, but OpenAI’s agents are the product of a years-long, deliberate effort within the company.
“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”
Whether agents will meet Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.
The reinforcement learning renaissance
The rise of OpenAI’s reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices were correct or not in simulated environments.
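To make that feedback loop concrete, here is a minimal sketch in Python. It is a toy, not OpenAI’s training setup: the "environment" is a hypothetical multiple-choice task with one hidden correct answer, and the function name train_bandit, the epsilon-greedy exploration rule, and every parameter value are assumptions chosen for illustration. The only signal the learner ever receives is whether its choice was right or wrong.

```python
import random

def train_bandit(correct_choice, n_choices=4, steps=2000, epsilon=0.1, seed=0):
    """Toy RL loop: learn which choice is rewarded from right/wrong feedback only."""
    rng = random.Random(seed)
    values = [0.0] * n_choices   # running estimate of the reward for each choice
    counts = [0] * n_choices     # how many times each choice has been tried
    for _ in range(steps):
        # Explore a random choice occasionally; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            choice = rng.randrange(n_choices)
        else:
            choice = max(range(n_choices), key=lambda i: values[i])
        # The simulated environment only reports whether the choice was correct.
        reward = 1.0 if choice == correct_choice else 0.0
        counts[choice] += 1
        # Incremental mean update for the estimate of the chosen action.
        values[choice] += (reward - values[choice]) / counts[choice]
    return values

values = train_bandit(correct_choice=2)
best = max(range(len(values)), key=lambda i: values[i])
```

After training, best should point at the rewarded choice. Real systems replace the lookup table with a neural network and the single-step task with long sequences of decisions, but the principle of learning from correctness feedback is the same.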
RL has been used for decades. For instance, in 2016, about a year after OpenAI was founded in 2015, an AI system created by Google DeepMind using RL, AlphaGo, gained global attention after beating a world champion in the board game Go.
Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.
By 2018, OpenAI pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data and large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.
It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before providing an answer.
This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.
“I could see the model starting to reason,” said El Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”
Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful to power AI agents.
“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”
Scaling reasoning
With AI reasoning models, OpenAI determined it had two new axes that would allow it to improve AI models: using more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question.
“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.
Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an “Agents” team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told iinfoai. Although the team was called “Agents,” OpenAI didn’t initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.
Eventually, the work of Selsam’s Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.
OpenAI would have to divert precious resources, mainly talent and GPUs, to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.
“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”
Some former employees say that the startup’s mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn’t always possible at competing AI labs.
The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.
What does it mean for an AI to “reason”?
In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT’s UX has been filled with more human-sounding features such as “thinking” and “reasoning.”
When asked whether OpenAI’s models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.
“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,” said El Kishky.
Lightman takes the approach of focusing on the model’s results and not as much on the means or their relation to human brains.
“If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,” said Lightman. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”
OpenAI’s researchers note that people may disagree with their nomenclature or definitions of reasoning (and indeed, critics have emerged), but they argue it’s less important than the capabilities of their models. Other AI researchers tend to agree.
Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are artificial systems inspired by nature (human reasoning and bird flight, respectively), but they operate through entirely different mechanisms. That doesn’t make them any less useful, or any less capable of achieving similar outcomes.
A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and more research is needed. It may be too early to confidently claim what exactly is going on inside them.
The next frontier: AI agents for subjective tasks
The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI’s Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic’s models have become particularly popular in AI coding tools like Cursor and Claude Code; these are some of the first AI agents that people are willing to pay up for.
However, general-purpose AI agents like OpenAI’s ChatGPT Agent and Perplexity’s Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I’ve found the agents take longer than I’d like and make silly mistakes.
Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.
“Like many problems in machine learning, it’s a data problem,” said Lightman, when asked about the limitations of agents on subjective tasks. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.”
Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told iinfoai that OpenAI has new general-purpose RL techniques which allow them to teach AI models skills that aren’t easily verified. This was how the company built the model which achieved a gold medal at IMO, he said.
OpenAI’s IMO model was a newer AI system that spawns multiple agents, which then simultaneously explore several ideas, and then choose the best possible answer. These types of AI models are becoming more popular; Google and xAI have recently released state-of-the-art models using this technique.
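The general pattern of running several attempts in parallel and keeping the best one can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of OpenAI’s system, whose internals are not public: the "agents" here are hypothetical functions that each refine a guess for the square root of 2 from a different starting point and with a different amount of effort, and the score function is an assumed stand-in for whatever the real system uses to rank answers.

```python
from concurrent.futures import ThreadPoolExecutor

def agent(start, steps):
    """One hypothetical agent: refine a guess for sqrt(2) with Newton's method."""
    x = start
    for _ in range(steps):
        x = 0.5 * (x + 2.0 / x)
    return x

def score(x):
    """Assumed scoring rule: the smaller the error in x * x == 2, the better."""
    return -abs(x * x - 2.0)

def solve_in_parallel(configs):
    # Run every agent concurrently, then keep the highest-scoring candidate.
    with ThreadPoolExecutor(max_workers=len(configs)) as pool:
        candidates = list(pool.map(lambda cfg: agent(*cfg), configs))
    return max(candidates, key=score), candidates

# Three agents with different starting guesses and step budgets.
best, candidates = solve_in_parallel([(1.0, 1), (3.0, 2), (10.0, 6)])
```

In this toy the agent with the worst starting guess wins because it was given the most steps. The selection step works only because the task has a cheap, reliable score, which is exactly what subjective tasks lack.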
“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”
These techniques may help OpenAI’s models become more performant, gains that could show up in the company’s upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.
But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.
These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it to be done. That’s a much different product than what ChatGPT is today, but the company’s research is squarely headed in this direction.
While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a tranche of worthy opponents. The question is no longer just whether OpenAI can deliver its agentic future, but whether the company can do so before Google, Anthropic, xAI, or Meta beat it to it.