Silicon Valley bets big on ‘environments’ to train AI agents

September 16, 2025

For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take today’s consumer AI agents out for a spin, whether it’s OpenAI’s ChatGPT Agent or Perplexity’s Comet, and you’ll quickly realize how limited the technology still is. Making AI agents more robust may take a new set of techniques that the industry is still discovering.

One of those techniques is carefully simulating workspaces where agents can be trained on multi-step tasks, known as reinforcement learning (RL) environments. Much as labeled datasets powered the last wave of AI, RL environments are starting to look like a critical element in the development of agents.

AI researchers, founders, and investors tell iinfoai that leading AI labs are now demanding more RL environments, and there’s no shortage of startups hoping to supply them.

“All the big AI labs are building RL environments in-house,” said Jennifer Li, general partner at Andreessen Horowitz, in an interview with iinfoai. “But as you can imagine, creating these datasets is very complex, so AI labs are also looking at third party vendors that can create high quality environments and evaluations. Everyone is looking at this space.”

The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they’re investing more in RL environments to keep pace with the industry’s shift from static datasets to interactive simulations. The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than $1 billion on RL environments over the next year.

The hope for investors and founders is that one of these startups emerges as the “Scale AI for environments,” referring to the $29 billion data labeling powerhouse that powered the chatbot era.

The question is whether RL environments will truly push the frontier of AI progress.

What’s an RL environment?

At their core, RL environments are training grounds that simulate what an AI agent would be doing in a real software application. One founder, in a recent interview, described building them as “like creating a very boring video game.”

For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a worthy pair of socks).

While such a task sounds relatively simple, there are a lot of places where an AI agent could get tripped up. It might get lost navigating the web page’s drop-down menus, or buy too many socks. And because developers can’t predict exactly what wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior and still deliver useful feedback. That makes building environments far more complex than building a static dataset.
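To make the sock-buying example concrete, here is a minimal sketch of what such an environment could look like in Python. It is an illustration only: the class `SockShopEnv`, its action names, and the helper `run_episode` are hypothetical and are not taken from any lab’s or vendor’s actual tooling. The sketch loosely follows the reset/step loop common in RL toolkits, and it shows the two properties described above: reward arrives only on success, and unexpected actions are absorbed rather than crashing the simulation.

```python
from dataclasses import dataclass, field


@dataclass
class SockShopEnv:
    """Toy stand-in for a simulated shopping site (hypothetical, for illustration)."""

    max_steps: int = 10
    cart: int = field(default=0, init=False)
    steps: int = field(default=0, init=False)
    done: bool = field(default=False, init=False)

    def reset(self) -> str:
        """Start a fresh episode and return the first observation."""
        self.cart = 0
        self.steps = 0
        self.done = False
        return "product_page"

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply one agent action; return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        self.steps += 1
        obs, reward = "product_page", 0.0
        if action == "add_to_cart":
            self.cart += 1
        elif action == "checkout":
            self.done = True
            obs = "order_confirmed"
            # Reward only the intended outcome: exactly one pair of socks.
            reward = 1.0 if self.cart == 1 else 0.0
        # Any other action is tolerated: no crash, no reward.
        if self.steps >= self.max_steps:
            self.done = True  # cut off agents that wander indefinitely
        return obs, reward, self.done


def run_episode(env: SockShopEnv, actions: list[str]) -> float:
    """Feed a fixed action list to the environment and sum the reward."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total
```

In this toy version, an agent that adds one pair and checks out earns the reward, while one that buys two pairs or gets stuck clicking menus earns nothing. A real environment would have to grade far messier behavior, which is where the complexity comes from.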

Some environments are quite elaborate, allowing AI agents to use tools, access the internet, or use various software applications to complete a given task. Others are narrower, aimed at helping an agent learn specific tasks in enterprise software applications.

While RL environments are the hot thing in Silicon Valley right now, there’s a lot of precedent for using this technique. One of OpenAI’s first projects back in 2016 was building “RL Gyms,” which were quite similar to the modern conception of environments. The same year, Google DeepMind’s AlphaGo AI system beat a world champion at the board game Go. It also used RL techniques within a simulated environment.

What’s unique about today’s environments is that researchers are trying to build computer-using AI agents with large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environment, today’s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal where more can go wrong.

A crowded field

AI data labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build out RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.

Surge CEO Edwin Chen tells iinfoai he’s recently seen a “significant increase” in demand for RL environments within AI labs. Surge, which reportedly generated $1.2 billion in revenue last year from working with AI labs like OpenAI, Google, Anthropic and Meta, recently spun up a new internal organization specifically tasked with building out RL environments, he said.

Close behind Surge is Mercor, a startup valued at $10 billion, which has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business building RL environments for domain-specific tasks such as coding, healthcare, and law, according to marketing materials seen by iinfoai.

Mercor CEO Brendan Foody told iinfoai in an interview that “few understand how large the opportunity around RL environments truly is.”

Scale AI used to dominate the data labeling space, but has lost ground since Meta invested $14 billion and hired away its CEO. Since then, Google and OpenAI have dropped Scale AI as a data provider, and the startup even faces competition for data labeling work inside Meta. Still, Scale is trying to meet the moment and build environments.

“This is just the nature of the business [Scale AI] is in,” said Chetan Rane, Scale AI’s head of product for agents and RL environments. “Scale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we’re adapting to new frontier spaces like agents and environments.”

Some newer players are focusing exclusively on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of “automating all jobs.” However, co-founder Matthew Barnett tells iinfoai that his firm is starting with RL environments for AI coding agents.

Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, unlike larger data firms that create a wide range of simple RL environments. To that end, the startup is offering software engineers $500,000 salaries to build RL environments, far more than an hourly contractor could earn working at Scale AI or Surge.

Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter told iinfoai. Mechanize and Anthropic declined to comment on the partnership.

Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect, a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures, is targeting smaller developers with its RL environments.

Last month, Prime Intellect launched an RL environments hub, which aims to be a “Hugging Face for RL environments.” The idea is to give open-source developers access to the same resources that large AI labs have, and to sell those developers access to computational resources in the process.

Training generally capable agents in RL environments can be more computationally expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside startups building RL environments, there’s another opportunity for GPU providers that can power the process.

“RL environments are going to be too large for any one company to dominate,” said Brown in an interview. “Part of what we’re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it is a convenient onramp to using GPUs, but we’re thinking of this more in the long term.”

Will it scale?

The open question around RL environments is whether the technique will scale like previous AI training methods.

Reinforcement learning has powered some of the biggest leaps in AI over the past year, including models like OpenAI’s o1 and Anthropic’s Claude Opus 4. Those are particularly important breakthroughs because the methods previously used to improve AI models are now showing diminishing returns.

Environments are part of AI labs’ bigger bet on RL, which many believe will continue to drive progress as they add more data and computational resources to the process. Some of the OpenAI researchers behind o1 previously told iinfoai that the company originally invested in AI reasoning models, which were created through investments in RL and test-time compute, because they thought the approach would scale nicely.

The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. That’s far more resource-intensive, but potentially more rewarding.

Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead with Meta who co-founded General Reasoning, tells iinfoai that RL environments are prone to reward hacking. This is a process in which AI models cheat in order to get a reward, without really doing the task.
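A toy illustration of reward hacking, under stated assumptions: suppose a hypothetical shopping environment grades the agent only by checking which page it ends on. The function names below (`naive_reward`, `strict_reward`) and the page label are invented for this sketch and do not describe any real environment. An agent that discovers it can jump straight to the confirmation page collects full reward from the naive grader without ordering anything.

```python
def naive_reward(final_page: str) -> float:
    """Proxy grader (hypothetical): only checks where the agent ended up."""
    return 1.0 if final_page == "order_confirmed" else 0.0


def strict_reward(final_page: str, items_ordered: int) -> float:
    """Stricter grader (hypothetical): also verifies what was actually ordered."""
    if final_page == "order_confirmed" and items_ordered == 1:
        return 1.0
    return 0.0


# An exploit trajectory: the agent reaches the confirmation page
# with an empty order, for instance by guessing its URL.
hacked_final_page = "order_confirmed"
hacked_items_ordered = 0

naive_score = naive_reward(hacked_final_page)
strict_score = strict_reward(hacked_final_page, hacked_items_ordered)
```

Here the naive grader pays out for the exploit while the stricter one does not. The catch is that every such loophole has to be anticipated and closed by whoever builds the environment, which is part of why skeptics expect environments to be hard to scale.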

“I think people are underestimating how difficult it is to scale environments,” said Taylor. “Even the best publicly available [RL environments] typically don’t work without serious modification.”

OpenAI’s Head of Engineering for its API business, Sherwin Wu, said in a recent podcast that he was “short” on RL environment startups. Wu noted that it’s a very competitive space, but also that AI research is evolving so quickly that it’s hard to serve AI labs well.

Karpathy, an investor in Prime Intellect who has called RL environments a potential breakthrough, has also voiced caution about the RL space more broadly. In a post on X, he raised concerns about how much more AI progress can be squeezed out of RL.

“I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,” said Karpathy.

Update: A previous version of this article referred to Mechanize as Mechanize Work. It has been updated to reflect the company’s official name.
