Nous Research drops Hermes 4 AI models that outperform ChatGPT without content restrictions

August 29, 2025

Nous Research, a secretive artificial intelligence startup that has emerged as a leading voice in the open-source AI movement, quietly released Hermes 4 on Monday, a family of large language models that the company claims can match the performance of leading proprietary systems while offering unprecedented user control and minimal content restrictions.

The release represents a significant escalation in the battle between open-source AI advocates and major technology companies over who should control access to advanced artificial intelligence capabilities. Unlike models from OpenAI, Google, or Anthropic, Hermes 4 is designed to respond to nearly any request without the safety guardrails that have become standard in commercial AI systems.

Nous Research presents Hermes 4, our latest line of hybrid reasoning models. https://t.co/E5EW9hBurb

Hermes 4 builds on our legacy of user-aligned models with expanded test-time compute capabilities.

Special attention was given to making the models creative and interesting to… pic.twitter.com/52VjnvrDWM

- Nous Research (@NousResearch) August 26, 2025

“Hermes 4 builds on our legacy of user-aligned models with expanded test-time compute capabilities,” Nous Research announced on X (formerly Twitter). “Special attention was given to making the models creative and interesting to interact with, unencumbered by censorship, and neutrally aligned while maintaining state-of-the-art level math, coding, and reasoning performance for open weight models.”

How Hermes 4’s ‘hybrid reasoning’ mode outperforms ChatGPT and Claude on math benchmarks

Hermes 4 introduces what Nous Research calls “hybrid reasoning,” allowing users to toggle between fast responses and deeper, step-by-step thinking processes. When activated, the models generate their internal reasoning inside special tags before providing a final answer, similar to OpenAI’s o1 reasoning models but with full transparency into the AI’s thought process.
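To make the tag mechanism concrete, here is a minimal sketch of how a client might separate the visible reasoning from the final answer. The `<think>` tag name, the `split_reasoning` helper, and the sample output are assumptions for illustration; the article says only that the reasoning appears inside "special tags."

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>. The article only
# says "special tags", so the tag name here is illustrative.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Return (reasoning, final_answer) from a raw model completion."""
    match = THINK_PATTERN.search(raw_output)
    if match is None:
        # Fast mode: no reasoning block, so the whole output is the answer.
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return reasoning, answer


# Hypothetical completion in reasoning mode.
sample = "<think>2 + 2 is 4, times 3 is 12.</think>The result is 12."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Because the reasoning arrives as plain text in the same completion, a caller can log it, display it, or discard it, which is the transparency the article contrasts with hidden reasoning traces.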

The technical achievement is substantial. In testing, Hermes 4’s largest 405-billion parameter model scored 96.3% on the MATH-500 benchmark in reasoning mode and 81.9% on the challenging AIME’24 mathematics competition, performance that rivals or exceeds many proprietary systems costing millions more to develop.

“The challenge is making thinking traces useful and verifiable without runaway reasoning,” noted AI researcher Rohan Paul on X, highlighting one of the technical breakthroughs in the release.

Perhaps most notably, Hermes 4 achieved the highest score among all tested models on “RefusalBench,” a new benchmark Nous Research created to measure how often AI systems refuse to answer questions. The model scored 57.1% in reasoning mode, significantly outperforming GPT-4o (17.67%) and Claude Sonnet 4 (17%).

Hermes 4 models from Nous Research answered significantly more questions than competing AI systems on RefusalBench, a test measuring how often models refuse to respond to user requests. (Credit: Nous Research)

Inside DataForge and Atropos: The breakthrough training systems behind Hermes 4’s capabilities

Behind Hermes 4’s capabilities lies a sophisticated training infrastructure that Nous Research has developed over several years. The models were trained using two novel systems: DataForge, a graph-based synthetic data generator, and Atropos, an open-source reinforcement learning framework.

DataForge creates training data through what the company describes as “random walks” through directed graphs, transforming simple pre-training data into complex instruction-following examples. The system can, for instance, take a Wikipedia article and transform it into a rap song, then generate questions and answers based on that transformation.
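The random-walk idea can be sketched in a few lines. This is not DataForge's code: the graph, its node names, and the `random_walk` helper are hypothetical and only mirror the Wikipedia-to-rap-song example above. In a real pipeline each edge would presumably invoke a model to rewrite the text.

```python
import random

# Hypothetical transformation graph: each node is a text format and each edge
# is one rewrite step. Node names are illustrative, not taken from DataForge.
GRAPH = {
    "wikipedia_article": ["rap_song", "summary"],
    "rap_song": ["question_answer_pairs"],
    "summary": ["question_answer_pairs", "rap_song"],
    "question_answer_pairs": [],  # terminal node: an instruction-style example
}


def random_walk(graph, start, rng, max_steps=10):
    """Follow randomly chosen outgoing edges until a terminal node is reached."""
    path = [start]
    node = start
    for _ in range(max_steps):
        successors = graph[node]
        if not successors:
            break  # no outgoing edges, so the walk ends here
        node = rng.choice(successors)
        path.append(node)
    return path


rng = random.Random(0)  # seeded so repeated runs give the same walk
path = random_walk(GRAPH, "wikipedia_article", rng)
print(" -> ".join(path))
```

Each distinct path through the graph yields a different chain of transformations, so a single source document can produce many varied training examples.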

Atropos, meanwhile, operates like hundreds of specialized training environments where AI models practice specific skills (mathematics, coding, tool use, and creative writing), receiving feedback only when they produce correct solutions. This “rejection sampling” approach ensures that only verified, high-quality responses make it into the training data.
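A toy version of rejection sampling shows the filtering step. Nothing here uses the Atropos API: `propose_answer` is a stand-in for a model that is sometimes wrong, `verify` is a stand-in for an environment's checker, and the addition task is an assumption chosen because it is trivially verifiable.

```python
import random


def propose_answer(a, b, rng):
    """Stand-in for a model: returns a + b, but is sometimes off by one."""
    return a + b + rng.choice([0, 0, 1, -1])


def verify(a, b, answer):
    """Stand-in for an environment's checker: exact match on the true sum."""
    return answer == a + b


def collect_verified(problems, rng, attempts_per_problem=8):
    """Sample answers and keep only those that pass verification."""
    kept = []
    for a, b in problems:
        for _ in range(attempts_per_problem):
            answer = propose_answer(a, b, rng)
            if verify(a, b, answer):
                kept.append({"operands": (a, b), "response": answer})
                break  # one verified sample per problem is enough here
    return kept


rng = random.Random(0)
dataset = collect_verified([(2, 3), (10, 7), (41, 1)], rng)
for record in dataset:
    print(record)
```

Rejected attempts never reach `dataset`, so every record it contains has passed the checker. A problem with no passing attempt is simply left out.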

Atropos is Nous’ Reinforcement Learning framework

Atropos is an open source reinforcement learning environment by Nous that has hundreds of “gyms” (like math, coding, games, tool‑use, vision) to train and evaluate LLM trajectories via scalable, async RL loops.

In other words… pic.twitter.com/fjxaQKClEZ

- Tommy (@Shaughnessy119) August 26, 2025

“Nous used these environments to generate the dataset for Hermes 4!” explained Tommy Shaughnessy, a venture capitalist at Delphi Ventures who has invested in Nous Research. “All in the dataset contains 3.5 million reasoning samples and 1.6 million non-reasoning samples! Hermes was trained on RL data, not just static datasets of question and answer!”

The training process required 192 Nvidia B200 GPUs and 71,616 GPU hours for the largest model, a significant but not unprecedented computational investment that demonstrates how specialized techniques can compete with the massive scale of tech giants.
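For a sense of scale, the two figures above can be turned into an approximate wall-clock time. The calculation assumes all 192 GPUs ran concurrently for the entire job, which the article does not state.

```python
gpu_hours = 71_616  # total GPU hours reported for the largest model
gpu_count = 192     # Nvidia B200 GPUs reported

# Assumption: every GPU is busy for the whole run, so wall-clock time is
# simply total GPU hours divided by the number of GPUs.
wall_clock_hours = gpu_hours / gpu_count
wall_clock_days = wall_clock_hours / 24

print(f"{wall_clock_hours:.0f} hours, about {wall_clock_days:.1f} days")
# prints "373 hours, about 15.5 days"
```

Under that assumption the largest model's training run would have taken a little over two weeks.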

Why Nous Research believes AI safety guardrails are ‘annoying as hell’ and hurt innovation

Nous Research has built its reputation on a philosophy that puts user control above corporate content policies. The company’s models are designed to be “steerable,” meaning they can be fine-tuned or prompted to behave in specific ways without the rigid safety constraints that characterize commercial AI systems.

“Hermes 4 is not shackled by disclaimers, rules and being overly cautious which is annoying as hell and hurts innovation and usability,” wrote Shaughnessy in a detailed thread analyzing the release. “If its open source but refuses all requests its pointless. Not an issue with Hermes 4.”

Hermes 4 is not shackled by disclaimers, rules and being overly cautious which is annoying as hell and hurts innovation and usability.

Hermes 4 70B is at the complete opposite of the spectrum vs OpenAI’s open source model. It’s also ~4x more open vs ChatGPT 4o!

If its open… pic.twitter.com/q5RpX1oOzo

- Tommy (@Shaughnessy119) August 26, 2025

This approach has made Nous Research popular among AI researchers and developers who want maximum flexibility, but it also places the company at the center of ongoing debates about AI safety and content moderation. While the models can theoretically be used for harmful purposes, Nous Research argues that transparency and user control are preferable to corporate gatekeeping.

The company’s technical report, released alongside the models, provides unprecedented detail about the training process, evaluation results, and even the actual text outputs from benchmark tests. “We believe this report sets a new standard for transparency in benchmarking,” the company stated.

How a small startup with 192 GPUs is competing against Big Tech’s billion-dollar AI budgets

Hermes 4’s release comes at a pivotal moment in the AI industry. While major technology companies have poured billions into developing increasingly powerful AI systems, a growing open-source movement argues that these capabilities shouldn’t be controlled by a handful of corporations.

Recent months have seen significant advances in open-source AI, with models like Meta’s Llama 3.1, DeepSeek’s R1, and Alibaba’s Qwen series achieving performance that rivals proprietary systems. Hermes 4 represents another step in this progression, particularly in the area of reasoning, long considered a strength of closed systems like OpenAI’s o1.

“First up, Nous is a startup with dozens of extremely talented people,” noted Shaughnessy. “They don’t have the $100b+ annual capex spend of a hyperscaler nor 1,000’s of employees and despite that they continue to put out innovative models and research at an insane pace.”

The startup, which raised $65 million in funding earlier this year led by Paradigm, has also been developing Psyche Network, a distributed training system that aims to coordinate AI training across internet-connected computers using blockchain technology.

The technical fix that stopped Hermes 4 from thinking in endless loops

One of Hermes 4’s most significant technical contributions addresses a problem plaguing reasoning models: overly long thinking processes. The researchers found that their smaller 14-billion parameter model would reach maximum context length 60% of the time when reasoning, essentially getting stuck in endless loops of thinking.

Their solution involved a second training stage that teaches models to stop reasoning at exactly 30,000 tokens, reducing overlong generation by 65-79% while maintaining most of the reasoning performance. This “length control” technique could prove valuable for the broader AI research community.
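The article does not spell out how that second stage is implemented, so the following is only a sketch of one way a training target that ends reasoning at a fixed budget might be built: cut the trace at the budget and append a closing tag. The `</think>` tag, the list-of-strings token representation, and the `cap_reasoning` and `overlong_rate` helpers are all assumptions.

```python
REASONING_BUDGET = 30_000  # token budget cited in the article
CLOSE_TAG = "</think>"     # assumed closing tag; the article does not name it


def cap_reasoning(reasoning_tokens, budget=REASONING_BUDGET, close_tag=CLOSE_TAG):
    """Cut a reasoning trace at `budget` tokens and append the closing tag.

    Traces shorter than the budget are kept whole and simply closed.
    """
    return list(reasoning_tokens[:budget]) + [close_tag]


def overlong_rate(trace_lengths, limit):
    """Fraction of traces that ran all the way to the length limit."""
    if not trace_lengths:
        return 0.0
    return sum(1 for n in trace_lengths if n >= limit) / len(trace_lengths)


# Hypothetical traces: one runaway trace and one short trace.
runaway = ["tok"] * 40_000
short = ["tok"] * 12

capped = cap_reasoning(runaway)
print(len(capped), capped[-1])
print(len(cap_reasoning(short)))
print(overlong_rate([40_000, 12, 35_000, 500], limit=30_000))
```

Training on examples shaped like `capped` would expose a model to the closing tag arriving at the budget. `overlong_rate` is the kind of measurement behind figures like the 60% cited above.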

“Smaller models (<14B) tend to overthink when distilled, but larger models don’t,” observed AI researcher Muyu He on X, highlighting insights from the technical report.

However, Hermes 4 still faces limitations common to open-source models. Despite impressive benchmark performance, the models require significant computational resources to run and may not match the ease of use or reliability of commercial AI services for many applications.

Where to try Hermes 4 and what it costs compared to ChatGPT and Claude

Nous Research has made Hermes 4 available through multiple channels, reflecting the open-source philosophy. The model weights are freely downloadable on Hugging Face, while the company also offers API access through its revamped chat interface and partnerships with inference providers like Chutes, Nebius, and Luminal.

“You can try Hermes 4 in the new, revamped Nous Chat UI,” the company announced, highlighting features like parallel interactions and a memory system.

For enterprise users and researchers, the models represent a potentially attractive alternative to paying for API access to proprietary systems, especially for applications requiring high levels of customization or handling of sensitive content.

The bigger picture: What Hermes 4 means for the future of AI development

The release of Hermes 4 represents more than just another AI model launch; it’s a statement about who should control the future of artificial intelligence. In an industry increasingly dominated by a handful of tech giants with virtually unlimited resources, Nous Research has demonstrated that innovation can still come from unexpected places.

The company’s approach raises fundamental questions about the trade-offs between safety and capability, between corporate control and user freedom. While major technology companies argue that careful content moderation and safety guardrails are essential for responsible AI deployment, Nous Research contends that transparency and user agency are more important than corporate-imposed restrictions.

Whether this philosophy will ultimately prove beneficial or problematic remains to be seen. But one thing is certain: Hermes 4 has shown that the future of AI won’t be determined solely by the companies with the deepest pockets.

In a field where yesterday’s impossibilities become tomorrow’s commodities, Nous Research just proved that the only thing more dangerous than an AI that says no might be one that’s willing to say yes.
