AI search engines fail accuracy test, study finds 60% error rate

March 24, 2025


In context: It’s a foregone conclusion that AI models can lack accuracy. Hallucinations and doubling down on wrong information have been an ongoing struggle for developers. Usage varies so much across individual use cases that it’s hard to pin down quantifiable percentages for AI accuracy. A research team claims it now has those numbers.

The Tow Center for Digital Journalism recently studied eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. The researchers tested each for accuracy and recorded how frequently the tools declined to answer.

The researchers randomly chose 200 news articles from 20 news publishers (10 from each). They ensured each story appeared within the top three results of a Google search for a quoted excerpt from the article. They then ran the same query in each AI search tool and graded accuracy based on whether the tool correctly cited A) the article, B) the news organization, and C) the URL.
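To make the grading concrete, here is a minimal Python sketch of how a single response might be scored on the three attributes above. The `CitationCheck` record, the `grade` function, and the rule that maps the number of correct attributes to a label are hypothetical illustrations, not the Tow Center’s actual rubric, which this article does not spell out.

```python
from dataclasses import dataclass


@dataclass
class CitationCheck:
    # Hypothetical record of one AI search response, checked on the
    # three attributes the study graded: article, publisher, and URL.
    answered: bool
    article_ok: bool = False
    publisher_ok: bool = False
    url_ok: bool = False


def grade(check: CitationCheck) -> str:
    """Map one response to an accuracy label (assumed rule, not the study's rubric)."""
    if not check.answered:
        return "declined"
    # Count how many of the three attributes were cited correctly.
    hits = sum([check.article_ok, check.publisher_ok, check.url_ok])
    if hits == 3:
        return "completely correct"
    if hits == 0:
        return "completely incorrect"
    return "partially correct"


if __name__ == "__main__":
    # Right article and publisher, wrong URL.
    print(grade(CitationCheck(answered=True, article_ok=True,
                              publisher_ok=True, url_ok=False)))  # partially correct
```

Collapsing the three checks into a simple count keeps the sketch short; a real rubric could treat a wrong URL differently from a wrong publisher name, which is one reason to read this as illustrative only.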

The researchers then labeled each response on a scale of accuracy from “completely correct” to “completely incorrect.” As you can see from the diagram below, apart from both versions of Perplexity, the AIs did not perform well. Collectively, the AI search engines were inaccurate 60 percent of the time. Furthermore, the AIs presented these wrong results with unwarranted “confidence.”


The study is fascinating because it quantifiably confirms what we have known for a few years: LLMs are “the slickest con artists of all time.” They state with complete authority that what they say is true even when it is not, sometimes to the point of arguing or making up other false assertions when confronted.

In an anecdotal 2023 article, Ted Gioia (The Honest Broker) pointed out dozens of ChatGPT responses showing that the bot confidently “lies” when answering a wide range of queries. While some examples were adversarial queries, many were just general questions.

“If I believed half of what I heard about ChatGPT, I might let it take over The Honest Broker while I sit on the beach drinking margaritas and searching for my lost shaker of salt,” Gioia flippantly noted.

Even when admitting it was wrong, ChatGPT would follow up that admission with more fabricated information. The LLM is seemingly programmed to answer every user input at all costs. The researchers’ data supports this hypothesis: ChatGPT Search was the only AI tool that answered all 200 article queries. However, it was completely accurate only 28 percent of the time and completely inaccurate 57 percent of the time.

ChatGPT isn’t even the worst of the bunch. Both versions of X’s Grok AI performed poorly, with Grok-3 Search being 94 percent inaccurate. Microsoft’s Copilot was not much better, considering that it declined to answer 104 of the 200 queries. Of the remaining 96, only 16 were “completely correct,” 14 were “partially correct,” and 66 were “completely incorrect,” making it roughly 70 percent inaccurate.
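The Copilot figures are easy to check. The short Python sketch below uses only the counts quoted above; the `rates` helper is a hypothetical name chosen for illustration. It shows that 66 completely incorrect answers out of 96 attempted works out to 66 / 96 = 0.6875, or about 69 percent, which the article rounds to roughly 70 percent. Note that this share leaves the 104 declined queries out of the denominator.

```python
def rates(completely_correct: int, partially_correct: int,
          completely_incorrect: int, declined: int) -> dict:
    """Summarize one tool's results from raw counts (hypothetical helper)."""
    answered = completely_correct + partially_correct + completely_incorrect
    total = answered + declined
    return {
        "total": total,
        "answered": answered,
        # Share of attempted answers that were completely incorrect.
        "incorrect_of_answered": completely_incorrect / answered,
        # Share of all queries the tool declined to answer.
        "declined_of_total": declined / total,
    }


if __name__ == "__main__":
    # Copilot counts as reported in the article.
    copilot = rates(completely_correct=16, partially_correct=14,
                    completely_incorrect=66, declined=104)
    print(copilot["total"], copilot["answered"])      # 200 96
    print(f"{copilot['incorrect_of_answered']:.1%}")  # 68.8%
    print(f"{copilot['declined_of_total']:.0%}")      # 52%
```

Counting partially correct answers as errors as well would push the figure higher (80 of 96, about 83 percent), so the headline number depends on which labels are treated as inaccurate.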

Arguably, the craziest thing about all this is that the companies making these tools are not transparent about this lack of accuracy while charging the public $20 to $200 per month to access their latest AI models. Moreover, Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered slightly more queries correctly than their free versions (Perplexity and Grok-2 Search) but had significantly higher error rates (see the diagram above). Talk about a con.

However, not everyone agrees. TechRadar’s Lance Ulanoff said he might never use Google again after trying ChatGPT Search. He describes the tool as fast, aware, and accurate, with a clean, ad-free interface.

Feel free to read all the details in the Tow Center’s paper, published in the Columbia Journalism Review, and let us know what you think.

Do you trust AI search engines to return accurate results?
