OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly in what the company says is an effort to increase transparency.
On Wednesday, OpenAI launched the Safety evaluations hub, a web page showing how the company’s models score on various tests for harmful content generation, jailbreaks, and hallucinations. OpenAI says that it will use the hub to share metrics on an “ongoing basis” and that it intends to update the hub with “major model updates” going forward.
“As the science of AI evaluation evolves, we aim to share our progress on developing more scalable ways to measure model capability and safety,” wrote OpenAI in a blog post. “By sharing a subset of our safety evaluation results here, we hope this will not only make it easier to understand the safety performance of OpenAI systems over time, but also support community efforts to increase transparency across the field.”
OpenAI says that it may add additional evaluations to the hub over time.
In recent months, OpenAI has drawn the ire of some ethicists for reportedly rushing the safety testing of certain flagship models and failing to release technical reports for others. The company’s CEO, Sam Altman, also stands accused of misleading OpenAI executives about model safety reviews prior to his brief ouster in November 2023.
Late last month, OpenAI was forced to roll back an update to the default model powering ChatGPT, GPT-4o, after users began reporting that it responded in an overly validating and agreeable way. X became flooded with screenshots of ChatGPT applauding all sorts of problematic, dangerous decisions and ideas.
OpenAI said that it would implement several fixes and changes to prevent future such incidents, including introducing an opt-in “alpha phase” for some models that would allow certain ChatGPT users to test the models and give feedback before launch.