Is your AI app pissing off users or going off-script? Raindrop emerges with AI-native observability platform to monitor performance

May 20, 2025

As enterprises increasingly look to build and deploy generative AI-powered applications and services for internal or external use (employees or customers), one of the hardest questions they face is understanding exactly how well these AI tools are performing out in the wild.

In fact, a recent survey by consulting firm McKinsey and Company found that only 27% of 830 respondents said their enterprises reviewed all of the outputs of their generative AI systems before they went out to users.

Unless a user actually writes in with a complaint, how is a company to know if its AI product is behaving as expected and planned?

Raindrop, formerly known as Dawn AI, is a new startup tackling the challenge head-on, positioning itself as the first observability platform purpose-built for AI in production, catching errors as they happen and explaining to enterprises what went wrong and why. The goal? Help solve generative AI’s so-called “black box problem.”

“AI products fail constantly, in ways both hilarious and terrifying,” wrote co-founder Ben Hylak on X recently. “Regular software throws exceptions. But AI products fail silently.”

Raindrop seeks to offer a category-defining tool akin to what observability company Sentry does for traditional software.

But while traditional exception-tracking tools don’t capture the nuanced misbehaviors of large language models or AI companions, Raindrop attempts to fill the gap.

“In traditional software, you have tools like Sentry and Datadog to tell you what’s going wrong in production,” he told VentureBeat in a video call interview last week. “With AI, there was nothing.”

Until now, of course.

How Raindrop works

Raindrop offers a suite of tools that allow teams at enterprises large and small to detect, analyze, and respond to AI issues in real time.

The platform sits at the intersection of user interactions and model outputs, analyzing patterns across hundreds of millions of daily events, but doing so with SOC-2 encryption enabled, protecting the data and privacy of users and of the company offering the AI solution.

“Raindrop sits where the user is,” Hylak explained. “We analyze their messages, plus signals like thumbs up/down, build errors, or whether they deployed the output, to infer what’s actually going wrong.”

Raindrop uses a machine learning pipeline that combines LLM-powered summarization with smaller bespoke classifiers optimized for scale.

Promotional screenshot of Raindrop’s dashboard. Credit: Raindrop.ai

“Our ML pipeline is one of the most complex I’ve seen,” Hylak said. “We use large LLMs for early processing, then train small, efficient models to run at scale on hundreds of millions of events daily.”
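The two-stage idea Hylak describes, using a large model to label data and then training a small model to run cheaply at scale, is a form of model distillation. The sketch below is a minimal illustration under loud assumptions: `llm_label` is a hypothetical stand-in for a large-LLM call (a keyword rule here, so it runs offline), and the small model is a hand-rolled multinomial naive Bayes classifier chosen only for brevity. Neither reflects Raindrop’s actual pipeline or models.

```python
import math
from collections import Counter, defaultdict


def llm_label(message: str) -> str:
    """Hypothetical stand-in for an expensive large-LLM labeling call.

    A keyword rule is used here only so the sketch runs offline."""
    text = message.lower()
    if any(word in text for word in ("useless", "wrong", "broken")):
        return "frustration"
    return "ok"


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return [t for t in tokens if t]


class NaiveBayes:
    """Small multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus the smoothed log likelihood of each token under this label.
            score = math.log(count / total)
            n_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                smoothed = (self.word_counts[label][tok] + 1) / (n_words + len(self.vocab))
                score += math.log(smoothed)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Stage 1: the (expensive) large model labels a small seed set.
seed = [
    "This answer is wrong again",
    "Totally useless, the upload is broken",
    "Why is it wrong every time",
    "Thanks, that worked perfectly",
    "Great, the deploy went through",
    "Nice, exactly what I needed",
]
labels = [llm_label(m) for m in seed]

# Stage 2: a cheap classifier trained on those labels scores the bulk of traffic.
clf = NaiveBayes().fit(seed, labels)
print(clf.predict("the export is broken and useless"))  # → frustration
print(clf.predict("thanks, that worked"))  # → ok
```

The design trade-off is cost: the large model only ever sees a sample, while the small model can score every event. A real seed set would need to be far larger than six messages.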

Customers can monitor indicators like user frustration, task failures, refusals, and memory lapses. Raindrop uses feedback signals such as thumbs down, user corrections, or follow-up behavior (like failed deployments) to identify issues.
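One way to picture how such feedback signals feed issue detection is a mapping from per-interaction signals to candidate issue labels. The sketch below assumes a hypothetical `Interaction` event shape; the field names and label names are illustrative only and are not Raindrop’s schema or SDK.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical event shape; field names are illustrative, not Raindrop's schema.
    output_id: str
    thumbs_down: bool = False
    user_corrected: bool = False
    deploy_failed: bool = False


def infer_issues(event: Interaction) -> list[str]:
    """Map explicit and implicit feedback signals to candidate issue labels."""
    issues = []
    if event.thumbs_down:
        issues.append("explicit_negative_feedback")
    if event.user_corrected:
        issues.append("user_correction")
    if event.deploy_failed:
        issues.append("failed_followup_action")
    return issues


events = [
    Interaction("a1"),
    Interaction("a2", thumbs_down=True),
    Interaction("a3", user_corrected=True, deploy_failed=True),
]
# Keep only interactions that produced at least one candidate issue.
flagged = {e.output_id: infer_issues(e) for e in events if infer_issues(e)}
print(flagged)
# → {'a2': ['explicit_negative_feedback'], 'a3': ['user_correction', 'failed_followup_action']}
```

A production system would weigh these signals together with the message content rather than treat each one as a standalone verdict; the rule-based mapping only shows where such signals enter.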

Fellow Raindrop co-founder and CEO Zubin Singh Koticha told VentureBeat in the same interview that while many enterprises relied on evaluations, benchmarks, and unit tests to check the reliability of their AI solutions, very little was designed to check AI outputs in production.

“Imagine in traditional coding if you’re like, ‘Oh, my software passes ten unit tests. It’s great. It’s a robust piece of software.’ That’s obviously not how it works,” Koticha said. “It’s a similar problem we’re trying to solve here, where in production, there isn’t actually a lot that tells you: is it working extremely well? Is it broken or not? And that’s where we fit in.”

For enterprises in highly regulated industries, or those seeking additional levels of privacy and control, Raindrop offers Notify, a fully on-premises, privacy-first version of the platform aimed at enterprises with strict data-handling requirements.

Unlike traditional LLM logging tools, Notify performs redaction both client-side via SDKs and server-side with semantic tools. It stores no persistent data and keeps all processing within the customer’s infrastructure.
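Client-side redaction means scrubbing sensitive values before an event ever leaves the application. A minimal sketch, assuming simple regular expressions for email addresses and phone-like digit runs; the patterns and the `redact` helper are hypothetical, far from complete PII coverage, and not Raindrop’s SDK. The semantic server-side pass mentioned above is not shown.

```python
import re

# Hypothetical patterns for illustration; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace matched spans with placeholder tags before the event leaves the client."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


print(redact("Email jane.doe@example.com or call +1 415 555 0100."))
# → Email [EMAIL] or call [PHONE].
```

Redacting on the client keeps raw values out of any downstream store entirely, which is the property the on-premises offering emphasizes.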

Raindrop Notify provides daily usage summaries and surfaces high-signal issues directly within workplace tools like Slack and Teams, without the need for cloud logging or complex DevOps setups.

Advanced error identification and precision

Identifying errors, especially with AI models, is far from straightforward.

“What’s hard in this space is that every AI application is different,” said Hylak. “One customer might build a spreadsheet tool, another an alien companion. What ‘broken’ looks like varies wildly between them.” That variability is why Raindrop’s system adapts to each product individually.

Each AI product Raindrop monitors is treated as unique. The platform learns the shape of the data and behavioral norms for each deployment, then builds a dynamic issue ontology that evolves over time.

“Raindrop learns the data patterns of each product,” Hylak explained. “It starts with a high-level ontology of common AI issues, things like laziness, memory lapses, or user frustration, and then adapts these to each app.”

Whether it’s a coding assistant that forgets a variable, an AI alien companion that suddenly refers to itself as a human from the U.S., or even a chatbot that starts randomly bringing up claims of “white genocide” in South Africa, Raindrop aims to surface these issues with actionable context.

The notifications are designed to be lightweight and timely. Teams receive Slack or Microsoft Teams alerts when something unusual is detected, complete with suggestions on how to reproduce the problem.

Over time, this allows AI developers to fix bugs, refine prompts, and even identify systemic flaws in how their applications respond to users.

“We classify millions of messages a day to find issues like broken uploads or user complaints,” said Hylak. “It’s all about surfacing patterns strong and specific enough to warrant a notification.”
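Deciding whether a pattern is strong and specific enough to notify on can be sketched as a threshold check: alert only when an issue label is both frequent in the current window and well above its historical rate for that product. The function `issues_to_alert` and its `min_count` and `lift` parameters are hypothetical choices for illustration, not Raindrop’s alerting logic.

```python
from collections import Counter


def issues_to_alert(current, baseline_rates, total_current, min_count=20, lift=3.0):
    """Return issue labels worth notifying on.

    current: issue labels observed in the current window (one per flagged event).
    baseline_rates: historical share of traffic per label for this product.
    total_current: total events in the window, flagged or not.
    A label alerts only if it appears at least `min_count` times AND its share
    of traffic is at least `lift` times its historical rate."""
    counts = Counter(current)
    alerts = []
    for label, count in counts.items():
        rate = count / total_current
        base = baseline_rates.get(label, 0.0)  # unseen labels have a zero baseline
        if count >= min_count and rate >= lift * base:
            alerts.append(label)
    return sorted(alerts)


window = ["broken_upload"] * 40 + ["user_frustration"] * 25 + ["refusal"] * 5
baseline = {"broken_upload": 0.005, "user_frustration": 0.02, "refusal": 0.001}
print(issues_to_alert(window, baseline, total_current=1000))  # → ['broken_upload']
```

The minimum count suppresses alerts on a handful of noisy events, while the lift compares each issue against the product’s own baseline rather than a global one.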

From Sidekick to Raindrop

The company’s origin story is rooted in hands-on experience. Hylak, who previously worked as a human interface designer for visionOS at Apple and in avionics software engineering at SpaceX, began exploring AI after encountering GPT-3 in its early days back in 2020.

“As soon as I used GPT-3, just a simple text completion, it blew my mind,” he recalled. “I instantly thought, ‘This is going to change how people interact with technology.’”

Alongside fellow co-founders Koticha and Alexis Gauba, Hylak initially built Sidekick, a VS Code extension with hundreds of paying users.

But building Sidekick revealed a deeper problem: debugging AI products in production was nearly impossible with the tools available.

“We started by building AI products, not infrastructure,” Hylak explained. “But pretty quickly, we saw that to grow anything serious, we needed tooling to understand AI behavior, and that tooling didn’t exist.”

What started as an annoyance quickly evolved into the core focus. The team pivoted, building out tools to make sense of AI product behavior in real-world settings.

In the process, they discovered they weren’t alone. Many AI-native companies lacked visibility into what their users were actually experiencing and why things were breaking. With that, Raindrop was born.

Raindrop’s pricing, differentiation and flexibility have attracted a wide range of early customers

Raindrop’s pricing is designed to accommodate teams of various sizes.

A Starter plan is available at $65/month, with metered usage pricing. The Pro tier, which includes custom topic tracking, semantic search, and on-prem features, starts at $350/month and requires direct engagement.

While observability tools are not new, most existing options were built before the rise of generative AI.

Raindrop sets itself apart by being AI-native from the ground up. “Raindrop is AI-native,” Hylak said. “Most observability tools were built for traditional software. They weren’t designed to handle the unpredictability and nuance of LLM behavior in the wild.”

This specificity has attracted a growing set of customers, including teams at Clay.com, Tolen, and New Computer.

Raindrop’s customers span a wide range of AI verticals, from code generation tools to immersive AI storytelling companions, each requiring a different lens on what “misbehavior” looks like.

Born from necessity

Raindrop’s rise illustrates how the tools for building AI need to evolve alongside the models themselves. As companies ship more AI-powered features, observability becomes essential, not just to measure performance, but to detect hidden failures before users escalate them.

In Hylak’s words, Raindrop is doing for AI what Sentry did for web apps, except the stakes now include hallucinations, refusals, and misaligned intent. With its rebrand and product expansion, Raindrop is betting that the next generation of software observability will be AI-first by design.
