Wednesday, October 22, 2025

To scale agentic AI, Notion tore down its tech stack and started fresh

Many organizations would hesitate to overhaul their tech stack and start from scratch.

Not Notion.

For version 3.0 of its productivity software (released in September), the company didn’t hesitate to rebuild from the ground up; it recognized that doing so was necessary, in fact, to support agentic AI at enterprise scale.

Whereas traditional AI-powered workflows involve explicit, step-by-step instructions based on few-shot learning, AI agents powered by advanced reasoning models are thoughtful about tool definition, can identify and comprehend what tools they have at their disposal and plan next steps.

“Rather than trying to retrofit into what we were building, we wanted to play to the strengths of reasoning models,” Sarah Sachs, Notion’s head of AI modeling, told VentureBeat. “We’ve rebuilt a new architecture because workflows are different from agents.”

Re-orchestrating so models can work autonomously

Notion has been adopted by 94% of Forbes AI 50 companies, has 100 million total users and counts among its customers OpenAI, Cursor, Figma, Ramp and Vercel.

In a rapidly evolving AI landscape, the company identified the need to move beyond simpler, task-based workflows to goal-oriented reasoning systems that allow agents to autonomously select, orchestrate, and execute tools across connected environments.

Very quickly, reasoning models have become “far better” at learning to use tools and follow chain-of-thought (CoT) instructions, Sachs noted. This allows them to be “far more independent” and make multiple decisions within one agentic workflow. “We rebuilt our AI system to play to that,” she said.

From an engineering perspective, this meant replacing rigid prompt-based flows with a unified orchestration model, Sachs explained. This core model is supported by modular sub-agents that search Notion and the web, query and add to databases and edit content.

Each agent uses tools contextually; for instance, it can decide whether to search Notion itself or another platform like Slack. The model will perform successive searches until the relevant information is found. It can then, for instance, convert notes into proposals, create follow-up messages, track tasks, and spot and make updates in knowledge bases.
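
The pattern described above, one orchestrator choosing among tools and searching successively until something relevant turns up, can be sketched roughly as follows. This is an illustrative sketch only, not Notion’s implementation: the tool names, the in-memory data and the `choose_tool` policy are all hypothetical, and a real system would let a reasoning model read the tool descriptions and decide which to call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Tool:
    name: str
    description: str  # what a model would read when deciding which tool to use
    run: Callable[[str], Optional[str]]  # returns a hit, or None if nothing relevant


# Hypothetical in-memory stand-ins for connected sources.
NOTION_PAGES = {"roadmap": "Q4 roadmap: ship agent templates"}
SLACK_MESSAGES = {"standup": "Standup moved to 10am"}


def search_notion(query: str) -> Optional[str]:
    return next((text for key, text in NOTION_PAGES.items() if key in query.lower()), None)


def search_slack(query: str) -> Optional[str]:
    return next((text for key, text in SLACK_MESSAGES.items() if key in query.lower()), None)


TOOLS = [
    Tool("search_notion", "Search pages in the Notion workspace", search_notion),
    Tool("search_slack", "Search messages in Slack", search_slack),
]


def choose_tool(query: str, tried: Set[str]) -> Optional[Tool]:
    """Stand-in for the reasoning model: pick the next tool not yet tried."""
    for tool in TOOLS:
        if tool.name not in tried:
            return tool
    return None


def run_agent(query: str, max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Search successively until a tool returns something or options run out."""
    tried: Set[str] = set()
    trace: List[str] = []
    for _ in range(max_steps):
        tool = choose_tool(query, tried)
        if tool is None:
            break
        tried.add(tool.name)
        trace.append(tool.name)
        result = tool.run(query)
        if result is not None:
            return result, trace
    return None, trace
```

Calling `run_agent("When is standup?")` tries the Notion search first, finds nothing, then falls through to the Slack search; the returned trace records which tools were used along the way.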

In Notion 2.0, the team focused on having AI perform specific tasks, which required them to “think exhaustively” about how to prompt the model, Sachs noted. With version 3.0, however, users can assign tasks to agents, and agents can actually take action and perform multiple tasks concurrently.

“We reorchestrated it to be self-selecting on the tools, rather than few-shotting, which is explicitly prompting how to go through all these different scenarios,” Sachs explained. The goal is to ensure everything interfaces with AI and that “anything you can do, your Notion agent can do.”

Bifurcating to isolate hallucinations

Notion’s philosophy of “better, faster, cheaper” drives a continuous iteration cycle that balances latency and accuracy through fine-tuned vector embeddings and elastic search optimization. Sachs’ team employs a rigorous evaluation framework that combines deterministic tests, vernacular optimization, human-annotated data and LLMs-as-a-judge, with model-based scoring identifying discrepancies and inaccuracies.

“By bifurcating the evaluation, we’re able to identify where the problems come from, and that helps us isolate unnecessary hallucinations,” Sachs explained. Further, making the architecture itself simpler means it’s easier to make changes as models and techniques evolve.
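
One possible reading of a bifurcated evaluation is to score each output twice, once with a deterministic test and once with a model-based judge, and then bucket failures by which half flagged them. The sketch below assumes that reading. The `EvalCase` fields, the 0.8 threshold and the `stub_judge` function are hypothetical; the stub uses simple substring matching in place of a real LLM judge.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalCase:
    question: str
    source: str          # grounding text the answer should be based on
    answer: str          # the agent output under evaluation
    required_fact: str   # substring a correct answer must contain


def deterministic_check(case: EvalCase) -> bool:
    """Exact, repeatable test: does the answer contain the required fact?"""
    return case.required_fact.lower() in case.answer.lower()


def stub_judge(case: EvalCase) -> float:
    """Placeholder for an LLM judge: fraction of answer sentences found in the source."""
    sentences = [s.strip() for s in case.answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(1 for s in sentences if s.lower() in case.source.lower())
    return supported / len(sentences)


def evaluate(cases: List[EvalCase], judge_threshold: float = 0.8) -> Dict[str, List[str]]:
    """Bucket each case by which half of the evaluation flagged it."""
    buckets: Dict[str, List[str]] = {
        "pass": [], "missing_fact": [], "ungrounded": [], "both": [],
    }
    for case in cases:
        fact_ok = deterministic_check(case)
        grounded = stub_judge(case) >= judge_threshold
        if fact_ok and grounded:
            buckets["pass"].append(case.question)
        elif grounded:
            buckets["missing_fact"].append(case.question)
        elif fact_ok:
            buckets["ungrounded"].append(case.question)
        else:
            buckets["both"].append(case.question)
    return buckets
```

Keeping the two signals separate means an answer that contains the right fact but also adds unsupported claims lands in its own bucket, which is the kind of attribution the quote describes.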

“We optimize latency and parallel thinking as much as possible,” which leads to “way better accuracy,” Sachs noted. Models are grounded in data from the web and the connected Notion workspace.

Ultimately, Sachs reported, the investment in rebuilding its architecture has already paid off for Notion in terms of capability and a faster rate of change.

She added, “We are fully open to rebuilding it again, when the next breakthrough happens, if we have to.”

Understanding contextual latency

When building and fine-tuning models, it’s important to understand that latency is subjective: AI must provide the most relevant information, not necessarily the most information, and relevance can come at the cost of speed.

“You’d be surprised at the different ways customers are willing to wait for things and not wait for things,” Sachs said. It makes for an interesting experiment: How slow can you go before people abandon the model?

With pure navigational search, for instance, users may not be as patient; they want answers near-immediately. “If you ask, ‘What’s two plus two,’ you don’t want to wait for your agent to be searching everywhere in Slack and JIRA,” Sachs pointed out.

But the longer the time it’s given, the more exhaustive a reasoning agent can be. For instance, Notion can perform 20 minutes of autonomous work across hundreds of websites, files and other materials. In these instances, users are more willing to wait, Sachs explained; they allow the model to execute in the background while they attend to other tasks.
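
One way to make this kind of contextual latency concrete is to give each request type its own time budget and only consult the sources that fit within it. The sketch below is a hypothetical illustration, not a description of Notion’s system: the request types, the per-source cost estimates and the short budgets are invented for the example, and only the 20-minute ceiling for long-running work comes from the figures above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LatencyPolicy:
    budget_seconds: float
    background: bool  # whether the user is expected to keep working meanwhile


# Hypothetical budgets per request type.
POLICIES: Dict[str, LatencyPolicy] = {
    "quick_answer": LatencyPolicy(budget_seconds=1.0, background=False),
    "navigational_search": LatencyPolicy(budget_seconds=3.0, background=False),
    "deep_research": LatencyPolicy(budget_seconds=20 * 60, background=True),
}

# Hypothetical estimates of how long each source takes to consult.
ESTIMATED_COST_SECONDS: Dict[str, float] = {
    "model_only": 0.5,
    "notion": 1.5,
    "slack": 4.0,
    "jira": 4.0,
    "web": 30.0,
}


def plan_sources(kind: str, candidates: List[str]) -> List[str]:
    """Keep candidate sources, in order, while they still fit the latency budget."""
    remaining = POLICIES[kind].budget_seconds
    chosen: List[str] = []
    for source in candidates:
        cost = ESTIMATED_COST_SECONDS[source]
        if cost <= remaining:
            chosen.append(source)
            remaining -= cost
    return chosen
```

Under these assumed numbers, a "two plus two" style request is answered from the model alone, while a deep-research request is allowed to fan out across every connected source in the background.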

“It’s a product question,” said Sachs. “How do we set user expectations from the UI? How do we confirm user expectations on latency?”

Notion is its own biggest user

Notion understands the importance of using its own product; in fact, its employees are among its biggest power users.

Sachs explained that teams have active sandboxes that generate training and evaluation data, as well as a “really active” thumbs-up-thumbs-down user feedback loop. Users aren’t shy about saying what they think should be improved or which features they’d like to see.

Sachs emphasized that when a user thumbs down an interaction, they are explicitly giving permission to a human annotator to analyze that interaction in a way that de-anonymizes them as much as possible.

“We’re using our own tool as a company all day, every day, and so we get really fast feedback loops,” said Sachs. “We’re really dogfooding our own product.”

That said, it’s their own product they’re building, Sachs noted, so they understand that they may have goggles on when it comes to quality and functionality. To balance this out, Notion has trusted “very AI-savvy” design partners who are granted early access to new capabilities and provide important feedback.

Sachs emphasized that this is just as important as internal prototyping.

“We’re all about experimenting in the open, I think you get much richer feedback,” said Sachs. “Because at the end of the day, if we just look at how Notion uses Notion, we’re not really giving the best experience to our customers.”

Just as importantly, continuous internal testing allows teams to evaluate progress and make sure models aren’t regressing (that is, that accuracy and performance aren’t degrading over time). “Everything you’re doing stays trustworthy,” Sachs explained. “You know that your latency is within bounds.”

Many companies make the mistake of focusing too intensely on retroactively focused evals; this makes it difficult for them to understand how or where they’re improving, Sachs pointed out. Notion distinguishes between evals that serve as a “litmus test” of development and forward-looking progress, and evals used for observability and regression proofing.

“I think a big mistake a lot of companies make is conflating the two,” said Sachs. “We use them for both purposes; we think about them really differently.”

Takeaways from Notion’s journey

For enterprises, Notion can serve as a blueprint for how to responsibly and dynamically operationalize agentic AI in a connected, permissioned enterprise workspace.

Sachs’ takeaways for other tech leaders:

  • Don’t be afraid to rebuild when foundational capabilities change; Notion fully re-engineered its architecture to align with reasoning-based models.

  • Treat latency as contextual: Optimize per use case, rather than universally.

  • Ground all outputs in trustworthy, curated enterprise data to ensure accuracy and trust.

She advised: “Be willing to make the hard decisions. Be willing to sit at the top of the frontier, so to speak, on what you’re developing to build the best product you can for your customers.”
