Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what is Gemini? How can you use it? And how does it stack up against other generative AI tools such as OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features, and news about Google’s plans for Gemini are released.
What is Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family. Developed by Google’s AI research labs DeepMind and Google Research, it comes in several flavors:
- Gemini Ultra, a very large model.
- Gemini Pro, a large model, though smaller than Ultra. The latest version, Gemini 2.0 Pro, is Google’s current flagship.
- Gemini Flash, a speedier, “distilled” version of Pro.
- Gemini Flash-Lite, a slightly smaller and faster version of Gemini Flash.
- Gemini Flash Thinking, a model with “reasoning” capabilities.
- Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, which is meant to run offline.
All Gemini models were trained to be natively multimodal, that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in different languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn’t necessarily the case with Gemini models. For example, the latest versions of Gemini Flash and Gemini Pro can natively output images and audio in addition to text.
We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution, particularly if you’re intending on using Gemini commercially.
What’s the difference between the Gemini apps and Gemini models?
Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.
Gemini is available on the web. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.
On Android, users can bring up a Gemini overlay to ask questions about what’s on their screen (for example, a YouTube video). Pressing and holding a supported smartphone’s power button or saying, “Hey Google” summons the overlay.
Gemini apps can accept images as well as voice commands and text, including files like PDFs, either uploaded or imported from Google Drive, and generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.
Gemini Advanced
The Gemini apps aren’t the only means of recruiting Gemini models’ help with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.
To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically a part of Google One, the AI Premium Plan costs $20 a month and provides access to Gemini in Google Workspace apps like Docs, Maps, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more sophisticated Gemini models to the Gemini apps.
Gemini Advanced users get extras here and there, too, like priority access to new features and models; the ability to run and edit Python code directly in Gemini; and increased limits for NotebookLM, Google’s tool that turns PDFs into AI-generated podcasts. Recently, Gemini Advanced gained a memory feature that stores users’ preferences and allows Gemini to refer to old conversations as context for current chats.
One of the more compelling Gemini Advanced exclusives, Deep Research, leverages Gemini models with “advanced reasoning” to create detailed briefs. In response to a prompt (e.g., “How should I redesign my kitchen?”), Deep Research develops a multi-step research plan and searches the web to craft a comprehensive answer.
In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps write and refine content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.
Gemini is in Google Maps, where it can aggregate reviews about local businesses and offer recommendations like how to spend a day visiting a foreign city. The chatbot’s reach extends to Drive, as well, where it can summarize files and folders and give quick facts about a project.
Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll consider the web page you’re on to make recommendations.
Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and Meet (where it translates captions).
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.
Gemini extensions and Gems
Gemini Advanced users can create Gems, custom chatbots on desktop and mobile powered by Gemini models. Gems can be generated from natural language descriptions (for example, “You’re my running coach. Give me a daily running plan”) and shared with other users or kept private.
The Gemini apps can tap into Google services via what Google calls “Gemini extensions.” Gemini integrates with Drive, Gmail, YouTube, and more to respond to queries such as “Could you summarize my last three emails?”
Gemini Live in-depth voice chats
An experience called Gemini Live allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini apps on mobile and the Pixel Buds Pro 2, where it can be accessed even when your phone’s locked.
With Gemini Live enabled, you can interrupt Gemini while the chatbot’s speaking to ask a clarifying question, and it’ll adapt to your speech patterns in real time. Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job interview and give public speaking advice.
We’ve also published a separate review of Gemini Live.
Gemini for teens
Google offers a teen-focused Gemini experience for students.
The teen-focused Gemini has “additional policies and safeguards,” including a tailored onboarding process and an AI literacy guide. Otherwise, it’s nearly identical to the standard Gemini experience, down to the “double-check” feature that looks across the web to see if Gemini’s responses are accurate.
What can the Gemini models do?
Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage, and Google is promising much more in the not-too-distant future.
Of course, Google offers no fix for some of the underlying problems with generative AI technology today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.
Gemini Pro’s capabilities
Google says that its latest Pro model, Gemini 2.0 Pro, is its best yet for coding and complex prompts. 2.0 Pro outperforms its predecessor, Gemini 1.5 Pro, in benchmarks measuring programming, reasoning, math, and factual accuracy.
In Google’s Vertex AI platform, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo, and MSCI, or source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.
Google’s AI Studio platform offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range, provide examples to give tone and style instructions, and tune Pro’s safety settings.
Gemini Flash is lightweight, while Gemini Flash Thinking adds reasoning
Gemini 2.0 Flash, which can use tools like Google Search and interact with external APIs, outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. An offshoot of Gemini Pro, Flash is small and efficient, built for narrow, high-frequency generative AI workloads.
Google says that Flash is particularly well-suited for tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables. Meanwhile, Gemini 2.0 Flash-Lite, a more compact version of Flash, outperforms Gemini 1.5 Flash but runs at the same price and speed, according to Google.
In December 2024, Google released a “thinking” version of Gemini 2.0 Flash that’s capable of “reasoning.” The AI model takes a few seconds to work backward through a problem before it gives an answer, which can improve its reliability.
Gemini Nano can run on your phone
Gemini Nano is a tiny version of Gemini efficient enough to run directly on (some) devices instead of sending the task off to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection, and in a nod to privacy, no data leaves their phone in the process.
Nano is also in Gboard, Google’s keyboard replacement. There, it powers Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app such as WhatsApp.
A future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.
Gemini Ultra, MIA for now
We haven’t seen much of Gemini Ultra in recent months. The model isn’t available in the Gemini apps, and it isn’t listed on Google’s Gemini API pricing page. However, that doesn’t mean Google won’t bring Ultra back at some point in the future.
How much do the Gemini models cost?
Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, and 2.0 Flash-Lite are available through Google’s Gemini API for building apps and services. They’re pay-as-you-go. Here’s the base pricing, not including add-ons, as of February 2025:
- Gemini 1.5 Pro: $1.25 per 1 million input tokens (for prompts up to 128K tokens) or $2.50 per 1 million input tokens (for prompts longer than 128K tokens); $5 per 1 million output tokens (for prompts up to 128K tokens) or $10 per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens) or 15 cents per 1 million input tokens (for prompts longer than 128K tokens); 30 cents per 1 million output tokens (for prompts up to 128K tokens) or 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 2.0 Flash: 10 cents per 1 million input tokens, 40 cents per 1 million output tokens. For audio, 70 cents per 1 million input tokens.
- Gemini 2.0 Flash-Lite: 7.5 cents per 1 million input tokens, 30 cents per 1 million output tokens.
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 750,000 words. Input refers to tokens fed into the model, while output refers to tokens that the model generates.
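To make the arithmetic concrete, here is a rough Python sketch that turns the February 2025 base prices listed above into a per-request estimate. It is an illustration under stated assumptions, not Google’s billing logic: the dictionary keys are hypothetical labels rather than official API model IDs, “128K” is taken to mean 128,000 tokens, audio input pricing for 2.0 Flash is left out, and the 750,000-words-per-million-tokens ratio is only the rule of thumb mentioned above.

```python
from dataclasses import dataclass

# Rule of thumb from the article: 1 million tokens is about 750,000 words.
WORDS_PER_MILLION_TOKENS = 750_000

# Assumption: "128K tokens" is treated as 128,000 tokens.
LONG_PROMPT_THRESHOLD = 128_000


@dataclass(frozen=True)
class Pricing:
    """USD per 1 million tokens, split by prompt length tier."""
    input_short: float
    output_short: float
    input_long: float
    output_long: float


# Hypothetical labels (not official model IDs); text-token base prices only.
PRICES = {
    "gemini-1.5-pro": Pricing(1.25, 5.00, 2.50, 10.00),
    "gemini-1.5-flash": Pricing(0.075, 0.30, 0.15, 0.60),
    # The 2.0 models have a single listed tier, so both tiers use the same rate.
    "gemini-2.0-flash": Pricing(0.10, 0.40, 0.10, 0.40),
    "gemini-2.0-flash-lite": Pricing(0.075, 0.30, 0.075, 0.30),
}


def words_to_tokens(words: int) -> int:
    """Approximate a token count from a word count."""
    return round(words * 1_000_000 / WORDS_PER_MILLION_TOKENS)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed base prices."""
    pricing = PRICES[model]
    # The higher tier applies when the prompt exceeds the threshold.
    long_prompt = input_tokens > LONG_PROMPT_THRESHOLD
    in_rate = pricing.input_long if long_prompt else pricing.input_short
    out_rate = pricing.output_long if long_prompt else pricing.output_short
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example: a 3,000-word prompt with a 750-word reply on each model.
prompt_tokens = words_to_tokens(3_000)
reply_tokens = words_to_tokens(750)
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, prompt_tokens, reply_tokens):.6f}")
```

Because output tokens cost several times more than input tokens on every listed model, long generated replies tend to dominate the bill even when prompts are large.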
2.0 Pro pricing has yet to be announced, and Nano is still in early access.
Is Gemini coming to the iPhone?
It might.
Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models, including Gemini, but he didn’t divulge any additional details.
This post was originally published February 16, 2024, and is updated regularly.