Google opens up Gemini 1.5 Flash, Pro with 2M tokens to the public

Google Cloud is making two variations of its flagship AI model—Gemini 1.5 Flash and Pro—publicly accessible. The former is a small multimodal model with a 1 million context window that tackles narrow high-frequency tasks. It was first introduced in May at Google I/O. The latter, the most powerful version of Google’s LLM, debuted in February before being notably upgraded to contain a 2 million context window. That version is now open to all developers.

The release of these Gemini variations aims to showcase how Google’s AI work empowers businesses to develop “compelling” AI agents and solutions. During a press briefing, Google Cloud Chief Executive Thomas Kurian boasts the company sees “incredible momentum” with its generative AI efforts, with organizations such as Accenture, Airbus, Anthropic, Box, Broadcom, Cognizant, Confluent, Databricks, Deloitte, Equifax, Estée Lauder Companies, Ford, GitLab, GM, the Golden State Warriors, Goldman Sachs, Hugging Face, IHG Hotels and Resorts, Lufthansa Group, Moody’s, Samsung, and others building on its platform. He attributes this adoption growth to the combination of what Google’s models are capable of and the company’s Vertex platform. It’ll “continue to introduce new capability in both those layers at a rapid pace.”

Google opens up Gemini 1.5 Flash, Pro with 2M tokens to the public

Ken Yeung @thekenyeung

June 27, 2024 6:00 AM

Sir Demis Hassabis introduces Gemini 1.5 Flash. Image credit: Screenshot

We want to hear from you! Take our quick AI survey and share your insights on the current state of AI, how you’re implementing it, and what you expect to see in the future. Learn More

Video Player is loading.

Handling Today’s Threatscape at Machine Scale

Google is also releasing context caching and provisioned throughput, new model capabilities designed to enhance the developer experience.

Gemini 1.5 Flash

Gemini 1.5 Flash offers developers lower latency, affordable pricing and a context window suitable for inclusion in retail chat agents, document processing, and bots that can synthesize entire repositories. Google claims, on average, that Gemini 1.5 Flash is 40 percent faster than GPT-3.5 Turbo when given an input of 10,000 characters. It has an input price four times lower than OpenAI’s model, with context caching enabled for inputs larger than 32,000 characters.

Countdown to VB Transform 2024

Join enterprise leaders in San Francisco from July 9 to 11 for our flagship AI event. Connect with peers, explore the opportunities and challenges of Generative AI, and learn how to integrate AI applications into your industry. Register Now

Gemini 1.5 Pro

As for Gemini 1.5 Pro, developers will be excited to have a much larger context window. With 2 million tokens, it’s in a class of its own, as none of the prominent AI models has as high of a limit. This means this model can process and consider more text before generating a response than ever before. “You may ask, ‘translate that for me in real terms,'” Kurian states. “Two million context windows says you can take two hours of high-definition video, feed it into the model, and have the model understand it as one thing. You don’t have to break it into chunks. You can feed it as one thing. You can do almost a whole day of audio, one or two hours of video, greater than 60,000 lines of code and over 1.5 million words. And we are seeing many companies find enormous value in this.”

Kurian explains the differences between Gemini 1.5 Flash and Pro: “It’s not just the kind of customers, but it’s the specific [use] cases within a customer.” He references Google’s I/O keynote as a practical and recent example. “If you wanted to take the entire keynote—not the short version, but the two-hour keynote—and you wanted all of it processed as one video, you would use [Gemini 1.5] Pro because it was a two-hour video. If you wanted to do something that’s super low latency…then you will use Flash because it is designed to be a faster model, more predictable latency, and is able to reason up to a million tokens.

Context caching now for Gemini 1.5 Pro and Flash

To help developers leverage Gemini’s different context windows, Google is launching context caching in public preview for both Gemini 1.5 Pro and Flash. Context caching allows models to store and reuse information they already have without recomputing everything from scratch whenever they receive a request. It’s helpful for long conversations or documents and lowers developers’ compute costs. Google reveals that context caching can reduce input costs by a staggering 75 percent. This feature will become more critical as context windows increase.

Technology

Wednesday, July 10, 2024