Gemini, Google’s artificial intelligence model, is being integrated into various technologies across the tech giant’s ecosystem, including Gmail, YouTube, and its smartphones.
During his keynote at Google’s I/O 2024 developer conference on May 14, CEO Sundar Pichai highlighted several upcoming deployments of the model, mentioning AI 121 times over his 110-minute presentation, with Gemini taking the spotlight. Launched in December 2023, Gemini is set to become a central component of Google’s services.
Google is integrating this large language model (LLM) into nearly all of its products, including Android, Search, and Gmail. Users can expect the following in the future:
Gemini
A year ago on the I/O stage, Google unveiled its plans for Gemini: a frontier model designed to be natively multimodal and capable of reasoning across different kinds of input. Since then, the company has introduced the Gemini models, which deliver state-of-the-art performance on multimodal benchmarks, followed by Gemini 1.5 Pro, a significant advance in long-context processing.
More than 1.5 million developers now use Gemini across Google’s tools, applying it to debug code, gain insights, and build the next generation of AI applications.
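For a sense of what that looks like in practice, here is a minimal sketch of a text request to Gemini using Google’s google-generativeai Python SDK. It is an illustration only, not code from the announcement: the model name, the prompt, and the GEMINI_API_KEY environment variable are assumptions.

```python
# Minimal sketch: one text request to a Gemini model via the
# google-generativeai SDK. GEMINI_API_KEY is an assumed env variable.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Gemini 1.5 Pro is the long-context model mentioned above.
model = genai.GenerativeModel("gemini-1.5-pro")

# A debugging-style prompt, as one example of the developer use cases cited.
response = model.generate_content(
    "Explain why this Python snippet raises a TypeError:\n"
    "sorted([3, '1', 2])"
)
print(response.text)
```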
Product Progress and App Interactions
Gemini’s functionality is expanding to include smoother interaction with other applications. In an upcoming update, users will be able to ask Gemini to generate an image and then drag and drop it directly into a message.
Additionally, YouTube users will be able to select the “Ask this video” option to prompt Gemini to extract specific information directly from a video.
Gemini in Gmail and Gemini Live
Gmail is gaining a set of Gemini-powered features. Users will be able to search, summarize, and compose emails with AI assistance, and the assistant will handle more involved tasks, such as helping with e-commerce returns by searching for the relevant emails, locating receipts, and filling out online forms.
Google also introduced Gemini Live, a feature that lets users hold extended voice conversations with the AI directly on their smartphones. The chatbot can be interrupted mid-response for clarification and adapts to users’ speech patterns in real time.
Gemini can also interpret and respond to the user’s physical surroundings by analyzing photos or video captured on the device.
Multimodality Developments
Google is developing advanced AI agents that can reason, plan, and carry out complex, multi-step tasks under user supervision. These agents are designed to handle multiple input modalities, including text, images, audio, and video, extending their capabilities beyond traditional text-only interaction.
Sundar Pichai, CEO of Google & Alphabet, stated, “The power of Gemini — with multimodality, long context, and agents — brings us closer to our ultimate goal: making AI helpful for everyone.”
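As a concrete illustration of multimodal input, the following sketch sends an image and a text instruction to Gemini in a single request through the same google-generativeai SDK as above. The model name and the receipt.jpg file are hypothetical, chosen to echo the receipt-finding scenario described earlier.

```python
# Hedged sketch of a multimodal (text + image) request to Gemini.
# "receipt.jpg" is a hypothetical local file used for illustration.
import os

import google.generativeai as genai
import PIL.Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
img = PIL.Image.open("receipt.jpg")

# Text and image parts are passed together as one multimodal prompt.
response = model.generate_content(
    ["Extract the merchant name and total amount from this receipt.", img]
)
print(response.text)
```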
A notable addition is the “Ask Photos” feature, which lets users search their photo libraries with natural-language queries. Powered by Gemini, it combines contextual understanding, object and facial recognition, and summarization to answer questions about users’ photo memories.
Additionally, Google Maps is set to benefit from AI-generated summaries of places and regions, leveraging insights derived from the platform’s extensive mapping data. These summaries aim to provide users with concise and informative overviews, enriching their navigation experiences.