Google launched Gemini, a new generative AI platform, to great fanfare. What can you do with it? How does it compare to other similar platforms? We have created this guide to keep you informed on Gemini’s latest models and features. We will update it as new information comes out.
Gemini models are multimodal, which allows them to work with both structured and unstructured data such as images, videos, and audio. Google has said that the models are designed to be “accurate, fast, and power-efficient.”
Gemini models are trained and fine-tuned on audio, images, videos, codebases, and text in various languages, setting them apart from Google’s large language model LaMDA, which is only trained on text data. LaMDA can only understand and generate text, such as essays and emails, whereas Gemini models can understand and generate more than just text.
Gemini is a family of models, not an app or frontend. There is no standalone Gemini experience and it is unlikely one will ever exist. Google’s offering can be divided into two parts, Bard and Gemini, and a comparison with OpenAI’s products helps. Bard is similar to ChatGPT, OpenAI’s popular conversational AI app. Gemini, on the other hand, corresponds to the language model that powers ChatGPT, which is GPT-3.5 or GPT-4.
Gemini is completely independent from Imagen-2, a text-to-image model that may or may not be part of the company’s AI strategy.
Google has promised many capabilities, including those mentioned above, for the not-too-distant future. However, it is difficult to trust the company given its past underperformance with the original Bard launch and its deceptive video meant to showcase Gemini’s capabilities.
The tech giant has made Gemini available in some form today, but it is quite limited.
Google faked its best demo of Gemini Ultra, but it still looks promising.
Gemini Ultra
Google will launch its largest model, Gemini Ultra, more broadly later this year. Although Google has provided product demos to give information on Ultra, it is best to view this information skeptically. According to Google, Gemini Ultra can help students with physics homework, guiding them step-by-step on their worksheets and highlighting any mistakes in their answers.
Google says Gemini Ultra can also identify scientific papers relevant to a certain problem, extract information from them, and generate the formulas needed to update a chart with more recent data.
Google won’t include Gemini Ultra’s native image generation capabilities in the productized version at launch. This is because the mechanism is more complex than how apps like ChatGPT generate images. Rather than feeding prompts into a separate image generator (like DALL-E 3, in ChatGPT’s case), Gemini Ultra outputs images directly, without an intermediary step.
Gemini Pro
One study found that, like other large language models, Gemini Pro struggles with math problems involving several digits, and users have identified many examples of bad reasoning and mistakes. The model also made factual errors for simple queries like who won the latest Oscars. Google has promised improvements, but it is uncertain when they will be implemented.
Vertex AI also has an endpoint called Gemini Pro Vision, which can process text and imagery, such as photos and videos, and output text, similar to OpenAI’s GPT-4 with Vision model. In addition, Gemini Pro can connect to external, third-party APIs to execute specific actions.
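To make the idea of a model triggering external actions more concrete, here is a minimal sketch of the general pattern in Python. It does not use the Vertex AI SDK or any real Gemini endpoint: the model reply is a hard-coded stub, and the JSON shape, the `get_weather` tool, and the `dispatch` helper are all hypothetical names chosen for illustration. The point is only that the model returns a structured request and the application decides whether and how to run it.

```python
import json

# Hypothetical local "tool" the application exposes to the model.
# In a real system this would wrap a third-party API call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry of functions the application is willing to execute.
TOOLS = {"get_weather": get_weather}

def fake_model_reply(prompt: str) -> str:
    """Stub standing in for a model response (assumption: the model
    answers with a JSON object naming a function and its arguments)."""
    return json.dumps({"function": "get_weather", "arguments": {"city": "Paris"}})

def dispatch(model_reply: str) -> str:
    """Parse the structured reply and run the matching registered tool."""
    call = json.loads(model_reply)
    name = call["function"]
    if name not in TOOLS:
        # Never execute something the application did not register.
        raise ValueError(f"unknown function: {name}")
    return TOOLS[name](**call["arguments"])

if __name__ == "__main__":
    reply = fake_model_reply("What is the weather in Paris?")
    print(dispatch(reply))
```

Keeping an explicit registry means the application, not the model, controls which actions can actually be executed.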
AI Studio enables developers to easily integrate data (e.g. PDFs, images) from various sources (e.g. OneDrive, Salesforce) and use it to answer queries.
Smart Reply in Gboard, the virtual keyboard app, uses Gemini to suggest short, contextually relevant replies to incoming messages.
Gemini Nano
Gemini Nano provides users with summaries even when they lack a signal or Wi-Fi connection. To ensure user privacy, no data leaves their phone during the process. Gemini Nano is available in Gboard, Google’s keyboard app, as a developer preview.
In Gboard, it powers the Smart Reply feature, which suggests what users may want to say next when they are conversing in a messaging app. Initially, this feature only works with WhatsApp, but Google promises that it will be available in more apps in 2024.
Is Gemini better than OpenAI’s GPT-4? It is difficult to say until more information is available about Google’s benchmarks and how they were conducted.
Google claims that Gemini Pro is more adept than GPT-3.5 at tasks such as summarizing content, brainstorming, and writing. But irrespective of whether benchmarks truly demonstrate a superior model, the scores reported by Google seem to only slightly surpass those of OpenAI’s corresponding models.
Users have already noted that basic facts are often wrong, translations are often inaccurate, and coding suggestions are often inadequate.
Cost to Generate Articles
For Gemini Pro, customers pay per 1,000 characters (about 140 to 250 words) and per image ($0.0025). For example, a 500-word article with two images would cost $5. Generating an article of similar length would cost only $0.10.
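The billing model described here (a charge per 1,000 characters plus a charge per image) can be sketched as a small calculator. This is an illustration under stated assumptions, not Google's pricing logic: the per-image price and the words-per-1,000-characters range come from the text above, while the per-1,000-character rate used in the example is a placeholder rather than a published price, and the function names are hypothetical.

```python
# Per-image price quoted in the text above.
IMAGE_PRICE_USD = 0.0025

def chars_for_words(words: float, words_per_1k_chars: float) -> float:
    """Convert a word count to characters, given how many words fit in
    1,000 characters (the text quotes a range of about 140 to 250)."""
    return words / words_per_1k_chars * 1000

def estimate_cost(chars: float, price_per_1k_chars: float,
                  images: int = 0, image_price: float = IMAGE_PRICE_USD) -> float:
    """Text cost is billed per 1,000 characters; images are billed per image."""
    text_cost = chars / 1000 * price_per_1k_chars
    return text_cost + images * image_price

if __name__ == "__main__":
    # A 500-word article spans roughly this many characters.
    low = chars_for_words(500, 250)
    high = chars_for_words(500, 140)
    print(f"characters: {low:.0f} to {high:.0f}")

    # PLACEHOLDER rate (assumption, not a published price).
    rate = 0.05
    print(f"estimated cost: ${estimate_cost(low, rate, images=2):.3f}")
```

Passing the rate in as a parameter keeps the sketch usable when the actual prices change or differ between input and output.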
Gemini Pro now responds to text-based Bard queries in English in the U.S., with additional languages and supported countries to arrive in the future. Vertex AI also offers a preview of Gemini Pro through its API. This API is free to use “within limits” and currently supports 38 languages and regions, including Europe, with features such as chat functionality and filtering.
Google will launch Gemini models for Chrome and Firebase mobile development tools in early 2024. Developers can sign up for a sneak peek of Gemini Nano, which is currently on the Pixel 8 Pro and will come to other devices in the future. We will update this post with any further developments.