Gemini 3: Leading the AI Race – But for How Long?

Published

November 24, 2025

When an AI model release immediately spawns memes and treatises declaring the rest of the industry cooked, you know you’ve got something worth dissecting.

Google’s Gemini 3 was released Tuesday to widespread fanfare. The company called the model a “new era of intelligence,” integrating it into Google Search on day one for the first time. It’s blown past OpenAI and other competitors’ products on a range of benchmarks and is topping the charts on LMArena, a crowdsourced AI evaluation platform that’s essentially the Billboard Hot 100 of AI model ranking. Within 24 hours of its launch, more than one million users tried Gemini 3 in Google AI Studio and the Gemini API, per Google. “From a day one adoption standpoint, [it’s] the best we’ve seen from any of our model releases,” Google DeepMind’s Logan Kilpatrick, who is product lead for Google’s AI Studio and the Gemini API, told The Verge.

Even OpenAI CEO Sam Altman and xAI CEO Elon Musk publicly congratulated the Gemini team on a job well done. And Salesforce CEO Marc Benioff wrote that after using ChatGPT every day for three years, spending two hours on Gemini 3 changed everything: “Holy shit … I’m not going back. The leap is insane — reasoning, speed, images, video… everything is sharper and faster. It feels like the world just changed, again.”

“This is more than a leaderboard shuffle,” said Wei-Lin Chiang, cofounder and CTO of LMArena. Chiang told The Verge that Gemini 3 Pro holds a “clear lead” in occupational categories including coding, math, and creative writing, and its agentic coding abilities “in many cases now surpass top coding models like Claude 4.5 and GPT-5.1.” It also got the top spot on visual comprehension and was the first model to surpass a ~1500 score on the platform’s text leaderboard.

The new model’s performance, Chiang said, “illustrates that the AI arms race is being shaped by models that can reason more abstractly, generalize more consistently, and deliver dependable results across an increasingly diverse set of real-world evaluations.”

Alex Conway, principal software engineer at DataRobot, told The Verge that one of Gemini 3’s most notable advancements was on a specific reasoning benchmark called ARC-AGI-2. Gemini scored almost twice as high as OpenAI’s GPT-5 Pro while running at one-tenth of the cost per task, he said, which is “really challenging the notion that these models are plateauing.” And on the SimpleQA benchmark — which involves simple questions and answers on a broad range of topics, and requires a lot of niche knowledge — Gemini 3 Pro scored more than twice as high as OpenAI’s GPT-5.1, Conway flagged. “Use case-wise, it’ll be great for a lot more niche topics and diving deep into state-of-the-art research and scientific fields,” he said.

But leaderboards aren’t everything. It’s possible — and in the high-pressure AI world, tempting — to train a model for narrow benchmarks rather than general-purpose success. So to really know how well a system is doing, you have to rely on real-world testing, anecdotal experience, and complex use cases in the wild.

The Verge spoke with professionals across disciplines who use AI every day for work. The consensus: Gemini 3 looks impressive, and it does a great job on a wide breadth of tasks — but when it comes to edge cases and niche aspects of certain industries, many professionals won’t be replacing their current models with it anytime soon.

The majority of people The Verge spoke with plan to continue to use Anthropic’s Claude for their coding needs, despite Gemini 3’s advancements in that space. Some also said that Gemini 3 isn’t optimal on the user interaction front. Tim Dettmers, assistant professor at Carnegie Mellon University and a research scientist at Ai2, said that though it’s a “great model,” it’s a bit raw when it comes to UX, meaning “it doesn’t follow instructions precisely.”

Tulsee Doshi, Google DeepMind’s senior director of product management for Gemini and Gen Media, told The Verge that the company prioritized bringing Gemini 3 to a variety of Google products in a “very real way.” When asked about the instruction-following concerns, she said it’s been helpful to see “where folks are hitting some of the sticking points.”

She also said that since the Pro model is the first release in the Gemini 3 suite, later models will help “round out that concern.”

Joel Hron, CTO of Thomson Reuters, said that the company has its own internal benchmarks it’s developed to rank both its internal models and public ones on the areas that are most relevant to their work — like comparing two documents up to several hundreds of pages in length, interpreting a long document, understanding legal contracts, and reasoning in the legal and tax spaces. He said that so far, Gemini 3 has performed strongly across all of them and is “a significant jump up from where Gemini 2.5 was.” It also outperforms several of Anthropic’s and OpenAI’s models right now in some of those areas.

Louis Blankemeier, cofounder and CEO of Cognita, a radiology AI startup, said that in terms of “pure numbers” Gemini 3 is “super exciting.” But, he said, “we still need some time to figure out what the real-world utility of this model is.” For more general domains, Blankemeier said, Gemini 3 is a star, but when he played around with it for radiology, it struggled with correctly identifying subtle rib fractures on chest X-rays, as well as uncommon or rare conditions.

He compares radiology to self-driving cars in many ways, emphasizing the importance of refining and training models on custom data over time. Despite the advancements in newer, more powerful models, they may not always be as effective as older, refined ones due to the complexities of the real world.

Similarly, Matt Hoffman, head of AI at Longeye, acknowledges the potential of the Gemini 3 Pro-powered Nano Banana Pro image generator. It allows for creating synthetic datasets for testing while keeping sensitive investigation data secure, but the benchmarks may not directly translate to real-world use cases. He expresses doubt about swapping out their current production model for Gemini 3 in pursuit of immediate improvements.

Other companies are also intrigued by Gemini but are not looking to replace all existing models. Built, a construction lending startup, currently uses a mix of models from various providers for analyzing construction draw requests. While exploring the possibility of switching to Gemini 3, they are cautious about replacing all other models.

Tanmai Gopal, CEO of PromptQL, acknowledges the significance of Gemini 3 but believes it does not mark the end for Google’s competitors. He mentions that AI models are constantly evolving, and different models excel in different tasks at different times. PromptQL is still evaluating how their model choices may change, with initial results not showing significant improvement over their current lineup.

And like many models, Gemini 3 has its moments of inconsistency, akin to “robotic hand syndrome,” where it excels in complex tasks but struggles with simpler queries. Despite some shortcomings, Gemini 3 shows promise and represents a notable advancement for the company.

In a continuous cycle of model advancements, companies are constantly navigating through different models to find the best fit for their needs. Google’s recent release has caught the attention of many due to the significant improvements it has made across various aspects of its models. This update is not just about enhancing coding skills or reasoning abilities; it marks an overall enhancement in performance. The improvements are visible across the board, showcasing a substantial leap in quality. This development is a testament to Google’s commitment to excellence and innovation in the tech industry. The updated models have set a new standard for efficiency and effectiveness, making Google a frontrunner in the field of AI and technology.