
Baidu Unveils Groundbreaking Open-Source Multimodal AI Technology Outperforming GPT-5 and Gemini

Published November 12, 2025

Baidu just dropped an open-source multimodal AI that it claims beats GPT-5 and Gemini

Baidu Inc., China’s leading search engine company, launched an artificial intelligence model on Monday that it says surpasses competitors from Google and OpenAI on vision-related benchmarks. The model, ERNIE-4.5-VL-28B-A3B-Thinking, is claimed to deliver this performance while using significantly fewer computing resources. Baidu attributes the efficiency to a routing architecture that activates just 3 billion of the model’s 28 billion total parameters during operation.

According to Baidu, ERNIE-4.5-VL-28B-A3B-Thinking excels at tasks such as document understanding, chart analysis, and visual reasoning while consuming less computational power and memory than larger competing systems. The model’s technical documentation on Hugging Face, an AI model repository, attributes its multimodal reasoning capabilities to an extensive mid-training phase on a diverse corpus of visual-language reasoning data.

One of the model’s key features is “Thinking with Images,” which enables dynamic image analysis by allowing the AI to zoom in and out of images to examine fine-grained details, mimicking human visual problem-solving. This capability enhances the model’s ability to process detailed information and handle complex visual tasks. Additionally, the model supports enhanced visual grounding for precise object identification in images, with potential applications in robotics and industrial settings.

ERNIE-4.5-VL-28B-A3B-Thinking uses a Mixture-of-Experts architecture, which selectively activates only the relevant parameters for each input to improve efficiency. The model is released under the Apache 2.0 license, which permits commercial use and lowers the barrier to enterprise adoption.
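To make the routing idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general Mixture-of-Experts technique, not Baidu's implementation: the eight toy experts, the gate scores, and the choice of k=2 are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and combine their outputs by gate weight."""
    selected = route_top_k(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]

# Hypothetical setup: eight toy "experts", each of which just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical gate scores; a real router computes these from the input token.
gate_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

output, used = moe_forward(3.0, experts, gate_scores, k=2)
print(f"experts used: {used}, output: {output:.4f}")
```

Only two of the eight experts run for this input; the other six contribute no compute. This is the general mechanism by which a model can hold far more parameters than it activates for any single input.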

The model’s performance claims have garnered attention, with comparisons made against Google’s Gemini 2.5 Pro and OpenAI’s GPT-5-High. Independent verification of these claims is pending, but the open licensing approach could accelerate adoption in the enterprise market.

Baidu’s ERNIE 4.5 model family, which includes variants with different parameter sizes, addresses the challenge of training multimodal AI systems on both visual and textual data without compromising performance. The company’s ongoing maintenance and support will be an important factor for organizations considering deployment.

For technical decision-makers evaluating the model, factors such as infrastructure requirements, integration with existing tools, and ongoing support should be considered. The model’s ability to run on a single 80GB GPU makes it accessible to a broader range of organizations. However, its strength in specific use cases such as document and chart understanding may not carry over to general-purpose vision tasks.
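A rough back-of-the-envelope calculation shows why a 28-billion-parameter model can plausibly fit on one 80GB GPU. The sketch below counts weight storage only and assumes 16-bit weights; it ignores activations, the KV cache, and framework overhead, so real memory use will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

TOTAL_PARAMS = 28e9   # total parameters, per the article
ACTIVE_PARAMS = 3e9   # parameters activated per input, per the article
GPU_MEMORY_GB = 80    # single-GPU capacity cited in the article

# Assumption: 2 bytes per parameter (16-bit weights such as bf16 or fp16).
weights_gb = weight_memory_gb(TOTAL_PARAMS, 2)
headroom_gb = GPU_MEMORY_GB - weights_gb
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
print(f"active share of parameters: {active_fraction:.1%}")
```

Note that in a typical Mixture-of-Experts deployment all expert weights stay resident in memory even though only a fraction is used per input, so the 3 billion active parameters reduce compute per token rather than the memory footprint.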

Overall, Baidu’s release of ERNIE-4.5-VL-28B-A3B-Thinking marks a notable development in the AI landscape, with implications for enterprise applications that require advanced visual understanding and reasoning. If its benchmark claims hold up, the model’s features, efficiency, and open-source licensing make it a compelling option for organizations seeking capable and cost-effective AI solutions.
