Baidu Unveils Groundbreaking Open-Source Multimodal AI Technology Outperforming GPT-5 and Gemini
Baidu Inc., China’s leading search engine company, launched an artificial intelligence model on Monday that the company says surpasses competitors from Google and OpenAI on vision-related benchmarks. The model, ERNIE-4.5-VL-28B-A3B-Thinking, is reported to achieve this while using significantly fewer computing resources. The efficiency comes from a routing architecture that activates just 3 billion of the model’s 28 billion total parameters for any given input.
According to Baidu, ERNIE-4.5-VL-28B-A3B-Thinking excels at tasks such as document understanding, chart analysis, and visual reasoning while consuming less computational power and memory than larger competing systems. The model’s technical documentation on Hugging Face, an AI model repository, attributes its multimodal reasoning capabilities to an extensive mid-training phase on a diverse corpus of visual-language reasoning data.
One of the model’s key features is “Thinking with Images,” which enables dynamic image analysis by allowing the AI to zoom in and out of images to examine fine-grained details, mimicking human visual problem-solving. This capability enhances the model’s ability to process detailed information and handle complex visual tasks. Additionally, the model supports enhanced visual grounding for precise object identification in images, with potential applications in robotics and industrial settings.
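The article does not describe how the model implements zooming internally. As a rough illustration of the general idea only, the sketch below turns a grounded region (a bounding box in normalized coordinates) into a padded pixel crop that could then be examined at higher resolution. The function name `zoom_box`, the normalized box format, and the `padding` parameter are all assumptions made for this example, not part of Baidu’s API.

```python
import math

def zoom_box(image_w, image_h, box, padding=0.1):
    """Convert a normalized box (x0, y0, x1, y1 in 0..1) to a pixel crop.

    Hypothetical helper: adds a margin proportional to the box size,
    then clamps the result so it stays inside the image.
    """
    x0, y0, x1, y1 = box
    pad_x = (x1 - x0) * padding
    pad_y = (y1 - y0) * padding
    left = max(0, int((x0 - pad_x) * image_w))
    top = max(0, int((y0 - pad_y) * image_h))
    right = min(image_w, math.ceil((x1 + pad_x) * image_w))
    bottom = min(image_h, math.ceil((y1 + pad_y) * image_h))
    return left, top, right, bottom

# Example: a region of interest in a 1000x800 image, cropped without margin.
crop = zoom_box(1000, 800, (0.25, 0.25, 0.5, 0.75), padding=0.0)
print(crop)
```

A caller would pass the returned box to whatever image library it uses to crop and upscale the region before a second look.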
ERNIE-4.5-VL-28B-A3B-Thinking is built on a Mixture-of-Experts architecture, which selectively activates only the relevant parameters for each input to improve efficiency. The model is released under the Apache 2.0 license, which permits commercial use subject to the license’s attribution and notice terms, lowering barriers to enterprise adoption.
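To make the idea of selective activation concrete, here is a minimal sketch of generic top-k expert routing, the common pattern behind Mixture-of-Experts layers. It is a toy illustration under stated assumptions, not Baidu’s implementation: the experts are plain scalar functions, and the router scores, the value of `k`, and the function names are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the selected experts' outputs; the rest are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy experts; in a real model each would be a neural sub-network.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
scores = [0.1, 2.0, 1.5, -1.0]  # assumed router output for one input

selected = route_top_k(scores, k=2)
output = moe_forward(3.0, experts, scores, k=2)
print(selected, output)
```

Only two of the four toy experts run for this input, which is the same principle that lets a model with 28 billion total parameters use about 3 billion per input.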
The model’s performance claims have garnered attention, with comparisons made against Google’s Gemini 2.5 Pro and OpenAI’s GPT-5-High. Independent verification of these claims is pending, but the open licensing approach could accelerate adoption in the enterprise market.
Baidu’s ERNIE 4.5 model family, which includes various variants with different parameter sizes, addresses the challenge of training multimodal AI systems on visual and textual data without compromising performance. The company’s commitment to ongoing maintenance and support is crucial for organizations considering deployment.
Technical decision-makers evaluating the model should consider infrastructure requirements, integration with existing tools, and ongoing support. The model’s ability to run on a single 80GB GPU makes it accessible to a broader range of organizations, although its strengths in specific use cases like document and chart understanding may not carry over to general-purpose vision tasks.
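A back-of-the-envelope calculation helps put the single-GPU claim in context. The sketch below estimates the memory needed for the weights alone at a few common numeric precisions. It assumes round parameter counts of 28 billion total and 3 billion active, ignores activations, caches, and framework overhead, and does not reflect any precision Baidu has stated for deployment; the helper name `weight_memory_gb` is made up for this example.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

TOTAL_PARAMS = 28e9   # assumed round figure from the model name
ACTIVE_PARAMS = 3e9   # assumed round figure from the model name
GPU_MEMORY_GB = 80

def weight_memory_gb(num_params, dtype):
    """Weight-only memory estimate in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    needed = weight_memory_gb(TOTAL_PARAMS, dtype)
    verdict = "fits" if needed < GPU_MEMORY_GB else "does not fit"
    print(f"{dtype}: {needed:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")

# Share of parameters that participate in processing a single input.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction: {active_fraction:.1%}")
```

Note that all 28 billion parameters must still be resident in memory even though only a fraction are active per input; sparse activation reduces compute per token, not the storage footprint of the weights.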
Overall, Baidu’s release of ERNIE-4.5-VL-28B-A3B-Thinking marks a notable advance in the AI landscape, with implications for enterprise applications that require visual understanding and reasoning. Its combination of features, efficiency, and open-source licensing makes it a compelling option for organizations seeking capable and cost-effective AI solutions.

