Baidu ERNIE: The New Champion of Multimodal AI Benchmarks
Baidu has unveiled its latest ERNIE model, an efficient multimodal AI system that outperforms competitors such as GPT and Gemini on several key benchmarks. The new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to analyze the enterprise data that text-focused AI models typically overlook, such as engineering schematics, factory video feeds, medical scans, and logistics dashboards.
One of the most notable features of Baidu’s ERNIE model is its lightweight architecture, which activates only three billion parameters during operation. This approach targets the high inference costs that often keep AI projects from scaling. By focusing on efficiency, Baidu is paving the way for the adoption of multimodal agents that can not only perceive but also reason and act.
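As a rough illustration of what that sparsity means, the sketch below computes the share of parameters used per token. It assumes the "28B" in the model name is the total parameter count and "A3B" the roughly three billion active parameters; the 2-FLOPs-per-parameter estimate is a common rule of thumb rather than a figure from Baidu, and the function names are hypothetical.

```python
# Assumption: figures read off the model name ERNIE-4.5-VL-28B-A3B-Thinking.
TOTAL_PARAMS_B = 28.0   # total parameters, in billions (assumed from "28B")
ACTIVE_PARAMS_B = 3.0   # parameters activated per token, in billions ("A3B")


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


def approx_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2.0 * active_params_b * 1e9


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {share:.1%}")
    print(f"Approx. FLOPs per token: {approx_flops_per_token(ACTIVE_PARAMS_B):.1e}")
```

Under these assumptions, only about a tenth of the model does work on any given token, which is where the inference savings would come from.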
The ERNIE AI model excels at handling dense, non-text data, such as complex charts and technical diagrams. It surpasses competitors like GPT-5-High and Gemini 2.5 Pro on benchmarks including MathVista, ChartQA, and VLMs Are Blind.
Baidu’s ERNIE model is shifting the focus from perception to automation by integrating visual grounding with tool use. This enables the AI to perform tasks like identifying specific objects in images and extracting structured data, which can be applied to various real-world scenarios, from visual inspection on production lines to code analysis in data centers.
Baidu’s ERNIE AI model also targets corporate video archives, extracting subtitles and analyzing visual cues to make vast video libraries searchable. The end goal is to let employees quickly find specific information buried in hours of video content.
Despite its impressive capabilities, deploying Baidu’s ERNIE AI model requires substantial hardware resources, with a single-card deployment needing 80GB of GPU memory. However, Baidu provides deployment guidance and tools like ERNIEKit for fine-tuning on proprietary data, making it suitable for organizations with existing high-performance AI infrastructure.
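A back-of-the-envelope check helps explain the 80GB figure. The sketch below estimates memory for the weights alone, assuming 28 billion total parameters (read off the model name) and 16-bit weights; both are assumptions, not published Baidu numbers, and the helper names are hypothetical. The estimate ignores the KV cache, activations, and framework overhead, which consume part of the remaining headroom.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Memory for the weights alone, in GB (10^9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_card(params_billion: float, card_gb: float = 80.0,
                 dtype: str = "bf16") -> bool:
    """True if the weights alone fit within the card's memory."""
    return weight_memory_gb(params_billion, dtype) <= card_gb


if __name__ == "__main__":
    total_b = 28.0  # assumed total parameter count, in billions
    for dtype in ("fp32", "bf16", "int8"):
        gb = weight_memory_gb(total_b, dtype)
        print(f"{dtype}: {gb:.0f} GB, fits on 80 GB card: {fits_on_card(total_b, dtype=dtype)}")
```

Although only three billion parameters are active per token, all of the weights generally have to be resident in memory, so the hardware requirement follows the total size rather than the active size.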
The market is moving toward multimodal AI that can see, read, and act within a business context, and Baidu’s ERNIE model is among those leading the way. Organizations looking to use this technology should identify high-value visual reasoning tasks within their operations and weigh the hardware and governance costs involved.