Adapting AI Infrastructure: Addressing the Rising Costs of Inference
AI Infrastructure Challenges in Asia Pacific
Adoption of artificial intelligence (AI) in Asia Pacific is rising, but many companies are struggling to extract value from their AI projects. Much of the problem lies in the infrastructure supporting AI systems: most existing setups were not designed to handle real-time inference at the speed and scale that practical applications require. Industry research indicates that a significant share of AI projects miss their return on investment (ROI) targets despite substantial spending on cutting-edge AI tools.
This gap underscores the critical role that AI infrastructure plays in determining performance, cost efficiency, and the scalability of real-world AI deployments in the region.
To tackle this challenge, Akamai has introduced Inference Cloud in collaboration with NVIDIA, leveraging the latest Blackwell GPUs. The fundamental concept behind this initiative is to ensure that AI applications can make real-time decisions by processing data closer to end-users rather than in remote data centers. Akamai argues that this shift can help organizations control costs, minimize delays, and support AI services that rely on instantaneous responses.
Jay Jenkins, the Chief Technology Officer (CTO) of Cloud Computing at Akamai, elaborated to AI News on why enterprises are compelled to rethink their AI deployment strategies in light of the current landscape, emphasizing that inference, rather than training, has emerged as the primary bottleneck.
The Significance of Robust AI Infrastructure
Jenkins emphasizes that the disparity between experimental AI projects and full-scale deployment is wider than most organizations anticipate. He notes, “Many AI initiatives fall short of delivering the anticipated business value because enterprises often underestimate the gap between experimentation and production.” Despite enthusiasm for cutting-edge AI technologies, hefty infrastructure costs, high latency, and the complexity of running models at scale frequently impede progress.
Many companies still rely on centralized cloud platforms and substantial GPU clusters. However, as the demand for AI grows, these setups become financially unsustainable, particularly in regions distant from major cloud hubs. Latency emerges as a critical concern when models require successive inference steps over long distances. According to Jenkins, the efficacy of AI is intrinsically linked to the infrastructure and architecture on which it operates. Latency issues not only compromise user experience but also undermine the intended business value. He also highlights the challenges posed by multi-cloud environments, intricate data regulations, and escalating compliance requirements, which often impede the transition from pilot projects to full-fledged production.
The Shift towards Prioritizing Inference over Training
Across Asia Pacific, the adoption of AI is transitioning from small-scale pilots to practical implementations in applications and services. Jenkins observes that as this evolution unfolds, day-to-day inference, rather than sporadic training cycles, consumes the bulk of computational resources. With numerous organizations deploying language, vision, and multimodal models across multiple markets, the need for rapid and reliable inference is escalating beyond expectations. Consequently, inference has emerged as the primary constraint in the region, as models must function in diverse languages, regulatory frameworks, and data environments, often in real-time scenarios. This places immense strain on centralized systems that were not originally designed to accommodate such responsiveness.
Enhancing AI Performance and Cost Efficiency through Edge Infrastructure
Jenkins contends that relocating inference closer to users, devices, or agents can redefine the cost dynamics. This approach reduces data travel distances, enabling models to respond swiftly while circumventing the expense associated with routing large data volumes between major cloud centers.
Physical AI systems, such as robots, autonomous machines, and smart city technologies, rely on instantaneous decision-making. When inference runs in a distant data center, the added round-trip latency prevents these systems from operating as intended.
The financial benefits of localized deployments are significant. Akamai’s analysis reveals that enterprises in India and Vietnam experience substantial cost reductions in running image-generation models when workloads are decentralized to the edge, as opposed to centralized cloud architectures. Improved GPU utilization and diminished egress fees play a pivotal role in achieving these savings.
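To make the cost argument concrete, here is a minimal back-of-envelope sketch comparing per-request serving cost for a centralized versus an edge deployment. Every figure (egress price, GPU hourly rate, utilization, payload size) is an illustrative assumption, not a number from Akamai’s analysis.

```python
# Illustrative back-of-envelope comparison of per-request inference cost.
# Every figure here is an assumption for demonstration, not vendor data.

def cost_per_request(payload_gb, egress_per_gb, gpu_hourly, gpu_seconds, utilization):
    """Rough per-request cost: data egress plus the share of GPU time consumed."""
    egress = payload_gb * egress_per_gb
    # At low utilization, each request effectively pays for idle GPU time too.
    compute = (gpu_hourly / 3600) * gpu_seconds / utilization
    return egress + compute

# Hypothetical centralized deployment: cross-region egress, modest utilization.
central = cost_per_request(payload_gb=0.005, egress_per_gb=0.09,
                           gpu_hourly=2.50, gpu_seconds=1.2, utilization=0.35)

# Hypothetical edge deployment: little or no egress, better local utilization.
edge = cost_per_request(payload_gb=0.005, egress_per_gb=0.01,
                        gpu_hourly=2.50, gpu_seconds=1.2, utilization=0.60)

print(f"centralized: ${central:.5f}/request, edge: ${edge:.5f}/request")
```

Under these assumed inputs, the edge case is cheaper because the two levers named above, egress fees and GPU utilization, both move in its favor; the exact savings depend entirely on real traffic and pricing.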
Key Industries Embracing Edge-Based AI
The demand for edge inference is particularly pronounced in industries where even minor delays can impact revenue, safety, or user engagement. Retail and e-commerce sectors are among the early adopters, given that customers tend to abandon slow platforms. Personalized recommendations, search functionalities, and multimodal shopping tools perform notably better when inference is local and rapid.
Finance represents another sector where latency directly influences value. Workloads such as fraud detection, payment authorization, and transaction scoring hinge on a sequence of AI decisions that must occur within milliseconds. By conducting inference in proximity to data generation points, financial institutions can expedite processes and uphold regulatory compliance.
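As an illustration of why round-trip time matters for such chained decisions, the sketch below shows a latency-budgeted scoring pipeline. The stage names, thresholds, and budget are hypothetical, and a real system would call deployed models rather than stub functions.

```python
# Minimal sketch of a latency-budgeted fraud-scoring chain.
# Stage names, thresholds, and timings are hypothetical placeholders.
import time

def within_budget(deadline):
    return time.monotonic() < deadline

def score_transaction(txn, budget_ms=50):
    deadline = time.monotonic() + budget_ms / 1000
    stages = [device_check, velocity_check, model_score]  # sequential AI decisions
    risk = 0.0
    for stage in stages:
        if not within_budget(deadline):
            # Over budget: fail safe (e.g., route to manual review) rather than guess.
            return {"decision": "review", "risk": risk, "reason": "latency budget exceeded"}
        risk = max(risk, stage(txn))
    return {"decision": "deny" if risk > 0.8 else "approve", "risk": risk}

# Stub stages standing in for real model calls.
def device_check(txn):   return 0.1
def velocity_check(txn): return 0.2
def model_score(txn):    return 0.35

print(score_transaction({"amount": 120.0, "currency": "SGD"}))
```

The point of the sketch is that every stage spends part of a fixed millisecond budget; the further each inference call has to travel, the less of that budget remains for the models themselves.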
The Growing Relevance of Cloud and GPU Partnerships
As AI workloads expand, organizations require infrastructure that can keep pace with evolving demands. Jenkins underscores the increasing collaboration between cloud providers and GPU manufacturers to address this need. The partnership between Akamai and NVIDIA exemplifies this trend, with GPUs, DPUs, and AI software deployed across numerous edge locations.
The objective is to establish an “AI delivery network” that disperses inference tasks across multiple sites instead of consolidating everything in a few regions. This approach enhances performance while ensuring compliance with data regulations. Jenkins highlights that nearly half of the major organizations in Asia Pacific encounter challenges due to varying data regulations across markets, underscoring the importance of local processing. Emerging partnerships are shaping the future of AI infrastructure in the region, particularly for workloads reliant on low-latency responses.
Security features are inherently integrated into these systems, Jenkins asserts. Zero-trust controls, data-sensitive routing, and safeguards against fraud and malicious bot activities are becoming standard components of the available technology stacks.
Essential Infrastructure for Supporting Agentic AI and Automation
Executing agentic systems, which entail a series of sequential decisions, necessitates infrastructure capable of operating at millisecond speeds. Jenkins acknowledges that the region’s diversity presents challenges, but not insurmountable ones. Given the variations in connectivity, regulations, and technical capabilities across different countries, AI workloads must possess the flexibility to operate where it makes the most sense. Research indicates that a majority of enterprises in the region are already leveraging public cloud services for production, but many anticipate a shift towards edge services by 2027. This transition demands infrastructure capable of storing data within national boundaries, directing tasks to the nearest feasible location, and sustaining operations during network instabilities.
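A minimal sketch of what such routing logic might look like follows, assuming a hypothetical catalogue of edge sites with country, latency, and health attributes. The selection policy shown (residency first, then latency, with a degraded fallback when sites are unhealthy) simply mirrors the requirements described above and is not any vendor’s implementation.

```python
# Sketch of residency-aware, latency-first routing with a health fallback.
# Site data and the selection policy are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class EdgeSite:
    name: str
    country: str
    latency_ms: float
    healthy: bool

def pick_site(sites, user_country, residency_required=True):
    # Keep only sites allowed to process this user's data.
    eligible = [s for s in sites if not residency_required or s.country == user_country]
    healthy = [s for s in eligible if s.healthy]
    if healthy:
        return min(healthy, key=lambda s: s.latency_ms)   # nearest healthy site
    if eligible:
        return min(eligible, key=lambda s: s.latency_ms)  # degraded, but still compliant
    return None  # no compliant site: queue or reject rather than break residency

sites = [
    EdgeSite("jakarta-1", "ID", 12.0, True),
    EdgeSite("singapore-1", "SG", 28.0, True),
    EdgeSite("jakarta-2", "ID", 15.0, False),
]
print(pick_site(sites, user_country="ID"))
```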
Anticipating the Future of AI Operations
As inference migrates to the edge, organizations must adapt to new operational paradigms. Jenkins suggests that companies should prepare for a more decentralized AI lifecycle, wherein models are updated across multiple sites. This necessitates enhanced orchestration and robust visibility into performance, cost efficiency, and error monitoring across core and edge systems.
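One way to picture that orchestration burden is a staged rollout that promotes a model version site by site only while error rates stay acceptable. Everything in the sketch below (site names, thresholds, the metrics source) is a hypothetical assumption rather than a description of any particular platform.

```python
# Hypothetical staged rollout of a model version across edge sites.
# Sites, thresholds, and the metrics source are illustrative assumptions.

def rollout(model_version, sites, get_error_rate, max_error_rate=0.02):
    deployed, halted = [], []
    for site in sites:
        deploy_to(site, model_version)                     # placeholder for a real deploy call
        if get_error_rate(site, model_version) > max_error_rate:
            rollback(site)                                 # stop promoting on regression
            halted.append(site)
            break
        deployed.append(site)
    return {"deployed": deployed, "halted": halted}

# Stubs standing in for real orchestration and monitoring hooks.
def deploy_to(site, version):       print(f"deploying {version} to {site}")
def rollback(site):                 print(f"rolling back {site}")
def fake_error_rate(site, version): return 0.01

print(rollout("v2.3", ["osaka-edge", "mumbai-edge", "sydney-edge"], fake_error_rate))
```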
Data governance becomes more intricate yet manageable with localized processing. Given that half of the region’s leading enterprises grapple with regulatory discrepancies, deploying inference closer to data sources can offer tangible benefits.
Security considerations demand heightened attention. While distributing inference to the edge enhances resilience, it also requires securing each site. Firms must safeguard APIs and data pipelines and fortify defenses against fraud and bot intrusions. Jenkins notes that many financial institutions already rely on Akamai’s protective measures in these domains.