
Google Cloud Unveils Enterprise-Scale AI Training Solution to Compete with CoreWeave and AWS


Enhancing Enterprise Model-Making with Google Cloud’s Vertex AI Training

Many enterprises are considering building their own customized AI models to meet their specific needs, a process that requires access to GPUs. To meet this demand, Google Cloud has introduced a new service called Vertex AI Training. The service provides enterprises with a managed Slurm environment, data science tools, and access to chips capable of large-scale model training.

With Vertex AI Training, Google Cloud aims to attract more enterprises away from other providers and encourage the development of company-specific AI models. While Google Cloud has previously allowed customization of its Gemini models, this new service enables customers to bring in their own models or customize any open-source model hosted by Google Cloud.

Vertex AI Training places Google Cloud in direct competition with companies like CoreWeave and Lambda Labs, as well as cloud competitors such as AWS and Microsoft Azure.

Jaime de Guerre, senior director of product management at Google Cloud, revealed that many organizations are seeking ways to optimize compute resources in a more reliable environment. He emphasized the increasing trend of companies building or customizing large-scale AI models for product offerings or internal processes.

While Vertex AI Training is available to all, Google is targeting companies planning large-scale model training rather than teams doing simple fine-tuning or training LoRA adapters. The service is designed for longer-running training jobs involving hundreds or even thousands of chips, with pricing based on the required compute resources.
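To make that distinction concrete, LoRA fine-tuning trains only small low-rank adapter matrices on top of a frozen base model, which is why it typically fits on a single GPU rather than the thousand-chip clusters Vertex AI Training is built for. The sketch below is a generic illustration using the open-source Hugging Face peft library; the model name and hyperparameters are illustrative assumptions, not anything Google Cloud prescribes.

```python
# Minimal LoRA fine-tuning setup (illustrative, not Vertex AI's API).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any small causal language model works for the illustration.
base = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA adds trainable low-rank adapter matrices to the attention layers
# while the base weights stay frozen, keeping the job small and cheap.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(base, config)

# Typically well under 1% of the base model's parameters are trainable.
model.print_trainable_parameters()
```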

According to de Guerre, Vertex AI Training is focused on training models from scratch rather than adding more information to existing models. It offers a comprehensive solution for enterprises embarking on large-scale model training projects.


The Rise of Customized AI Models

Enterprises are increasingly recognizing the value of building customized AI models beyond simple fine-tuning. Custom models can provide more in-depth insights and tailored responses specific to the organization. Companies like Arcee.ai are offering their models for customization, while others like Adobe allow enterprises to retrain models like Firefly for specific needs.

Organizations such as FICO create industry-specific language models and often invest in GPUs for training. Google Cloud’s Vertex AI Training differentiates itself by offering access to a wide range of chips, monitoring and management services, and expertise gained from training Gemini models.

Early adopters of Vertex AI Training include organizations like AI Singapore and Salesforce's AI research team, showcasing the service's appeal to a diverse range of industries.

While enterprises have the option to fine-tune existing models, building a custom AI model from scratch can be challenging and costly, especially when competing for GPU resources. Hyperscalers like AWS and Microsoft offer massive data centers and high-end chips to support enterprises in model training and deployment.

Google Cloud’s Vertex AI Training goes beyond offering bare compute resources by providing a managed Slurm environment for efficient job scheduling and automatic recovery in case of failures. This ensures higher throughput and more efficient training for large-scale compute clusters.
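Google has not published the internals of its managed Slurm environment, but as a rough picture of what Slurm-scheduled training looks like in practice, the sketch below shows how a PyTorch training script might read the rank and world-size values Slurm assigns to each task and join a distributed process group. The environment variable names are standard Slurm ones; everything else is a hypothetical minimal example, not Vertex AI's actual interface.

```python
# Sketch: initializing distributed PyTorch training from Slurm-provided
# environment variables (assumes the job was launched under Slurm).
import os

import torch
import torch.distributed as dist


def init_from_slurm():
    # Slurm sets a global rank, total task count, and per-node local rank
    # for every task in the job.
    rank = int(os.environ["SLURM_PROCID"])
    world_size = int(os.environ["SLURM_NTASKS"])
    local_rank = int(os.environ["SLURM_LOCALID"])

    # MASTER_ADDR / MASTER_PORT would typically be exported in the batch
    # script (e.g. from the first host in SLURM_JOB_NODELIST).
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return rank, world_size, local_rank


if __name__ == "__main__":
    rank, world_size, local_rank = init_from_slurm()
    if rank == 0:
        print(f"Training across {world_size} processes")
```

In a managed setup like the one described above, the scheduler rather than the user handles queueing, node allocation, and resubmitting the job after a hardware failure, which is where the claimed throughput gains come from.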

Services like Vertex AI Training make it easier for enterprises to build niche models or customize existing models. However, the suitability of this option varies for each enterprise, and careful consideration is necessary before embarking on large-scale model training projects.
