Engineering Solutions: How Ant Group Overcomes Reinforcement Learning Challenges at Trillion Scale

Published October 25, 2025

BTI Team

Ant Group Reveals Ring-1T: A Groundbreaking Open-Source Reasoning Model

Ant Group, an affiliate of Alibaba, recently unveiled Ring-1T. The model has one trillion total parameters, which Ant Group says makes it the first open-source reasoning model at that scale. With Ring-1T, Ant Group aims to compete with established reasoning models such as GPT-5 and Google’s Gemini 2.5, adding to the contest for AI leadership between China and the US.

Ring-1T is specifically designed for mathematical and logical problem-solving, code generation, and scientific applications. Ant Group stated that despite relying solely on natural language reasoning capabilities, Ring-1T achieves state-of-the-art performance across various benchmarks.

Introduced in September, Ring-1T shares the architecture of Ling 2.0 and is built on the Ling-1T-base model. This design lets Ring-1T support a context window of up to 128,000 tokens.

To train a model as massive as Ring-1T, researchers at Ant Group developed innovative methods to scale reinforcement learning (RL) effectively.

Revolutionary Training Methods for Ring-1T

Ant Group developed three techniques to support the training and reinforcement learning of Ring-1T: IcePop, C3PO++, and ASystem.

IcePop filters out noisy gradient updates to stabilize training without slowing inference. It targets a problem that affects RL on mixture-of-experts (MoE) models like Ring-1T: the training engine and the inference engine can compute different probabilities for the same tokens, and that mismatch can destabilize training.
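
To make the idea of discarding noisy updates concrete, here is a minimal sketch in plain Python. It is not Ant Group's implementation. It assumes the probability discrepancy is measured per token as the ratio between the probability from the training engine and the probability from the inference engine, and that tokens whose ratio falls outside a band contribute no gradient. The function names, the thresholds `low` and `high`, and the simplified policy-gradient loss are all hypothetical.

```python
import math


def discrepancy_mask(train_logprobs, infer_logprobs, low=0.5, high=2.0):
    """Return 1.0 for tokens to keep and 0.0 for tokens to drop.

    Assumption: a token is "noisy" when the ratio of its training-engine
    probability to its inference-engine probability leaves [low, high].
    The thresholds are illustrative, not values from Ant Group.
    """
    mask = []
    for lp_train, lp_infer in zip(train_logprobs, infer_logprobs):
        ratio = math.exp(lp_train - lp_infer)
        mask.append(1.0 if low <= ratio <= high else 0.0)
    return mask


def masked_policy_loss(train_logprobs, infer_logprobs, advantages,
                       low=0.5, high=2.0):
    """Simplified policy-gradient loss averaged over the kept tokens only."""
    mask = discrepancy_mask(train_logprobs, infer_logprobs, low, high)
    kept = sum(mask)
    if kept == 0:
        # Every token was masked, so this batch contributes no update.
        return 0.0
    total = -sum(m * adv * lp
                 for m, adv, lp in zip(mask, advantages, train_logprobs))
    return total / kept
```

In this sketch a token on which the two engines roughly agree passes through unchanged, while a token on which they disagree by more than a factor of two in either direction is dropped from the loss entirely.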

C3PO++ is an enhanced version of Ant Group’s earlier C3PO system. It manages how large-parameter models like Ring-1T generate and process training examples, splitting the work into pieces that run in parallel so that GPUs do not sit idle.
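
The scheduling idea can be sketched with a small simulation. This is not the C3PO++ code. It assumes that each training iteration has a fixed token budget, that generation stops once the budget is spent, that finished examples are handed to training, and that unfinished ones resume in the next iteration. The `Rollout` class, `run_iteration`, the `chunk` size, and the use of `target_len` in place of a real model's end-of-sequence signal are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt_id: int
    target_len: int     # stand-in for where a real model would stop
    generated: int = 0  # tokens produced so far

    @property
    def done(self):
        return self.generated >= self.target_len


def run_iteration(pending, token_budget, chunk=4):
    """Generate round-robin in small chunks until the budget is spent.

    Returns (finished, carried_over, tokens_used). Finished rollouts
    would go to the training step; carried-over rollouts keep their
    partial output and continue in the next iteration.
    """
    used = 0
    active = list(pending)
    finished = []
    while active and used < token_budget:
        still_active = []
        for rollout in active:
            if used >= token_budget:
                # Budget exhausted mid-round: carry this rollout over.
                still_active.append(rollout)
                continue
            step = min(chunk,
                       rollout.target_len - rollout.generated,
                       token_budget - used)
            rollout.generated += step
            used += step
            if rollout.done:
                finished.append(rollout)
            else:
                still_active.append(rollout)
        active = still_active
    return finished, active, used
```

Calling `run_iteration` repeatedly, passing each call's carried-over list into the next, shows why a budget helps: one very long example no longer holds up the whole batch, because shorter ones finish and move on to training while the long one continues later.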

ASystem adopts a SingleController+SPMD (single program, multiple data) architecture, which enables asynchronous operations during training.

Performance Evaluation of Ring-1T

Ant Group evaluated Ring-1T on benchmarks covering mathematics, coding, logical reasoning, and general tasks, comparing it against models such as GPT-5 and Gemini 2.5. The company reported strong results across these categories.

Ring-1T scored 93.4% on AIME 25, a benchmark drawn from a mathematics competition, showing its strength in mathematical reasoning. Ant Group also emphasized the model’s robust performance in programming tasks, which it described as a solid foundation for future advancements.

Chinese Innovation in AI Models

Ring-1T is the latest Chinese effort to challenge established AI models such as GPT-5 and Gemini. Chinese companies have been releasing new models at a rapid pace since the debut of DeepSeek earlier this year.

Alibaba, of which Ant Group is an affiliate, recently launched Qwen3-Omni, a multimodal model that handles text, image, audio, and video. DeepSeek also introduced DeepSeek-OCR, which rethinks how models process information.

With Ring-1T and its new methods for training very large models, the competition for AI leadership between China and the US continues to intensify.