
Advancing Reinforcement Learning: Olmo 3.1 Enhances Reasoning Benchmark Performance

Published December 12, 2025

Ai2's new Olmo 3.1 extends reinforcement learning training to boost reasoning benchmark scores

The Allen Institute for AI (Ai2) has released Olmo 3.1, an update to Olmo 3, its latest and most powerful model family. Like Olmo 3, the new release focuses on efficiency, transparency, and control for enterprises. Olmo 3.1 includes two updated models, Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B, optimized for advanced research and instruction following, respectively.

The Olmo 3 family also includes a third variant, Olmo 3-Base, tailored for programming, comprehension, and math tasks. To improve Olmo 3.1 Think 32B, Ai2 extended its reinforcement learning (RL) training run, yielding significant gains on benchmarks such as AIME, ZebraLogic, IFEval, and IFBench.

Olmo 3.1 Instruct 32B is designed for chat, tool use, and multi-turn dialogue. According to Ai2, it outperforms its predecessors and is ready for real-world applications. Both new checkpoints are available now on the Ai2 Playground and Hugging Face, with API access expected soon.

Ai2 reports that the Olmo 3.1 models outperform comparable open models such as Qwen 3 32B and Gemma 27B on benchmark tests. Olmo 3.1 Instruct also outperformed Gemma 3 on math benchmarks, a notable result for a model built primarily for chat and dialogue.

Beyond the 32B releases, Ai2 has improved its RL-Zero 7B models for math and coding through longer, more stable training runs. The company remains committed to transparency and open source: organizations can adjust the model's training data mix and retrain it on additional data.

Ai2's commitment to transparency also shows in tools like OlmoTrace, which traces model outputs back to matching text in the training data. By pairing performance improvements with openness, Ai2 continues to advance model capabilities while keeping its data, code, and training decisions visible.
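To make the idea of tracing outputs back to training data concrete, here is a toy sketch of verbatim span matching: it finds stretches of a model's output that appear word for word in a small corpus. This is not OlmoTrace's implementation. The function name `find_verbatim_spans`, the whitespace tokenization, and the `min_len` threshold are all illustrative assumptions.

```python
def contains(seq: list[str], sub: list[str]) -> bool:
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[k:k + n] == sub for k in range(len(seq) - n + 1))


def find_verbatim_spans(output: str, corpus: list[str], min_len: int = 4) -> list[tuple[str, int]]:
    """Hypothetical helper (toy sketch, not OlmoTrace's code).

    Returns maximal spans of at least `min_len` whitespace-separated tokens
    from `output` that appear verbatim in some corpus document, each paired
    with the index of the first document containing it.
    """
    out = output.lower().split()
    docs = [d.lower().split() for d in corpus]
    spans: list[tuple[str, int]] = []
    covered_until = 0  # index just past the end of the furthest reported span

    for i in range(len(out)):
        best_end, best_doc = None, None
        end = i + min_len
        # Grow the span to the right while it still matches some document.
        while end <= len(out):
            hit = next((d for d, toks in enumerate(docs) if contains(toks, out[i:end])), None)
            if hit is None:
                break
            best_end, best_doc = end, hit
            end += 1
        # Skip spans fully contained in one already reported.
        if best_end is not None and best_end > covered_until:
            spans.append((" ".join(out[i:best_end]), best_doc))
            covered_until = best_end
    return spans


corpus = [
    "the quick brown fox jumps over the lazy dog",
    "a completely unrelated sentence about math",
]
print(find_verbatim_spans("I saw the quick brown fox jumps high today", corpus))
```

Running it prints `[('the quick brown fox jumps', 0)]`. The brute-force scan is only practical for a tiny corpus. A tool working over a full pretraining dataset would need an index structure, such as a suffix array, to find matches quickly.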

Overall, Ai2's Olmo 3.1 models mark a significant step forward, offering enterprises and research labs alike greater efficiency, transparency, and control. Their combination of stronger performance and openness positions them to drive further innovation in AI.