Cost-Efficient AI Model Retraining to Prevent Forgetting: A Breakthrough Discovery by Researchers

Enterprises frequently discover that fine-tuning a large language model (LLM) for a specific purpose means sacrificing some of its general capabilities in exchange for alignment and grounding in their data. After fine-tuning, a model may "forget" how to perform tasks or skills it had previously acquired.
Research conducted by the University of Illinois Urbana-Champaign introduces a novel approach to retraining models that prevents "catastrophic forgetting," in which a model loses prior knowledge. The study concentrates on two vision-language models, LLaVA and Qwen 2.5-VL, which generate text responses from images.
The proposed method lets enterprises selectively retrain specific parts of a model, avoiding full retraining and its steep computational cost. The researchers argue that catastrophic forgetting is not genuine memory loss but rather a consequence of bias drift in the model's outputs.
“Developing a new LMM can be a costly and time-consuming process, accompanied by a substantial carbon footprint. Therefore, identifying more efficient and effective ways to update existing models is imperative,” the team emphasized in their paper. “Guided by this insight, we explore tuning approaches that maintain learning while limiting output deviations.”
The study focused on the model's multi-layer perceptron (MLP) layers, the feed-forward blocks inside each transformer layer.
Catastrophic Forgetting
The researchers initially sought to confirm the presence and cause of catastrophic forgetting in models.
To achieve this, they designed a series of tasks for the models to complete. Following fine-tuning and evaluation, the researchers observed whether significant forgetting occurred. Interestingly, as the process unfolded, the models began to regain some of their capabilities.
“We observed a notable phenomenon where model performance would decrease notably in certain benchmarks after training on a specific task. However, the performance would largely recover on other specialized tasks not well-represented in the benchmarks,” they noted. “During the forgetting mitigation experiments, we also explored the option of tuning only the self-attention projection (SA Proj) or MLP layers, based on the observation that tuning solely the LLM yielded better results than full model tuning. Surprisingly, tuning only the self-attention projection layers facilitated significant learning of the target tasks without compromising performance on held-out tasks, even after sequential training on all five target tasks.”
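In practice, "tuning only the self-attention projection layers" means marking those parameters as trainable and freezing everything else. A minimal sketch of how such a selection might look, assuming Hugging Face-style parameter names (`q_proj`, `k_proj`, `v_proj`, `o_proj`); the names below are illustrative, not taken from the paper's code:

```python
def trainable_sa_proj(param_names):
    """Return the subset of parameter names left trainable when tuning
    only the self-attention projection (SA Proj) layers."""
    targets = ("self_attn.q_proj", "self_attn.k_proj",
               "self_attn.v_proj", "self_attn.o_proj")
    return [n for n in param_names if any(t in n for t in targets)]

# Illustrative parameter names, loosely following Hugging Face conventions.
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "lm_head.weight",
]
print(trainable_sa_proj(names))
```

In a real training loop, every parameter outside this list would have its gradient disabled (e.g. `requires_grad = False` in PyTorch), so only the attention projections are updated.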
The researchers suggested that what appears as forgetting or interference post fine-tuning on a specific task is actually a bias in the output distribution caused by a shift in task distribution.
Narrow Retraining
This discovery proved pivotal in the experiment. The researchers highlighted that tuning the MLP increased the probability of outputting numeric tokens, with a corresponding drop in accuracy on held-out tasks. This suggested that the apparent knowledge loss is transient rather than permanent.
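A bias like this can be probed directly by comparing how often numeric tokens appear in generations before and after fine-tuning. A crude, self-contained sketch of such a probe (the sample outputs are invented for illustration):

```python
def numeric_token_rate(outputs):
    """Fraction of whitespace-separated tokens that are purely numeric,
    a rough probe for a shift toward numeric outputs."""
    tokens = [t for out in outputs for t in out.split()]
    if not tokens:
        return 0.0
    return sum(t.isdigit() for t in tokens) / len(tokens)

# Hypothetical generations before and after fine-tuning on a counting task.
before = ["a red bus near the station", "two people walking"]
after = ["3", "7", "2 people"]
print(numeric_token_rate(before), numeric_token_rate(after))
```

A jump in this rate on tasks that do not call for numbers would signal the output-distribution bias the researchers describe, rather than a loss of underlying knowledge.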
“To prevent bias in the output distribution, we propose tuning the MLP up/gating projections while keeping the down projection static, which resulted in comparable learning to full MLP tuning with minimal forgetting,” the researchers explained.
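The recipe the researchers describe can be expressed as a freeze/tune decision per parameter: the MLP up and gating projections are tuned, while the down projection stays frozen. A minimal sketch, again assuming illustrative Hugging Face-style names (`up_proj`, `gate_proj`, `down_proj`):

```python
def mlp_tuning_mask(param_names):
    """Map each parameter name to True (tune) or False (freeze), tuning
    only the MLP up/gating projections and freezing the down projection."""
    tune = ("mlp.up_proj", "mlp.gate_proj")
    return {n: any(t in n for t in tune) for n in param_names}

# Illustrative parameter names for one transformer layer.
names = [
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.self_attn.q_proj.weight",
]
mask = mlp_tuning_mask(names)
for name, tunable in mask.items():
    print(name, "-> tune" if tunable else "-> freeze")
```

Keeping the down projection static means the block's output mapping back into the residual stream is unchanged, which is one way to read the paper's claim that this limits drift in the output distribution.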
This streamlined approach offers a more efficient and reproducible method for fine-tuning a model. By concentrating on a specific segment of the model rather than retraining it entirely, enterprises can reduce computational costs and better manage output variations.
However, the study focused solely on two models specializing in vision and language tasks. The researchers acknowledged their limitations in experimenting with other models due to resource constraints.
Nevertheless, the findings may extend to other LLMs, particularly those spanning multiple modalities.