Cost-Efficient AI Model Retraining to Prevent Forgetting: A Breakthrough Discovery by Researchers

Enterprises frequently find that fine-tuning a large language model (LLM), a successful strategy for aligning it with its intended purpose and grounding it in their data, comes at a cost to some of its capabilities. After fine-tuning, a model may “forget” how to perform specific tasks or skills it had previously acquired.

Research from the University of Illinois Urbana-Champaign introduces a new approach to retraining models that avoids “catastrophic forgetting,” in which the model loses previously learned knowledge. The study concentrates on two multimodal models, LLaVA and Qwen 2.5-VL, both of which generate responses from images.

The proposed method has enterprises retrain only specific parts of an LLM, avoiding a full retraining run and the sharp increase in compute cost that would come with it. The researchers argue that catastrophic forgetting is not genuine memory loss but a consequence of bias drift in the output distribution.

“Developing a new LMM can be a costly and time-consuming process, accompanied by a substantial carbon footprint. Therefore, identifying more efficient and effective ways to update existing models is imperative,” the team emphasized in their paper. “Guided by this insight, we explore tuning approaches that maintain learning while limiting output deviations.”

Much of the study focuses on the multi-layer perceptron (MLP), the feed-forward block inside each transformer layer of the model.
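
For context, the sketch below shows the gated (SwiGLU-style) feed-forward layout used in LLaMA- and Qwen-family decoders, which distinguishes the up and gate projections from the down projection the researchers later single out. The layer names (gate_proj, up_proj, down_proj) follow the common Hugging Face convention and are an assumption for illustration, not code from the paper.

```python
# A minimal sketch of a gated (SwiGLU-style) transformer MLP block.
# Layer names follow Hugging Face conventions for LLaMA/Qwen-family
# models; this is an illustration, not code from the paper.
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activated gate times the up projection, then the down
        # projection maps the result back to the hidden size.
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))
```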

Catastrophic Forgetting

The researchers initially sought to confirm the presence and cause of catastrophic forgetting in models.

To achieve this, they designed a series of tasks for the models to complete. Following fine-tuning and evaluation, the researchers observed whether significant forgetting occurred. Interestingly, as the process unfolded, the models began to regain some of their capabilities.

“We observed a notable phenomenon where model performance would decrease notably in certain benchmarks after training on a specific task. However, the performance would largely recover on other specialized tasks not well-represented in the benchmarks,” they noted. “During the forgetting mitigation experiments, we also explored the option of tuning only the self-attention projection (SA Proj) or MLP layers, based on the observation that tuning solely the LLM yielded better results than full model tuning. Surprisingly, tuning only the self-attention projection layers facilitated significant learning of the target tasks without compromising performance on held-out tasks, even after sequential training on all five target tasks.”
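
One way to reproduce that kind of selective tuning is to freeze every parameter whose name does not match the target sub-module before building the optimizer. The sketch below assumes Hugging Face LLaMA/Qwen module naming (self_attn.q_proj and so on); it is an illustration of the idea, not the authors' code.

```python
# Illustrative sketch: tune only the self-attention projection (SA Proj)
# matrices by freezing everything else. Name patterns assume Hugging Face
# LLaMA/Qwen module naming, not the authors' released code.
import torch

SA_PROJ_PATTERNS = ("self_attn.q_proj", "self_attn.k_proj",
                    "self_attn.v_proj", "self_attn.o_proj")

def freeze_all_but_sa_proj(model: torch.nn.Module) -> None:
    for name, param in model.named_parameters():
        param.requires_grad = any(pat in name for pat in SA_PROJ_PATTERNS)

# The optimizer then only updates the unfrozen parameters, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```

The same pattern-matching approach covers the other tuning variants the paper compares; only the list of names changes.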

The researchers suggested that what looks like forgetting or interference after fine-tuning on a narrow task is actually a bias in the output distribution caused by the shift in task distribution.

Narrow Retraining

This discovery proved pivotal to the experiment. The researchers found that tuning the full MLP increased the model's likelihood of outputting numeric tokens, with a corresponding drop in accuracy on held-out tasks, indicating that the apparent knowledge loss is transient rather than permanent.
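
One rough way to check for that kind of drift is to compare how much next-token probability mass lands on numeric tokens before and after fine-tuning. The sketch below assumes a standard Hugging Face causal-LM and tokenizer interface; it is an illustrative probe, not the paper's evaluation protocol.

```python
# Illustrative probe: average probability mass assigned to numeric tokens
# on a fixed set of held-out prompts. Compare the value before and after
# fine-tuning to see whether the output distribution has drifted.
# Assumes a Hugging Face causal LM and tokenizer (an assumption here).
import torch

def numeric_token_mass(model, tokenizer, prompts):
    vocab_tokens = tokenizer.convert_ids_to_tokens(list(range(tokenizer.vocab_size)))
    numeric_ids = [i for i, tok in enumerate(vocab_tokens)
                   if any(ch.isdigit() for ch in tok)]
    masses = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]  # next-token logits
        probs = torch.softmax(logits, dim=-1)
        masses.append(probs[numeric_ids].sum().item())
    return sum(masses) / len(masses)
```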

“To prevent bias in the output distribution, we propose tuning the MLP up/gating projections while keeping the down projection static, which resulted in comparable learning to full MLP tuning with minimal forgetting,” the researchers explained.
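
In the same pattern-matching style as the earlier sketch, and again assuming Hugging Face style module names (mlp.gate_proj, mlp.up_proj, mlp.down_proj) rather than the authors' released code, the recipe might look like this:

```python
# Illustrative sketch of the "tune up/gate, freeze down" recipe: only the
# MLP up and gate projections receive gradients, while the down projection
# (and everything else) stays frozen. Module names assume Hugging Face
# LLaMA/Qwen conventions.
import torch

def freeze_all_but_mlp_up_gate(model: torch.nn.Module) -> None:
    for name, param in model.named_parameters():
        param.requires_grad = ("mlp.gate_proj" in name) or ("mlp.up_proj" in name)
```

Keeping the down projection static is what the researchers credit with holding the output-token distribution steady while the up and gate projections still adapt to the new task.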

This streamlined approach offers a more efficient and reproducible method for fine-tuning a model. By concentrating on a specific segment of the model rather than retraining it entirely, enterprises can reduce computational costs and better manage output variations.

However, the study covered only two models, both specializing in vision-and-language tasks. The researchers acknowledged that resource constraints prevented them from experimenting with a broader range of models.

Nevertheless, the findings can be extrapolated to other LLMs, particularly those involving diverse modalities.
