Welcome to the repository accompanying our research on Spurious Forgetting in Continual Learning of Language Models, accepted at ICLR 2025. This repository is organized into two main sections, each dedicated to specific experiments and use cases.
This section covers experiments on the synthetic Biography dataset, which provides a controlled continual learning environment.
- Dataset Preparation: Generate the Biography dataset by running the preprocessing script:
./code_for_biography_dataset/physics_of_forgetting/data_preparation/preprocess_0720.py
- Pretraining:
- Train a model on 100K individuals to establish a foundational knowledge base.
- Continual Finetuning:
- Incrementally finetune the model on 20K individuals.
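The pretrain-then-finetune flow above can be illustrated with a toy example (not the repository's training code): optimize a parameter vector on one quadratic "task," then continually finetune it on a second one, and watch performance on the first task degrade — the forgetting effect this repository studies.

```python
import numpy as np

# Toy illustration of sequential finetuning and forgetting.
# The quadratic losses and targets below are hypothetical stand-ins
# for real pretraining / finetuning objectives.
def loss(w, target):
    return float(np.sum((w - target) ** 2))

def finetune(w, target, lr=0.1, steps=50):
    for _ in range(steps):
        w = w - lr * 2 * (w - target)  # gradient step on the quadratic loss
    return w

task_a = np.array([1.0, 0.0])
task_b = np.array([0.0, 1.0])

w = np.zeros(2)
w = finetune(w, task_a)          # "pretraining" stage
loss_a_before = loss(w, task_a)  # near zero after training on task A
w = finetune(w, task_b)          # continual finetuning stage
loss_a_after = loss(w, task_a)   # rises: the parameters drift away from task A

print(loss_a_before, loss_a_after)
```

The gap between `loss_a_before` and `loss_a_after` is the kind of performance drop the recovery experiments below probe.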
- Extended Settings:
- Include more tasks.
- Vary the number of individuals.
- Explore diverse task types.
- Recovery Experiments:
- Investigate the model’s ability to recover performance on previously seen tasks.
- Feature Perspective:
- Analyze residual stream shifts in the visualization directory:
./code_for_biography_dataset/physics_of_forgetting/residual_stream_shift_analysis
- Analyze residual stream shifts in the visualization directory:
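As a rough sketch of what such an analysis measures (the actual scripts live in the directory above; the arrays here are toy stand-ins for cached hidden states), one can compare residual-stream activations for the same prompts under two checkpoints:

```python
import numpy as np

# Illustrative sketch, not the repo's analysis code: quantify how far
# residual-stream activations move during continual finetuning.
rng = np.random.default_rng(0)
h_before = rng.normal(size=(8, 16))  # (tokens, hidden_dim) before finetuning
h_after = h_before + 0.5             # hypothetical uniform shift after finetuning

shift = h_after - h_before
mean_shift_norm = float(np.linalg.norm(shift, axis=-1).mean())

# A large mean shift whose per-token directions agree suggests the
# residual stream moved coherently rather than randomly.
unit_shift = shift / np.linalg.norm(shift, axis=-1, keepdims=True)
direction_consistency = float(np.abs(unit_shift.mean(axis=0)).max())

print(mean_shift_norm, direction_consistency)
```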
This section extends the research to real-world scenarios, integrating methods and datasets that reflect practical continual learning challenges.
This section builds on an existing incremental learning repository. For detailed instructions on dataset preprocessing and usage, refer to the README within this directory:
./code_for_realworld_scenarios/README.md
- Continual Finetuning on Biography Dataset:
- Methods: EWC, LAMOL, Task Vector, Gradient Projection, SEQ, REPLAY, Freeze.
- Safety Alignment:
- Methods: Freeze, SEQ.
- Continual Instruction Tuning:
- Methods: Freeze, SEQ.
- Continual Knowledge Editing:
- Methods: Freeze, SEQ.
- Instance Incremental Learning:
- Methods: Freeze, SEQ.
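Several of the methods listed above constrain how far finetuning can move the weights. EWC, for instance, penalizes changes to parameters that mattered for earlier tasks. A minimal sketch of the EWC penalty term (illustrative numbers, not the repository's implementation):

```python
import numpy as np

# EWC regularizer: penalize deviation from old-task parameters,
# weighted by a Fisher-information estimate of parameter importance.
def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_old) ** 2))

theta_old = np.array([1.0, -2.0])
fisher = np.array([4.0, 0.1])   # parameter 0 mattered more for the old task
theta = np.array([1.5, -1.0])   # candidate parameters after new-task updates

# Moving the "important" parameter is penalized far more than the other;
# this term is added to the new task's loss during training.
print(ewc_penalty(theta, theta_old, fisher))
```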
- Task Vector:
- Explore tradeoffs using:
./code_for_realworld_scenarios/visualization-tradeoff
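The tradeoff explored there comes from task-vector arithmetic: the task vector is the difference between finetuned and pretrained weights, and scaling it trades new-task gains against drift from the pretrained model. A minimal sketch with toy weight vectors:

```python
import numpy as np

# Task-vector arithmetic (illustrative toy weights).
theta_pre = np.array([0.0, 1.0, -1.0])   # pretrained weights
theta_ft = np.array([1.0, 0.5, -1.5])    # finetuned weights

task_vector = theta_ft - theta_pre
alpha = 0.5                               # scaling coefficient: the tradeoff knob
theta_merged = theta_pre + alpha * task_vector

print(theta_merged)  # halfway between pretrained and finetuned weights
```

Sweeping `alpha` from 0 to 1 traces the tradeoff curve between retaining pretrained behavior and adopting the new task.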
- Continual Learning Methods:
- Visualize EWC, LAMOL, and Gradient Projection results:
./code_for_realworld_scenarios/visualization_continual_learning_methods
- Weight Update Perspective:
- Examine orthogonal weight updates:
./code_for_realworld_scenarios/visualization-orthogonal-weight-update
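The quantity behind this view can be sketched as follows (toy vectors, not the repository's analysis code): flatten each task's weight update and measure the cosine of the angle between them.

```python
import numpy as np

# Orthogonality check between two weight updates.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

update_task1 = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical flattened update, task 1
update_task2 = np.array([0.0, 1.0, 0.0, 0.0])  # hypothetical flattened update, task 2

# A cosine near zero means the second task's update barely touches
# the directions changed by the first task.
print(cosine(update_task1, update_task2))
```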
- Loss Landscape Perspective:
- Analyze the model’s loss landscape:
./code_for_realworld_scenarios/visualization-loss-landscape
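A common way to probe a loss landscape, sketched here with a toy quadratic loss rather than the repository's code, is to evaluate the loss along the straight line between two checkpoints; a high barrier between them suggests finetuning left the original basin.

```python
import numpy as np

# Linear interpolation between two checkpoints under a toy loss.
def toy_loss(w):
    return float(np.sum(w ** 2))  # stand-in for a real task loss

w_start = np.array([-1.0, 0.0])   # e.g. the pretrained checkpoint
w_end = np.array([1.0, 0.0])      # e.g. the finetuned checkpoint

alphas = np.linspace(0.0, 1.0, 11)
landscape = [toy_loss((1 - a) * w_start + a * w_end) for a in alphas]

print(landscape)  # for this toy loss, the path dips to 0 at the midpoint
```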
If you find this repository useful, please consider citing our research:
@inproceedings{zheng2025spurious,
  title={Spurious Forgetting in Continual Learning of Language Models},
  author={Junhao Zheng and Xidi Cai and Shengjie Qiu and Qianli Ma},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=ScI7IlKGdI}
}
Help us grow by starring 🌟 this repository on GitHub! 💖
Thank you for your interest in our work. We look forward to your feedback and collaboration! ✨
If you have questions about this repository, please feel free to contact me at junhaozheng47@outlook.com.