This repository explores state-of-the-art techniques for extending the context windows of large language models (LLMs). By leveraging architectures such as the Recurrent Memory Transformer and drawing insights from Google's Infini-Attention, we aim to deepen our understanding of how extended context windows affect model performance, especially in terms of metrics like context recall, faithfulness, and overall responsiveness.
- **Investigate Extended Contexts:** Analyze how large context windows influence LLM performance in retaining, processing, and leveraging extensive inputs.
- **Recurrent Memory Transformer (RMT):** Explore the RMT architecture, which employs custom memory cells (`MemoryCell` and `RecurrentWrapper`) to propagate context across segmented inputs; a simplified sketch follows this list.
- **Infini-Attention Insights:** Apply concepts from Google's Infini-Attention to further enhance context management in LLM systems.
- **Comprehensive Evaluation:** Assess system performance using a variety of metrics to ensure robust and reliable deployment in production environments.
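The repository's actual `MemoryCell` and `RecurrentWrapper` implementations are documented in the accompanying report; the snippet below is only a minimal sketch of the mechanism they name, and every other class detail, parameter name, and size here is an illustrative assumption. The idea: memory embeddings are prepended to each segment, and the updated memory read from the output is carried forward to the next segment.

```python
import torch
import torch.nn as nn


class MemoryCell(nn.Module):
    """Simplified sketch: prepend memory embeddings to a segment and run a backbone."""

    def __init__(self, backbone, hidden_size, num_mem_tokens=16):
        super().__init__()
        self.backbone = backbone  # any module mapping [batch, seq, hidden] -> [batch, seq, hidden]
        self.num_mem_tokens = num_mem_tokens
        self.memory_init = nn.Parameter(torch.randn(num_mem_tokens, hidden_size) * 0.02)

    def forward(self, segment_embeds, memory=None):
        batch = segment_embeds.size(0)
        if memory is None:  # first segment starts from the learned initial memory
            memory = self.memory_init.unsqueeze(0).expand(batch, -1, -1)
        hidden = self.backbone(torch.cat([memory, segment_embeds], dim=1))
        new_memory = hidden[:, : self.num_mem_tokens]   # updated memory read from the output
        outputs = hidden[:, self.num_mem_tokens :]      # hidden states for the segment itself
        return outputs, new_memory


class RecurrentWrapper(nn.Module):
    """Simplified sketch: split a long input into segments and carry memory across them."""

    def __init__(self, cell, segment_length=512):
        super().__init__()
        self.cell = cell
        self.segment_length = segment_length

    def forward(self, input_embeds):
        memory, outputs = None, []
        for segment in torch.split(input_embeds, self.segment_length, dim=1):
            segment_out, memory = self.cell(segment, memory)
            outputs.append(segment_out)
        return torch.cat(outputs, dim=1)


# Toy usage: a small Transformer encoder processes a 2048-token input 512 tokens at a time.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
wrapper = RecurrentWrapper(MemoryCell(backbone, hidden_size=64), segment_length=512)
out = wrapper(torch.randn(2, 2048, 64))  # shape: [2, 2048, 64]
```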
Before launching your LLM system into production, it is essential to evaluate it against a suite of critical metrics (a sketch of one such check follows the list):
- Answer Relevancy
- Prompt Alignment
- Correctness
- Hallucination
- Contextual Relevancy
- Responsible Metrics
- Task-Specific Metrics
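As a hedged illustration of one item on this list (not the evaluation harness used in this project), answer relevancy can be roughly approximated by comparing embeddings of the question and the generated answer. The embedding model, pooling scheme, and example texts below are assumptions chosen for the sketch.

```python
import torch
from transformers import AutoModel, AutoTokenizer


def embed(texts, model, tokenizer):
    """Mean-pool the last hidden states into one vector per input text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # [batch, seq, dim]
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)


# Model choice is an illustrative assumption, not a project setting.
name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

question = "What is the Recurrent Memory Transformer?"
answer = "It extends context by passing memory tokens between input segments."
q_vec, a_vec = embed([question, answer], model, tokenizer)
relevancy = torch.nn.functional.cosine_similarity(q_vec.unsqueeze(0), a_vec.unsqueeze(0)).item()
print(f"answer relevancy ~ {relevancy:.2f}")  # crude proxy; dedicated eval tools go further
```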
- **Innovative Architecture:** Combines the RMT approach with efficient low-rank adaptation (LoRA) to optimize performance while extending context windows; a sketch of the LoRA idea follows this list.
- **Robust Experimentation:** Utilizes the Wikitext-2 dataset for extensive experiments, evaluating performance across various memory configurations.
- **Comprehensive Analysis:** Detailed insights and experimental findings are documented in the accompanying report (`NLP_Final_Report.pdf`).
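The exact LoRA configuration used in the experiments is described in the report; the following is only a minimal sketch of the underlying idea, not the repository's implementation: a frozen linear layer is augmented with a trainable low-rank update, so only a small fraction of parameters are trained while the context-extension machinery stays unchanged.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (B @ A)."""

    def __init__(self, base, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # only the low-rank factors are trained
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


# Example: wrap a 768-dim projection; only ~2 * rank * 768 parameters are trainable.
layer = LoRALinear(nn.Linear(768, 768), rank=8, alpha=16)
out = layer(torch.randn(2, 10, 768))  # shape: [2, 10, 768]
```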
- Clone the repository: `git clone https://github.com/Vishwa44/nlp_rmt.git`
- Install dependencies: ensure you have Python 3.7+ installed, then run `pip install numpy torch tqdm datasets wandb transformers matplotlib`
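After installing, a quick sanity check is to load Wikitext-2 with the `datasets` library and cut the tokenized text into fixed-length segments of the kind the RMT consumes. This snippet is not one of the repository's scripts; the tokenizer choice, segment length, and slice size are arbitrary choices for illustration.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the Wikitext-2 corpus used in the experiments.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Concatenate a few articles and split them into fixed-length segments.
segment_length = 512
text = "\n\n".join(t for t in dataset["text"][:200] if t.strip())
ids = tokenizer(text)["input_ids"]
segments = [ids[i : i + segment_length] for i in range(0, len(ids), segment_length)]
print(f"{len(ids)} tokens -> {len(segments)} segments of up to {segment_length} tokens")
```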
This project is licensed under the MIT License. See the LICENSE file for details.