🤹 2025-02: We provide a Hugging Face Gradio demo and a self-deployable demo for EgoGPT.
🌟 2025-02: The EgoLife videos are released on Hugging Face and uploaded to YouTube as a video collection.
🌟 2025-02: We release the EgoIT-99K dataset on Hugging Face.
🌟 2025-02: We release the first version of the EgoGPT and EgoRAG codebases.
📖 2025-02: Our arXiv submission is currently on hold. For an overview, please visit our academic page.
🎉 2025-02: The paper is accepted to CVPR 2025. You are warmly invited to visit our online EgoHouse.
EgoGPT is an omni-modal vision-language model fine-tuned on egocentric datasets. It performs continuous video captioning, extracting key events, actions, and context from first-person video and audio streams.
Key Features:
- Dense captioning for visual and auditory events.
- Fine-tuned for egocentric scenarios (optimized for EgoLife data).
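If you just want to try EgoGPT, the hosted Gradio demo can be queried programmatically. Below is a minimal sketch using `gradio_client`; the Space id, endpoint name, and argument order are assumptions, so consult the actual demo page for the real values.

```python
from gradio_client import Client, handle_file

# The Space id and endpoint below are placeholders, not confirmed demo values;
# check the EgoGPT demo page on Hugging Face for the real ones.
client = Client("EgoLife/EgoGPT")  # hypothetical Space id

result = client.predict(
    handle_file("path/to/egocentric_clip.mp4"),  # first-person video with audio
    "Describe the key events, actions, and sounds in this clip.",  # text prompt
    api_name="/predict",  # hypothetical endpoint name
)
print(result)
```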
EgoRAG is a retrieval-augmented generation (RAG) module that enables long-term reasoning and memory reconstruction. It retrieves relevant past events and synthesizes contextualized answers to user queries.
Key Features:
- Hierarchical memory bank (hourly and daily summaries).
- Time-stamped retrieval for context-aware Q&A.
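To make these two ideas concrete, here is an illustrative Python sketch of a hierarchical, time-stamped memory bank. This is not the actual EgoRAG implementation: the entry fields and the simple keyword-overlap ranking are assumptions that stand in for the real retriever.

```python
import re
from dataclasses import dataclass
from datetime import datetime

def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, used for the toy keyword-overlap score."""
    return set(re.findall(r"\w+", text.lower()))

@dataclass
class MemoryEntry:
    start: datetime   # when the summarized span began
    end: datetime     # when it ended
    level: str        # "hour" or "day": the summary granularity
    summary: str      # summary text, e.g. built from EgoGPT captions

class MemoryBank:
    """Toy hierarchical memory bank with time-stamped retrieval."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, query: str, since: datetime, until: datetime,
                 top_k: int = 3) -> list[MemoryEntry]:
        """Rank entries inside [since, until] by keyword overlap with the query."""
        q = _tokens(query)
        in_range = [e for e in self.entries if e.start >= since and e.end <= until]
        return sorted(in_range,
                      key=lambda e: len(q & _tokens(e.summary)),
                      reverse=True)[:top_k]

# Usage: store an hourly summary, then answer a time-anchored question.
bank = MemoryBank()
bank.add(MemoryEntry(datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 10), "hour",
                     "Made breakfast and discussed the party schedule with Alice."))
for hit in bank.retrieve("What did we plan about the party?",
                         datetime(2024, 3, 1), datetime(2024, 3, 2)):
    print(hit.start, "->", hit.summary)
```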
```
EgoLife/
├── assets/      # General assets used across the project
├── EgoGPT/      # Core module for the egocentric omni-modal model
├── EgoRAG/      # Retrieval-augmented generation (RAG) module
└── README.md    # Main documentation for the overall project
```
Please dive into the EgoGPT and EgoRAG subdirectories for more details.
If you use EgoLife in your research, please cite our work:
```bibtex
@misc{yang2025egolifeegocentriclifeassistant,
      title={EgoLife: Towards Egocentric Life Assistant},
      author={Jingkang Yang and Shuai Liu and Hongming Guo and Yuhao Dong and Xiamengwei Zhang and Sicheng Zhang and Pengyun Wang and Zitang Zhou and Binzhu Xie and Ziyue Wang and Bei Ouyang and Zhengyu Lin and Marco Cominelli and Zhongang Cai and Yuanhan Zhang and Peiyuan Zhang and Fangzhou Hong and Joerg Widmer and Francesco Gringoli and Lei Yang and Bo Li and Ziwei Liu},
      year={2025},
      eprint={2503.03803},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.03803},
}
```
This project is licensed under the S-Lab license. See the LICENSE file for details.