This repository collects works that demonstrate the capabilities of LMMs in video understanding, including:
- True Video Understanding: how well current Video-LMMs understand real-world videos.
- AI-generated Video Understanding: how well current Video-LMMs understand AI-generated videos.
- PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos (2024-12-02)
Note: video game understanding
- LLaVA-OneVision: Easy Visual Task Transfer (2024-10-26)
Note: general video understanding
- Video Instruction Tuning With Synthetic Data (2024-10-04)
Note: general video understanding, surprising performance!
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution (2024-10-03)
Note: general video understanding
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation (2024-10-07)
Note: introduces physics understanding for generated videos
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation (2024-06-21)
Note: alignment with human perspective, score output only