NLPR, CASIA
- Shanghai, China
- https://orcid.org/0009-0006-1577-9223
- in/haochen-tian-7b7a06288
Stars
Wan: Open and Advanced Large-Scale Video Generative Models
Official repository of the paper "OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference"
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
The Next Step Forward in Multimodal LLM Alignment
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
PLUTO: Push the Limit of Imitation Learning-based Planning for Autonomous Driving
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
World's First Large-scale High-quality Robotic Manipulation Benchmark
[RSS 2024] Learning Manipulation by Predicting Interaction
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
[CVPR 2025] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[NeurIPS 2024] Behavioral Topology (BeTop), a multi-agent behavior formulation for interactive motion prediction and planning
Align Anything: Training All-modality Model with Feedback
[NeurIPS 2024] NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
[NeurIPS 2024] CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Enhancing End-to-End Autonomous Driving with Latent World Model
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Can 3D Vision-Language Models Truly Understand Natural Language?
[ECCV 2024] Embodied Understanding of Driving Scenarios
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Code and documentation to train Stanford's Alpaca models, and generate the data.