Autonomous Driving Papers 🚗

This repository contains papers on autonomous driving.

📖 ICLR 2025

I may have missed some works, as I only skimmed the titles 🙏

✨ Perception

  • Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion

    • OpenReview
    • Keywords: sensor fusion, uncertainty quantification
    • Datasets: nuScenes
  • MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations

    • OpenReview
    • Keywords: BEV, state space model, causal attention
    • Datasets: nuScenes
  • MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction

    • OpenReview
    • Keywords: online HD Map construction, lane detection, multi-granularity representation
    • Datasets: nuScenes, Argoverse2
  • MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

    • OpenReview
    • Keywords: test-time adaptation
    • Datasets: KITTI, Waymo, nuScenes, KITTI-C
  • Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function

    • OpenReview
    • Keywords: uncertainty quantification, BEV segmentation
    • Datasets: CARLA, nuScenes, Lyft, nuScenes-C
  • RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

    • OpenReview
    • Keywords: radar-camera 3D object detection, robust detection
    • Datasets:
  • TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident

    • OpenReview
    • Keywords: traffic accident detection, MLLM
    • Datasets: TAU-106K (proposed)
  • Uni^2Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

    • OpenReview
    • Keywords: multi-dataset training
    • Datasets: KITTI, Waymo, nuScenes
  • UniDrive: Towards Universal Driving Perception Across Camera Configurations

    • OpenReview / Code
    • Keywords: cross domain perception, sensor configuration
    • Datasets: CARLA

✨ Prediction

  • Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

    • OpenReview
    • Keywords: ego-trajectory prediction, gaze, driver FOV
    • Datasets: GEM (proposed; ego-motion dataset with driver positions and perspective)
  • Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving

    • OpenReview
    • Keywords: vehicle trajectory generator, language interface
    • Datasets: L2T (proposed), WOMD
  • OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework

    • OpenReview
    • Keywords: camera-only occupancy forecasting
    • Datasets: nuScenes, nuScenes-Occupancy, Lyft-Level5
  • Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

    • OpenReview
    • Keywords: occupancy prediction, occupancy forecasting, semi-supervised

✨ Planning

  • AdaWM: Adaptive World Model based Planning for Autonomous Driving

    • OpenReview
    • Keywords: reinforcement learning, world model, adaptive finetuning
    • Datasets: Bench2Drive, CARLA
  • Diffusion-Based Planning for Autonomous Driving with Flexible Guidance

    • OpenReview / Code
    • Keywords: diffusion model, flexible guidance
    • Datasets: nuPlan, Delivery-vehicle Dataset (proposed)

✨ Generation/Reconstruction

  • DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes

    • OpenReview / Code
    • Keywords: 4D occupancy generation, HexPlane, DiT, conditional generation
    • Datasets: Occ3D-Waymo, Occ3D-nuScenes, CarlaSC
  • FreeVS: Generative View Synthesis on Free Driving Trajectory

    • OpenReview / Code
    • Keywords: novel view synthesis, generative model, novel trajectory
    • Datasets: WOD
  • Glad: A Streaming Scene Generator for Autonomous Driving

    • OpenReview
    • Keywords: streaming video generation, diffusion models
    • Datasets: nuScenes
  • GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

    • OpenReview / Code
    • Keywords: simulation, benchmark, multi-agent reinforcement learning, planning
  • GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting

    • OpenReview / Code
    • Keywords: novel view synthesis, LiDAR simulation, gaussian splatting
    • Datasets: KITTI-360, nuScenes
  • OmniRe: Omni Urban Scene Reconstruction

    • OpenReview / Project page
    • Keywords: dynamic scene modeling, human modeling, gaussian splatting
    • Datasets: WOD, nuScenes, Argoverse2, PandaSet, KITTI, nuPlan
  • X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios

    • OpenReview / Code
    • Keywords: multimodal generation, diffusion models
    • Datasets: nuScenes

✨ End-to-End Driving

  • DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving

    • OpenReview
    • Keywords: task parallelism, sparse representation, streaming processing
    • Datasets: Bench2Drive, nuScenes
  • Enhancing End-to-End Autonomous Driving with Latent World Model

    • OpenReview / Code
    • Keywords: world model, self-supervised future latent prediction
    • Datasets: nuScenes, NAVSIM, CARLA
  • Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

    • OpenReview / Code
    • Keywords: (navigation-guided) sparse scene representation, self-supervised future feature prediction
    • Datasets: nuScenes, CARLA

✨ Collaborative Driving

  • Learning 3D Perception from Others' Predictions

    • OpenReview / Code
    • Keywords: label-efficient learning, domain adaptation, curriculum learning
    • Datasets: V2V4Real, OPV2V
  • Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception

    • OpenReview
    • Keywords: communication efficiency, sparse detectors
    • Datasets: V2X-Set, OPV2V, DAIR-V2X
  • STAMP: Scalable Task- And Model-agnostic Collaborative Perception

    • OpenReview
    • Keywords: heterogeneous collaborative perception
    • Datasets: OPV2V, V2V4Real

General (not yet categorized)

  • 3D StreetUnveiler with Semantic-aware 2DGS - a simple baseline [OpenReview]

  • Adversarial Generative Flow Network for Solving Vehicle Routing Problems [OpenReview]

  • Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems [OpenReview]

  • CoMotion: Concurrent Multi-person 3D Motion [OpenReview]

  • CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs [OpenReview]

  • Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels [OpenReview]

  • LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models [OpenReview]

  • Rethinking Light Decoder-based Solvers for Vehicle Routing Problems [OpenReview]

  • Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking [OpenReview]

  • SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation [OpenReview]

(Potentially) Related

  • 3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds [OpenReview]

  • 4K4DGen: Panoramic 4D Generation at 4K Resolution [OpenReview]

  • CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes [OpenReview]

  • Depth Any Video with Scalable Synthetic Data [OpenReview]

  • Depth Pro: Sharp Monocular Metric Depth in Less Than a Second [OpenReview]

  • EmbodiedSAM: Online Segment Any 3D Thing in Real Time [OpenReview]

  • Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection [OpenReview]

  • GOPS: Learning Generative Object Priors for Unsupervised 3D Instance Segmentation [OpenReview]

  • Interactive Adjustment for Human Trajectory Prediction with Individual Feedback [OpenReview]

  • MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility [OpenReview]

  • MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [OpenReview]

  • MTSAM: Multi-Task Fine-Tuning for Segment Anything Model [OpenReview]

  • Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [OpenReview]

  • OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer [OpenReview]

  • PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection [OpenReview]

  • Point-SAM: Promptable 3D Segmentation Model for Point Clouds [OpenReview]

  • RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything [OpenReview]

  • SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement [OpenReview]

  • SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [OpenReview]

  • Segment Any 3D Object with Language [OpenReview]

  • Stable Segment Anything Model [OpenReview]

  • State Space Model Meets Transformer: A New Paradigm for 3D Object Detection [OpenReview]

  • TAPE3D: Tracking All Pixels Efficiently in 3D [OpenReview]

  • Track-On: Transformer-based Online Point Tracking with Memory [OpenReview]

  • TSC-Net: Predict Pedestrian Trajectory by Trajectory-Scene-Cell Classification [OpenReview]

  • Union-over-Intersections: Object Detection beyond Winner-Takes-All [OpenReview]