UAV Navigation in 3D Unknown Environments on Estimated Monocular Depth


FAUN-MoDe: Framework for Autonomous UAV Navigation Based on Monocular Depth Estimation

IROS 2025: A Framework for Autonomous UAV Navigation Based on Monocular Depth Estimation

About Project

The solution utilizes a depth image estimation model to create an occupancy grid map of the surrounding area and uses an A* path planning algorithm to find optimal paths to end goals while simultaneously navigating around obstacles. The simulation is conducted using AirSim in Unreal Engine.
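At a high level, the loop described above can be sketched as follows. This is an illustrative skeleton only; `OccupancyGrid`, `estimate_depth`, and `plan_astar` are hypothetical stand-ins, not the project's actual module API:

```python
# Illustrative sketch of the perception-planning loop: estimate depth,
# fuse it into the map, replan toward the goal. All names are
# hypothetical stand-ins for the project's actual modules.

class OccupancyGrid:
    """Minimal stand-in: a set of occupied integer voxel coordinates."""

    def __init__(self):
        self.occupied = set()

    def update(self, depth_points, pose):
        # The real mapper back-projects depth pixels using camera
        # intrinsics and the pose; here we just insert world-frame points.
        self.occupied.update(depth_points)


def navigation_step(rgb_image, pose, grid, goal, estimate_depth, plan_astar):
    """One iteration of the navigation loop."""
    depth_points = estimate_depth(rgb_image)   # monocular depth estimation (DEM)
    grid.update(depth_points, pose)            # occupancy map update (MM)
    return plan_astar(grid, pose, goal)        # A* replanning around obstacles (NM)
```

In the real system each of these calls is a separate ROS node; the sketch only shows the data flow between them.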

Important

This project provides an open-source framework that uses a virtual environment to test and compare autonomous UAV navigation methods based on monocular vision.

The current drone navigation system is a proof of concept. Refer to the Citation section for the journal paper to learn more about the research and the concerns regarding application in real-world scenarios.

Citation

When using A Framework for Autonomous UAV Navigation Based on Monocular Depth Estimation, please cite the following journal paper (pdf, video):

@article{monocular-slam-drone,
  title={A Framework for Autonomous UAV Navigation Based on Monocular Depth Estimation},
  author={Gaigalas Jonas and Perkauskas Linas and Gricius Henrikas and Kanapickas Tomas and Kriščiūnas Andrius},
  journal={TODO},
  year={2025},
  publisher={TODO},
}

Prerequisites

  • Docker installed (with docker-compose support)
  • NVIDIA GPU and drivers with CUDA support for GPU acceleration (optional)
  • X11 server running for displaying graphical applications (optional)

Tip

If using Windows, you can set up VcXsrv as your X11 server. Instructions are provided in the "Setting Up X11 Server" section below.

Project Setup Steps

Follow the steps below to set up the project on your local machine:

1. Clone Repository

Clone the repository together with its git submodules by running:

git clone --recurse-submodules https://github.com/jonasctrl/monocular-slam-drone.git

2. Build Docker Container

To build and start the containers using Docker Compose:

docker compose up --build

3. Configure Environment

Caution

You need to create a .env file in the root directory and update it with your system-specific variables for display forwarding.

DISPLAY=192.168.0.120:0 # Your X11 server address

Tip

See .env.example for general steps on how to find your display for X11 forwarding.

4. Import Fine-tuned Models

Select the environment you want to use and import the fine-tuned model for it.

5. Start Simulation Environment

Start the simulation environment in Unreal Engine. The environment can be downloaded from the AirSim Environments page.

6. Configure AirSim

After starting the simulation environment:

  1. AirSim will create configuration files in C:\Users\{username}\Documents\AirSim\
  2. Copy the settings.json from the repository's config folder to the AirSim config directory
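The repository ships its own settings.json, so use that file; for orientation, a minimal AirSim multirotor configuration has this shape (fields beyond these two are environment-specific and are an assumption here, not the repository's actual config):

```json
{
  "SettingsVersion": 1.2,
  "SimMode": "Multirotor"
}
```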

7. Launch the System

To start the project, run the following command inside the docker container:

python3 /catkin_ws/src/drone-node/src/mapper_nav_ros.py

Tip

The red grid shown in RViz represents the uncertain depth space and the blue grid the occupied-known space. Use the RViz GUI to plan missions with the 2D planning tool.

Architecture

The proposed architecture for an autonomous monocular drone navigation system is depicted in the following diagram:

Architecture diagram for autonomous monocular drone navigation system

The system is divided into:

  • Simulation Environment: using AirSim v1.8.1
  • Monocular Drone Navigation System: running in a containerized Docker environment with Ubuntu 20.04, using the ROS Noetic v1.17.0 framework for communication

The navigation system consists of three modules:

Depth Estimation Module (DEM): Estimates depth images from the RGB camera feed provided by the simulation environment. Based on the "Depth Anything V2" model.
Mapper Module (MM): Builds and iteratively updates the occupancy-map-based 3D environment using depth images from the DEM and camera position/orientation from the simulation.
Navigation Module (NM): Finds viable path trajectories to specified points in the mapped 3D environment using the A* algorithm. Output is fed back to the simulation environment.
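The planning step in the Navigation Module can be illustrated with a minimal A* search over a 3D voxel grid. This sketch assumes unit-cost 6-connected moves and a set of occupied voxels; the project's actual planner, cost model, and neighborhood may differ:

```python
# Minimal A* over a 3D occupancy grid: 6-connected moves, unit cost,
# Manhattan-distance heuristic (admissible for this neighborhood).
# Illustrative only; not the project's actual planner.
import heapq


def astar_3d(occupied, start, goal):
    """Return a list of voxel coordinates from start to goal that
    avoids the voxels in `occupied` (a set of (x, y, z) tuples),
    or None if no path exists within the explored region."""

    def h(p):
        return sum(abs(a - b) for a, b in zip(p, goal))

    open_heap = [(h(start), start)]
    came_from = {}
    g = {start: 0}
    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur == goal:
            # Reconstruct the path by walking parents back to start.
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        x, y, z = cur
        for nxt in [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                    (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]:
            if nxt in occupied:
                continue
            ng = g[cur] + 1
            if ng < g.get(nxt, float("inf")):
                g[nxt] = ng
                came_from[nxt] = cur
                heapq.heappush(open_heap, (ng + h(nxt), nxt))
    return None
```

For example, with a single occupied voxel between start and goal, the search detours around it rather than through it.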

Depth Mapping in 3D Space

The following images demonstrate the depth estimation and mapping process:

Camera image from the simulation environment (left) and 3D space mapping visualization (right).

Tip

The red voxel grid shown in RViz is the uncertain depth space and the blue one is the occupied-known space.
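The mapping step above amounts to back-projecting each depth pixel into 3D and quantizing it into a voxel. The following sketch shows that in the camera frame only; the real Mapper Module also applies the camera pose and distinguishes uncertain from occupied space, and the intrinsics here (fx, fy, cx, cy) are illustrative values, not the project's calibration:

```python
# Sketch: populate a voxel set from a depth image via pinhole
# back-projection, camera frame only. Illustrative, not the
# project's actual mapper.

def depth_to_voxels(depth, fx, fy, cx, cy, voxel_size):
    """depth: 2D list of metric depths (0 or less = no reading).
    Returns the set of occupied voxel indices in the camera frame."""
    voxels = set()
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue
            # Pinhole model: pixel (u, v) at depth d maps to (x, y, z).
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            z = d
            # Quantize the point into an integer voxel index.
            voxels.add((int(x // voxel_size),
                        int(y // voxel_size),
                        int(z // voxel_size)))
    return voxels
```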

Setting Up X11 Server for Display Forwarding

Windows

If you're running this on Windows, you can set up an X11 server using VcXsrv:

  1. Download and install VcXsrv from SourceForge
  2. During configuration, check the box "Disable access control"
  3. Ensure that DISPLAY is correctly set in your .env file

Linux

If using a Linux-based environment:

export DISPLAY=host.docker.internal:0
xhost +

Tip

To test the X11 setup, run xclock inside the container. If you see a clock window appear, your X11 server is configured correctly.

License

This project is licensed under the MIT License - see the LICENSE file for details.