A microcontroller-compatible version of Heatmap-Guided 6-DoF Grasp Detection.
All requirements are tested and confirmed working on Windows 10/11. Other versions may be compatible, but use them at your own discretion.
- Python 3.9.13
- CUDA Toolkit 12.2
- GCC 8.3.0
- Microsoft Visual Studio 2022
- Download the code. It is recommended to create a separate virtual environment.
  `git clone https://github.com/ThomasVroom/HGGD-MCU.git`
- Install torch (with the relevant CUDA toolkit; the command below is for Python 3.9.13 and the CUDA 12.1 wheels).
  `pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121`
- Install fvcore and wheel separately.
  `pip install fvcore wheel`
- Download the code for pytorch3d (leave it in the repository root).
  `git clone https://github.com/facebookresearch/pytorch3d.git`
- Compile pytorch3d (this can take up to 30 minutes).
  `pip install --no-build-isolation ./pytorch3d`
- Install the remaining packages.
  `pip install -r requirements.txt`
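After installation, a quick sanity check can confirm that torch, CUDA, and the compiled pytorch3d build are visible from the environment. This is a minimal sketch, not part of the repository:

```python
# sanity_check.py -- illustrative only, not part of the repository
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # True if a CUDA build and a GPU are detected

try:
    import pytorch3d
    print("pytorch3d:", pytorch3d.__version__)  # confirms the local compile succeeded
except ImportError as e:
    print("pytorch3d not importable:", e)
```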
To run the training / testing scripts, you need the following data:
- 3D Models: download `models.zip` from the GraspNet dataset, unzip into `graspnet/`.
- Images: download the train and test images from the GraspNet dataset, unzip into `graspnet/scenes/`.
- Labels: download `realsense.7z` from this link, unzip into `data/`.
Note that this is not necessary if you just want to run a forward pass through the model. The data is only needed for running the training / testing scripts.
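As a rough orientation, the expected layout after unzipping can be checked with a short script. The folder names follow the steps above; the exact set of subfolders inside each archive may differ:

```python
# check_data_layout.py -- illustrative sketch; folder names follow the steps above
from pathlib import Path

expected = [
    Path("graspnet"),          # models.zip contents
    Path("graspnet/scenes"),   # train / test images
    Path("data"),              # realsense.7z labels
]

for p in expected:
    status = "ok" if p.is_dir() else "missing"
    print(f"{p}: {status}")
```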
`demo.py` performs a single forward pass through the model using the RGB-D images and checkpoint in `resources/`.
The first output of the demo script is a visualization of the heatmap. The top-left of this visualization shows the original RGB image, the top-right shows the original depth image, the bottom-left shows the grasp heatmap, and the bottom-right shows the predicted 2D grasps.
After closing this visualization, the predicted grasps are converted to 6D grasps and visualized using an interactive pointcloud.
The demo script can also export the model as a series of optimized ONNX files, which can be used to run the model on a microcontroller. The hyperparameters of the model are controlled at the top of the file.
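The export is built on PyTorch's standard ONNX exporter. The sketch below only illustrates that mechanism: the stand-in module, file name, and input shape are placeholders, not the exact modules or resolutions used by `demo.py`:

```python
# Minimal ONNX-export sketch. The module below is a stand-in; the real sub-models
# and input shapes come from the repository, not from this example.
import torch
import torch.nn as nn

class TinyStandIn(nn.Module):  # placeholder for one of the exported sub-models
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 8, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = TinyStandIn().eval()
dummy_rgbd = torch.randn(1, 4, 160, 320)  # RGB-D input at the smallest supported size (320x160)

torch.onnx.export(
    model, dummy_rgbd, "standin.onnx",
    input_names=["rgbd"], output_names=["features"],
    opset_version=17,
)
print("exported standin.onnx")
```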
`test_graspnet.py` performs accuracy tests on a subset of the GraspNet dataset using the checkpoint in `resources/`. The testing results are saved in `logs/`. The hyperparameters of the model and of the testing are controlled at the top of the file. The `scene-l` and `scene-r` parameters control which subset of the GraspNet data the model is tested on:
| GraspNet | scene-l | scene-r |
| --- | --- | --- |
| Seen | 100 | 130 |
| Similar | 130 | 160 |
| Novel | 160 | 190 |
If you want to visualize some test cases (similarly to the demo script), you can add their indices to the `vis_id` array on line 149.
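Since the splits in the table do not overlap, the ranges can be read as half-open intervals over scene IDs. The sketch below is purely illustrative (the `SPLITS` dict is not part of the repository); `test_graspnet.py` takes `scene-l` and `scene-r` directly:

```python
# Illustrative mapping from the table above to GraspNet scene IDs,
# assuming scene-r is exclusive (the splits in the table do not overlap).
SPLITS = {
    "seen":    (100, 130),
    "similar": (130, 160),
    "novel":   (160, 190),
}

scene_l, scene_r = SPLITS["seen"]
scene_ids = list(range(scene_l, scene_r))  # scenes 100..129 for the "seen" split
print(len(scene_ids), "scenes, from", scene_ids[0], "to", scene_ids[-1])
```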
`train_graspnet.py` trains a model on a subset of the GraspNet dataset. A log of the training can be accessed using TensorBoard:
`tensorboard --logdir=./logs/`
The model is trained end-to-end and the checkpoint(s) are saved to the `logs/` directory.
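For a quick look at what a saved checkpoint contains, the sketch below loads it on the CPU. The file name is a placeholder and the assumption that `torch.load` returns a dictionary is not guaranteed by this repository:

```python
# Inspect a checkpoint from logs/ on the CPU.
# The path is a placeholder; the dictionary assumption may not hold for every checkpoint.
import torch

ckpt = torch.load("logs/example_checkpoint.pth", map_location="cpu")

if isinstance(ckpt, dict):
    for key in list(ckpt.keys())[:10]:  # print the first few top-level keys
        print(key)
else:
    print("checkpoint object of type:", type(ckpt))
```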
`vis_output.py` visualizes the 2D and 6D predictions from an MCU (or any external predictor).
The exact variables that need to be imported are:
- `pred_2d` (output of AnchorNet)
- `perpoint_features` (output of AnchorNet)
- `pred` (output of LocalNet)
- `offset` (output of LocalNet)
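One plausible way to hand these four arrays over from an external predictor is to dump them as `.npy` files and load them on the host. The directory and file names below are assumptions for illustration, not an interface defined by `vis_output.py`:

```python
# Illustrative hand-over of the four predictor outputs as .npy files.
# File names and the loading mechanism are assumptions; vis_output.py only defines
# the variable names it expects (pred_2d, perpoint_features, pred, offset).
import numpy as np

pred_2d = np.load("mcu_out/pred_2d.npy")                      # output of AnchorNet
perpoint_features = np.load("mcu_out/perpoint_features.npy")  # output of AnchorNet
pred = np.load("mcu_out/pred.npy")                            # output of LocalNet
offset = np.load("mcu_out/offset.npy")                        # output of LocalNet

for name, arr in [("pred_2d", pred_2d), ("perpoint_features", perpoint_features),
                  ("pred", pred), ("offset", offset)]:
    print(name, arr.shape, arr.dtype)
```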
Several changes have been made to the original HGGD model to make it compatible with microcontrollers. The following list provides an overview of the most relevant changes:
- The resolution of the input image can be controlled through the `input-w` and `input-h` parameters. Whatever image is imported is automatically resized to this resolution. The smallest supported size is 320x160. Image scaling is done using Lanczos for RGB and nearest-neighbour for depth (to prevent artifacts); see the sketch after this list.
- The code is runnable with or without a GPU. Whether a GPU is used or not is determined through the `torch.cuda.is_available()` method.
- The two models of HGGD (AnchorNet & LocalNet) have been split into four smaller models: ResNet, AnchorNet, PointNet, and LocalNet.
- There is support for exporting all four models as ONNX files.
- Grasps can be predicted by an external predictor using the `vis_output.py` script.
- A lot of documentation has been added to the existing code.

See `mcu/`.
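As an illustration of the resizing strategy above (Lanczos for RGB, nearest-neighbour for depth), a Pillow-based sketch could look like the following. The file names and the use of a `.npy` depth map are placeholders, not the exact I/O used by the repository:

```python
# Resize sketch matching the strategy described above: Lanczos for RGB,
# nearest-neighbour for depth (to avoid interpolated depth artifacts).
# File names and the 320x160 target are placeholders.
import numpy as np
from PIL import Image

target = (320, 160)  # (width, height), the smallest supported size

rgb = Image.open("resources/example_rgb.png").convert("RGB")
rgb_resized = rgb.resize(target, Image.LANCZOS)

depth = np.load("resources/example_depth.npy").astype(np.float32)  # 2D depth map
depth_img = Image.fromarray(depth)
depth_resized = np.asarray(depth_img.resize(target, Image.NEAREST))

print(rgb_resized.size, depth_resized.shape)
```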
This repository is a fork of the original HGGD implementation. Their paper can be cited as follows:
@article{chen2023efficient,
title={Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes},
author={Chen, Siang and Tang, Wei and Xie, Pengwei and Yang, Wenming and Wang, Guijin},
journal={IEEE Robotics and Automation Letters},
year={2023},
publisher={IEEE}
}