Document downloading the av2 sensor data used in 4DGF. (#12)
Co-authored-by: tobiasfshr <tobias.fischer@inf.ethz.ch>
JiantengChen and tobiasfshr authored Oct 7, 2024
1 parent 237bc0a commit 9656b4c
Showing 6 changed files with 251 additions and 74 deletions.
80 changes: 6 additions & 74 deletions README.md
@@ -39,84 +39,16 @@ python setup.py develop
```

## Data
We support the following datasets:
- [Argoverse 2](https://www.argoverse.org/av2.html) (Sensor dataset)
- [KITTI](https://www.cvlibs.net/datasets/kitti/eval_tracking.php) (tracking split)
- [VKITTI2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/)
- [Waymo Open](https://waymo.com/open/)

Download the datasets to a location of your convenience. You can later adjust the data path in the preprocessing script. Note that we provide a joint download & preprocess utility for Waymo (see below).

By default, assume the following folder structure:
```
data/
Argoverse2/
train/
0c61aea3-3cba-35f3-8971-df42cd5b9b1a/
...
KITTI/
tracking/
training/
image_02/
...
VKITTI2/
Scene01/
...
waymo/
...
```

For Argoverse 2, the ego-vehicle masks are located at `assets/masks` by default.

Generate the necessary metadata files with:
Use our preprocessing scripts to prepare the datasets:
```
mp-process [av2|kitti|vkitti2|waymo]
```

To prepare the full datasets, run:

### VKITTI2
```
mp-process vkitti2 --sequence 02
mp-process vkitti2 --sequence 06
mp-process vkitti2 --sequence 18
```

### KITTI

```
mp-process kitti --sequence 0001
mp-process kitti --sequence 0002
mp-process kitti --sequence 0006
```
### Waymo
```
mp-process waymo
```

This will download and preprocess the full Dynamic-32 split from [EmerNeRF](https://emernerf.github.io/).

### Argoverse 2
```
# Residential split
mp-process av2 --location-aabb 6180 1620 6310 1780
# Downtown split
mp-process av2 --location-aabb 1100 -50 1220 150
# Single sequence
mp-process av2
```

NOTE: For Argoverse 2, install the modified [devkit](https://argoverse.github.io/user-guide/getting_started.html) via
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default nightly-2023-12-11
pip install git+https://github.com/tobiasfshr/av2-api.git
```
We use the following rustc version: `rustc 1.76.0-nightly (21cce21d8 2023-12-11)`.
We provide detailed instructions for preparing the supported datasets in our documentation:
- [Argoverse 2](docs/datasets/Argoverse2.md)
- [KITTI](docs/datasets/KITTI.md)
- [VKITTI2](docs/datasets/VKITTI2.md)
- [Waymo Open](docs/datasets/Waymo.md)

## Models

63 changes: 63 additions & 0 deletions docs/datasets/Argoverse2.md
@@ -0,0 +1,63 @@
# [Argoverse 2](https://www.argoverse.org/av2.html)

This dataset is a collection of open-source autonomous driving data and high-definition (HD) maps from six U.S. cities: Austin, Detroit, Miami, Pittsburgh, Palo Alto, and Washington, D.C. This release builds upon the initial launch of Argoverse (“Argoverse 1”), which was among the first data releases of its kind to include HD maps for machine learning and computer vision research.

We provide scripts to download and preprocess the parts of the Argoverse 2 dataset used in our experiments. We refer to the [Argoverse User Guide](https://argoverse.github.io/user-guide/getting_started.html#overview) for detailed instructions on how to get started with the dataset.

## Requirements
To download and preprocess the Argoverse 2 dataset:

1. **Install our modified [Argoverse 2 devkit](https://argoverse.github.io/user-guide/getting_started.html) via**
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default nightly-2023-12-11
pip install git+https://github.com/tobiasfshr/av2-api.git
```
We use the following rustc version: `rustc 1.76.0-nightly (21cce21d8 2023-12-11)`.

2. **Install `s5cmd`**

#### Conda Installation (Recommended)

```bash
conda install s5cmd -c conda-forge
```

#### Manual Installation

```bash
#!/usr/bin/env bash

export INSTALL_DIR=$HOME/.local/bin
export PATH=$PATH:$INSTALL_DIR
export S5CMD_URI=https://github.com/peak/s5cmd/releases/download/v2.0.0/s5cmd_2.0.0_$(uname | sed 's/Darwin/macOS/g')-64bit.tar.gz

mkdir -p $INSTALL_DIR
curl -sL $S5CMD_URI | tar -C $INSTALL_DIR -xvzf - s5cmd
```

This installs s5cmd into your local bin directory; change `INSTALL_DIR` if you prefer another location. Note that an AWS account is **not** required to download the datasets.
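The preprocessing script checks for `s5cmd` on your `PATH` before starting any download, and you can run the same check yourself. A minimal sketch, mirroring the check in `argoverse2.py`:

```python
# Check that s5cmd is discoverable on PATH, as the preprocessing
# script does before it starts any download.
import shutil

def s5cmd_available() -> bool:
    return shutil.which("s5cmd") is not None

print(s5cmd_available())  # True once s5cmd is installed
```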

## Download & Preprocessing
Use the following commands to download and preprocess the data:

```
# Residential split
mp-process av2 --location-aabb 6180 1620 6310 1780
# Downtown split
mp-process av2 --location-aabb 1100 -50 1220 150
# Single sequence
mp-process av2
```
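The `--location-aabb` option takes `xmin ymin xmax ymax` city coordinates. As an illustration of the idea (a simplified sketch, not the script's actual implementation), selecting sequences whose ego trajectory enters the box could look like:

```python
# Hypothetical sketch: pick sequences whose ego-vehicle trajectory
# enters an axis-aligned bounding box (xmin, ymin, xmax, ymax).
def in_aabb(x, y, aabb):
    xmin, ymin, xmax, ymax = aabb
    return xmin <= x <= xmax and ymin <= y <= ymax

def select_sequences(trajectories, aabb):
    # trajectories: dict mapping sequence id -> list of (x, y) ego positions
    return [seq for seq, pts in trajectories.items()
            if any(in_aabb(x, y, aabb) for x, y in pts)]

residential_aabb = (6180, 1620, 6310, 1780)
trajs = {
    "seq-a": [(6200.0, 1700.0), (6250.0, 1750.0)],  # inside the box
    "seq-b": [(1000.0, 0.0)],                        # outside
}
print(select_sequences(trajs, residential_aabb))  # → ['seq-a']
```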

By default, this will download and preprocess the dataset in the following folder structure:
```
data/
Argoverse2/
train/
0c61aea3-3cba-35f3-8971-df42cd5b9b1a/
...
```
You can adjust the path with the `--data` option. Note that we provide ego-vehicle masks for this dataset located at `assets/masks`.
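The ego-vehicle masks mark image pixels covered by the ego vehicle so they can be excluded downstream. A minimal sketch of applying such a binary mask (the mask format and this usage are our assumptions for illustration, using plain nested lists as a stand-in for real image arrays):

```python
# Minimal sketch: zero out ego-vehicle pixels given a binary mask
# (1 = keep pixel, 0 = ego vehicle). Hypothetical helper for illustration.
def apply_ego_mask(image, mask):
    # image: H x W x 3 nested lists, mask: H x W of 0/1
    return [[[c * m for c in px] for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

img = [[[255, 255, 255], [255, 255, 255]]]
mask = [[1, 0]]
print(apply_ego_mask(img, mask))  # → [[[255, 255, 255], [0, 0, 0]]]
```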
23 changes: 23 additions & 0 deletions docs/datasets/KITTI.md
@@ -0,0 +1,23 @@
# [KITTI](https://www.cvlibs.net/datasets/kitti/eval_tracking.php)

Download the dataset to a location of your convenience. You can later adjust the data path in the preprocessing script.

## Preprocessing

By default, we assume the following dataset location:
```
data/
KITTI/
tracking/
training/
image_02/
...
```

You can then process the data with the following commands:

```
mp-process kitti --sequence 0001
mp-process kitti --sequence 0002
mp-process kitti --sequence 0006
```
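Before running the commands above, it can help to sanity-check that the expected layout is in place. A small sketch (the helper name is ours, not part of the package):

```python
# Verify the KITTI tracking layout shown above exists under a data root.
import os
import tempfile
from pathlib import Path

def check_kitti_layout(root) -> bool:
    return (Path(root) / "KITTI" / "tracking" / "training" / "image_02").is_dir()

# Demonstrate against a temporary directory mimicking the layout:
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "KITTI", "tracking", "training", "image_02"))
print(check_kitti_layout(tmp))  # → True
```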
23 changes: 23 additions & 0 deletions docs/datasets/VKITTI2.md
@@ -0,0 +1,23 @@
# [VKITTI2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/)

Download the dataset to a location of your convenience. You can later adjust the data path in the preprocessing script.

## Preprocessing

By default, we assume the following dataset location:
```
data/
VKITTI2/
Scene01/
...
```

You can then process the data with the following commands:

```
mp-process vkitti2 --sequence 02
mp-process vkitti2 --sequence 06
mp-process vkitti2 --sequence 18
```
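The `--sequence` numbers plausibly correspond to the `SceneXX` folders in the layout above; this mapping is our assumption, not taken from the script:

```python
# Map a --sequence value like "02" to the SceneXX directory it
# presumably refers to (hypothetical helper for illustration).
from pathlib import Path

def scene_dir(root: str, sequence: str) -> Path:
    return Path(root) / "VKITTI2" / f"Scene{sequence}"

print(scene_dir("data", "02").as_posix())  # → data/VKITTI2/Scene02
```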
32 changes: 32 additions & 0 deletions docs/datasets/Waymo.md
@@ -0,0 +1,32 @@
# [Waymo](https://waymo.com/open/)

## Requirements
Please install the Waymo API via

```
pip install waymo-open-dataset-tf-2-11-0==1.6.1 --no-deps
```

Note that due to a dependency conflict with numpy, you need to manually install dependencies such as TensorFlow afterwards.
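For example, after the `--no-deps` install above you might add a TensorFlow build yourself; the version pin below is an assumption (the package name targets TF 2.11), so adjust it to your environment:

```shell
# waymo-open-dataset-tf-2-11-0 targets TF 2.11, so a matching build is a
# reasonable guess; pick a version compatible with your installed numpy.
pip install "tensorflow>=2.11,<2.12"
```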

## Download & Preprocessing

We provide data download and preprocessing of the full Dynamic-32 split from [EmerNeRF](https://emernerf.github.io/) via a single command:

```
mp-process waymo
```

By default, this will download and preprocess the data to the following location:

```
data/
waymo/
raw/
segment-...
processed/
...
metadata_segment-...
```

You can adjust the path with the `--data` option.
104 changes: 104 additions & 0 deletions src/map4d/scripts/datasets/argoverse2.py
@@ -1,6 +1,8 @@
import json
import os
import pickle
import shutil
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Literal, Tuple
@@ -27,6 +29,52 @@
av2 = None


# Residential split
RESIDENTIAL_SEQUENCE_IDS = [
    "0c61aea3-3cba-35f3-8971-df42cd5b9b1a",
    "7c30c3fc-ea17-38d8-9c52-c75ccb112253",
    "a2f568b5-060f-33f0-9175-7e2062d86b6c",
    "b9f73e2a-292a-3876-b363-3ebb94584c7a",
    "cea5f5c2-e786-30f5-8305-baead8923063",
    "6b0cc3b0-2802-33a7-b885-f1f1409345ac",
    "7cb4b11f-3872-3825-83b5-622e1a2cdb28",
    "a359e053-a350-36cf-ab1d-a7980afaffa2",
    "c654b457-11d4-393c-a638-188855c8f2e5",
    "f41d0e8f-856e-3f7d-a3f9-ff5ba7c8e06d",
    "6f2f7d1e-8ded-35c5-ba83-3ca906b05127",
    "8aad8778-73ce-3fa0-93c7-804ac998667d",
    "b51561d9-08b0-3599-bc78-016f1441bb91",
    "c990cafc-f96c-3107-b213-01d217b11272",
]

# Downtown split
DOWNTOWN_SEQUENCE_IDS = [
    "05853f69-f948-3d04-8d64-d4e721c0e1a5",
    "05fb81ab-5e46-3f63-a59f-82fc66d5a477",
    "150ae964-5091-3681-b738-88715052c792",
    "1a6487dd-8bc6-3762-bd8a-2e50e15dbe75",
    "37fcd8ac-c148-3c4a-92ac-a10f355451b7",
    "422dd53b-6010-4eb9-8902-de3d134c5a70",
    "51e6b881-e5a1-30c8-ae2b-02891d5a54ce",
    "5bc5e7b0-4d90-3ac8-8ca8-f6037e1cf75c",
    "5d9c1080-e6e9-3222-96a2-37ca7286a874",
    "6bae6c0c-5296-376d-96bc-6c8dbe6693a5",
    "6e106cf8-f6dd-38f6-89c8-9be7a71e7275",
    "8184872e-4203-3ff1-b716-af5fad9233ec",
    "8606d399-57d4-3ae9-938b-db7b8fb7ef8c",
    "89f79c55-6698-3037-bd2e-d40c81af169a",
    "9158b740-6527-3194-9953-6b7b3b28d544",
    "931b76ee-63df-36f6-9f2e-7fb16f2ee721",
    "9eb87a0b-2457-359d-b958-81e8583d8e44",
    "9efe1171-6faf-3427-8451-8f6469f7678e",
    "bd9636d2-7220-3585-9c7d-4acaea167b71",
    "c8cdffb0-7942-3ff5-9f71-210e095e1d31",
    "d0828f48-3e67-3136-9c70-1f99968c8280",
    "e453f164-dd36-3f1a-9471-05c2627cbaa5",
    "fb720691-1736-3fa2-940b-07b97603efc6",
]


@dataclass
class ProcessArgoverse2:
    cameras: Tuple[str, ...] = (
@@ -78,13 +126,69 @@ class ProcessArgoverse2
    masks_path: Path = Path("assets/masks").absolute()
    """Path to ego-vehicle masks."""

    def _download_data(self, split: str):
        split_name = "train" if split in ["train", "residential", "downtown"] else split
        CONSOLE.log(f"Downloading Argoverse2 {split} split...")
        target_dir = self.data / split_name

        # Download via s5cmd from the public bucket (no AWS signature required)
        if split in ("residential", "downtown"):
            sequences = RESIDENTIAL_SEQUENCE_IDS if split == "residential" else DOWNTOWN_SEQUENCE_IDS
            for sequence in sequences:
                s3_path = f"s3://argoverse/datasets/av2/sensor/{split_name}/{sequence}/*"
                self._download_s3(s3_path, target_dir / sequence)
        else:
            s3_path = f"s3://argoverse/datasets/av2/sensor/{split_name}/*"
            self._download_s3(s3_path, target_dir)

    def _download_s3(self, s3_path: str, target_dir: Path):
        assert shutil.which(
            "s5cmd"
        ), "s5cmd is not installed. Please install it with e.g. 'conda install s5cmd -c conda-forge'."
        os.makedirs(target_dir, exist_ok=True)
        command = ["s5cmd", "--no-sign-request", "cp", s3_path, str(target_dir)]
        CONSOLE.log(f"Downloading {s3_path} to {str(target_dir)}")
        subprocess.run(command, check=True)

    def _check_exists(self, sequences: list[str]):
        for seq in sequences:
            if not (self.data / self.split / seq).exists():
                return False
        return True

    def _check_data(self):
        if self.location_aabb is not None:
            if self.city == "PIT" and self.location_aabb == (6180, 1620, 6310, 1780):
                # residential split
                if not self._check_exists(RESIDENTIAL_SEQUENCE_IDS):
                    self._download_data("residential")
            elif self.city == "PIT" and self.location_aabb == (1100, -50, 1220, 150):
                # downtown split
                if not self._check_exists(DOWNTOWN_SEQUENCE_IDS):
                    self._download_data("downtown")
            else:
                # any other split defined by location AABB
                self._download_data(self.split)
        else:
            # if no location AABB is set, we need a log id in a given split
            assert self.log_id is not None
            if not self._check_exists([self.log_id]):
                s3_path = f"s3://argoverse/datasets/av2/sensor/{self.split}/{self.log_id}/*"
                self._download_s3(s3_path, self.data / self.split / self.log_id)

    def main(self):
        if av2 is None:
            CONSOLE.log(
                "AV2 API is not installed. Please install it with `pip install git+https://github.com/tobiasfshr/av2-api.git`.",
                style="bold red",
            )
            return
        self._check_data()
        self.prepare_seq()

    def prepare_seq(self, seq_name: str | None = None):
