MMDetection3D Workflow Guide: KITTI Data Prep, Testing, Training, and Custom Model Development
MMDetection3D is an open-source 3D object detection toolbox built on PyTorch and designed as a next-generation platform for 3D detection. If your main goal is to build and iterate on AI models with mmdet3d, the practical workflow usually comes down to four parts: preparing data, testing pretrained models, training your own models, and extending the framework with custom components.
The official documentation is generally solid and worth checking when something behaves unexpectedly. That said, some multimodal examples can still be troublesome in practice, so a few commands below come with notes on what tends to break and what to use instead.
Data preparation for KITTI
For 3D detection on KITTI, start by extracting the dataset and arranging the files in the expected directory structure:
```
mmdetection3d
├── data
│   ├── kitti
│   │   ├── ImageSets
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   │   ├── label_2
```
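Before moving on, it can save time to confirm that the folders really are where the tree above says they should be. The following is a minimal sketch using only the standard library; `missing_kitti_dirs` and `EXPECTED_DIRS` are hypothetical names introduced here, not part of MMDetection3D, and the default root `data/kitti` assumes you run it from the repository root.

```python
from pathlib import Path

# Sub-folders expected under data/kitti, taken from the tree above.
EXPECTED_DIRS = [
    "ImageSets",
    "testing/calib",
    "testing/image_2",
    "testing/velodyne",
    "training/calib",
    "training/image_2",
    "training/velodyne",
    "training/label_2",
]


def missing_kitti_dirs(root):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumed location: run from the mmdetection3d repository root.
    missing = missing_kitti_dirs("data/kitti")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("KITTI layout looks complete.")
```

The check only looks at folder names; it does not verify that the folders contain the right number of files.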
Once the folders are in place, create the split files and generate the dataset metadata:
```shell
# Create the ImageSets folder; -p also creates data/kitti if needed and does not fail if it already exists
mkdir -p ./data/kitti/ImageSets

# Download the data split files
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt

# Generate the dataset metadata used for training and evaluation
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```
This step prepares the split definitions and runs tools/create_data.py so MMDetection3D can build the files it needs for training and evaluation.
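If `tools/create_data.py` fails partway through, a common cause is a split file that lists samples you have not extracted. The sketch below cross-checks a split file against the point cloud folder. It assumes each line of a split file is one sample ID and that the matching point cloud is stored as `<ID>.bin` under `velodyne`; the helper names `read_split_ids` and `ids_without_points` are made up for this example.

```python
from pathlib import Path


def read_split_ids(split_file):
    """Read one sample ID per line, skipping blank lines."""
    lines = Path(split_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def ids_without_points(kitti_root, split="train", subset="training"):
    """Return the IDs in ImageSets/<split>.txt with no <ID>.bin in <subset>/velodyne."""
    root = Path(kitti_root)
    ids = read_split_ids(root / "ImageSets" / f"{split}.txt")
    velodyne = root / subset / "velodyne"
    return [i for i in ids if not (velodyne / f"{i}.bin").is_file()]


# Example (assumes the layout described above):
#   ids_without_points("data/kitti", split="val")
#   ids_without_points("data/kitti", split="test", subset="testing")
```

An empty list means every ID in the split has a point cloud file; anything else is worth fixing before generating the metadata.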
Running model evaluation and inference
Model testing supports both single-GPU and multi-GPU execution:
```shell
# Single-GPU testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show] [--show-dir ${SHOW_DIR}]

# Multi-GPU testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
```
A few options matter more than the rest:
- `RESULT_FILE`: filename for saving output results in pickle format. If you do not set it, results are not written to disk.
- `EVAL_METRICS`: the evaluation metric, which depends on the dataset and task.
  - For detection on nuScenes, Lyft, ScanNet, and SUNRGBD, `mAP` is typically enough.
  - For KITTI, if you only want to evaluate 2D detection, use `img_bbox`.
  - For Waymo, two evaluation styles are available: KITTI-style (`kitti`) and the official Waymo metric (`waymo`). The official metric is the recommended choice because it is more stable and allows fairer comparison.
  - For segmentation tasks on datasets such as S3DIS and ScanNet, use `mIoU`.
- `--show`: saves visualization outputs in silent mode for debugging and inspection. This only works in single-GPU testing and is usually paired with `--show-dir`.
- `--show-dir`: writes visualization results to a target folder as `***_points.obj` and `***_pred.obj`. This is also single-GPU only. A graphical interface is not required for this option.
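If you do pass `--out`, the saved pickle can be inspected with a few lines of Python. What the file contains depends on the model and the MMDetection3D version, so the sketch below makes no assumption about its layout and only reports what it finds. `summarize_results` is a hypothetical helper, not an MMDetection3D function. Unpickling may also require the same packages that produced the file (for example PyTorch and mmdet3d) to be importable, and you should only unpickle files you generated yourself.

```python
import pickle


def summarize_results(path):
    """Load a pickled result file and describe its top-level structure."""
    with open(path, "rb") as f:
        results = pickle.load(f)  # only safe for files you trust

    summary = {"type": type(results).__name__}
    if hasattr(results, "__len__"):
        summary["length"] = len(results)
    # If the file holds a sequence of dicts, list the keys of the first entry.
    if isinstance(results, (list, tuple)) and results and isinstance(results[0], dict):
        summary["first_item_keys"] = sorted(results[0].keys())
    return summary


# Example (path is a placeholder for whatever you passed to --out):
#   print(summarize_results("assets/result.pickle"))
```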
Example:
```shell
CONFIG_FILE="configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py"
CHECKPOINT_FILE="checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth"
# Defined for use with --out; it is not passed below, so no result file is written
RESULT_FILE="assets/result.pickle"
EVAL_METRICS="img_bbox"
SHOW_DIR="assets/kitti_pred/"

python \
    tools/test.py \
    ${CONFIG_FILE} \
    ${CHECKPOINT_FILE} \
    --eval ${EVAL_METRICS} \
    --show \
    --show-dir ${SHOW_DIR}
```
A note on visualization
The image display interface is still not especially stable and often needs extra debugging. In particular, adding --show for multimodal detection box rendering can trigger errors. If you specifically need rendered detection images, demo/multi_modality_demo.py is the more reliable route.
Training a model
Training follows the same single-GPU / distributed pattern:
```shell
# Train on a single GPU
python tools/train.py ${CONFIG_FILE} [optional arguments]

# Train on multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
Useful optional arguments include:
- `--no-validate` (not recommended): by default, validation runs every `k` epochs during training, where `k` is 1 unless changed in the config. This flag disables validation entirely.
- `--work-dir ${WORK_DIR}`: overrides the working directory defined in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: resumes training from an existing checkpoint.
- `--options 'Key=value'`: overrides selected settings from the config without editing the file directly.
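To make the idea behind `--options 'Key=value'` concrete, here is a small pure-Python sketch of applying key/value overrides to a nested config dictionary. It is an illustration of the concept, not the code MMDetection3D or MMCV actually runs. The dotted-key notation (`optimizer.lr`), the helper name `apply_overrides`, and all config values are assumptions chosen for the example.

```python
import copy


def apply_overrides(cfg, overrides):
    """Return a copy of `cfg` with dotted-key overrides applied.

    Intermediate dictionaries are created when a parent key is missing.
    The sketch assumes every existing parent value is itself a dict.
    """
    merged = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return merged


# Made-up config values, for illustration only.
base = {"optimizer": {"type": "AdamW", "lr": 0.003}, "total_epochs": 80}
patched = apply_overrides(base, {"optimizer.lr": 0.001, "data.samples_per_gpu": 2})

print(patched["optimizer"])  # the learning rate changes, the type is kept
print(base["optimizer"])     # the original config is left untouched
```

Working on a copy mirrors the point of the flag: the config file on disk stays as it is, and only the run sees the changed values.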
Minimal example:
```shell
CONFIG_FILE="configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py"

python \
    tools/train.py \
    ${CONFIG_FILE}
```
How model components are organized
When developing a new model inside MMDetection3D, the framework typically splits the architecture into six component types:
- Encoder: modules used before the backbone, including voxel-based stages such as voxel layer, voxel encoder, and middle encoder. Examples include `HardVFE` and `PointPillarsScatter`.
- Backbone: the main feature extractor, often FCN-style, such as `ResNet` or `SECOND`.
- Neck: sits between backbone and head, for example `FPN` or `SECONDFPN`.
- Head: task-specific prediction modules, such as bounding box prediction or mask prediction heads.
- RoI extractor: modules that extract RoI features from feature maps, such as `H3DRoIHead` and `PartAggregationROIHead`.
- Loss: the loss functions used inside heads, including `FocalLoss`, `L1Loss`, and `GHMLoss`.
To extend the framework, create the corresponding module under mmdetection3d/mmdet3d/models/ in the appropriate location, register it with the framework so it can be looked up by name, and then select and configure it through the config file.
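The register-then-configure pattern is easier to follow with a toy version. The sketch below is a stand-in written for this article: the `Registry` class here is not the real MMCV registry, and `ToyBackbone` is a hypothetical component. It only shows the idea that a class is registered under its name and later built from a config dictionary whose `type` field selects it.

```python
class Registry:
    """Tiny stand-in for a name-to-class registry (not the real MMCV one)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        """Decorator that stores a class under its own class name."""
        def decorator(cls):
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._modules[args.pop("type")]
        return cls(**args)


BACKBONES = Registry("backbone")


@BACKBONES.register_module()
class ToyBackbone:  # hypothetical component, for illustration only
    def __init__(self, in_channels, out_channels=64):
        self.in_channels = in_channels
        self.out_channels = out_channels


# What a config entry boils down to: a type name plus constructor arguments.
backbone = BACKBONES.build(dict(type="ToyBackbone", in_channels=4))
print(type(backbone).__name__, backbone.in_channels, backbone.out_channels)
```

In the real framework the new module also has to be imported somewhere (typically the package's `__init__.py`) so that the registration code actually runs; check the documentation of your installed version for the exact registry to import.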
Working with configuration files
A large part of everyday development in MMDetection3D is simply understanding and editing configuration files. That includes learning how inheritance works, how data pipelines are defined, and how models such as MVXNet express multimodal fusion through config structure.
In practice, most experiments are controlled through configuration changes rather than direct edits to training scripts.
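Config inheritance can be pictured as a recursive dictionary overlay: the child config only states what differs from its base. The following sketch shows that idea in plain Python. It is a simplified model, not MMCV's actual merge logic (which has extra rules this sketch ignores), and `merge_config` plus all values below are invented for the example.

```python
import copy


def merge_config(base, child):
    """Recursively overlay `child` on `base`.

    Nested dicts are merged key by key; any other value in `child`
    replaces the value from `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up base and child settings, for illustration only.
base_cfg = dict(
    data=dict(samples_per_gpu=2, workers_per_gpu=2),
    evaluation=dict(interval=1),
)
child_cfg = dict(
    data=dict(samples_per_gpu=4),  # change one field, inherit the rest
    evaluation=dict(interval=2),
)

final_cfg = merge_config(base_cfg, child_cfg)
print(final_cfg)
```

Reading a config this way helps when tracing where a value comes from: if a field is not in the file you are editing, it was inherited from one of its base files.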
Useful dataset browsing tools
MMDetection3D also provides helper scripts for inspecting loaded samples and ground-truth annotations:
```shell
# Show the loaded data and ground-truth labels
python tools/misc/browse_dataset.py ${CONFIG_FILE} --task ${TASK} --output-dir ${OUTPUT_DIR} [--online]
```
Optional argument:
- `--online`: displays results interactively in real time. This requires a graphical environment and `open3d==0.9.0.0`.
Example:
```shell
# Show 2D images with the projected 3D bounding boxes (multimodal MVXNet)
TASK="multi_modality-det"
CONFIG_FILE="configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py"
OUTPUT_DIR="assets/kitti_true"

python \
    tools/misc/browse_dataset.py \
    ${CONFIG_FILE} \
    --task ${TASK} \
    --output-dir ${OUTPUT_DIR}
```
Important fix for the official MVXNet config
The official MVXNet configuration misses the bounding-box information required for this browsing workflow. You need to modify eval_pipeline as follows:
```python
# In the config's eval_pipeline, change
dict(type='Collect3D', keys=['points', 'img'])
# so that it also collects 'gt_bboxes_3d':
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'img'])
```
Without this change, the visualization pipeline will not include the 3D ground-truth boxes it needs.
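If you would rather not edit the official config by hand, the same change can be applied programmatically to the pipeline list. This is a minimal sketch on plain dictionaries: `add_collect_key` is a hypothetical helper, the example pipeline is abbreviated, and the new key is appended at the end of `keys` on the assumption that key order does not matter to the collect step.

```python
def add_collect_key(pipeline, key, collect_type="Collect3D"):
    """Return a copy of `pipeline` in which every `collect_type` step also collects `key`.

    Steps that wrap other steps in a `transforms` list are handled recursively.
    """
    patched = []
    for step in pipeline:
        step = dict(step)  # shallow copy so the input list is not modified
        if step.get("type") == collect_type and key not in step.get("keys", []):
            step["keys"] = list(step.get("keys", [])) + [key]
        if isinstance(step.get("transforms"), list):
            step["transforms"] = add_collect_key(step["transforms"], key, collect_type)
        patched.append(step)
    return patched


# Abbreviated stand-in for the real eval_pipeline (other steps and arguments omitted).
eval_pipeline = [
    dict(type="LoadPointsFromFile"),
    dict(type="Collect3D", keys=["points", "img"]),
]

fixed = add_collect_key(eval_pipeline, "gt_bboxes_3d")
print(fixed[-1])
```

Calling the helper twice is harmless, since the key is only added when it is not already present.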
