MMDetection3D Workflow Guide: KITTI Data Prep, Testing, Training, and Custom Model Development

2025-11-27

MMDetection3D is an open-source 3D object detection toolbox built on PyTorch and designed as a next-generation platform for 3D detection. If your main goal is to build and iterate on AI models with mmdet3d, the practical workflow usually comes down to four parts: preparing data, testing pretrained models, training your own models, and extending the framework with custom components.

The official documentation is generally solid and worth checking when something behaves unexpectedly. That said, some multimodal examples can still be troublesome in practice, so a few commands below come with notes on what tends to break and what to use instead.

Data preparation for KITTI

For 3D detection on KITTI, start by extracting the dataset and arranging the files in the expected directory structure:

mmdetection3d
├── data
│   ├── kitti
│   │   ├── ImageSets
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   │   ├── label_2
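Before moving on, it can help to confirm that the layout matches. The following is a minimal sketch, not part of MMDetection3D: the helper name missing_kitti_dirs is made up for illustration, and the folder list is copied from the tree above.

```python
from pathlib import Path

# Sub-folders copied from the directory tree in this guide.
EXPECTED_DIRS = [
    "ImageSets",
    "testing/calib",
    "testing/image_2",
    "testing/velodyne",
    "training/calib",
    "training/image_2",
    "training/velodyne",
    "training/label_2",
]


def missing_kitti_dirs(kitti_root):
    """Return the expected sub-folders that do not exist under kitti_root."""
    root = Path(kitti_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling missing_kitti_dirs("data/kitti") from the repository root returns an empty list when everything is in place. ImageSets will show up as missing until the next step has created it.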


Once the folders are in place, create the split files and generate the dataset metadata:

# Create the ImageSets folder (-p does not fail if ./data/kitti already exists)
mkdir -p ./data/kitti/ImageSets

# Download the data split files
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt

# Generate the dataset info files
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti


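This step downloads the split definitions (the sample IDs for train, val, trainval, and test) and then runs tools/create_data.py, which generates the annotation info files, such as kitti_infos_train.pkl and kitti_infos_val.pkl, that MMDetection3D reads during training and evaluation.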
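A truncated or mixed-up download is easy to miss, so a quick consistency check on the split files can save a failed run later. The sketch below is an illustration rather than an MMDetection3D tool. It assumes each split file holds one sample ID per line, that train and val are disjoint, and that trainval is their union; the function names are hypothetical.

```python
from pathlib import Path


def load_split(path):
    """Read one split file and return its sample IDs (one per line)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def check_splits(image_sets_dir):
    """Return a list of human-readable problems found in the split files."""
    d = Path(image_sets_dir)
    train = set(load_split(d / "train.txt"))
    val = set(load_split(d / "val.txt"))
    trainval = set(load_split(d / "trainval.txt"))

    problems = []
    overlap = train & val
    if overlap:
        problems.append(f"{len(overlap)} IDs appear in both train and val")
    if train | val != trainval:
        problems.append("trainval is not the union of train and val")
    return problems
```

An empty list from check_splits("data/kitti/ImageSets") means the files agree with each other under those assumptions.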

Running model evaluation and inference

Model testing supports both single-GPU and multi-GPU execution:

# Single-GPU testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show] [--show-dir ${SHOW_DIR}]

# Multi-GPU testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]


A few options matter more than the rest:

RESULT_FILE: filename for the output results, saved in pickle format. If it is not set, results are not written to disk.
EVAL_METRICS: the metrics to evaluate, which depend on the dataset and task:
    For detection on nuScenes, Lyft, ScanNet, and SUNRGBD, mAP is typically enough.
    For KITTI, if you only want to evaluate 2D detection, use img_bbox.
    For Waymo, two evaluation styles are available: KITTI-style (kitti) and the official Waymo metric (waymo). The official metric is the recommended choice because it is more stable and allows fairer comparison.
    For segmentation tasks on datasets such as S3DIS and ScanNet, use mIoU.
--show: plots detection results in silent mode for debugging and inspection. It works only in single-GPU testing and should be used together with --show-dir.
--show-dir: writes visualization results to the given folder as ***_points.obj and ***_pred.obj files. It also works only in single-GPU testing and does not require a graphical interface.
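When tests are launched from a script, the rules above can be encoded once instead of remembered each time. This is a small sketch under stated assumptions: build_test_command is a hypothetical helper, it only uses the flags listed in this guide, and it simply refuses to combine visualization with multi-GPU runs.

```python
def build_test_command(config, checkpoint, out=None, eval_metrics=None,
                       show_dir=None, gpus=1):
    """Assemble the argument list for tools/test.py or tools/dist_test.sh."""
    if gpus > 1:
        if show_dir is not None:
            # --show and --show-dir only work in single-GPU testing.
            raise ValueError("--show and --show-dir are single-GPU only")
        cmd = ["./tools/dist_test.sh", config, checkpoint, str(gpus)]
    else:
        cmd = ["python", "tools/test.py", config, checkpoint]

    if out is not None:
        cmd += ["--out", out]
    if eval_metrics is not None:
        cmd += ["--eval", eval_metrics]
    if show_dir is not None:
        # --show is paired with --show-dir, as described in the option list.
        cmd += ["--show", "--show-dir", show_dir]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.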

Example:

CONFIG_FILE="configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py"
CHECKPOINT_FILE="checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth"
# RESULT_FILE is only used if you add --out ${RESULT_FILE} to the command
RESULT_FILE="assets/result.pickle"
EVAL_METRICS="img_bbox"
SHOW_DIR="assets/kitti_pred/"

python \
    tools/test.py \
    ${CONFIG_FILE} \
    ${CHECKPOINT_FILE} \
    --eval ${EVAL_METRICS} \
    --show \
    --show-dir ${SHOW_DIR}
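The example defines RESULT_FILE but does not pass it, so nothing is written to disk unless --out ${RESULT_FILE} is added. If you do add it, the saved file can be opened with Python's standard pickle module. The sketch below only reports the type and length of the top-level object, because this guide does not describe the structure of the stored results; summarize_results is a hypothetical name.

```python
import pickle


def summarize_results(result_file):
    """Load a pickle file and describe its top-level object."""
    with open(result_file, "rb") as f:
        results = pickle.load(f)  # only load pickle files you trust
    size = len(results) if hasattr(results, "__len__") else None
    return type(results).__name__, size
```

Printing summarize_results("assets/result.pickle") is a reasonable first step before writing any code that depends on the exact contents.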


A note on visualization

The image display interface is still not especially stable and often needs extra debugging. In particular, adding --show for multimodal detection box rendering can trigger errors. If you specifically need rendered detection images, demo/multi_modality_demo.py is the more reliable route.
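When rendering fails or no display is available, the .obj files written by --show-dir can still be sanity-checked as plain text. This sketch assumes they are standard Wavefront OBJ files, in which every vertex line starts with "v "; the helper names are made up for illustration, and the recursive search is there in case results are grouped into sub-folders.

```python
from pathlib import Path


def count_obj_vertices(obj_path):
    """Count vertex lines ("v x y z ...") in a Wavefront OBJ file."""
    count = 0
    with open(obj_path, "r") as f:
        for line in f:
            if line.startswith("v "):
                count += 1
    return count


def summarize_show_dir(show_dir):
    """Map each .obj file under show_dir (recursively) to its vertex count."""
    root = Path(show_dir)
    return {
        p.relative_to(root).as_posix(): count_obj_vertices(p)
        for p in sorted(root.rglob("*.obj"))
    }
```

A points file with zero vertices, or a missing prediction file, usually points to a problem earlier in the pipeline rather than in the viewer.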

Training a model

Training follows the same single-GPU / distributed pattern:

# Train on a single GPU
python tools/train.py ${CONFIG_FILE} [optional arguments]

# Train on multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]


Useful optional arguments include:

--no-validate (not recommended): disables validation during training. By default, the model is evaluated every k epochs, where k defaults to 1 and can be changed in the config.
--work-dir ${WORK_DIR}: overrides the working directory defined in the config file.
--resume-from ${CHECKPOINT_FILE}: resumes training from an existing checkpoint.
--options 'Key=value': overrides selected settings from the config without editing the file directly. Newer releases deprecate this flag in favor of --cfg-options.
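To make the 'Key=value' idea concrete, here is a simplified stand-in for what an override does to a nested config. It is not the parser used by tools/train.py. The dotted-key convention (for example optimizer.lr=0.001) and the best-effort value parsing are assumptions made for this illustration, and apply_override is a hypothetical name.

```python
import ast


def apply_override(cfg, assignment):
    """Apply one 'a.b.c=value' override to a nested dict, in place."""
    key, sep, raw = assignment.partition("=")
    if not sep or not key:
        raise ValueError(f"expected Key=value, got {assignment!r}")

    # Best-effort literal parsing: numbers, booleans, lists; else keep the string.
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw

    # Walk (or create) the parent dicts, then set the final key.
    node = cfg
    *parents, leaf = key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return cfg
```

For example, apply_override({"optimizer": {"lr": 0.01}}, "optimizer.lr=0.001") changes only the learning rate and leaves every other setting alone, which is the point of overriding instead of editing the file.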

Minimal example:

CONFIG_FILE="configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py"

python \
    tools/train.py \
    ${CONFIG_FILE}


How model components are organized

When developing a new model inside MMDetection3D, the framework typically splits the architecture into six component types:

Encoder: modules used before the backbone, including voxel-based stages such as voxel layer, voxel encoder, and middle encoder. Examples include HardVFE and PointPillarsScatter.
Backbone: the main feature extractor, often FCN-style, such as ResNet or SECOND.
Neck: sits between backbone and head, for example FPN or SECONDFPN.
Head: task-specific prediction modules, such as bounding box prediction or mask prediction heads.
RoI extractor: modules that extract RoI features from feature maps, such as H3DRoIHead and PartAggregationROIHead.
Loss: the loss functions used inside heads, including FocalLoss, L1Loss, and GHMLoss.

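To extend the framework, add the new module to the matching subpackage under mmdetection3d/mmdet3d/models/ (for example, a new backbone goes with the other backbones), register it so the framework can find it by name, and then select it through its type entry in the config file.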
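The register-then-configure pattern is easier to see in miniature. The Registry class below is a tiny stand-in written for this guide, not the one MMDetection3D ships, and ToyBackbone with its arguments is a hypothetical component. What carries over is the shape of the workflow: a decorator records the class under its name, and a config dict with a type key selects and builds it.

```python
class Registry:
    """Tiny name -> class registry (a stand-in, not the MMDetection3D one)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        """Return a decorator that records a class under its own name."""
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the other keys."""
        args = dict(cfg)  # copy so the caller's config is left untouched
        cls = self._modules[args.pop("type")]
        return cls(**args)


BACKBONES = Registry("backbone")


@BACKBONES.register_module()
class ToyBackbone:  # hypothetical component for illustration
    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels
        self.out_channels = out_channels


# The config only names the component; the registry does the lookup.
backbone = BACKBONES.build(dict(type="ToyBackbone", in_channels=4, out_channels=64))
```

Because components are looked up by name, swapping one backbone for another becomes a config change rather than a code change.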

Working with configuration files

A large part of everyday development in MMDetection3D is simply understanding and editing configuration files. That includes learning how inheritance works, how data pipelines are defined, and how models such as MVXNet express multimodal fusion through config structure.

In practice, most experiments are controlled through configuration changes rather than direct edits to training scripts.
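Config inheritance can be pictured as a recursive dictionary merge: a child config states only what differs from its base. The sketch below illustrates that idea under simplifying assumptions. The real config loader has further rules that are not modeled here, and merge_config along with the toy settings is made up for this example.

```python
import copy


def merge_config(base, child):
    """Return a new dict in which values from child override those in base.

    Nested dicts are merged key by key; any other value is replaced whole.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base and child settings.
base = dict(
    model=dict(type="ToyDetector", backbone=dict(type="ToyBackbone", depth=18)),
    total_epochs=40,
)
child = dict(model=dict(backbone=dict(depth=50)), total_epochs=80)

merged = merge_config(base, child)
```

Here the child changes the backbone depth and the epoch count, while the detector type and backbone type are inherited unchanged from the base.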

Useful dataset browsing tools

MMDetection3D also provides helper scripts for inspecting loaded samples and ground-truth annotations:

# Show the loaded data and ground-truth labels
python tools/misc/browse_dataset.py ${CONFIG_FILE} --task ${TASK} --output-dir ${OUTPUT_DIR} [--online]


Optional argument:

--online: displays results interactively in real time. This requires a graphical environment and open3d==0.9.0.0.

Example:

# Show 2D images with the projected 3D bounding boxes (multi-modality MVXNet)
TASK="multi_modality-det"
CONFIG_FILE="configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py"
OUTPUT_DIR="assets/kitti_true"

python \
    tools/misc/browse_dataset.py \
    ${CONFIG_FILE} \
    --task ${TASK} \
    --output-dir ${OUTPUT_DIR}


Important fix for the official MVXNet config

The official MVXNet configuration omits the ground-truth bounding boxes that this browsing workflow needs. Modify the Collect3D step in eval_pipeline as follows:

# In the config's eval_pipeline, find this step:
#   dict(type='Collect3D', keys=['points', 'img'])
# and add 'gt_bboxes_3d' to its keys:
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'img'])


Without this change, the visualization pipeline will not include the 3D ground-truth boxes it needs.
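If you would rather not edit the config file by hand, the same change can be applied to the pipeline after it is loaded as a list of dicts. This is a sketch under assumptions: add_collect_key is a hypothetical helper, SomeLoadingStep is a placeholder name, and the recursion into a transforms list is a guess at how wrapped pipelines might be nested. The key is placed right after 'points' to reproduce the fix shown above.

```python
def add_collect_key(pipeline, key="gt_bboxes_3d"):
    """Add key to every Collect3D step in a pipeline, including nested ones.

    Returns the number of Collect3D steps that were changed.
    """
    changed = 0
    for step in pipeline:
        if step.get("type") == "Collect3D" and key not in step["keys"]:
            keys = step["keys"]
            # Insert right after 'points' when present, otherwise append.
            pos = keys.index("points") + 1 if "points" in keys else len(keys)
            keys.insert(pos, key)
            changed += 1
        # Assumption: wrapper steps keep their inner steps under 'transforms'.
        nested = step.get("transforms")
        if isinstance(nested, list):
            changed += add_collect_key(nested, key)
    return changed


# Toy pipeline; 'SomeLoadingStep' is a placeholder, not a real transform.
eval_pipeline = [
    dict(type="SomeLoadingStep"),
    dict(type="Collect3D", keys=["points", "img"]),
]
num_changed = add_collect_key(eval_pipeline)
```

Running the helper a second time changes nothing, so it is safe to call from a setup script.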
