Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion

1Seoul National University, 2Hanyang University, 3LG AI Research
*Indicates Equal Contribution, †Indicates Corresponding Author

CVPR 2025

MoME (Mixture of Multi-modal Experts) is an efficient and robust LiDAR-camera 3D object detector designed for resilient sensor fusion under adverse sensor failure scenarios. It extends the Mixture of Experts (MoE) framework to explicitly handle sensor failures, achieving superior performance with negligible computational overhead compared to single-expert models.

Abstract

Modern autonomous driving perception systems utilize complementary multi-modal sensors, such as LiDAR and cameras. Although sensor fusion architectures enhance performance in challenging environments, they still suffer significant performance drops under severe sensor failures, such as LiDAR beam reduction, LiDAR drop, limited field of view, camera drop, and occlusion. This limitation stems from inter-modality dependencies in current sensor fusion frameworks. In this study, we introduce an efficient and robust LiDAR-camera 3D object detector, referred to as MoME, which achieves robust performance through a mixture-of-experts approach. MoME fully decouples modality dependencies using three parallel expert decoders, which decode object queries using camera features, LiDAR features, or a combination of both, respectively. We propose a Multi-Expert Decoding (MED) framework, where each query is decoded selectively by one of the three expert decoders. MoME utilizes an Adaptive Query Router (AQR) to select the most appropriate expert decoder for each query based on the quality of the camera and LiDAR features. This ensures that each query is processed by the best-suited expert, resulting in robust performance across diverse sensor failure scenarios. We evaluated the performance of MoME on the nuScenes-R benchmark. MoME achieved state-of-the-art performance in extreme weather and sensor failure conditions, significantly outperforming existing models across various sensor failure scenarios.
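
To make the routing idea concrete, below is a minimal PyTorch sketch of the MED/AQR concept described in the abstract. It is an illustrative assumption rather than the authors' implementation: all class names (ExpertDecoderLayer, AdaptiveQueryRouter, MultiExpertDecoding), feature shapes, and the hard argmax routing rule are placeholders chosen for clarity.

# Minimal sketch of Multi-Expert Decoding (MED) with an Adaptive Query Router (AQR).
# Assumptions: flattened camera and LiDAR feature tokens of equal embedding size,
# one decoder layer per expert, and hard (argmax) per-query routing.
import torch
import torch.nn as nn


class ExpertDecoderLayer(nn.Module):
    """One transformer-style decoder layer that cross-attends to a single feature source."""

    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim), nn.ReLU(), nn.Linear(4 * embed_dim, embed_dim)
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, queries: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.cross_attn(queries, feats, feats)
        queries = self.norm1(queries + attn_out)
        return self.norm2(queries + self.ffn(queries))


class AdaptiveQueryRouter(nn.Module):
    """Scores each query and assigns it to one expert: camera, LiDAR, or fused."""

    def __init__(self, embed_dim: int = 256, num_experts: int = 3):
        super().__init__()
        self.score = nn.Linear(embed_dim, num_experts)

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # Hard assignment: one expert index per query.
        return self.score(queries).argmax(dim=-1)


class MultiExpertDecoding(nn.Module):
    """Routes each object query to exactly one of three parallel expert decoders."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.experts = nn.ModuleList([ExpertDecoderLayer(embed_dim) for _ in range(3)])
        self.router = AdaptiveQueryRouter(embed_dim)

    def forward(self, queries, cam_feats, lidar_feats):
        fused_feats = torch.cat([cam_feats, lidar_feats], dim=1)
        sources = [cam_feats, lidar_feats, fused_feats]
        choice = self.router(queries)                     # (B, Q) expert index per query
        out = torch.zeros_like(queries)
        for e, (expert, feats) in enumerate(zip(self.experts, sources)):
            mask = choice == e                            # queries assigned to expert e
            if mask.any():
                decoded = expert(queries, feats)          # decode all, scatter the selected ones
                out[mask] = decoded[mask]
        return out


if __name__ == "__main__":
    B, Q, N, D = 2, 100, 300, 256
    med = MultiExpertDecoding(D)
    queries = torch.randn(B, Q, D)
    cam_feats = torch.randn(B, N, D)      # flattened camera tokens (placeholder)
    lidar_feats = torch.randn(B, N, D)    # flattened LiDAR BEV tokens (placeholder)
    print(med(queries, cam_feats, lidar_feats).shape)  # torch.Size([2, 100, 256])

The design point this sketch illustrates is that each query is decoded by exactly one expert, so the failure of one modality only affects the queries routed to that modality's expert rather than degrading every query through shared fused features.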

Qualitative Results

Quantitative Results


Comparison under various sensor failure scenarios on the nuScenes-R dataset. Italics denote the degree of sensor failure. Note that DETR3D uses only cameras, while CenterPoint uses only LiDAR. '†' indicates results reproduced using the authors' open-source code. MoME achieves state-of-the-art performance across most tasks, significantly surpassing prior methods in the relative performance ratio R.

BibTeX

@article{park2025resilient,
  title={Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion},
  author={Park, Konyul and Kim, Yecheol and Kim, Daehun and Choi, Jun Won},
  journal={arXiv preprint arXiv:2503.19776},
  year={2025}
}