ActLoc: Learning to Localize on the Move via Active Viewpoint Selection

CoRL 2025

* Equal contribution

¹ETH Zürich, ²Sapienza University of Rome, ³University of Bonn, ⁴Microsoft

Abstract

Reliable localization is critical for robot navigation, yet most existing systems implicitly assume that all viewing directions at a location are equally informative. In practice, localization becomes unreliable when the robot observes unmapped, ambiguous, or uninformative regions. To address this, we present ActLoc, an active, viewpoint-aware planning framework that improves localization accuracy in general robot navigation tasks. At its core, ActLoc employs an attention-based model, trained at large scale, for viewpoint selection. The model encodes a metric map and the camera poses used during map construction, and predicts localization accuracy across yaw and pitch directions at arbitrary 3D locations. These per-point accuracy distributions are incorporated into a path planner, enabling the robot to actively select camera orientations that maximize localization robustness while respecting task and motion constraints. ActLoc achieves state-of-the-art results on single-viewpoint selection and generalizes effectively to full-trajectory planning. Its modular design makes it readily applicable to diverse robot navigation and inspection tasks.

Method

ActLoc takes as input 3D landmarks and reconstructed camera poses from Structure-from-Motion. At each waypoint, the inputs are transformed into an egocentric frame and locally cropped to focus on nearby geometry. The two modalities are independently encoded with self-attention and then fused through bidirectional cross-attention. The network outputs a LocMap, a yaw-pitch grid of localization performance distributions. During planning, the LocMap is combined with a smoothness term to form a mixed cost map. The planner selects viewpoints along the waypoint sequence that maximize localization reliability while keeping the motion continuous.
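As a rough illustration of the planning step described above, the sketch below combines per-waypoint LocMaps with a smoothness term and picks one (yaw, pitch) per waypoint. It is not the authors' implementation: the function names (`to_egocentric`, `angular_cost`, `plan_viewpoints`), the scalar "higher is better" LocMap score, the angular-difference smoothness term, the `smooth_weight` parameter, and the use of Viterbi-style dynamic programming are all assumptions made for this example. The egocentric step here only translates and crops; any rotation into the robot frame is omitted.

```python
import numpy as np


def to_egocentric(points_world, waypoint, radius):
    """Shift landmarks so the waypoint is the origin, then keep those within `radius`.

    Assumption: translation-only egocentric frame with a spherical crop.
    """
    local = np.asarray(points_world, dtype=float) - np.asarray(waypoint, dtype=float)
    keep = np.linalg.norm(local, axis=1) <= radius
    return local[keep]


def angular_cost(yaws_deg, pitches_deg):
    """Pairwise cost between all grid cells: wrapped yaw difference plus pitch difference."""
    yy, pp = np.meshgrid(yaws_deg, pitches_deg, indexing="ij")
    y, p = yy.ravel(), pp.ravel()  # flat index k = i_yaw * n_pitch + i_pitch
    dyaw = np.abs(y[:, None] - y[None, :])
    dyaw = np.minimum(dyaw, 360.0 - dyaw)  # yaw wraps around at 360 degrees
    dpitch = np.abs(p[:, None] - p[None, :])
    return dyaw + dpitch


def plan_viewpoints(locmaps, yaws_deg, pitches_deg, smooth_weight=0.01):
    """Choose one (yaw, pitch) per waypoint by dynamic programming.

    locmaps: array of shape (T, n_yaw, n_pitch); hypothetical scalar scores,
             where a higher value means more reliable localization.
    Minimizes sum_t(-score_t) + smooth_weight * sum_t(angular change).
    """
    locmaps = np.asarray(locmaps, dtype=float)
    T = locmaps.shape[0]
    unary = -locmaps.reshape(T, -1)  # localization cost per grid cell
    pair = smooth_weight * angular_cost(yaws_deg, pitches_deg)  # transition cost

    cost = unary[0].copy()
    back = np.zeros((T, unary.shape[1]), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + pair  # total[i, j]: best cost at i, then move i -> j
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(total.shape[1])] + unary[t]

    # Backtrack from the cheapest final cell.
    idx = int(np.argmin(cost))
    path = [idx]
    for t in range(T - 1, 0, -1):
        idx = int(back[t, idx])
        path.append(idx)
    path.reverse()

    n_pitch = len(pitches_deg)
    return [(float(yaws_deg[k // n_pitch]), float(pitches_deg[k % n_pitch])) for k in path]
```

With `smooth_weight=0` the planner reduces to picking the best-scoring cell independently at each waypoint; larger values trade some predicted localization quality for smaller camera rotations between consecutive waypoints.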

BibTeX

@misc{li2025actloclearninglocalizeactive,
  title={ActLoc: Learning to Localize on the Move via Active Viewpoint Selection},
  author={Jiajie Li and Boyang Sun and Luca Di Giammarino and Hermann Blum and Marc Pollefeys},
  year={2025},
  eprint={2508.20981},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2508.20981},
}