TY - GEN
T1 - A Landmark-Aware Visual Navigation Dataset for Map Representation Learning
AU - Johnson, Faith
AU - Dana, Kristin
AU - Cao, Bryan Bo
AU - Jain, Shubham
AU - Ashok, Ashwin
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Map representations learned by expert demonstrations have shown promising research value. However, the field of visual navigation still faces challenges due to the lack of real-world human-navigation datasets that can support efficient, supervised, representation learning of environments. We present a Landmark-Aware Visual Navigation (LAVN) dataset to allow for supervised learning of human-centric exploration policies and map building. We collect RGBD observation and human point-click pairs as a human annotator explores virtual and real-world environments with the goal of full coverage exploration of the space. The human annotators also provide distinct landmark examples along each trajectory, which we intuit will simplify the task of map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore in environments. Our dataset covers a wide spectrum of scenes, including rooms in indoor environments, as well as walkways outdoors. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386) and a plan for long-term preservation.
AB - Map representations learned by expert demonstrations have shown promising research value. However, the field of visual navigation still faces challenges due to the lack of real-world human-navigation datasets that can support efficient, supervised, representation learning of environments. We present a Landmark-Aware Visual Navigation (LAVN) dataset to allow for supervised learning of human-centric exploration policies and map building. We collect RGBD observation and human point-click pairs as a human annotator explores virtual and real-world environments with the goal of full coverage exploration of the space. The human annotators also provide distinct landmark examples along each trajectory, which we intuit will simplify the task of map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore in environments. Our dataset covers a wide spectrum of scenes, including rooms in indoor environments, as well as walkways outdoors. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386) and a plan for long-term preservation.
KW - Dataset
KW - Gaze Behavior Generation
KW - Graph Representation
KW - Human-in-the-Loop
KW - Implicit Behavior Cloning
KW - Landmark
KW - Map Representation
KW - Visual Navigation
UR - https://www.scopus.com/pages/publications/105004877682
U2 - 10.1109/HRI61500.2025.10974038
DO - 10.1109/HRI61500.2025.10974038
M3 - Conference contribution
AN - SCOPUS:105004877682
T3 - ACM/IEEE International Conference on Human-Robot Interaction
SP - 1026
EP - 1031
BT - HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction
PB - IEEE Computer Society
T2 - 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025
Y2 - 4 March 2025 through 6 March 2025
ER -