Habitat Navigation Challenge 2023

Overview

In 2023, we are hosting the ObjectNav and ImageNav challenges in the Habitat simulator [1].

Task #1: ObjectNav focuses on egocentric object/scene recognition and a commonsense understanding of object semantics (where is a bed typically located in a house?).

Task #2: ImageNav focuses on visual reasoning and embodied instance disambiguation (is the particular chair I observe the same one depicted by the goal image?).

For details on how to participate, submit, and train agents, refer to the github.com/habitat-challenge repository.

New in 2023

  • We are instantiating ObjectNav on a new version of the HM3D-Semantics dataset called HM3D-Semantics v0.2.
  • We are announcing the ImageNav track, also on the HM3D-Semantics v0.2 scene dataset.
  • We are introducing several changes to the agent configuration for easier sim-to-real transfer: we are using the Hello Robot Stretch robot configuration with support for a continuous action space, and we are updating the dataset so that all episodes can be navigated without traversing between floors.

Task 1: ObjectNav

In ObjectNav, an agent is initialized at a random starting position and orientation in an unseen environment and asked to find an instance of an object category (‘find a chair’) by navigating to it. A map of the environment is not provided and the agent must only use its sensory input to navigate.

The agent is modeled after the Hello Stretch robot and equipped with an RGB-D camera and a (noiseless) GPS+Compass sensor.
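
To make the task setup concrete, below is a minimal sketch of the agent loop that gets evaluated: at each step the agent receives egocentric observations and returns an action. The observation keys and the velocity-style action dictionary are assumptions for illustration only; the exact interface is defined by the starter code in the github.com/habitat-challenge repository.

from typing import Any, Dict

class ForwardOnlyAgent:
    """Trivial baseline: drive straight ahead, then issue STOP after a step budget."""

    def __init__(self, max_steps: int = 500) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> None:
        # Called at the start of every episode.
        self.steps = 0

    def act(self, observations: Dict[str, Any]) -> Dict[str, Any]:
        rgb = observations["rgb"]          # egocentric color image from the RGB-D camera
        depth = observations["depth"]      # egocentric depth map from the RGB-D camera
        gps = observations["gps"]          # position relative to the episode start (noiseless)
        compass = observations["compass"]  # heading relative to the episode start (noiseless)
        self.steps += 1
        if self.steps >= self.max_steps:
            # STOP is a separate action; a value greater than 0 ends the episode.
            return {"linear_velocity": 0.0, "angular_velocity": 0.0, "stop": 1.0}
        return {"linear_velocity": 1.0, "angular_velocity": 0.0, "stop": -1.0}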

Dataset

The 2023 ObjectNav challenge uses 216 scenes from the Habitat-Matterport3D (HM3D) Semantics v0.2 [2] dataset, with train/val/test splits of 145/36/35 scenes. Following Chaplot et al. [3], we use 6 object goal categories: chair, couch, potted plant, bed, toilet, and tv. All episodes can be navigated without traversing between floors.

Task 2: ImageNav

In ImageNav, an agent is initialized at a random start pose in an unseen environment and given an RGB goal image. We adopt the InstanceImageNav [4] task definition where the goal image depicts a particular object instance and the agent is asked to navigate to that object. The goal camera is disentangled from the agent’s camera; sampled parameters such as height, look-at-angle, and field-of-view reflect the realistic use case of a user-supplied goal image.
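
As an illustration of the disentangled goal camera, the short sketch below samples the three parameters mentioned above. The ranges are placeholder assumptions for illustration only, not the values used to generate the challenge episodes (see Krantz et al. [4] for the actual procedure).

import random

def sample_goal_camera(rng: random.Random) -> dict:
    # Hypothetical ranges; the real episode generator uses its own distributions.
    return {
        "height_m": rng.uniform(0.8, 1.5),              # camera height above the floor
        "look_at_angle_deg": rng.uniform(-30.0, 30.0),  # pitch toward the target object
        "hfov_deg": rng.uniform(60.0, 120.0),           # horizontal field of view
    }

print(sample_goal_camera(random.Random(0)))  # reproducible parameters for one episode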

The agent is modeled after the Hello Stretch robot and equipped with an RGB-D camera and a (noiseless) GPS+Compass sensor.

Dataset

The 2023 ImageNav challenge uses 216 scenes from the Habitat-Matterport3D (HM3D) Semantics v0.2 [2] dataset, with train/val/test splits of 145/36/35 scenes. Following Krantz et al. [4], we sample goal images depicting object instances belonging to the same 6 goal categories used in the ObjectNav challenge: chair, couch, potted plant, bed, toilet, and tv. All episodes can be navigated without traversing between floors.

Evaluation

As in the 2022 Habitat Challenge, we measure performance along the same two axes specified by Anderson et al. [5]:

  • Success: Did the agent navigate to an instance of the goal object? (Notice: any instance, regardless of distance from starting location.)

Concretely, an episode is deemed successful if, on outputting a STOP command (a separate action with value greater than 0 when using velocity control), the agent is within 1.0 m Euclidean distance of any instance of the target object category AND the object can be viewed by an oracle from that stopping position by turning the agent or looking up/down. Notice: we do NOT require the agent to actually be viewing the object at the stopping location, only that such oracle-visibility is possible without moving. Why? Because we want participants to focus on navigation, not object framing. In the larger goal of Embodied AI, the agent navigates to an object instance in order to interact with it (say, to point at or manipulate it). Oracle-visibility is our proxy for ‘the agent is close enough to interact with the object’.
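
The sketch below restates this success test in code. It assumes the evaluator already provides the Euclidean distance from the stopping position to each instance of the goal category and a per-instance oracle-visibility flag; the 1.0 m threshold mirrors the definition above, everything else is illustrative.

from typing import Sequence

SUCCESS_DISTANCE_M = 1.0

def episode_success(called_stop: bool,
                    distances_m: Sequence[float],
                    oracle_visible: Sequence[bool]) -> bool:
    # Success requires STOP to be called within 1.0 m of some goal instance that an
    # oracle could view from the stopping position without moving the agent's base.
    if not called_stop:
        return False
    return any(d <= SUCCESS_DISTANCE_M and vis
               for d, vis in zip(distances_m, oracle_visible))

assert episode_success(True, [0.7, 4.2], [True, False])   # stopped 0.7 m from a visible chair
assert not episode_success(True, [0.7], [False])          # close, but not oracle-visible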

  • SPL: How efficient was the agent’s path compared to an optimal path? (Notice: optimal path = shortest path from the agent’s starting position to the closest instance of the target object category.)

After the episode ends, the agent is evaluated using the ‘Success weighted by Path Length’ (SPL) metric [5].

\begin{array}{rcl} \text{SPL} & = & \cfrac{1}{N} \displaystyle \sum_{i=1}^{N} S_i \cfrac{l_i}{\max(p_i, l_i)} \\[15pt] \text{where } l_i & = & \text{length of the shortest path from the agent's start position to the goal in episode } i \\ p_i & = & \text{length of the path taken by the agent in episode } i \\ S_i & = & \text{binary indicator of success in episode } i \end{array}

ObjectNav-SPL is defined analogously to PointNav-SPL. The key difference is that the shortest path is computed to the object instance closest to the agent’s start location. Thus, if an agent spawns very close to ‘chair1’ but stops at a distant ‘chair2’, it will achieve 100% success (because it found a ‘chair’) but a fairly low SPL (because its path is much longer than the oracle path).

ImageNav-SPL is defined analogously to PointNav-SPL, except that the shortest path is computed to the image-goal viewpoint closest to the agent’s start location.
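
For concreteness, here is a direct transcription of the SPL formula above, with the shortest path l_i taken to the goal instance (ObjectNav) or image-goal viewpoint (ImageNav) closest to the start position.

from typing import Sequence

def spl(successes: Sequence[bool],
        shortest_path_lengths: Sequence[float],
        agent_path_lengths: Sequence[float]) -> float:
    # Success weighted by Path Length, averaged over N episodes.
    terms = [float(s_i) * l_i / max(p_i, l_i)
             for s_i, l_i, p_i in zip(successes, shortest_path_lengths, agent_path_lengths)]
    return sum(terms) / len(terms)

# A successful episode with a path twice the oracle length contributes 0.5;
# averaged with one failure, SPL over the two episodes is 0.25.
print(spl([True, False], [5.0, 8.0], [10.0, 8.0]))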

We reserve the right to use additional metrics to choose winners in case of statistically insignificant SPL differences.

Participation Guidelines

Participate in the contest by registering on the EvalAI challenge page and creating a team. Participants will upload Docker containers with their agents, which are evaluated on an AWS GPU-enabled instance. Before pushing a submission for remote evaluation, participants should test the submission Docker image locally to make sure it works. Instructions for training, local evaluation, and online submission are provided below.

Valid challenge phases are habitat-{objectnav,imagenav}-{minival, test-standard, test-challenge}-2023-{challenge_id}.

The challenge consists of the following phases:

  1. Minival phase: This split is the same as the one used in ./test_locally_{objectnav,imagenav}_rgbd.sh. The purpose of this phase is sanity checking: to confirm that our remote evaluation reports the same results as the ones you see locally. Each team is allowed a maximum of 100 submissions per day for this phase, but please use them judiciously. We will block and disqualify teams that spam our servers.
  2. Test Standard phase: The purpose of this phase/split is to serve as the public leaderboard establishing the state of the art; this is what should be used to report results in papers. Each team is allowed a maximum of 10 submissions per day for this phase, but again, please use them judiciously. Don’t overfit to the test set.
  3. Test Challenge phase: This split will be used to decide the challenge winners. Each team is allowed a total of 5 submissions until the end of the challenge submission phase. The highest-performing of these 5 will be chosen automatically. Results on this split will not be made public until the announcement of the final results at the Embodied AI workshop at CVPR.

Note: Your agent will be evaluated on 1000 episodes and will have a total of 48 hours to finish. Submissions will be evaluated on an AWS EC2 p2.xlarge instance, which has a Tesla K80 GPU (12 GB memory), 4 CPU cores, and 61 GB RAM. If you need more time or resources to evaluate your submission, please get in touch.

For more details on how to participate, submit, and train agents, refer to the github.com/habitat-challenge repository.

Dates

Challenge starts March 13, 2023
Leaderboard opens March 20, 2023
Challenge submission deadline May 31, 2023

Citing Habitat Challenge 2023

@misc{habitatchallenge2023,
  title         =     {Habitat Challenge 2023},
  author        =     {Karmesh Yadav and Jacob Krantz and Ram Ramrakhya and Santhosh Kumar Ramakrishnan and Jimmy Yang and Austin Wang and John Turner and Aaron Gokaslan and Vincent-Pierre Berges and Roozbeh Mottaghi and Oleksandr Maksymets and Angel X Chang and Manolis Savva and Alexander Clegg and Devendra Singh Chaplot and Dhruv Batra},
  howpublished  =     {\url{https://aihabitat.org/challenge/2023/}},
  year          =     {2023}
}

Acknowledgments

The Habitat Challenge would not have been possible without the infrastructure and support of the EvalAI team. We also thank the teams behind the HM3D and HM3D-Semantics datasets.

References

1. Habitat: A Platform for Embodied AI Research. Manolis Savva*, Abhishek Kadian*, Oleksandr Maksymets*, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra. ICCV, 2019.
2. Habitat-Matterport 3D Semantics Dataset (HM3DSem). Karmesh Yadav*, Ram Ramrakhya*, Santhosh Kumar Ramakrishnan*, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg^, Devendra Singh Chaplot^. arXiv:2210.05633, 2022.
3. Object Goal Navigation using Goal-Oriented Semantic Exploration. Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov. NeurIPS, 2020.
4. Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances. Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot. arXiv:2211.15876, 2022.
5. On Evaluation of Embodied Navigation Agents. Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir. arXiv:1807.06757, 2018.

Organizer and Sponsor

Facebook AI Research