Habitat Challenge 2021

Overview

In 2021, we are continuing to host challenges on two embodied navigation tasks in Habitat:

  1. PointNav (‘Go 5m north, 3m west relative to start’)
  2. ObjectNav (‘find a chair’).

Task #1: PointNav focuses on realism and sim2real predictivity (the ability to predict the performance of a nav-model on a real robot from its performance in simulation).

Task #2: ObjectNav focuses on egocentric object/scene recognition and a commonsense understanding of object semantics (where is a fireplace typically located in a house?).

For details on how to participate, submit and train agents refer to github.com/habitat-challenge repository.

Task 1: PointNav

In PointNav, an agent is spawned at a random starting position and orientation in an unseen environment and asked to navigate to target coordinates specified relative to the agent’s start location (‘Go 5m north, 3m west relative to start’). No ground-truth map is available and the agent must only use its sensory input (an RGB-D camera) to navigate.

Dataset

We use Gibson 3D scenes for the challenge. As in the 2020 Habitat challenge, we use the Gibson dataset’s splits, retaining the train and val sets, and separating the test set into test-standard and test-challenge. The train and val scenes are provided to participants. The test scenes are used for the official challenge evaluation and are not provided to participants.

Evaluation

After calling the STOP action, the agent is evaluated using the ‘Success weighted by Path Length’ (SPL) metric2.

\begin{array}{rcl} \text{SPL} & = & \cfrac{1}{N} \displaystyle \sum_{i=1}^N S_i \cfrac{l_i}{\max(p_i, l_i)} \\[15pt] \text{where}, l_i & = & \text{length of shortest path between goal and target for an episode} \\ p_i & = & \text{length of path taken by agent in an episode} \\ S_i & = & \text{binary indicator of success in episode} ~ i \end{array}

An episode is deemed successful if on calling the STOP action, the agent is within 0.36m (2x agent-radius) of the goal position.

Task 2: ObjectNav

In ObjectNav, an agent is initialized at a random starting position and orientation in an unseen environment and asked to find an instance of an object category (‘find a chair’) by navigating to it. A map of the environment is not provided and the agent must only use its sensory input to navigate.

The agent is equipped with an RGB-D camera and a (noiseless) GPS+Compass sensor. GPS+Compass sensor provides the agent’s current location and orientation information relative to the start of the episode. We attempt to match the camera specification (field of view, resolution) in simulation to the Azure Kinect camera, but this task does not involve any injected sensing noise.

Dataset

We use 90 of the Matterport3D scenes (MP3D) with the standard splits of train/val/test as prescribed by Anderson et al.2. MP3D contains 40 annotated categories. We hand-select a subset of 21 by excluding categories that are not visually well defined (like doorways or windows) and architectural elements (like walls, floors, and ceilings).

Evaluation

Similar to 2020 Habitat Challenge, we generalize the PointNav evaluation protocol used by 123 to ObjectNav. At a high-level, we measure performance along the same two axes:

  • Success: Did the agent navigate to an instance of the goal object? (Notice: any instance, regardless of distance from starting location.)
  • Efficiency: How efficient was the agent’s path compared to an optimal path? (Notice: optimal path = shortest path from the agent’s starting position to the closest instance of the target object category.)

Concretely, an episode is deemed successful if on calling the STOP action, the agent is within 1.0m Euclidean distance from any instance of the target object category AND the object can be viewed by an oracle from that stopping position by turning the agent or looking up/down. Notice: we do NOT require the agent to be actually viewing the object at the stopping location, simply that the such oracle-visibility is possible without moving. Why? Because we want participants to focus on navigation not object framing. In Embodied AI’s larger goal, the agent is navigating to an object instance to interact with it (say point at or manipulate an object). Oracle-visibility is our proxy for ‘the agent is close enough to interact with the object’.

ObjectNav-SPL is defined analogous to PointNav-SPL. The only key difference is that the shortest path is computed to the object instance closest to the agent start location. Thus, if an agent spawns very close to ‘chair1’ but stops at a distant ‘chair2’, it will be achieve 100% success (because it found a ‘chair’) but a fairly low SPL (because the agent path is much longer compared to the oracle path).

New in 2021

The results of Habitat Challenge 2020 indicate that these benchmarks are far from being solved or stagnated. Thus, the task specifications remained unchanged except for the agent’s camera’s tilt angle for the PointNav task. The agent can observe the area in front of it as the agent’s camera has now tilted.

We reserve the right to use additional metrics to choose winners in case of statistically insignificant SPL differences.

Participation Guidelines

Participate in the contest by registering on the EvalAI challenge page and creating a team. Participants will upload docker containers with their agents that evaluated on a AWS GPU-enabled instance. Before pushing the submissions for remote evaluation, participants should test the submission docker locally to make sure it is working. Instructions for training, local evaluation, and online submission are provided below.

Valid challenge phases are habitat21-{pointnav, objectnav}-{minival, test-std, test-ch}.

The challenge consists of the following phases:

  1. Minival phase: This split is same as the one used in ./test_locally_{pointgoal, objectnav}_rgbd.sh. The purpose of this phase is sanity checking — to confirm that our remote evaluation reports the same result as the one you’re seeing locally. Each team is allowed maximum of 100 submissions per day for this phase, but please use them judiciously. We will block and disqualify teams that spam our servers.
  2. Test Standard phase: The purpose of this phase/split is to serve as the public leaderboard establishing the state of the art; this is what should be used to report results in papers. Each team is allowed maximum of 10 submissions per day for this phase, but again, please use them judiciously. Don’t overfit to the test set.
  3. Test Challenge phase: This split will be used to decide challenge winners. Each team is allowed total of 5 submissions until the end of challenge submission phase. The highest performing of these 5 will be automatically chosen. Results on this split will not be made public until the announcement of final results at the Embodied AI workshop at CVPR.

Note: Your agent will be evaluated on 1000-2000 episodes and will have a total available time of 48 hours to finish. Your submissions will be evaluated on AWS EC2 p2.xlarge instance which has a Tesla K80 GPU (12 GB Memory), 4 CPU cores, and 61 GB RAM. If you need more time/resources for evaluation of your submission please get in touch. If you face any issues or have questions you can ask them on the habitat-challenge forum (coming soon).

Citing Habitat Challenge 2021

Please cite the following paper for details about the 2021 PointNav challenge:

@inproceedings{habitat2020sim2real,
  title     =     {Sim2{R}eal {P}redictivity: {D}oes {E}valuation in {S}imulation {P}redict {R}eal-{W}orld {P}erformance?},
  author    =     {{Abhishek Kadian*} and {Joanne Truong*} and Aaron Gokaslan and Alexander Clegg and Erik Wijmans and Stefan Lee and Manolis Savva and Sonia Chernova and Dhruv Batra},
  journal   =   {IEEE Robotics and Automation Letters},
  year      =   {2020},
  volume    =   {5},
  number    =   {4},
  pages     =   {6670-6677},
}

Please cite the following paper for details about the 2021 ObjectNav challenge:

@inproceedings{batra2020objectnav,
  title     =     {Object{N}av {R}evisited: {O}n {E}valuation of {E}mbodied {A}gents {N}avigating to {O}bjects},
  author    =     {Dhruv Batra and Aaron Gokaslan and Aniruddha Kembhavi and Oleksandr Maksymets and Roozbeh Mottaghi and Manolis Savva and Alexander Toshev and Erik Wijmans},
  booktitle =     {arXiv:2006.13171},
  year      =     {2020}
}

Acknowledgments

The Habitat challenge would not have been possible without the infrastructure and support of EvalAI team. We also thank the work behind Gibson and Matterport3D datasets.

References

1.
^ Habitat: A Platform for Embodied AI Research. Manolis Savva*, Abhishek Kadian*, Oleksandr Maksymets*, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra. IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
2.
^ a b c On evaluation of embodied navigation agents. Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir. arXiv:1807.06757, 2018.
3.
^ Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation. Abhishek Kadian*, Joanne Truong*, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra. arXiv:1912.06321, 2019.

Prizes

The winning team for each of the PointNav and ObjectNav tracks will receive $5k Amazon Web Services Cloud credits. Thank you for the support AWS!

Dates

Challenge starts February 17, 2021
Leaderboard opens March 1, 2021
Challenge submission deadline May 31, 2021

Organizer

Facebook AI Research

Participation

For details on how to participate, submit , and train agents, refer to github.com/habitat-challenge repository.

Please note that the latest submission to the test-challenge split will be used for final evaluation.

Sponsors

Facebook AI Research
Amazon Web Services