NeurIPS 2023 HomeRobot: Open Vocabulary Mobile Manipulation (OVMM) Challenge
Coming Soon! Hi, you’ve found this page early. Assets will be released in mid-June at the launch of the competition, but if you have a Hello Robot Stretch, you can begin exploring the HomeRobot GitHub now!
Overview

The objective of the HomeRobot: Open Vocabulary Mobile Manipulation (OVMM) Challenge is to create a platform that enables researchers to develop agents that can navigate unfamiliar environments, manipulate novel objects, and move away from closed object classes towards open-vocabulary natural language. This challenge aims to facilitate cross-cutting research in embodied AI using recent advances in machine learning, computer vision, natural language, and robotics.
The HomeRobot OVMM challenge provides:
- a realistic goal description with challenging multi-step tasks
- multi-room interactive environments
- a large and diverse object asset library, in support of an open-vocabulary mobile-manipulation task
- a continuous action space for both navigation and manipulation
- a real-world counterpart benchmark
Hello Robot is sponsoring the challenge, so the winner will receive their own Stretch robot (a $25k value)!
Data
We have designed 3D simulated homes in Habitat. These environments are realistically complex and cluttered with articulated objects, receptacles, and a large array of assets. The Hello Robot Stretch can then be trained and evaluated in simulation with the same control stack as on physical hardware. Agents that can accurately navigate around, search for, and manipulate objects in simulation will be transferred to the real-world benchmark. No static data or human traces are provided. Participants are encouraged to utilize the rich simulated environment for fast iteration and training of reinforcement learning-based policies.
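For a feel of how training in simulation typically proceeds, the sketch below shows a standard habitat-lab environment loop. The config path is a placeholder; the actual OVMM task configuration and wrappers are provided in the HomeRobot repository.

```python
import habitat

# Placeholder config path: the real OVMM task configs ship with the
# HomeRobot / habitat-lab repositories.
config = habitat.get_config("path/to/ovmm_task_config.yaml")
env = habitat.Env(config=config)

observations = env.reset()
while not env.episode_over:
    # A random action stands in for your policy's output.
    observations = env.step(env.action_space.sample())
env.close()
```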

Environments
The evaluation is in two phases: Stage 1 is simulation, while Stage 2 is run on our physical hardware in FAIR’s mock apartment space in Fremont, CA. The challenge is set up such that APIs, and therefore models, can be seamlessly shared between the simulation benchmark and physical hardware. Our goal, in part, is to enable researchers without access to hardware to evaluate their approaches in the real world.
Stage 1: Simulation 1
Below are example scenes from the simulation benchmark. Participants do not need a robot to compete; this stage is performed entirely in simulation. We will then take the best-performing simulation systems and run them on our held-out physical hardware environment (see below).
The environments all contain multiple rooms with varying layouts. For the examples here, we have removed small objects and clutter that will be present in the actual evaluation.
Stage 2: Physical Robot 2
Once systems have been tested in simulation, we will transfer the most competitive entries to a physical robot. Below you can see part of the physical robot evaluation environment hosted by Meta in Fremont, California. The unseen evaluation settings will change or introduce new furniture, object locations, and object types (not pictured).

The software stack is designed to be easy to use and has therefore also been tested in university labs. We invite you to try the stack even if you do not plan to submit to the challenge. To help advance robotics research and fair benchmarking, our stack is designed to quickly onboard new researchers. Everything is written in Python, with no knowledge of ROS required. The full open-source software stack is available at github.com/facebookresearch/home-robot.
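To illustrate the level of abstraction this enables, here is a minimal agent skeleton. The class and method names below (RandomOVMMAgent, reset, act) are assumptions for illustration only; consult the home-robot repository for the actual agent interfaces used by the challenge.

```python
import numpy as np

class RandomOVMMAgent:
    """Illustrative agent skeleton; not the official HomeRobot interface."""

    def __init__(self, num_actions: int = 10):
        self.num_actions = num_actions

    def reset(self) -> None:
        """Clear per-episode state (maps, memory, plans) before a new episode."""
        pass

    def act(self, observations: dict) -> int:
        """Map observations (e.g. RGB-D, pose, goal text) to an action.

        A real submission would run perception, language grounding, and
        planning here; this placeholder picks a random discrete action.
        """
        return int(np.random.randint(self.num_actions))
```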
Metrics
Participating entries will be evaluated using the following metrics, both in simulation for the automatic leaderboard and on the physical robot:
- Rearrangement Progress: A trial is successful if, at the end of the trial, the specified object is anywhere on top of a target receptacle of the correct category; anywhere on top of the surface is acceptable. In addition, we report success for each of the five individual sub-tasks: (1) finding the start receptacle, (2) finding the target object on a receptacle, (3) grasping the object, (4) finding the goal receptacle, and (5) placing the object on the goal receptacle (full success); incremental partial credit is awarded for incomplete tasks. If at any point the real robot collides with scene geometry, the task fails immediately.
- Time Taken: The amount of time (in seconds) needed to solve the episode. This will be used as a tie-breaker, especially in the case of real-world evaluations with the same score.
The above metrics will be averaged over the evaluation episodes, and standard errors will be computed across episodes to produce confidence bounds when comparing participant submissions. We will also provide participants with additional analytics, such as the number of interactions with objects.
We reserve the right to use additional metrics to choose winners in case of statistically insignificant differences.
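As a concrete illustration of the averaging and standard-error computation described above, here is a small sketch. The equal per-stage weighting of partial credit is an assumption made for this example, not the official scoring code.

```python
import numpy as np

# One row per evaluation episode; each entry records whether that sub-task
# was completed: (find start receptacle, find object, grasp, find goal
# receptacle, place object).
stage_success = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
], dtype=float)

# Assumed scoring for this sketch: equal credit per stage, so a fully
# successful episode scores 1.0 and partial progress scores proportionally.
episode_scores = stage_success.mean(axis=1)

mean_score = episode_scores.mean()
# Standard error across episodes, used to produce confidence bounds.
std_err = episode_scores.std(ddof=1) / np.sqrt(len(episode_scores))
print(f"Rearrangement progress: {mean_score:.3f} +/- {std_err:.3f}")
```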
Participation Guidelines
Stage 1: Simulation 1
We will use EvalAI 3 to host the challenge (link coming soon!). Participants should register for the competition on the EvalAI challenge page and create a team. Participants need to upload docker containers with their agents.
The challenge will consist of the following phases/splits that allow participants to organize their work:
- Minival phase: The purpose of this phase is sanity checking — to confirm that remote evaluation reports the same result as local evaluation. Each team is allowed up to 100 submissions per day. We will disqualify teams that spam the servers.
- Test standard phase: The purpose of this phase/split is to serve as the public leaderboard establishing the state of the art. This is what should be used to report results in papers. Each team is allowed up to 10 submissions per day, to be used judiciously.
- Test challenge phase: This split will be used to decide which teams proceed to the Stage 2 evaluation. Each team is allowed a total of 5 submissions until the end of the challenge submission phase. The highest-performing of these 5 will be automatically chosen.
Agents will be evaluated on 1000 episodes and will have a total of 48 hours to finish each run, which works out to roughly 2.9 minutes per episode on average. Submissions will be evaluated on an AWS EC2 p2.xlarge instance, which has a Tesla K80 GPU (12 GB memory), 4 CPU cores, and 61 GB RAM.
Note: Before pushing submissions for remote evaluation, participants should test their submission docker container locally to make sure it works.
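Because the 48-hour limit averages out to roughly 173 seconds per episode, it can also be worth profiling your agent locally before submitting. Below is a minimal sketch; run_episode_fn stands in for whatever code runs one full local episode and is not part of the challenge tooling.

```python
import time

# Wall-clock budget from the challenge rules: 48 hours for 1000 episodes.
TOTAL_BUDGET_S = 48 * 3600
NUM_EPISODES = 1000
PER_EPISODE_BUDGET_S = TOTAL_BUDGET_S / NUM_EPISODES  # ~172.8 seconds

def check_episode_time(run_episode_fn, num_trials: int = 10) -> None:
    """Time a few local episodes and warn if the average exceeds the budget."""
    durations = []
    for _ in range(num_trials):
        start = time.perf_counter()
        run_episode_fn()  # placeholder for your local episode runner
        durations.append(time.perf_counter() - start)
    avg = sum(durations) / len(durations)
    print(f"Average episode time: {avg:.1f}s (budget: {PER_EPISODE_BUDGET_S:.1f}s)")
    if avg > PER_EPISODE_BUDGET_S:
        print("Warning: at this rate, 1000 episodes may exceed the 48-hour limit.")
```

Keep in mind that the remote Tesla K80 instance may be slower than your local hardware, so leave some margin.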
Stage 2: Physical Robot 2
A full open-source control stack will be released by Meta AI (the GitHub repository discussed above is currently in beta), in collaboration with Hello Robot, for the hardware challenge. Once systems have been successfully tested and ranked in simulation, we will evaluate the top three teams on a physical robot. The final evaluation scenario is a fully held-out apartment with novel objects. All participants’ code will be run with the same default calibrations for fair comparison. We will reset the scene to several predefined configurations. Robots will be run repeatedly on each scene to allow for demonstration of repeatability and statistical significance testing of the results.
Dates
| Milestone | Date |
| --- | --- |
| Challenge starts | June 19, 2023 |
| Simulator leaderboard opens | July 1, 2023 |
| Leaderboard closes; evaluation on physical robots | Oct 10, 2023 |
| Evaluation results examined | Oct 20, 2023 |
| Winners announced and invited to contribute | Oct 30, 2023 |
| Presentation at NeurIPS 2023 | Dec 2023 |
Citing HomeRobot OVMM Challenge 2023
@misc{homerobotovmmchallenge2023,
  title        = {HomeRobot Open Vocab Mobile Manipulation Challenge 2023},
  author       = {Sriram Yenamandra and Arun Ramachandran and Mukul Khanna and Karmesh Yadav and Devendra Singh Chaplot and Gunjan Chhablani and Alexander Clegg and Theophile Gervet and Vidhi Jain and Ruslan Partsey and Ram Ramrakhya and Andrew Szot and Tsung-Yen Yang and Aaron Edsinger and Charlie Kemp and Binit Shah and Zsolt Kira and Dhruv Batra and Roozbeh Mottaghi and Yonatan Bisk and Chris Paxton},
  howpublished = {\url{https://aihabitat.org/challenge/2023_homerobot_ovmm/}},
  year         = {2023}
}
Acknowledgments
The HomeRobot OVMM Challenge would not have been possible without the infrastructure and support of the Hello Robot 2 and EvalAI 3 teams.
References
- 1. Habitat: A Platform for Embodied AI Research. Manolis Savva*, Abhishek Kadian*, Oleksandr Maksymets*, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra. ICCV, 2019.
- 2. The Design of Stretch: A Compact, Lightweight Mobile Manipulator for Indoor Human Environments. Charles C. Kemp, Aaron Edsinger, Henry M. Clever, Blaine Matulevich. ICRA, 2022.
- 3. EvalAI: Towards Better Evaluation Systems for AI Agents. Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra. SOSP, 2019.
- 4. Benchmarking in Manipulation Research: The YCB Object and Model Set and Benchmarking Protocols. Berk Calli, Aaron Walsman, Arjun Singh, Siddhartha Srinivasa, Pieter Abbeel, Aaron M. Dollar. Robotics and Automation Magazine (RAM), 2015.
Organizer and Sponsor
The challenge is sponsored by Hello Robot, with the HomeRobot software stack and physical-robot evaluation provided by Meta AI.