PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

A Meta FAIR Release

A Large-scale Benchmark for Human-Robot Collaboration

PARTNR comprises 100,000 natural language tasks designed to study multi-agent reasoning and planning.

Semi-automated Task and Evaluation Function Generation

PARTNR utilizes LLMs to generate tasks at scale, incorporating simulation-in-the-loop to ground the LLMs and reduce errors.
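
As a rough illustration of this idea, the sketch below filters LLM-proposed tasks against a simulator so that only tasks grounded in the scene are kept. All names here (`llm_generate_task`, `Simulator.is_grounded`, `generate_grounded_tasks`) are hypothetical placeholders for this sketch, not the actual PARTNR pipeline or API.

```python
# Hypothetical sketch of simulation-in-the-loop task generation.
# These names do not come from the PARTNR codebase; they only
# illustrate filtering LLM output against a simulator.

from dataclasses import dataclass


@dataclass
class Task:
    instruction: str          # natural language instruction
    referenced_objects: list  # objects the instruction mentions


def llm_generate_task(scene_description: str) -> Task:
    """Placeholder for an LLM call that proposes a task for a given scene."""
    raise NotImplementedError


class Simulator:
    """Placeholder simulator that knows which objects exist in the scene."""

    def __init__(self, scene_objects: set):
        self.scene_objects = scene_objects

    def is_grounded(self, task: Task) -> bool:
        # A task is grounded only if every object it references
        # actually exists in the simulated scene.
        return all(obj in self.scene_objects for obj in task.referenced_objects)


def generate_grounded_tasks(scene_description: str, sim: Simulator, n: int) -> list:
    """Keep proposing tasks with the LLM, retaining only simulator-grounded ones."""
    tasks = []
    while len(tasks) < n:
        candidate = llm_generate_task(scene_description)
        if sim.is_grounded(candidate):
            tasks.append(candidate)  # grounded in the scene -> keep
        # otherwise the candidate is discarded, reducing hallucinated tasks
    return tasks
```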

Diverse Reasoning Tasks

PARTNR tasks reflect the characteristics of everyday tasks, including spatial, temporal, and heterogeneous agent capability constraints; a sketch of how a temporal constraint might be checked follows below.
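
For example, a temporal constraint (such as "clear the table before serving dessert") can be verified by comparing the order in which subgoals become satisfied during an episode. The sketch below is a generic illustration under assumed data structures (`satisfied_at`, `temporal_order_satisfied`), not PARTNR's actual evaluation code.

```python
# Hypothetical illustration of checking a temporal ordering constraint.
# `satisfied_at` maps each subgoal name to the simulation step at which it
# was first satisfied (or None if it never was); these structures are
# assumptions for this sketch.

from typing import Dict, List, Optional, Tuple


def temporal_order_satisfied(
    satisfied_at: Dict[str, Optional[int]],
    ordering: List[Tuple[str, str]],
) -> bool:
    """Return True if every (earlier, later) pair of subgoals was satisfied in order."""
    for earlier, later in ordering:
        t_earlier = satisfied_at.get(earlier)
        t_later = satisfied_at.get(later)
        if t_earlier is None or t_later is None:
            return False  # an unsatisfied subgoal fails the task
        if t_earlier >= t_later:
            return False  # required order was violated
    return True


# Example: "clear the table" must happen before "serve dessert".
trace = {"clear_table": 120, "serve_dessert": 340}
print(temporal_order_satisfied(trace, [("clear_table", "serve_dessert")]))  # True
```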

Human-in-the-loop Demonstration and Evaluation

PARTNR enables evaluation of AI agents with real human partners through a human-in-the-loop infrastructure.

Challenging Tasks for SoTA Planners

Our analysis reveals significant limitations in SoTA LLM-based planners, such as poor coordination, failures in task tracking, and failures to recover from errors. While humans solve 93% of PARTNR tasks, LLMs solve only 30%.

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Authors (in alphabetical order): Matthew Chang, Gunjan Chhablani, Alexander Clegg, Mikael Dallaire Cote, Ruta Desai, Michal Hlavac, Vladimir Karashchuk, Jacob Krantz, Roozbeh Mottaghi, Priyam Parashar, Siddharth Patki, Ishita Prasad, Xavier Puig, Akshara Rai, Ram Ramrakhya, Daniel Tran, Joanne Truong, John M. Turner, Eric Undersander, Tsung-Yen Yang