Implements dataset functionality to be used by habitat.EmbodiedTask. habitat.core.dataset abstracts over a collection of habitat.core.Episode instances. Each episode consists of a single instantiation of a habitat.Agent inside habitat.Env.

class habitat.core.dataset.Dataset[source]

Base class for dataset specification.


episodes – list of episodes containing instance information.

filter_episodes(filter_fn: Callable[habitat.core.dataset.Episode, bool]) → habitat.core.dataset.Dataset[source]

Returns a new dataset with only the filtered episodes from the original dataset.


filter_fn – function used to filter the episodes.


the new dataset.
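
The filtering contract described above can be illustrated with a minimal sketch. SketchEpisode and SketchDataset are hypothetical stand-ins written for this example, not the habitat classes themselves; only the filter_episodes name and the idea of returning a new dataset come from the documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchEpisode:
    # Hypothetical stand-in for habitat.core.dataset.Episode.
    episode_id: str
    scene_id: str


@dataclass
class SketchDataset:
    # Hypothetical stand-in for habitat.core.dataset.Dataset.
    episodes: List[SketchEpisode] = field(default_factory=list)

    def filter_episodes(
        self, filter_fn: Callable[[SketchEpisode], bool]
    ) -> "SketchDataset":
        # Build a new dataset; the original episode list is left untouched.
        return SketchDataset([ep for ep in self.episodes if filter_fn(ep)])


dataset = SketchDataset(
    [
        SketchEpisode("0", "scene_a"),
        SketchEpisode("1", "scene_b"),
        SketchEpisode("2", "scene_a"),
    ]
)

# Keep only the episodes that take place in "scene_a".
scene_a_only = dataset.filter_episodes(lambda ep: ep.scene_id == "scene_a")
filtered_ids = [ep.episode_id for ep in scene_a_only.episodes]
```

Here filter_fn receives one episode at a time and returns True for the episodes to keep, which matches the Callable signature shown above.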

from_json(json_str: str, scenes_dir: Optional[str] = None) → None[source]

Creates dataset from json_str. Directory containing relevant graphical assets of scenes is passed through scenes_dir.

  • json_str – JSON string containing episodes information.

  • scenes_dir – directory containing graphical assets relevant for episodes present in json_str.

get_episodes(indexes: List[int]) → List[T][source]

indexes – episode indices in dataset.


list of episodes corresponding to indexes.

get_scene_episodes(scene_id: str) → List[T][source]

scene_id – id of scene in scene dataset.


list of episodes for the scene_id.
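
The two lookups above (by position and by scene) can be sketched on plain data. The dictionaries and the helper names episodes_at and episodes_in_scene are hypothetical and exist only for this illustration; real episodes are Episode objects held by a Dataset.

```python
from typing import Dict, List

# Hypothetical episode records standing in for Episode objects.
episodes: List[Dict[str, str]] = [
    {"episode_id": "0", "scene_id": "scene_a"},
    {"episode_id": "1", "scene_id": "scene_b"},
    {"episode_id": "2", "scene_id": "scene_a"},
]


def episodes_at(indexes: List[int]) -> List[Dict[str, str]]:
    # Same contract as documented for get_episodes: positions in, episodes out.
    return [episodes[i] for i in indexes]


def episodes_in_scene(scene_id: str) -> List[Dict[str, str]]:
    # Same contract as documented for get_scene_episodes.
    return [ep for ep in episodes if ep["scene_id"] == scene_id]


by_index = episodes_at([0, 2])
by_scene = episodes_in_scene("scene_b")
# Unique scene ids, comparable to what the scene_ids property exposes.
unique_scene_ids = sorted({ep["scene_id"] for ep in episodes})
```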

get_splits(num_splits: int, episodes_per_split: Optional[int] = None, remove_unused_episodes: bool = False, collate_scene_ids: bool = True, sort_by_episode_id: bool = False, allow_uneven_splits: bool = False) → List[habitat.core.dataset.Dataset][source]

Returns a list of new datasets, each with a subset of the original episodes. Unless allow_uneven_splits is set, all splits have the same number of episodes. No episode is duplicated across splits.

  • num_splits – the number of splits to create.

  • episodes_per_split – if provided, each split will have up to this many episodes. If it is not provided, each dataset will have len(original_dataset.episodes) // num_splits episodes. If episodes_per_split is provided and is larger than this value, it will be capped to this value.

  • remove_unused_episodes – if true, episodes that were not assigned to any split are removed from the original dataset once the splits are created. This saves memory for large datasets.

  • collate_scene_ids – if true, episodes with the same scene id are next to each other. This saves on overhead of switching between scenes, but means multiple sequential episodes will be related to each other because they will be in the same scene.

  • sort_by_episode_id – if true, episodes are sorted by their episode ID in the returned splits.

  • allow_uneven_splits – if true, the last split can be shorter than the others. This is especially useful for splitting over validation/test datasets in order to make sure that all episodes are copied but none are duplicated.


a list of new datasets, each with their own subset of episodes.
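
The size arithmetic for the even-split case can be sketched as follows. This is a simplified illustration of the documented rules (integer division and the cap on episodes_per_split), not the library's implementation: the function name sketch_even_splits is hypothetical, consecutive slicing is an assumption, and the sketch does not model collate_scene_ids, sort_by_episode_id, or allow_uneven_splits.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sketch_even_splits(
    episodes: Sequence[T],
    num_splits: int,
    episodes_per_split: Optional[int] = None,
) -> List[List[T]]:
    # Default size: integer division, so leftover episodes stay unused.
    even_share = len(episodes) // num_splits
    if episodes_per_split is None:
        size = even_share
    else:
        # A requested size larger than the even share is capped.
        size = min(episodes_per_split, even_share)
    # Consecutive, non-overlapping slices: no episode appears twice.
    return [list(episodes[i * size:(i + 1) * size]) for i in range(num_splits)]


episode_ids = [str(i) for i in range(10)]
splits = sketch_even_splits(episode_ids, num_splits=3)
capped = sketch_even_splits(episode_ids, num_splits=3, episodes_per_split=5)
split_lengths = [len(s) for s in splits]
```

With 10 episodes and 3 splits, each split holds 10 // 3 = 3 episodes and one episode is left over. Requesting 5 episodes per split is capped to the same 3.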

sample_episodes(num_episodes: int) → None[source]

Samples a list of num_episodes episodes from the existing episodes and replaces self.episodes with the sampled list.

num_episodes – number of episodes to sample; pass -1 to use all episodes.
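
A minimal sketch of the sampling behaviour, under two assumptions that the documentation does not state: sampling is done without replacement, and asking for more episodes than exist is an error. The function name sketch_sample and its seed parameter are hypothetical.

```python
import random
from typing import List


def sketch_sample(episodes: List[str], num_episodes: int, seed: int = 0) -> List[str]:
    # -1 means "keep every episode", as documented for num_episodes.
    if num_episodes == -1:
        return list(episodes)
    if num_episodes > len(episodes):
        raise ValueError("cannot sample more episodes than the dataset holds")
    # Assumption: sampling is without replacement.
    return random.Random(seed).sample(episodes, num_episodes)


all_ids = [str(i) for i in range(8)]
sampled = sketch_sample(all_ids, 3)
everything = sketch_sample(all_ids, -1)
```

In the documented method the sampled list replaces self.episodes in place and nothing is returned. The sketch returns the list only to make the result easy to inspect.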

property scene_ids

unique scene ids present in the dataset.

class habitat.core.dataset.Episode(*, episode_id: str = None, scene_id: str = None, start_position: List[float] = None, start_rotation: List[float] = None, info: Optional[Dict[str, str]] = None)[source]

Base class for episode specification that includes the initial position and rotation of the agent, the scene id, and the episode id. This information is provided by a Dataset instance.

  • episode_id – id of episode in the dataset, usually episode number.

  • scene_id – id of scene in dataset.

  • start_position – list of length 3 for cartesian coordinates (x, y, z).

  • start_rotation – list of length 4 for (x, y, z, w) elements of unit quaternion (versor) representing 3D agent orientation (https://en.wikipedia.org/wiki/Versor). The rotation specifying the agent’s orientation is relative to the world coordinate axes.
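
To make the (x, y, z, w) layout of start_rotation concrete, the sketch below builds a unit quaternion for a rotation about a single axis. Choosing the y axis, the helper name yaw_to_quaternion, and the sample coordinates are assumptions made for illustration. The formula itself is the standard axis-angle form of a unit quaternion.

```python
import math
from typing import List


def yaw_to_quaternion(angle_rad: float) -> List[float]:
    # Rotation by angle_rad about the y axis, returned in (x, y, z, w) order.
    half = angle_rad / 2.0
    return [0.0, math.sin(half), 0.0, math.cos(half)]


start_position = [1.0, 0.0, -2.0]  # hypothetical (x, y, z) coordinates
start_rotation = yaw_to_quaternion(math.pi / 2)  # quarter turn about y
identity = yaw_to_quaternion(0.0)  # no rotation

# A valid start_rotation has unit length.
norm = math.sqrt(sum(c * c for c in start_rotation))
```

Note that the scalar part w comes last in this layout, so the identity rotation is written as (0, 0, 0, 1).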