Dataset class
Base class for dataset specification.
Contents
Class methods
- def build_content_scenes_filter(config) -> typing.Callable[[T], bool]
- Returns a filter function that takes an episode and returns True if that episode is valid under the CONTENT_SCENES field of the provided config.
- def get_scenes_to_load(config: habitat.config.default.Config) -> typing.List[str]
- Returns a list of scene names that would be loaded with this dataset.
Static methods
- def scene_from_scene_path(scene_path: str) -> str
- Helper method to get the scene name from an episode.
Methods
- def filter_episodes(self, filter_fn: typing.Callable[[T], bool]) -> Dataset
- Returns a new dataset with only the filtered episodes from the original dataset.
- def from_json(self, json_str: str, scenes_dir: typing.Optional[str] = None) -> None
- Creates dataset from json_str.
- def get_episode_iterator(self, *args: typing.Any, **kwargs: typing.Any) -> typing.Iterator
- Gets episode iterator with options. Options are specified in EpisodeIterator documentation.
- def get_episodes(self, indexes: typing.List[int]) -> typing.List[T]
- def get_scene_episodes(self, scene_id: str) -> typing.List[T]
- def get_splits(self, num_splits: int, episodes_per_split: typing.Optional[int] = None, remove_unused_episodes: bool = False, collate_scene_ids: bool = True, sort_by_episode_id: bool = False, allow_uneven_splits: bool = False) -> typing.List[Dataset]
- Returns a list of new datasets, each with a subset of the original episodes.
- def to_json(self) -> str
Special methods
- def __new__(cls, *args, **kwds)
Properties
- num_episodes: int get
- number of episodes in the dataset
- scene_ids: typing.List[str] get
- unique scene ids present in the dataset.
Data
- episodes: typing.List[T] = None
Method documentation
def habitat.core.dataset.Dataset.get_scenes_to_load(config: habitat.config.default.Config) -> typing.List[str] classmethod
Returns a list of scene names that would be loaded with this dataset.
Useful for determining which scenes to split up among different workers.
Parameters | |
---|---|
config | The config for the dataset. |
Returns | A list of scene names that would be loaded with the dataset. |
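As a rough illustration of the worker use case mentioned above, the sketch below distributes a list of scene names round-robin over a number of workers. The helper `assign_scenes` and the hard-coded scene names are hypothetical; in practice the list would presumably come from `get_scenes_to_load(config)`.

```python
from typing import List


def assign_scenes(scenes: List[str], num_workers: int) -> List[List[str]]:
    """Hypothetical helper: deal scene names out to workers round-robin."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Worker i receives scenes i, i + num_workers, i + 2 * num_workers, ...
    return [scenes[i::num_workers] for i in range(num_workers)]


# Stand-in for the list returned by get_scenes_to_load(config).
scenes = ["scene_a", "scene_b", "scene_c", "scene_d", "scene_e"]
per_worker = assign_scenes(scenes, num_workers=2)
print(per_worker)  # [['scene_a', 'scene_c', 'scene_e'], ['scene_b', 'scene_d']]
```

Each worker can then restrict its own dataset to the scenes it was assigned, so that no scene is loaded by more than one worker.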
def habitat.core.dataset.Dataset.scene_from_scene_path(scene_path: str) -> str staticmethod
Helper method to get the scene name from an episode.
Parameters | |
---|---|
scene_path | The path to the scene, assumed to be formatted as /path/to/<scene_name>.<ext> |
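The path convention described above can be illustrated with a small standard-library sketch. The function `scene_name_from_path` is a hypothetical stand-in written for this example, not the library's own code, and the example path is made up.

```python
import os


def scene_name_from_path(scene_path: str) -> str:
    """Return <scene_name> from a path shaped like /path/to/<scene_name>.<ext>."""
    # basename drops the directories; splitext drops the final extension.
    return os.path.splitext(os.path.basename(scene_path))[0]


print(scene_name_from_path("/data/scenes/apartment_1.glb"))  # apartment_1
```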
def habitat.core.dataset.Dataset.filter_episodes(self,
filter_fn: typing.Callable[[T], bool]) -> Dataset
Returns a new dataset with only the filtered episodes from the original dataset.
Parameters | |
---|---|
filter_fn | function used to filter the episodes. |
Returns | the new dataset. |
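The filtering contract (a predicate in, a new dataset out, original untouched) can be sketched without the library. `Episode` and `MiniDataset` below are hypothetical stand-ins that only mimic the documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    """Hypothetical stand-in for an episode."""
    episode_id: str
    scene_id: str


@dataclass
class MiniDataset:
    """Hypothetical stand-in for Dataset, holding only a list of episodes."""
    episodes: List[Episode] = field(default_factory=list)

    def filter_episodes(self, filter_fn: Callable[[Episode], bool]) -> "MiniDataset":
        # Build a new dataset; the original episode list is left unchanged.
        return MiniDataset([ep for ep in self.episodes if filter_fn(ep)])


original = MiniDataset(
    [Episode("0", "a.glb"), Episode("1", "b.glb"), Episode("2", "a.glb")]
)
only_a = original.filter_episodes(lambda ep: ep.scene_id == "a.glb")
print([ep.episode_id for ep in only_a.episodes])  # ['0', '2']
```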
def habitat.core.dataset.Dataset.from_json(self,
json_str: str,
scenes_dir: typing.Optional[str] = None) -> None
Creates dataset from json_str.
Parameters | |
---|---|
json_str | JSON string containing episodes information. |
scenes_dir | directory containing graphical assets relevant for episodes present in json_str. |
The directory containing the graphical assets of the relevant scenes is passed through scenes_dir.
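A subclass has to decide how json_str is laid out and how scenes_dir is applied. The sketch below is one possible shape, under loud assumptions: the top-level `"episodes"` key, the `scene_id` field, and joining scenes_dir onto each scene path are all choices made for this example, and `MiniDataset` is a hypothetical stand-in rather than the library class.

```python
import json
import os
from typing import Dict, List, Optional


class MiniDataset:
    """Hypothetical stand-in for a Dataset subclass."""

    def __init__(self) -> None:
        self.episodes: List[Dict[str, str]] = []

    def from_json(self, json_str: str, scenes_dir: Optional[str] = None) -> None:
        # Assumed layout: {"episodes": [{"episode_id": ..., "scene_id": ...}, ...]}
        for raw in json.loads(json_str)["episodes"]:
            episode = dict(raw)
            if scenes_dir is not None:
                # Assumed policy: resolve the scene path relative to scenes_dir.
                episode["scene_id"] = os.path.join(scenes_dir, episode["scene_id"])
            self.episodes.append(episode)


payload = json.dumps(
    {
        "episodes": [
            {"episode_id": "0", "scene_id": "gibson/a.glb"},
            {"episode_id": "1", "scene_id": "gibson/b.glb"},
        ]
    }
)
ds = MiniDataset()
ds.from_json(payload, scenes_dir="data/scene_datasets")
```

Like the documented method, the sketch returns None and populates the dataset in place.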
def habitat.core.dataset.Dataset.get_episode_iterator(self,
*args: typing.Any,
**kwargs: typing.Any) -> typing.Iterator
Gets episode iterator with options. Options are specified in EpisodeIterator documentation.
Parameters | |
---|---|
args | positional args for iterator constructor |
kwargs | keyword args for iterator constructor |
Returns | episode iterator with specified behavior |
To further customize iterator behavior for your Dataset subclass, create a customized iterator class like EpisodeIterator and override this method.
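The override pattern described above might look roughly like the following. Both `SeededEpisodeIterator` and `MiniDataset` are hypothetical stand-ins, and the `seed` and `cycle` options are invented for this example; they are not claimed to match the options of EpisodeIterator.

```python
import random
from typing import Any, Iterator, List, Optional


class SeededEpisodeIterator:
    """Hypothetical custom iterator: optional seeded shuffle, optional cycling."""

    def __init__(self, episodes: List[Any], seed: Optional[int] = None,
                 cycle: bool = False) -> None:
        self._episodes = list(episodes)  # copy so the dataset order is untouched
        if seed is not None:
            random.Random(seed).shuffle(self._episodes)
        self._cycle = cycle
        self._index = 0

    def __iter__(self) -> "SeededEpisodeIterator":
        return self

    def __next__(self) -> Any:
        if self._index >= len(self._episodes):
            if not self._cycle or not self._episodes:
                raise StopIteration
            self._index = 0  # start over when cycling
        episode = self._episodes[self._index]
        self._index += 1
        return episode


class MiniDataset:
    """Hypothetical stand-in for a Dataset subclass."""

    def __init__(self, episodes: List[Any]) -> None:
        self.episodes = episodes

    def get_episode_iterator(self, *args: Any, **kwargs: Any) -> Iterator:
        # Forward all options to the custom iterator's constructor.
        return SeededEpisodeIterator(self.episodes, *args, **kwargs)


dataset = MiniDataset(["e0", "e1", "e2"])
in_order = list(dataset.get_episode_iterator())
```

Forwarding `*args` and `**kwargs` unchanged keeps the method signature identical to the base class while letting the iterator define its own options.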
def habitat.core.dataset.Dataset.get_episodes(self,
indexes: typing.List[int]) -> typing.List[T]
Parameters | |
---|---|
indexes | episode indices in dataset. |
Returns | list of episodes corresponding to indexes. |
def habitat.core.dataset.Dataset.get_scene_episodes(self,
scene_id: str) -> typing.List[T]
Parameters | |
---|---|
scene_id | id of scene in scene dataset. |
Returns | list of episodes for the scene_id . |
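The two lookups, by index and by scene, can be pictured with plain lists. The functions below are hypothetical free-standing equivalents written for illustration, and episodes are modeled as dictionaries with an assumed `scene_id` key.

```python
from typing import Dict, List

Episode = Dict[str, str]


def episodes_by_index(episodes: List[Episode], indexes: List[int]) -> List[Episode]:
    # Return the episodes at the requested positions, in the order requested.
    return [episodes[i] for i in indexes]


def episodes_by_scene(episodes: List[Episode], scene_id: str) -> List[Episode]:
    # Return every episode whose scene_id matches, preserving dataset order.
    return [ep for ep in episodes if ep["scene_id"] == scene_id]


episodes = [
    {"episode_id": "0", "scene_id": "a.glb"},
    {"episode_id": "1", "scene_id": "b.glb"},
    {"episode_id": "2", "scene_id": "a.glb"},
]
picked = episodes_by_index(episodes, [2, 0])
in_scene_a = episodes_by_scene(episodes, "a.glb")
```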
def habitat.core.dataset.Dataset.get_splits(self,
num_splits: int,
episodes_per_split: typing.Optional[int] = None,
remove_unused_episodes: bool = False,
collate_scene_ids: bool = True,
sort_by_episode_id: bool = False,
allow_uneven_splits: bool = False) -> typing.List[Dataset]
Returns a list of new datasets, each with a subset of the original episodes.
Parameters | |
---|---|
num_splits | the number of splits to create. |
episodes_per_split | if provided, each split will have up to this many episodes. If it is not provided, each dataset will have len(original_dataset.episodes) // num_splits episodes. If episodes_per_split is provided and is larger than this value, it will be capped to this value. |
remove_unused_episodes | once the splits are created, the extra episodes will be destroyed from the original dataset. This saves memory for large datasets. |
collate_scene_ids | if true, episodes with the same scene id are next to each other. This saves on overhead of switching between scenes, but means multiple sequential episodes will be related to each other because they will be in the same scene. |
sort_by_episode_id | if true, sequences are sorted by their episode ID in the returned splits. |
allow_uneven_splits | if true, the last splits can be shorter than the others. This is especially useful for splitting over validation/test datasets in order to make sure that all episodes are copied but none are duplicated. |
Returns | a list of new datasets, each with their own subset of episodes. |
Unless allow_uneven_splits is true, all splits will have the same number of episodes. In either case, no episodes will be duplicated across splits.
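The size arithmetic for the default case can be sketched as follows. `split_evenly` is a hypothetical helper that covers only num_splits and episodes_per_split; it takes episodes in order and leaves out the other options (remove_unused_episodes, collate_scene_ids, sort_by_episode_id, allow_uneven_splits) and any sampling the real method may perform.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(episodes: Sequence[T], num_splits: int,
                 episodes_per_split: Optional[int] = None) -> List[List[T]]:
    """Hypothetical sketch: equal-sized, non-overlapping splits."""
    if num_splits <= 0 or len(episodes) < num_splits:
        raise ValueError("need at least one episode per split")
    # Default size: floor division, so leftover episodes stay unused.
    stride = len(episodes) // num_splits
    if episodes_per_split is not None:
        # A request larger than the default size is capped to it.
        stride = min(stride, episodes_per_split)
    return [list(episodes[i * stride:(i + 1) * stride]) for i in range(num_splits)]


splits = split_evenly(list(range(10)), num_splits=3)
print(splits)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

With 10 episodes and 3 splits, 10 // 3 gives 3 episodes per split and one episode is left over, which is the case allow_uneven_splits is meant to address.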