DataCollector

minari.DataCollector

class minari.DataCollector(env: Env, step_data_callback: Type[StepDataCallback] = StepDataCallback, episode_metadata_callback: Type[EpisodeMetadataCallback] = EpisodeMetadataCallback, record_infos: bool = False, max_buffer_steps: int | None = None, observation_space=None, action_space=None)[source]

Gymnasium environment wrapper that collects step data.

This wrapper is meant to work as a temporary buffer of the environment data before creating a Minari dataset. The creation of the buffers that will be converted to a Minari dataset is transparent to the user:

import minari
import gymnasium as gym

env = minari.DataCollector(gym.make('EnvID'))

env.reset()

for _ in range(num_steps):
    action = env.action_space.sample()
    obs, rew, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        env.reset()

dataset = env.create_dataset(dataset_id="env_name-dataset_name-v(version)", **kwargs)

Some of the characteristics of this wrapper:

  • The step data is stored per episode in dictionaries, and these dictionaries are kept in-memory in a global list buffer. Each episode dictionary contains list buffers as values for the main episode step datasets: observations, actions, terminations, and truncations. The infos key can hold a list or another nested dictionary with extra datasets. Additional data keys can be added by passing a custom StepDataCallback to the wrapper (see the sketch after this list). When creating the HDF5 file, the list values in the episode dictionary are stored as datasets and the nested dictionaries generate new HDF5 groups.

  • A new episode dictionary buffer is created when the env.step(action) call returns terminated or truncated, or when env.reset() is called. If reset() is called and the previous episode was neither terminated nor truncated, that episode is automatically truncated.

  • To perform caching the user can set max_buffer_steps or max_buffer_episodes, after which the in-memory buffers are saved to a temporary HDF5 file on disk. If neither max_buffer_steps nor max_buffer_episodes is set, the data is moved from memory to a permanent location only when the Minari dataset is created. To move all the stored data to a permanent location use DataCollector.save_to_disk(path_to_permanent_location).
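As a hedged illustration of the custom data-key mechanism mentioned above, the sketch below subclasses StepDataCallback to attach an extra value to every step before it is buffered. The environment_states key and the env.unwrapped.state attribute are hypothetical, and the exact callback signature may vary between Minari versions, so the override simply forwards all keyword arguments to the parent class:

import gymnasium as gym
from minari import DataCollector, StepDataCallback

class CustomStepDataCallback(StepDataCallback):
    def __call__(self, env, **kwargs):
        # Build the default step data (observation, action, reward, ...).
        step_data = super().__call__(env, **kwargs)
        # Hypothetical extra dataset; only works for environments that
        # expose a `state` attribute on the unwrapped environment.
        step_data["environment_states"] = env.unwrapped.state
        return step_data

env = DataCollector(gym.make('EnvID'), step_data_callback=CustomStepDataCallback)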

Initialize the data collector attributes and create the temporary directory for caching.

Parameters:
  • env (gym.Env) – Gymnasium environment

  • step_data_callback (type[StepDataCallback], optional) – Callback class to edit/update the step data before storing it to the buffer. Defaults to StepDataCallback.

  • episode_metadata_callback (type[EpisodeMetadataCallback], optional) – Callback class to add custom metadata to episode group in HDF5 file. Defaults to EpisodeMetadataCallback.

  • record_infos (bool, optional) – If True, record the info dictionary returned by each step. Defaults to False.

  • max_buffer_steps (Optional[int], optional) – number of steps to keep in the in-memory buffers before dumping them to the HDF5 file on disk. Defaults to None.

Raises:

ValueError – max_buffer_steps and max_buffer_episodes can’t be passed at the same time
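A minimal construction sketch using the parameters above; the environment id 'CartPole-v1' and the buffer size are purely illustrative:

import gymnasium as gym
from minari import DataCollector

# Record every step's info dict and flush the in-memory buffers to the
# temporary HDF5 cache every 10_000 steps.
env = DataCollector(gym.make('CartPole-v1'), record_infos=True, max_buffer_steps=10_000)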

Methods

minari.DataCollector.step(self, action: ActType) → tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]

Gymnasium step method.

minari.DataCollector.reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObsType, dict[str, Any]]

Gymnasium environment reset.

For reproducibility, if no seed is set one will be automatically generated, unless minari_autoseed=False is passed in the options dictionary.

Parameters:
  • seed (optional int) – The seed that is used to initialize the environment’s PRNG. If no seed is specified, one will be automatically generated (by default).

  • options (optional dict) – Additional information to specify how the environment is reset. Set minari_autoseed=False to disable automatic seeding.

Returns:
  • observation (ObsType) – Observation of the initial state.

  • info (dictionary) – Auxiliary information complementing observation.
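A brief usage sketch of the reset behaviour described above, assuming env is an environment wrapped with DataCollector:

# Default: Minari auto-generates a seed for reproducibility.
obs, info = env.reset()

# Reproduce a specific episode with an explicit seed.
obs, info = env.reset(seed=42)

# Disable the automatic seeding entirely.
obs, info = env.reset(options={"minari_autoseed": False})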

minari.DataCollector.close(self)

Close the DataCollector.

Clear buffer and close temporary directory.

minari.DataCollector.create_dataset(self, dataset_id: str, eval_env: str | gym.Env | EnvSpec | None = None, algorithm_name: str | None = None, author: str | None = None, author_email: str | None = None, code_permalink: str | None = None, ref_min_score: float | None = None, ref_max_score: float | None = None, expert_policy: Callable[[ObsType], ActType] | None = None, num_episodes_average_score: int = 100, minari_version: str | None = None)

Create a Minari dataset using the data collected from stepping with a Gymnasium environment wrapped with a DataCollector Minari wrapper.

The dataset_id parameter corresponds to the name of the dataset, with the following syntax: (env_name-)(dataset_name)(-v(version)), where env_name identifies the environment used to generate the dataset dataset_name. This dataset_id is later used to load the dataset with minari.load_dataset().

Parameters:
  • dataset_id (str) – name id to identify Minari dataset


  • eval_env (Optional[str|gym.Env|EnvSpec]) – Gymnasium environment (gym.Env), environment id (str), or environment spec (EnvSpec) to use for evaluation with the dataset. After loading the dataset, the environment can be recovered as follows: MinariDataset.recover_environment(eval_env=True). If None, the env used to collect the buffer data should be used for evaluation.

  • algorithm_name (Optional[str], optional) – name of the algorithm used to collect the data. Defaults to None.

  • author (Optional[str], optional) – author that generated the dataset. Defaults to None.

  • author_email (Optional[str], optional) – email of the author that generated the dataset. Defaults to None.

  • code_permalink (Optional[str], optional) – link to relevant code used to generate the dataset. Defaults to None.

  • ref_min_score (Optional[float], optional) – minimum reference score from the average returns of a random policy. This value is later used to normalize a score with minari.get_normalized_score(). If None (default), the value will be estimated with a default random policy.

  • ref_max_score (Optional[float], optional) – maximum reference score from the average returns of a hypothetical expert policy. This value is used in minari.get_normalized_score(). Default None.

  • expert_policy (Optional[Callable[[ObsType], ActType]], optional) – policy to compute ref_max_score by averaging the returns over a number of episodes equal to num_episodes_average_score. ref_max_score and expert_policy can’t be passed at the same time. Defaults to None.

  • num_episodes_average_score (int) – number of episodes to average the returns over to compute ref_min_score and ref_max_score. Defaults to 100.

  • minari_version (Optional[str], optional) – Minari version specifier compatible with the dataset. If None (default) use the installed Minari version.

Returns:

MinariDataset
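A hedged end-to-end sketch of the call, assuming env is the DataCollector wrapper from the collection loop above; the dataset id and the metadata values are placeholders:

dataset = env.create_dataset(
    dataset_id="env_name-dataset_name-v0",
    algorithm_name="random_policy",
    author="Author Name",
    author_email="author@example.com",
    code_permalink="https://example.com/link-to-code",
)

# The dataset is saved locally and can later be retrieved by its id.
import minari
dataset = minari.load_dataset("env_name-dataset_name-v0")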