DataCollector
minari.DataCollector
- class minari.DataCollector(env: Env, step_data_callback: Type[StepDataCallback] = StepDataCallback, episode_metadata_callback: Type[EpisodeMetadataCallback] = EpisodeMetadataCallback, record_infos: bool = False, observation_space: Space | None = None, action_space: Space | None = None, data_format: str | None = None)
Gymnasium environment wrapper that collects step data.
This wrapper is meant to work as a temporary buffer for the environment data before a Minari dataset is created. The buffers that will be converted to a Minari dataset are created transparently, without any extra work from the user:
    import minari
    import gymnasium as gym

    env = minari.DataCollector(gym.make('EnvID'))

    # num_steps and **kwargs are placeholders supplied by the user.
    env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()
        # Pass the sampled action to step().
        obs, rew, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            env.reset()

    dataset = env.create_dataset(dataset_id="env_name/dataset_name-v(version)", **kwargs)
Some of the characteristics of this wrapper:
The step data is stored per episode in dictionaries. These dictionaries are then stored in memory in a global list buffer. Each episode dictionary holds list buffers as values for the main episode step datasets: observations, actions, terminations, and truncations. The infos key can be a list or another nested dictionary with extra datasets.
A new episode dictionary buffer is created when the env.step(action) call returns truncated or terminated, or when env.reset() is called. If reset is called and the previous episode was not truncated or terminated, that episode is automatically marked as truncated.
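The buffering rules above can be modelled with a small standalone sketch. It is not Minari's implementation: the class name `EpisodeBufferSketch` and its methods are hypothetical, it uses only the standard library, and it adds a `rewards` list as an assumption alongside the keys named above.

```python
from __future__ import annotations

from typing import Any


class EpisodeBufferSketch:
    """Toy model of per-episode buffering; not Minari's implementation."""

    def __init__(self) -> None:
        self.episodes: list[dict[str, list[Any]]] = []  # finished episodes
        self._current: dict[str, list[Any]] | None = None  # episode in progress

    def reset(self, obs: Any) -> None:
        cur = self._current
        if cur is not None and cur["actions"]:
            # The previous episode never ended: mark its last step as truncated.
            cur["truncations"][-1] = True
            self.episodes.append(cur)
        # Start a fresh episode buffer holding the initial observation.
        self._current = {
            "observations": [obs],
            "actions": [],
            "rewards": [],  # assumption: rewards are buffered like the other keys
            "terminations": [],
            "truncations": [],
        }

    def step(self, action, obs, reward, terminated, truncated) -> None:
        if self._current is None:
            raise RuntimeError("call reset() before step()")
        cur = self._current
        cur["actions"].append(action)
        cur["observations"].append(obs)
        cur["rewards"].append(reward)
        cur["terminations"].append(terminated)
        cur["truncations"].append(truncated)
        if terminated or truncated:
            # Episode finished: move it to the list buffer.
            self.episodes.append(cur)
            self._current = None


buffer = EpisodeBufferSketch()
buffer.reset(obs=0)
buffer.step(action=1, obs=1, reward=0.0, terminated=False, truncated=False)
buffer.reset(obs=0)  # the unfinished first episode is closed as truncated
buffer.step(action=0, obs=1, reward=1.0, terminated=True, truncated=False)
```

In this sketch each episode holds one more observation than actions, because the observation returned by reset is stored before any step is taken.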
Initialize the data collector attributes and create the temporary directory for caching.
- Parameters:
env (gym.Env) – Gymnasium environment
step_data_callback (type[StepDataCallback], optional) – Callback class to edit/update step data before storing it in the buffer. Defaults to StepDataCallback.
episode_metadata_callback (type[EpisodeMetadataCallback], optional) – Callback class to add custom metadata to episode group in HDF5 file. Defaults to EpisodeMetadataCallback.
record_infos (bool, optional) – If True, record the info dictionary returned by each step. Defaults to False.
observation_space (gym.Space) – Observation space of the dataset. The default value is the environment observation space.
action_space (gym.Space) – Action space of the dataset. The default value is the environment action space.
data_format (str, optional) – Data format used to store the data in the Minari dataset. If None (the default), the default format of MinariStorage is used.
Methods
- minari.DataCollector.step(self, action: ActType) → tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]
Gymnasium step method.
- minari.DataCollector.reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObsType, dict[str, Any]]
Gymnasium environment reset.
If no seed is set, one will be automatically generated for reproducibility, unless minari_autoseed=False is set in the options dictionary.
- Parameters:
seed (int, optional) – The seed used to initialize the environment’s PRNG. If no seed is specified, one is generated automatically by default.
options (dict, optional) – Additional information to specify how the environment is reset. Set minari_autoseed=False to disable automatic seeding.
- Returns:
observation (ObsType) – Observation of the initial state.
info (dict) – Auxiliary information complementing observation.
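The seeding rule described for reset can be sketched as a standalone function. This is a minimal sketch, not Minari's code: `resolve_seed` is a hypothetical helper, and drawing a 32-bit value with `secrets.randbits` is an assumption about how an automatic seed might be produced.

```python
from __future__ import annotations

import secrets
from typing import Any


def resolve_seed(
    seed: int | None = None, options: dict[str, Any] | None = None
) -> int | None:
    """Hypothetical helper mirroring the seeding rule described above."""
    # Automatic seeding is on unless the caller opts out through options.
    autoseed = (options or {}).get("minari_autoseed", True)
    if seed is None and autoseed:
        # No explicit seed: draw a random one so the episode can be replayed.
        return secrets.randbits(32)  # assumption: 32-bit seed
    # An explicit seed is passed through; None means "leave unseeded".
    return seed


explicit = resolve_seed(seed=7)
disabled = resolve_seed(options={"minari_autoseed": False})
generated = resolve_seed()
```

Here `explicit` is the seed that was passed in, `disabled` stays `None` because automatic seeding was switched off, and `generated` is a freshly drawn integer.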
- minari.DataCollector.create_dataset(self, dataset_id: str, eval_env: str | gym.Env | EnvSpec | None = None, algorithm_name: str | None = None, author: str | set | None = None, author_email: str | set | None = None, code_permalink: str | None = None, ref_min_score: float | None = None, ref_max_score: float | None = None, expert_policy: Callable[[ObsType], ActType] | None = None, num_episodes_average_score: int = 100, description: str | None = None, requirements: list | None = None)
Create a Minari dataset using the data collected from stepping with a Gymnasium environment wrapped with a DataCollector Minari wrapper.
The dataset_id parameter corresponds to the name of the dataset, with the following syntax: (namespace/)(env_name/)dataset_name(-v[version]), where env_name identifies the name of the environment used to generate the dataset. The namespace is optional. This dataset_id is used to load the Minari dataset with minari.load_dataset().
- Parameters:
dataset_id (str) – name id to identify Minari dataset
eval_env (str | gym.Env | EnvSpec, optional) – Gymnasium environment (gym.Env), environment id (str), or environment spec (EnvSpec) to use for evaluation with the dataset. After loading the dataset, the environment can be recovered as follows: MinariDataset.recover_environment(eval_env=True). If None, the env used to collect the buffer data should be used for evaluation.
algorithm_name (str, optional) – name of the algorithm used to collect the data. Defaults to None.
author (str | set, optional) – name of the author(s) that generated the dataset. Defaults to None.
author_email (str | set, optional) – email(s) of the author(s) that generated the dataset. Defaults to None.
code_permalink (str, optional) – link to relevant code used to generate the dataset. Defaults to None.
ref_min_score (float, optional) – minimum reference score from the average returns of a random policy. This value is later used to normalize a score with minari.get_normalized_score(). If left as the default None, the value will be estimated with a default random policy.
ref_max_score (float, optional) – maximum reference score from the average returns of a hypothetical expert policy. This value is used in minari.get_normalized_score(). Defaults to None.
expert_policy (Callable[[ObsType], ActType], optional) – policy used to compute ref_max_score by averaging the returns over a number of episodes equal to num_episodes_average_score. ref_max_score and expert_policy can’t be passed at the same time. Defaults to None.
num_episodes_average_score (int) – number of episodes over which the returns are averaged to compute ref_min_score and ref_max_score. Defaults to 100.
description (str, optional) – description of the dataset being created. Defaults to None.
requirements (list of str, optional) – list of pip-style requirements needed to load the environment and reproduce the dataset. For example, mujoco>=3.1.0,<3.2.0 indicates the supported version range for the mujoco package. Defaults to None.
- Returns:
MinariDataset
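To make the dataset_id syntax concrete, the sketch below splits an id into its parts. It is an illustration only and not Minari's own parser: `split_dataset_id` is a hypothetical helper, the example ids are made up, and treating every segment before the last slash as an undifferentiated prefix (namespace and/or env_name) is an assumption.

```python
from __future__ import annotations

import re

# Optional "-v<digits>" suffix on the final path segment.
_VERSION_SUFFIX = re.compile(r"^(?P<name>.+?)-v(?P<version>\d+)$")


def split_dataset_id(dataset_id: str) -> tuple[list[str], str, int | None]:
    """Hypothetical parser: returns (prefix segments, dataset_name, version)."""
    # Everything before the last "/" is the optional namespace/env_name prefix.
    *prefix, last = dataset_id.split("/")
    if not last or any(not part for part in prefix):
        raise ValueError(f"malformed dataset id: {dataset_id!r}")
    match = _VERSION_SUFFIX.match(last)
    if match is None:
        # No version suffix was given.
        return prefix, last, None
    return prefix, match["name"], int(match["version"])


full = split_dataset_id("my_namespace/my_env/my_dataset-v0")
bare = split_dataset_id("my_dataset")
```

With these made-up ids, `full` carries two prefix segments and version 0, while `bare` has an empty prefix and no version.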
- minari.DataCollector.add_to_dataset(self, dataset: MinariDataset)
Add extra data from the collector environment buffers (DataCollector) to an existing Minari dataset.
- Parameters:
dataset (MinariDataset) – Dataset to add the data to
- minari.DataCollector.close(self)
Close the DataCollector.
Clear the buffer and close the temporary directory.