Medium

Description

The data is collected from the PointMaze_Medium-v3 environment. The agent uses a PD controller to follow a path of waypoints generated with QIteration until it reaches the goal. The task is continuing: when the agent reaches the goal, the environment generates a new random goal without resetting the agent's location. The reward function is sparse, returning 1 if the goal is reached and 0 otherwise. To add variance to the collected paths, random noise is added to the actions taken by the agent.
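The control scheme described above can be illustrated with a minimal sketch. This is not the author's generation script: the gains `kp` and `kd`, the noise scale `noise_std`, and the goal `threshold` are hypothetical values chosen for illustration, and the function names are made up here. The sketch assumes the 4-dimensional observation holds position and velocity, and it clips actions to the `[-1, 1]` action box listed in the specs.

```python
import math
import random


def pd_action(pos, vel, waypoint, kp=10.0, kd=1.0, noise_std=0.2, rng=random):
    """Noisy PD action toward a waypoint, clipped to the [-1, 1] action box.

    kp, kd and noise_std are illustrative assumptions, not dataset values.
    """
    action = []
    for p, v, w in zip(pos, vel, waypoint):
        u = kp * (w - p) - kd * v        # proportional pull plus velocity damping
        u += rng.gauss(0.0, noise_std)   # random noise added to the action
        action.append(max(-1.0, min(1.0, u)))
    return tuple(action)


def sparse_reward(achieved_goal, desired_goal, threshold=0.45):
    """Return 1.0 when the goal is reached, 0.0 otherwise.

    The distance threshold is an assumption made for this sketch.
    """
    return 1.0 if math.dist(achieved_goal, desired_goal) <= threshold else 0.0


# One control step from rest toward a waypoint one unit away on each axis.
print(pd_action((0.0, 0.0), (0.0, 0.0), (1.0, -1.0)))
print(sparse_reward((0.0, 0.0), (0.3, 0.0)))
```

Because the task is continuing, a controller like this would simply receive a new waypoint path once the goal changes; the agent's position is not reset in between.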

Dataset Specs


Total Steps	1000000
Total Episodes	4752
Dataset Observation Space	`Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))`
Dataset Action Space	`Box(-1.0, 1.0, (2,), float32)`
Algorithm	QIteration
Author	Rodrigo Perez-Vicente
Email	rperezvicente@farama.org
Code Permalink	https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version	`0.4.3` (supported)
Download	`minari download pointmaze-medium-v2`
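As a rough sanity check on the totals above, 1000000 steps over 4752 episodes averages about 210 steps per episode. The sketch below counts episodes, steps, and goal-reaching steps from any iterable of episode-like objects that expose a `rewards` sequence. The helper name `summarize` and the stand-in episodes are hypothetical and exist only so the example runs without downloading the dataset.

```python
from types import SimpleNamespace


def summarize(episodes):
    """Count episodes, steps, and steps where the sparse reward fired."""
    n_episodes = n_steps = n_goal_steps = 0
    for ep in episodes:
        n_episodes += 1
        n_steps += len(ep.rewards)
        n_goal_steps += sum(1 for r in ep.rewards if r > 0)
    return {"episodes": n_episodes, "steps": n_steps, "goal_steps": n_goal_steps}


# Stand-in episodes (not real data) so the sketch is self-contained.
fake_episodes = [
    SimpleNamespace(rewards=[0.0, 0.0, 1.0]),
    SimpleNamespace(rewards=[0.0, 1.0]),
]
stats = summarize(fake_episodes)
print(stats)
```

With Minari installed and the dataset downloaded, passing `minari.load_dataset('pointmaze-medium-v2').iterate_episodes()` to `summarize` should report episode and step counts that match the table.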

Environment Specs

Note

In addition to the action and observation spaces, the rows of the following table list the Gymnasium environment specifications used to generate the dataset. For the meaning of each parameter, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec

This environment can be recovered from the Minari dataset as follows:

import minari

# Load the locally stored dataset
dataset = minari.load_dataset('pointmaze-medium-v2')
# Rebuild the environment that was used to generate the data
env = dataset.recover_environment()


ID	PointMaze_Medium-v3
Observation Space	`Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))`
Action Space	`Box(-1.0, 1.0, (2,), float32)`
entry_point	`gymnasium_robotics.envs.maze.point_maze:PointMazeEnv`
max_episode_steps	1000000.0
reward_threshold	None
nondeterministic	`False`
order_enforce	`True`
autoreset	`False`
disable_env_checker	`False`
kwargs	`{'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 1, 1, 0, 0, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1], [1, 0, 0, 0, 1, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': True}`
additional_wrappers	`()`
vector_entry_point	`None`
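The `maze_map` in `kwargs` is easier to read when drawn as a grid. The sketch below renders it as text, treating `1` as a wall and anything else as a free cell, and counts the free cells. The `render` helper and the `#`/`.` symbols are choices made for this illustration only.

```python
# maze_map copied from the kwargs row of the table.
MAZE_MAP = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]


def render(maze_map):
    """Draw walls as '#' and every non-wall cell as '.'."""
    return "\n".join(
        "".join("#" if cell == 1 else "." for cell in row) for row in maze_map
    )


# (row, col) indices of every cell the agent can occupy.
free_cells = [
    (r, c)
    for r, row in enumerate(MAZE_MAP)
    for c, cell in enumerate(row)
    if cell != 1
]

print(render(MAZE_MAP))
print(len(free_cells), "free cells")
```

The map is an 8x8 grid with a solid outer wall and 26 free cells. Since `reset_target` is `True` here, goals are not tied to one fixed cell during data collection.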

Evaluation Environment Specs

Note

This environment can be recovered from the Minari dataset as follows:

import minari

# Load the locally stored dataset
dataset = minari.load_dataset('pointmaze-medium-v2')
# Rebuild the evaluation environment stored with the dataset
eval_env = dataset.recover_environment(eval_env=True)
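One way to use the recovered evaluation environment is to roll out a policy and measure how often it reaches the goal. The sketch below counts an episode as a success if the sparse reward is positive at any step. That criterion is an assumption made for this example, not an official metric, and `success_rate` and `StubEnv` are hypothetical names. The stub only mimics the Gymnasium `reset`/`step` signatures so the example runs without installing anything.

```python
def success_rate(env, policy, n_episodes=10):
    """Fraction of episodes in which the sparse reward fired at least once."""
    successes = 0
    for _ in range(n_episodes):
        obs, info = env.reset()
        reached = False
        terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            reached = reached or reward > 0
        successes += int(reached)
    return successes / n_episodes


class StubEnv:
    """Tiny stand-in with the Gymnasium reset/step signature (not PointMaze)."""

    def __init__(self, goal_step, horizon=5):
        self.goal_step = goal_step  # step index at which the reward is 1
        self.horizon = horizon      # episode is truncated after this many steps
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return {"observation": (0.0, 0.0, 0.0, 0.0)}, {}

    def step(self, action):
        self.t += 1
        reward = 1.0 if self.t == self.goal_step else 0.0
        truncated = self.t >= self.horizon
        return {"observation": (0.0, 0.0, 0.0, 0.0)}, reward, False, truncated, {}


print(success_rate(StubEnv(goal_step=3), lambda obs: (0.0, 0.0), n_episodes=4))
```

With the real `eval_env`, the loop would end each episode through truncation at the `max_episode_steps` limit of 600 listed in the table, and `policy` would map the dictionary observation to a 2-dimensional action.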


ID	PointMaze_Medium-v3
Observation Space	`Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))`
Action Space	`Box(-1.0, 1.0, (2,), float32)`
entry_point	`gymnasium_robotics.envs.maze.point_maze:PointMazeEnv`
max_episode_steps	600
reward_threshold	None
nondeterministic	`False`
order_enforce	`True`
autoreset	`False`
disable_env_checker	`False`
kwargs	`{'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 1, 1, 0, 0, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1], [1, 0, 0, 0, 1, 0, 'g', 1], [1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}`
additional_wrappers	`()`
vector_entry_point	`None`
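The evaluation environment differs from the data-collection environment in only a few fields: `max_episode_steps` is 600 instead of 1000000.0, and two entries of `kwargs` change. The sketch below compares the two `kwargs` dictionaries from the tables to make those differences explicit. The helper `diff_kwargs` is a hypothetical name used only for this illustration.

```python
TRAIN_MAP = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
# The evaluation map is identical except for the 'g' marker in row 6, column 6.
EVAL_MAP = [row[:] for row in TRAIN_MAP]
EVAL_MAP[6][6] = "g"

train_kwargs = {"maze_map": TRAIN_MAP, "reward_type": "sparse",
                "continuing_task": True, "reset_target": True}
eval_kwargs = {"maze_map": EVAL_MAP, "reward_type": "sparse",
               "continuing_task": True, "reset_target": False}


def diff_kwargs(train, evaluation):
    """Return the keys whose values differ; maze maps are diffed cell by cell."""
    changes = {}
    for key in sorted(set(train) | set(evaluation)):
        a, b = train.get(key), evaluation.get(key)
        if a == b:
            continue
        if key == "maze_map":
            changes[key] = [
                ((r, c), x, y)
                for r, (row_a, row_b) in enumerate(zip(a, b))
                for c, (x, y) in enumerate(zip(row_a, row_b))
                if x != y
            ]
        else:
            changes[key] = (a, b)
    return changes


changes = diff_kwargs(train_kwargs, eval_kwargs)
print(changes)
```

The result shows a single changed cell, `(6, 6)`, which goes from `0` to `'g'`, together with `reset_target` switching from `True` to `False`. Read together, these suggest that evaluation uses one fixed goal cell rather than resampled goals.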