Medium-Diverse#

Description#

The data is collected from the AntMaze_Medium_Diverse_GR-v4 environment. At the beginning of each episode, the goal and the agent's reset location are sampled from hand-picked cells of the provided maze map. More than 80% of the trajectories reach the goal; the failed ones occur when the Ant flips over and can't stand back up. Also note that when the Ant reaches the goal the episode neither terminates nor generates a new target, so reward keeps accumulating while the Ant stays at the goal. The Ant reaches the goal by following a set of waypoints, tracked with a goal-reaching policy trained using SAC.
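Because the reward is sparse and episodes continue after the goal is reached, a successful trajectory is simply one that accumulated any positive reward. Below is a minimal sketch of how the success rate could be checked from the dataset itself, assuming the Minari ~=0.4 API (load_dataset, iterate_episodes, total_episodes):

import minari

dataset = minari.load_dataset("antmaze-medium-diverse-v0")

# With the sparse reward, any positive return means the Ant reached the goal
# at least once during the episode.
successes = sum(1 for ep in dataset.iterate_episodes() if ep.rewards.sum() > 0)
print(f"Successful episodes: {successes}/{dataset.total_episodes}")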

Dataset Specs#

Total Timesteps: 1000000
Total Episodes: 1000
Dataset Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Dataset Action Space: Box(-1.0, 1.0, (8,), float32)
Algorithm: QIteration+SAC
Author: Alex Davey
Email: amd1g13@soton.ac.uk
Code Permalink: https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version: ~=0.4
Download: minari.download_dataset("antmaze-medium-diverse-v0")
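The dataset has to be downloaded once before it can be loaded locally. A minimal sketch of the download step, assuming the Minari ~=0.4 helpers minari.download_dataset and minari.list_local_datasets:

import minari

# Fetch the dataset from the remote server (only needed once).
minari.download_dataset("antmaze-medium-diverse-v0")

# List the dataset IDs that are now available locally.
print(list(minari.list_local_datasets().keys()))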

Environment Specs#

Note

The following table rows correspond (in addition to the action and observation space) to the Gymnasium environment specification used to generate the dataset. For more detail on what each parameter means, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('antmaze-medium-diverse-v0')
env = dataset.recover_environment()
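The recovered environment follows the standard Gymnasium API, so it can be reset and stepped directly. Below is a minimal sketch that rolls it out with random actions; nothing here is specific to this dataset beyond the dataset ID:

import minari

dataset = minari.load_dataset('antmaze-medium-diverse-v0')
env = dataset.recover_environment()

# Roll out a short trajectory with random actions as a sanity check;
# observations and actions follow the spaces listed below.
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()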

ID: AntMaze_Medium_Diverse_GR-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
autoreset: False
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 'c', 0, 1, 1, 0, 0, 1], [1, 0, 0, 1, 0, 0, 'c', 1], [1, 1, 0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 'c', 1, 0, 0, 1, 0, 1], [1, 0, 0, 0, 1, 'c', 0, 1], [1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None
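In the maze_map above, 1 marks a wall, 0 a free cell, and 'c' a combined cell that can be sampled as either a reset or a goal location; the evaluation spec further below instead fixes a single reset cell 'r' and goal cell 'g'. If you prefer not to go through recover_environment, the same data-collection environment can be built directly from this spec. A sketch, assuming gymnasium_robotics is installed and registers the AntMaze environments on import:

import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- registers the AntMaze_* environments

# Same maze layout as in the kwargs above: 1 = wall, 0 = free, 'c' = combined reset/goal cell.
maze_map = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 'c', 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 'c', 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 'c', 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 'c', 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

env = gym.make(
    "AntMaze_Medium_Diverse_GR-v4",
    maze_map=maze_map,
    reward_type="sparse",
    continuing_task=True,
    reset_target=False,
    max_episode_steps=1000,
)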

Evaluation Environment Specs#

Note

The evaluation environment can be recovered from the Minari dataset by passing eval_env=True, as follows:

import minari

dataset = minari.load_dataset('antmaze-medium-diverse-v0')
eval_env = dataset.recover_environment(eval_env=True)
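Below is a minimal sketch of how the evaluation environment could be used to score a trained agent, using only the standard Gymnasium API; policy is a hypothetical placeholder for whatever offline-RL policy was learned from the dataset:

import numpy as np

import minari

dataset = minari.load_dataset('antmaze-medium-diverse-v0')
eval_env = dataset.recover_environment(eval_env=True)

def policy(obs):
    # Hypothetical placeholder: swap in the trained agent's action selection.
    return eval_env.action_space.sample()

returns = []
for episode in range(10):
    obs, info = eval_env.reset(seed=episode)
    episode_return, terminated, truncated = 0.0, False, False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = eval_env.step(policy(obs))
        episode_return += reward
    returns.append(episode_return)

print(f"Mean return over {len(returns)} episodes: {np.mean(returns):.2f}")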

ID: AntMaze_Medium_Diverse_GR-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
autoreset: False
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 'r', 0, 1, 1, 0, 0, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1], [1, 0, 0, 0, 1, 0, 'g', 1], [1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None