Large-Play

Description

The data is collected from the AntMaze_Large-v4 environment. At the beginning of each episode, random locations are selected for the goal and for the agent's reset position. More than 80% of the trajectories are successful; the failed trajectories occur because the Ant flips over and cannot stand up again. Also note that when the Ant reaches the goal, the episode neither terminates nor generates a new target, so reward keeps accumulating for the remaining steps. The Ant reaches the goal by following a set of waypoints with a goal-reaching policy trained with SAC.
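These properties can be checked directly with the Minari API. The snippet below is a minimal sketch, assuming that with the sparse reward a positive episode return means the goal was reached at least once; it counts successful episodes and reflects the reward accumulation after the goal is hit.

import minari

# Assumes the dataset has already been downloaded (see the Download entry below).
dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')

successes = 0
for episode in dataset.iterate_episodes():
    # Assumption: with the sparse reward, any positive return means the Ant
    # reached the goal at least once; later steps keep accumulating reward
    # because the episode does not terminate on success.
    if episode.rewards.sum() > 0:
        successes += 1

print(f"episodes: {dataset.total_episodes}")
print(f"success rate: {successes / dataset.total_episodes:.2%}")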

Dataset Specs

Total Steps: 1000000
Total Episodes: 1000
Dataset Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Dataset Action Space: Box(-1.0, 1.0, (8,), float32)
Algorithm: QIteration+SAC
Author: Alex Davey
Email: alexdavey0@gmail.com
Code Permalink: https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version: 0.4.3 (supported)
Download: minari download D4RL/antmaze/large-play-v1
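The dataset can also be fetched from Python instead of the command line; a minimal equivalent, assuming the minari package is installed:

import minari

# Downloads the dataset into the local Minari data directory.
minari.download_dataset('D4RL/antmaze/large-play-v1')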

Environment Specs

The following entries correspond to the Gymnasium environment specification used to generate the dataset. To read more about what each parameter means, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')
env = dataset.recover_environment()

ID: AntMaze_Large-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None
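Alternatively, the kwargs listed above can be passed directly to gymnasium.make to rebuild the same environment without going through the dataset. This is a sketch, assuming gymnasium-robotics is installed and registers AntMaze_Large-v4 on import:

import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (importing registers the AntMaze environments)

# The maze layout from the kwargs above: 1 = wall, 0 = free cell.
LARGE_MAZE = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]

env = gym.make(
    'AntMaze_Large-v4',
    maze_map=LARGE_MAZE,
    reward_type='sparse',
    continuing_task=True,
    reset_target=False,
    max_episode_steps=1000,
)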

Evaluation Environment Specs

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')
eval_env = dataset.recover_environment(eval_env=True)

ID: AntMaze_Large-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None
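Finally, a minimal rollout sketch in the recovered evaluation environment, with random actions standing in for a trained policy (the policy itself is not part of the dataset):

import minari

dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')
eval_env = dataset.recover_environment(eval_env=True)

obs, info = eval_env.reset(seed=0)
episode_return = 0.0
for _ in range(1000):
    action = eval_env.action_space.sample()  # stand-in for a trained policy
    obs, reward, terminated, truncated, info = eval_env.step(action)
    episode_return += reward
    if terminated or truncated:
        break

print(f'episode return: {episode_return}')
eval_env.close()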