Large-Play

Description

The data is collected from the AntMaze_Large-v4 environment. At the beginning of each episode, random locations are selected for the goal and for the agent's reset position. More than 80% of the trajectories are successful; failed trajectories occur when the Ant flips over and cannot stand up again. Also note that when the Ant reaches the goal, the episode does not terminate and no new target is generated, so reward keeps accumulating for the remaining steps. The Ant reaches the goal by following a sequence of waypoints with a goal-reaching policy trained using SAC.
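As a rough sanity check, the success rate can be estimated by iterating over the episodes in the dataset. The following is a minimal sketch, assuming the dataset has already been downloaded and that the sparse reward is positive only at steps where the goal is reached:

import minari

dataset = minari.load_dataset('antmaze-large-play-v1')

successful = 0
for episode in dataset.iterate_episodes():
    # With the sparse reward, a positive return means the goal was reached
    # at least once during the episode (assumption for this sketch).
    if episode.rewards.sum() > 0:
        successful += 1

print(f"successful episodes: {successful} / {dataset.total_episodes}")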

Dataset Specs

Total Steps: 1000000
Total Episodes: 1000
Dataset Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Dataset Action Space: Box(-1.0, 1.0, (8,), float32)
Algorithm: QIteration+SAC
Author: Alex Davey
Email: alexdavey0@gmail.com
Code Permalink: https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version: 0.4.3 (supported)
Download: minari download antmaze-large-play-v1
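The dataset can also be fetched from Python instead of the command line; a minimal sketch using Minari's download helper:

import minari

# Downloads the dataset into the local Minari data directory if it is not already present.
minari.download_dataset('antmaze-large-play-v1')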

Environment Specs

Note

The following entries correspond (in addition to the action and observation spaces) to the Gymnasium environment specification used to generate the dataset. For more detail on what each parameter means, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('antmaze-large-play-v1')
env = dataset.recover_environment()
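The recovered environment behaves like any other Gymnasium environment; the short sketch below (random actions, assuming gymnasium-robotics is installed) illustrates the dictionary observation structure listed above:

import minari

dataset = minari.load_dataset('antmaze-large-play-v1')
env = dataset.recover_environment()

obs, info = env.reset(seed=0)
print(sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']

for _ in range(10):
    action = env.action_space.sample()  # random 8-dimensional joint torques
    obs, reward, terminated, truncated, info = env.step(action)
env.close()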

ID: AntMaze_Large-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
autoreset: False
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None

Evaluation Environment Specs

Note

The evaluation environment differs from the data-collection environment in its maze_map kwargs: the agent reset cell ('r') and the goal cell ('g') are fixed rather than sampled randomly. This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('antmaze-large-play-v1')
eval_env = dataset.recover_environment(eval_env=True)
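Since the evaluation environment fixes the reset and goal cells, it is the natural place to measure the success rate of a trained agent. The sketch below uses a hypothetical placeholder policy (random actions) purely to show the evaluation loop; replace policy with a trained goal-reaching agent:

import minari

dataset = minari.load_dataset('antmaze-large-play-v1')
eval_env = dataset.recover_environment(eval_env=True)

def policy(obs):
    # Hypothetical placeholder for a trained goal-reaching policy.
    return eval_env.action_space.sample()

episodes, successes = 10, 0
for _ in range(episodes):
    obs, info = eval_env.reset()
    reached = False
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = eval_env.step(policy(obs))
        # The sparse reward is positive while the Ant is at the goal.
        reached = reached or reward > 0
    successes += int(reached)

print(f"success rate: {successes / episodes:.2f}")
eval_env.close()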

ID: AntMaze_Large-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
autoreset: False
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 'r', 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 'g', 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None