Large-Play

Description

The data is collected from the AntMaze_Large-v4 environment. At the beginning of each episode, random locations are selected for the goal and for the agent's reset position. More than 80% of the trajectories are successful; the failed trajectories occur because the Ant flips over and cannot stand up again. Also note that when the Ant reaches the goal, the episode neither terminates nor generates a new target, so reward keeps accumulating for the remaining steps. The Ant reaches the goal by following a set of waypoints with a goal-reaching policy trained with SAC.
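These properties can be checked directly with the Minari API. The snippet below is a minimal sketch, assuming that with the sparse reward a positive episode return means the goal was reached at least once; it counts successful episodes and reflects the reward accumulation after the goal is hit.

import minari

# Assumes the dataset has already been downloaded (see the Download entry below).
dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')

successes = 0
for episode in dataset.iterate_episodes():
    # Assumption: with the sparse reward, any positive return means the Ant
    # reached the goal at least once; later steps keep accumulating reward
    # because the episode does not terminate on success.
    if episode.rewards.sum() > 0:
        successes += 1

print(f"episodes: {dataset.total_episodes}")
print(f"success rate: {successes / dataset.total_episodes:.2%}")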

Dataset Specs

Total Steps: 1000000
Total Episodes: 1000
Dataset Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Dataset Action Space: Box(-1.0, 1.0, (8,), float32)
Algorithm: QIteration+SAC
Author: Alex Davey
Email: alexdavey0@gmail.com
Code Permalink: https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version: 0.4.3 (supported)
Download: minari download D4RL/antmaze/large-play-v1
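The dataset can also be fetched from Python instead of the command line; a minimal equivalent, assuming the minari package is installed:

import minari

# Downloads the dataset into the local Minari data directory.
minari.download_dataset('D4RL/antmaze/large-play-v1')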

Environment Specs

The following entries correspond to the Gymnasium environment specification used to generate the dataset. To read more about what each parameter means, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')
env = dataset.recover_environment()

ID: AntMaze_Large-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None
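Alternatively, the kwargs listed above can be passed directly to gymnasium.make to rebuild the same environment without going through the dataset. This is a sketch, assuming gymnasium-robotics is installed and registers AntMaze_Large-v4 on import:

import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (importing registers the AntMaze environments)

# The maze layout from the kwargs above: 1 = wall, 0 = free cell.
LARGE_MAZE = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]

env = gym.make(
    'AntMaze_Large-v4',
    maze_map=LARGE_MAZE,
    reward_type='sparse',
    continuing_task=True,
    reset_target=False,
    max_episode_steps=1000,
)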

Evaluation Environment Specs

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')
eval_env = dataset.recover_environment(eval_env=True)

ID: AntMaze_Large-v4
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (27,), float64))
Action Space: Box(-1.0, 1.0, (8,), float32)
entry_point: gymnasium_robotics.envs.maze.ant_maze_v4:AntMazeEnv
max_episode_steps: 1000
reward_threshold: None
nondeterministic: False
order_enforce: True
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'sparse', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None
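Finally, a minimal rollout sketch in the recovered evaluation environment, with random actions standing in for a trained policy (the policy itself is not part of the dataset):

import minari

dataset = minari.load_dataset('D4RL/antmaze/large-play-v1')
eval_env = dataset.recover_environment(eval_env=True)

obs, info = eval_env.reset(seed=0)
episode_return = 0.0
for _ in range(1000):
    action = eval_env.action_space.sample()  # stand-in for a trained policy
    obs, reward, terminated, truncated, info = eval_env.step(action)
    episode_return += reward
    if terminated or truncated:
        break

print(f'episode return: {episode_return}')
eval_env.close()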