Umaze-Diverse
Description
The data is collected from the AntMaze_UMaze-v4 environment, which contains a U-shaped maze. At the beginning of each episode, random locations are selected for the goal and for the agent's reset position. More than 90% of the trajectories are successful; the failed trajectories occur because the Ant flips over and cannot stand up again. Note that when the Ant reaches the goal, the episode does not terminate and no new target is generated, so the reward keeps accumulating. The Ant reaches the goals by following a set of waypoints using a goal-reaching policy trained with SAC.
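Because episodes continue after the goal is reached, an episode's return reflects how many steps were spent at the goal rather than a single success flag. The sketch below shows one way to derive per-episode success and an overall success rate from reward sequences. It assumes a sparse reward (positive only while the Ant is at the goal), which this page does not state, and the helper names are illustrative rather than part of Minari.

```python
from typing import Iterable, Optional, Sequence, Tuple


def summarize_episode(rewards: Sequence[float]) -> Tuple[bool, Optional[int], float]:
    """Return (success, first_goal_step, episode_return) for one episode.

    Assumption: the reward is sparse, i.e. positive only on steps at the goal.
    """
    first_goal_step = None
    episode_return = 0.0
    for step, reward in enumerate(rewards):
        episode_return += float(reward)
        if first_goal_step is None and reward > 0:
            first_goal_step = step
    return first_goal_step is not None, first_goal_step, episode_return


def success_rate(reward_sequences: Iterable[Sequence[float]]) -> float:
    """Fraction of episodes that reach the goal at least once."""
    outcomes = [summarize_episode(rewards)[0] for rewards in reward_sequences]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def dataset_success_rate(dataset_id: str = "D4RL/antmaze/umaze-diverse-v1") -> float:
    """Apply success_rate to a locally available Minari dataset."""
    import minari  # imported lazily so the helpers above work without Minari

    dataset = minari.load_dataset(dataset_id)
    return success_rate(episode.rewards for episode in dataset.iterate_episodes())
```

Counting an episode as successful if it touches the goal at least once is a design choice; a stricter criterion (for example, being at the goal on the final step) would give a lower rate.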
Dataset Specs
Total Steps | 1000000
Total Episodes | 1430
Dataset Observation Space |
Dataset Action Space |
Algorithm | QIteration+SAC
Author | Alex Davey
Email | alexdavey0@gmail.com
Code Permalink | https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version |
Download |
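As a quick sanity check on the totals above, the following sketch computes the mean episode length and compares it with the `max_episode_steps` value of 700 listed in the environment specs on this page. The `matches_specs` helper is a hypothetical name; it relies only on the `total_steps` and `total_episodes` attributes of a loaded Minari dataset.

```python
TOTAL_STEPS = 1_000_000    # "Total Steps" from the dataset specs
TOTAL_EPISODES = 1_430     # "Total Episodes" from the dataset specs
MAX_EPISODE_STEPS = 700    # max_episode_steps of AntMaze_UMaze-v4 on this page

mean_episode_length = TOTAL_STEPS / TOTAL_EPISODES
# Steps missing relative to every episode running the full 700 steps.
shortfall = TOTAL_EPISODES * MAX_EPISODE_STEPS - TOTAL_STEPS

print(f"mean episode length: {mean_episode_length:.1f} steps")  # 699.3
print(f"shortfall vs. full-length episodes: {shortfall} steps")  # 1000


def matches_specs(dataset) -> bool:
    """Check a loaded MinariDataset against the totals listed above."""
    return (dataset.total_steps == TOTAL_STEPS
            and dataset.total_episodes == TOTAL_EPISODES)
```

A mean length this close to 700 suggests that nearly all episodes run until the step limit, which is consistent with episodes not terminating when the goal is reached.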
Environment Specs
The table below lists, in addition to the action and observation spaces, the Gymnasium environment specification used to generate the dataset. For the meaning of each parameter, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec
This environment can be recovered from the Minari dataset as follows:
import minari
dataset = minari.load_dataset('D4RL/antmaze/umaze-diverse-v1')
env = dataset.recover_environment()
ID | AntMaze_UMaze-v4
Observation Space |
Action Space |
entry_point |
max_episode_steps | 700
reward_threshold | None
nondeterministic |
order_enforce |
disable_env_checker |
kwargs |
additional_wrappers |
vector_entry_point |
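Once recovered, the environment can be stepped like any other Gymnasium environment. The following is a minimal sketch of a single rollout, assuming the standard Gymnasium `reset`/`step` API; `rollout` and `random_rollout` are hypothetical helper names, and the random policy is only a placeholder.

```python
from typing import Any, Callable, Tuple


def rollout(env: Any, policy: Callable[[Any], Any], seed: int = 0) -> Tuple[float, int]:
    """Run one episode and return (episode_return, number_of_steps)."""
    observation, info = env.reset(seed=seed)
    episode_return, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        episode_return += float(reward)
        steps += 1
    return episode_return, steps


def random_rollout(dataset_id: str = "D4RL/antmaze/umaze-diverse-v1") -> Tuple[float, int]:
    """Recover the environment from the dataset and roll out random actions."""
    import minari  # lazy import: needs Minari plus the AntMaze dependencies

    env = minari.load_dataset(dataset_id).recover_environment()
    try:
        return rollout(env, lambda _obs: env.action_space.sample())
    finally:
        env.close()
```

With `max_episode_steps` set to 700, a rollout in the recovered environment should be truncated after at most 700 steps.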
Evaluation Environment Specs
The evaluation environment can be recovered from the Minari dataset as follows:
import minari
dataset = minari.load_dataset('D4RL/antmaze/umaze-diverse-v1')
eval_env = dataset.recover_environment(eval_env=True)
ID | AntMaze_UMaze-v4
Observation Space |
Action Space |
entry_point |
max_episode_steps | 700
reward_threshold | None
nondeterministic |
order_enforce |
disable_env_checker |
kwargs |
additional_wrappers |
vector_entry_point |
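A typical use of the evaluation environment is to estimate a policy's success rate over several episodes. The sketch below is one possible way to do this, not a prescribed protocol. It assumes the environment reports goal attainment through an `info["success"]` entry and falls back to a positive reward otherwise; both signals are assumptions not confirmed on this page, and `evaluate` and `evaluate_on_dataset_env` are hypothetical names.

```python
from typing import Any, Callable


def evaluate(env: Any, policy: Callable[[Any], Any], episodes: int = 10, seed: int = 0) -> float:
    """Return the fraction of episodes in which the goal was reached at least once."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    successes = 0
    for episode in range(episodes):
        observation, info = env.reset(seed=seed + episode)
        reached = terminated = truncated = False
        while not (terminated or truncated):
            action = policy(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            # Assumption: success is exposed as info["success"]; if the key is
            # absent, a positive (sparse) reward is treated as reaching the goal.
            reached = reached or bool(info.get("success", reward > 0))
        successes += int(reached)
    return successes / episodes


def evaluate_on_dataset_env(policy: Callable[[Any], Any],
                            dataset_id: str = "D4RL/antmaze/umaze-diverse-v1") -> float:
    """Recover the evaluation environment from the dataset and evaluate a policy."""
    import minari  # lazy import: needs Minari plus the AntMaze dependencies

    eval_env = minari.load_dataset(dataset_id).recover_environment(eval_env=True)
    try:
        return evaluate(eval_env, policy)
    finally:
        eval_env.close()
```

Seeding each episode with `seed + episode` keeps the evaluation reproducible while still varying the episodes.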