Umaze

Description

The data is collected from the PointMaze_UMaze-v3 environment, which contains a U-shaped maze. The agent uses a PD controller to follow a path of waypoints generated with QIteration until it reaches the goal. The task is continuing: when the agent reaches the goal, the environment generates a new random goal without resetting the agent's location. The reward function is sparse, returning 1 when the goal is reached and 0 otherwise. To add variance to the collected paths, random noise is added to the actions taken by the agent.
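
As a rough illustration of the collection scheme described above, the sketch below shows a noisy PD step toward a waypoint and a sparse reward check. It is not the dataset-generation code: the gains `KP` and `KD`, the noise level `NOISE_STD`, the goal radius `GOAL_RADIUS`, and the function names are all hypothetical, and it assumes the agent's state can be split into a 2D position and a 2D velocity. Only the 2D action range of [-1, 1] and the 0/1 reward come from this page.

```python
import math
import random

# All constants are illustrative assumptions, not values taken from the
# dataset-generation code.
KP = 10.0           # hypothetical proportional gain
KD = 1.0            # hypothetical derivative gain
NOISE_STD = 0.2     # hypothetical standard deviation of the action noise
GOAL_RADIUS = 0.45  # assumed distance at which the goal counts as reached


def noisy_pd_action(position, velocity, waypoint, rng):
    """Return a 2D action steering toward `waypoint`, with noise, clipped to [-1, 1]."""
    action = []
    for p, v, w in zip(position, velocity, waypoint):
        a = KP * (w - p) - KD * v       # PD term: pull toward the waypoint, damp velocity
        a += rng.gauss(0.0, NOISE_STD)  # noise that diversifies the collected paths
        action.append(max(-1.0, min(1.0, a)))  # stay inside the Box(-1.0, 1.0) bounds
    return action


def sparse_reward(achieved_goal, desired_goal):
    """Return 1.0 if the agent is within GOAL_RADIUS of the goal, otherwise 0.0."""
    return 1.0 if math.dist(achieved_goal, desired_goal) <= GOAL_RADIUS else 0.0


rng = random.Random(0)
action = noisy_pd_action(
    position=(0.0, 0.0), velocity=(0.0, 0.0), waypoint=(1.0, -1.0), rng=rng
)
print(action)
print(sparse_reward((0.0, 0.0), (0.1, 0.1)))  # agent close to the goal
print(sparse_reward((0.0, 0.0), (1.0, 1.0)))  # agent far from the goal
```

In a continuing task, a loop built on these helpers would sample a new goal whenever `sparse_reward` returns 1.0 and keep stepping from the agent's current position instead of resetting it.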

Dataset Specs


Total Timesteps	`1000000`
Total Episodes	`13289`
Dataset Observation Space	`Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))`
Dataset Action Space	`Box(-1.0, 1.0, (2,), float32)`
Algorithm	`QIteration`
Author	`Rodrigo Perez-Vicente`
Email	`rperezvicente@farama.org`
Code Permalink	`https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation`
Minari Version	`~=0.4`
Download	`minari.download_dataset("pointmaze-umaze-v1")`
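
A minimal sketch of how the downloaded dataset might be inspected follows. It assumes a Minari installation compatible with the version listed above, and it relies on `minari.load_dataset` and `MinariDataset.iterate_episodes`. The helper names `summarize` and `load_and_summarize` are hypothetical. Because the reward is 1 only when a goal is reached, counting positive rewards gives the number of goal-reaching steps.

```python
def summarize(dataset):
    """Count episodes, recorded steps, and goal-reaching steps (reward > 0)."""
    episodes = 0
    steps = 0
    goals_reached = 0
    for episode in dataset.iterate_episodes():
        episodes += 1
        steps += len(episode.rewards)  # one reward per recorded step
        goals_reached += sum(1 for r in episode.rewards if r > 0)
    return {"episodes": episodes, "steps": steps, "goals_reached": goals_reached}


def load_and_summarize(dataset_id="pointmaze-umaze-v1"):
    """Download the dataset (needs network access) and return its summary."""
    import minari  # imported lazily so `summarize` can be used without Minari

    minari.download_dataset(dataset_id)
    dataset = minari.load_dataset(dataset_id)
    return summarize(dataset)


# Uncomment to run against the real dataset (this downloads the data):
# print(load_and_summarize())
```

The episode and step counts returned by `summarize` can be compared against the totals listed in the table above as a quick sanity check after downloading.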

Environment Specs


ID	`PointMaze_UMaze-v3`
Action Space	`Box(-1.0, 1.0, (2,), float32)`
Observation Space	`Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))`