Umaze

Description
The data is collected from the PointMaze_UMaze-v3 environment, which contains a U-shaped maze. The agent uses a PD controller to follow a path of waypoints generated with QIteration until it reaches the goal. The task is continuing, which means that when the agent reaches the goal, the environment generates a new random goal without resetting the agent's location. The reward function is sparse: it returns 1 if the goal is reached and 0 otherwise. To add variance to the collected paths, random noise is added to the actions taken by the agent.
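Waypoint generation with Q-value iteration can be sketched on a coarse grid. This is a minimal illustration, not the collection script used for this dataset: the 5x5 layout below is an assumed stand-in for the U-shaped maze, and the discount `gamma`, the iteration count, and the helper names `step`, `q_iteration`, and `waypoints` are hypothetical. Each free cell is a state, the four moves are actions, and entering the goal cell pays 1, so acting greedily on the converged Q-values traces a shortest path to the goal.

```python
# Assumed U-shaped layout for illustration: 1 = wall, 0 = free cell.
U_MAZE = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(maze, cell, move):
    """Deterministic grid transition; moving into a wall leaves the agent in place."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    return (r, c) if maze[r][c] == 0 else cell


def q_iteration(maze, goal, gamma=0.95, iters=100):
    """Synchronous Q-value iteration with a reward of 1 for entering the goal."""
    free = [(r, c) for r, row in enumerate(maze) for c, v in enumerate(row) if v == 0]
    q = {(s, a): 0.0 for s in free for a in range(len(ACTIONS))}
    for _ in range(iters):
        new_q = {}
        for s in free:
            for a, move in enumerate(ACTIONS):
                if s == goal:
                    new_q[(s, a)] = 0.0  # the goal is treated as absorbing
                    continue
                nxt = step(maze, s, move)
                reward = 1.0 if nxt == goal else 0.0
                future = 0.0 if nxt == goal else max(q[(nxt, b)] for b in range(len(ACTIONS)))
                new_q[(s, a)] = reward + gamma * future
        q = new_q
    return q


def waypoints(maze, start, goal, q):
    """Follow the greedy action from start until the goal (with a length guard)."""
    path, cell = [start], start
    while cell != goal and len(path) <= len(maze) * len(maze[0]):
        best = max(range(len(ACTIONS)), key=lambda a: q[(cell, a)])
        cell = step(maze, cell, ACTIONS[best])
        path.append(cell)
    return path


q = q_iteration(U_MAZE, goal=(3, 1))
print(waypoints(U_MAZE, (1, 1), (3, 1), q))
# [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1)]
```

In a continuous maze, each grid cell on such a path would typically be mapped to its center coordinates so it can serve as a target for the PD controller; that mapping is omitted here.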
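The collection loop described above (PD tracking of waypoints, noisy actions, a sparse reward, and goal resampling without resetting the agent) can be sketched with a simplified 2D point mass. Everything here is an assumption for illustration: the real environment simulates the ball in MuJoCo, and the gains `kp`/`kd`, `DT`, `GOAL_RADIUS`, `WAYPOINT_RADIUS`, the Gaussian form of the noise, and the `straight_line_plan` stand-in for the QIteration planner are hypothetical choices, not the values used to build this dataset.

```python
import math
import random

DT = 0.1               # assumed integration step
GOAL_RADIUS = 0.45     # assumed success threshold for the sparse reward
WAYPOINT_RADIUS = 0.2  # assumed distance at which the next waypoint is targeted


def straight_line_plan(start, goal, n=3):
    """Hypothetical planner stand-in: n evenly spaced points from start to goal."""
    return [
        (start[0] + (goal[0] - start[0]) * k / n, start[1] + (goal[1] - start[1]) * k / n)
        for k in range(1, n + 1)
    ]


def clip(x):
    """Clip to an assumed [-1, 1] action range."""
    return max(-1.0, min(1.0, x))


def pd_action(pos, vel, target, kp=10.0, kd=6.0):
    """PD control law per axis: push toward the target, damp the velocity."""
    return tuple(clip(kp * (t - p) - kd * v) for p, v, t in zip(pos, vel, target))


def collect(goals, steps=400, noise_std=0.1, seed=0):
    """Roll out a continuing task; returns per-step rewards and goals reached."""
    rng = random.Random(seed)
    pos, vel = (0.0, 0.0), (0.0, 0.0)
    goal = rng.choice(goals)
    path = straight_line_plan(pos, goal)
    rewards, goals_reached = [], 0
    for _ in range(steps):
        target = path[0]
        action = pd_action(pos, vel, target)
        # Exploration noise (assumed Gaussian), clipped back into range.
        action = tuple(clip(a + rng.gauss(0.0, noise_std)) for a in action)
        # Semi-implicit Euler point-mass update (assumed dynamics).
        vel = tuple(v + a * DT for v, a in zip(vel, action))
        pos = tuple(p + v * DT for p, v in zip(pos, vel))
        reached = math.dist(pos, goal) <= GOAL_RADIUS
        rewards.append(1.0 if reached else 0.0)  # sparse reward: 1 on success, else 0
        if reached:
            goals_reached += 1
            # Continuing task: sample a new goal but keep position and velocity.
            goal = rng.choice(goals)
            path = straight_line_plan(pos, goal)
        elif math.dist(pos, target) <= WAYPOINT_RADIUS and len(path) > 1:
            path.pop(0)  # advance to the next waypoint
    return rewards, goals_reached


GOALS = [(0.0, 2.0), (2.0, 2.0), (2.0, 0.0)]
rewards, goals_reached = collect(GOALS)
print(len(rewards), goals_reached)
```

Because the position is never reset when a goal is reached, consecutive goals are chained into one long trajectory, and the noise makes repeated trips between the same goals follow slightly different paths.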
Dataset Specs
Total Timesteps |
Total Episodes |
Dataset Observation Space |
Dataset Action Space |
Algorithm |
Author |
Code Permalink |
Minari Version |
Download |
Environment Specs
ID | PointMaze_UMaze-v3
Action Space |
Observation Space |