Point Maze

The Point Maze domain involves moving a force-actuated ball (along the X and Y axes) to a fixed target location. The observation consists of the ball's (x, y) position and velocity. The dataset consists of one continuous trajectory of the agent navigating to random goal locations, and thus has no terminal states. However, so that the trajectory can be split into smaller episodes, it is truncated each time the randomly selected navigation goal is reached.
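The truncation rule described above can be sketched as follows. This is only an illustration, not the code used to build the datasets: the function name `split_on_goal_reached`, the array layout, and the 0.45 distance threshold are assumptions, and the sketch assumes a new goal is sampled on the step after the previous one is reached.

```python
import numpy as np


def split_on_goal_reached(positions, goals, threshold=0.45):
    """Split one continuous trajectory into episodes.

    positions: (T, 2) array of ball (x, y) positions.
    goals:     (T, 2) array with the goal that was active at each step.
    threshold: assumed distance below which the goal counts as reached.

    Returns a list of (start, end) index pairs with `end` exclusive.
    """
    positions = np.asarray(positions, dtype=float)
    goals = np.asarray(goals, dtype=float)
    # A step truncates the current episode when the ball is close to the goal.
    reached = np.linalg.norm(positions - goals, axis=1) <= threshold

    episodes = []
    start = 0
    for t in np.flatnonzero(reached):
        episodes.append((start, int(t) + 1))
        start = int(t) + 1
    # Steps after the last reached goal form a final, unfinished episode.
    if start < len(positions):
        episodes.append((start, len(positions)))
    return episodes


# Toy trajectory: the goal switches from (1, 0) to (1, 1) after it is reached.
positions = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
goals = np.array([[1.0, 0.0]] * 3 + [[1.0, 1.0]] * 2)
episodes = split_on_goal_reached(positions, goals)
```

None of the resulting episodes ends in a terminal state; each boundary is a truncation, so value bootstrapping should continue across it.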

The datasets for each maze version include two different reward functions, sparse and dense.
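As a rough illustration of the difference between the two variants, the sketch below assumes the common goal-reaching convention: the sparse reward is 1 only when the ball is within a small distance of the goal, and the dense reward decays exponentially with that distance. The function names and the 0.45 threshold are assumptions for illustration; the text above does not specify the exact formulas.

```python
import numpy as np

GOAL_THRESHOLD = 0.45  # assumed success radius, not stated in this document


def sparse_reward(position, goal, threshold=GOAL_THRESHOLD):
    """1.0 when the ball is within `threshold` of the goal, else 0.0."""
    distance = np.linalg.norm(np.asarray(goal, dtype=float) - np.asarray(position, dtype=float))
    return float(distance <= threshold)


def dense_reward(position, goal):
    """Assumed dense shaping: exp(-distance), equal to 1.0 exactly at the goal."""
    distance = np.linalg.norm(np.asarray(goal, dtype=float) - np.asarray(position, dtype=float))
    return float(np.exp(-distance))


far, near, goal = [0.0, 0.0], [0.9, 0.0], [1.0, 0.0]
sparse_far, sparse_near = sparse_reward(far, goal), sparse_reward(near, goal)
dense_far, dense_near = dense_reward(far, goal), dense_reward(near, goal)
```

Under these assumptions the sparse signal is zero for most of a trajectory, while the dense signal gives a gradient toward the goal at every step.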

The data is generated by selecting goal locations at random and then using a planner (QIteration) [2] to generate sequences of waypoints, which are followed using a PD controller. Because the controller memorizes which waypoints have already been reached, the data collection policy is non-Markovian.
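To make the non-Markovian point concrete, here is a minimal sketch of a waypoint-following PD controller. It is not the original data-collection code: the class name `WaypointFollower`, the gains, the reach tolerance, and the action clipping range are all illustrative assumptions, and the waypoints are given by hand rather than produced by a planner.

```python
import numpy as np


class WaypointFollower:
    """PD controller that remembers which waypoint it is currently tracking."""

    def __init__(self, waypoints, p_gain=10.0, d_gain=1.0, reach_tol=0.1):
        self.waypoints = np.asarray(waypoints, dtype=float)
        self.p_gain = p_gain        # illustrative proportional gain
        self.d_gain = d_gain        # illustrative derivative (damping) gain
        self.reach_tol = reach_tol  # assumed distance for "waypoint reached"
        self.index = 0              # internal memory: current waypoint

    def act(self, position, velocity):
        position = np.asarray(position, dtype=float)
        velocity = np.asarray(velocity, dtype=float)
        # Advance past waypoints that have been reached; stay on the last one.
        while (self.index < len(self.waypoints) - 1
               and np.linalg.norm(self.waypoints[self.index] - position) <= self.reach_tol):
            self.index += 1
        error = self.waypoints[self.index] - position
        # PD law: pull toward the waypoint, damp the current velocity.
        force = self.p_gain * error - self.d_gain * velocity
        return np.clip(force, -1.0, 1.0)


waypoints = [[1.0, 0.0], [1.0, 1.0]]
observation = ([0.5, 0.5], [0.0, 0.0])  # (position, velocity)

fresh = WaypointFollower(waypoints)
action_fresh = fresh.act(*observation)

visited = WaypointFollower(waypoints)
visited.act([1.0, 0.0], [0.0, 0.0])  # reaching the first waypoint updates the memory
action_visited = visited.act(*observation)
```

The two controllers receive the same observation but return different actions, because the action depends on the remembered waypoint index and not only on the current position and velocity.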

These datasets were originally generated by D4RL[1] under the Maze2D domain.

References

[1] Fu, Justin, et al. ‘D4RL: Datasets for Deep Data-Driven Reinforcement Learning’. CoRR, vol. abs/2004.07219, 2020, https://arxiv.org/abs/2004.07219.

[2] Lambert, Nathan. ‘Fundamental Iterative Methods of Reinforcement Learning’. Apr 8, 2020, https://towardsdatascience.com/fundamental-iterative-methods-of-reinforcement-learning-df8ff078652a

Content

| ID | Description |
| --- | --- |
| large-dense-v2 | The data is collected from the PointMaze_LargeDense-v3 environment |
| large-v2 | The data is collected from the PointMaze_Large-v3 environment |
| medium-dense-v2 | The data is collected from the PointMaze_MediumDense-v3 environment |
| medium-v2 | The data is collected from the PointMaze_Medium-v3 environment |
| open-dense-v2 | The data is collected from the PointMaze_OpenDense-v3 environment, which contains an open arena with only perimeter walls |
| open-v2 | The data is collected from the PointMaze_Open-v3 environment, which contains an open arena with only perimeter walls |
| umaze-dense-v2 | The data is collected from the PointMaze_UMazeDense-v3 environment, which contains a U-shaped maze |
| umaze-v2 | The data is collected from the PointMaze_UMaze-v3 environment, which contains a U-shaped maze |