Point Maze

The Point Maze domain involves moving a force-actuated ball (along the X and Y axes) to a fixed target location. The observation consists of the ball's (x, y) position and velocities. The dataset consists of one continuous trajectory of the agent navigating to random goal locations, and thus has no terminal states. However, so that the trajectory can be split into smaller episodes, it is truncated whenever the randomly selected navigation goal is reached.
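As an illustration of the episode split described above, the following minimal sketch cuts a flat trajectory into episodes wherever a truncation flag is set. The function name `split_on_truncation` and the example flags are hypothetical and are not part of the dataset's API; the sketch only assumes that each step carries a boolean truncation flag.

```python
from typing import List, Sequence


def split_on_truncation(truncations: Sequence[bool]) -> List[range]:
    """Return one index range per episode for a flat trajectory.

    An episode ends at (and includes) every step whose truncation flag is True.
    Steps after the last flag form a final, unfinished episode.
    """
    episodes = []
    start = 0
    for step, truncated in enumerate(truncations):
        if truncated:
            episodes.append(range(start, step + 1))
            start = step + 1
    if start < len(truncations):
        episodes.append(range(start, len(truncations)))
    return episodes


# Hypothetical flags: the goal is reached at steps 2 and 5.
flags = [False, False, True, False, False, True, False]
episodes = split_on_truncation(flags)
print([list(r) for r in episodes])
```

Because no step is terminal, every episode boundary produced this way is a truncation, not a termination.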

The datasets for each maze version include two different reward functions: sparse and dense.
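This page does not define the two reward functions, so the sketch below is only an assumption about their usual shape: a sparse reward that is 1 when the ball is within a success radius of the goal and 0 otherwise, and a dense reward that decays exponentially with the distance to the goal. The 0.45 threshold and both function names are assumptions; the environment documentation is the authoritative source.

```python
import math

# Assumed success radius; this value is not stated on this page.
GOAL_RADIUS = 0.45


def sparse_reward(position, goal, radius=GOAL_RADIUS):
    """Assumed sparse form: 1.0 once the ball is within `radius` of the goal, else 0.0."""
    return 1.0 if math.dist(position, goal) <= radius else 0.0


def dense_reward(position, goal):
    """Assumed dense form: exp(-distance), equal to 1.0 at the goal and decaying with distance."""
    return math.exp(-math.dist(position, goal))


goal = (1.0, 0.0)
for position in [(0.0, 0.0), (0.9, 0.0), (1.0, 0.0)]:
    print(position, sparse_reward(position, goal), dense_reward(position, goal))
```

Under these assumptions the sparse signal gives no gradient until the goal region is reached, while the dense signal grows smoothly as the ball approaches the goal.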

The data is generated by selecting goal locations at random and then using a planner (QIteration)[2] to generate sequences of waypoints, which are followed using a PD controller. Because the controller remembers which waypoints it has already reached, the data collection policy is non-Markovian.
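To make the non-Markovian point concrete, here is a minimal sketch of a waypoint-following PD controller. It is not the original data collection code: the class name, the gains `kp` and `kd`, the reach radius, the waypoints, and the unit-mass point simulation are all hypothetical choices for illustration. The internal `index` is the memory that the observation (position and velocity) does not expose.

```python
import math


class WaypointPDController:
    """PD controller that remembers which waypoints it has already reached."""

    def __init__(self, waypoints, kp=10.0, kd=6.0, reach_radius=0.1):
        self.waypoints = list(waypoints)
        self.kp = kp                    # proportional gain (hypothetical value)
        self.kd = kd                    # derivative gain (hypothetical value)
        self.reach_radius = reach_radius
        self.index = 0                  # internal memory: next waypoint to visit

    def act(self, position, velocity):
        # Advance past reached waypoints, keeping the last one as the final target.
        while (self.index < len(self.waypoints) - 1
               and math.dist(position, self.waypoints[self.index]) <= self.reach_radius):
            self.index += 1
        target = self.waypoints[self.index]
        # PD law per axis: force = kp * position_error - kd * velocity.
        return tuple(self.kp * (t - p) - self.kd * v
                     for t, p, v in zip(target, position, velocity))


def simulate(controller, position, velocity, steps=1000, dt=0.01):
    """Integrate a unit-mass point under the controller's force (semi-implicit Euler)."""
    for _ in range(steps):
        force = controller.act(position, velocity)
        velocity = tuple(v + f * dt for v, f in zip(velocity, force))
        position = tuple(p + v * dt for p, v in zip(position, velocity))
    return position, velocity


WAYPOINTS = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
controller = WaypointPDController(WAYPOINTS)
final_position, _ = simulate(controller, (0.0, 0.0), (0.0, 0.0))
print(controller.index, math.dist(final_position, WAYPOINTS[-1]))
```

Two controllers given the same position and velocity can return different forces if their `index` values differ, which is why the action cannot be predicted from the current observation alone.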

These datasets were originally generated by D4RL[1] under the Maze2D domain.

References

[1] Fu, Justin, et al. ‘D4RL: Datasets for Deep Data-Driven Reinforcement Learning’. CoRR, vol. abs/2004.07219, 2020, https://arxiv.org/abs/2004.07219.

[2] Lambert, Nathan. ‘Fundamental Iterative Methods of Reinforcement Learning’. Apr 8, 2020, https://towardsdatascience.com/fundamental-iterative-methods-of-reinforcement-learning-df8ff078652a.

Content

ID	Description
open-dense-v2	The data is collected from the [`PointMaze_OpenDense-v3`](https://robotics
umaze-v2	The data is collected from the [`PointMaze_UMaze-v3`](https://robotics
large-dense-v2	The data is collected from the [`PointMaze_LargeDense-v3`](https://robotics
medium-v2	The data is collected from the [`PointMaze_Medium-v3`](https://robotics
umaze-dense-v2	The data is collected from the [`PointMaze_UMazeDense-v3`](https://robotics
medium-dense-v2	The data is collected from the [`PointMaze_MediumDense-v3`](https://robotics
large-v2	The data is collected from the [`PointMaze_Large-v3`](https://robotics
open-v2	The data is collected from the [`PointMaze_Open-v3`](https://robotics