Ant Maze

The Ant Maze datasets present a navigation domain that replaces the 2D ball from PointMaze with the more complex 8-DoF Ant quadruped robot. This dataset was introduced in D4RL [1] to test the stitching challenge (combining parts of different suboptimal trajectories to reach a goal) with a morphologically complex robot that mimics real-world robotic navigation tasks. The reward is sparse: the agent receives 1 upon reaching the goal and 0 otherwise.
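The sparse 0-1 reward can be pictured as a distance threshold between the ant's position and the goal. The sketch below assumes the achieved goal is the ant's (x, y) position and uses a hypothetical success radius of 0.45; consult the environment source for the exact condition it uses.

```python
import math

# Hypothetical success radius, assumed for illustration only.
GOAL_RADIUS = 0.45


def sparse_reward(achieved_goal, desired_goal, radius=GOAL_RADIUS):
    """Return 1.0 if the (x, y) position lies within `radius` of the goal, else 0.0."""
    distance = math.dist(achieved_goal, desired_goal)  # Euclidean distance
    return 1.0 if distance <= radius else 0.0


print(sparse_reward((0.0, 0.0), (3.0, 4.0)))  # 0.0 (distance is 5.0)
print(sparse_reward((2.9, 4.1), (3.0, 4.0)))  # 1.0 (distance is about 0.14)
```

Because every step before success yields 0, the dataset gives no shaped signal about progress, which is what makes stitching trajectories together necessary.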

To collect the data, a goal-reaching expert policy was first trained with the SAC implementation provided in Stable Baselines3 [2]. The Ant agent then uses this policy to follow a set of waypoints, generated by a Q-Iteration planner [3], to the final goal location. Because the controller memorizes which waypoints it has already reached, the data collection policy is non-Markovian.
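The planner-plus-controller pipeline described above can be sketched on a toy grid maze. This is not the code used to collect the datasets: the maze layout, the discount factor, the cell-to-coordinate mapping and the `WaypointFollower` class are all hypothetical. Tabular Q-iteration computes action values with a reward of 1 for stepping into the goal cell, the greedy path becomes the waypoint list, and the follower keeps an index of reached waypoints. That index is the memory that makes the data collection policy non-Markovian.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
ACTIONS: List[Cell] = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def q_iteration(maze: List[str], goal: Cell, gamma: float = 0.95,
                iters: int = 200) -> Dict[Cell, List[float]]:
    """Tabular Q-iteration on a grid ('#' = wall); reward 1 for entering the goal."""
    free = [(r, c) for r, row in enumerate(maze) for c, ch in enumerate(row) if ch != "#"]
    q = {s: [0.0] * len(ACTIONS) for s in free}
    for _ in range(iters):
        new_q = {}
        for s in free:
            values = []
            for dr, dc in ACTIONS:
                nxt = (s[0] + dr, s[1] + dc)
                if nxt not in q:  # wall or out of bounds: the agent stays in place
                    nxt = s
                # Entering the goal is terminal, so no bootstrapped term is added.
                values.append(1.0 if nxt == goal else gamma * max(q[nxt]))
            new_q[s] = values
        q = new_q
    return q


def plan_waypoints(maze: List[str], start: Cell, goal: Cell,
                   cell_size: float = 1.0) -> List[Tuple[float, float]]:
    """Follow the greedy Q-policy from start to goal and return (x, y) waypoints."""
    q = q_iteration(maze, goal)
    path, s = [start], start
    while s != goal and len(path) <= len(q):  # bound guards against unreachable goals
        best = max(range(len(ACTIONS)), key=lambda a: q[s][a])
        nxt = (s[0] + ACTIONS[best][0], s[1] + ACTIONS[best][1])
        s = nxt if nxt in q else s
        path.append(s)
    # Hypothetical mapping: column -> x, row -> y.
    return [(c * cell_size, r * cell_size) for r, c in path]


class WaypointFollower:
    """High-level controller that remembers which waypoints were already reached."""

    def __init__(self, waypoints: List[Tuple[float, float]], tolerance: float = 0.5):
        self.waypoints = waypoints
        self.tolerance = tolerance
        self.index = 0  # internal memory: next waypoint to reach

    def target(self, position: Tuple[float, float]) -> Tuple[float, float]:
        # Advance while the current waypoint is within tolerance (never past the last one).
        while (self.index < len(self.waypoints) - 1
               and math.dist(position, self.waypoints[self.index]) <= self.tolerance):
            self.index += 1
        return self.waypoints[self.index]


maze = [
    "#####",
    "#...#",
    "###.#",
    "#...#",
    "#####",
]
wps = plan_waypoints(maze, start=(1, 1), goal=(3, 1))
print(wps)  # [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (3.0, 2.0), (3.0, 3.0), (2.0, 3.0), (1.0, 3.0)]

follower = WaypointFollower(wps)
follower.target((1.0, 1.0))         # reaches the first waypoint
print(follower.target((2.0, 1.0)))  # (3.0, 1.0)
print(WaypointFollower(wps).target((2.0, 1.0)))  # (1.0, 1.0): same position, no memory
```

The last two calls receive the same position but return different targets, because the output depends on the follower's history. In the data collection setup, the selected waypoint would be passed as the goal to the low-level goal-reaching policy.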

References

[1] Fu, Justin, et al. ‘D4RL: Datasets for Deep Data-Driven Reinforcement Learning’. CoRR, vol. abs/2004.07219, 2020, https://arxiv.org/abs/2004.07219.

[2] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, & Noah Dormann (2021). Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 22(268), 1-8.

[3] Lambert, Nathan. ‘Fundamental Iterative Methods of Reinforcement Learning’. Apr 8, 2020, https://towardsdatascience.com/fundamental-iterative-methods-of-reinforcement-learning-df8ff078652a

Available Datasets

| Dataset ID | Description |
|---|---|
| antmaze-large-diverse-v0 | The data is collected from the `AntMaze_Large_Diverse_GR-v4` environment |
| antmaze-large-play-v0 | The data is collected from the `AntMaze_Large-v4` environment |
| antmaze-medium-diverse-v0 | The data is collected from the `AntMaze_Medium_Diverse_GR-v4` environment |
| antmaze-medium-play-v0 | The data is collected from the `AntMaze_Medium-v4` environment |
| antmaze-umaze-diverse-v0 | The data is collected from the `AntMaze_UMaze-v4` environment, which contains a U-shaped maze |
| antmaze-umaze-v0 | The data is collected from the `AntMaze_UMaze-v4` environment, which contains a U-shaped maze |