Ant Maze

The Ant Maze datasets present a navigation domain that replaces the 2D ball of Point Maze with the more complex 8-DoF Ant quadruped robot. They were introduced in D4RL [1] to test the stitching challenge with a morphologically complex robot that better mimics real-world robotic navigation tasks. The reward in this task is sparse: 0 at every step and 1 when the goal is reached.
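As a rough sketch of that reward structure (the goal keys and the 0.45 distance threshold follow the Gymnasium-Robotics maze environments and should be treated as assumptions, not values stated on this page), the per-step reward can be thought of as:

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.45):
    # 1.0 once the Ant's torso is within `threshold` of the goal, 0.0 otherwise.
    # The 0.45 value mirrors the Gymnasium-Robotics maze environments and is an
    # assumption here, not taken from this page.
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 1.0 if distance <= threshold else 0.0
```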

To collect the data, a goal-reaching expert policy is first trained with the SAC implementation from Stable-Baselines3 [2]. The Ant agent then uses this policy to follow a set of waypoints generated by a Q-iteration planner [3] toward the final goal location. Because the controller memorizes the waypoints it has already reached, the data-collection policy is non-Markovian.
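The snippet below is a minimal sketch of this two-level collection scheme, not the actual script used to generate the datasets: the checkpoint path, the plan_waypoints placeholder, and the 0.45 waypoint-reached threshold are all assumptions introduced for illustration.

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (importing registers the AntMaze environments)
import numpy as np
from stable_baselines3 import SAC


def plan_waypoints(desired_goal):
    # Placeholder for the Q-iteration planner over maze cells [3]; here it
    # simply returns the final goal as the only waypoint.
    return [np.asarray(desired_goal)]


env = gym.make("AntMaze_Medium-v4")
# Hypothetical checkpoint path for the pretrained SAC goal-reaching expert.
expert = SAC.load("antmaze_goal_reaching_sac")

obs, _ = env.reset()
waypoints = plan_waypoints(obs["desired_goal"])
reached = []  # memory of reached waypoints is what makes the policy non-Markovian

for waypoint in waypoints:
    done = False
    while not done:
        # Condition the goal-reaching expert on the current waypoint rather
        # than on the final goal.
        action, _ = expert.predict({**obs, "desired_goal": waypoint}, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        # A waypoint counts as reached when the Ant's torso is close enough to
        # it (the 0.45 threshold is an assumption, not taken from this page).
        if np.linalg.norm(obs["achieved_goal"] - waypoint) < 0.45:
            reached.append(waypoint)
            done = True
        done = done or terminated or truncated
```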

References

[1] Fu, Justin, et al. ‘D4RL: Datasets for Deep Data-Driven Reinforcement Learning’. CoRR, vol. abs/2004.07219, 2020, https://arxiv.org/abs/2004.07219.

[2] Raffin, Antonin, et al. ‘Stable-Baselines3: Reliable Reinforcement Learning Implementations’. Journal of Machine Learning Research, vol. 22, no. 268, 2021, pp. 1-8.

[3] Lambert, Nathan. ‘Fundamental Iterative Methods of Reinforcement Learning’. Towards Data Science, 8 Apr. 2020, https://towardsdatascience.com/fundamental-iterative-methods-of-reinforcement-learning-df8ff078652a.

Content

| ID | Description |
|----|-------------|
| medium-play-v1 | The data is collected from the AntMaze_Medium-v4 environment |
| umaze-diverse-v1 | The data is collected from the AntMaze_UMaze-v4 environment, which contains a U-shaped maze |
| large-diverse-v1 | The data is collected from the AntMaze_Large_Diverse_GR-v4 environment |
| large-play-v1 | The data is collected from the AntMaze_Large-v4 environment |
| medium-diverse-v1 | The data is collected from the AntMaze_Medium_Diverse_GR-v4 environment |
| umaze-v1 | The data is collected from the AntMaze_UMaze-v4 environment, which contains a U-shaped maze |
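As a usage sketch, one of the datasets listed above can be loaded through Minari and its sparse rewards inspected. The dataset ID string used here is an assumption: depending on your Minari release the registered name may carry a namespace prefix, so verify it against the remote dataset listing.

```python
import minari
import numpy as np

# Assumed dataset ID; check the remote dataset listing for the exact name.
dataset = minari.load_dataset("antmaze/medium-play-v1", download=True)
print(dataset.total_episodes, dataset.total_steps)

# Rewards should be sparse: 0.0 on most steps and 1.0 on goal-reaching steps.
episode = next(dataset.iterate_episodes())
print(np.unique(episode.rewards))

# Recover a compatible environment (e.g. AntMaze_Medium-v4) for evaluation.
env = dataset.recover_environment()
```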