Ant Maze¶
The Ant Maze datasets present a navigation domain that replaces the 2D ball from pointmaze with the more complex 8-DoF Ant quadruped robot. These datasets were introduced in D4RL[1] to test the stitching challenge with a morphologically complex robot that better mimics real-world robotic navigation tasks. The reward in this task is sparse 0-1: the agent receives a reward of 1 only upon reaching the goal, and 0 otherwise.
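As an illustration of this reward structure, the following sketch computes a 0-1 goal-reaching reward from the ant's (x, y) position. The `goal_radius` value and the function name are assumed placeholders for illustration, not necessarily the exact threshold or API used by the environment.

```python
import numpy as np


def sparse_goal_reward(achieved_goal: np.ndarray, desired_goal: np.ndarray,
                       goal_radius: float = 0.45) -> float:
    """Return 1.0 only when the ant's (x, y) position lies within the goal radius.

    The radius used here is an illustrative assumption; the environment
    documentation defines the exact success threshold.
    """
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return 1.0 if distance <= goal_radius else 0.0
```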
To collect the data, a goal-reaching expert policy is first trained with the SAC algorithm provided in Stable-Baselines3[2]. The Ant agent then uses this goal-reaching policy to follow a sequence of waypoints generated by a planner (QIteration)[3] toward the final goal location. Because the controller memorizes the waypoints it has already reached, the data collection policy is non-Markovian. A minimal sketch of this pipeline is shown below.
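The sketch assumes the environment ID `AntMaze_UMaze-v4`, a `WaypointController` class, a `plan_waypoints` stand-in, and a waypoint radius; all of these are illustrative choices, not the exact code used to generate the datasets. It covers the two stages described above: training a goal-reaching SAC policy with Stable-Baselines3, then driving the Ant along planner waypoints while memorizing the ones already reached, which is the source of the non-Markovian behaviour.

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  # importing registers the AntMaze environments
import numpy as np
from stable_baselines3 import SAC


def plan_waypoints(obs):
    """Hypothetical planner stand-in.

    The real pipeline plans a sequence of intermediate maze cells with
    QIteration; here we simply return the final goal so the sketch stays
    self-contained and runnable.
    """
    return [np.asarray(obs["desired_goal"], dtype=np.float64)]


class WaypointController:
    """Follow planner waypoints with a pretrained goal-reaching SAC policy.

    The controller remembers which waypoints it has already reached, which is
    what makes the resulting data-collection policy non-Markovian.
    """

    def __init__(self, policy, waypoint_radius=0.5):  # radius is an assumed value
        self.policy = policy
        self.waypoint_radius = waypoint_radius
        self.waypoints = []
        self.current = 0  # index of the waypoint currently being pursued

    def reset(self, obs):
        self.waypoints = plan_waypoints(obs)
        self.current = 0

    def act(self, obs):
        ant_xy = obs["achieved_goal"]
        # Memorize reached waypoints and advance to the next one.
        while (self.current < len(self.waypoints) - 1
               and np.linalg.norm(ant_xy - self.waypoints[self.current]) < self.waypoint_radius):
            self.current += 1
        # Query the goal-reaching policy with the current waypoint as its goal.
        goal_obs = dict(obs, desired_goal=self.waypoints[self.current])
        action, _ = self.policy.predict(goal_obs, deterministic=True)
        return action


if __name__ == "__main__":
    env = gym.make("AntMaze_UMaze-v4")  # assumed Gymnasium-Robotics environment ID
    expert = SAC("MultiInputPolicy", env, verbose=0)
    expert.learn(total_timesteps=1_000)  # toy budget; real training needs far more steps

    controller = WaypointController(expert)
    obs, _ = env.reset()
    controller.reset(obs)
    for _ in range(200):
        obs, reward, terminated, truncated, _ = env.step(controller.act(obs))
        if terminated or truncated:
            obs, _ = env.reset()
            controller.reset(obs)
```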
References¶
[1] Fu, Justin, et al. ‘D4RL: Datasets for Deep Data-Driven Reinforcement Learning’. CoRR, vol. abs/2004.07219, 2020, https://arxiv.org/abs/2004.07219.
[2] Raffin, Antonin, et al. ‘Stable-Baselines3: Reliable Reinforcement Learning Implementations’. Journal of Machine Learning Research, vol. 22, no. 268, 2021, pp. 1-8.
[3] Lambert, Nathan. ‘Fundamental Iterative Methods of Reinforcement Learning’. Towards Data Science, 8 Apr. 2020, https://towardsdatascience.com/fundamental-iterative-methods-of-reinforcement-learning-df8ff078652a.
Content¶
| ID | Description |
|---|---|
|  | The data is collected from the |
|  | The data is collected from the |
|  | The data is collected from the |
|  | The data is collected from the |
|  | The data is collected from the |
|  | The data is collected from the |
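As a usage note, any of the datasets listed above can be downloaded and inspected with the minari package. The dataset ID in the sketch below is a placeholder and should be replaced with one of the IDs from the table.

```python
import minari

# Placeholder ID: substitute one of the dataset IDs listed in the table above.
dataset = minari.load_dataset("antmaze-umaze-v0", download=True)

# Recover the environment the data was collected in and inspect a few episodes.
env = dataset.recover_environment()
for episode in dataset.iterate_episodes():
    print(episode.id, len(episode.rewards), episode.rewards.sum())
```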