Large-Dense

Description

The data is collected from the PointMaze_LargeDense-v3 environment. The agent uses a PD controller to follow a path of waypoints generated with QIteration until it reaches the goal. The task is continuing, meaning that when the agent reaches the goal the environment generates a new random goal without resetting the agent's location. The reward function is dense: the negative Euclidean distance between the goal and the agent. To add variance to the collected paths, random noise is added to the actions taken by the agent.
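
The sketch below illustrates the idea of that collection policy only: a PD controller drives the point mass toward the next waypoint, and Gaussian noise is added to the action before clipping. The gains, noise scale, and waypoint value are illustrative assumptions, not the values used for this dataset (see the code permalink below for the actual generation script).

import numpy as np

P_GAIN = 10.0      # assumed proportional gain (not the dataset's actual value)
D_GAIN = -1.0      # assumed derivative (damping) gain
NOISE_STD = 0.1    # assumed standard deviation of the exploration noise

def noisy_pd_action(position, velocity, waypoint, rng):
    """Drive the point mass toward `waypoint`, damp its velocity, add noise."""
    action = P_GAIN * (waypoint - position) + D_GAIN * velocity
    action += rng.normal(scale=NOISE_STD, size=2)
    return np.clip(action, -1.0, 1.0)  # action space is Box(-1.0, 1.0, (2,))

rng = np.random.default_rng(0)
position, velocity = np.zeros(2), np.zeros(2)
waypoint = np.array([1.0, 0.5])  # placeholder for a QIteration waypoint
print(noisy_pd_action(position, velocity, waypoint, rng))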

Dataset Specs

Total Steps: 1000000
Total Episodes: 3360
Dataset Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))
Dataset Action Space: Box(-1.0, 1.0, (2,), float32)
Algorithm: QIteration
Author: Rodrigo Perez-Vicente
Email: rperezvicente@farama.org
Code Permalink: https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation
Minari Version: 0.4.3 (supported)
Download: minari download pointmaze-large-dense-v2
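
After downloading, the dataset can be loaded and inspected with the Minari Python API. A minimal sketch; the episode field names follow the spaces listed above:

import minari

dataset = minari.load_dataset('pointmaze-large-dense-v2')
print(dataset.total_episodes, dataset.total_steps)  # 3360 1000000

for episode in dataset.iterate_episodes():
    # Observations are dicts matching the Dict observation space above;
    # there is one more observation than there are actions/rewards.
    print(episode.observations['observation'].shape)  # (length + 1, 4)
    print(episode.actions.shape)                      # (length, 2)
    print(episode.rewards.shape)                      # (length,)
    break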

Environment Specs

Note

The following rows list, in addition to the action and observation spaces, the Gymnasium environment specification used to generate the dataset. For details on what each parameter means, see the Gymnasium documentation: https://gymnasium.farama.org/api/registry/#gymnasium.envs.registration.EnvSpec

This environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('pointmaze-large-dense-v2')
env = dataset.recover_environment()
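
A short interaction loop with the recovered environment, using random actions purely to illustrate the dictionary observations (a sketch, not part of the dataset generation):

import numpy as np

obs, info = env.reset(seed=42)
for _ in range(10):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    # 'observation' holds the (x, y) position and velocity of the point mass;
    # 'achieved_goal' and 'desired_goal' are the current and target positions.
    distance = np.linalg.norm(obs['achieved_goal'] - obs['desired_goal'])
env.close()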

ID: PointMaze_LargeDense-v3
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))
Action Space: Box(-1.0, 1.0, (2,), float32)
entry_point: gymnasium_robotics.envs.maze.point_maze:PointMazeEnv
max_episode_steps: 1000000
reward_threshold: None
nondeterministic: False
order_enforce: True
autoreset: False
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'dense', 'continuing_task': True, 'reset_target': True}
additional_wrappers: ()
vector_entry_point: None
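
As an alternative to recover_environment(), an equivalent environment can be constructed directly from the registered ID with the kwargs listed above. A minimal sketch, assuming gymnasium-robotics is installed (importing it registers the PointMaze environments):

import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- registers the PointMaze_* environments

env = gym.make(
    'PointMaze_LargeDense-v3',
    continuing_task=True,   # episodes do not terminate when the goal is reached
    reset_target=True,      # a new random goal is generated once the goal is reached
    max_episode_steps=1000000,
)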

Evaluation Environment Specs

Note

The evaluation environment can be recovered from the Minari dataset as follows:

import minari

dataset = minari.load_dataset('pointmaze-large-dense-v2')
eval_env = dataset.recover_environment(eval_env=True)
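
A minimal evaluation-loop sketch using the recovered evaluation environment: the goal is fixed at the 'g' cell of the maze map below, the target is not re-sampled (reset_target is False), and episodes truncate after 800 steps. `policy` is a placeholder for the agent being evaluated.

def evaluate(policy, eval_env, num_episodes=10):
    returns = []
    for _ in range(num_episodes):
        obs, info = eval_env.reset()
        episode_return, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, info = eval_env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated  # truncation triggers at 800 steps
        returns.append(episode_return)
    return sum(returns) / len(returns)

# Example with a random policy:
print(evaluate(lambda obs: eval_env.action_space.sample(), eval_env))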

ID: PointMaze_LargeDense-v3
Observation Space: Dict('achieved_goal': Box(-inf, inf, (2,), float64), 'desired_goal': Box(-inf, inf, (2,), float64), 'observation': Box(-inf, inf, (4,), float64))
Action Space: Box(-1.0, 1.0, (2,), float32)
entry_point: gymnasium_robotics.envs.maze.point_maze:PointMazeEnv
max_episode_steps: 800
reward_threshold: None
nondeterministic: False
order_enforce: True
autoreset: False
disable_env_checker: False
kwargs: {'maze_map': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 'g', 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'reward_type': 'dense', 'continuing_task': True, 'reset_target': False}
additional_wrappers: ()
vector_entry_point: None