Release Notes¶

v0.5.3¶

Released on 2025-04-17 - GitHub - PyPI

Summary of changes

We now support JPEG encoding for datasets.
57 new Atari datasets on our remote, collected using CleanRL's expert policies.
We now support Parquet storage along with Arrow.

What's Changed

uniform terminated/truncated naming by @younik in #269
Update README.md by @Tanmay692004 in #270
Fix: (In dataset creation tutorial) Register PointMaze_Medium-v3 and Correct env.maze Access by @mengyuest in #272
Fix incorrect path name on windows by @FlutteryEmbers in #278
JPEG encoding and decoding if the observation is an image by @gabrielemaraglino in #275
Update docs-test.yml by @younik in #280
fix minigrid observation decoding by @younik in #281
add atari docs, git clone command speedup by @younik in #283
fix: HF remote usage by @younik in #284
Add support for parquet storage by @rob-pitkin in #285
docs(readme): added a link to the workflow by @wakened2024 in #287

New Contributors

@Tanmay692004 made their first contribution in #270
@mengyuest made their first contribution in #272
@FlutteryEmbers made their first contribution in #278
@gabrielemaraglino made their first contribution in #275
@rob-pitkin made their first contribution in #285
@wakened2024 made their first contribution in #287

Full Changelog: v0.5.2...v0.5.3

v0.5.2¶

Released on 2024-12-09 - GitHub - PyPI

Summary of changes

New datasets on remote (BabyAI and MuJoCo).
Support HuggingFace as remote (and switch from GCP to HF for default Farama remote).
Support full path for list, download, and show, e.g. minari list hf://farama-minari/minigrid.
Improved the information displayed when listing datasets.

What's Changed

Fix pyarrow with big info & drop Gymnasium logger by @younik in #248
improve minari list remote performance by @younik in #249
auto update dataset size by @younik in #251
Test d3rlpy by @younik in #252
ENH: Support MultiDiscrete & MultiBinary by @younik in #253
Try multiprocessing on dataset docs generation by @younik in #254
Improve docs generation by @younik in #255
fix autogenerated by @younik in #256
add mujoco by @younik in #257
ENH: add HuggingFace support by @younik in #259
Rename key_path to token by @younik in #261
fix tutorials reqs by @younik in #262
ENH: add prefix support by @younik in #264
ENH: cli list with growing datasets by @younik in #266
ENH: Allow complete remote path for list, download and show by @younik in #267
add prefix CLI list by @younik in #268

Full Changelog: v0.5.1...v0.5.2

v0.5.1¶

Released on 2024-10-09 - GitHub - PyPI

Small bug fixes & Python 3.12 support.

What's Changed

fix dataset_id error on Windows by @kaixin96 in #239
fix unknown author appearance by @younik in #240
Fix: Avoid error when missing dataset dependency by @younik in #241
Update pre-commit by @younik in #244
Single step env by @alexdavey in #245
Add Python 3.12 support by @younik in #224

New Contributors

@kaixin96 made their first contribution in #239

Full Changelog: v0.5.0...v0.5.1

v0.5.0¶

Released on 2024-08-29 - GitHub - PyPI

Key changes

PyArrow support

Minari now supports PyArrow datasets. To create a new dataset using PyArrow, set the data_format flag to arrow during the creation of a DataCollector or while creating a dataset with a buffer. For example:

env = DataCollector(env, data_format="arrow")

Loading a dataset requires no changes; Minari automatically detects the data format.

Namespaces

Datasets can now be grouped to create a more organized dataset hub. For example, current remote datasets, which are reproductions of the D4RL datasets, are grouped under a namespace called D4RL. We encourage grouping datasets based on the environment used to produce them, if applicable. For instance, the previously named door-human-v2 dataset is now referenced as D4RL/door/human-v2. Multiple datasets are available in the D4RL group as well as in the D4RL/door subgroup, such as D4RL/door/cloned-v2. These grouped datasets can share metadata, enhancing their organization and accessibility.

For more information on creating and managing namespaces, please refer to the documentation page.
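
As a rough illustration of the naming scheme described above, the sketch below splits a namespaced dataset id into its namespace, name, and version. The `split_id` helper and its regular expression are hypothetical and written only for this example; they are not Minari's own parser, and the exact rules Minari enforces on ids may differ.

```python
import re
from typing import NamedTuple, Optional


class DatasetId(NamedTuple):
    namespace: Optional[str]  # e.g. "D4RL/door", or None for an un-namespaced id
    name: str                 # e.g. "human"
    version: int              # e.g. 2


# Assumed shape for illustration: "<namespace>/<name>-v<version>"
_ID_PATTERN = re.compile(r"^(?:(?P<namespace>.+)/)?(?P<name>[^/]+)-v(?P<version>\d+)$")


def split_id(dataset_id: str) -> DatasetId:
    """Split a dataset id into (namespace, name, version)."""
    match = _ID_PATTERN.match(dataset_id)
    if match is None:
        raise ValueError(f"Malformed dataset id: {dataset_id!r}")
    return DatasetId(match["namespace"], match["name"], int(match["version"]))


print(split_id("D4RL/door/human-v2"))
print(split_id("door-human-v2"))
```

Under this assumed pattern, everything before the last slash is treated as the namespace, so D4RL/door/human-v2 and D4RL/door/cloned-v2 share the D4RL/door subgroup.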

Support for other remotes

You can now set your own remote storage in Minari. Currently, only Google Cloud buckets are supported, but we plan to add support for other cloud services in the future. To configure your remote storage, set the MINARI_REMOTE environment variable, for example as follows:

export MINARI_REMOTE=gcp://bucket-name
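
A remote string of this form can be read as a provider prefix followed by a location. The sketch below shows one way to split it using only the Python standard library. The `parse_remote` helper is hypothetical and is not how Minari resolves remotes internally; it only illustrates the `provider://location` shape used by `MINARI_REMOTE`.

```python
import os
from typing import Tuple
from urllib.parse import urlparse


def parse_remote(remote: str) -> Tuple[str, str]:
    """Split 'provider://location' into (provider, location)."""
    parsed = urlparse(remote)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Expected '<provider>://<location>', got {remote!r}")
    # netloc is the bucket or organisation; path holds any sub-path after it
    return parsed.scheme, parsed.netloc + parsed.path


# Fall back to the example value from these notes if the variable is unset
remote = os.environ.get("MINARI_REMOTE", "gcp://bucket-name")
provider, location = parse_remote(remote)
print(provider, location)
```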

Breaking changes

This release introduces a few breaking changes:

The deprecated versioning of DataCollector has been removed. It can now only be imported as DataCollector, not as DataCollectorV0.
DataCollector no longer supports max_episode_step.
We removed the deprecated function minari.create_dataset_from_collector_env; use DataCollector.create_dataset instead.
The naming convention has changed as explained above. In Minari 0.5.0, remote dataset names follow the new convention.
We renamed total_timesteps to total_steps to unify the naming across the library.

Contributors

New contributors

@tomekster made their first contribution in #183
@cmmcirvin made their first contribution in #196
@pseudo-rnd-thoughts made their first contribution in #211
@JosephCarrino made their first contribution in #218
@jamartinh made their first contribution in #177

Other contributors

@younik @alexdavey @enerrio

Full Changelog: v0.4.3...v0.5.0

v0.4.3¶

Released on 2024-01-27 - GitHub - PyPI

Minari 0.4.3 Release Notes

small simple bug-fix: update lost infos in function create_dataset function by @im-Kitsch in #144
Refactor DataCollectorV0 and HDF5 dependencies isolation by @younik in #133
Add custom space serialization tutorial by @enerrio in #151
Add basic CI to test documentation (using pytest-markdown-docs) by @elliottower in #153
Add evaluation environment specs to dataset metadata by @rodrigodelazcano in #155
Add kwargs to recover_env by @younik in #161
Fix combine dataset attributes by @younik in #162
Add obs/act spaces when combining datasets by @rodrigodelazcano in #163
Run tests in a temporary Minari dataset dir by @alexdavey in #160
Mandatory spaces by @rodrigodelazcano in #164
Add minigrid docs by @younik in #165
Update pre-commit to check for all things gymnasium/pettingzoo does by @elliottower in #157
make env optional arg while creating from buffers by @avjmachine in #137
added dataset_size attribute to minari datasets by @shreyansjainn in #158
Improve README by @younik in #167
Deprecate create_dataset_from_collector_env by @avjmachine in #169
Add minari show by @alexdavey in #170
DataCollectorV0 -> DataCollector, deprecation warning by @younik in #171
Add automatic seeding of Datacollector reset by @alexdavey in #172
Adds infos to EpisodeData by @balisujohn @rodrigodelazcano @younik in #132

New Contributors

@avjmachine made their first contribution in #137

Full Changelog: v0.4.2...v0.4.3

v0.4.2¶

Released on 2023-10-09 - GitHub - PyPI

Minari 0.4.2 Release Notes

Ruggedizes list_local_datasets and adds basic tests for CLI by @grahamannett in #126
Fix actions with noise type and value needs to be within (low,high) by @grahamannett in #128
Load dataset with download option by @grahamannett in #130
Improve speed of list_remote_datasets by @balisujohn in #124
Improve MinariDataset sampling speed, fixes total_steps bug and adds test coverage by @balisujohn in #129
Update issue template by @younik in #139
Loose typing_extensions dependency by @im-Kitsch in #148

New Contributors

@grahamannett made their first contribution in #126
@im-Kitsch made their first contribution in #148

Full Changelog: v0.4.1...v0.4.2

v0.4.1¶

Released on 2023-07-19 - GitHub - PyPI

v0.4.1 Release Notes

Bugfix: Adds packaging as a dependency for Minari in #121.

v0.4.0¶

Released on 2023-07-19 - GitHub - PyPI

v0.4.0 Release Notes

Important changes from this PR include the move away from observation and action flattening, the move to explicitly and fully support Dict, Tuple, Box, Discrete, and Text spaces, and the move to explicit dataset versioning. Additionally, we have added support for using a subset of an environment's action/observation spaces when creating a dataset.

Finally, we have released new versions of each dataset to make them compliant with our new dataset format. This includes all the changes listed in the "Dataset Updates" section of the release notes.

We have two new tutorials:

Unflattened `Dict` and `Tuple` space support

The following excerpt from our documentation shows how unflattened gymnasium.spaces.Dict and gymnasium.spaces.Tuple are now supported.

Consider the case where the observation space is a relatively complex Dict space with the following definition:

spaces.Dict(
    {
        "component_1": spaces.Box(low=-1, high=1, dtype=np.float32),
        "component_2": spaces.Dict(
            {
                "subcomponent_1": spaces.Box(low=2, high=3, dtype=np.float32),
                "subcomponent_2": spaces.Box(low=4, high=5, dtype=np.float32),
            }
        ),
    }
)

and the action space is a Box space. The resulting HDF5 file will look as follows:

📄 main_data.hdf5
├ 📁 episode_0
│ ├ 📁 observations
│ │ ├ 💾 component_1
│ │ └ 📁 component_2
│ │   ├ 💾 subcomponent_1
│ │   └ 💾 subcomponent_2
│ ├ 💾 actions
│ ├ 💾 terminations
│ ├ 💾 truncations
│ ├ 💾 rewards
│ ├ 📁 infos
│ │ ├ 💾 infos_datasets
│ │ └ 📁 infos_subgroup
│ │   └ 💾 more_datasets
│ └ 📁 additional_groups
│   └ 💾 additional_datasets
├ 📁 episode_1
├ 📁 episode_2
│
└ 📁 last_episode_id

Similarly, consider the case where we have a Box space as an observation space and a relatively complex Tuple space as an action space with the following definition:

spaces.Tuple(
    (
        spaces.Box(low=2, high=3, dtype=np.float32),
        spaces.Tuple(
            (
                spaces.Box(low=2, high=3, dtype=np.float32),
                spaces.Box(low=4, high=5, dtype=np.float32),
            )
        ),
    )
)

In this case, the resulting Minari dataset HDF5 file will end up looking as follows:

📄 main_data.hdf5
├ 📁 episode_0
│ ├ 💾 observations
│ ├ 📁 actions
│ │ ├ 💾 _index_0
│ │ └ 📁 _index_1
│ │   ├ 💾 _index_0
│ │   └ 💾 _index_1
│ ├ 💾 terminations
│ ├ 💾 truncations
│ ├ 💾 rewards
│ ├ 📁 infos
│ │ ├ 💾 infos_datasets
│ │ └ 📁 infos_subgroup
│ │   └ 💾 more_datasets
│ └ 📁 additional_groups
│   └ 💾 additional_datasets
├ 📁 episode_1
├ 📁 episode_2
│
└ 📁 last_episode_id
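
The two layouts above follow a simple rule: a Dict becomes a group keyed by its entries, and a Tuple becomes a group whose children are named by position. The sketch below walks nested Python values and prints the leaf paths that rule would produce. It is an illustration only: `iter_leaf_paths` is a hypothetical helper, the `_index_<i>` naming is taken from the tree shown above, and the example pairs the Dict observation from the first case with the Tuple action from the second.

```python
from typing import Any, Iterator


def iter_leaf_paths(value: Any, prefix: str) -> Iterator[str]:
    """Yield one HDF5-style path per leaf of a nested dict/tuple value."""
    if isinstance(value, dict):
        # Dict entries become sub-groups named after their keys
        for key, child in value.items():
            yield from iter_leaf_paths(child, f"{prefix}/{key}")
    elif isinstance(value, tuple):
        # Tuple entries become sub-groups named after their position
        for index, child in enumerate(value):
            yield from iter_leaf_paths(child, f"{prefix}/_index_{index}")
    else:
        # Anything else is treated as a leaf dataset
        yield prefix


observation = {
    "component_1": 0.5,
    "component_2": {"subcomponent_1": 2.5, "subcomponent_2": 4.5},
}
action = (2.5, (2.5, 4.5))

paths = list(iter_leaf_paths(observation, "episode_0/observations"))
paths += list(iter_leaf_paths(action, "episode_0/actions"))
for path in paths:
    print(path)
```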

EpisodeData: Data format when sampling episodes

Episodes are now sampled as EpisodeData instances that comply with the following format:

Field	Type	Description
`id`	`np.int64`	ID of the episode.
`seed`	`np.int64`	Seed used to reset the episode.
`total_timesteps`	`np.int64`	Number of timesteps in the episode.
`observations`	`np.ndarray`, `list`, `tuple`, `dict`	Observations for each timestep including initial observation.
`actions`	`np.ndarray`, `list`, `tuple`, `dict`	Actions for each timestep.
`rewards`	`np.ndarray`	Rewards for each timestep.
`terminations`	`np.ndarray`	Terminations for each timestep.
`truncations`	`np.ndarray`	Truncations for each timestep.
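
To make the table concrete, here is a minimal stand-in with the same field names. `EpisodeSketch` is a hypothetical class, not Minari's EpisodeData: it uses plain lists instead of NumPy arrays and assumes flat (non-Dict, non-Tuple) observations and actions. The length check encodes one reading of the table, namely that observations include the initial observation and so hold one more entry than the per-step fields.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class EpisodeSketch:
    """Hypothetical stand-in for EpisodeData, using lists instead of arrays."""

    id: int
    seed: int
    total_timesteps: int
    observations: List[Any]
    actions: List[Any]
    rewards: List[float]
    terminations: List[bool]
    truncations: List[bool]

    def __post_init__(self) -> None:
        # Assumption: one observation per step plus the initial observation
        if len(self.observations) != self.total_timesteps + 1:
            raise ValueError("observations must have total_timesteps + 1 entries")
        for name in ("actions", "rewards", "terminations", "truncations"):
            if len(getattr(self, name)) != self.total_timesteps:
                raise ValueError(f"{name} must have total_timesteps entries")


episode = EpisodeSketch(
    id=0,
    seed=123,
    total_timesteps=2,
    observations=[0.0, 0.1, 0.2],
    actions=[1, 0],
    rewards=[1.0, 1.0],
    terminations=[False, True],
    truncations=[False, False],
)
print(len(episode.observations) - len(episode.actions))
```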

Breaking Changes

Changed dataset format to support storing unflattened spaces and also subset spaces by @rodrigodelazcano, @balisujohn, and @younik in #77
Refactored minari_dataset.py to not use h5py directly. This meant moving public function clear_episode_buffer to minari_storage.py by @balisujohn in #101
Removed Python 3.7 compatibility due to EOL by @rodrigodelazcano in #107

New Features

Added Python 3.11 support by @rodrigodelazcano in #73
Reorganized tests and added more thorough testing of MinariStorage by @balisujohn in #75
Added option to copy data (instead of a reference-based copy) when combining datasets by @Howuhh in #82
Fail with a descriptive error if dataset env base library not installed by @shreyansjainn in #86
Made EpisodeData into a dataclass by @younik in #88
Added force_download flag for locally existing datasets by @shreyansjainn in #90
Added support for text spaces by @younik in #99
Added minari.get_normalized_scores that follows the evaluation process in the D4RL datasets. by @rodrigodelazcano in #110
Added code to support Minari dataset version specifiers by @rodrigodelazcano in #107

Bug Fixes

Fixed path printing in the CLI (previously incorrect) by @Howuhh in #83
Copy infos from the previous episode if truncated or terminated without reset by @Howuhh in #96
Ignore hidden files when listing local datasets by @enerrio in #104
h5py group creation bug fix by @rodrigodelazcano in #111

Documentation Changes

Adding a table describing supported action and observation spaces by @tohsin in #84
Adds test instructions to contributing.MD by @shreyansjainn in #86
Adds installation instructions to basic usage section of doc and also doc build instructions to documentation by @enerrio in #105
Added tutorial for space subsetting by @Bamboofungus in #108
Added description of EpisodeData to documentation by @enerrio in #109
Improved background about PID control in pointmaze dataset creation tutorial by @tohsin in #95
Docs now show serialized dataset spaces by @rodrigodelazcano in #116
Adds behavior cloning tutorial with Minari and PyTorchDataLoader by @younik in #102

Misc Changes

Added a citation.cff file by @rodrigodelazcano
Fixed some typos and type annotations, slightly ruggedized datacollector by @RedTachyon in #52
Froze pyright in pre-commit to version 1.1.305 by @balisujohn
Updated dataset used in readme example by @shreyansjainn in #80
Corrected readme example by @younik in #87
Added check to make sure observations and actions were in the right space in MinariStorage instances by @balisujohn in #92
Ruggedized list_remote_datasets by @balisujohn in #93
Added pre-commit and code-style black badges by @elliottower in #112

Dataset Updates

v1 versions of each provided dataset have been released. The new dataset format has the following changes:

Observation and action flattening have been removed for pointmaze datasets, as arbitrary nesting of Dict and Tuple spaces is now supported with the new dataset format.
v1 and subsequent datasets now have action_space and observation_space fields which store a serialized representation of the observation and action spaces used for observations and actions in the dataset. It's important to note that this can be different from the spaces of the gymnasium environment mentioned in the dataset spec.
v1 and subsequent datasets have the minari_version field, which specifies the versions of Minari they are compatible with.
v1 pointmaze datasets copy the last info to the next episode, as fixed in #96.

v0.3.1¶

Released on 2023-05-19 - GitHub - PyPI

v0.3.1 Release notes

Minor release for fixing the following bugs:

Fix combining multiple datasets. Use the h5py method dataset.attrs.modify() to update the "author" and "author_email" metadata attributes. Also added CI tests. @Howuhh in #60
Fix .github/workflows/build-docs-version.yml. The workflow did not run the dataset documentation generation script python docs/_scripts/gen_dataset_md.py and was missing the SPHINX_GITHUB_CHANGELOG_TOKEN environment variable. @rodrigodelazcano in #71

Full Changelog: v0.3.0...v0.3.1

v0.3.0¶

Released on 2023-05-17 - GitHub - PyPI

v0.3.0: Minari is ready for testing

Minari 0.3.0 Release Notes:

For this beta release Minari has experienced considerable changes from its past v0.2.2 version. As a major refactor, the C source code and Cython dependency have been removed in favor of a pure Python API in order to reduce code complexity. If we require a more efficient API in the future we will explore the use of C.

Apart from the API changes and new features we are excited to include the first official Minari datasets which have been re-created from the D4RL project.

The documentation page at https://minari.farama.org/ has also been updated with the latest changes.

We are constantly developing this library. Please don't hesitate to open a GitHub issue or reach out to us directly. Your ideas and contributions are highly appreciated and will help shape the future of this library. Thank you for using our library!

New Features and Improvements

Dataset File Format

We are keeping the HDF5 file format to store the Minari datasets. However, the internal structure of the datasets has been modified. The data is now stored on a per-episode basis. Each Minari dataset has a minimum of one HDF5 file (📄, main_data.hdf5). In the dataset file, the collected transitions are separated into episode groups (📁) that contain 5 required datasets (💾): observations, actions, terminations, truncations, and rewards. Other optional groups and datasets can be included in each episode, such as the infos step return. This structure allows us to store metadata for each episode.

📄 main_data.hdf5
├ 📁 episode_id
│ ├ 💾 observations
│ ├ 💾 actions
│ ├ 💾 terminations
│ ├ 💾 truncations
│ ├ 💾 rewards
│ ├ 📁 infos
│ │ ├ 💾 info datasets
│ │ └ 📁 info subgroup
│ │   └ 💾 info subgroup dataset
│ └ 📁 extra dataset group
│   └ 💾 extra datasets
└ 📁 next_episode_id

MinariDataset

When loading a dataset, the MinariDataset object now delegates the HDF5 file access to a MinariStorage object. The MinariDataset provides new methods, MinariDataset.sample_episodes() (#34) and MinariDataset.iterate_episodes() (#54), to retrieve EpisodeData from the available episode indices in the dataset.

NOTE: for now the user is in charge of creating their own replay buffers with the provided episode sampling methods. We are currently working on creating standard replay buffers (#55) and making Minari datasets compatible with other offline RL libraries.

The available episode indices can be filtered using metadata or other information from the episodes' HDF5 datasets with MinariDataset.filter_episodes(condition: Callable[[h5py.Group], bool]) (#34).

dataset = minari.load_dataset("door-human-v0")

print(f'TOTAL EPISODES ORIGINAL DATASET: {dataset.total_episodes}')

# get episodes with mean reward greater than 2
filter_dataset = dataset.filter_episodes(lambda episode: episode["rewards"].attrs.get("mean") > 2)

print(f'TOTAL EPISODES FILTER DATASET: {filter_dataset.total_episodes}')

>>> TOTAL EPISODES ORIGINAL DATASET: 25
>>> TOTAL EPISODES FILTER DATASET: 18

The episodes in a MinariDataset can also be split into smaller sub-datasets with minari.split_dataset(dataset: MinariDataset, sizes: List[int], seed: int | None = None) (#34).

dataset = minari.load_dataset("door-human-v0")

split_datasets = minari.split_dataset(dataset, sizes=[20, 5], seed=123)

print(f'TOTAL EPISODES FIRST SPLIT: {split_datasets[0].total_episodes}')
print(f'TOTAL EPISODES SECOND SPLIT: {split_datasets[1].total_episodes}')

>>> TOTAL EPISODES FIRST SPLIT: 20
>>> TOTAL EPISODES SECOND SPLIT: 5
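
The idea behind a seeded split can be sketched without Minari: shuffle the episode indices with a seeded generator, then cut consecutive chunks of the requested sizes. The `split_indices` helper below is hypothetical and only illustrates that idea. Whether minari.split_dataset shuffles in exactly this way is an assumption, and its resulting index order will not match this sketch.

```python
import random
from typing import List, Optional, Sequence


def split_indices(
    n_episodes: int, sizes: Sequence[int], seed: Optional[int] = None
) -> List[List[int]]:
    """Partition range(n_episodes) into disjoint chunks of the given sizes."""
    if sum(sizes) > n_episodes:
        raise ValueError("Requested split sizes exceed the number of episodes")
    indices = list(range(n_episodes))
    # A dedicated generator keeps the split reproducible for a given seed
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for size in sizes:
        splits.append(indices[start:start + size])
        start += size
    return splits


splits = split_indices(25, sizes=[20, 5], seed=123)
print([len(split) for split in splits])
```

Because the chunks are cut from a single shuffled list, no episode index can appear in more than one split.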

Finally, Gymnasium release v0.28.0 made it possible to convert the environment's EnvSpec to a JSON dictionary. This allows Minari to save the description of the environment used to generate the dataset into the HDF5 file for later recovery through MinariDataset.recover_environment() (#31). NOTE: the entry_point of the environment must be available, e.g. to recover the environment from the door-human-v0 dataset, the gymnasium-robotics library must be installed.

Dataset Creation (#31)

We are facilitating the logging of environment data by providing a Gymnasium environment wrapper, DataCollectorV0. This wrapper buffers the parameters from a Gymnasium step transition. The DataCollectorV0 is also memory efficient by providing a step/episode scheduler to cache the recorded data. In addition, this wrapper can be initialized with two custom callbacks:

StepDataCallback - This callback automatically flattens Dictionary or Tuple observation/action spaces (this functionality will be removed in a future release following the suggestions of #57). This class can be overridden to store additional environment data.
EpisodeMetadataCallback - This callback adds metadata to each recorded episode. For now automatic metadata will be added to the rewards dataset of each episode. It can also be overridden to include additional metadata.

To save the Minari dataset to disk with a specific dataset id, two functions are provided. If the data is collected by wrapping the environment with a DataCollectorV0, use minari.create_dataset_from_collector_env. Otherwise, you can collect the episode trajectories with dictionary collection buffers and use minari.create_dataset_from_buffers.

These functions return a MinariDataset object, which can be used to checkpoint the data collection process and later append more data with MinariDataset.update_dataset_from_collector_env(collector_env: DataCollectorV0).

import minari
import gymnasium as gym

env = gym.make('CartPole-v1')
collector_env = minari.DataCollectorV0(env)

dataset_id = 'cartpole-test-v0'
dataset = None

# Collect 1000 episodes for the dataset
for n_episode in range(1000):
    collector_env.reset(seed=123)
    while True:
        action = collector_env.action_space.sample()
        obs, rew, terminated, truncated, info = collector_env.step(action)
        if terminated or truncated:
            break

    # Checkpoint data after every 100 episodes
    if (n_episode + 1) % 100 == 0:
        # If the Minari dataset id does not exist create a new dataset, otherwise update the existing one
        if dataset_id not in minari.list_local_datasets():
            dataset = minari.create_dataset_from_collector_env(collector_env=collector_env, dataset_id=dataset_id)
        else:
            # Load the dataset first if it was created by an earlier run of this script
            if dataset is None:
                dataset = minari.load_dataset(dataset_id)
            dataset.update_dataset_from_collector_env(collector_env)

We provide a curated tutorial in the documentation on how to use these dataset creation tools: https://minari.farama.org/main/tutorials/dataset_creation/point_maze_dataset/#sphx-glr-tutorials-dataset-creation-point-maze-dataset-py

Finally, multiple existing datasets can be combined into a larger dataset. This requires that the datasets to be combined have the same observation/action space as well as the same EnvSpec (except for the max_episode_steps argument, for which the largest value among all the datasets is selected).

Existing Minari datasets can be combined under a new name as follows:

dataset_v1 = minari.load_dataset('dataset-v1')
dataset_v2 = minari.load_dataset('dataset-v2')

dataset_v3 = minari.combine_datasets(datasets_to_combine=[dataset_v1, dataset_v2], new_dataset_id='dataset-v3')

CLI

To improve accessibility to the remote public datasets, we are also including a CLI tool with commands to list, download, and upload Minari datasets.

New Public Datasets

The newly available datasets have been re-created from the original D4RL project and come from the following Gymnasium environments: AdroitHandDoor-v1, AdroitHandHammer-v1, AdroitHandPen-v1, AdroitHandRelocate-v1, PointMaze, and FrankaKitchen-v1.

0.2.2¶

Released on 2023-01-04 - GitHub - PyPI

What's Changed

Minari rename by @WillDudley in #8
Environment name by @WillDudley in #11
update naming conventions by @WillDudley in #13
allow nonetypes of codelink, author and email by @WillDudley in #12
Environment stack by @WillDudley in #14
PR tests by @WillDudley in #22
import_bugfix by @WillDudley in #26
Add docs versioning by @mgoulao in #28

New Contributors

@mgoulao made their first contribution in #28

Full Changelog: 0.1.0...0.2.2

0.1.0¶

Released on 2022-11-04 - GitHub - PyPI

What's Changed

init structure by @WillDudley in #3
remove residual PZ files by @WillDudley in #4
precommit by @WillDudley in #5
Wd/mdp dataset by @WillDudley in #6

New Contributors

@WillDudley made their first contribution in #3

Full Changelog: https://github.com/Farama-Foundation/Kabuki/commits/0.1.0
