A dataset handles and abstracts low-level access to the data. It takes data stored locally, in the format in which they were downloaded, and converts them into an MNE Raw object. There are options to pool all the recording sessions of a subject or to evaluate them separately.
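
As a minimal sketch of this access pattern, assuming MOABB is installed: a dataset's `get_data` method returns a nested dictionary keyed by subject, then session, then run, with an MNE Raw object at each leaf. The helper names `iter_runs` and `load_first_subject` and the stand-in dictionary are illustrative and not part of MOABB, and the session and run keys are placeholders, since the actual keys depend on the dataset.

```python
def iter_runs(data):
    """Yield (subject, session, run, raw) from a subject -> session -> run mapping."""
    for subject, sessions in data.items():
        for session, runs in sessions.items():
            for run, raw in runs.items():
                yield subject, session, run, raw


def load_first_subject():
    """Fetch one subject with MOABB (needs moabb installed; downloads on first call)."""
    from moabb.datasets import BNCI2014_001

    dataset = BNCI2014_001()
    return dataset.get_data(subjects=[1])


# Offline stand-in with the same nesting; strings replace the MNE Raw objects.
example = {
    1: {
        "session_0": {"run_0": "raw-a", "run_1": "raw-b"},
        "session_1": {"run_0": "raw-c"},
    }
}

# Evaluating sessions separately: one group of runs per (subject, session).
per_session = {}
for subject, session, run, raw in iter_runs(example):
    per_session.setdefault((subject, session), []).append(raw)

# Pooling sessions: one group of runs per subject.
per_subject = {}
for subject, session, run, raw in iter_runs(example):
    per_subject.setdefault(subject, []).append(raw)
```

Calling `iter_runs(load_first_subject())` would walk real recordings in the same way.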

See http://moabb.neurotechx.com/docs/dataset_summary.html for details on datasets (electrodes, number of trials, sessions, etc.).

Data Summary

MOABB gathers many datasets; the tables below summarize the most important information about them. Most datasets are listed here, but the list is not yet complete. Check the API reference for the complete documentation.

Do not hesitate to help us complete this list. It is also possible to add new datasets: a tutorial explains how to do so, and we warmly welcome any new contributions!

It is possible to use an external dataset within MOABB as long as it is in Brain Imaging Data Structure (BIDS) format. See this guide for more information on how to structure your data according to BIDS. You can use this class to convert your local dataset to work within MOABB without creating a new dataset class.

See also the Wiki for supplementary details on datasets (class name, size, licence, etc.).

Column definitions:

Dataset is the name of the dataset.
#Subj is the number of subjects.
#Chan is the number of EEG channels.
#Trials / class is the number of repetitions performed by one subject for each class. This number is computed using only the first subject of each dataset. The definitions of a class and of a trial depend on the paradigm used (see sections below).
Trials length is the duration of a trial in seconds.
Total_trials is the total number of trials in the dataset (all subjects and classes together).
Freq is the sampling frequency of the raw data.
#Session is the number of sessions per subject. Different sessions are often recorded on different days.
#Runs is the number of runs per session. A run is a continuous recording of the EEG data. Often, the different runs of a given session are recorded without removing the EEG cap in between.
PapersWithCode leaderboard is the link to the dataset on the PapersWithCode leaderboard.
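
To make the units in these definitions concrete, here is a small worked example using values copied from the `AlexMI` row of the Motor Imagery table. It assumes trials are cut from the raw data at the native sampling rate, without resampling; the dictionary layout is only an illustration.

```python
# Values copied from the AlexMI row of the Motor Imagery table.
row = {
    "Dataset": "AlexMI",
    "#Trials / class": 20,
    "Trials length (s)": 3.0,
    "Freq (Hz)": 512,
}

# Trial duration (s) times sampling frequency (Hz) gives samples per trial.
samples_per_trial = round(row["Trials length (s)"] * row["Freq (Hz)"])

# Repetitions per class times trial duration gives seconds of data per class
# for the first subject (the subject used to compute "#Trials / class").
seconds_per_class = row["#Trials / class"] * row["Trials length (s)"]

print(samples_per_trial, seconds_per_class)
```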

Datasets overview:

A visual overview of all datasets can be generated using the functions moabb.datasets.utils.plot_datasets_grid() or moabb.datasets.utils.plot_datasets_cluster(). This overview makes it possible to quickly compare the number of subjects, trials, and sessions across different datasets. The functions generate a figure like this:

Visual overview of the datasets used in `The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark <https://universite-paris-saclay.hal.science/hal-04537061v1/file/MOABB-arXiv.pdf>`_

Motor Imagery

Motor Imagery is a BCI paradigm where the subject imagines performing movements. Each movement is associated with a different command to build an application.

Motor Imagery-specific definitions:

#Classes is the number of different imagery tasks.
Trial is one repetition of the imagery task.

Dataset	#Subj	#Chan	#Classes	#Trials / class	Trials length (s)	Freq (Hz)	#Sessions	#Runs	Total_trials	PapersWithCode leaderboard
`AlexMI`	8	16	3	20	3.0	512	1	1	480	Yes
`BNCI2003_004`	5	118	2	84	3.5	100	1	1	1400	No
`BNCI2014_001`	9	22	4	144	4.0	250	2	6	62208	Yes
`BNCI2014_002`	14	15	2	80	5.0	512	1	8	17920	Yes
`BNCI2014_004`	9	3	2	360	4.5	250	5	1	32400	Yes
`BNCI2015_001`	12	13	2	200	5.0	512	3	1	14400	Yes
`BNCI2015_004`	9	30	5	80	7.0	256	2	1	7200	Yes
`Cho2017`	52	64	2	100	3.0	512	1	1	9800	Yes
`Lee2019_MI`	54	62	2	100	4.0	1000	2	1	11000	Yes
`GrosseWentrup2009`	10	128	2	150	7.0	500	1	1	3000	Yes
`Schirrmeister2017`	14	128	4	120	4.0	500	1	2	13440	Yes
`Ofner2017`	15	61	7	60	3.0	512	1	10	63000	No
`PhysionetMI`	109	64	4	23	3.0	160	1	1	69760	Yes
`Shin2017A`	29	30	2	30	10.0	200	3	1	5220	Yes
`Shin2017B`	29	30	2	30	10.0	200	3	1	5220	No
`Weibo2014`	10	60	7	80	4.0	200	1	1	5600	Yes
`Zhou2016`	4	14	3	160	5.0	250	3	2	11496	Yes
`Stieger2021`	62	64	4	450	3.0	1000	7 or 11	1	250000	No
`Liu2024`	50	29	2	20	4.0	500	1	1	2000	No
`Beetl2021_A`	4	63	4	224	4.0	500	1	1	1490	No
`Beetl2021_B`	2	32	4	160	4.0	200	1	1	1590	No
`Dreyer2023A`	60	27	2	20	5.0	512	1	6	14400	No
`Dreyer2023B`	21	27	2	20	5.0	512	1	6	5040	No
`Dreyer2023C`	6	27	2	20	5.0	512	1	6	1440	No
`Dreyer2023`	87	27	2	20	5.0	512	1	6	20880	No
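
Because the table is tab-separated, it can be filtered with standard tools. The sketch below copies three columns of five rows from the table and keeps the datasets with more than one session, which is the kind of selection needed before evaluating sessions separately. It assumes every `#Sessions` cell is a plain integer, so a row such as `Stieger2021` ("7 or 11") would need special handling.

```python
import csv
import io

# Three columns of five rows, copied from the Motor Imagery table.
table = (
    "Dataset\t#Subj\t#Sessions\n"
    "AlexMI\t8\t1\n"
    "BNCI2014_001\t9\t2\n"
    "BNCI2014_004\t9\t5\n"
    "Cho2017\t52\t1\n"
    "Zhou2016\t4\t3\n"
)

rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))

# Keep only the datasets recorded over several sessions.
multi_session = [r["Dataset"] for r in rows if int(r["#Sessions"]) > 1]
print(multi_session)
```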

P300/ERP

ERP (Event-Related Potential) is a BCI paradigm where the subject is presented with a stimulus and the EEG response is recorded. The P300 is a positive peak in the EEG signal that occurs around 300 ms after the stimulus.

P300-specific definitions:

A trial is one flash.
The classes are binary: a trial is target if the key on which the subject focuses is flashed and non-target otherwise.

Dataset	#Subj	#Chan	#Trials / class	Trials length (s)	Freq (Hz)	#Sessions	PapersWithCode leaderboard
`BNCI2014_008`	8	8	3500 NT / 700 T	1.0	256	1	Yes
`BNCI2014_009`	10	16	1440 NT / 288 T	0.8	256	3	Yes
`BNCI2015_003`	10	8	1500 NT / 300 T	0.8	256	1	Yes
`BI2012`	25	16	640 NT / 128 T	1.0	128	2	Yes
`BI2013a`	24	16	3200 NT / 640 T	1.0	512	8 for subjects 1-7 else 1	Yes
`BI2014a`	64	16	990 NT / 198 T	1.0	512	up to 3	Yes
`BI2014b`	38	32	200 NT / 40 T	1.0	512	3	Yes
`BI2015a`	43	32	4131 NT / 825 T	1.0	512	3	Yes
`BI2015b`	44	32	2160 NT / 480 T	1.0	512	1	Yes
`Cattan2019_VR`	21	16	600 NT / 120 T	1.0	512	2	Yes
`Huebner2017`	13	31	364 NT / 112 T	0.9	1000	3	Yes
`Huebner2018`	12	31	364 NT / 112 T	0.9	1000	3	Yes
`Sosulski2019`	13	31	7500 NT / 1500 T	1.2	1000	1	Yes
`EPFLP300`	8	32	2753 NT / 551 T	1.0	2048	4	Yes
`Lee2019_ERP`	54	62	6900 NT / 1380 T	1.0	1000	2	Yes
`DemonsP300`	60	8	935 NT / 50 T	1.0	500	1	No
`ErpCore2021_N170`	40	30	240 NT / 80 T	1.0	1024	1	No
`ErpCore2021_MMN`	40	30	800 NT / 200 T	1.0	1024	1	No
`ErpCore2021_N2pc`	40	30	160 NT / 160 T	1.0	1024	1	No
`ErpCore2021_P3`	40	30	160 NT / 40 T	1.0	1024	1	No
`ErpCore2021_N400`	40	30	60 NT / 60 T	1.0	1024	1	No
`ErpCore2021_ERN`	40	30	~400 All	1.0	1024	1	No
`ErpCore2021_LRP`	40	30	~400 All	1.0	1024	1	No
`Kojima2024A`	11	64	~130 NT / ~65 T	1.0	1000	1	No
`Kojima2024B`	15	64	2160 NT / 720 T	1.0	1000	1	No
`RomaniBF2025ERP`	22	8	540 NT / 60 T	1.0	250	up to 3	No
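
The `#Trials / class` cells show how unbalanced the two classes usually are. As an illustration, the hypothetical helper below parses a cell of the form "N NT / M T" and computes the fraction of target trials for the `BNCI2014_008` row. Cells with other formats (for example "~400 All") are deliberately rejected rather than guessed at.

```python
import re


def parse_class_counts(cell):
    """Parse a cell such as '3500 NT / 700 T' into (non_target, target) counts."""
    match = re.fullmatch(r"\s*(\d+)\s*NT\s*/\s*(\d+)\s*T\s*", cell)
    if match is None:
        raise ValueError(f"unsupported cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))


# Cell copied from the BNCI2014_008 row.
non_target, target = parse_class_counts("3500 NT / 700 T")

# Fraction of all flashes that are targets.
target_fraction = target / (non_target + target)
print(f"{target_fraction:.3f}")
```

A classifier evaluated on such data should therefore be scored with a metric that is robust to class imbalance rather than with plain accuracy.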

SSVEP

SSVEP (Steady-State Visually Evoked Potential) is a BCI paradigm where the subject is presented with flickering stimuli. Each stimulus flickers at a different frequency, and the EEG signal is modulated at the same frequency as the stimulus.
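
The idea that the stimulation frequency shows up in the signal can be illustrated with a toy example. The sketch below builds a noise-free synthetic sinusoid and picks, among a few candidate frequencies, the one with the largest spectral power. The frequencies, sampling rate, and duration are arbitrary illustrative values, and this is not how MOABB pipelines classify SSVEP trials.

```python
import math


def power_at(signal, freq_hz, sfreq_hz):
    """Squared magnitude of the signal's projection onto one frequency."""
    real = sum(
        x * math.cos(2 * math.pi * freq_hz * n / sfreq_hz)
        for n, x in enumerate(signal)
    )
    imag = sum(
        x * math.sin(2 * math.pi * freq_hz * n / sfreq_hz)
        for n, x in enumerate(signal)
    )
    return real * real + imag * imag


# Illustrative settings, not taken from any dataset in the table.
sfreq_hz = 256.0
duration_s = 2.0
candidates_hz = [13.0, 17.0, 21.0]
attended_hz = 17.0

# Synthetic "EEG": a pure sinusoid at the attended stimulus frequency.
n_samples = int(sfreq_hz * duration_s)
signal = [
    math.sin(2 * math.pi * attended_hz * n / sfreq_hz) for n in range(n_samples)
]

# The candidate with the largest power is taken as the detected stimulus.
detected_hz = max(candidates_hz, key=lambda f: power_at(signal, f, sfreq_hz))
print(detected_hz)
```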

SSVEP-specific definitions:

#Classes is the number of different stimulation frequencies.
A trial is one symbol selection. This includes multiple flashes.

Dataset	#Subj	#Chan	#Classes	#Trials / class	Trials length (s)	Freq (Hz)	#Sessions	PapersWithCode leaderboard
`Lee2019_SSVEP`	54	62	4	50	4.0	1000	2	Yes
`Kalunga2016`	12	8	4	16	2.0	256	1	Yes
`MAMEM1`	10	256	5	12-15	3.0	250	1	Yes
`MAMEM2`	10	256	5	20-30	3.0	250	1	Yes
`MAMEM3`	10	14	4	20-30	3.0	128	1	Yes
`Nakanishi2015`	9	8	12	15	4.15	256	1	Yes
`Wang2016`	34	64	40	6	5.0	250	1	Yes

c-VEP

c-VEP (code-modulated Visual Evoked Potential) datasets include experiments where the participant is presented with pseudo-random noise-codes, such as m-sequences, Gold codes, or any arbitrary “pseudo-random” code. The difference with SSVEP is that SSVEP presents periodic stimuli, while c-VEP presents non-periodic stimuli. For a review of c-VEP BCI, see:

Martínez-Cagigal, V., Thielen, J., Santamaria-Vazquez, E., Pérez-Velasco, S., Desain, P., & Hornero, R. (2021). Brain–computer interfaces based on code-modulated visual evoked potentials (c-VEP): A literature review. Journal of Neural Engineering, 18(6), 061002. DOI: https://doi.org/10.1088/1741-2552/ac38cf
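
To give a feel for what an m-sequence is, the sketch below generates a short one from a linear recurrence over bits and checks its circular autocorrelation. The recurrence (lags 4 and 3, period 15) is chosen only for illustration and is not claimed to be the code used by any dataset listed here; `m_sequence` and `circular_autocorrelation` are hypothetical helper names.

```python
def m_sequence(taps, n_bits):
    """Generate one period of a binary sequence from a linear recurrence.

    Each new bit is the XOR of earlier bits at the given lags, e.g.
    taps=(4, 3) means s[n] = s[n-4] XOR s[n-3]. For suitable taps the
    period reaches its maximum, 2**n_bits - 1.
    """
    seq = [1] + [0] * (n_bits - 1)  # any non-zero start state works
    period = 2 ** n_bits - 1
    while len(seq) < period:
        bit = 0
        for lag in taps:
            bit ^= seq[-lag]
        seq.append(bit)
    return seq


def circular_autocorrelation(bits, shift):
    """Correlate the +1/-1 version of the code with a circularly shifted copy."""
    bipolar = [1 if b else -1 for b in bits]
    n = len(bipolar)
    return sum(bipolar[i] * bipolar[(i + shift) % n] for i in range(n))


code = m_sequence(taps=(4, 3), n_bits=4)

# Autocorrelation values at every non-zero shift.
off_peak = {circular_autocorrelation(code, k) for k in range(1, len(code))}
print(code, off_peak)
```

The autocorrelation equals the code length at zero shift and stays at -1 for every other shift, which is the property that makes shifted copies of such a code easy to tell apart.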

c-VEP-specific definitions:

A trial is one symbol selection. This includes multiple flashes.
#Trial classes is the number of different symbols.
#Epoch classes is the number of possible intensities for the flashes (for a visual c-VEP paradigm). Typically, there are only two intensities: on and off.
#Epochs / class is the number of flashes per intensity in each session.
Codes is the type of code used in the experiment.
Presentation rate is the rate at which the codes are presented.

Dataset	#Subj	#Sessions	Freq (Hz)	#Chan	Trials length (s)	#Trial classes	#Trials / class	#Epoch classes	#Epochs / class	Codes	Presentation rate (Hz)	PapersWithCode leaderboard
`Thielen2015`	12	1	2048	64	4.2	36	3	2	27216 NT / 27216 T	Gold codes	120	No
`Thielen2021`	30	1	512	8	31.5	20	5	2	18900 NT / 18900 T	Gold codes	60	No
`CastillosCVEP100`	12	1	500	32	2.2	4	15/15/15/15	2	3525 NT / 3495 T	m-sequence	60	No
`CastillosCVEP40`	12	1	500	32	2.2	4	15/15/15/15	2	3525 NT / 3495 T	m-sequence	60	No
`CastillosBurstVEP40`	12	1	500	32	2.2	4	15/15/15/15	2	5820 NT / 1200 T	Burst-CVEP	60	No
`CastillosBurstVEP100`	12	1	500	32	2.2	4	15/15/15/15	2	5820 NT / 1200 T	Burst-CVEP	60	No

Resting States

Resting-state datasets include experiments where the participant is not actively performing a task. For example, recording the EEG of a subject while their eyes are closed or open is a resting-state experiment.

Dataset	#Subj	#Chan	#Classes	#Blocks / class	Trials length (s)	Freq (Hz)	#Sessions	PapersWithCode leaderboard
`Cattan2019_PHMD`	12	16	2	5	60	512	1	No
`Hinss2021`	15	62	4	1	2	250	1	No
`Rodrigues2017`	20	16	2	5	10	512	1	No

Compound Datasets

Compound datasets are built from subjects of other datasets. They are useful for merging different datasets (including other compound datasets) or for selecting a sample of subjects within a dataset (e.g. subjects with high or low performance).

Dataset	#Subj	#Original datasets
`BI2014a_Il`	17	BI2014a
`BI2014b_Il`	11	BI2014b
`BI2015a_Il`	2	BI2015a
`BI2015b_Il`	25	BI2015b
`Cattan2019_VR_Il`	4	Cattan2019_VR
`BI_Il`	59	`BI2014a_Il` `BI2014b_Il` `BI2015a_Il` `BI2015b_Il` `Cattan2019_VR_Il`
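
The last row can be checked against the others: `BI_Il` merges the five compound datasets listed before it, so its subject count should be the sum of theirs. The sketch below models each compound dataset as a list of (dataset name, subject index) pairs; the indices are placeholders, since the table does not say which subjects were selected.

```python
# Subject counts copied from the table.
subject_counts = {
    "BI2014a_Il": 17,
    "BI2014b_Il": 11,
    "BI2015a_Il": 2,
    "BI2015b_Il": 25,
    "Cattan2019_VR_Il": 4,
}

# Placeholder (dataset, index) pairs standing in for the selected subjects.
compound = {
    name: [(name, i) for i in range(1, count + 1)]
    for name, count in subject_counts.items()
}

# Merging compound datasets amounts to concatenating their subject lists.
bi_il = [pair for name in subject_counts for pair in compound[name]]
print(len(bi_il))
```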

Submit a new dataset

You can submit a new dataset by mentioning it in this issue. The datasets currently on our radar can be seen here, but we are open to any suggestion.

If you want to actively contribute to the inclusion of a new dataset, you can also follow this tutorial.