A dataset handles and abstracts low-level access to the data. It takes data stored locally, in the format in which they were downloaded, and converts them into an MNE Raw object. There are options to pool all the recording sessions of a subject or to evaluate them separately.

See NeuroTechX/moabb for details on the datasets (electrodes, number of trials, sessions, etc.).
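As a rough illustration of the two evaluation options, the sketch below assumes that recordings are organized as a nested mapping of subject, then session, then run. The mapping, the placeholder strings standing in for MNE Raw objects, and the helper `evaluation_units` are hypothetical and are not part of MOABB's API.

```python
# Hypothetical stand-in for loaded data: subject -> session -> run -> recording.
# Plain strings replace the MNE Raw objects a real dataset would provide.
data = {
    1: {
        "0": {"0": "raw_s1_sess0_run0", "1": "raw_s1_sess0_run1"},
        "1": {"0": "raw_s1_sess1_run0"},
    },
    2: {"0": {"0": "raw_s2_sess0_run0"}},
}


def evaluation_units(data, pool_sessions):
    """Group recordings per subject (pooled) or per (subject, session)."""
    units = {}
    for subject, sessions in data.items():
        for session, runs in sessions.items():
            key = subject if pool_sessions else (subject, session)
            units.setdefault(key, []).extend(runs.values())
    return units


pooled = evaluation_units(data, pool_sessions=True)
separate = evaluation_units(data, pool_sessions=False)
print(len(pooled[1]))           # 3 recordings: both sessions of subject 1
print(len(separate[(1, "0")]))  # 2 recordings: only session "0" of subject 1
```

Pooling yields one unit per subject, whereas the separate option yields one unit per subject and session.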

Data Summary#

MOABB gathers many datasets; the list below summarizes important information about them. Most datasets are listed here, but the list is not complete yet. Check the API for complete documentation.

Do not hesitate to help us complete this list. It is also possible to add new datasets: a tutorial explains how to do so, and we warmly welcome new contributions!

See also Datasets-Support for supplementary details on datasets (class name, size, licence, etc.).

Summary columns: Dataset, #Subj, #Chan, #Classes, #Trials, Trial length, Freq, #Session, #Runs, Total_trials, PapersWithCode leaderboard

Column definitions:

* Dataset is the name of the dataset.
* #Subj is the number of subjects.
* #Chan is the number of EEG channels.
* #Trials / class is the number of repetitions performed by one subject for each class. This number is computed using only the first subject of each dataset. The definitions of a class and of a trial depend on the paradigm used (see sections below).
* Trial length is the duration of a trial in seconds.
* Total_trials is the total number of trials in the dataset (all subjects and classes together).
* Freq is the sampling frequency of the raw data.
* #Session is the number of sessions per subject. Different sessions are often recorded on different days.
* #Runs is the number of runs per session. A run is a continuous recording of the EEG data. Often, the different runs of a given session are recorded without removing the EEG cap in between.
* PapersWithCode leaderboard is the link to the dataset on the PapersWithCode leaderboard.

Motor Imagery#

Motor Imagery is a BCI paradigm where the subject imagines performing movements. Each movement is associated with a different command to build an application.

Motor Imagery-specific definitions:

* #Classes is the number of different imagery tasks.
* A trial is one repetition of the imagery task.

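| Dataset | #Subj | #Chan | #Classes | #Trials / class | Trial length | Freq | #Session | #Runs | Total_trials | PapersWithCode leaderboard |
|---|---|---|---|---|---|---|---|---|---|---|
| AlexMI | 8 | 16 | 3 | 20 | 3s | 512Hz | 1 | 1 | 480 | Yes |
| BNCI2014_001 | 9 | 22 | 4 | 144 | 4s | 250Hz | 2 | 6 | 62208 | Yes |
| BNCI2014_002 | 14 | 15 | 2 | 80 | 5s | 512Hz | 1 | 8 | 17920 | Yes |
| BNCI2014_004 | 9 | 3 | 2 | 360 | 4.5s | 250Hz | 5 | 1 | 32400 | Yes |
| BNCI2015_001 | 12 | 13 | 2 | 200 | 5s | 512Hz | 3 | 1 | 14400 | Yes |
| BNCI2015_004 | 9 | 30 | 5 | 80 | 7s | 256Hz | 2 | 1 | 7200 | Yes |
| Cho2017 | 52 | 64 | 2 | 100 | 3s | 512Hz | 1 | 1 | 9800 | Yes |
| Lee2019_MI | 54 | 62 | 2 | 100 | 4s | 1000Hz | 2 | 1 | 11000 | Yes |
| GrosseWentrup2009 | 10 | 128 | 2 | 150 | 7s | 500Hz | 1 | 1 | 3000 | Yes |
| Schirrmeister2017 | 14 | 128 | 4 | 120 | 4s | 500Hz | 1 | 2 | 13440 | Yes |
| Ofner2017 | 15 | 61 | 7 | 60 | 3s | 512Hz | 1 | 10 | 63000 | No |
| PhysionetMI | 109 | 64 | 4 | 23 | 3s | 160Hz | 1 | 1 | 69760 | Yes |
| Shin2017A | 29 | 30 | 2 | 30 | 10s | 200Hz | 3 | 1 | 5220 | Yes |
| Shin2017B | 29 | 30 | 2 | 30 | 10s | 200Hz | 3 | 1 | 5220 | No |
| Weibo2014 | 10 | 60 | 7 | 80 | 4s | 200Hz | 1 | 1 | 5600 | Yes |
| Zhou2016 | 4 | 14 | 3 | 160 | 5s | 250Hz | 3 | 2 | 11496 | Yes |
| Stieger2021 | 62 | 64 | 4 | 450 | 3s | 1000Hz | 7 or 11 | 1 | 250000 | No |
| Liu2024 | 50 | 29 | 2 | 20 | 4s | 500Hz | 1 | 1 | 2000 | No |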
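To get a feel for how the columns relate, the sketch below compares the reported Total_trials of a few Motor Imagery rows with a naive product of the other counts. The product formula is an assumption made for illustration, not a documented rule. Because #Trials / class is computed from the first subject only, some rows are not expected to match.

```python
# Rows copied from the Motor Imagery table:
# (dataset, #Subj, #Classes, #Trials / class, #Session, #Runs, Total_trials)
rows = [
    ("AlexMI", 8, 3, 20, 1, 1, 480),
    ("BNCI2014_001", 9, 4, 144, 2, 6, 62208),
    ("BNCI2014_002", 14, 2, 80, 1, 8, 17920),
    ("Cho2017", 52, 2, 100, 1, 1, 9800),
]


def naive_total(subjects, classes, trials_per_class, sessions, runs):
    """Assumed formula: every subject performs every count in full."""
    return subjects * classes * trials_per_class * sessions * runs


matches = {}
for name, *counts, reported in rows:
    estimate = naive_total(*counts)
    matches[name] = estimate == reported
    print(f"{name}: estimate={estimate}, reported={reported}")
```

For these four rows the estimate agrees with the reported value for AlexMI, BNCI2014_001 and BNCI2014_002, and differs for Cho2017 (10400 against 9800), so the naive product should be read only as a sanity check.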

P300/ERP#

ERP (Event-Related Potential) is a BCI paradigm where the subject is presented with a stimulus and the EEG response is recorded. The P300 is a positive peak in the EEG signal that occurs around 300 ms after the stimulus.

P300-specific definitions:

* A trial is one flash.
* The classes are binary: a trial is target if the key on which the subject focuses is flashed and non-target otherwise.
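The target / non-target rule can be sketched for a row/column speller. The 6 x 6 grid, the function name `label_flashes`, and the label strings are assumptions chosen for illustration; they are not taken from any dataset listed here.

```python
ROWS, COLS = 6, 6  # assumed speller grid (hypothetical)


def label_flashes(target_row, target_col):
    """Label every row and column flash of one repetition.

    A flash is a target when the flashed row or column contains the key
    the subject focuses on, and a non-target otherwise.
    """
    labels = []
    for row in range(ROWS):
        labels.append(("row", row, "Target" if row == target_row else "NonTarget"))
    for col in range(COLS):
        labels.append(("col", col, "Target" if col == target_col else "NonTarget"))
    return labels


labels = label_flashes(target_row=2, target_col=4)
n_target = sum(1 for *_, label in labels if label == "Target")
n_non_target = len(labels) - n_target
print(n_target, n_non_target)  # 2 10
```

Under these assumptions each repetition gives 2 target and 10 non-target trials, which is why the two classes are strongly unbalanced in this paradigm.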

| Dataset | #Subj | #Chan | #Trials / class | Trial length | Sampling rate | #Sessions | PapersWithCode leaderboard |
|---|---|---|---|---|---|---|---|
| BNCI2014_008 | 8 | 8 | 3500 NT / 700 T | 1s | 256Hz | 1 | Yes |
| BNCI2014_009 | 10 | 16 | 1440 NT / 288 T | 0.8s | 256Hz | 3 | Yes |
| BNCI2015_003 | 10 | 8 | 1500 NT / 300 T | 0.8s | 256Hz | 1 | Yes |
| BI2012 | 25 | 16 | 640 NT / 128 T | 1s | 128Hz | 2 | Yes |
| BI2013a | 24 | 16 | 3200 NT / 640 T | 1s | 512Hz | 8 for subjects 1-7 else 1 | Yes |
| BI2014a | 64 | 16 | 990 NT / 198 T | 1s | 512Hz | up to 3 | Yes |
| BI2014b | 38 | 32 | 200 NT / 40 T | 1s | 512Hz | 3 | Yes |
| BI2015a | 43 | 32 | 4131 NT / 825 T | 1s | 512Hz | 3 | Yes |
| BI2015b | 44 | 32 | 2160 NT / 480 T | 1s | 512Hz | 1 | Yes |
| Cattan2019_VR | 21 | 16 | 600 NT / 120 T | 1s | 512Hz | 2 | Yes |
| Huebner2017 | 13 | 31 | 364 NT / 112 T | 0.9s | 1000Hz | 3 | Yes |
| Huebner2018 | 12 | 31 | 364 NT / 112 T | 0.9s | 1000Hz | 3 | Yes |
| Sosulski2019 | 13 | 31 | 7500 NT / 1500 T | 1.2s | 1000Hz | 1 | Yes |
| EPFLP300 | 8 | 32 | 2753 NT / 551 T | 1s | 2048Hz | 4 | Yes |
| Lee2019_ERP | 54 | 62 | 6900 NT / 1380 T | 1s | 1000Hz | 2 | Yes |
| DemonsP300 | 60 | 8 | 935 NT / 50 T | 1s | 500Hz | 1 | No |

SSVEP#

SSVEP (Steady-State Visually Evoked Potential) is a BCI paradigm where the subject is presented with flickering stimuli. Each stimulus flickers at a different frequency, and the EEG signal is modulated at the frequency of the stimulus the subject attends to.

SSVEP-specific definitions:

* #Classes is the number of different stimulation frequencies.
* A trial is one symbol selection. This includes multiple flashes.
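Since each class corresponds to one stimulation frequency, a trial can in principle be classified by finding which candidate frequency carries the most power. The sketch below is a toy illustration on a synthetic, noise-free signal. The sampling rate, trial duration, and the three flicker frequencies are hypothetical values, and this is not the method MOABB pipelines use.

```python
import math


def power_at(signal, freq, sampling_rate):
    """Power of `signal` at `freq`, by correlating with a cosine and a sine."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sampling_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / sampling_rate)
             for n, x in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2


def detect_class(signal, candidate_freqs, sampling_rate):
    """Return the candidate frequency with the highest power."""
    return max(candidate_freqs, key=lambda f: power_at(signal, f, sampling_rate))


sampling_rate = 256                  # Hz (assumed)
duration = 2.0                       # seconds (assumed)
stimulus_freqs = [13.0, 17.0, 21.0]  # hypothetical flicker frequencies, one per class
attended = 17.0                      # frequency of the stimulus the subject looks at

n_samples = int(sampling_rate * duration)
signal = [
    math.sin(2 * math.pi * attended * n / sampling_rate)      # synthetic SSVEP response
    + 0.5 * math.sin(2 * math.pi * 10.0 * n / sampling_rate)  # unrelated 10 Hz rhythm
    for n in range(n_samples)
]
print(detect_class(signal, stimulus_freqs, sampling_rate))  # 17.0
```

Real EEG is far noisier, so practical SSVEP classifiers rely on multi-channel methods rather than a single-channel power comparison.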

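| Dataset | #Subj | #Chan | #Classes | #Trials / class | Trial length | Sampling rate | #Sessions | PapersWithCode leaderboard |
|---|---|---|---|---|---|---|---|---|
| Lee2019_SSVEP | 54 | 62 | 4 | 50 | 4s | 1000Hz | 2 | Yes |
| Kalunga2016 | 12 | 8 | 4 | 16 | 2s | 256Hz | 1 | Yes |
| MAMEM1 | 10 | 256 | 5 | 12-15 | 3s | 250Hz | 1 | Yes |
| MAMEM2 | 10 | 256 | 5 | 20-30 | 3s | 250Hz | 1 | Yes |
| MAMEM3 | 10 | 14 | 4 | 20-30 | 3s | 128Hz | 1 | Yes |
| Nakanishi2015 | 9 | 8 | 12 | 15 | 4.15s | 256Hz | 1 | Yes |
| Wang2016 | 34 | 62 | 40 | 6 | 5s | 250Hz | 1 | Yes |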

c-VEP#

Includes neuro experiments where the participant is presented with pseudo-random noise codes, such as m-sequences, Gold codes, or any arbitrary "pseudo-random" code. The difference from SSVEP is that SSVEP presents periodic stimuli, while c-VEP presents non-periodic stimuli. For a review of c-VEP BCI, see:

Martínez-Cagigal, V., Thielen, J., Santamaria-Vazquez, E., Pérez-Velasco, S., Desain, P., & Hornero, R. (2021). Brain–computer interfaces based on code-modulated visual evoked potentials (c-VEP): A literature review. Journal of Neural Engineering, 18(6), 061002. DOI: https://doi.org/10.1088/1741-2552/ac38cf

c-VEP-specific definitions:

* A trial is one symbol selection. This includes multiple flashes.
* #Trial classes is the number of different symbols.
* #Epoch classes is the number of possible intensities for the flashes (for a visual c-VEP paradigm). Typically, there are only two intensities: on and off.
* #Epochs / class is the number of flashes per intensity in each session.
* Codes is the type of code used in the experiment.
* Presentation rate is the rate at which the codes are presented.
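To make the notion of a pseudo-random code concrete, the sketch below generates one period of an m-sequence with a linear-feedback shift register and counts its two epoch classes (on and off). The register length, the feedback taps, and the function names are choices made for this illustration; they are not the codes used by the datasets listed here.

```python
def m_sequence(n_bits=6, taps=(0, 1)):
    """One period of a binary m-sequence from a linear-feedback shift register.

    The defaults implement a[n+6] = a[n] XOR a[n+1], which corresponds to the
    primitive polynomial x^6 + x + 1 and gives a period of 2**6 - 1 = 63 bits.
    """
    state = [1] * n_bits  # any non-zero start state works
    bits = []
    for _ in range(2 ** n_bits - 1):
        bits.append(state[0])
        feedback = 0
        for tap in taps:
            feedback ^= state[tap]
        state = state[1:] + [feedback]
    return bits


def circular_autocorrelation(bits, lag):
    """Autocorrelation of the code mapped to -1/+1, with wrap-around."""
    signed = [2 * b - 1 for b in bits]
    n = len(signed)
    return sum(signed[i] * signed[(i + lag) % n] for i in range(n))


code = m_sequence()
n_on, n_off = code.count(1), code.count(0)
print(len(code), n_on, n_off)  # 63 32 31
off_peak = {circular_autocorrelation(code, lag) for lag in range(1, len(code))}
print(circular_autocorrelation(code, 0), off_peak)  # 63 {-1}
```

The flat off-peak autocorrelation means that shifted copies of the code are nearly uncorrelated with each other, unlike the periodic stimuli used in SSVEP.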

| Dataset | #Subj | #Sessions | Sampling rate | #Chan | Trial length | #Trial classes | #Trials / class | #Epoch classes | #Epochs / class | Codes | Presentation rate | PapersWithCode leaderboard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Thielen2015 | 12 | 1 | 2048Hz | 64 | 4.2s | 36 | 3 | 2 | 27216 NT / 27216 T | Gold codes | 120Hz | No |
| Thielen2021 | 30 | 1 | 512Hz | 8 | 31.5s | 20 | 5 | 2 | 18900 NT / 18900 T | Gold codes | 60Hz | No |
| CastillosCVEP100 | 12 | 1 | 500Hz | 32 | 2.2s | 4 | 15/15/15/15 | 2 | 3525 NT / 3495 T | m-sequence | 60Hz | No |
| CastillosCVEP40 | 12 | 1 | 500Hz | 32 | 2.2s | 4 | 15/15/15/15 | 2 | 3525 NT / 3495 T | m-sequence | 60Hz | No |
| CastillosBurstVEP40 | 12 | 1 | 500Hz | 32 | 2.2s | 4 | 15/15/15/15 | 2 | 5820 NT / 1200 T | Burst-CVEP | 60Hz | No |
| CastillosBurstVEP100 | 12 | 1 | 500Hz | 32 | 2.2s | 4 | 15/15/15/15 | 2 | 5820 NT / 1200 T | Burst-CVEP | 60Hz | No |

Resting States#

Includes neuro experiments where the participant is not actively performing a task. For example, recording the EEG of a subject whose eyes are closed or open is a resting-state experiment.

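| Dataset | #Subj | #Chan | #Classes | #Blocks / class | Trial length | Sampling rate | #Sessions | PapersWithCode leaderboard |
|---|---|---|---|---|---|---|---|---|
| Cattan2019_PHMD | 12 | 16 | 2 | 10 | 60s | 512Hz | 1 | No |
| Hinss2021 | 15 | 62 | 4 | 1 | 2s | 250Hz | 1 | No |
| Rodrigues2017 | 20 | 16 | 2 | 5 | 10s | 512Hz | 1 | No |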

Compound Datasets#

Compound Datasets are datasets built from subjects of other datasets. They are useful for merging different datasets (including other Compound Datasets) or for selecting a sample of subjects inside a dataset (e.g. subjects with high or low performance).

| Dataset | #Subj | #Original datasets |
|---|---|---|
| BI2014a_Il | 17 | BI2014a |
| BI2014b_Il | 11 | BI2014b |
| BI2015a_Il | 2 | BI2015a |
| BI2015b_Il | 25 | BI2015b |
| Cattan2019_VR_Il | 4 | Cattan2019_VR |
| BI_Il | 59 | BI2014a_Il BI2014b_Il BI2015a_Il BI2015b_Il Cattan2019_VR_Il |
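The merging idea can be sketched by representing a compound dataset as a list of (dataset, subject) pairs. This representation and the placeholder subject indices are assumptions for illustration, not MOABB's actual implementation. The subject counts are copied from the table above.

```python
# Subject counts copied from the Compound Datasets table.
component_sizes = {
    "BI2014a_Il": 17,
    "BI2014b_Il": 11,
    "BI2015a_Il": 2,
    "BI2015b_Il": 25,
    "Cattan2019_VR_Il": 4,
}

# Hypothetical representation: one (dataset name, subject index) pair per subject.
# The indices 1..n are placeholders, not the real subject identifiers.
bi_il = [
    (name, subject)
    for name, n_subjects in component_sizes.items()
    for subject in range(1, n_subjects + 1)
]

print(len(bi_il))  # 59, the #Subj listed for BI_Il
```

Merging the five component datasets this way reproduces the 59 subjects listed for BI_Il.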

Submit a new dataset#

You can submit a new dataset by mentioning it in this issue. The datasets currently on our radar can be seen here, but we are open to any suggestion.

If you want to actively contribute to the inclusion of a new dataset, you can also follow this tutorial.