Tutorial 5: Combining Multiple Datasets into a Single Dataset#
# Author: Gregoire Cattan
#
# https://github.com/plcrodrigues/Workshop-MOABB-BCI-Graz-2019
from pyriemann.classification import MDM
from pyriemann.estimation import ERPCovariances
from sklearn.pipeline import make_pipeline
from moabb.datasets import Cattan2019_VR
from moabb.datasets.braininvaders import BI2014a
from moabb.datasets.compound_dataset import CompoundDataset
from moabb.datasets.utils import blocks_reps
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms.p300 import P300
Initialization#
This tutorial illustrates how to use the CompoundDataset to:
1) Select a few subjects/sessions/runs in an existing dataset
2) Merge two CompoundDataset objects into a new one
3) Use this new dataset in a pipeline (this step is not specific to CompoundDataset)
Let’s define a paradigm and a pipeline for evaluation first.
paradigm = P300()
pipelines = {}
pipelines["MDM"] = make_pipeline(ERPCovariances(estimator="lwf"), MDM(metric="riemann"))
Creating a selection of subjects#
We are going to create two CompoundDataset subclasses, namely CustomDataset1 and CustomDataset2. A CompoundDataset accepts a subjects_list argument, which is a list of tuples. Each tuple contains 4 values:
the original dataset
the subject number to select
the sessions. It can be:
a session name (‘0’)
a list of sessions ([‘0’, ‘1’])
None to select all the sessions attributed to a subject
the runs. As for sessions, it can be a single run name, a list of run names, or None (to select all runs).
class CustomDataset1(CompoundDataset):
    def __init__(self):
        biVR = Cattan2019_VR(virtual_reality=True, screen_display=True)
        # Run names for blocks 0 and 2, repetitions 0 to 4
        runs = blocks_reps([0, 2], [0, 1, 2, 3, 4], biVR.n_repetitions)
        # Subjects 1 and 2, session "0VR", restricted to the runs above
        subjects_list = [(biVR, 1, "0VR", runs), (biVR, 2, "0VR", runs)]
        CompoundDataset.__init__(
            self, subjects_list=subjects_list, code="CustomDataset1", interval=[0, 1.0]
        )


class CustomDataset2(CompoundDataset):
    def __init__(self):
        bi2014 = BI2014a()
        # Subjects 4 and 7, with all of their sessions and runs (None, None)
        subjects_list = [(bi2014, 4, None, None), (bi2014, 7, None, None)]
        CompoundDataset.__init__(
            self, subjects_list=subjects_list, code="CustomDataset2", interval=[0, 1.0]
        )
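The three accepted forms for the sessions and runs entries (a single name, a list of names, or None) can be illustrated with a small standalone sketch. The helper resolve_selection below is hypothetical and is not part of MOABB. It only mirrors the selection rules described above, under the assumption that None expands to every available name.

```python
from typing import List, Union

# A selector is a single name, a list of names, or None (meaning "everything").
Selection = Union[str, List[str], None]


def resolve_selection(selection: Selection, available: List[str]) -> List[str]:
    """Expand a session or run selector into an explicit list of names.

    Hypothetical helper for illustration only, not a MOABB function.
    """
    if selection is None:
        # None selects all the names that exist for this subject.
        return list(available)
    if isinstance(selection, str):
        # A single name is treated as a one-element list.
        selection = [selection]
    missing = [name for name in selection if name not in available]
    if missing:
        raise ValueError(f"Unknown names: {missing}")
    return list(selection)


sessions = ["0", "1"]
print(resolve_selection("0", sessions))         # single session name
print(resolve_selection(["0", "1"], sessions))  # list of session names
print(resolve_selection(None, sessions))        # all sessions
```

Normalizing every selector to a list keeps the rest of such code simple, since it only ever has to iterate over names.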
Merging the datasets#
We are now going to merge the two CompoundDataset objects into a single one. The implementation is straightforward: instead of providing a list of subject tuples, provide a list of CompoundDataset instances, such as [CustomDataset1(), CustomDataset2()].
class CustomDataset3(CompoundDataset):
    def __init__(self):
        subjects_list = [CustomDataset1(), CustomDataset2()]
        CompoundDataset.__init__(
            self, subjects_list=subjects_list, code="CustomDataset3", interval=[0, 1.0]
        )
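One way to picture the merge is as a concatenation of the subject tuples held by each compound dataset. The following is a minimal sketch under that assumption. CompoundSketch is a hypothetical stand-in class, the strings "biVR" and "bi2014" stand in for real dataset objects, and none of this is MOABB's actual implementation.

```python
class CompoundSketch:
    """Hypothetical stand-in for a compound dataset; it stores only subject tuples."""

    def __init__(self, subjects_list):
        if subjects_list and isinstance(subjects_list[0], tuple):
            # A list of (dataset, subject, sessions, runs) tuples is stored as-is.
            self.subjects_list = list(subjects_list)
        else:
            # A list of compound datasets is flattened into one list of tuples.
            self.subjects_list = []
            for compound in subjects_list:
                self.subjects_list.extend(compound.subjects_list)


# Strings replace the dataset objects so the sketch runs without downloading data.
first = CompoundSketch([("biVR", 1, "0VR", None), ("biVR", 2, "0VR", None)])
second = CompoundSketch([("bi2014", 4, None, None), ("bi2014", 7, None, None)])
merged = CompoundSketch([first, second])

# The merged object exposes the subjects of both inputs, in order.
for entry in merged.subjects_list:
    print(entry)
```

Under this assumption, the merged dataset holds four subject entries, two from each input. This matches the four result rows that the evaluation below reports for CustomDataset3 (one session per selected subject).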
Evaluate and display#
Let’s use a WithinSessionEvaluation to evaluate our new dataset. If you already knew how to do this, nothing has changed: the CompoundDataset can be used like a normal dataset.
datasets = [CustomDataset3()]
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=datasets, overwrite=False, suffix="newdataset"
)
scores = evaluation.process(pipelines)
print(scores)
      score      time  samples  ...         dataset  pipeline  codecarbon_task_name
0  0.587500  0.332801    120.0  ...  CustomDataset3       MDM
1  0.565000  0.337454    120.0  ...  CustomDataset3       MDM
2  0.588719  2.008563    768.0  ...  CustomDataset3       MDM
3  0.541950  4.220372   1356.0  ...  CustomDataset3       MDM

[4 rows x 13 columns]
Total running time of the script: (2 minutes 2.887 seconds)
Estimated memory usage: 679 MB