Tutorial 5: Combining Multiple Datasets into a Single Dataset

# Author: Gregoire Cattan
#
# https://github.com/plcrodrigues/Workshop-MOABB-BCI-Graz-2019

from pyriemann.classification import MDM
from pyriemann.estimation import ERPCovariances
from sklearn.pipeline import make_pipeline

from moabb.datasets import Cattan2019_VR
from moabb.datasets.braininvaders import BI2014a
from moabb.datasets.compound_dataset import CompoundDataset
from moabb.datasets.utils import blocks_reps
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms.p300 import P300

Initialization

This tutorial illustrates how to use the CompoundDataset to:

  1) Select a few subjects/sessions/runs in an existing dataset
  2) Merge two CompoundDataset objects into a new one
  3) Use this new dataset in a pipeline (this step is not specific to CompoundDataset)

Let’s define a paradigm and a pipeline for evaluation first.

paradigm = P300()
pipelines = {}
pipelines["MDM"] = make_pipeline(ERPCovariances(estimator="lwf"), MDM(metric="riemann"))

Creating a selection of subjects

We are going to create two CompoundDataset subclasses, namely CustomDataset1 and CustomDataset2. A CompoundDataset accepts a subjects_list argument, which is a list of tuples. Each tuple contains 4 values:

  • the original dataset

  • the subject number to select

  • the sessions. It can be:

    • a session name ('0')

    • a list of sessions (['0', '1'])

    • None to select all the sessions attributed to a subject

  • the runs. As for sessions, it can be a single run name, a list or None (to select all runs).
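
The shape of these entries can be sketched in plain Python. The snippet below only illustrates the tuple layout and the three accepted forms for sessions and runs. `normalize_selection` is a hypothetical helper, not a MOABB function, and the dataset is replaced by a placeholder string so the example runs without downloading anything.

```python
from typing import List, Sequence, Union

Selection = Union[str, Sequence[str], None]


def normalize_selection(selection: Selection, available: Sequence[str]) -> List[str]:
    """Hypothetical helper (not part of MOABB): expand a selection to a list of names."""
    if selection is None:
        # None means "keep everything that exists for this subject"
        return list(available)
    if isinstance(selection, str):
        # A single name is treated as a one-element list
        return [selection]
    return list(selection)


# Placeholder standing in for a real dataset object such as BI2014a()
dataset = "some_dataset"

# One entry per subject: (dataset, subject number, sessions, runs)
subjects_list = [
    (dataset, 1, "0", None),        # session "0", all runs
    (dataset, 2, ["0", "1"], "0"),  # two sessions, single run "0"
    (dataset, 3, None, None),       # all sessions, all runs
]

# Assumed session names, chosen only for this illustration
available_sessions = ["0", "1", "2"]
for _, subject, sessions, _ in subjects_list:
    print(subject, normalize_selection(sessions, available_sessions))
```

The same three forms (single name, list, None) apply to the runs in the fourth position of each tuple.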

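class CustomDataset1(CompoundDataset):
    def __init__(self):
        # Load both the virtual-reality and the screen-display conditions
        biVR = Cattan2019_VR(virtual_reality=True, screen_display=True)
        # Build the run names for blocks 0 and 2, repetitions 0 to 4
        runs = blocks_reps([0, 2], [0, 1, 2, 3, 4], biVR.n_repetitions)
        # Subjects 1 and 2, session "0VR" only, restricted to the runs above
        subjects_list = [(biVR, 1, "0VR", runs), (biVR, 2, "0VR", runs)]
        CompoundDataset.__init__(
            self, subjects_list=subjects_list, code="CustomDataset1", interval=[0, 1.0]
        )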


class CustomDataset2(CompoundDataset):
    def __init__(self):
        bi2014 = BI2014a()
        # Subjects 4 and 7, with all of their sessions and runs (None, None)
        subjects_list = [(bi2014, 4, None, None), (bi2014, 7, None, None)]
        CompoundDataset.__init__(
            self, subjects_list=subjects_list, code="CustomDataset2", interval=[0, 1.0]
        )

Merging the datasets

We are now going to merge the two CompoundDataset objects into a single one. The implementation is straightforward: instead of providing a list of subject tuples, provide a list of CompoundDataset instances, i.e. subjects_list = [CustomDataset1(), CustomDataset2()].

class CustomDataset3(CompoundDataset):
    def __init__(self):
        # A list of CompoundDataset instances (instead of tuples) merges their subjects
        subjects_list = [CustomDataset1(), CustomDataset2()]
        CompoundDataset.__init__(
            self, subjects_list=subjects_list, code="CustomDataset3", interval=[0, 1.0]
        )
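
Conceptually, merging amounts to concatenating the subject entries of each CompoundDataset. The sketch below mimics that idea with plain Python objects. `FakeCompound` and `flatten_subjects` are hypothetical names used for illustration only, and the flattening rule is an assumption rather than MOABB's actual implementation.

```python
class FakeCompound:
    """Hypothetical stand-in for a CompoundDataset: it only stores its entries."""

    def __init__(self, subjects_list):
        self.subjects_list = flatten_subjects(subjects_list)


def flatten_subjects(entries):
    """Return a flat list of (dataset, subject, sessions, runs) tuples."""
    flat = []
    for entry in entries:
        if isinstance(entry, tuple):
            # A plain subject entry is kept as-is
            flat.append(entry)
        else:
            # Assumed here: a compound entry exposes its own subjects_list
            flat.extend(entry.subjects_list)
    return flat


# Dataset objects are replaced by strings so the sketch runs without any download
first = FakeCompound([("biVR", 1, "0VR", None), ("biVR", 2, "0VR", None)])
second = FakeCompound([("bi2014", 4, None, None), ("bi2014", 7, None, None)])
merged = FakeCompound([first, second])

# The merged object holds the four entries, in the order they were given
for entry in merged.subjects_list:
    print(entry)
```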

Evaluate and display

Let’s use a WithinSessionEvaluation to evaluate our new dataset. If you already knew how to do this, nothing has changed: the CompoundDataset can be used like a normal dataset.
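
The evaluation call itself does not appear in this section. A minimal sketch, assuming the usual `WithinSessionEvaluation` constructor arguments (`paradigm`, `datasets`, `overwrite`, `suffix`) and reusing the `paradigm`, `pipelines` and `CustomDataset3` defined earlier, might look like the following. The function name `evaluate_compound` and the suffix value are illustrative choices, not taken from the source.

```python
def evaluate_compound(dataset, paradigm, pipelines, suffix="newdataset"):
    """Run a within-session evaluation on one (compound) dataset and return the scores."""
    # Imported inside the function so the helper can be defined without loading MOABB
    from moabb.evaluations import WithinSessionEvaluation

    evaluation = WithinSessionEvaluation(
        paradigm=paradigm,
        datasets=[dataset],  # a CompoundDataset is passed like any other dataset
        overwrite=False,     # reuse cached results when they exist
        suffix=suffix,       # assumed label attached to the stored results
    )
    # process() runs every pipeline and returns the scores as a pandas DataFrame
    return evaluation.process(pipelines)


# Intended usage (downloads the data on first run):
# scores = evaluate_compound(CustomDataset3(), paradigm, pipelines)
# print(scores)
```

Running the commented usage lines is what would trigger the dataset downloads and print a score table like the one below.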

      score      time  samples  ...         dataset  pipeline  codecarbon_task_name
0  0.587500  0.332801    120.0  ...  CustomDataset3       MDM
1  0.565000  0.337454    120.0  ...  CustomDataset3       MDM
2  0.588719  2.008563    768.0  ...  CustomDataset3       MDM
3  0.541950  4.220372   1356.0  ...  CustomDataset3       MDM

[4 rows x 13 columns]

Total running time of the script: (2 minutes 2.887 seconds)

Estimated memory usage: 679 MB
