Tutorial 4: Creating a dataset class

# Authors: Pedro L. C. Rodrigues, Sylvain Chevallier
#
# https://github.com/plcrodrigues/Workshop-MOABB-BCI-Graz-2019

import mne
import numpy as np
from pyriemann.classification import MDM
from pyriemann.estimation import Covariances
from scipy.io import loadmat, savemat
from sklearn.pipeline import make_pipeline

from moabb.datasets import download as dl
from moabb.datasets.base import BaseDataset
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms import LeftRightImagery

Creating some Data

To illustrate the creation of a dataset class in MOABB, we first create an example dataset saved in a .mat file. It contains a single fake recording on 8 channels, lasting 150 seconds and sampled at 256 Hz. We have included the script that creates this dataset and have uploaded it online; the fake dataset is available on the Zenodo website.

def create_example_dataset():
    """Create a fake example for a dataset"""
    sfreq = 256
    t_recording = 150
    t_trial = 1  # duration of a trial
    intertrial = 2  # time between end of a trial and the next one
    n_chan = 8

    x = np.zeros((n_chan + 1, t_recording * sfreq))  # electrodes + stimulus
    stim = np.zeros(t_recording * sfreq)
    t_offset = 1.0  # offset where the trials start
    n_trials = 40

    rep = np.linspace(0, 4 * t_trial, t_trial * sfreq)
    signal = np.sin(2 * np.pi / t_trial * rep)
    for n in range(n_trials):
        label = n % 2 + 1  # alternate between class 1 and class 2
        tn = int(t_offset * sfreq + n * (t_trial + intertrial) * sfreq)
        stim[tn] = label
        noise = 0.1 * np.random.randn(n_chan, len(signal))
        x[:-1, tn : (tn + t_trial * sfreq)] = label * signal + noise
    x[-1, :] = stim
    return x, sfreq
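To see how the trials are laid out in the recording, this small NumPy-only check (an illustration, not part of the tutorial pipeline) recomputes the onset samples and the alternating labels exactly as the loop in `create_example_dataset` does:

```python
import numpy as np

sfreq = 256
t_trial, intertrial, t_offset = 1, 2, 1.0
n_trials = 40

# Onset sample of each trial, matching the loop in create_example_dataset
onsets = [int(t_offset * sfreq + n * (t_trial + intertrial) * sfreq)
          for n in range(n_trials)]
labels = [n % 2 + 1 for n in range(n_trials)]

# Trials are 3 s apart (1 s trial + 2 s inter-trial gap), labels alternate
print(onsets[:3])   # [256, 1024, 1792]
print(labels[:4])   # [1, 2, 1, 2]
```

The last trial ends well before the 150 s recording does, so all 40 trials fit in the signal.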


# Create the fake data
for subject in [1, 2, 3]:
    x, fs = create_example_dataset()
    filename = "subject_" + str(subject).zfill(2) + ".mat"
    mdict = {}
    mdict["x"] = x
    mdict["fs"] = fs
    savemat(filename, mdict)
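The dataset class below will read these files back with `scipy.io.loadmat`. One detail worth knowing is that MATLAB scalars come back as 2-D arrays; a hypothetical round-trip on a throwaway file (the array shape mirrors the 8 EEG channels + 1 stim row of the fake data) shows this:

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical round-trip: 8 EEG channels + 1 stim row, 100 samples
x = np.zeros((9, 100))
fname = os.path.join(tempfile.mkdtemp(), "subject_01.mat")
savemat(fname, {"x": x, "fs": 256})

data = loadmat(fname)
print(data["x"].shape)    # (9, 100)
print(data["fs"].item())  # .item() unwraps the 2-D scalar wrapper -> 256
```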

Creating a Dataset Class

We will now create a dataset class using the fake data simulated with the code above. For this, we first need to import the right classes from MOABB:

  • dl is a helper module that automatically downloads a dataset if it is not yet available on the user’s computer. It knows where to fetch the files because we define a global variable holding the URL of the data.

  • BaseDataset is the basic class that we overload to create our dataset.

The global variable with the dataset’s URL should specify an online repository where all the files are stored.

ExampleDataset_URL = "https://sandbox.zenodo.org/record/369543/files/"

The ExampleDataset needs to implement only three methods:

  • __init__ for indicating the parameters of the dataset

  • _get_single_subject_data to define how to process the data once they have been downloaded

  • data_path to define how the data are downloaded.

class ExampleDataset(BaseDataset):
    """
    Dataset used to exemplify the creation of a dataset class in MOABB.
    The data samples have been simulated and have no physiological meaning
    whatsoever.
    """

    def __init__(self):
        super().__init__(
            subjects=[1, 2, 3],
            sessions_per_subject=1,
            events={"left_hand": 1, "right_hand": 2},
            code="Example dataset",
            interval=[0, 0.75],
            paradigm="imagery",
            doi="",
        )

    def _get_single_subject_data(self, subject):
        """return data for a single subject"""
        file_path_list = self.data_path(subject)

        data = loadmat(file_path_list[0])
        x = data["x"]
        fs = data["fs"]
        ch_names = ["ch" + str(i) for i in range(8)] + ["stim"]
        ch_types = ["eeg" for i in range(8)] + ["stim"]
        info = mne.create_info(ch_names, fs, ch_types)
        raw = mne.io.RawArray(x, info)

        sessions = {}
        sessions["session_1"] = {}
        sessions["session_1"]["run_1"] = raw
        return sessions

    def data_path(
        self, subject, path=None, force_update=False, update_path=None, verbose=None
    ):
        """Download the data from one subject"""
        if subject not in self.subject_list:
            raise ValueError("Invalid subject number")

        url = "{:s}subject_0{:d}.mat".format(ExampleDataset_URL, subject)
        path = dl.data_dl(url, "ExampleDataset")
        return [path]  # it has to return a list
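MOABB expects `_get_single_subject_data` to return a nested dictionary keyed first by session name and then by run name. A minimal sketch of that nesting, using a placeholder string where the class above stores an `mne.io.RawArray`:

```python
# Structure MOABB expects: {session_name: {run_name: raw}}
# (placeholder string stands in for a real mne.io.RawArray)
sessions = {"session_1": {"run_1": "raw placeholder"}}

for session_name, runs in sessions.items():
    for run_name, raw in runs.items():
        print(session_name, run_name, raw)
```

Datasets with several sessions or several runs per session simply add more keys at the corresponding level.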

Using the ExampleDataset

Now that the ExampleDataset is defined, it can be instantiated directly. The rest of the code follows the steps described in the previous tutorials.

dataset = ExampleDataset()

paradigm = LeftRightImagery()
X, labels, meta = paradigm.get_data(dataset=dataset, subjects=[1])

evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=dataset, overwrite=False, suffix="newdataset"
)
pipelines = {}
pipelines["MDM"] = make_pipeline(Covariances("oas"), MDM(metric="riemann"))
scores = evaluation.process(pipelines)

print(scores)

Out:

   score      time  samples  ... n_sessions          dataset  pipeline
0    1.0  0.038655     40.0  ...          1  Example dataset       MDM
1    1.0  0.039003     40.0  ...          1  Example dataset       MDM
2    1.0  0.036576     40.0  ...          1  Example dataset       MDM

[3 rows x 9 columns]

Pushing on MOABB Github

If you want to make your dataset available to everyone, you can upload your data to a public server (such as Zenodo or Figshare) and signal that you want to add your dataset to MOABB in the dedicated GitHub issue. You can then follow the instructions on how to contribute.

Total running time of the script: (0 minutes 7.175 seconds)

Gallery generated by Sphinx-Gallery