Note
Go to the end to download the full example code.
Statistical Analysis and Chance Level Assessment#
The MOABB codebase comes with convenience plotting utilities and some statistical testing. This tutorial focuses on what those exactly are and how they can be used.
In addition, we demonstrate how to compute and visualize statistically adjusted chance levels following Combrisson & Jerbi (2015). The theoretical chance level (100/c %) only holds for infinite sample sizes. With finite test samples, classifiers can exceed this threshold purely by chance — an effect that grows stronger as the sample size decreases. The adjusted chance level, derived from the inverse survival function of the binomial distribution, gives the minimum accuracy needed to claim statistically significant decoding at a given alpha level.
# Authors: Vinay Jayaram <vinayjayaram13@gmail.com>
#
# License: BSD (3-clause)
# sphinx_gallery_thumbnail_number = -2
import matplotlib.pyplot as plt
from mne.decoding import CSP
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import moabb
import moabb.analysis.plotting as moabb_plt
from moabb.analysis.chance_level import (
adjusted_chance_level,
chance_by_chance,
)
from moabb.analysis.meta_analysis import ( # noqa: E501
compute_dataset_statistics,
find_significant_differences,
)
from moabb.datasets import BNCI2014_001
from moabb.evaluations import CrossSessionEvaluation
from moabb.paradigms import LeftRightImagery
moabb.set_log_level("info")
print(__doc__)
Results Generation#
First we need to set up a paradigm, dataset list, and some pipelines to test. This is explored more in the examples – we choose left vs right imagery paradigm with a single bandpass. There is only one dataset here but any number can be added without changing this workflow.
Create Pipelines#
Pipelines must be a dict of sklearn pipeline transformer.
The CSP implementation from MNE is used. We selected 8 CSP components, as usually done in the literature.
The Riemannian geometry pipeline consists in covariance estimation, tangent space mapping and finally a logistic regression for the classification.
pipelines = {}
pipelines["CSP+LDA"] = make_pipeline(CSP(n_components=8), LDA())
pipelines["RG+LR"] = make_pipeline(Covariances(), TangentSpace(), LogisticRegression())
pipelines["CSP+LR"] = make_pipeline(CSP(n_components=8), LogisticRegression())
pipelines["RG+LDA"] = make_pipeline(Covariances(), TangentSpace(), LDA())
Evaluation#
We define the paradigm (LeftRightImagery) and the dataset (BNCI2014_001). The evaluation will return a DataFrame containing a single AUC score for each subject / session of the dataset, and for each pipeline.
Results are saved into the database, so that if you add a new pipeline, it will not run again the evaluation unless a parameter has changed. Results can be overwritten if necessary.
paradigm = LeftRightImagery()
dataset = BNCI2014_001()
dataset.subject_list = dataset.subject_list[:4]
datasets = [dataset]
overwrite = True # set to False if we want to use cached results
evaluation = CrossSessionEvaluation(
paradigm=paradigm, datasets=datasets, suffix="stats", overwrite=overwrite
)
results = evaluation.process(pipelines)
2026-03-21 20:17:22,639 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 1 | 0train: Score 0.932
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 1 | 0train: Score 0.950
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 1 | 0train: Score 0.954
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 1 | 0train: Score 0.861
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 1 | 1test: Score 0.955
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 1 | 1test: Score 0.963
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 1 | 1test: Score 0.969
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 1 | 1test: Score 0.878
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 2 | 0train: Score 0.527
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 2 | 0train: Score 0.574
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 2 | 0train: Score 0.537
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 2 | 0train: Score 0.513
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 2 | 1test: Score 0.646
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 2 | 1test: Score 0.586
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 2 | 1test: Score 0.629
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 2 | 1test: Score 0.470
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 3 | 0train: Score 0.994
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 3 | 0train: Score 0.999
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 3 | 0train: Score 0.994
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 3 | 0train: Score 0.966
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 3 | 1test: Score 0.996
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 3 | 1test: Score 0.999
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 3 | 1test: Score 0.997
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 3 | 1test: Score 0.973
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 4 | 0train: Score 0.829
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 4 | 0train: Score 0.857
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 4 | 0train: Score 0.835
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 4 | 0train: Score 0.690
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LDA | BNCI2014-001 | 4 | 1test: Score 0.829
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LR | BNCI2014-001 | 4 | 1test: Score 0.877
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base CSP+LR | BNCI2014-001 | 4 | 1test: Score 0.837
2026-03-21 20:17:22,640 INFO MainThread moabb.evaluations.base RG+LDA | BNCI2014-001 | 4 | 1test: Score 0.721
Chance Level Computation#
The theoretical chance level for a c-class problem is 100/c (e.g. 50% for 2 classes). However, this threshold assumes an infinite number of test samples. In practice, with a finite number of trials, a classifier can exceed the theoretical chance level purely by chance — especially when the sample size is small.
Combrisson & Jerbi (2015) demonstrated that classifiers applied to pure Gaussian noise can yield accuracies well above the theoretical chance level when the number of test samples is limited. For example, with only 40 observations in a 2-class problem, a decoding accuracy of 70% can occur by chance alone, far above the 50% theoretical threshold.
To address this, they proposed computing a statistically adjusted chance level using the inverse survival function of the binomial cumulative distribution. This gives the minimum accuracy required to claim that decoding significantly exceeds chance at a given significance level alpha. Stricter alpha values (e.g. 0.001 vs 0.05) require higher accuracy to assert significance.
Note that the number of classes depends on the paradigm, not the raw dataset. BNCI2014_001 has 4 motor imagery classes, but LeftRightImagery selects only left_hand and right_hand (2 classes).
n_classes = len(paradigm.used_events(dataset))
print(f"Number of classes (from paradigm): {n_classes}")
# theoretical chance level: 1 / n_classes
print(f"Theoretical chance level: {1.0 / n_classes:.2f}")
# Adjusted chance level for 144 test trials at alpha=0.05
# (BNCI2014_001 has 144 trials per class per session)
n_test_trials = 144 * n_classes
print(
f"Adjusted chance level (n={n_test_trials}, alpha=0.05): "
f"{adjusted_chance_level(n_classes, n_test_trials, 0.05):.4f}"
)
Number of classes (from paradigm): 2
Theoretical chance level: 0.50
Adjusted chance level (n=288, alpha=0.05): 0.5486
The convenience function chance_by_chance reads
n_samples_test and n_classes directly from the results DataFrame,
so no dataset objects are needed.
chance_levels = chance_by_chance(results, alpha=[0.05, 0.01, 0.001])
print("\nChance levels:")
for name, levels in chance_levels.items():
print(f" {name}:")
print(f" Theoretical: {levels['theoretical']:.2f}")
for alpha, threshold in sorted(levels["adjusted"].items()):
print(f" Adjusted (alpha={alpha}): {threshold:.4f}")
Chance levels:
BNCI2014-001:
Theoretical: 0.50
Adjusted (alpha=0.001): 0.6250
Adjusted (alpha=0.01): 0.5972
Adjusted (alpha=0.05): 0.5694
MOABB Plotting with Chance Levels#
Here we plot the results using the convenience methods within the toolkit.
The score_plot visualizes all the data with one score per subject for
every dataset and pipeline.
By passing the chance_level parameter, the plot draws the correct
theoretical chance level line and, when adjusted levels are available, also
draws significance threshold lines at each alpha level.
fig, _ = moabb_plt.score_plot(results, chance_level=chance_levels)
plt.show()

2026-03-21 20:17:23,688 WARNING MainThread moabb.analysis.plotting Dataset names are too similar, turning off name shortening
Distribution Plot with KDE#
The distribution_plot combines a violin plot (showing the KDE density
of scores) with a strip plot (showing individual data points). This gives
a richer view of score distributions compared to the strip plot alone.
fig, _ = moabb_plt.distribution_plot(results, chance_level=chance_levels)
plt.show()

2026-03-21 20:17:23,819 WARNING MainThread moabb.analysis.plotting Dataset names are too similar, turning off name shortening
Paired Plot with Chance Level#
For a comparison of two algorithms, the paired_plot shows performance
of one versus the other. When chance_level is provided, the axis limits
are adjusted accordingly and dashed crosshair lines mark the theoretical
chance level. When adjusted significance thresholds are included, a shaded
band highlights the region that is not significantly above chance.
fig = moabb_plt.paired_plot(results, "CSP+LDA", "RG+LDA", chance_level=chance_levels)
plt.show()

Statistical Testing and Further Plots#
If the statistical significance of results is of interest, the method
compute_dataset_statistics allows one to show a meta-analysis style plot
as well. For an overview of how all algorithms perform in comparison with
each other, the method find_significant_differences and the
summary_plot are possible.
The meta-analysis style plot shows the standardized mean difference within each tested dataset for the two algorithms in question, in addition to a meta-effect and significance both per-dataset and overall.
fig = moabb_plt.meta_analysis_plot(stats, "CSP+LDA", "RG+LDA")
plt.show()

The summary plot shows the effect and significance related to the hypothesis that the algorithm on the y-axis significantly outperformed the algorithm on the x-axis over all datasets.

Total running time of the script: (1 minutes 53.512 seconds)