BrainVoyager v23.0

# The Multiple Comparisons Problem

An important issue in fMRI data analysis is the specification of an appropriate threshold for statistical maps. If there were only a single voxel's data, a conventional threshold of p < 0.05 (or p < 0.01) could be used, which indicates the probability of obtaining the observed effect, quantified by an R, t or F statistic, due to chance alone. Running the statistical analysis separately for each voxel creates, however, a *massive multiple comparisons problem (MCP)*. If a single test is performed, the conventional threshold of p < 0.05 limits to 5% the probability of wrongly declaring a voxel as significantly modulated when there is no effect (alpha error). Note that an error probability of p = 0.05 means that if we repeated the same test 100 times and assumed that there is no effect (null hypothesis, differences only due to chance), we would wrongly reject the null hypothesis (and accept the alternative hypothesis) in five cases on average (*false positive*, alpha error). If we assume that there is no real effect, running a statistical test spatially in parallel for 100,000 voxels is statistically identical to repeating the test 100,000 times for a single voxel. It is evident that this would lead to about 5000 false positives, i.e. about 5000 voxels would be labeled (declared) "significant" although these voxels would surpass the 0.05 threshold due to chance alone. To avoid this multiple comparisons problem, proper methods have to be employed that adjust probability thresholds accordingly without being too conservative. The multiple comparisons problem is equally relevant for first-level and second-level statistical assessment of formulated hypotheses.
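
The figure of about 5000 false positives can be checked with a small simulation. The following is a minimal sketch, not part of BrainVoyager: it assumes 100,000 independent voxels without any true effect and relies on the fact that p values are uniformly distributed under the null hypothesis. The function name `count_false_positives` and the seed are illustrative choices.

```python
import random


def count_false_positives(n_voxels=100_000, alpha=0.05, seed=0):
    """Count voxels passing an uncorrected threshold when no voxel has a true effect."""
    rng = random.Random(seed)
    # Under the null hypothesis, p values are uniformly distributed on [0, 1),
    # so each voxel passes the threshold with probability alpha.
    return sum(rng.random() < alpha for _ in range(n_voxels))


n_false = count_false_positives()
print(f"{n_false} of 100000 null voxels pass the uncorrected threshold p < 0.05")
```

The count fluctuates around alpha × N = 0.05 × 100,000 = 5000 from run to run, which is the expected number of false positives stated above.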

## Bonferroni Correction

The Bonferroni correction method is a simple multiple comparison correction approach that adjusts the single-voxel threshold in such a way that we retain an error probability of 0.05 at the global level. With N independent tests, this is achieved by using a statistical significance level which is N times smaller than usual. The Bonferroni correction can be derived mathematically as follows. Under the assumption of independent tests and no true effects, the probability that all N performed tests lead to a sub-threshold result is (1-p)^{N} and the probability of obtaining one or more false positive results is 1-(1-p)^{N}. In order to guarantee a global or family-wise error (FWE) probability of p_{FWE} = 1-(1-p)^{N}, the threshold for a single test, p, has to be adjusted as follows: p = 1-(1-p_{FWE})^{1/N}. For small p_{FWE} values (e.g. 0.05), this equation can be approximated by p = p_{FWE} / N. This means that to obtain a global error probability of p_{FWE} < 0.05, the significance level for a single test is obtained by dividing the family-wise error probability by the number of independent tests. Given 100,000 voxels, we would obtain an adjusted single-voxel threshold of p_{v} = p_{FWE} / N = 0.05/100000 = 0.0000005. The Bonferroni correction method guarantees that the probability of wrongly accepting even a single voxel as significant is at most 0.05. The method, thus, controls the alpha error across *all* voxels, and it is therefore called a family-wise correction approach. For fMRI data, the Bonferroni method would be an appropriate approach to correct the alpha error if the data at neighboring voxels were truly independent of each other. Neighboring voxels, however, show similar response patterns within functionally defined brain regions, such as the fusiform face area (FFA). In the presence of such spatial correlations, the Bonferroni correction method is too conservative, i.e. it corrects more strictly than necessary. As a result of this overly strict control of the alpha error, the sensitivity (power) to detect truly active voxels is reduced: Many voxels will be labeled as "not significant" although they reflect true effects. Wrongly accepting the null hypothesis (and thus rejecting the alternative hypothesis) is called *beta error*. The Bonferroni method is available in BrainVoyager, which shows Bonferroni-corrected p values automatically in the VMR/surface view when statistical maps are overlaid.
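
To see how close the approximation p = p_{FWE} / N is to the exact expression p = 1-(1-p_{FWE})^{1/N}, both can be evaluated numerically. The sketch below is illustrative only. The function names are assumptions, and the exact formula (often referred to as the Šidák correction) is valid only under the independence assumption stated above.

```python
import math


def exact_threshold(p_fwe, n_tests):
    """Exact single-test threshold for independent tests: 1 - (1 - p_fwe)^(1/N)."""
    # log1p/expm1 keep numerical precision when the result is very close to zero.
    return -math.expm1(math.log1p(-p_fwe) / n_tests)


def bonferroni_threshold(p_fwe, n_tests):
    """Approximate single-test threshold: p_fwe / N."""
    return p_fwe / n_tests


def family_wise_error(p_single, n_tests):
    """Probability of one or more false positives: 1 - (1 - p)^N."""
    return -math.expm1(n_tests * math.log1p(-p_single))


n = 100_000
p_exact = exact_threshold(0.05, n)
p_bonf = bonferroni_threshold(0.05, n)
fwe_bonf = family_wise_error(p_bonf, n)

print(f"exact threshold:      {p_exact:.4e}")
print(f"Bonferroni threshold: {p_bonf:.4e}")
print(f"family-wise error at the Bonferroni threshold: {fwe_bonf:.4f}")
```

The Bonferroni threshold is slightly smaller than the exact one, so the resulting family-wise error stays just below 0.05 for independent tests.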

## Spatial Extent Methods

Worsley et al. (1992) suggested a less conservative approach to correct for multiple comparisons that explicitly takes into account the observation that neighboring voxels are not activated independently of each other but are more likely to activate together in clusters. In order to incorporate spatial neighborhood relationships in the calculation of global error probabilities, the method describes a statistical map as a "Gaussian random field" (for details, see Worsley et al., 1992). Unfortunately, application of this correction method requires that the fMRI data are spatially smoothed, which substantially reduces one of the most attractive properties of fMRI, namely its high spatial resolution. This is one of the reasons why this method is not used in BrainVoyager.

Another correction method incorporating the observation that neighboring voxels often activate in clusters is based on Monte Carlo simulations that calculate the likelihood of obtaining clusters of different sizes (Forman et al., 1995). In combination with relaxed single-voxel thresholds, calculated cluster extent thresholds are applied to the statistical map, ensuring that a global error probability of p < 0.05 is met. This approach does not require spatial smoothing and appears highly appropriate for fMRI data. The only disadvantage is that the method is quite computationally intensive. This multiple comparisons correction approach is available in BrainVoyager through the Cluster-Level Statistical Threshold Estimator item in the Plugins menu after a normal installation of BrainVoyager. For details about how to use this cluster-size thresholding method, consult the plugin's documentation.
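
The logic of the Monte Carlo approach can be illustrated with a strongly simplified sketch. It is not the plugin's implementation: it assumes a small 2D grid, spatially independent Gaussian noise and 4-connected clusters, whereas a realistic simulation would work on the 3D map and take the estimated spatial smoothness of the data into account. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def max_cluster_size(active, rows, cols):
    """Size of the largest 4-connected cluster of True cells in a 2D grid."""
    seen = [[False] * cols for _ in range(rows)]
    largest = 0
    for r in range(rows):
        for c in range(cols):
            if not active[r][c] or seen[r][c]:
                continue
            # Flood fill starting from this supra-threshold voxel.
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and active[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            largest = max(largest, size)
    return largest


def cluster_extent_threshold(rows=32, cols=32, voxel_p=0.01, fwe=0.05,
                             n_sims=500, seed=0):
    """Smallest cluster size k whose simulated chance probability is <= fwe."""
    rng = random.Random(seed)
    z_cut = NormalDist().inv_cdf(1.0 - voxel_p)  # one-sided single-voxel threshold
    max_sizes = []
    for _ in range(n_sims):
        # Simulated null map: pure noise, thresholded at the single-voxel level.
        active = [[rng.gauss(0.0, 1.0) > z_cut for _ in range(cols)]
                  for _ in range(rows)]
        max_sizes.append(max_cluster_size(active, rows, cols))
    k = 1
    # Increase k until clusters of size >= k occur in at most a fraction fwe of null maps.
    while sum(size >= k for size in max_sizes) / n_sims > fwe:
        k += 1
    return k


k = cluster_extent_threshold()
print(f"minimum cluster size at voxel-level p < 0.01: {k} voxels")
```

Clusters smaller than the returned size would be removed from the thresholded map. With spatially correlated noise, larger chance clusters occur, so the required cluster size grows accordingly.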

## False Discovery Rate

While the described multiple comparison correction methods aim to control the family-wise error rate, the false discovery rate (FDR) approach (Benjamini & Hochberg, 1995) uses a different statistical logic, and has been proposed for fMRI analysis by Genovese and colleagues (2002). In this approach, it is not the overall number of false positive voxels that is controlled but the proportion of false positive voxels among the subset of voxels labeled as significant. Supra-threshold voxels given a specific threshold value are called "discovered" voxels or "voxels declared as active". With a false discovery rate of q < 0.05 one would accept that, on average, up to 5% of the discovered (supra-threshold) voxels are false positives. Given a desired false discovery rate, the FDR algorithm calculates a single-voxel threshold, which ensures that the voxels beyond that threshold contain not more than the specified proportion of false positives. With a q value of 0.05 this also means that one can "trust" 95% of the supra-threshold (i.e. color-coded) voxels since the null hypothesis has been rejected correctly. Since the FDR logic relates the number of false positives to the number of voxels declared as active, the FDR method adapts to the amount of activity in the data: The method is very strict if there is not much activity in the data, but uses less conservative thresholds if larger regions of the brain show task-related effects. In the extreme case that not a single voxel is truly active, the calculated single-voxel threshold is identical to the one computed with the Bonferroni method. The FDR method appears ideal for fMRI data because it does not require spatial smoothing and it detects voxels with a high sensitivity (low beta error) if there are true effects in the data. The FDR method for correction of multiple comparisons is the default method used in BrainVoyager.
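
The threshold calculation can be sketched with the basic step-up procedure of Benjamini & Hochberg (1995): the N p values are sorted in ascending order, and the threshold is the largest p(i) satisfying p(i) <= (i / N) × q. The following minimal sketch is not BrainVoyager's implementation. The function name and the eight example p values are made up for illustration.

```python
def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up threshold.

    Returns the largest sorted p(i) with p(i) <= (i / N) * q,
    or None if no p value satisfies the criterion.
    """
    n = len(p_values)
    threshold = None
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i / n * q:
            threshold = p  # keep the largest p value that meets its rank-specific cutoff
    return threshold


# Hypothetical p values of eight voxels.
p_values = [0.001, 0.008, 0.012, 0.041, 0.27, 0.6, 0.74, 0.9]
p_fdr = fdr_threshold(p_values, q=0.05)
p_bonf = 0.05 / len(p_values)

n_fdr = sum(p <= p_fdr for p in p_values)
n_bonf = sum(p <= p_bonf for p in p_values)
print(f"FDR threshold {p_fdr}: {n_fdr} voxels; Bonferroni threshold {p_bonf}: {n_bonf} voxel(s)")
```

In this example three voxels survive the FDR threshold while only one survives the Bonferroni threshold. The cutoff for the smallest p value (i = 1) is q / N, which is why the method is as strict as the Bonferroni correction when the data contain hardly any activity.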

## Nonparametric Permutation Inference

The common strategy to statistically assess hypotheses is to construct a plausible explanatory general linear model (GLM) for the observed data. The parameters of the formulated GLM are estimated and finally tested with respect to the formulated hypotheses using suitable calculated statistics. In the case of parametric inference, the distribution of a calculated statistic under the null hypothesis is known. Thus the probability of finding, due to chance alone, a statistical value at least as extreme as the one observed can be ascertained directly. In order to be valid, such parametric tests rely, however, on a number of assumptions under which such a distribution arises. When these assumptions are not guaranteed to be met, non-parametric methods come to the rescue. Permutation tests are a class of non-parametric methods that are increasingly being used as a reliable method for inference in neuroimaging. While requiring only minimal assumptions, permutation tests are computationally intensive; this might explain, at least in part, why they were not used much in the past. BrainVoyager 20.6 introduced the *Randomise Plugin* providing an efficient implementation of permutation inference for common multi-subject designs including single-group contrast testing and group comparisons. Other more complex designs can be specified using a graphical user interface. Furthermore, the Randomise Plugin allows applying the threshold-free cluster enhancement (TFCE) method to suppress background noise. The plugin is available via the Randomise Plugin item in the Plugins menu after a normal installation of BrainVoyager. For details about how to use the nonparametric permutation inference method, consult the plugin's documentation.
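
The principle of permutation inference can be illustrated for the single-group case. The sketch below is not the Randomise Plugin's implementation and does not include TFCE. It assumes one contrast value per subject and voxel, errors that are symmetric around zero under the null hypothesis (so that randomly flipping the sign of a subject's values is a valid relabeling), and a one-sided test for positive effects. Recording the maximum statistic across voxels in each relabeling is one common way to obtain family-wise corrected p values. All names and the synthetic data are hypothetical.

```python
import random
from math import sqrt


def t_stat(values):
    """One-sample t statistic of the values against zero."""
    n = len(values)
    m = sum(values) / n
    var = sum((x - m) ** 2 for x in values) / (n - 1)
    return m / sqrt(var / n)


def sign_flip_max_t(data, n_perm=1000, seed=0):
    """FWE-corrected p values from a sign-flipping permutation test.

    data[s][v] is the contrast value of subject s at voxel v.
    """
    rng = random.Random(seed)
    n_sub, n_vox = len(data), len(data[0])
    observed = [t_stat([data[s][v] for s in range(n_sub)]) for v in range(n_vox)]
    exceed = [1] * n_vox  # the observed labeling counts as one permutation
    for _ in range(n_perm - 1):
        # One random sign per subject, applied to all voxels of that subject.
        signs = [rng.choice((-1, 1)) for _ in range(n_sub)]
        max_t = max(t_stat([signs[s] * data[s][v] for s in range(n_sub)])
                    for v in range(n_vox))
        for v in range(n_vox):
            if max_t >= observed[v]:
                exceed[v] += 1
    return [count / n_perm for count in exceed]


# Synthetic example: 12 subjects, 20 voxels, true effect only at voxel 0.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(12)]
for row in data:
    row[0] += 3.0  # hypothetical effect size
p_corrected = sign_flip_max_t(data)
print(f"corrected p value at voxel 0: {p_corrected[0]:.3f}")
```

Because the null distribution is built from the data themselves, no distributional form of the statistic has to be assumed. The price is that the statistic must be recomputed for every relabeling.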

## Methods Exploiting Anatomical Information

Another simple approach to the multiple comparisons problem is to reduce the number of tests by using anatomical information. Most correction methods, including Bonferroni and FDR, can be combined with this approach since a smaller number of tests leads to a less strict single-voxel threshold and thus to a smaller beta error than when all voxels are included. In a simple version of an anatomical constraint, an intensity threshold for the basic signal level can be used to remove voxels outside the brain. The number of voxels can be further reduced by explicitly masking the brain, e.g., after performing a brain segmentation step. These simple steps typically reduce the number of voxels from about 100,000 to about 50,000. In a more advanced version, statistical data analysis may be restricted to grey matter voxels, which may be identified by standard cortex segmentation procedures (e.g., Kriegeskorte & Goebel, 2001). This approach not only removes voxels outside the brain but also excludes voxels in white matter and ventricles. Note that anatomically informed correction methods do not require spatial smoothing of the data and not only reduce the multiple comparisons problem, but also reduce computation time since fewer tests (e.g. GLM calculations) have to be performed. Anatomically constrained data analysis, especially cortex-based data analysis, is supported in BrainVoyager by providing tools to calculate (cortex-based) masks that can be applied optionally when performing single-run or multi-run / multi-subject GLM analyses.
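
The effect of such a mask on the corrected threshold can be shown with a short sketch. It is illustrative only: the intensity values, the cutoff of 100 and the function names are assumptions, and the Bonferroni correction is used merely because its dependence on the number of tests is easy to see.

```python
def intensity_mask(mean_intensity, min_intensity):
    """Keep voxels whose basic signal level exceeds an intensity threshold."""
    return [value > min_intensity for value in mean_intensity]


def bonferroni_in_mask(mask, p_fwe=0.05):
    """Bonferroni single-voxel threshold using only the voxels inside the mask."""
    n_tests = sum(mask)
    if n_tests == 0:
        raise ValueError("mask contains no voxels")
    return p_fwe / n_tests


# Hypothetical volume: half of the 100,000 voxels lie outside the brain
# and have a low signal level.
intensities = [20.0] * 50_000 + [800.0] * 50_000
mask = intensity_mask(intensities, min_intensity=100.0)

p_all = 0.05 / len(intensities)
p_masked = bonferroni_in_mask(mask)
print(f"all voxels: p < {p_all:.1e}, masked voxels: p < {p_masked:.1e}")
```

Halving the number of tested voxels doubles the admissible single-voxel threshold, and the statistical model only has to be fitted for the voxels inside the mask.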

Copyright © 2023 Rainer Goebel. All rights reserved.