BrainVoyager QX v2.8

The Multiple Comparisons Problem

An important issue in fMRI data analysis is the specification of an appropriate threshold for statistical maps. If only a single voxel's data were tested, a conventional threshold of p < 0.05 (or p < 0.01) could be used, which bounds the probability of obtaining the observed effect, quantified by an R, t or F statistic, solely due to noise fluctuations. Running the statistical analysis separately for each voxel creates, however, a massive multiple comparisons problem (MCP). If a single test is performed, the conventional threshold limits the probability of wrongly declaring a voxel as significantly modulated when there is no effect (alpha error) to 0.05. Note that an error probability of p = 0.05 means that if we repeated the same test 100 times and assumed that there is no effect (null hypothesis), we would wrongly reject (accept) the null (alternative) hypothesis on average in five cases (false positive, alpha error). If we assume that there is no real effect in any voxel time course, running a statistical test spatially in parallel for, say, 100,000 voxels is statistically identical to repeating the test 100,000 times for a single voxel. This would lead to about 5000 false positives (0.05 × 100,000), i.e. about 5000 voxels would be labeled "significant" although they surpass the 0.05 threshold purely due to chance.
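The expected number of false positives can be checked with a small simulation. The following Python sketch is only an illustration and is not part of BrainVoyager QX; it assumes that every voxel is a pure-noise voxel, so that each p value can be simulated as a uniform random number between 0 and 1. The function name count_false_positives is hypothetical.

```python
import random

def count_false_positives(n_tests, alpha, seed=0):
    """Count how many of n_tests null tests fall below alpha.

    Assumption: under the null hypothesis a p value is uniformly
    distributed on [0, 1], so each test is one uniform random draw.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)

n_voxels = 100_000   # number of voxels, as in the text
alpha = 0.05         # uncorrected single-voxel threshold

expected = n_voxels * alpha
observed = count_false_positives(n_voxels, alpha)

print(f"expected false positives:  {expected:.0f}")
print(f"simulated false positives: {observed}")
```

The simulated count fluctuates around the expected value from run to run (and with the chosen seed), but it stays in the range of several thousand voxels.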

Bonferroni Correction

The Bonferroni correction method is a simple multiple comparison correction approach that adjusts the single-voxel threshold in such a way that we retain an error probability of 0.05 at the global level. With N independent tests, this is achieved by using a statistical significance level which is N times smaller than usual. The Bonferroni correction can be derived mathematically as follows. Under the assumption of independent tests, the probability that all N performed tests lead to a sub-threshold result is $(1-p)^N$, and the probability of obtaining one or more false positive results is $1-(1-p)^N$. Setting this global or family-wise error (FWE) probability to a desired value, $p_{FWE} = 1-(1-p)^N$, and solving for the threshold of a single test, p, gives $p = 1-(1-p_{FWE})^{1/N}$. For small $p_{FWE}$ values (e.g. 0.05), this equation can be approximated by $p = p_{FWE} / N$. This means that to obtain a global error probability of $p_{FWE} < 0.05$, the significance level for a single test is obtained by dividing the family-wise error probability by the number of independent tests. Given 100,000 voxels, we would obtain an adjusted single-voxel threshold of $p_v = p_{FWE} / N = 0.05 / 100000 = 0.0000005$. The Bonferroni correction method guarantees that the probability of wrongly accepting even a single voxel as significant is at most 0.05. The method thus controls the alpha error across all voxels, and it is therefore called a family-wise correction approach. For fMRI data, the Bonferroni method would be a valid approach to correct the alpha error if the data at neighboring voxels were truly independent from each other. Neighboring voxels, however, show similar response patterns within functionally defined brain regions, such as the fusiform face area (FFA). In the presence of such spatial correlations, the Bonferroni correction method operates too conservatively, i.e. it corrects more strictly than necessary.
As a result of overly strict control of the alpha error, the sensitivity (power) to detect truly active voxels is reduced: many voxels will be labeled as "not significant" although they reflect true effects. Wrongly accepting (rejecting) a null (alternative) hypothesis is called beta error. The Bonferroni method is available in BrainVoyager QX, which shows Bonferroni-corrected p values automatically in the VMR/surface view when statistical maps are overlaid.
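The relation between the exact single-test threshold and its Bonferroni approximation can be illustrated numerically. The Python sketch below is not BrainVoyager QX code; the function names are hypothetical, and independence of the N tests is assumed throughout. The exact form, 1 - (1 - pFWE)^(1/N), is commonly known as the Šidák correction.

```python
import math

def exact_threshold(p_fwe, n_tests):
    """Exact single-test threshold p = 1 - (1 - p_fwe)^(1/N).

    expm1/log1p are used only for numerical accuracy with tiny values.
    """
    return -math.expm1(math.log1p(-p_fwe) / n_tests)

def bonferroni_threshold(p_fwe, n_tests):
    """Bonferroni approximation p = p_fwe / N."""
    return p_fwe / n_tests

def familywise_error(p_single, n_tests):
    """Probability of one or more false positives: 1 - (1 - p)^N."""
    return -math.expm1(n_tests * math.log1p(-p_single))

n_voxels = 100_000
p_fwe = 0.05

p_exact = exact_threshold(p_fwe, n_voxels)
p_bonf = bonferroni_threshold(p_fwe, n_voxels)

print(f"exact threshold:      {p_exact:.4e}")
print(f"Bonferroni threshold: {p_bonf:.4e}")
print(f"FWE with exact threshold:      {familywise_error(p_exact, n_voxels):.4f}")
print(f"FWE with Bonferroni threshold: {familywise_error(p_bonf, n_voxels):.4f}")
```

The Bonferroni threshold is slightly smaller than the exact one, so the resulting family-wise error probability stays just below the requested 0.05.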

Spatial Extent Methods

Worsley et al. (1992) suggested a less conservative approach to correct for multiple comparisons, which explicitly takes into account the observation that neighboring voxels are not activated independently from each other but are more likely to activate together in clusters. In order to incorporate spatial neighborhood relationships in the calculation of global error probabilities, the method describes a statistical map as a "Gaussian random field" (for details, see Worsley et al., 1992). Unfortunately, application of this correction method requires that the fMRI data are spatially smoothed, which substantially reduces one of the most attractive properties of fMRI, namely its high spatial resolution.

Another correction method incorporating the observation that neighboring voxels often activate in clusters is based on Monte Carlo simulations that estimate the likelihood of obtaining clusters of different sizes purely by chance (Forman et al., 1995). In combination with relaxed single-voxel thresholds, the calculated cluster extent thresholds are applied to the statistical map, ensuring that a global error probability of p < 0.05 is met. This approach does not require spatial smoothing and appears highly appropriate for fMRI data. The only disadvantage is that the method is computationally intensive. This multiple comparisons correction approach is available in BrainVoyager QX through the "Cluster Threshold Estimator" plugin; for details, consult the plugin's documentation.
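The logic of a Monte Carlo cluster extent estimate can be sketched with a toy example in Python. This is not the algorithm of the "Cluster Threshold Estimator" plugin; it is a simplified illustration under loudly stated assumptions: a small 2D grid instead of a 3D volume, spatially independent noise (a realistic simulation would also model the spatial smoothness of the map), 4-connected neighborhoods, and hypothetical function names.

```python
import random

def largest_cluster(active, n_rows, n_cols):
    """Size of the largest 4-connected cluster in a boolean grid."""
    seen = [[False] * n_cols for _ in range(n_rows)]
    best = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not active[r][c] or seen[r][c]:
                continue
            # Flood fill from (r, c) using an explicit stack.
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and active[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            best = max(best, size)
    return best

def cluster_extent_threshold(n_rows, n_cols, p_voxel, alpha=0.05,
                             n_iter=500, seed=0):
    """Smallest cluster size k that pure noise reaches in < alpha of maps."""
    rng = random.Random(seed)
    max_sizes = []
    for _ in range(n_iter):
        # Assumption: independent noise; a voxel is supra-threshold
        # with probability p_voxel (the relaxed single-voxel threshold).
        noise_map = [[rng.random() < p_voxel for _ in range(n_cols)]
                     for _ in range(n_rows)]
        max_sizes.append(largest_cluster(noise_map, n_rows, n_cols))
    k = 1
    while sum(1 for s in max_sizes if s >= k) / n_iter >= alpha:
        k += 1
    return k

k = cluster_extent_threshold(32, 32, p_voxel=0.01)
print(f"minimum cluster size for a global error below 0.05: {k} voxels")
```

Clusters in the real statistical map that are smaller than the estimated size k would then be removed. A more relaxed single-voxel threshold generally requires a larger cluster extent to reach the same global error probability.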

False Discovery Rate

While the described multiple comparison correction methods aim to control the family-wise error rate, the false discovery rate (FDR) approach (Benjamini & Hochberg, 1995) uses a different statistical logic, and has been proposed for fMRI analysis by Genovese and colleagues (2002). In this approach, it is not the overall number of false positive voxels that is controlled but the proportion of false positive voxels among the subset of voxels labeled as significant. Supra-threshold voxels given a specific threshold value are called "discovered" voxels or "voxels declared as active". With a false discovery rate of q < 0.05 one would accept that 5% of the discovered (supra-threshold) voxels are false positives. Given a desired false discovery rate, the FDR algorithm calculates a single-voxel threshold, which ensures that the voxels beyond that threshold contain not more than the specified proportion of false positives. With a q value of 0.05 this also means that one can "trust" 95% of the supra-threshold (i.e. color-coded) voxels since the null hypothesis has been rejected correctly. Since the FDR logic relates the number of false positives to the number of voxels declared active, the FDR method adapts to the amount of activity in the data: the method is very strict if there is not much activity in the data, but applies less conservative thresholds if larger regions of the brain show task-related effects. In the extreme case that not a single voxel is truly active, the calculated single-voxel threshold is identical to the one computed with the Bonferroni method. The FDR method appears ideal for fMRI data because it does not require spatial smoothing and it detects voxels with a high sensitivity (low beta error) if there are true effects in the data. The FDR method for correction of multiple comparisons is the default method used in BrainVoyager QX.
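How an FDR threshold adapts to the data can be illustrated with a minimal Python sketch. The text above does not spell out the algorithm, so the sketch assumes the basic Benjamini-Hochberg step-up rule: sort the N p values in ascending order and take the largest p value p(i) that satisfies p(i) <= (i / N) * q. This is not BrainVoyager QX code; the function name fdr_threshold and the ten p values are made up for illustration.

```python
def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule (basic form, assumed here).

    Returns the largest sorted p value p(i) with p(i) <= (i / N) * q,
    or None if no p value satisfies the criterion.
    """
    n = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / n:
            threshold = p
    return threshold

# Hypothetical single-voxel p values for ten voxels.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]

threshold = fdr_threshold(p_values, q=0.05)
discovered = [p for p in p_values
              if threshold is not None and p <= threshold]

print(f"FDR single-voxel threshold: {threshold}")
print(f"voxels declared active:     {len(discovered)}")
```

For the smallest p value (i = 1) the criterion is q / N, which is the Bonferroni threshold; the more voxels pass, the more lenient the criterion for the next rank becomes. An uncorrected threshold of 0.05 would have accepted five of the ten example voxels, whereas Bonferroni (0.05 / 10 = 0.005) would have accepted only one.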

Methods Exploiting Anatomical Information

Another simple approach to the multiple comparisons problem is to reduce the number of tests by using anatomical information. Most correction methods, including Bonferroni and FDR, can be combined with this approach: a smaller number of tests leads to a less strict single-voxel threshold, and thus to a smaller beta error than when all voxels are included. In a simple version of an anatomical constraint, an intensity threshold for the basic signal level can be used to remove voxels outside the head. The number of voxels can be further reduced by masking the brain, e.g., after performing a brain segmentation step. These simple steps typically reduce the number of voxels from about 100,000 to about 50,000. In a more advanced version, statistical data analysis may be restricted to grey matter voxels, which may be identified by standard cortex segmentation procedures (e.g., Kriegeskorte & Goebel, 2001). This approach not only removes voxels outside the brain but also excludes voxels in white matter and ventricles. Note that anatomically informed correction methods do not require spatial smoothing of the data and not only reduce the multiple comparisons problem, but also reduce computation time since fewer tests (e.g. GLM calculations) have to be performed. Anatomically constrained data analysis, especially cortex-based data analysis, is supported in BrainVoyager QX by providing means to calculate and apply (cortex-based) masks.
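The effect of an anatomical constraint on the corrected threshold can be shown with a toy example in Python. The sketch is an illustration only, not BrainVoyager QX code: the function names, the eight signal values and the intensity cutoff of 100 are all hypothetical, and Bonferroni is used merely because it is the simplest correction to compute.

```python
def intensity_mask(mean_signal, min_intensity):
    """Mark voxels whose mean signal level reaches min_intensity."""
    return [value >= min_intensity for value in mean_signal]

def bonferroni_threshold(p_fwe, n_tests):
    """Single-voxel threshold p = p_fwe / N."""
    return p_fwe / n_tests

# Hypothetical mean signal levels; low values mimic voxels outside the head.
mean_signal = [12, 850, 910, 30, 780, 5, 640, 20]

mask = intensity_mask(mean_signal, min_intensity=100)
n_all = len(mean_signal)
n_masked = sum(mask)

print(f"voxels tested without mask: {n_all}")
print(f"voxels tested with mask:    {n_masked}")
print(f"threshold without mask: {bonferroni_threshold(0.05, n_all)}")
print(f"threshold with mask:    {bonferroni_threshold(0.05, n_masked)}")

# Voxel counts quoted in the text: about 100,000 versus about 50,000.
print(f"100,000 voxels: {bonferroni_threshold(0.05, 100_000):.1e}")
print(f" 50,000 voxels: {bonferroni_threshold(0.05, 50_000):.1e}")
```

Halving the number of tested voxels doubles the Bonferroni-corrected single-voxel threshold, which is the source of the gain in sensitivity.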


Copyright © 2014 Rainer Goebel. All rights reserved.