BrainVoyager QX v2.8

Basic Concepts

It might be helpful to introduce a few distinctions and application scenarios that influence the choice of multivariate analysis tools. First, "supervised" and "unsupervised" multivariate analysis (learning) methods are distinguished. Unsupervised methods (not further considered here), such as independent component analysis (ICA), attempt to find structure in the data without using knowledge of the conducted tasks. Supervised methods, on the other hand, explicitly use knowledge of the experimental protocol.

Learning, Testing and Generalization Performance

The available data are divided into two sets, a "training" set and a "test" set. The training set is used during the "learning phase" to estimate a function that maps brain activation patterns (input) to the corresponding class labels (target). As an example, consider a study presenting images of different object categories, such as "houses" and "faces". All brain activation patterns in the training set evoked while subjects looked at houses would be labeled as "house class", those evoked while subjects looked at faces would be labeled as "face class", and so on. After the learning phase, the learning machine should be able to correctly classify not only the learned activity patterns but also novel activation patterns (test data) as belonging to the "house" or "face" class.

Note that evaluating the classifier's output on test data with known target labels is crucial for assessing its generalization performance. A classifier that correctly learns all input-target pairs of the training set but performs at chance level on new data would be useless. Poor generalization performance despite good performance on the training data indicates overfitting, i.e. the classifier has learned the training exemplars "too well", including their idiosyncratic noise. Some neural network classifiers suffer from this problem. The support vector machine (SVM), on the other hand, is less prone to overfitting because it selects the decision boundary that maximizes the margin between the classes, which tends to yield good generalization performance.
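The gap between training and test performance can be illustrated with a small simulation. The sketch below is a hypothetical example, not part of BrainVoyager: it uses simulated "house" (label 0) and "face" (label 1) patterns and a 1-nearest-neighbor classifier, chosen only because it memorizes the training set, not because BrainVoyager uses it. Training accuracy is perfect in every case; only the accuracy on an independent test set shows whether something generalizable was learned. With no class difference (`effect=0.0`) test accuracy hovers around chance, with a real difference (`effect=2.0`) it is high.

```python
import random

def make_patterns(n_per_class, n_voxels, effect, rng):
    """Simulate (hypothetical) activation patterns for classes 0 ("house") and 1 ("face").
    Class-1 patterns get `effect` added to the first half of the voxels."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
            if label == 1:
                for i in range(n_voxels // 2):
                    x[i] += effect
            X.append(x)
            y.append(label)
    return X, y

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict_1nn(X_train, y_train, x):
    """1-nearest-neighbor: return the label of the closest training pattern."""
    best = min(range(len(X_train)), key=lambda j: sq_dist(X_train[j], x))
    return y_train[best]

def accuracy(X_train, y_train, X_eval, y_eval):
    hits = sum(predict_1nn(X_train, y_train, x) == t for x, t in zip(X_eval, y_eval))
    return hits / len(y_eval)

rng = random.Random(0)
results = {}
for effect in (0.0, 2.0):
    X_tr, y_tr = make_patterns(20, 100, effect, rng)  # training set
    X_te, y_te = make_patterns(20, 100, effect, rng)  # independent test set
    results[effect] = (accuracy(X_tr, y_tr, X_tr, y_tr),   # training accuracy
                       accuracy(X_tr, y_tr, X_te, y_te))   # test accuracy
    print(f"effect={effect}: train={results[effect][0]:.2f} test={results[effect][1]:.2f}")
```

Each training pattern is its own nearest neighbor, so training accuracy says nothing about generalization; the test set is what separates the two scenarios.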

Voxels = Features

An exemplar (input pattern) used during learning is represented as a feature vector x with N elements. The value of an element usually specifies the presence or strength of a certain feature. What are the "features" in fMRI activity patterns? In most applications, a feature refers to a response measure of a specific voxel. The dimension N of an fMRI feature vector thus corresponds to the number of voxels included in the analysis. If all (brain) voxels are included, this leads to a hundred thousand or more "features", which makes it difficult to learn the right classification function from the few activation patterns typically available. Such situations, with many features but relatively few training exemplars, are characterized as suffering from the "curse of dimensionality".
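Why many features and few exemplars cause trouble can be seen in a toy simulation (hypothetical data, plain Python, not BrainVoyager code). With 1,000 pure-noise "voxels" and only 20 exemplars carrying labels unrelated to the data, a linear classifier trained with the classic perceptron rule (used here only as a simple stand-in for any linear classifier) separates the training set perfectly, although there is nothing to learn. When there are at least as many features as exemplars, randomly drawn patterns can be split by a hyperplane under any labeling, so accuracy on new noise patterns stays near chance.

```python
import random

def perceptron(X, y, max_epochs=1000):
    """Fit sign(w.x + b) with the classic perceptron rule (labels +1 / -1).
    Stops after the first epoch without a training error."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified or on the boundary: update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def accuracy(w, b, X, y):
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

rng = random.Random(42)
n_voxels, n_exemplars = 1000, 20  # N much larger than the number of exemplars

def noise_set(n):
    """Hypothetical noise-only patterns with alternating labels unrelated to X."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)] for _ in range(n)]
    y = [1 if i % 2 == 0 else -1 for i in range(n)]
    return X, y

X_train, y_train = noise_set(n_exemplars)
X_test, y_test = noise_set(200)
w, b = perceptron(X_train, y_train)
train_acc = accuracy(w, b, X_train, y_train)
test_acc = accuracy(w, b, X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```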

ROI Analysis or Whole-Brain Mapping?

To solve this problem, classifiers are often applied to response patterns originating from a reduced number of voxels, e.g. from voxels of anatomically or functionally defined regions-of-interest (ROIs). This results in feature vectors with dimensions on the order of hundreds to thousands of voxels. With 10-100 training exemplars, such problems are suited for standard classifiers, such as support vector machines (SVMs).
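Turning an ROI into feature vectors amounts to reading out the same voxels, in the same order, from each exemplar's response map. The sketch below assumes a hypothetical representation (each exemplar as a Python dict mapping voxel coordinates (x, y, z) to a response estimate, e.g. a single-trial beta value, and the ROI as a set of coordinates); it does not reflect BrainVoyager's internal data formats. A fixed voxel order matters because element i of every feature vector must refer to the same voxel.

```python
import random

def roi_feature_vectors(exemplar_maps, roi):
    """Restrict whole-volume response maps to the voxels of an ROI.
    exemplar_maps: list of dicts {(x, y, z): response estimate}, one per exemplar
    roi: set of (x, y, z) coordinates (hypothetical ROI definition)
    Returns one feature vector per exemplar with voxels in a fixed (sorted) order."""
    coords = sorted(roi)
    return [[m[c] for c in coords] for m in exemplar_maps]

# Hypothetical 20 x 20 x 20 volume (8,000 voxels) and 40 exemplars.
rng = random.Random(3)
grid = [(x, y, z) for x in range(20) for y in range(20) for z in range(20)]
maps = [{c: rng.gauss(0.0, 1.0) for c in grid} for _ in range(40)]

# Hypothetical box-shaped ROI of 5 x 5 x 4 = 100 voxels.
roi = {(x, y, z) for x in range(5, 10) for y in range(5, 10) for z in range(8, 12)}

features = roi_feature_vectors(maps, roi)
print(len(features), "exemplars x", len(features[0]), "features (voxels)")
# prints: 40 exemplars x 100 features (voxels)
```

The feature dimension drops from 8,000 to 100 while the number of exemplars stays the same.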

A problem with the ROI approach is that one often does not know a priori in which brain locations (ROIs) different conditions may be separated using multivariate analysis tools. In such situations, one would like to discover the discriminative brain regions; in other words, one is interested in multivariate brain mapping. But how can this be done without getting lost in the curse of dimensionality? One proposed solution is the "searchlight" approach. As in univariate analysis, each voxel is visited, but instead of using only the time course of the visited voxel, several voxels in its neighborhood are included, forming a set of features for a joint multivariate analysis. The neighborhood is usually defined roughly as a sphere, i.e. all voxels within a certain (Euclidean) distance from the visited voxel are included. The result of the multivariate analysis is then stored at the visited voxel (e.g. a t value resulting from a multivariate statistical comparison). By visiting all voxels and analyzing their respective (partially overlapping) neighborhoods, one obtains a whole-brain map in the same way as when running univariate statistics. Since the multivariate analysis operates on the voxels of a local neighborhood, this approach is also called local pattern effects mapping.
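The searchlight procedure can be sketched in a few lines. This is a hypothetical illustration, not BrainVoyager's implementation: the neighborhood is the set of voxels within a Euclidean radius (in voxel units) of the visited voxel, clipped at the volume border, and the per-neighborhood multivariate analysis is stood in for by the leave-one-out accuracy of a nearest-class-mean classifier instead of a t value. Any function mapping (feature vectors, labels) to a number could be passed as `analyze`.

```python
import random

def sphere_offsets(radius):
    """All integer offsets (dx, dy, dz) within `radius` voxels of the center."""
    r = int(radius)
    return [(dx, dy, dz)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            for dz in range(-r, r + 1)
            if dx * dx + dy * dy + dz * dz <= radius * radius]

def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-mean classifier
    (a simple stand-in for the multivariate analysis in each neighborhood)."""
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            members = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        pred = min(centroids,
                   key=lambda lab: sum((a - c) ** 2 for a, c in zip(X[i], centroids[lab])))
        hits += pred == y[i]
    return hits / len(X)

def searchlight_map(volumes, labels, shape, radius, analyze):
    """Visit every voxel, analyze the patterns in its spherical neighborhood,
    and store the result at the visited (center) voxel."""
    nx, ny, nz = shape
    offsets = sphere_offsets(radius)
    result = {}
    for cx in range(nx):
        for cy in range(ny):
            for cz in range(nz):
                neighbors = [(cx + dx, cy + dy, cz + dz) for dx, dy, dz in offsets
                             if 0 <= cx + dx < nx and 0 <= cy + dy < ny
                             and 0 <= cz + dz < nz]
                X = [[vol[v] for v in neighbors] for vol in volumes]
                result[(cx, cy, cz)] = analyze(X, labels)
    return result

# Hypothetical 6 x 6 x 6 volume; class 1 gets +4 only in voxels with x < 3.
rng = random.Random(7)
shape = (6, 6, 6)
labels = [0] * 10 + [1] * 10
volumes = []
for lab in labels:
    vol = {}
    for x in range(6):
        for y in range(6):
            for z in range(6):
                signal = 4.0 if (lab == 1 and x < 3) else 0.0
                vol[(x, y, z)] = signal + rng.gauss(0.0, 1.0)
    volumes.append(vol)

acc_map = searchlight_map(volumes, labels, shape, radius=1, analyze=loo_centroid_accuracy)
print("inside signal region:", acc_map[(1, 2, 2)], "outside:", acc_map[(4, 2, 2)])
```

With `radius=1` each neighborhood holds at most 7 voxels, and neighborhoods of adjacent centers overlap, as described above.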

Although the searchlight mapping approach is very attractive, it makes the strong assumption that discriminative information is located within small brain regions. This assumption is often appropriate in light of the known functional organization of the brain, but there are also situations in which discriminative voxels are distributed more widely, e.g. across both hemispheres or along several areas of a processing stream. In such situations, one would like to start with voxels from a large region (up to the whole brain) and gradually exclude non-discriminative features until a core set of voxels with the highest discriminative power remains. Such techniques have indeed been developed in machine learning and introduced to neuroimaging. In BrainVoyager, a variant called recursive feature elimination (RFE) has been implemented.
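The elimination loop behind RFE can be sketched as follows. This is a hypothetical illustration, not the variant implemented in BrainVoyager: RFE is commonly described with a linear SVM, whereas here, to stay self-contained, an L2-regularized logistic regression fitted by gradient descent serves as the linear classifier. In each round the classifier is retrained on the surviving voxels, the voxels are ranked by their squared weight, and a fraction (`drop_frac`, an assumed parameter) of the lowest-ranked voxels is removed until `n_keep` voxels remain.

```python
import math
import random

def train_logistic(X, y, l2=0.5, lr=0.1, n_iter=200):
    """L2-regularized logistic regression (labels 0/1) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def recursive_feature_elimination(X, y, n_keep, drop_frac=0.2):
    """Repeatedly retrain on the surviving features and drop those with the
    smallest squared weights until only n_keep features remain."""
    active = list(range(len(X[0])))  # indices of surviving voxels
    while len(active) > n_keep:
        X_active = [[x[j] for j in active] for x in X]
        w, _ = train_logistic(X_active, y)
        n_drop = min(max(1, int(drop_frac * len(active))), len(active) - n_keep)
        weakest = set(sorted(range(len(active)), key=lambda k: w[k] ** 2)[:n_drop])
        active = [f for k, f in enumerate(active) if k not in weakest]
    return active

# Hypothetical data: 30 voxels, 20 exemplars per class; only voxels 0-4
# differ between classes (class 1 gets +2), the rest are pure noise.
rng = random.Random(1)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        x = [rng.gauss(0.0, 1.0) for _ in range(30)]
        if label == 1:
            x[:5] = [v + 2.0 for v in x[:5]]
        X.append(x)
        y.append(label)

selected = recursive_feature_elimination(X, y, n_keep=5)
print("surviving voxels:", sorted(selected))
```

Retraining after every elimination step is what distinguishes RFE from ranking all voxels once: the weights of the remaining voxels are re-estimated jointly each time the feature set shrinks.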