Once the thickness maps are available for each subject, statistical analysis can be performed across the entire cortex or for specific regions. Currently no cortical segmentation tool is available, so the only regional analysis that is actively supported is mean thickness.
The statistical framework is built upon R, though a few pieces of code are available only in MATLAB.
This page serves two functions: it provides links to the man pages of the individual analysis functions, and it gives an overview of what is currently possible.
The purpose of cortical statistics is to allow for hypothesis testing at every vertex of the surface. For each analysis both input data and a series of independent variables are required. The simplest case is testing whether cortical thickness is related to group on either a regional or global level, though more advanced multivariate models are certainly possible.
The input data contains one line per subject. In each line the first column is the filename of the cortical thickness output, and subsequent columns hold the independent variables. The very first line of the input file is optionally a header for all of the subsequent columns. A sample file would thus look like the following:
Filename Intercept Group Age
/somedir/someplace/subject001_thickness.txt 1 Normal 23
/somedir/someplace/subject002_thickness.txt 1 Patient 25
/somedir/someplace/subject003_thickness.txt 1 Patient 22
/somedir/someplace/subject004_thickness.txt 1 Normal 20
Some notable elements:

* The first column of each data line must be the path to that subject's thickness file.
* The example includes an explicit Intercept column of ones.
* The header line is optional (see the header argument to mni.read.glim.file below).
* Categorical predictors, such as Group, can be entered as text labels.
The usual statistical run proceeds in a series of ordered steps:
The first step is to start the R statistical environment by typing "R" in a shell; after a short delay R should start.
The second step is to load the library for dealing with cortical statistics. This is accomplished with the following invocation:
> library(mni.cortical.statistics)
The input matrix is prepared in a simple text file as described above. It is then loaded with the command mni.read.glim.file:
> gf <- mni.read.glim.file(filename)
There are two optional boolean arguments to mni.read.glim.file, namely header and fill. Both default to FALSE; set header=TRUE if the first line of the input file contains a data header, and set fill=TRUE if the input file may contain missing values.
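For example, to read the sample file shown above, which includes a header line (the path here is hypothetical):

> gf <- mni.read.glim.file("/somedir/someplace/thickness_glim.txt", header=TRUE)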
The next step in the analysis is to define the hypothesis and to express it in terms of a linear model. Any variable from the glim file can take the place of the usual beta parameters (if no header was provided in the glim file, the columns are assigned the names V1, V2, V3, ... Vn). So, for example, given the file above, to test the effect of age the following formula would be used:
y ~ Age
Or, to test the effects of Group and Age together, the following:
y ~ Group + Age
In all these examples, y takes the place of the metric to be estimated, usually cortical thickness.
An often interesting question is to test mean cortical thickness against the predictors. This would be done in the following way:
> ms <- mni.mean.statistics(gf, 'y ~ Age')
> summary(ms)
The key call here is to mni.mean.statistics, which in its basic form takes two parameters: the glim matrix as read in by mni.read.glim.file, and a symbolic version of the formula. The terms tested against have to exist either in the general R environment or in the glim file. The function returns a linear model object, which can be further interrogated by calling the summary function on it.
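Since the return value is a standard linear model object, the usual base R accessors can be used alongside summary; a brief sketch:

> anova(ms)      # ANOVA table for the fitted model
> coef(ms)       # the estimated regression coefficients
> confint(ms)    # confidence intervals on those coefficients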
Testing for an effect at each vertex is almost identical to testing for effects on the mean. The main difference is that it takes longer, and that the output is not as easily summarised by a single command. To repeat the test against age used on the mean above, the following command would be invoked:
> vs <- mni.vertex.statistics(gf, 'y ~ Age')
The only change is the replacement of mean in the function name by vertex, i.e. mni.vertex.statistics. The output is a list of the important statistical elements, such as the F- and t-statistics, R-squared values, etc., with one entry for each vertex tested. To make sense of it, this list should be written out to file and loaded into a visualisation tool.
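Before writing the results out, the components of the returned list can be examined with base R; a quick sketch:

> names(vs)             # which statistics were computed
> str(vs, max.level=1)  # one-level overview of the list structure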
Once the statistics have been run on the mean as well as each vertex, the results should be written out to file. To follow the above example, the following incantation should be used:
> mni.write.vertex.stats(vs, "output_filename.vertstats", mean.stats=ms, glim.matrix=gf)
This will place the results, in the vertstats format, into the file specified as the second argument. Specifying the mean statistics and the glim matrix is optional; if those arguments are provided, the relevant information is included in the header, which can be quite useful for recovering what was done at a later point.
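Putting all of the steps together, a minimal session (with hypothetical file names) would look like this:

> library(mni.cortical.statistics)
> gf <- mni.read.glim.file("thickness_glim.txt", header=TRUE)
> ms <- mni.mean.statistics(gf, 'y ~ Age')
> vs <- mni.vertex.statistics(gf, 'y ~ Age')
> mni.write.vertex.stats(vs, "age_effect.vertstats", mean.stats=ms, glim.matrix=gf)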
The example session described above deals with the most common usage of the statistical environment. There are, however, several more advanced topics dealing with more intricate statistics. In no particular order, the topics covered below are: interaction terms, comparing models, correction for multiple comparisons, and power analysis.
An interaction term, as used here, refers to a predictor that has to be modified in some way before use in the formula. The most common use is for nonlinear terms. A second-order formula, for example, would look like this:
y ~ Age + I(Age^2)
Basically, the I expression causes whatever is inside to be interpreted arithmetically rather than symbolically.
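For contrast, without I() the same operators are read symbolically; standard R formula syntax treats them as model-building operators rather than arithmetic:

y ~ (Age + Group)^2    # symbolic: expands to Age + Group + Age:Group
y ~ Age + I(Age^2)     # arithmetic: Age plus its literal square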
Sometimes it is desirable to compare the efficacy of two separate formulas: for example, where in the cortex is a second-order equation superior to a simple first-order one? The function mni.vertex.compare.models performs an ANOVA and returns an F-value at every vertex. It is used like so:
> mni.vertex.compare.models(gf, 'y ~ Age', 'y ~ Age + I(Age^2)', dt)
where gf is the glim file as read by mni.read.glim.file and dt is the data table as created by mni.build.data.table.
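A full invocation might therefore look as follows; the single-argument call to mni.build.data.table is an assumption on our part, so consult its man page for the exact signature:

> dt <- mni.build.data.table(gf)   # assumed to take the glim matrix; see the man page
> f <- mni.vertex.compare.models(gf, 'y ~ Age', 'y ~ Age + I(Age^2)', dt)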
One of the issues that any user of the cortical statistics package will have to address is that of the correction for multiple comparisons. Briefly, the problem is that in the normal statistical run over 40,000 individual linear models will be computed. If the standard threshold of alpha = 0.05 is used, that then means that on average over 2000 vertices will be considered significant even if the data is purely random. We currently have two ways of computing a sensible threshold in the face of multiple comparisons: Random Field Theory and the False Discovery Rate.
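The arithmetic behind that expectation is easy to check in R itself:

> 40000 * 0.05    # expected false positives under the null: 2000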
Random Field Theory is the more stringent of the two methods, as it tries to ensure, to within a certain confidence level, that no false positives at all lie above the chosen threshold. At this point in time it is also the more difficult of the two to compute, as the procedure involves several MATLAB scripts run outside of the R environment.
Computing the False Discovery Rate (FDR) is a whole lot simpler, as it is entirely incorporated into the statistics package. First of all, the goal of the FDR is different: the threshold gives you the expected proportion of false positives in your results. I.e., if you use a threshold of 0.05 and 1000 vertices end up above it, then on average 50 of those are false positives. A corresponding array of q values is also computed for the t-statistics in your results. The q values can be roughly interpreted as the probability of that vertex being a false positive. You can then simply threshold the data by setting the maximum q value that is acceptable to you. To explicitly get a threshold, this invocation will work:
> q <- mni.compute.FDR(t.stats, df=degrees.of.freedom, fdr=0.05)
> q$fdr.threshold
where t.stats was computed using mni.vertex.statistics (though this method is generic and can take any set of t-stats or even p-values ... see the manual), df is the degrees of freedom in the data, and fdr is the desired threshold. See the man page for mni.compute.FDR.
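A sketch of thresholding the vertex results follows; the assumption that the threshold applies two-sidedly to the magnitude of t is ours, so verify it against the man page:

> q <- mni.compute.FDR(t.stats, df=degrees.of.freedom, fdr=0.05)
> surviving <- abs(t.stats) >= q$fdr.threshold   # assumes a two-sided test on |t|
> sum(surviving)                                 # number of vertices passing the threshold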
It is often useful to determine the statistical power of the cortical statistics methods for a given population. Basically, what is needed is the standard deviation of either the mean thickness or the thickness at each vertex. Given that, and either the desired change to be recaptured or the number of subjects in each group, one can determine the power of the technique. The function used is mni.vertex.power.analysis; see its man page for the necessary parameters.
The results returned contain both the n needed to recapture the specified delta and the delta that can be obtained given two groups of n subjects.