NOTE: This section may be obsolete, inaccurate, or incomplete.

Introduction to R

There are several ways to do the statistical analysis for VBM. At the BIC, glim_image was a popular program used for this purpose. However, in 2004 Jason Lerch created an R library package called RMINC. With the introduction of MINC 2, this has become an increasingly popular way to run the analysis.

Included below is some documentation highlighting specific commands and functions used. However, it would probably be beneficial to learn the basics of R (an open-source implementation of the S language, of which S-PLUS is the commercial version). Please see the following links for more information:

Please note there are some important requirements in order to use R:

  1. First, and most obviously, R must be installed on your machine. If you are working at the BIC, it may already be installed; simply type R at the command line to find out. If you do not have it, you can ssh to another machine that does (e.g. yorick), assuming of course that you have some disk space available for your analysis. Otherwise, you will need to install it yourself; see the R project website for instructions.
  2. In addition to installing R, if you are planning on doing VBM or cortical thickness analysis and plan on using Jason Lerch’s R libraries, it will be helpful to have MINC2 installed. There are several ways to do this.
    • Install MINC2 on your machine yourself, which can be somewhat time consuming; see Andrew Janke’s instructions on the BIKI regarding MINC.
    • Alternatively, talk to Claude Lepage about installing the latest version of the CIVET quarantine, which includes MINC2.
    • Or source someone else’s environment, assuming of course that they have MINC2 installed (e.g. source ~lau/environment_minc2.csh).
  3. Once set up, you may want to familiarize yourself with R. There is not that much documentation on R yet; however, since R is an open-source implementation of the S language, S-PLUS documentation is also relevant (such as Mixed-Effects Models in S and S-PLUS; Pinheiro & Bates, 2000). Included here are a couple of links to R tutorials that might come in handy. In addition to these tutorials, a basic understanding of statistics is imperative.
  4. If you are planning on using the R tutorials, one thing that is not necessarily covered in them is how to install and load additional R packages (see the commands and the short example below):

To install a package (here faraway) into a specific library directory rather than the system default:

  • install.packages("faraway", lib="./")

To load a package from a location other than the default library path:

  • library(faraway, lib.loc="./")
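
Putting the two together, a typical session using a personal library directory might look like the following. This is only a minimal sketch; the directory name ~/Rlibs is just an example and must already exist.

install.packages("faraway", lib="~/Rlibs")    # install into a user-writable library directory
library(faraway, lib.loc="~/Rlibs")           # load it from that directory in later sessions
.libPaths(c("~/Rlibs", .libPaths()))          # or add the directory to the search path once per session
library(faraway)                              # ...and then load packages normally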

Voxel-Based Morphometry (VBM)

Voxel-based morphometry is a computational approach to neuroanatomy that measures differences in local concentrations of brain tissue through a voxel-wise comparison of multiple brain images (Ashburner and Friston, 2000). The value of VBM is that it allows for comprehensive measurement of differences, not just in specific structures, but throughout the entire brain. This page provides an example of the most common use of VBM, a voxel-by-voxel comparison of the local concentrations of grey matter between MR images of two groups of subjects.

The images must first be registered into Talairach space and corrected for intensity non-uniformity. Each voxel is then classified as white matter, grey matter, cerebrospinal fluid, or background. The classification is based on the intensity of the voxel combined with the spatial probability of that voxel being white matter, grey matter, CSF, or background. After the classification, the image can be segmented so that only one tissue type appears. Next, the image is smoothed using a Gaussian kernel, which gives a weighted average of tissue density in a defined area surrounding each voxel. Regression analysis using the general linear model can then be conducted to compare the density of grey matter in the two groups, in order to identify differences in the concentration of grey matter that are related to specific variables under study. The output of this method is a statistical map that demarcates regions in which there is a significant difference in the concentration of grey matter between the compared groups (Ashburner and Friston 2000).
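
As a rough illustration of the regression step, here is a minimal sketch of how a voxel-wise group comparison might be run with RMINC. The function names (mincLm, mincFDR, mincWriteVolume) follow RMINC conventions, but the column names (gm_smooth, Group, Age) and the output column label are assumptions; check the RMINC documentation for the exact signatures in your installed version.

library(RMINC)
gf <- read.csv("subjects.csv")                  # one row per subject: smoothed grey-matter maps plus covariates (assumed columns)
vs <- mincLm(gm_smooth ~ Group + Age, data=gf)  # voxel-wise general linear model
qvals <- mincFDR(vs)                            # correct for multiple comparisons across voxels
mincWriteVolume(vs, "group_tstat.mnc", column="tvalue-GroupPatient")  # column name depends on the model terms (assumption)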

Cortical Thickness Statistics

The purpose of cortical thickness statistics is to allow for hypothesis testing at every vertex of the surface. For each analysis both input data and a series of independent variables are required. The simplest case is testing whether cortical thickness is related to group on either a regional or global level, though more advanced multivariate models are certainly possible. Jason Lerch has created a statistics package for cortical thickness analysis called mni.cortical.statistics.

Creating the spreadsheet

The spreadsheet can be created with whatever software you prefer, Excel being the most obvious choice. It should have a header row giving a name to each column. The first column should contain the filenames (with absolute paths) of all the scans to be analyzed. Every subsequent column contains information associated with that scan; see the example below:

Filename Intercept Group Age
/somedir/someplace/subject001_thickness.txt 1 Normal 23
/somedir/someplace/subject002_thickness.txt 1 Patient 25
/somedir/someplace/subject003_thickness.txt 1 Patient 22
/somedir/someplace/subject004_thickness.txt 1 Normal 20


Some notable elements:

  • The first line, the header, is optional.
  • The second column, the intercept term, is also optional.
  • Different columns are separated by a single space.
  • Missing values are allowed; leave the field empty where needed (i.e. don’t insert anything), which results in an extra space.
  • Factors are allowed as well - in the example given above, the two groups (Normal and Patient) are specified as factors. Make sure that the spelling is always identical.
  • It is easiest to specify all possible independent variables in one file. Just because they are included does not mean that they need to be used in every analysis - that is specified separately.

If a longitudinal dataset is being investigated then a column giving subject (as opposed to scan) IDs is also necessary so that the scans can be grouped for later analysis.

Once the preparation of the spreadsheet is complete, save it as a CSV (Windows) file. To convert it to a GLIM file (the space-separated design-matrix format described above), run nih_make_glim.pl:

(Note that mni.read.glim.file can also read CSV files directly via its file.type argument, described below, so this conversion step may not always be necessary.)

~/mni-programs/nih_make_glim.pl -native -nlin -fwhm 30 input.csv output.glim

In this command, replace input.csv with the actual name of the file that was created in Excel, and output.glim with the desired name of the output file.

Beginning the Statistical Analysis Using R

Start the R statistical engine by typing R. Once the R shell has started up, you need to load the MNI statistical libraries like mni.cortical.statistics:

library(mni.cortical.statistics)

If you receive an error message saying There is no package called…, you may need to configure your R library path (for example the R_LIBS environment variable, or the lib.loc argument described in the package installation notes above), or simply install the mni.cortical.statistics package.

The input matrix is prepared in a simple text file as described above. It is then loaded with the mni.read.glim.file command:

gf <- mni.read.glim.file("filename.glim", TRUE, TRUE, "csv")

(replacing filename.glim with the real file name, of course). The next step is to build the data table.

There are three optional arguments to mni.read.glim.file, namely header, fill, and file type. The first two are set to FALSE by default; set header=TRUE if the first line of the input file contains a data header, and set fill=TRUE if the input file may contain missing values. If your file is comma rather than space separated, set the last argument to file.type="csv".
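
Assuming the argument names match the description above, the same call can be written with named arguments, which makes the intent clearer:

gf <- mni.read.glim.file("filename.glim", header=TRUE, fill=TRUE, file.type="csv")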

Once you have the input matrix loaded, the individual data elements need to be read into memory. This is done with the following command:

dt <- mni.build.data.table(gf)

The dt variable now has one column for each subject in the GLIM matrix, and one row for each datapoint (usually 40962).
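
As a quick sanity check on the loaded data (vertex 30279 is just an example index, used again further below):

dim(dt)      # number of vertices (usually 40962) by number of subjects
dt[30279,]   # cortical thickness at vertex 30279 for every subject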

Oliver’s way: loop over every vertex, fit a linear model against one of the columns of the GLIM matrix, and keep a summary statistic at each vertex:

out <- rep(0, 40962)   # initialize the output vector, one value per vertex
for (i in 1:40962) {
  out[i] <- summary(lm(dt[i,] ~ gf[,2]))$coefficients[2, 3]   # e.g. keep the t-statistic for the regressor
  if (i %% 1000 == 0) print(i)   # progress report every 1000 vertices
}
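
A quick way to pick out vertices of interest from this vector, for example the ten vertices with the largest absolute statistic (this is just one possible criterion), is:

order(abs(out), decreasing=TRUE)[1:10]   # indices of the ten vertices with the largest absolute values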

Viewing the Results

We now have to display our per-vertex statistics on a surface. Ideally this should be the mid-surface average (mid_surfaces are located in the surfaces folder produced by CIVET) of all of the subjects. We will need to create a left average to compare to our left input model, as well as a right average to compare to our right input model. This can be accomplished by using average_objects.

average_objects output.obj [input1.obj] [input2.obj] …
brain-view [options] left_mid_avg.obj vertex_file.txt
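
The vertex_file.txt passed to brain-view is simply a text file with one value per vertex. Assuming the per-vertex statistics were stored in the vector out as above, it can be written from R with:

write.table(out, "vertex_file.txt", row.names=FALSE, col.names=FALSE)   # one statistic per line, no header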

Once the vertices of interest have been identified, they can be fed back into R to plot a given vertex against whatever we are regressing on (in this case age):

plot(dt[30279,], gf[,5])                # why am I not seeing 0 and 1?
abline(lm(gf[,5] ~ dt[30279,]))
pairs(cbind(dt[30843,], dt[27830,], dt[30279,], gf[,5], gf[,6]))
pairs(cbind(dt[30843,], gf[,5], gf[,4]))

We should try to view this all in 3D using the scatterplot3d library, which we tried to install but could not. If installation into the default library fails, installing into a local directory (as described in the introduction) can be attempted:

install.packages("scatterplot3d")
install.packages("scatterplot3d", lib="./")   # install into the current directory instead
