Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration

Arno Klein, Jesper Andersson, Babak A Ardekani, John Ashburner, Brian Avants, Ming-Chang Chiang, Gary E Christensen, D Louis Collins, James Gee, Pierre Hellier, Joo Hyun Song, Mark Jenkinson, Claude Lepage, Daniel Rueckert, Paul Thompson, Tom Vercauteren, Roger P Woods, J John Mann, Ramin V Parsey

Abstract

All fields of neuroscience that employ brain imaging need to communicate their results with reference to anatomical regions. In particular, comparative morphometry and group analysis of functional and physiological data require coregistration of brains to establish correspondences across brain structures. It is well established that linear registration of one brain to another is inadequate for aligning brain structures, so numerous algorithms have emerged to nonlinearly register brains to one another. This study is the largest evaluation of nonlinear deformation algorithms applied to brain image registration ever conducted. Fourteen algorithms from laboratories around the world are evaluated using 8 different error measures. More than 45,000 registrations between 80 manually labeled brains were performed by algorithms including: AIR, ANIMAL, ART, Diffeomorphic Demons, FNIRT, IRTK, JRD-fluid, ROMEO, SICLE, SyN, and four different SPM5 algorithms ("SPM2-type" and regular Normalization, Unified Segmentation, and the DARTEL Toolbox). All of these registrations were preceded by linear registration between the same image pairs using FLIRT. One of the most significant findings of this study is that the relative performances of the registration methods under comparison appear to be little affected by the choice of subject population, labeling protocol, and type of overlap measure. This is important because it suggests that the findings are generalizable to new subject populations that are labeled or evaluated using different labeling protocols. Furthermore, we ranked the 14 methods according to three completely independent analyses (permutation tests, one-way ANOVA tests, and indifference-zone ranking) and derived three almost identical top rankings of the methods. ART, SyN, IRTK, and SPM's DARTEL Toolbox gave the best results according to overlap and distance measures, with ART and SyN delivering the most consistently high accuracy across subjects and label sets. Updates will be published on the http://www.mindboggle.info/papers/ website.
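As a rough illustration of the permutation-test idea mentioned among the three ranking analyses (not necessarily the authors' exact procedure), a paired sign-flip permutation test comparing two methods' per-brain-pair mean overlaps could be sketched as follows; the argument names and the permutation count are assumptions of this sketch.

```python
import numpy as np

def paired_permutation_test(overlaps_a, overlaps_b, n_perm=10000, rng=None):
    """Two-sided paired permutation test on the mean difference between two
    registration methods' per-pair mean target overlaps.

    overlaps_a, overlaps_b : 1-D arrays with one mean overlap value per
        source/target brain pair, for method A and method B respectively.
    Returns the permutation p-value for the observed mean difference.
    """
    rng = np.random.default_rng(rng)
    diffs = np.asarray(overlaps_a, dtype=float) - np.asarray(overlaps_b, dtype=float)
    observed = diffs.mean()
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis of no systematic difference between the
        # two methods, the sign of each paired difference is exchangeable.
        signs = rng.choice([-1.0, 1.0], size=diffs.size)
        if abs((signs * diffs).mean()) >= abs(observed):
            count += 1
    return count / n_perm
```

Repeating such a comparison over all method pairs is one way a ranking of methods could be built up, under the assumptions stated above.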

Figures

Fig. 1
Brain image data. The study used four different image datasets with a total of 80 brains. The datasets contain different numbers of subjects (n) and different numbers of labeled anatomical regions (r) derived from different labeling protocols: LPBA40 (LONI Probabilistic Brain Atlas: n=40, r=56), IBSR18 (Internet Brain Segmentation Repository: n=18, r=84), CUMC12 (Columbia University Medical Center: n=12, r=128), and MGH10 (Massachusetts General Hospital: n=10, r=74). A sample brain from each dataset is shown. For each brain, there are three columns (left to right): original T1-weighted MRI, extracted brain registered to nonlinear MNI152 space, and manual labels registered to nonlinear MNI152 space (used to extract the brain). Within each column, the three rows (top to bottom) correspond to sagittal (front facing right), horizontal (front facing top, right on right side), and coronal (right on right side) views. The LPBA40 brains had already been extracted and registered to MNI space (MNI305 rather than MNI152) (Shattuck et al., 2008). The scale, position, and contrast of the MR images have been altered for the figure. The colors for the manual labels do not correspond across datasets. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Registration equations. The three stages of the study were to compute, apply, and evaluate registration transforms. To compute the transforms, we linearly registered each source image Is to a target image It (both already in MNI space), resulting in a “linear source image” Is→t as well as a linear transform Xs→t (a straight arrow denotes linear registration). Each nonlinear algorithm Ai then registered (warped) the linear source image to the same target image, generating a second, nonlinear transform X[s→t]⇝t (a curved arrow denotes nonlinear registration). We applied the linear transform to the source labels Ls to give the corresponding “linear source labels” Ls→t, and applied the nonlinear transform to Ls→t to produce the final warped source labels L[s→t]⇝t. Finally, we compared these labels to the manual labels for the target, Lt, using a set of evaluation measures Eq.
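Spelled out in the caption's notation, the three stages can be sketched as follows; writing each transform as an operator applied to an image or label volume is an assumption of this sketch, since the caption only names the quantities.

```latex
\begin{align*}
  \text{compute:} \quad
    & X_{s \to t} : I_s \xrightarrow{\ \text{linear (FLIRT)}\ } I_{s \to t}, \qquad
      X_{[s \to t] \rightsquigarrow t} : I_{s \to t}
        \xrightarrow{\ \text{nonlinear } (A_i)\ } I_{[s \to t] \rightsquigarrow t}, \\
  \text{apply:} \quad
    & L_{s \to t} = X_{s \to t}(L_s), \qquad
      L_{[s \to t] \rightsquigarrow t} = X_{[s \to t] \rightsquigarrow t}(L_{s \to t}), \\
  \text{evaluate:} \quad
    & E_q\bigl(L_{[s \to t] \rightsquigarrow t},\, L_t\bigr)
      \quad \text{for each evaluation measure } E_q .
\end{align*}
```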
Fig. 3
Overview. This diagram provides an overview of the study for a single nonlinear registration algorithm, placing example preprocessed data from Fig. 1 into the equations of Fig. 2. The three stages include linear registration, nonlinear registration, and evaluation (left to right). The four different datasets (LPBA40, IBSR18, CUMC12, and MGH10) are aligned along the left in four different versions: images, surfaces derived from the images, labels, and borders derived from the labels. A source and target are drawn from each version (image volumes are shown as coronal slices for clarity). A source image Is is linearly then nonlinearly registered to a target image It. The linear and nonlinear transforms (Xs→t and X[s→t]⇝t) are applied to the corresponding source labels Ls. The resulting nonlinearly transformed labels L[s→t]⇝t are compared against the target labels Lt. This comparison is used to calculate volume overlap and volume similarity per region. The target surface St is intersected with the target labels Lt and warped source labels L[s→t]⇝t to calculate surface overlap. Borders between each labeled region and all adjacent labeled regions are constructed from Lt and L[s→t]⇝t, and average distances between the resulting borders Bt and B[s→t]⇝t are calculated per region.
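The per-region border distances in the last step can be sketched with a Euclidean distance transform. This is one plausible implementation (using scipy), not necessarily the one used in the study; the function names are illustrative.

```python
import numpy as np
from scipy import ndimage

def region_border(mask):
    """Border voxels of a binary region mask: voxels that disappear
    under a one-voxel erosion of the mask."""
    mask = np.asarray(mask, dtype=bool)
    return mask & ~ndimage.binary_erosion(mask)

def mean_border_distance(border_warped, border_target, spacing=(1.0, 1.0, 1.0)):
    """Mean Euclidean distance from warped-source border voxels
    (B_[s->t]~>t) to the nearest target border voxel (B_t) for one region.

    border_warped, border_target : 3-D boolean border masks.
    spacing : voxel size along each axis, so the result is in the same
        physical units (e.g. millimeters).
    """
    bw = np.asarray(border_warped, dtype=bool)
    bt = np.asarray(border_target, dtype=bool)
    # Distance from every voxel in the volume to the nearest target-border voxel.
    dist_to_target = ndimage.distance_transform_edt(~bt, sampling=spacing)
    # Average that distance over the warped-source border voxels.
    return dist_to_target[bw].mean()
```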
Fig. 4
Overlap. This study uses volume and surface overlap, volume similarity, and distance measures to evaluate the accuracy of registrations. The equations for the three overlap measures (target overlap, mean overlap, and union overlap) use the terms in this schematic Venn diagram of two partially overlapping objects, a source S and a target T. Their intersection is denoted by S∩T and their union by S∪T. S|T denotes the set-theoretic complement: the elements in S but not in T.
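In terms of this diagram, the three overlap measures reduce to set-size ratios: target overlap = |S∩T|/|T|, mean overlap = 2|S∩T|/(|S|+|T|) (the Dice coefficient), and union overlap = |S∩T|/|S∪T| (the Jaccard coefficient). A minimal sketch computing them from boolean voxel masks, with illustrative function and variable names:

```python
import numpy as np

def overlap_measures(source_mask, target_mask):
    """Volume overlap measures for one labeled region, computed from
    boolean voxel masks of the warped source (S) and target (T) labels."""
    S = np.asarray(source_mask, dtype=bool)
    T = np.asarray(target_mask, dtype=bool)
    intersection = np.count_nonzero(S & T)          # |S ∩ T|
    union = np.count_nonzero(S | T)                 # |S ∪ T|
    size_s = np.count_nonzero(S)                    # |S|
    size_t = np.count_nonzero(T)                    # |T|
    return {
        "target_overlap": intersection / size_t,            # |S∩T| / |T|
        "mean_overlap": 2 * intersection / (size_s + size_t),  # Dice
        "union_overlap": intersection / union,              # Jaccard
    }
```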
Fig. 5
Overlap by registration method. These box and whisker plots show the target overlap measures between deformed source and target label volumes averaged first across all of the regions in each label set (LPBA40, IBSR18, CUMC12, and MGH10) then across brain pairs. Each box represents values obtained by a registration method and has lines at the lower quartile, median, and upper quartile values; whiskers extend from each end of the box to the most extreme values within 1.5 times the interquartile range from the box. Outliers (+) have values beyond the ends of the whiskers. Target, union and mean overlap measures for volumes and surfaces (and the inverse of their false positive and false negative values) all produced results that are almost identical if corrected for baseline discrepancies. Similarities between relative performances of the different registration methods can even be seen here across the label sets. (SPM_N*=“SPM2-type” normalization, SPM_N=SPM's Normalize, SPM_US=Unified Segmentation, SPM_D=DARTEL pairwise).
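The whisker convention described here (quartiles, whiskers at 1.5 times the interquartile range, “+” for outliers) matches the default behavior of matplotlib's boxplot; a minimal sketch of producing such a plot from placeholder data, averaging across regions before plotting the per-pair values, follows. The array shape and values are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Placeholder data: target overlaps with shape (brain pairs, regions, methods).
overlaps = rng.uniform(0.4, 0.8, size=(132, 56, 14))

per_pair = overlaps.mean(axis=1)            # average across regions first, per pair
fig, ax = plt.subplots(figsize=(9, 3))
ax.boxplot(per_pair, whis=1.5, sym="+")     # whiskers at 1.5 x IQR, '+' for outliers
ax.set_xlabel("registration method")
ax.set_ylabel("target overlap")
plt.tight_layout()
plt.show()
```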
Fig. 6
Volume and surface overlap by registration method: LPBA40 regions. These brain images show the mean target overlap calculated across all 1,560 brain pairs for the (A) volume and (B) surface of each LPBA40 region, and depict that mean as a color (blue indicates higher accuracy). The values for each registration method are projected on one of the LPBA40 brains, seen from the left, looking down from 30°, with the frontal pole facing left. (SPM_N*=“SPM2-type” Normalize, SPM_N=Normalize, SPM_US=Unified Segmentation, SPM_D=DARTEL pairwise).
Fig. 7
Indifference-zone ranking of the registration methods: LPBA40 overlaps. This matrix uses a color scale that reflects the relative performance of the registration methods (with blue indicating higher accuracy). Each colored rectangle represents the average score for a given method for a given region, averaged over 1,560 LPBA40 registrations. The scores are {−1,0,1} values indicating the pairwise performance of the method relative to each of the other methods (see text), according to target volume overlap (union and mean overlap results are almost identical). The colors (and color range) are not comparable to those of the other label sets (Figs. 8, 9, and 10 in Supplementary section 3). (SPM_N*=“SPM2-type” Normalize, SPM_N=Normalize, SPM_US=Unified Segmentation, SPM_D=DARTEL pairwise).
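A minimal sketch of the pairwise {−1,0,1} scoring, assuming a simple indifference margin on mean overlap as a stand-in for the paper's indifference-zone criterion; the margin value and function name are assumptions of this sketch.

```python
import numpy as np

def pairwise_scores(region_means, indifference=0.01):
    """Score each method against every other method for one region.

    region_means : 1-D array, mean target overlap per method for one region.
    indifference : hypothetical margin below which two methods are tied (0).
    Returns an (n_methods, n_methods) matrix; row i holds method i's
    {-1, 0, +1} score against each other method.
    """
    m = np.asarray(region_means, dtype=float)
    diff = m[:, None] - m[None, :]          # pairwise differences in mean overlap
    scores = np.sign(diff)                  # +1 better, -1 worse
    scores[np.abs(diff) < indifference] = 0.0   # within the indifference zone: tie
    np.fill_diagonal(scores, 0.0)
    return scores
```

Averaging each row (scores.mean(axis=1)) would then give a per-region average score of the kind a colored rectangle in the matrix could represent, under these assumptions.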

Source: PubMed
