On NPR last week, I heard about an interesting pair of articles published in May dealing with the limitations of fMRI studies in brain research.
This type of brain imaging is widely used in neuroscience, and autism research is no exception. The past decade or so has brought us lots of different hypotheses about how autistic brains work: probably the most extensively studied one I can think of is the observation that, when they're shown images of human faces, autistic people tend to activate a different part of their brains* than non-autistic people do. Functional MRI studies have also yielded a jumble of tentative observations about differences between autistic and neurotypical brain anatomy: this or that structure appears bigger or smaller on average in this or that series of brain scans.
What the authors of the second of the two studies (full text here) I linked in my first paragraph --- Edward Vul, Christine Harris, Piotr Winkielman**, and Harold Pashler --- noticed about many fMRI studies of emotion, personality and social cognition was the phenomenally high correlations between activity observed in a certain brain region in response to a given stimulus (say, images of happy, angry or frightened faces, or recordings of angry speech, or a semi-scripted interaction meant to make the subject feel lonely or rejected) and individual personality traits (empathy, say, or extraversion or anxiety).
They noticed that the correlations reported in many of these studies (often above 0.8, where 1.0 would be a perfect correlation) were higher than they could be given the reliability of the tools used to measure each variable:
This, then, is the puzzle. Measures of personality and emotion evidently do not often have reliabilities greater than .8. Neuroimaging measures seem typically to be reliable at .7 or less. If we assume that a neuroimaging study is performed in a case where the underlying correlation between activation in the brain area and the individual difference measure (i.e., the correlation that would be observed if there were no measurement error) is perfect, then the highest expected correlation would be √(.8 x .7), or .74. Surprisingly, correlations exceeding this upper bound are often reported in recent fMRI studies on emotion, personality, and social cognition.
To solve this mystery, Vul et al. surveyed the authors of fifty-five recent social-neuroscience articles describing fMRI studies, asking them exactly how they settled on which values to use (out of the tens or hundreds of thousands of individual data points, or "voxels," making up each image!) in their calculations.
In the articles we are focusing on here, the final result, as we have seen, was always a correlation value --- a correlation between each person's score on some behavioral measure and some summary statistic of their brain activation. The latter summary statistic reflects the activation or activation contrast within a certain set of voxels. ... [V]oxels may be selected based on anatomical criteria [i.e., those roughly corresponding to the targeted brain structure in spatial terms], functional criteria [i.e., those determined to show activity in response to relevant but not irrelevant stimuli], or both. Within those broad options, there are a number of additional more fine-grained choices. It is hardly surprising, then, that brief method sections rarely suffice to describe how the analyses were done in adequate detail to really understand what choices were being made.
While there was a lot of diversity in the approaches respondents employed, and the distribution across the different approaches was fairly even, there was one fairly important trend that emerged.
In our survey, we first inquired whether the fMRI signal measure that was correlated across subjects with a behavioral measure represented the average of some number of voxels or the activity from just one voxel that was deemed most informative (referred to as the peak voxel).
If it was the average of some number of voxels, we asked whether the voxels were selected on the basis of anatomy, or activation seen in those voxels, or both. If activation was used to select voxels, or if one voxel was determined to be most informative based on its activation, we asked what measure of activation was used. Was it the difference in activation between two task conditions computed on individual subjects, or was it a measure of how this task contrast correlated with the individual difference measure? Finally, if functional data were used to select the voxels, we asked if the same functional data were used to compute the reported correlation.
First, to lead into what this trend was, I'd like to point out the two places where the distribution was not even. When the average of a group of voxels was used, those voxels were much more often selected using functional criteria than not: the functional-only and mixed functional-anatomical approaches accounted for more than three-quarters of the articles (23 of 30, as opposed to only 7 studies using only anatomical criteria). And every study that used functional criteria to identify voxels of interest then re-used the same data they had used to select the voxels as their output measure for correlating with the behavioral data.
If your Circular-Reasoning Alarm is starting to sound, you're not alone:
The key [to explaining the "implausibly high" correlations often reported in fMRI studies] ... lies in the 53% of respondents who said that "regression across subjects" was the functional constraint used to select voxels, indicating that voxels were selected because they correlated highly with the behavioral measure of interest.
In other words, because the pool of available data points is so vast, patterns will crop up wherever you choose to look for them. This is what the study's authors term "non-independence error": using the same functional measures for data analysis that you've already used to select your data set.
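A toy simulation can make the non-independence error concrete. The sketch below is my own illustration, not anything from Vul et al.: the subject count, voxel count, threshold, and function names are all made-up assumptions, and every "voxel" is pure random noise with no real relationship to the behavioral score. It picks the voxels that happen to correlate with behavior, then measures the correlation twice: once on the same data used for selection, and once on a fresh, independent set of noise for those same voxels.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_subjects=16, n_voxels=5000, threshold=0.6, seed=0):
    """Hypothetical demo: select noise voxels by correlation, then re-measure.

    All parameter values are illustrative assumptions, not taken from any study.
    """
    rng = random.Random(seed)
    # One behavioral score per subject, and pure-noise activity for each voxel.
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    voxels = [[rng.gauss(0, 1) for _ in range(n_subjects)]
              for _ in range(n_voxels)]

    # Step 1: one across-subject correlation per voxel.
    # Step 2: keep only the voxels whose correlation exceeds the threshold.
    selected = [v for v in voxels if pearson(v, behavior) > threshold]
    if not selected:
        raise ValueError("no voxel passed the threshold; try more voxels")

    def mean_signal(voxel_set):
        # Average the chosen voxels within each subject.
        return [sum(v[i] for v in voxel_set) / len(voxel_set)
                for i in range(n_subjects)]

    # Non-independent analysis: report the correlation on the selection data.
    circular_r = pearson(mean_signal(selected), behavior)

    # Independent analysis: the same voxels measured again. Because they are
    # noise, a second measurement is simply fresh noise.
    fresh = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in selected]
    independent_r = pearson(mean_signal(fresh), behavior)

    return len(selected), circular_r, independent_r


n_selected, circular_r, independent_r = simulate()
print(f"voxels selected: {n_selected}")
print(f"correlation on selection data: {circular_r:.2f}")
print(f"correlation on independent data: {independent_r:.2f}")
```

The correlation computed on the selection data has to exceed the threshold, since that is how the voxels were chosen, even though nothing real is being measured. The correlation on independent data should scatter around zero, though with only 16 simulated subjects that scatter is fairly wide.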
Figure 3 shows very concretely the sequence of steps that these respondents reported following when analyzing their data. A separate correlation across subjects was performed for each voxel within a specific brain region. Each correlation relates some measure of brain activity in that voxel (which might be a difference between responses in two tasks or in two conditions) with the behavioral measure for that individual. Thus, the number of correlations computed was equal to the number of voxels, meaning that thousands of correlations were computed in some cases. At the next stage, researchers selected the set of voxels for which this correlation exceeded a certain threshold, and reported the correlation within this set of voxels.
This graph shows all the studies Vul et al. reviewed; as you can see, the correlations each study unearthed range from about 0.25 to 1.0. The red squares represent correlations derived using non-independent analyses, and the green ones represent independent analyses. The red ones tend to have much higher values than the green ones: most of the green ones are clustered between 0.5 and 0.65, well below the "upper bound" of 0.74 cited earlier.
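For anyone who wants to check where that 0.74 ceiling comes from, here is a minimal sketch of the arithmetic in the quoted passage: the largest correlation you can expect to observe is the true correlation multiplied by the square root of the product of the two measures' reliabilities. The function name is my own invention, and the reliabilities (.8 and .7) are the round figures the authors use, not measured values.

```python
import math


def max_observable_correlation(reliability_x, reliability_y, true_correlation=1.0):
    """Expected observed correlation given each measure's reliability.

    Follows the formula in the quoted passage: true r * sqrt(rel_x * rel_y).
    """
    return true_correlation * math.sqrt(reliability_x * reliability_y)


# Personality measure reliability of .8, neuroimaging reliability of .7,
# and a (generously) perfect underlying correlation.
bound = max_observable_correlation(0.8, 0.7)
print(f"{bound:.3f}")  # 0.748
```

The result is a little under 0.75, which the paper gives as .74. Any underlying correlation short of perfect lowers the ceiling further.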
Further reading: Mind Hacks, the Neurocritic, and the Neuroskeptic all posted on this a while ago; also worth reading are these two posts by Andrew Gelman at Statistical Modeling, Causal Inference, and Social Science, this post at The Amazing World of Psychiatry, and two articles in response to Vul et al., one also published in Perspectives on Psychological Science and the other posted online as a draft.
*This would be the "fusiform face area," the existence of which was proposed in 1997 by Nancy Kanwisher and her colleagues. Kanwisher, interestingly enough, seems to be working with Ed Vul --- she co-wrote this longer article about non-independence error with him. Sadly, none of the articles Vul et al. evaluated dealt with the fusiform gyrus, face recognition, or autism, so I can't draw any conclusions about the robustness of that theory. Nor can I (right now, anyway) apply this article's findings more directly to the neuroscience of autism.
Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition. Perspectives on Psychological Science, 4(3), 274-290. DOI: 10.1111/j.1745-6924.2009.01125.x