Having taken some other IATs before, I was actually pretty surprised that the presidential one matched so well. (I've taken the race IAT and the sexuality IAT, thereby discovering that I apparently favor gay white people). I had been (and continue to be) really skeptical that the IAT actually measures internal biases at all, rather than being an indicator of reaction time, mental flexibility or some other trait more closely related to punching keys in response to visual stimuli.
This time, rather than sit around and grumble, I actually hunted around for articles on the IAT, to see if any psych researchers had leveled the same criticisms at it that I did. Sure enough, I found such an article, courtesy of Mixing Memory, who has this to say about IATs in general (italics mine):
In my mind, giving the IAT so much publicity is the most irresponsible thing I've seen in psychology since I began studying it, short of testifying in court that there is scientific verification of the existence of recovered memories (the IAT, at least, has not ruined anyone's life). While the IAT has been publicized (by its authors!) as a measure of implicit attitudes, and even more, as a measure of implicit prejudice, there is no real evidence that it measures attitudes, much less prejudices. In fact, it's not at all clear what it measures, though the fact that its psychometric properties are pretty well defined at least implies that it measures something.
It isn't just guys on the Internet saying this, either. This presentation was given to a 2001 meeting of the Society for Experimental Social Psychology in Spokane, Washington by Dr. Anthony Greenwald, who has published many articles about IATs. The presentation lists the ten biggest problems with the IAT, split into six "measurement" problems and four "conceptual" problems, which are presumably more fundamental. Sure enough, what should #10 be but "Order of combined tasks influences the measure"? (I definitely noticed, while taking the race IAT, and to a lesser extent the presidential one, that my fingers had greater difficulty hitting the right buttons early in the test; by the end, I was pretty decent at it. So if I had been randomly assigned, say, black faces and "good" words, white faces and "bad" words, black faces and "bad" words and then white faces and "good" words, in that order, my results would show up as being strongly white-supremacist, when really I had just taken longer than average to accustom myself to the tasks.)
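To make that order worry concrete, here is a toy sketch in Python. Everything in it is an assumption of mine rather than anything from Greenwald's presentation: the function names, the 700 ms base time, and the practice penalty that halves with each block are all made up. The simulated test-taker has no bias at all and just gets faster with practice.

```python
def practice_rt(block_index, base_ms=700.0, penalty_ms=300.0, decay=0.5):
    """Hypothetical response time for one block: a fixed base plus a
    practice penalty that halves with every block already completed."""
    return base_ms + penalty_ms * decay ** block_index


def difference_score(order):
    """order lists one label per block ('conflict' or 'confirm') in the
    order the blocks are presented. Returns the mean 'confirm' time
    minus the mean 'conflict' time, in milliseconds."""
    rts = {"conflict": [], "confirm": []}
    for block_index, kind in enumerate(order):
        rts[kind].append(practice_rt(block_index))
    confirm_mean = sum(rts["confirm"]) / len(rts["confirm"])
    conflict_mean = sum(rts["conflict"]) / len(rts["conflict"])
    return confirm_mean - conflict_mean


# Same unbiased test-taker, two different block orders.
conflict_first = difference_score(["conflict", "conflict", "confirm", "confirm"])
confirm_first = difference_score(["confirm", "confirm", "conflict", "conflict"])
print(conflict_first, confirm_first)
```

In this toy model the two orders give scores of the same size and opposite sign, even though the only thing that changed was which blocks came first.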
Other effects I wondered about that were addressed in this presentation were #8, "IAT effects are reduced with repeated administrations" (see my progression from IAT results that surprised me, in the race IAT, to ones congruent with my conscious preferences, in the presidential IAT), #5, "IAT measures are influenced by measurement context variables", #4, "IAT appears to be slightly fakeable" (Neurocritic claims on his blog to have faked the outcome of at least one IAT, and links to this 2007 study, which finds that IAT takers are able to skew their own results when bidden to do so by experimenters), and #1, "How the IAT measures association strengths is not yet well understood."
The American Psychologist article I linked above makes a point strikingly similar to the one Stephen Jay Gould makes about IQ in The Mismeasure of Man, when he accuses IQ-test proponents of "reifying" the average score on a battery of cognitive tests as "intelligence," when it has never been demonstrated that this average (g) actually describes or predicts anything beyond performance on IQ tests. The article finds a similar reification going on with the scale of implicit bias used in IAT results. What the IAT actually measures is response times for various pairs of words or images; it calculates bias by subtracting the average response time for pairings that would conflict with the bias being tested (say, "black" and "good" in the race IAT) from the average for pairings that would confirm it ("white" and "good" or "black" and "bad"), with a nonzero result indicating bias in one direction or the other. The problem, the article's authors say, is that no one has tried to control for the emotional strength of the words or images. (Indeed, Greenwald's presentation admits that IAT effects are smaller for images than for words, indicating some difference in how the stimuli are processed.)
A zero IAT score might therefore not actually mean no biases exist, if some of the stimuli are stronger than others and the distribution of more emotionally affecting images favors one side or another. A person might go in with a bias in one direction, but if the images he's confronted with push his emotions the opposite way, he might come out entirely neutral, when really that's the one thing he's never been.
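Here is the arithmetic as I understand it, as a minimal Python sketch. It follows only the subtraction described above; the real scoring procedure is likely more involved. The function name `iat_score`, the response times, and the 100 ms "stimulus strength" offset are all invented for illustration.

```python
from statistics import mean


def iat_score(confirm_rts, conflict_rts):
    """Mean response time (ms) for bias-confirming pairings minus the
    mean for bias-conflicting pairings. Nonzero means apparent bias."""
    return mean(confirm_rts) - mean(conflict_rts)


# Hypothetical test-taker who really is 100 ms quicker on the
# pairings that confirm the bias being tested.
confirm = [600, 620, 640]
conflict = [700, 720, 740]
biased_score = iat_score(confirm, conflict)

# Now assume the stimuli in the confirming pairings happen to be
# emotionally stronger in the opposite direction, slowing every one
# of those responses by 100 ms (an invented figure).
confirm_skewed = [rt + 100 for rt in confirm]
masked_score = iat_score(confirm_skewed, conflict)

print(biased_score, masked_score)
```

With these made-up numbers the first score is nonzero and the second comes out to exactly zero, although the underlying bias never changed; only the stimuli did.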
On the presidential-IAT discussion at Pandagon, some of the commenters wondered if the pictures of the candidates were chosen with equal attention to how flattering they were. Each candidate had four photos that flashed onscreen, taken over what seemed to be a broad span of time. Some of them looked more flattering and polished than others, but it seemed to me at least that each candidate had one good picture, one bad one and two middling ones. But how good someone looks in a photograph is a highly subjective quality; some commenters felt that certain candidates were given only unflattering photos (Clinton and Obama were mentioned, although McCain was also singled out for having the single ugliest picture).
It's almost as though the IAT were created in a drunken game of "Let's throw as many confounding variables in here as we possibly can"!