Speaking of replication…

A small conference sponsored by the European Association of Personality, held in Trieste, Italy last summer, addressed the issue of replicability in psychological research.  Those discussions led to an article describing recommended best practices, now “in press” at the European Journal of Personality.  You can see the article if you click here.

Update November 8: Courtesy of Brent Roberts, the contents of the special issue of Perspectives on Psychological Science on replicability are available.  To go to his blog post, with links, click here.

On inference (updated x 2)

At a conference I attended last month, I heard for the first time about an Oxford philosopher who, according to his fellow philosophers, has pretty much proved that we live inside of a computer simulation. I’ll take the philosophers’ word for it when they say the inferential logic appears to be impeccable.

Which brings me to draw the following lesson: Any system of rigid (or automatic) inferential rules, followed out on a long enough chain, will eventually lead to an absurd conclusion. (If someone else has already coined this principle as a proverb, or something, I’d love to hear about it.)

For example, consider rigid applications of constitutional law. The Second Amendment of the US Constitution says that the right to bear arms shall not be infringed. Therefore, as an American citizen, I cannot be prohibited from owning a pistol, an assault rifle or (why not?) a nuclear bomb. The logic is fine; the conclusion is ridiculous.

The vulnerability of automatic systems to absurd outcomes is one reason I dislike the term “inferential statistics.” There is really no such thing. All statistics are descriptive. Some describe the probability of a result under the null, which is not uninteresting. But this calculation can’t do your inference for you, as brainlessly comforting as that would be. Instead, you need to think about the actual result. You should consider, for example, its a priori plausibility, its theoretical context, and its consistency with other known facts, not to mention its replicability. You might even add a dollop of – dare I say it? – common sense.
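The point that “all statistics are descriptive” can be made concrete. A p-value is just a number you compute from the data; here is a minimal sketch, using invented data (the two samples below are purely hypothetical), of a permutation test that *describes* how often a difference at least as large as the observed one arises when the group labels are shuffled at random:

```python
import random
import statistics

random.seed(0)

# Two small made-up samples (hypothetical data, purely for illustration).
group_a = [5.1, 4.8, 6.2, 5.9, 5.4, 6.0]
group_b = [4.2, 4.9, 4.5, 5.0, 4.1, 4.6]
observed = statistics.mean(group_a) - statistics.mean(group_b)

# "The probability of a result under the null," computed descriptively:
# shuffle the group labels many times and count how often a difference
# at least this large appears by chance alone.
pooled = group_a + group_b
n_extreme = 0
n_trials = 10_000
for _ in range(n_trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:6]) - statistics.mean(pooled[6:])
    if abs(diff) >= abs(observed):
        n_extreme += 1

p_value = n_extreme / n_trials
print(f"observed difference: {observed:.2f}, p = {p_value:.4f}")
```

The resulting number describes the data under one hypothetical scenario; deciding what the difference *means* — its plausibility, context, and replicability — is still up to you.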

Of course, your inference might be wrong. That’s the thing about inferences. But a system of rules that tries to make your inferences for you (especially if the rules include arbitrary standards like the .05 threshold) risks drawing conclusions that are out-and-out absurd.
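To see why an arbitrary threshold like .05 invites absurdity, consider a toy illustration (the p-values below are invented): two studies whose evidence is nearly identical get opposite verdicts from a rigid rule.

```python
def automatic_verdict(p, alpha=0.05):
    # A rigid inferential rule: "significant" on one side of an
    # arbitrary line, "nothing there" on the other.
    return "effect exists" if p < alpha else "no effect"

# Two hypothetical studies with nearly identical evidence.
print(automatic_verdict(0.049))  # effect exists
print(automatic_verdict(0.051))  # no effect
```

The two p-values differ by .002, yet the rule, applied automatically, declares one result real and the other nonexistent — exactly the kind of conclusion a human inference-maker would want to second-guess.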

Update November 10, 2012: On reading post-election commentary I’m realizing that what I said above has some problems, or is at best incomplete.  Right before the election a battle raged on the internet between Nate Silver, of the incomparable blog fivethirtyeight.com, and various pundits — most but not all of whom were conservative.  Silver earned the pundits’ ire by using a sophisticated statistical predicting model that produced a high-confidence prediction of an Obama victory, whereas the pundits “knew in their guts” or from “years of experience” that Romney was going to win — in a landslide, some even said.  Of course we know now that Silver was right and the pundits wrong.  This outcome is a clear victory for statistical prediction over what the pundits surely would have been willing to characterize as “a priori plausibility…theoretical context… consistency with other known facts…[and] common sense” (see above).

So where does that leave my little aphorism and its supposed implications?  In an uncomfortable spot, that’s where. Common sense and “gut reactions” (about which Gerd Gigerenzer has written an entire, brilliant book) remain indispensable, especially in situations where the data to calculate a Silver-ish model aren’t available, which is probably most situations in real life.  But relying on common sense and the gut also can make one’s conclusions vulnerable to wishful thinking, which seems to be what happened in the case of the election pundits.  When you have a wealth of relevant data and a model, based on past experience and reasonable theorizing, for combining them into a prediction, then you probably do want to rely on the model and not on so-called common sense.

However: Note the word “probably” in the preceding sentence.  Even Nate Silver’s final prediction was only issued with 91% confidence.  Maybe the remaining 9% is where common sense saves itself.  I still don’t like the term “inferential statistics” and the arbitrary .05 threshold for deciding whether something is true (which Silver doesn’t use, by the way; he always reports exact probabilities).  And, I still don’t think we live inside a computer simulation.  Who would program a universe like this?

Update November 14, 2012: It turns out that of all the pundits and prognosticators for the 2012 presidential election, only three got perfect scores on predicting the electoral college.  Two of them were statistical modelers, the previously mentioned Nate Silver (of fivethirtyeight.com) and a professor at Emory University named Drew Linzer.  But the one with the very best predictive record — with a near perfect estimate of the margin of victory or defeat in each of the swing states — was Markos Moulitsas, proprietor of the liberal blog dailykos. What was the edge he had over the statisticians?  To quote his own answer:

 All three of us used data to arrive at our conclusions. The difference between them and me? They were wedded to their algorithmic and automatic models, but my model is manual, allowing me the freedom to evaluate each piece of data on its merits and separate the wheat from the chaff, while mixing in early vote performance to further refine my calls.

That’s the point I’ve been struggling to make, above.  We can and should be informed by our statistical calculations — and not simply ignore them, as a startling number of pundits did — but we still have to take responsibility for the conclusions we draw.  Sometimes human judgment adds value, and every once in a great while it can save us from fatal error.  Or just silly conclusions.  I still don’t believe we live inside a computer simulation.