What if Gilbert is Right?

I. The Story Until Now (For late arrivals to the party)
Over the decades, since about 1970, social psychologists conducted lots of studies, some of which found cute, counter-intuitive effects that gained great attention. After years of private rumblings that many of these studies – especially some of the cutest ones – couldn’t be replicated, a crisis suddenly broke out into the open (1). Failures to replicate famous and even beloved findings began to publicly appear, become well known, and be thoroughly argued-over, not always in the most civil of terms. The “replicability crisis” became a thing.
But how bad was the crisis really? The accumulation of anecdotal stories and one-off failures to replicate was perhaps clarified to some extent by a major project organized by the Center for Open Science (COS), published last August, in which labs around the world tried to replicate 100 studies and, depending on your definition, “replicated” only 36% of them (2).
In the face of all this, some optimists argued that social psychology shouldn’t really feel so bad, because failed replicators might simply be incompetent, if not actually motivated to fail, and the typical cute, counter-intuitive effect is a delicate flower that can only bloom under the most ideal climate and careful cultivation. Optimists of a different variety (including myself) also pointed out that psychology shouldn’t feel so bad, but for a different reason: problems of replicability are far from unique to our field. Failures to reproduce key findings have come to be seen as serious problems within biology, biochemistry, cardiac medicine, and even – and disturbingly – cancer research. It was widely reported that the massive biotech company Amgen was unable to replicate 47 out of 53 seemingly promising cancer biology studies. If we have a problem, we are far from alone.

II. And Then Came Last Friday’s News (3)
Prominent psychology professors Daniel Gilbert and Tim Wilson published an article that “overturned” (4) the epic COS study. Specifically, their reanalysis concluded not only that the study failed to show persuasively that most of the studies it addressed couldn’t be replicated, but also that its data were actually consistent with the possibility that all of the studies were replicable! The article was widely reported not just in press releases but in outlets including the Washington Post, Wired, the Atlantic online, and the Christian Science Monitor, to name just a few.
Psychologists who had been skeptical of the “replication movement” all along – come on, we know who you are – quickly tweeted, Facebooked and otherwise cheered the happy news. Some even began to wonder out loud whether “draconian” journal reforms adopted to enhance replicability could now be repealed. At the same time, and almost as quickly, members of the aforesaid replication movement – come on, we know who you are too (5) – took close looks at the claims by Gilbert and Co., and within 48 hours a remarkable number of blogs and posts (6) began to refute their statistical approach and challenge the way they summarized some of the purported flaws of the replication studies. I confess I found most of these responses pretty persuasive, but that’s not my point for today. Instead my point is:

III. What if Gilbert is Right?
Let’s stipulate, for the moment, that Gilbert and Co. are correct that the COS project told us nothing worth knowing about the replicability of social psychological research. What then?

IV. The COS Study Is Not the Only, and Was Far From the First, Sign that We Have A Problem.
One point I have seen mentioned elsewhere – and I’ll repeat it here because it’s a good point – is that the COS project was far from being the only evidence that social psychology has a replicability problem. In fact, it came after, not before, widespread worry had been instigated by a series of serious and compelling failures to reproduce very prominent studies, and many personal reports of research careers delayed if not derailed by the attempt to follow up on lines of research that only certain members of the in-crowd knew were dead ends. As this state of affairs became more public over the past couple of years, the stigma of failing to replicate some famous psychologist’s famous finding began (not entirely!!) to fall away, and a more balanced representation of what the data really show, on all sorts of topics, began to accumulate in public file drawers, data repositories, and outlets for replication studies. The COS study, whatever its merits, came on top of all that, not as its foundation.

V. Other Fields Have Replicability Problems Too
A point I haven’t, in this context, seen mentioned yet – and my real motivation for writing this post – is that – remember! – the replication crisis was never exclusive to psychology in the first place. It has affected many other fields of research as well. So, if Gilbert & Co. are right, are we to take it that the concerns in our sister sciences are also overblown? For example, was Amgen wrong? Were all those cancer biology studies perfectly replicable after all? Do biochemistry, molecular biology, and the medical research community share social psychology’s blight of uncreative, incompetent, shameless little bullies aiming to pull down the best research in their respective fields?
Well, maybe so. But I doubt it. It seems extremely unlikely that the kinds of complaints issued against the studies that failed to replicate psychological findings apply in the same way in these other fields. It seems doubtful that problems in these other fields stem from geographical or temporal differences in social norms, unique aspects of student demographics, changes in wordings of scale items, exact demeanor of research assistants, or other factors of the sort pointed out by Gilbert & Co. as bedeviling attempts to replicate psychological findings. I also have no reason to think that molecular biology is full of shameless little bullies, but I stand ready to be corrected on that point.

VI. The Ultimate Source of Unreliable Scientific Research
So let’s go back to where some of us were before the COS study, when we pointed out that social psychology is not alone in having replication problems. What did this fact imply? Just this: The causes of a scientific literature full of studies that can’t be replicated are not specific to social psychology. The causes are both deeper and broader. They are deeper because they don’t concern concrete details of particular studies, or even properties of particular areas of research. They are broader because they affect all of science.
And the causes are not hard to see. Among them are:
1. An oversaturated talent market full of smart, motivated people anxious to get, or keep, an academic job.
2. A publication system in which the journals that can best get you a job, earn you tenure, or make you a star, are (or until recently have been) edited with standards such as the “JPSP threshold” (of novelty), and the explicit (former) disdain of Psychological Science for mere “bricks in the wall” that represent solid, incrementally useful, but insufficiently “groundbreaking” findings. I have been told that the same kinds of criteria have long prevailed in major journals in other fields of science as well. And of course we all know what kind of article is required to make it into Science.
3. And, even in so-called lesser journals, an insistence on significant findings as a criterion for publication, and a strong preference for reports of perfect, elegant series of studies without a single puzzling data point to be seen. “Messy” studies are left to work their way down the publication food chain, or to never appear at all.
4. An academic star system that radically, disproportionately rewards researchers whose flashy findings get widespread attention not just in our “best” journals but even in the popular media. The rewards can include jobs in the most prestigious universities, endowed chairs, distinguished scholar awards, TED talks, and even (presumably lucrative) appearances in television commercials! (7)

It is these factors that are, in my opinion, both the ultimate sources of our problem and the best targets for reforming and improving not just psychology, but scientific research in all fields. And, to end on an optimistic note, I think I see signs that useful reforms are happening. People aren’t quite as enthusiastic about cute, counter-intuitive findings as they used to be. Hiring committees are starting to wonder what it really means when a vita shows 40 articles published in 5 years, all of which have perfect patterns of results. Researchers are occasionally openly responding – and getting publicly praised for openly responding – rather than defensively reacting, to questions about their work. (8)

VII. Summary and Moral
The replicability crisis is not just an issue for social psychology, and its causes aren’t unique to social psychology either. Claims that we don’t have a problem, because of various factors that are themselves unique to social psychology, fail to explain why so many other fields have similar concerns. The essential causes of the replicability crisis are cultural and institutional, and transcend specific fields of research. The remedies are too.

(1) The catalyst for this sudden attention appears to have been the nearly simultaneous appearance in JPSP of a study reporting evidence for precognition, and the exposure of massive data fraud by a prominent Dutch social psychologist. While these two cases were unrelated to each other and each exceptional by any standard, together they illuminated the fallibility of peer review and the self-correcting processes of science that were supposed to safeguard against accepting unreliable findings.
(2) Or 47%, or 39% or 68%, again, depending on your definition.
(3) Or a bit earlier, because Science magazine’s embargo was remarkably leaky, beginning with a Harvard press release issued several days before the article it promoted.
(4) To quote their press release; the word does not appear in their article.
(5) Full disclosure. This probably includes me, but I didn’t write a blog about it (until just now).
(6) A few: Dorothy Bishop, Andrew Gelman, Daniel Lakens, Uri Simonsohn, Sanjay Srivastava, Simine Vazire.
(7) I strongly recommend reading Diederik Stapel’s vivid account (generously translated by Nick Brown) of how desperately he craved becoming one of these stars, and what this craving motivated him to do.
(8) Admittedly, defensive reactions, amplified in some cases by fan clubs, are still much more common. But I’m looking for positive signs here, and I think I see a few.