Why doesn’t personality psychology have a replication crisis?

Because It’s Boring

“[Personality psychology] has reduced the chances of being wrong but palpably increased the fact of being boring. In making that transition, personality psychology became more accurate but less broadly interesting.”  — Roy Baumeister (2016, p. 6)

Many fields of research – not just social psychology but also biomedicine, cancer biology, economics, political science, and even physics – are experiencing crises of replicability. Recent and classic results are challenged by reports that when new investigators try to repeat them, often they simply can’t. This fact has led to gnashing of teeth and rending of garments, not to mention back-and-forth controversies pitting creativity against rigor (see the article quoted in the epigraph), and has spawned memorable phrases such as “replication police” and “shameless little bullies.”

But, as the quote above attests, personality psychology seems to be immune.  In particular, I am not aware of any major finding (1) in personality psychology that has experienced the kind of assault on its reliability that has been inflicted upon many findings in social psychology (2).  Why not?  Is it because personality psychology is boring?  Maybe so, and I’ll come back to that point at the end, but first let’s consider some other

Possible Reasons Personality Psychology Does Not Have a Replication Crisis

1. Personality Psychology Takes Measurement Seriously

The typical study in personality measures some attribute of persons (usually a personality trait) and also measures an outcome such as a behavior, a level of attainment, or an indicator of mental or physical health. Even though everyone chants the mantra “correlation is not causality,” research usually proceeds on the (generally reasonable) presumption that the trait can be thought of as the independent variable (IV), and the outcome as the dependent variable (DV). The IV is measured with several different indicators (items), and its reliability is calculated and reported. The same practice is followed with the DV, and converging evidence is conventionally required that both the IV and the DV are reasonably good indicators of the constructs they are supposed to represent. Compared to other areas of psychology, the N is usually pretty large too.
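The reliability mentioned above is most often reported as Cronbach’s alpha, which compares the variances of the individual items with the variance of the summed scale score: α = k/(k − 1) · (1 − Σ item variances / total-score variance). What follows is only a minimal Python sketch, assuming responses arrive as a respondents-by-items matrix of numeric ratings; the function name `cronbach_alpha` and the toy data are hypothetical illustrations, not drawn from any study discussed in this post.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix (hypothetical helper).

    Each inner list holds one respondent's answers to the k items of a scale.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total scale score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 people answering a 3-item scale (1-5 ratings)
items = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1], [3, 3, 4]]
print(round(cronbach_alpha(items), 2))  # → 0.94
```

Items that rise and fall together across respondents push alpha toward 1; items that are unrelated to one another push it toward 0.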

Contrast this with the typical study in social psychology. Many have only two levels of the IV, namely the two experimental conditions (experimental and control); maybe there are three or four if the experiment is extra-fancy. So the typical IV is scaled as merely high or low, or even present or absent. For example, subjects might be asked to unscramble words that do or do not have certain classes of content embedded within them. Neither the reliability nor the generalizability of this manipulation is assessed (would the manipulation have the same effect if used more than once? Is the manipulation related to, or does it have the same effect as, other manipulations of ostensibly the same psychological variable?), much less the size of its effect. The DV might get a bit more attention, in part because unlike the IV it usually has more than two values (e.g., walking speed), and so the reliability of its measurement (say, by two RAs) might be reported, but the wider generalizability (aka construct validity) of the DV remains unexamined.

And I won’t even mention the problems of low power that go along with small-N studies, and the resulting imprecision of the results.  That one has been hashed out elsewhere, at length, so as I said, I’m not mentioning it.

A truism within personality psychology is that good research begins with good measurement, of both the dependent and independent variables.  Not all areas of psychology pay as much attention to this principle.

2. Personality Psychology Cares about Effect Size

Results in personality psychology are always reported in terms of effect size, usually the familiar Pearson correlation coefficient.  Social psychology is different (3); social psychologists often state that they don’t care about effect size because in the context of their research the number is nearly meaningless anyway.  The argument goes like this: Because the experimental conditions are designed to be as different from each other as possible, in order to maximize chances of finding out whether anything will happen at all, and also because experiments, by design, control for things that covary in nature, the sizes of the resulting effects don’t really mean anything outside of the experimental context.  All that matters, for purposes of further theory development, is that an effect is found to exist.  The size is only important if you are doing applied work (4).

I actually think this argument has a point, but it reveals an essential limitation of the two-group experiment.  The method can be informative about the direction of causality, and the direction of the effect (positive or negative).  But it can tell us little or nothing about how big, robust and yes, replicable, this finding will turn out to be.

In contrast, close attention to measurement has produced a research literature establishing that

3. Many Key Findings of Personality Psychology are Robustly Replicable

These include:

  • Behavior is consistent across situations
  • Personality predicts longevity, job performance, relationship satisfaction and many other important life outcomes
  • Acquaintances largely agree with each other about the personality traits of the people they know well
  • People agree (with some interesting exceptions) with their acquaintances’ assessments of their personality
  • Measures of personality predict central tendencies of density distributions of behavior (for example, a trait measure of extraversion can predict how many extraverted behaviors you will display, on average)
  • Much (though not all) of the information in the 17,953 trait words in the unabridged English dictionary can be reduced to the “Big Five” basic traits: Extraversion, Neuroticism, Agreeableness, Conscientiousness, and Openness to Experience

This is a very partial list.  But lest I be accused of bias (5), I will also note that:

4. Too Many Findings in Personality Psychology are Robust but Trivial

I actually co-authored a paper with the author of the epigraph above (Baumeister, Vohs & Funder, 2007) that, among other things, took personality psychology to task on this very point. A lot of research in personality psychology (too much) correlates one self-report with another self-report. Can you say “method variance”? I’ve done such studies myself and they have their uses, and sometimes they are all one can do, so my overall attitude is forgiving, even while I also believe that there truly is something to forgive.

Trivial findings will replicate! Correlations among different self-report scales can be expected to robustly replicate because the relationships are often built right into the content of the scales themselves.

Studies with self-report scales are common in part because they are so easy to do, but

5. Many Important Findings in Personality Psychology are Very Difficult to Repeat

Some of these findings come from longitudinal studies, in which individuals are repeatedly assessed over long periods of time. These studies have shown that conscientious people live longer and that the consistency of individual differences is maintained over decades, and have also charted age-related trends in personality development, showing how traits such as extraversion and conscientiousness wax and wane over the lifespan. These findings have been replicated, but only “conceptually,” because no two major longitudinal studies have ever used exactly the same methods. A skeptic would literally need decades to really double-check them.

Other findings might not take decades to reproduce, but are still no walk in the park. Consider a study from my lab (Fast & Funder, 2008). This study was actually in one of the issues of JPSP that was targeted by the Center for Open Science replications project. But nobody tackled it. Why not? Our study looked at correlations between personality, as judged by peers, and the frequency with which people used words in different categories during a life-history interview. To replicate this study, here’s all you have to do: Recruit a sample of 250 undergraduates. Recruit two peers of each of them to describe their personalities (500 peers in all). Subject each of these 250 students to a one-hour life-history interview conducted by a licensed clinical psychologist. Transcribe the recordings of these interviews, delete the interviewer’s comments, and clean up the transcripts so that they can undergo linguistic analysis. Run the transcripts through a linguistic analysis program (we used LIWC) and see which categories of word use are related to personality, as judged by peers. Gathering the data in this project took two years. Transcribing the interviews and cleaning the transcriptions took another two years, and the analyses took around a year beyond that, so about five years of work in all. I do NOT know whether the findings would replicate, though we tried hard to use internal checks to reveal results that were as robust as possible. I would seriously love to see someone else do the same study to see if our results hold up. What do you think the chances are that anyone ever will?

The kinds of non-trivial studies that Baumeister, Vohs and I advocated, that gather direct measurements of observed and meaningful behavior, are difficult to do, especially with a sufficiently large N, and commensurately a lot of work to replicate.  I’d like to think  — in fact, I do think — that most of these findings would survive direct replication, but who really knows? Hardly anybody has the time, resources, or sufficient motivation to check. In the meantime, these findings remain immune to the replication controversy.

But, going back to the opening quotation, there is one more reason why personality psychology has avoided a replication crisis, and I believe this reason is the most important of all.

6. Personality Psychology Is Not Afraid to be Boring

Modern personality psychology (since 1950 or so) has never really striven to be clever, or cute, or counter-intuitive.  Its historic goal has been to be useful.  The gold standard in personality research is prediction (6).  Personality is measured in order to predict – and understand – behaviors and life outcomes, in order to be useful in psychological diagnosis, personnel selection, identification of at-risk individuals, career counseling, mental health interventions, improvements in quality of life, and many other purposes. Occasionally the findings are surprising, such as the now well-established fact that the trait of conscientiousness predicts not only longevity, but also job performance in every single occupation where it has ever been tested.  Nobody expected its implications to be so far-reaching.  The Big Five personality traits are not exactly surprising, but they aren’t obvious either.  If they were, it wouldn’t have taken 60 years of research to find them, and there wouldn’t still be controversy about them. Still,  studies in personality psychology typically lack the kind of forehead-slapping surprise value that characterizes many of the most famous (and problematical) findings in social psychology.

According to Bargain Basement Bayesian analysis, counterintuitive findings have low prior probabilities, by definition. And thus, in the absence of extra-strong evidence, they are unlikely to be true and therefore unlikely to be replicable. I am about the one-hundred-thousandth blogger to observe that ignoring this principle has gotten social psychology into tons of trouble. In contrast, the fact that personality psychology never saw counterintuitive (or, as some might put it, “interesting”) findings as its ultimate goal seems to have turned out to be a protective factor.
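One standard way to formalize this back-of-the-envelope reasoning is Bayes’ rule applied to a significant result: the probability that the effect is real equals power × prior divided by (power × prior + α × (1 − prior)). The Python sketch below is a hedged illustration of that calculation only; the priors (0.5 for an intuitive hypothesis, 0.05 for a counterintuitive one), the power of 0.8, and α = 0.05 are assumptions chosen for the example, not estimates for any actual literature, and the function name is hypothetical.

```python
def prob_true_given_significant(prior: float, power: float, alpha: float = 0.05) -> float:
    """P(effect is real | result is significant), by Bayes' rule (hypothetical helper)."""
    true_pos = power * prior          # real effects that reach significance
    false_pos = alpha * (1 - prior)   # null effects that reach significance by chance
    return true_pos / (true_pos + false_pos)


# Illustrative, assumed priors: an intuitive hypothesis vs. a counterintuitive one
for label, prior in [("intuitive", 0.5), ("counterintuitive", 0.05)]:
    p = prob_true_given_significant(prior, power=0.8)
    print(f"{label:>16}: {p:.2f}")  # about 0.94 and 0.46, respectively
```

Under these assumptions, a significant result for the counterintuitive hypothesis is less likely than a coin flip to reflect a real effect, even with respectable power; lower power or a lower prior makes it worse.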

Conclusion

Admittedly, some of the advantages of personality psychology are bugs and not features. It isn’t particularly salutary that so many of personality psychology’s findings are trivially replicable because they amount to intercorrelations of self-report scales. And the fact that some of the most interesting findings are almost immune to replication studies, because they are so difficult to repeat, does not necessarily mean all of those findings are true. Despite appearances, personality psychology probably has replicability issues too. They are just harder to detect, which makes it even more important for personality researchers to get it right the first time. Nobody might come this way again.

Here’s another quote from the same article excerpted at the beginning of this post:

Social psychology might think carefully about how much to follow in personality psychology’s footsteps. Our entire field might end up being one of the losers.

Well, none of us wants to be a “loser,” but the present comparison of the neighboring disciplines leads to a different conclusion.  Social psychology (and science generally) might in fact do well to draw a few lessons from personality psychology: Take measurement seriously. Use large samples.  Care about effect size.  And don’t be afraid to be boring.  More exactly, push back against the dangerous idea that findings have to be surprising or counter-intuitive to be interesting.  How “interesting,” in the end, is a flashy finding that nobody can replicate?

Footnotes

(1) Or to be honest, any study at all, but I’m trying to do a little CYA here.

(2) To name a few: elderly priming, money priming, too much choice, glucose as a remedy for ego depletion, cleanliness and moral judgment, bathing and loneliness, himmicanes, power posing, precognition (this last finding might not really belong to social psychology, but it was published in the field’s leading empirical outlet).

(3) Even when effect sizes are reported, as required by many journal policies, they are otherwise typically ignored.

(4) This is NOT a straw man.  I used to think it was.  See my earlier blog post, which includes verbatim quotes from a prominent (anonymous) social psychologist.

(5) This will happen anyway; see footnote 7.

(6) In the opinion of many, including myself, the best graduate-level textbook ever written on this topic was Personality and Prediction (Wiggins, 1973).  It says a lot about measurement.  Everybody should read it.

(7) An indicator of my bias: I have written a textbook in personality psychology, one which, by the way, I tried very hard to make not-boring.

Acknowledgment

Ryne Sherman and Simine Vazire gave me some helpful tips, but none of this post should be considered their fault.

References

Baumeister, R.F. (2016). Charting the future of social psychology on stormy seas: Winners, losers, and recommendations. Journal of Experimental Social Psychology.

Baumeister, R.F., Vohs, K.D., & Funder, D.C. (2007). Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2, 396-403.

Fast, L.A., & Funder, D.C. (2008). Personality as manifest in word use: Correlations with self-report, acquaintance-report, and behavior. Journal of Personality and Social Psychology, 94, 334-346.

Wiggins, J.S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.


6 thoughts on “Why doesn’t personality psychology have a replication crisis?”

  1. From an outsider’s perspective, I always assumed that personality studies are p-hacked as hell. Each study requires much resources and there are tons of measures to choose from. So, there is all the motivation to find something, and also many methods to p-hack. Much like fMRI studies. I’m still not convinced that this is not the case.

    One must remember that social psychology is in “crisis” right now mostly because some researchers decided to stop the game. In most other fields, most/all researchers simply continue to play the game. It advances careers just fine, it only doesn’t advance science much.

  2. Great post. I just wanted to tell you that I have the data to do a close replication of your 2008 study. We interviewed real people instead of undergrads, but otherwise we have the interviews and observer ratings from multiple sources. Of course, the interviews need to be transcribed….

  3. Great post. I think we find some more replication issues when looking at interactions with personality. Especially interactions with situations, but also not tons of clear or robust trait x trait interactions to point at.

  4. “Many fields of research (…) even physics – are experiencing crises of replicability”
    . . . Maybe I read too few physics journals since leaving university, but where does physics experience a crisis of replicability? I know that some fields of physics like cosmology and high-energy physics experience a crisis of experimentation, because their theories predict the outcome of experiments that are well beyond today’s and tomorrow’s experimental capabilities. Are there fields, however, where experimental results have been accepted, but now cannot be replicated?
    . . . Those fields that deal with impossible experiments I expect to “mathematize” – using more the criteria of mathematics like being free of inner contradiction and being elegant, rather than the experimental outcome that is impossible to produce in the first place. Academically, they belong to the math department where General Relativity has often already found a good home.
