How High is the Sky? Well, Higher than the Ground

Challenged by some exchanges in my own personal emails and over at Brent Roberts’s “pigee” blog, I’ve found myself thinking more about what is surely the weakest point in my previous post about effect size: I failed to reach a clear conclusion about how “big” an effect has to be to matter. As others have pointed out, it’s not super-coherent to claim, on the one hand, that effect size is important and must always be reported, yet to acknowledge, on the other, that under at least some circumstances very “small” effects can matter for practical and/or theoretical purposes.

My attempt to restore coherence has two threads, so far. First, to say that small effect sizes are sometimes important does not mean that they always are. It depends. Is .034 (in terms of r) big enough? It is, if we are talking about aspirin’s effect on heart attacks, because wide prescription can save thousands of lives a year (notice, though, that you need effect size to do this calculation). Probably not, though, for other purposes.
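One way to make that calculation concrete is Rosenthal and Rubin’s Binomial Effect Size Display (BESD), which re-expresses a correlation r as a difference between two “success rates” centered on 50%. A minimal sketch follows; the million-person population is a hypothetical round number, and since real cardiac event base rates are nowhere near 50%, this is purely illustrative of how a “small” r scales up:

```python
# Binomial Effect Size Display (Rosenthal & Rubin, 1982): re-express a
# correlation r as the difference between two rates centered on 50%.

def besd_rates(r):
    """Return (control_rate, treatment_rate) implied by correlation r."""
    return 0.5 - r / 2, 0.5 + r / 2

r = 0.034  # aspirin's effect on heart attacks, as discussed above
control, treatment = besd_rates(r)
print(f"control: {control:.1%}, treatment: {treatment:.1%}")

# Per (hypothetical) million people at risk, the implied difference in events:
per_million = round((treatment - control) * 1_000_000)
print(per_million)  # prints 34000
```

The point of the display is exactly the one made above: whether 34,000 hypothetical events per million is worth caring about is a substantive judgment, but you cannot even pose the question without the effect size.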

But honestly, I don’t know how small an effect is too small. As I said, it depends. I suspect that if social psychologists, in particular, reported and emphasized their effect sizes more often, over time an experiential base would accrue that would make interpreting them easier. But, in the meantime, maybe there is another way to think about things.

So the second thread of my response is to suggest that perhaps we should focus on the ordinal rather than absolute nature of effect sizes. While we don’t often know exactly how big an effect has to be to matter, in an absolute sense, there are many contexts in which we care which of two things matters **more**. Personality psychologists routinely publish long (and to some people, boring) lists of correlates; such lists draw attention to the personality variables that appear to be more and less related to the outcome of interest, even if the exact numerical values aren’t necessarily all that informative.

Social psychological theorizing is also often phrased in terms of relative effect size, though the actual numbers aren’t always included. The whole point of Ross & Nisbett’s classic book “The Person and the Situation” was that the effects of situational variables are larger than the effects of personality variables, and they drew theoretical implications from that comparison that (read almost any social psychology textbook, or the social psych section of any intro textbook) go to the heart of how social psychology is theoretically framed at the most general level. The famous “Fundamental Attribution Error” is explicitly expressed in terms of effect size: situational variables allegedly affect behavior “more” than people think. How do you even talk about that claim without comparing effect sizes? The theme of Susan Fiske’s address at the presidential symposium at the 2012 SPSP meeting was that “small” manipulations can have “large” effects; this, too, is effect size language expressing a theoretical view. Going back further, when attitude change theorists talked about direct and indirect routes to persuasion, this raised a key theoretical question about the relative influence of the two routes. More recently, Lee Jussim wrote a whole (and excellent) book about the size of expectancy effects, comparing them to the effects of prior experience, valid information, etc., and building a theoretical model from that comparison.

I could go on, but, in short, the relative size of effects matters in social psychological theorizing whether the effects are computed and reported or not. When they aren’t, of course, the theorizing proceeds in an empirical vacuum that might not even be noticed, and this happens way too often, including in some of the examples I just listed. My point is that effect size comparisons, usually implicit, are ubiquitous in psychological theorizing, so it would probably be better if we remembered to explicitly calculate them, report them, and consider them carefully.

5 thoughts on “How High is the Sky? Well, Higher than the Ground”

  1. I agree with you on the fundamental question of the need for, and benefits of, routinely calculating effect sizes in social psychology — if nothing else, researchers are often shooting themselves in the foot by not being fluent enough in them, running underpowered studies, and giving up on interesting questions. It’s the ironic flipside of p-hacking.

    However, I will say that I don’t think calculating the effect sizes would necessarily add much to all FAE experiments. Take the Wall Street vs. Community Game studies, where people who are nominated as likely cooperators and defectors by their dorm mates are no more likely than one another to defect in a Prisoner’s Dilemma game, while the name given to the game, because it changes people’s construal of the situation, makes all the difference.

    The goal is not to say there are no individual differences that might predict people’s behavior in the game (there certainly are), only that people’s intuitive beliefs about their ability to predict behavior based on personality can be quite weak relative to the power of the situation itself. Does it matter what the effect sizes are? I would say not really: when one is clearly zero and the other is non-zero, the point has been made. And that’s the thing: it’s a demonstration, not a “junior physics” attempt to document exactly how much personality or the situation matters in this context. After all, the context is highly specific and intentionally artificial, and of no real interest beyond the context of the study itself.

    To put it a different way, I don’t think there’s much to be gained if you were to learn that the effect size of the context manipulation was .31 and of the personality measure was .22, and that these were significantly different. It’s not that it would have been wrong to report them; it’s just that despite the relative effect sizes being the whole point of the study, the actual effect sizes are really beside the point. If we were trying to measure the same variables in a context that was of real interest, then certainly things change, but some experiments are designed to do something completely different.

    Just my two cents — doesn’t take anything away from the overall argument that we should be calculating effect sizes as a matter of routine and that this would be a great benefit to everyone.

    • Interesting point about the ironic flip-side of p-hacking; I hadn’t thought of it that way.
      About the FAE: in general, the ability to predict behavior from a knowledge of personality is definitely more than zero (see: 40 years of personality research subsequent to Mischel, 1968). Which means that if we are going to accuse people of committing the FAE, we will have to compare effect sizes: either (a) the effect size they expect vs. the effect size that exists, or (b) the effect size of a personality vs. a situational variable. In the history of claims about the FAE, neither sort of clear comparison is reported very often, if ever. Which is only one of my pet peeves about the FAE.
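For what it’s worth, the kind of comparison mentioned in this exchange (an r of .31 vs. an r of .22) can be made concrete with Fisher’s r-to-z transform, the standard test for a difference between two independent correlations. The sample sizes below are hypothetical, chosen only for illustration:

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z statistic for the difference between two independent
    correlations, via Fisher's r-to-z transform."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical values from the comment above; n = 100 per sample is assumed.
z = compare_correlations(0.31, 100, 0.22, 100)
print(round(z, 2))  # prints 0.67 -- well short of the 1.96 cutoff
```

Which illustrates a side point: declaring .31 and .22 “significantly different” itself requires a fairly large sample, so even claims about relative effect size carry statistical obligations.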

  2. The very concept of a p-value only makes sense assuming that we do care about effect sizes, because a p value depends upon the effect size and the sample size, and the latter is something we choose, based on the former. We decide “Well, I will care about this effect if it is greater than X, and to detect that effect I would need a sample size of Y”.

    That may or may not be part of a formal power calculation. But the point is that choosing a given sample size, and then calculating a p value, assumes that, at least implicitly, you are thinking about effect sizes… Sadly, even top journals don’t require publication of effect sizes.

    It’s just incoherent to claim that “I only care about the existence of the effect, not the size”. If that were true, you ought never to run a study with a sample size of less than the whole population of the world!
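The implicit calculation this commenter describes (“to detect effect X I need sample Y”) can be sketched with the standard normal-approximation formula for a two-group comparison. The alpha = .05 and power = .80 defaults are conventional choices, not anything from the post, and the approximation requires Python 3.8+ for `NormalDist`:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n needed to detect standardized effect size d
    in a two-group comparison (normal approximation to the t test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-tailed alpha
    z_b = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

print(n_per_group(0.5))  # prints 63  -- a "medium" effect
print(n_per_group(0.2))  # prints 393 -- a "small" effect
```

The 1/d² term is the commenter’s point in miniature: halving the effect size you care about roughly quadruples the required sample, so picking an n is already a statement about which effects you consider worth detecting.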

  3. (Speaking as one of those dreaded quant persons.)

    I’m going to turn it around. For the point of any individual article I agree that an effect size may or may not matter, though I certainly fall on the side of saying that it almost always does.

    However, as I always tell students writing their dissertations (and I’ve served on a lot of dissertation committees, being one of those dreaded quant persons), their work doesn’t exist in isolation. Instead, it sits within the overall ecology of studies in an area. Their dissertation in and of itself may not be that important as a piece of science. It probably won’t be, though every once in a while one is.

    However, the information they gather may end up being part of a meta-analyst-to-be’s estimate of the file drawer (possibly their future selves’). Or the descriptive statistics may be a key part of someone else’s power study (possibly theirs). Or theirs may be the only statistics measured on that population, and thus a key part of the evidence base. This can be an important reality check when someone asks a question like “I’m interested in comparing the experiences of primarily rural and urban religious minorities. Are there any studies that consider the use of acculturation scales on Hasidic Jews?”

    If this information is rhetorically distracting, put it in the appendix.

  4. Pingback: I don’t care about effect sizes — I only care about the direction of the results when I conduct my experiments | The Trait-State Continuum
