You’ve probably seen the chart from the recent Financial Times reporting on the decline of Conscientiousness (a personality trait that measures how organized, goal-oriented, and disciplined a person is) among people aged 16-39, and you might have come across the conflicting commentaries, some amplifying the doom and others dismissing the finding as dramatized, twisted stats of the reckless “damned lies” variety. In this post, I’d like to surface some of the underlying context and explain some of the psychometrics/stats to help lay readers better understand the data FT is reporting on and whether the subsequent criticisms of the stats are valid.

Some Background On Me

While it is not my primary area of expertise (that would be the psychology of gaming and virtual worlds), I have published multiple peer-reviewed academic papers (1, 2, 3) on the intersection of the Big 5 personality traits and longitudinal behavior in virtual worlds. One of these papers was published in the journal Social Psychological and Personality Science. And although personality assessment hasn’t been the focus of my work, the use of established psychometric methods to assess gaming motivations (1, 2, 3) has been a focus of my work for many years. I started on the academic side of things and am currently on the industry/applied side, so my perspective also balances academic rigor with presenting research in applied settings.

Isn’t It a Drop of Just 2.25 on a Scale from 9-45?

While FT transformed the raw data into percentiles, the raw mean scores are accessible online from USC Dornsife. In the underlying Big 5 inventory, there are 9 items for Conscientiousness, each scored on a 1-5 scale. For this particular study, the scores are summed. Thus, the raw scores for Conscientiousness range from 9-45. The online data shows that among the 16-39 cohort, the drop in Conscientiousness goes from 35.51 (2015) to 33.26 (2025)–so a drop of 2.25.

But just because 2.25 “looks” small doesn’t mean the effect itself isn’t meaningful. After all, a rise in body temperature of 2.25 degrees Fahrenheit is enough to take you from normal to a fever. To assess the true magnitude of this drop of 2.25, we need to understand the typical variance of Conscientiousness.

Some commenters have pointed out that this 33.26 would still be “pretty high” on a scale that goes from 9-45, so any claims of “low Conscientiousness” are unwarranted.

The issue is that the raw scores on personality inventories aren’t meant to be directly interpretable this way. Consider these potential inventory items for Conscientiousness:

  • I can make a 1-month plan and stick to it
  • I can make a 5-year plan and stick to it

Certainly, more people would select “Agree”/“Strongly Agree” for the former than the latter since it’s a much lower bar, but these are both valid items for Conscientiousness. Psychometric inventory items always include a hidden magnitude component like this. For example, the raw response score would differ if you phrase a gaming motivation as “enjoy winning a match” vs. “enjoy dominating other players”. Thus, it’s possible to generate equivalent, highly-correlated inventories that have very different raw scores.

Consider also that there is likely a social desirability bias component in answering Big 5 inventory items like “Tends to be lazy” which would also elevate the raw scores.

This is why psychometric scores can only meaningfully be interpreted comparatively. As Sanjay Srivastava, one of the well-known researchers on the BFI-44, has stated: “Norms are most emphatically NOT an absolute interpretation — they are unavoidably comparative.”

As an analogy, IQ scores are also always presented on a normed basis, centered around 100. Your raw score on an IQ test (which is typically not provided) is not meaningful because IQ is an entirely relative construct. What matters is how you score relative to other people, not where the raw score falls in the potential range. Personality is the same: you are only more or less introverted/extraverted in comparison with other people.
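To make the norming idea concrete, here is a minimal sketch of how a raw score is converted to an IQ-style normed score. The norm mean and SD below are hypothetical placeholders, not values from any real test; only the 100/15 scaling is the standard IQ convention.

```python
def normed_score(raw, norm_mean, norm_sd, center=100.0, scale=15.0):
    """Re-express a raw score relative to a reference population."""
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean, in SD units
    return center + scale * z


# Hypothetical norms for illustration only: norm mean = 50, norm SD = 10.
example = normed_score(raw=60, norm_mean=50, norm_sd=10)
print(example)  # one SD above the norm mean
```

The same raw score of 60 would map to a different normed score under different norms, which is why the raw number alone says little.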

And the 33.26 in Conscientiousness, while high in the range of the 9-45 scale, is below average in comparison with the population norm.

But Isn’t It Still Very Small From an Effect Size Standpoint?

To assess the magnitude of a raw delta, we typically calculate its effect size. The 2022 paper from Sutin et al., based on the same underlying data set, reports the standard deviations (SDs) of Conscientiousness for different subsets across that study’s year range (2014-2022, in Table 1). The average SD for Conscientiousness is 5.61. So an estimated effect size (Cohen’s d) for the FT finding would be 2.25/5.61 = 0.40. For the stats nerds, Cohen’s d in this context is simply the delta divided by the SD; in this case, the drop is 40% of one SD.
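For readers who want to check the arithmetic, here is a rough sketch of the estimate. It uses the averaged SD of 5.61 as a stand-in for a properly pooled SD, which is a simplification on my part rather than the exact method used in any of the cited work.

```python
def cohens_d(mean_before, mean_after, sd):
    """Standardized mean difference: the raw delta divided by the SD."""
    return (mean_before - mean_after) / sd


# Raw means for the 16-39 cohort (2015 vs. 2025) and the averaged SD.
d = cohens_d(mean_before=35.51, mean_after=33.26, sd=5.61)
print(round(d, 2))
```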

Cohen provides some basic benchmarks for interpreting d, with 0.2=small effect and 0.5=medium effect. While some academics have characterized this as a “very small effect”, I think it would be more accurate to describe this as a “small-to-medium” sized effect if we’re applying Cohen’s benchmarks.

While Cohen was very deliberate in stating that his benchmarks (e.g., 0.2=small effect) should only be taken loosely, academic psychology research has a tendency to apply them broadly and rigidly. But Cohen was abundantly clear in his writing:

The terms ‘small,’ ‘medium,’ and ‘large’ are relative, not only to each other but to the area of behavioral science or even more to the specific content and research method being employed in any given investigation. Thus, the conventions serve only as a general guide to what may be considered a small, medium, or large effect, but there is no absolute, universal, or objective definition of what constitutes a small, medium, or large effect. As such, they should be used with caution and with awareness of the specific research context.

Effect Size is a Deceptively Decontextualized Metric

Effect size is a statistic measuring the impact of the IV (independent variable, e.g., an experimental condition/intervention, comparison between groups/time) on the DV (dependent variable, i.e., the outcome), but it’s not actually a metric of practical importance. There’s a lot of context that isn’t part of its calculation:

  • It doesn’t account for the study context. For example, psychology experiments in controlled, sterile labs have inflated effect sizes compared to real-world interventions since the impact of many extraneous variables (e.g., room temperature) can be minimized in a lab setting.
  • It doesn’t account for the magnitude of the IV. For example, say I designed a study examining the impact of a 2-year 24/7 personal-trainer-guided nutrition and exercise course on weight loss and I tell you the effect size is huge. Does this finding have practical importance if hardly anyone could afford this intervention?
  • It doesn’t account for the importance of the DV. It’s easier to detect differences in stable aggregated repeat measures (like asking about Conscientiousness 9 times) than in a single behavioral outcome (like whether you actually graduate from college), but the latter is presumably a much more salient metric.

Effect sizes tell you how much impact an IV has on a DV, but we shouldn’t solely use effect size to determine whether a finding has practical importance.

What is an Effect Size of 0.40 Concretely?

The easiest way to grok effect sizes is to show what this equates to in more familiar contexts. A decline with an effect size of 0.40 is the same as:

  • All men in the US (age 20+) losing on average 18 lbs (SD = 44 lbs, source)
  • All men in the world (age 18+) losing on average 1.2 inches in height (SD = 2.98 inches, source)
  • Everyone in the world losing on average 6 points of IQ (SD = 15)
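The conversions in the list above are just the effect size multiplied by each measure’s SD. A minimal sketch, using only the SDs quoted in the list (the dictionary keys are my own labels):

```python
D = 0.40  # estimated effect size for the FT finding

# SDs as quoted in the list above.
sds = {
    "weight_lbs_us_men": 44.0,
    "height_in_men": 2.98,
    "iq_points": 15.0,
}

# A shift of d standard deviations, expressed in each measure's own units.
shifts = {name: D * sd for name, sd in sds.items()}

for name, shift in shifts.items():
    print(f"{name}: {shift:.2f}")
```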

Small Effect Sizes Can Have Outsized Impacts at the Population Level

Even if an effect size is small, its cumulative effect across an entire population can be substantial. This is also a part of the context that the calculation of effect size leaves out–whether the finding pertains to just Bob next door, or to everyone who has asthma, or to every single person in the US population.

Consider that low-dose aspirin is classified as a first-line Class I Level A intervention for secondary prevention of heart attacks (i.e., among people who have already had a heart attack), but the actual effect size for this intervention is only between 0.14-0.21 (i.e., roughly a third to a half of the effect size of the FT finding). Each year, about 200,000 people have a secondary heart attack. Low-dose aspirin confers a 20% relative risk reduction. Thus, even though the effect size is small, this intervention is still reducing the total number of secondary heart attacks each year by the tens of thousands. At a population level, it is a worthwhile intervention.
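As a back-of-the-envelope check on “tens of thousands”, here is a sketch under two different readings of the 200,000 figure. Both readings are assumptions for illustration, and both ignore the fact that not everyone at risk actually takes aspirin.

```python
ANNUAL_EVENTS = 200_000           # secondary heart attacks per year
RELATIVE_RISK_REDUCTION = 0.20    # relative risk reduction from low-dose aspirin

# Reading A (assumption): 200,000 is the count WITHOUT treatment.
prevented_if_baseline = ANNUAL_EVENTS * RELATIVE_RISK_REDUCTION

# Reading B (assumption): 200,000 is the count that still occurs WITH treatment,
# so the untreated count would have been 200,000 / (1 - 0.20).
untreated = ANNUAL_EVENTS / (1 - RELATIVE_RISK_REDUCTION)
prevented_if_observed = untreated - ANNUAL_EVENTS

print(round(prevented_if_baseline), round(prevented_if_observed))
```

Either way, the estimate lands in the tens of thousands per year.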

The Poor Got Poorer

An important aspect of the FT finding that may be getting lost in the shuffle is that the largest decline in Conscientiousness is occurring within the demographic cohort (age 16-39) that had the lowest Conscientiousness to begin with. So even if the drop were “small”, the cumulative effect of the drop on top of an already depressed starting value is an important part of the context to keep in mind.

Let’s Compare Apples to Apples

To put into proper context the FT finding on the decline of Conscientiousness, we need to identify a relevant baseline. An appropriate benchmark for this is longitudinal lifespan studies of personality–how large are naturally-occurring shifts in Conscientiousness? Is it a very stable trait or a volatile trait?

In a meta-analysis of hundreds of longitudinal personality studies, the largest contiguous change in Conscientiousness occurs between ages 12-47–teenagers become more conscientious as they get older, peaking around age 47, and then become less conscientious. The y-axis on the chart below is showing the cumulative change in effect size (in Cohen’s d). The cumulative effect size between ages 12-47 (a 35 year period) is around 0.59.

The FT finding is an effect size of 0.40 over a 10-year period. We can compare the rate of change from both data sets, and it comes out to a baseline change of .0169/year (0.59 over 35 years) vs. the FT finding of .04/year (0.40 over 10 years). So the FT observed decline in Conscientiousness is occurring at ~2.4 times the rate of the change that naturally occurs between ages 12-47. To me, personally, that feels like a substantive difference. But FT’s framing of “Conscientiousness in freefall” is probably overstating the effect: even in their reported data, the rate of decline has slowed in recent years.
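The rate comparison is simple division. A quick sketch, which assumes the change is spread evenly across each period (a simplification, since neither curve is perfectly linear):

```python
# Lifespan baseline: cumulative d of about 0.59 across ages 12-47 (35 years).
baseline_rate = 0.59 / 35

# FT finding: estimated d of about 0.40 across 2015-2025 (10 years).
ft_rate = 0.40 / 10

ratio = ft_rate / baseline_rate

print(round(baseline_rate, 4), round(ft_rate, 2), round(ratio, 1))
```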

Isn’t The Data Based on Non-Representative Internet Surveys?

Some have characterized the underlying data set as being wholly unrepresentative of the US population. For example, Chris Ferguson, in his criticism of the FT reporting, argued that the findings “can’t be generalized to the general public” because they are based solely on “people who use the internet often enough to fill out internet surveys”.

This is a mischaracterization. The Understanding America Study (UAS) is a nationally representative sample based on ~6,000 (and growing) US adults who were recruited via address-based sampling (i.e., physical residential addresses), randomly sampling US adults and then using weights to align the collected data with census-based demographic targets (e.g., race, sex, age, education, household size). While the survey itself was presented via the internet (i.e., an online survey), the panel members were recruited via physical address-based sampling. And crucially, the UAS provided internet access and a tablet computer to any panel members who lacked them. So the UAS is technically an “internet panel”, but the claim that it’s a sample of unrepresentative people on the internet is incorrect.

Transforming Raw Scores to Percentiles Actually Makes a Lot of Sense

Some commentaries have accused FT of “data torturing” the raw survey scores into percentiles to produce sensationalized media headlines. I respectfully disagree. Having had to convey statistical (often survey-based) findings to non-technical teams over the past 20 years, I’ve learned that most people don’t have a good intuition for what constitutes a meaningful delta on a 5-point psychometric scale in large sample surveys. They often severely overestimate it, intuiting that deltas >1 (between groups) are common and only deltas >1 are meaningful. In reality, with a trait like Conscientiousness that has an average SD of 0.62 on the 1-5 scale (based on Sutin’s data), two groups would only need to differ by 0.43 on their mean scores to reach an effect size of 0.70 (a medium-to-large effect by Cohen’s benchmarks, where 0.8=large).
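Here is how those 5-point-scale numbers can be reproduced. This sketch assumes the per-item SD can be obtained by dividing the summed-score SD (5.61) by the 9 items, which holds because the item-mean score is just the summed score divided by 9.

```python
SUM_SCORE_SD = 5.61  # average SD of the summed 9-45 score
N_ITEMS = 9

# SD of the mean-item score on the 1-5 scale (the summed score divided by 9).
item_scale_sd = round(SUM_SCORE_SD / N_ITEMS, 2)


def delta_needed(target_d, sd):
    """Raw mean difference required to reach a given Cohen's d."""
    return target_d * sd


needed = delta_needed(0.70, item_scale_sd)
print(item_scale_sd, round(needed, 2))
```

In other words, a gap of less than half a scale point between two groups is already a sizable standardized difference.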

Also, in the case of the Big 5, the means and standard deviations of each factor are different. Conscientiousness consistently has a lower SD than Neuroticism, Extraversion, and Openness in Sutin’s analysis (see Table 1). This means that a 2.25 drop in Conscientiousness is a larger decrease in standardized terms than a 2.25 drop in N/E/O, which is not at all intuitive to lay readers.

Transforming the raw scores into percentiles for lay readers gets around both these issues: it avoids lay readers misinterpreting raw deltas on a survey scale and aligns the changes across the 5 personality traits in the same magnitude scaling.
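To give a feel for what a percentile view of d=0.40 looks like, here is a sketch that assumes scores are normally distributed with the same SD in both years. That is my simplifying assumption; it is not a description of how FT actually computed its percentiles.

```python
from statistics import NormalDist

D = 0.40  # estimated standardized drop

# Under a standard normal reference distribution, someone sitting at the
# new (lower) mean falls at this percentile of the earlier distribution.
percentile_of_new_mean = NormalDist().cdf(-D) * 100

print(round(percentile_of_new_mean, 1))
```

Under these assumptions, the average person in the later distribution would sit at roughly the 34th or 35th percentile of the earlier one, instead of the 50th.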

In Our Data of 1.85M+ Gamers, The Appeal of Strategy Declined Over the Same Time Period

One of the reasons the FT finding intrigued us is that it happens to align with a longitudinal finding in our own survey data of gaming motivations among 1.85+ million gamers collected over the past 10 years. Of the 12 gaming motivations in our model, Strategy (the appeal of thinking and planning) has changed the most: it has decreased noticeably since 2015. The effect size we saw in our data was 0.44, which is really close to the estimated 0.40 in FT’s analysis.

So we have two data sets covering the same time period, looking at roughly the same psychometric construct, but using completely different population samples, completely different methodologies and questions, and yet, the findings end up aligning almost perfectly. In our data, we also find that it’s the youngest gamers who saw the largest declines in Strategy.

When we published the Strategy finding last year, gamers tended to blame game developers for making dumbed-down games and some market analysts wondered if this was just the consequence of gaming becoming more mainstream. It wasn’t clear whether we were seeing a “gamer only” finding or if it was something broader. The FT finding provides some evidence that the Strategy finding was indeed part of something much bigger.

It’s Hard to Pinpoint Why Conscientiousness is Dropping

It’s not hard to find examples of our attention spans decreasing over time. For example, in recent years, shorter YouTube videos have garnered a higher share of overall views. And an Atlantic article in late 2024 found that many college students, even at elite colleges, now find it challenging to read books.

While we often intuitively blame cellphones and social media for our decreased attention spans, there’s a lack of concrete causal evidence for this. And because all longitudinal effects are inherently correlational, it’s difficult to pin down cause and effect. Of course, it bears pointing out that causal evidence for this would be inherently difficult to produce since it’s unethical to raise children in artificial labs.

It could indeed end up that this drop in Conscientiousness is caused by cellphones and social media, but it could also be microplastics accumulation, or long COVID, or sleep deprivation from bright-blue-light screens at night, or mental exhaustion from the rage-baited news cycles, or some bits of all of the above, or something else entirely. For now, it’s very much an effect without a clear, evidence-based cause. We just don’t know.

A Counterpoint: Is Lower Conscientiousness a Bad Thing?

I’ll end with an invitation to think about personality traits in a more nuanced way.

While it’s easy to assume that low Conscientiousness would lead to poorer life outcomes (and personality research certainly points that way), it bears pointing out that the FT finding is that Conscientiousness is “lower than before”, but that’s not the same as saying that Conscientiousness is “lower than it should be”. Personality trait measures are entirely relative things; there is no “baseline” Conscientiousness beyond historical data. There is no “norm” beyond current population means.

It’s much easier to plan things and set long-term goals when you live in a relatively stable society, say the US between 1990-2015, before Brexit/COVID/war in Ukraine/US Capitol riot/rise of AI/tariff yo-yo, and before media headlines constantly reminded us that everything is “unprecedented”. Could it be that Conscientiousness had been skewing higher than “normal” during that earlier, stable period? During periods of flux, isn’t the ability to adapt and not stick stubbornly to a fixed 10-year plan an advantage?

Or put simply: Could you confidently tell an 18-year-old college student right now what they should be majoring in?

What Do You Think?

Tell us below what you think is causing this drop in Conscientiousness, especially among people in the 16-39 age range.