Why Most Published Research Findings Are False

26 Feb 2018 » James Diao » New Haven, CT

One of the first papers that my mentor at HMS assigned to me was entitled: “Why Most Published Research Findings Are False”. I was caught by surprise—it sounded like the poorly informed doubting of a conspiracy theorist, rather than the most cited paper in PLoS Medicine, written by the legendary Stanford epidemiologist John Ioannidis.

The paper itself is incredibly to-the-point, almost brash. It begins:

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false.

How does one even begin to prove such a claim? The simplest approach would be to actually replicate representative experiments one-by-one, and other scientists actually do this in later papers. But in this article, Ioannidis approaches the issue from a statistical standpoint. Here’s where it gets interesting.

1. Overview

Ioannidis explores the reliability of research findings in terms of positive predictive value (PPV)—the probability that a finding is actually true, given that it is claimed to be true.

\[ PPV = \frac{P(\textrm{Claimed True and Actually True})}{P(\textrm{Claimed True})} \]

Using basic probability theory, we can derive an expression for PPV in terms of well-defined and estimable parameters: power (1-β), significance level (α), bias (u), number of studies (n), and pre-study odds (R).

\[ PPV = \frac{R(1-\beta')}{R(1-\beta') + \alpha'} \]

where, if you account for bias and multiple studies under a few assumptions:

\begin{align} \beta' &= (1-u)^n \beta^n \\
\alpha' &= 1 - (1-u)^n(1-\alpha)^n \end{align}

(The derivations of the above expressions are given in parts 3-4 of this post.)
This expression allows you to do two things: estimate PPV for different study designs, and model how it changes as a function of external factors.

1.1 Estimation of PPV for Different Study Designs

Table 4: PPV of Research Findings for Various Combinations of Power (1 − β), Ratio of True to Not-True Relationships (R), and Bias (u)

Power (1-β)   Ratio (R)   Bias (u)   Study Example
0.80          1:1         0.10       Well-designed RCT
0.95          2:1         0.30       Confirmatory meta-analysis (well-designed RCTs)
0.80          1:3         0.40       Corrective meta-analysis (small inconclusive studies)
0.20          1:5         0.20       Phase I/II RCT (underpowered, unbiased)
0.20          1:5         0.80       Phase I/II RCT (underpowered, high bias)
0.80          1:10        0.30       Exploratory epidemiological study (well-powered)
0.20          1:10        0.30       Exploratory epidemiological study (underpowered)
0.20          1:1000      0.80       Discovery research with massive testing
0.20          1:1000      0.20       Same as above, but standardized (lower bias)

Note: significance level is assumed to be α = 0.05 for any single study; PPV values do not account for study multiplicity.
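The PPV for each row can be approximated from the three listed parameters. Below is a minimal sketch, assuming the single-study bias corrections derived in section 4.1 (α_u = u + (1-u)α and β_u = (1-u)β) with α = 0.05. The function name `ppv_with_bias` is an illustrative choice, not something from the paper.

```python
def ppv_with_bias(power, R, u, alpha=0.05):
    """PPV of a single study after the bias corrections from section 4.1."""
    beta = 1.0 - power
    beta_u = (1.0 - u) * beta         # bias converts some missed findings into claims
    alpha_u = u + (1.0 - u) * alpha   # bias converts some correct rejections into claims
    return R * (1.0 - beta_u) / (R * (1.0 - beta_u) + alpha_u)


# (power, pre-study odds R, bias u, study example), copied from the table rows
rows = [
    (0.80, 1 / 1, 0.10, "Well-designed RCT"),
    (0.95, 2 / 1, 0.30, "Confirmatory meta-analysis (well-designed RCTs)"),
    (0.80, 1 / 3, 0.40, "Corrective meta-analysis (small inconclusive studies)"),
    (0.20, 1 / 5, 0.20, "Phase I/II RCT (underpowered, unbiased)"),
    (0.20, 1 / 5, 0.80, "Phase I/II RCT (underpowered, high bias)"),
    (0.80, 1 / 10, 0.30, "Exploratory epidemiological study (well-powered)"),
    (0.20, 1 / 10, 0.30, "Exploratory epidemiological study (underpowered)"),
    (0.20, 1 / 1000, 0.80, "Discovery research with massive testing"),
    (0.20, 1 / 1000, 0.20, "Same as above, but standardized (lower bias)"),
]

for power, R, u, label in rows:
    print(f"{ppv_with_bias(power, R, u):.4f}  {label}")
```

Under these assumptions, only the first two rows come out above 0.5.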

Ioannidis does not describe how he arrived at his estimates for power, bias, and pre-study odds, and one might disagree with these numbers. However, the article's popularity suggests that much of the scientific community finds its conclusions plausible. Real-world studies might be more or less predictive, but achieving a PPV of over 50% is quite challenging in Ioannidis’s model, and would be even more challenging once study multiplicity is accounted for.

1.2 Modeling PPV as a Function of External Factors

Power decreases as you move down the rows (A to B to C). Bias and number of conducted studies are indicated by colors (blue to orange).

1) Power vs. Bias
[Figure: PPV as a Function of Bias]

2) Power vs. Study Number
[Figure: PPV as a Function of Multiplicity]

PPV can assume a range of values, but low pre-study odds, low power, heavy bias, or large study multiplicity can dramatically decrease PPV.
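To get a feel for these trends without the figures, here is a rough sketch that sweeps bias and study count for one hypothetical scenario (R = 0.1, power = 0.8, α = 0.05). It assumes the corrected α and β from the overview, with identical power and bias in every study. The function name `ppv` and the swept values are my own choices.

```python
def ppv(R, power, alpha=0.05, u=0.0, n=1):
    """PPV with the joint bias (u) and multiplicity (n) corrections from the overview."""
    beta = 1.0 - power
    beta_c = ((1.0 - u) * beta) ** n                   # corrected type II error
    alpha_c = 1.0 - ((1.0 - u) * (1.0 - alpha)) ** n   # corrected type I error
    return R * (1.0 - beta_c) / (R * (1.0 - beta_c) + alpha_c)


R, power = 0.1, 0.80  # hypothetical scenario: 1 true relationship per 10 false ones

print("bias u -> PPV (single study)")
for u in (0.0, 0.1, 0.3, 0.5, 0.8):
    print(f"  u={u:.1f}  PPV={ppv(R, power, u=u):.3f}")

print("studies n -> PPV (no bias)")
for n in (1, 2, 5, 10, 50):
    print(f"  n={n:<3d} PPV={ppv(R, power, n=n):.3f}")
```

In this scenario, PPV falls steadily as either bias or the number of studies grows.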

Although Ioannidis’s figures are revealing, it’s easier to draw insights from an interactive application. Michael Zehetleitner and Felix Schönbrodt have developed an awesome interface for observing how PPV changes based on Ioannidis’s model. Check it out here!

2. Corollaries

Ioannidis describes several corollaries (which I’ve tabulated below) that describe how real-world factors can affect the reliability of research.

Corollary   Study Characteristic         Effect on Model       Effect on Findings
1           Confirmatory Testing         ↑ Pre-Study Odds      ↑ PPV
2           Discovery Testing            ↓ Pre-Study Odds      ↓ PPV
3           ↓ Sample Sizes               ↓ Power               ↓ PPV
4           ↓ Effect Sizes               ↓ Power               ↓ PPV
5           ↑ Study Design Flexibility   ↑ Bias                ↓ PPV
6           ↑ Financial Interests        ↑ Bias                ↓ PPV
7           ↑ Conceptual Prejudices      ↑ Bias                ↓ PPV
8           ↑ Popularity of Field        ↑ Number of Studies   ↓ PPV

3. Derivation of the Basic PPV Model

Although Ioannidis provides a quick overview of his math, I thought it would be a useful exercise to derive everything more thoroughly.

3.1 Definitions

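Let the events A and C indicate the actual and claimed truth value of a given proposition. For example, the probability that the proposition is true is P(A), and the probability that the proposition is false is P(A^c) = 1 - P(A). For readability, I’ll use the following notation instead:

\begin{align} AT &: \textrm{actually true} & AF &: \textrm{actually false} \\
CT &: \textrm{claimed true} & CF &: \textrm{claimed false} \end{align}

Recall the definition of type I and II errors:

\begin{align} \alpha &= P(CT \,|\, AF) \quad \textrm{(type I error rate, a false positive)} \\
\beta &= P(CF \,|\, AT) \quad \textrm{(type II error rate, a false negative)} \end{align}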

3.2 Prevalence of Actually True Propositions

The first step is to look at how often propositions are actually true. Let $R$ denote the odds that a proposition is actually true (an odds, not an odds ratio): \[R = \frac{P(AT)}{P(AF)} \]

$R$ is the pre-study odds: the ratio of true to false propositions in a circumscribed field. Since $P(AT) + P(AF) = 1$, we can then derive:

\begin{align} P(AT) &= \frac{P(AT)}{P(AT)+P(AF)} = \frac{R}{R+1} \\
P(AF) &= \frac{P(AF)}{P(AT)+P(AF)} = \frac{1}{R+1} \end{align}

Using these results and the definitions of type I and II errors, we can complete the confusion matrix:

3.3 Confusion Matrix

A confusion matrix is a 2x2 table that records the probability of true positives (top left), false positives (top right), false negatives (bottom left), and true negatives (bottom right). Each square is a joint probability, which can be decomposed into a product of known values by the product rule, e.g. $P(CT, AT) = P(CT \,|\, AT)\,P(AT)$.

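Confusion Matrix 1:

\[ \begin{array}{c|cc} & AT & AF \\ \hline
CT & P(CT \,|\, AT)P(AT) = \frac{(1-\beta)R}{R+1} & P(CT \,|\, AF)P(AF) = \frac{\alpha}{R+1} \\
CF & P(CF \,|\, AT)P(AT) = \frac{\beta R}{R+1} & P(CF \,|\, AF)P(AF) = \frac{1-\alpha}{R+1} \end{array} \]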
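As a numeric check on the matrix, here is a minimal sketch that fills in the four cells for one hypothetical scenario (R = 0.25, power = 0.8, α = 0.05). It assumes rows index the claim (CT, CF) and columns index the truth (AT, AF), matching the layout described above. The function name `confusion_matrix` and the example values are illustrative.

```python
def confusion_matrix(R, power, alpha=0.05):
    """Joint probabilities P(claim, truth) keyed by (claim, truth) labels."""
    p_true = R / (R + 1.0)    # P(AT)
    p_false = 1.0 / (R + 1.0)  # P(AF)
    beta = 1.0 - power
    return {
        ("CT", "AT"): (1.0 - beta) * p_true,    # true positive
        ("CT", "AF"): alpha * p_false,           # false positive
        ("CF", "AT"): beta * p_true,             # false negative
        ("CF", "AF"): (1.0 - alpha) * p_false,   # true negative
    }


cells = confusion_matrix(R=0.25, power=0.80)
for key, prob in cells.items():
    print(key, f"{prob:.3f}")

# The four cells partition all outcomes, so they should sum to 1.
print("total:", f"{sum(cells.values()):.3f}")

# PPV is the top-left cell divided by the sum of the top row.
top_row = cells[("CT", "AT")] + cells[("CT", "AF")]
print("PPV:", f"{cells[('CT', 'AT')] / top_row:.3f}")
```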

3.4 Positive Predictive Value

To determine whether “most published research findings are false,” we can evaluate the positive predictive value (PPV) by dividing the value in the top left corner by the sum of the top row.

\begin{align} PPV &= P(AT \,|\, CT) = \frac{P(CT, AT)}{P(CT)} \\
&= \frac{P(CT, AT)}{P(CT, AT) + P(CT, AF)} \\
&= \fbox{\(\frac{R(1-\beta)}{R(1-\beta) + \alpha} \)} \end{align}

Thus, PPV is a function of the pre-study odds (R), the power (1-β), and the significance level (α).

4. Derivation of PPV Model Corrections

4.1 α and β Corrections for Bias ($u$)

Ioannidis investigates the impact of bias as it influences α and β. Here, he introduces a new term $u$, the fraction of probed analyses that should not have been findings, but end up reported as such. In other words, $u$ is the fraction of propositions in CF that become CT.

Let the subscript $[\,]_u$ indicate the value of $[\,]$ after considering bias.

\[ u = \frac{CF_{changed}}{CF_{original}} = \frac{P(CF)-P(CF_u)}{P(CF)} = 1 - \frac{P(CF_u)}{P(CF)} \]

We want to know how α and β are affected by bias. First, we derive a few relations:

\begin{align} P(CF_u) &= (1-u)P(CF) \\
P(CT_u) &= 1- P(CF_u) \\
&= 1 - (1-u)(1-P(CT)) \\
&= u + (1-u)P(CT) \end{align}

Assuming bias acts the same way whether or not the proposition is actually true, these relations also hold conditional on AT or AF. Using the above relations:

\begin{align} \beta_u &= P(CF_u \,|\, AT) \\
&= (1-u) P(CF \,|\, AT) \\
&= \fbox{\( (1-u) \beta \)} \\
\alpha_u &= P(CT_u \,|\, AF) \\
&= u + (1-u) P(CT \,|\, AF) \\
&= \fbox{\( u + (1-u)\alpha \)} \end{align}
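One way to sanity-check these two results is a small Monte Carlo simulation. The sketch below assumes bias works exactly as defined above: each proposition that would have been claimed false is flipped to claimed true with probability u, regardless of its actual truth. The function name `simulate_bias`, the parameter values, and the fixed seed are all illustrative choices.

```python
import random


def simulate_bias(alpha, beta, u, R=1.0, trials=200_000, seed=0):
    """Estimate (alpha_u, beta_u) by simulating claims and then applying bias."""
    rng = random.Random(seed)
    p_true = R / (R + 1.0)
    n_true = n_false = 0
    false_claimed = true_missed = 0
    for _ in range(trials):
        actually_true = rng.random() < p_true
        # Unbiased study outcome: power if true, type I error rate if false.
        claimed = rng.random() < ((1.0 - beta) if actually_true else alpha)
        # Bias: a "claimed false" result is reported as a finding with probability u.
        if not claimed and rng.random() < u:
            claimed = True
        if actually_true:
            n_true += 1
            true_missed += not claimed
        else:
            n_false += 1
            false_claimed += claimed
    return false_claimed / n_false, true_missed / n_true


alpha, beta, u = 0.05, 0.20, 0.30
alpha_hat, beta_hat = simulate_bias(alpha, beta, u)
print(f"alpha_u: simulated {alpha_hat:.3f}, formula {u + (1 - u) * alpha:.3f}")
print(f"beta_u:  simulated {beta_hat:.3f}, formula {(1 - u) * beta:.3f}")
```

The simulated rates should land close to the boxed formulas, with small differences from sampling noise.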

4.2 α and β Correction for Multiple Studies ($n$)

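Let $n$ indicate the number of studies, and the subscript $[\,]_n$ indicate the value of $[\,]$ after considering $n$ studies. Here $CF_n$ means that all $n$ studies claim the proposition is false, so $CT_n$ means that at least one study claims it is true.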

If all studies are independent:

\[ P(CF_n) = \prod_{i=1}^n P(CF_i) \]

If all studies are equally powered:

\[ P(CF_n) = P(CF)^n \]

Using the above relation (under assumptions of independence and equal power):

\begin{align} \beta_n &= P(CF_n \,|\, AT) \\
&= P(CF \,|\, AT)^n \\
&= \fbox{\( \beta^n \)} \\
\alpha_n &= P(CT_n \,|\, AF) \\
&= 1 - P(CF_n \,|\, AF) \\
&= 1 - P(CF \,|\, AF)^n \\
&= \fbox{\( 1-(1-\alpha)^n \)} \end{align}
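To see how quickly multiplicity inflates the type I error, here is a short sketch of the two boxed results under the same assumptions (independent, equally powered studies). The helper `studies_until_false_positive_likely` is a hypothetical name for a loop that finds the smallest n at which a false proposition is more likely than not to be claimed true by at least one study.

```python
def alpha_n(alpha, n):
    """Chance that at least one of n independent studies claims a false proposition is true."""
    return 1.0 - (1.0 - alpha) ** n


def beta_n(beta, n):
    """Chance that all n independent studies miss a true proposition."""
    return beta ** n


def studies_until_false_positive_likely(alpha, threshold=0.5):
    """Smallest n for which alpha_n exceeds the threshold."""
    if not 0.0 < alpha < 1.0 or not 0.0 <= threshold < 1.0:
        raise ValueError("alpha must be in (0, 1) and threshold in [0, 1)")
    n = 1
    while alpha_n(alpha, n) <= threshold:
        n += 1
    return n


for n in (1, 5, 10, 20):
    print(f"n={n:<2d}  alpha_n={alpha_n(0.05, n):.3f}  beta_n={beta_n(0.20, n):.6f}")

print(studies_until_false_positive_likely(0.05))
```

With α = 0.05, the last line prints 14: once 14 independent teams test the same false proposition, at least one spurious "finding" is more likely than not.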

4.3 Joint α and β Corrections

Although Ioannidis only considers bias and multiple studies one at a time, it is simple to extend his model to consider both simultaneously.

Let the subscript $[\,]_{u,n}$ indicate the value of $[\,]$ after correcting for bias and multiple studies. Since biases are study-specific, they are accounted for before aggregating.

If power and bias are the same across all studies:

\begin{align} \beta_{u,n} &= \left[ (1-u)\beta \right]^n \\
&= \fbox{\( (1-u)^n \beta^n \)} \\
\alpha_{u,n} &= 1 - \left( 1-[u + (1-u) \alpha] \right)^n \\
&= \fbox{\( 1 - (1-u)^n(1-\alpha)^n \)} \end{align}

These are the modified α and β values that I listed at the beginning. If power and bias differ between studies, we must index them individually:

\begin{align} \beta_{u,n} &= \prod_{i=1}^n (1-u_i) \beta_i \\
\alpha_{u,n} &= 1 - \prod_{i=1}^n (1-u_i)(1-\alpha_i) \end{align}
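The indexed products translate almost directly into code. Below is a minimal sketch, assuming independent studies described by per-study lists of α, β, and u. The function names `joint_corrections` and `ppv_from_corrections` and the example numbers are hypothetical.

```python
import math


def joint_corrections(alphas, betas, biases):
    """Return (alpha_un, beta_un) for independent studies with per-study alpha, beta, u."""
    if not (len(alphas) == len(betas) == len(biases)):
        raise ValueError("alphas, betas, and biases must have the same length")
    beta_un = math.prod((1.0 - u) * b for u, b in zip(biases, betas))
    alpha_un = 1.0 - math.prod((1.0 - u) * (1.0 - a) for u, a in zip(biases, alphas))
    return alpha_un, beta_un


def ppv_from_corrections(R, alpha_c, beta_c):
    """Basic PPV formula evaluated with corrected alpha and beta."""
    return R * (1.0 - beta_c) / (R * (1.0 - beta_c) + alpha_c)


# Three hypothetical studies of the same question with different power and bias.
alphas = [0.05, 0.05, 0.05]
betas = [0.20, 0.50, 0.80]
biases = [0.10, 0.20, 0.30]

alpha_un, beta_un = joint_corrections(alphas, betas, biases)
print(f"alpha_un={alpha_un:.4f}  beta_un={beta_un:.4f}")
print(f"PPV at R=0.2: {ppv_from_corrections(0.2, alpha_un, beta_un):.4f}")
```

When every study shares the same α, β, and u, the products collapse to the closed-form expressions boxed above.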

5. Citation

All derivations are based on the paper Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. http://doi.org/10.1371/journal.pmed.0020124
