When Things Aren’t What They Seem: Simpson’s Paradox

Also known as “Simpson’s Bias,” this statistical pitfall can lead to mistakes in everything from policy to medical decisions when data are examined only in aggregate.

René F. Najera, MPH, DrPH

--

Aggregated data can tell a different story than the same data broken down by groups.

Simpson’s bias is a statistical paradox described by Edward H. Simpson in a 1951 paper titled “The Interpretation of Interaction in Contingency Tables.” It demonstrates how aggregated data can mask what is happening within subgroups: a relationship observed in each of several groups disappears or reverses direction when the groups are combined. The cause is a lurking variable, one that is related to both the grouping and the outcome but is not accounted for in the aggregated data.
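To make the reversal concrete, here is a minimal sketch in Python. The counts are hypothetical, invented purely for illustration, as are the names (treatments "A" and "B", subgroups "mild" and "severe"). Case severity plays the role of the lurking variable: treatment A is given mostly to severe cases, and treatment B mostly to mild ones.

```python
# Hypothetical counts, invented for illustration only:
# (successes, total) for each treatment within each subgroup.
counts = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}


def subgroup_rates(counts):
    """Success rate of each treatment within each subgroup."""
    return {
        group: {t: s / n for t, (s, n) in by_treatment.items()}
        for group, by_treatment in counts.items()
    }


def aggregated_rates(counts):
    """Success rate of each treatment after pooling all subgroups."""
    totals = {}
    for by_treatment in counts.values():
        for t, (s, n) in by_treatment.items():
            prev_s, prev_n = totals.get(t, (0, 0))
            totals[t] = (prev_s + s, prev_n + n)
    return {t: s / n for t, (s, n) in totals.items()}


by_group = subgroup_rates(counts)
overall = aggregated_rates(counts)

for group, rates in by_group.items():
    print(f"{group}: A = {rates['A']:.1%}, B = {rates['B']:.1%}")
print(f"overall: A = {overall['A']:.1%}, B = {overall['B']:.1%}")
```

With these made-up numbers, A has the higher success rate in both subgroups (90% versus 80% among mild cases, 30% versus 20% among severe cases), yet B looks far better once the groups are pooled (roughly 35% for A versus roughly 75% for B). Nothing about the treatments changed; only the mix of mild and severe cases behind each pooled rate did.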

Origins and History

The paradox is named after British statistician Edward H. Simpson, who detailed the effect in his 1951 paper. The phenomenon had been noticed earlier, however. Karl Pearson, another statistician, encountered a similar issue in 1899, which means the paradox has been challenging researchers across various fields for over a century.

Application in Everyday Decisions

Simpson’s bias can significantly affect everyday decisions, especially in scenarios where decisions are made based on…

--

René F. Najera, MPH, DrPH

DrPH in Epidemiology. Associate/JHBSPH. Adjunct/GMU. Epidemiologist. Father. Husband. (He/Him/His/El)