Damn Lies and Statistics

What never ceases to amaze me is how incredibly good the human mind is at some tasks (like divining the state of other human minds, a kind of “social superpower”) and yet so bad at others, like probability and statistics. It’s not just that our intuition is sometimes unable to answer questions about risk and chance; often, our intuition is indignantly screaming the wrong answer. Examples are plentiful: the Monty Hall Problem, Simpson’s Paradox, Bertrand’s Box, the Shared Birthday Problem, and Wason’s Selection Task, among many others. In each case, the most common reaction must be corrected with a lengthy logical explanation; see Daniel Kahneman’s System 1 and System 2. Since thinking is costly, the mind gets by on heuristics, which are quick and dirty and often mislead.

A recent post had a great example of one of the problems that can be caused by selection bias, in which the sample is not truly random, since there is some “unobserved variable” wreaking havoc. That is, if you only look at a subset of a population that was selected because some combination of two variables cleared a bar, it will look like there is a negative correlation between the variables, even if in reality they are totally independent. This is sometimes called Berkson’s paradox.

See here for one graph that explains everything:

This data was generated so that looks and smarts are totally independent. To be selected as an actor, however, the sum of looks and smarts must exceed a certain minimum threshold. Since it is much less likely to have both superstar looks and genius-level intelligence simultaneously than to have one or the other, if you only look at the subset of employed actors and ignore the overall population they were selected from, you would erroneously conclude that there is a negative correlation between smarts and looks.
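If you would rather see the effect in numbers than take it on faith, here is a minimal sketch of the same setup. It is not the code behind the original graph; the standard-normal scores, the threshold of 1.5, and the helper names `pearson` and `simulate` are all my own assumptions for illustration. It draws looks and smarts independently, keeps only the people whose sum clears the threshold, and compares the correlation in the full population with the correlation among the selected “actors”.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, threshold=1.5, seed=0):
    """Return (r_population, r_selected, number_selected).

    Assumption: looks and smarts are independent standard-normal scores,
    and a person is "selected" when looks + smarts exceeds `threshold`.
    """
    rng = random.Random(seed)
    # Each person gets two independent draws: (looks, smarts).
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Selection depends on the sum of the two traits, as in the post.
    selected = [(looks, smarts) for looks, smarts in population
                if looks + smarts > threshold]
    r_population = pearson(*zip(*population))
    r_selected = pearson(*zip(*selected))
    return r_population, r_selected, len(selected)


if __name__ == "__main__":
    r_all, r_sel, k = simulate()
    print(f"whole population: r = {r_all:+.3f}")
    print(f"selected ({k} people): r = {r_sel:+.3f}")
```

Under these assumptions, the population correlation should hover near zero while the correlation among the selected group comes out clearly negative, even though nothing links the two traits. Raising the threshold makes the selection stricter and, in this setup, the spurious negative correlation stronger.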

Author: lnemzer

Associate Professor Nova Southeastern University