Today is election day. The big questions everyone is asking are: who will be the next leader of the free world, and who called it?
Nate Silver made his 538 website a household name in the wake of the 2008 Presidential election, in which his model correctly called 49 of the 50 states. Silver’s approach was to disdain the “hot takes” of pundits and rely instead on complicated algorithms that crunched hard poll numbers. In particular, the 538 model uses national data to compensate for missing state polls, and also accounts for correlations between states.
While Silver’s reputation remains sterling, he has caused some consternation among the Hillary faithful during the home stretch of the election by projecting much higher chances for Trump (above 1 in 3 for a while) than other sites that have cropped up in the past few election cycles. For example, the New York Times forecast gives Trump a 16% chance on election day.
Some sites are even more sanguine about Clinton’s chances.
In fact, the battle lines have shifted from Silver vs. Taleb (as I wrote about previously) to Silver vs. Sam Wang and the Huffington Post. Citing the possibility of polling error, Silver is highly skeptical that anyone can justifiably put the odds of a Clinton victory above 98%.
A (very) naive approach that treats all states as independent leads to a lot of false certainty: multiply together the state-by-state probabilities of every win Trump needs, and his path to victory shrinks to almost nothing. However, we know that states are not independent spins. News that flips one state is likely to move other states as well. Nevertheless, if the lead is big enough, sites like the Huffington Post that assume relatively weak correlations give Clinton a 98% slam dunk.
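To make the independence point concrete, here is a minimal Monte Carlo sketch. It is a toy, not 538’s or the Huffington Post’s actual model: the five states, Clinton’s leads, and the 4-point polling error are made-up assumptions, and the hypothetical `trump_win_prob` function simply splits each state’s error into one national swing shared by every state plus an independent state-specific term. Its `national_share` parameter sets how much of the error variance is shared; 0 means fully independent states.

```python
import math
import random
from statistics import NormalDist

# Hypothetical toy map (made-up numbers): Trump must flip ALL of these states.
# Each value is Clinton's polling lead in points.
CLINTON_LEADS = [3.0, 3.5, 4.0, 4.5, 5.0]
TOTAL_SD = 4.0  # assumed total polling error per state, in points


def independent_exact() -> float:
    """Closed-form answer for independent states: multiply the per-state odds."""
    return math.prod(NormalDist(lead, TOTAL_SD).cdf(0.0) for lead in CLINTON_LEADS)


def trump_win_prob(national_share: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Trump flips every state).

    national_share is the fraction of each state's error variance that comes
    from a single national swing shared by all states (0.0 = independent).
    """
    rng = random.Random(seed)
    national_sd = TOTAL_SD * math.sqrt(national_share)
    state_sd = TOTAL_SD * math.sqrt(1.0 - national_share)
    wins = 0
    for _ in range(trials):
        swing = rng.gauss(0.0, national_sd)  # same shift applied to every state
        # Trump flips a state when Clinton's lead plus all errors drops below zero.
        if all(lead + swing + rng.gauss(0.0, state_sd) < 0 for lead in CLINTON_LEADS):
            wins += 1
    return wins / trials


print(f"independent, exact: {independent_exact():.5f}")
for share in (0.0, 0.5, 0.75, 1.0):
    print(f"national_share={share:.2f}: {trump_win_prob(share):.4f}")
```

With these made-up inputs, fully independent states leave Trump roughly a 1-in-10,000 chance, a mostly shared error (`national_share=0.75`) lifts that to a few percent, and a fully shared error to roughly 1 in 10. The specific numbers mean nothing; the orders-of-magnitude gap is the point.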
By contrast, my sense is that the gears and levers under the hood of 538’s model create such a non-linear system that even a modest margin of error, and especially any allowance for a systematic polling bias, can tip near-certainty into serious doubt. Remember, 538 runs nationwide simulations in which state-to-state correlations are highly relevant, on top of using national polls to adjust state numbers. This magnifies uncertainty in a close race, as several states hover on the edge of being flipped.
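The effect of allowing for systematic bias can be sketched with a single tipping-point state. This is a deliberate simplification, not how 538 works: it assumes the decisive margin is normally distributed around the leader’s polling lead, and treats uncertainty about a bias shared by all polls as an extra, independent error term whose variance adds to ordinary polling error. The function name `upset_probability`, the 5-point lead, and the 2-point polling error are hypothetical.

```python
from statistics import NormalDist


def upset_probability(lead: float, poll_sd: float, bias_sd: float = 0.0) -> float:
    """P(the trailing candidate wins) when the decisive margin is Normal.

    lead    -- leader's polling margin in the tipping-point state (points)
    poll_sd -- ordinary polling error (points)
    bias_sd -- extra uncertainty about a systematic bias shared by all polls
    """
    # Independent error sources add in variance, not in standard deviation.
    total_sd = (poll_sd ** 2 + bias_sd ** 2) ** 0.5
    # The upset happens when the realized margin falls below zero.
    return NormalDist(mu=lead, sigma=total_sd).cdf(0.0)


for bias_sd in (0.0, 1.0, 2.0, 3.0):
    chance = upset_probability(5.0, 2.0, bias_sd)
    print(f"bias_sd={bias_sd:.0f} points: upset chance {chance:.1%}")
```

Under these assumed numbers, the upset chance moves from under 1% with no allowance for bias to roughly 8% once the possible bias is given a 3-point standard deviation: a “sure thing” becomes something a careful forecaster has to take seriously.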
So who is right? From a philosophical point of view, we may never be able to know the “correct” probability of a one-time event, which depends so much on the assumptions used and cannot be repeated to build up statistics. However, if a model predicts many events – like state-by-state results – Jordan Ellenberg suggests the Brier score may be of use in evaluating its performance. For N forecasts, where f_t is the forecast probability of event t and o_t is 1 if the event happened and 0 if it did not, the formula is:

$$BS = \frac{1}{N}\sum_{t=1}^{N}\left(f_t - o_t\right)^2$$

Lower is better, and the score rewards both accuracy and risk-taking. From Wikipedia:
Suppose that one is forecasting the probability P that it will rain on a given day. Then the Brier score is calculated as follows:
- If the forecast is 100% (P = 1) and it rains, then the Brier score is 0, the best score achievable.
- If the forecast is 100% and it does not rain, then the Brier score is 1, the worst score achievable.
- If the forecast is 70% (P = 0.70) and it rains, then the Brier score is (0.70 - 1)² = 0.09.
- If the forecast is 30% (P = 0.30) and it rains, then the Brier score is (0.30 - 1)² = 0.49.
- If the forecast is 50% (P = 0.50), then the Brier score is (0.50 - 1)² = (0.50 - 0)² = 0.25, regardless of whether it rains.
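A minimal sketch of the formula, checked against the rain examples above. The function name `brier_score` is our own; it just averages the squared gap between each forecast probability and the 0/1 outcome.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally long, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# The single-day rain examples (1 = it rains, 0 = it does not).
# round() hides floating-point noise such as 0.09000000000000002.
print(round(brier_score([1.0], [1]), 4))   # 0.0
print(round(brier_score([1.0], [0]), 4))   # 1.0
print(round(brier_score([0.7], [1]), 4))   # 0.09
print(round(brier_score([0.3], [1]), 4))   # 0.49
print(round(brier_score([0.5], [0]), 4))   # 0.25
```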
Sticking with a maximally uninformative prior (50/50) always gives the same score. To do better, a model has to get the direction right and also commit to useful, non-trivial probabilities. Notice that even a perfect model will be wrong a lot of the time. For example, an event that is correctly predicted to have a 15% chance will still happen… 15% of the time. The quality of the model can thus only be assessed over many predictions.
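A quick simulation illustrates why the score only means something over many predictions. It assumes hypothetical events whose true probabilities are drawn uniformly at random, then compares a forecaster who knows each true probability with one who always says 50/50. The calibrated forecaster should average about 1/6 (the expected value of p(1 - p) for uniform p), the 50/50 forecaster exactly 0.25, and events forecast at 10 to 20% should happen about 15% of the time.

```python
import random


def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def simulate(n_events: int = 100_000, seed: int = 1):
    rng = random.Random(seed)
    # Hypothetical events, each with a true probability drawn uniformly from [0, 1).
    true_probs = [rng.random() for _ in range(n_events)]
    outcomes = [1 if rng.random() < p else 0 for p in true_probs]

    calibrated = brier_score(true_probs, outcomes)       # knows every true probability
    coin_flip = brier_score([0.5] * n_events, outcomes)  # always says 50/50

    # Among events forecast at 10-20%, how often did they actually happen?
    bucket = [o for p, o in zip(true_probs, outcomes) if 0.10 <= p < 0.20]
    hit_rate = sum(bucket) / len(bucket)
    return calibrated, coin_flip, hit_rate


calibrated, coin_flip, hit_rate = simulate()
print(f"calibrated forecaster: {calibrated:.3f}")
print(f"50/50 forecaster:      {coin_flip:.3f}")
print(f"10-20% events happened {hit_rate:.1%} of the time")
```

Even the forecaster who knows the truth gets plenty of individual calls “wrong”; only the average over many events separates it from the 50/50 baseline.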