Who Will Win?

Today is election day. The big questions everyone is asking are: who will be the next leader of the free world, and who called it?

Nate Silver made his 538 website a household name in the wake of the 2008 Presidential election, for which his model correctly predicted the winner in 49 of 50 states. Silver's approach was to disdain the "hot takes" of pundits and to rely instead on complicated algorithms that crunch hard poll numbers. In particular, the 538 model uses national data to compensate for missing state polls, and also accounts for correlations between states.


While Silver's reputation continues to be sterling, he has caused some consternation among the Hillary faithful during the home stretch of the election by projecting much higher chances for Trump (above 1 in 3 for a while) than other sites that have cropped up in the past few election cycles. For example, the New York Times forecast gives Trump a 16% chance on election day.


Some sites were even more sanguine about Clinton's chances.


In fact, the battle-lines have shifted from Silver vs. Taleb (as I wrote about here), to Silver vs. Sam Wang and the Huffington Post. Silver is highly skeptical, based on the possibility of polling error, that anyone can say that a Clinton victory is more than 98% likely.


A (very) naive approach that treats all states as independent leads to a lot of false certainty: multiplying the win probabilities of every state Trump needs leaves him only a minuscule path to victory. However, we know that states are not independent spins; news that flips one state is likely to have an effect on other states as well. Nevertheless, if the lead is big enough, sites like the Huffington Post that assume relatively weak correlations give Clinton a 98% slam dunk.

Conversely, my feeling is that the gears and levers under the hood of 538's model have created such a non-linear system that even a modest margin of error – and especially an allowance for systematic polling bias – can tip an almost-certainty into serious doubt. Remember, 538 runs nationwide simulations in which state-to-state correlations are highly relevant, on top of using national polls to adjust state numbers. This magnifies uncertainty in a close race, as states hover on the edge of being flipped.
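
To see why the correlation assumption matters so much, here is a minimal Monte Carlo sketch – my own toy, not 538's actual model, with invented electoral-vote counts and polling leads. Each simulation draws one shared national polling error (the correlated part) plus independent state-level noise; the exact numbers are meaningless, but the direction of the effect is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical battlegrounds: (electoral votes, Clinton polling lead in points).
# All numbers are invented for illustration.
battlegrounds = {
    "FL": (29, 1.0), "PA": (20, 2.5), "OH": (18, -1.0), "MI": (16, 3.5),
    "NC": (15, 0.5), "WI": (10, 5.0), "NH": (4, 3.0),
}
SAFE_CLINTON_EV = 217  # electoral votes assumed not in play (also invented)

def clinton_win_prob(national_sd, state_sd=3.0, n_sims=50_000):
    """Estimate P(Clinton reaches 270 electoral votes) by simulation."""
    wins = 0
    for _ in range(n_sims):
        # One error shared by every state (correlated), plus per-state noise.
        national_error = rng.normal(0.0, national_sd) if national_sd > 0 else 0.0
        ev = SAFE_CLINTON_EV
        for votes, lead in battlegrounds.values():
            if lead + national_error + rng.normal(0.0, state_sd) > 0:
                ev += votes
        wins += ev >= 270
    return wins / n_sims

print("states independent        :", clinton_win_prob(national_sd=0.0))
print("states share a poll error :", clinton_win_prob(national_sd=2.5))
# With the shared term, battlegrounds tend to flip together, so a modest
# systematic polling miss can erase a comfortable-looking lead and pull the
# favorite's probability well away from near-certainty.
```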

So who is right? From a philosophical point of view, we may never be able to know the "correct" probability of a one-time event, which depends so much on the assumptions used and cannot be repeated to build up statistics. However, if a model predicts many events – like state-by-state results – Jordan Ellenberg suggests the Brier score may be of use in evaluating its performance. The formula is:

BS = (1/N) Σ (f_t − o_t)²,

where f_t is the forecast probability for event t, o_t is 1 if the event occurred and 0 if it did not, and the sum runs over all N forecasts.

The score rewards both accuracy and risk-taking (lower scores are better). From Wikipedia:

Suppose that one is forecasting the probability P that it will rain on a given day. Then the Brier score is calculated as follows:

  • If the forecast is 100% (P = 1) and it rains, then the Brier Score is 0, the best score achievable.
  • If the forecast is 100% and it does not rain, then the Brier Score is 1, the worst score achievable.
  • If the forecast is 70% (P=0.70) and it rains, then the Brier Score is (0.70-1)² = 0.09.
  • If the forecast is 30% (P=0.30) and it rains, then the Brier Score is (0.30-1)² = 0.49.
  • If the forecast is 50% (P=0.50), then the Brier score is (0.50-1)² = (0.50-0)² = 0.25, regardless of whether it rains.

Sticking with a maximally uninformed prior (50/50) always gives the same score. To improve on that baseline, a model has to both get it right and assign useful probabilities. Notice that even a perfect model will be wrong a lot of the time. For example, an event that is correctly predicted to have a 15% chance will still happen… 15% of the time. The quality of a model can thus only be assessed over many predictions.
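
For concreteness, here is a minimal sketch of how such scoring works; the forecast probabilities and outcomes below are invented, not real state calls.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts: probability that candidate A wins each state,
# and 1 if A actually won it, 0 otherwise.
forecasts = [0.95, 0.80, 0.60, 0.35, 0.15]
outcomes = [1, 1, 0, 1, 0]

print("informative forecaster:", brier_score(forecasts, outcomes))
print("coin-flip forecaster  :", brier_score([0.5] * len(outcomes), outcomes))  # always 0.25
```

Even though the informative forecaster "missed" two of the five calls, its score beats the 0.25 coin-flip baseline because its confident calls were right.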

Curse of the Butterfly

Butterflies are already under suspicion for many catastrophes, as in: "Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?" This so-called butterfly effect, in which the final outcome is very sensitive to the exact initial conditions, is one of the basic findings of chaos theory. We may never know which butterfly to blame for Hurricane Matthew, but the storm sent me on an unexpected trip from South Florida to Tallahassee.


While visiting the museum inside Florida's old capitol, I came upon the exhibit for the 2000 presidential election, which I remember distinctly, since it was the first I was old enough to vote in. Of course, the exhibit put the most pro-Florida spin possible on the debacle that demonstrated "the resilience of our democracy," including the fact that the state Supreme Court ordered a full recount, only to be overruled by the US Supreme Court. The margin of victory either way was so razor-thin that it was certainly swamped by the margin of error. Mathematician John Allen Paulos suggested that flipping a coin would be a fairer way to resolve the election than trying to measure "bacteria with a yardstick." Yet the need for a definitive answer caused us to scrutinize "hanging chads." Multiple factors converged to swing the election, but primarily a kind of butterfly was blamed: the "butterfly ballot."


The museum wisely placed it behind protective glass, lest it be attacked by vengeful Gore supporters.


The irony was that the whole purpose of the unconventional ballot was to help older voters by increasing the size of the font. Combined with the many ballot issues that year, this led to the decision to put candidate names on both facing pages. However, the layout confused some voters who wanted to vote for Gore but mistakenly punched the second hole from the top, registering a vote for Pat Buchanan. While we can't know for sure, there is strong evidence that this factor alone cost Gore thousands of votes, certainly many, many more than 537, the ultimate official margin of victory (2,912,790 to 2,912,253). During the weeks of uncertainty, many Bush voters wondered how anyone could miss the obvious, large arrows that clearly mark the proper hole. The curse of knowledge was at work: once you know the right answer, it is hard to put yourself back in the mindset of not knowing and imagine bumbling it. However, it is very plausible that someone seeing the ballot for the first time, and told to adhere to the five-minute time limit, just saw the left page and punched accordingly.

Says Who?

Few things are more exciting to me than an intellectual throw-down between minds I respect on topics of fundamental importance. Recently, Nate Silver, who makes his living building prediction models, came under fire from Nassim Nicholas Taleb, who makes his living breaking them down. At issue was the foundational epistemological question of how much we really know, or rather, how much confidence we may place in what we think we know. Taleb – who is famous for his books Fooled by Randomness, The Black Swan, and Antifragile – lives by the watchword "incerto," which means we should be much more circumspect about what we think we understand. He popularized the concept of a "Black Swan," a momentous event that could not have been anticipated based on all previous observations. Consistent with this, Taleb has successfully implemented strategies that benefit from extreme, unexpected events that have previously been undervalued. Taleb has long inveighed against the widespread misuse of Gaussian distributions, especially those implicated in the 2008 financial crisis. Nate Silver's models are much more sophisticated, so I was intrigued when Taleb came after them.

The basic idea is that the prediction probabilities at 538 for each candidate to win are too volatile to be right. Instead, Taleb suggested a model based on his specialty, pricing options.


The result is a much more parsimonious model, in which the chances are stuck at 50/50 until right before election day, at which time they jump to near certainty.

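A toy version of that intuition – my own sketch, not Taleb's actual derivation, which treats the forecast like a rigorously priced binary option – models the final margin as today's lead plus random drift over the remaining days. The more drift remains, the closer the implied probability sits to 50/50; it only commits as election day approaches.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binary_forecast(lead, daily_vol, days_left):
    """Win probability if the final margin is lead + Normal(0, daily_vol * sqrt(days_left)).

    Same shape as a cash-or-nothing option price: it reflects both the current
    lead and how much randomness is still to come.
    """
    if days_left <= 0:
        return 1.0 if lead > 0 else 0.0
    return norm_cdf(lead / (daily_vol * sqrt(days_left)))

# The same 2-point lead implies very different probabilities depending on time left.
# (daily_vol = 0.5 points per sqrt-day is an arbitrary, illustrative choice.)
for days in (180, 90, 30, 7, 1):
    print(f"{days:3d} days out -> {binary_forecast(2.0, daily_vol=0.5, days_left=days):.3f}")
```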

Silver's podcast riposte was swift and harsh, saying that when properly modeled – particularly by correctly accounting for the correlations between states – poll data are actually very good predictors of the eventual election result. Especially after the conventions, the probability for a candidate to win can be forecast with at least some confidence.

This debate reminded me of Sean Carroll’s new book, The Big Picture. In it, he makes the astounding claim that the “Core Theory” of physics, which includes quantum field theory and relativity, can explain every experiment ever performed on Earth.

The reason for such incredible predictive power is that quantum field theory itself provides a recipe for including progressively smaller correction terms, which converge to a very good degree of accuracy. Carroll says that physics is “simple” compared with other fields, like biology or economics, in that reductionism works fantastically well, and every particle, and interaction between them, can be understood separate from the rest of the Universe. This is why the dimensionless magnetic moment of the electron is known to an uncertainty of a few parts per trillion.

This is, of course, not to say that we have (or will ever have) the functional omniscience of Laplace's Demon, who could turn perfect knowledge of the laws of nature and the configuration of all particles at one time into a perfect prediction of their positions at every other time. The wild undulations of chaos theory, and the inherent uncertainty in quantum mechanics, preclude this. So perhaps the best approach is the one Carroll himself takes in "The Big Picture" – a Bayesian framework in which we are aware of the credences we apply to various propositions and work to keep them up to date as new information becomes available.
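
In code, that bookkeeping is just Bayes' rule applied over and over to whatever credence you currently hold; the sketch below is generic, not anything specific from Carroll's book, and the numbers are made up.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior credence in a proposition after observing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_false)

# Start 60% confident in a proposition, then observe evidence that is twice as
# likely if the proposition is true as if it is false.
credence = 0.6
credence = bayes_update(credence, p_evidence_if_true=0.8, p_evidence_if_false=0.4)
print(round(credence, 3))  # 0.75
```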

Who ‘ya gonna call? Scientists!

A big change has occurred in the way our heroes are portrayed in fiction.

A good example is the 1984 film Ghostbusters and its recent update.

In addition to the obvious gender change, a more substantive alteration to the characters is their profession: in the original they are parapsychologists, but now they are physicists (or engineers, plus a history buff). The main tension throughout the film is their effort to convince the world that the study of ghosts is real "science." Thanks to the MIT consultants, the movie is filled with real quantum mechanics equations.


The science doesn't end there. The proton packs are explained to be mini particle accelerators, complete with "quadrupole" superconducting magnets. As in the original, ghosts are sorted into "classes," but now they are summoned with blue glowing devices instead of rituals performed by demonic demigods.

I see a larger trend of "Scientization" of the paranormal in fiction. That is, heroes combat "the unknown" with science. Ghosts are so fearsome exactly because they appear to represent "cosmic horror" that is dangerous and beyond human understanding. But with science, they are just another, albeit hazardous, phenomenon to poke, prod, and categorize.

Do not all charms fly
At the mere touch of cold philosophy?
There was an awful rainbow once in heaven:
We know her woof, her texture; she is given
In the dull catalogue of common things.
Philosophy will clip an Angel’s wings,
Conquer all mysteries by rule and line,
Empty the haunted air, and gnomèd mine—
Unweave a rainbow, as it erewhile made
The tender-person’d Lamia melt into a shade.

-“Lamia” by John Keats

This trend shows up in other media. In Rick and Morty, even the Devil is bested by science.

And the whole premise of Gravity Falls, a very binge-worthy show, is the rational study of the “weirdness” surrounding the Mystery Shack.

In fact, in the very first minute of the series, we get the introduction:

“My name is Dipper. The girl about to puke is my sister Mabel. You may be wondering what we’re doing in a golf cart, fleeing from a creature of unimaginable horror…Rest assured, there’s a perfectly logical explanation.”

Once again, a "cosmic horror" is cut down to manageable size, and ultimately defeated, with some rational thinking. Does this reflect our collective expectations, now that we have battled real-life problems like germs, famine, and natural disasters with science?

STEM-Powered

I recently visited the Oregon Museum of Science and Industry in Portland, with an emphasis on "industry." Housed in a former power plant, the museum offers many opportunities for visitors to actually build things. From 3D printers to shake tables for earthquake-testing block towers, there was plenty for young makers to enjoy.

Compared with my recollections of (many, many) hours spent in science museums during my youth, I felt that there was a greatly reduced emphasis on simply observing, as opposed to doing.

Perhaps in an era of instant access to YouTube, simply watching a video or demonstration of a scientific principle is no longer a compelling enough reason to make the trip. Since knowledge is freely available, what is valuable is the ability to create something new. The museum gift shop offered tools for budding programmers to learn to code.

Cavalier Predictions

As a native Ohioan, I was thrilled to see the Cleveland Cavaliers complete their remarkable Championship season, ending a 52-year drought for the city.

However, at least one person was somewhat less enthused that Cleveland came out on top in 2016.


This is not to pick on one pundit in particular; the track records of sports (and politics) prognosticators are rarely checked, but it should be obvious that a history of accuracy is not a strict requirement to be invited back on TV. To be generous, it may be that the point of making predictions is to summarize one's current state of belief, in a Bayesian sense. In other words, listeners should hear "pundit X believes, based on all evidence currently available, that outcome Y is the most likely." Unfortunately, the process of computing this belief is almost always done in an unsystematic manner in the pundit's head, and is therefore subject to numerous cognitive biases, especially the availability heuristic, the recency illusion, and the subadditivity bias.

When election guru Nate Silver apologized for being wrong about the GOP primaries, he did so for acting like “a pundit” – almost all of whom, by the way, were also wrong – instead of doing the “data journalism” for which he is famous. He wrote (emphasis added):

The big mistake is a curious one for a website that focuses on statistics. Unlike virtually every other forecast we publish at FiveThirtyEight — including the primary and caucus projections I just mentioned — our early estimates of Trump's chances weren't based on a statistical model. Instead, they were what we called "subjective odds" — which is to say, educated guesses. In other words, we were basically acting like pundits, but attaching numbers to our estimates. And we succumbed to some of the same biases that pundits often suffer, such as not changing our minds quickly enough in the face of new evidence. Without a model as a fortification, we found ourselves rambling around the countryside like all the other pundit-barbarians, randomly setting fire to things.

So the main "sin" was putting numbers to a guess, which, unlike other 538 predictions, was not based on an actual analysis. Some believe that a way to improve the quality of predictions is to have people put money on the line, as in prediction markets. The idea is that the profit motive will help eliminate inefficiencies, but this is not certain. For example, consider the UK referendum (which has the charming portmanteau "Brexit") occurring this week on whether to "Remain" part of the European Union or "Leave." Prediction markets today have "Remain" as a 3-to-1 favorite, despite the fact that poll results have gone back and forth and currently rest almost evenly split. Perhaps some punters believe that voters may flirt with leaving, but be blocked by their better judgement at the last moment, as with the Scotland referendum. Or, as King James would put it: