Lucky or Good

A fortunate bounce that goes the way of your favorite team may tempt you to say: “it’s better to be lucky than good.” But are we even able to distinguish between the two? Advanced sports analytics begins to help us disentangle the effects of luck and skill.

To avoid the philosophical questions and semantics involved, let’s simply define luck as something that is not expected to continue – teams experiencing a stretch of good or bad fortune soon regress to the mean – while skill persists over time. The law of large numbers tells us that, if we could watch as much sports as we wanted, all lucky or unlucky deviations would “wash out” on average, and we would know the true skill of a player or team. Of course, we don’t live in such a data paradise. A simple way to quantify future win probability is with the Pythagorean expectation. At the most fundamental level, the true skill of a team depends on the number of points it scores and allows. It is possible that there exists some special skill in eking out close games, but winning games is likely to be simply a matter of “bunching” points to the best effect. For example, the 1960 World Series is famous in that the Pirates won even though “the losing team scored more than twice as many runs as the winning team, as the Yankees won three blowout games (16–3, 10–0, and 12–0), while the Pirates won four close games (6–4, 3–2, 5–2, and 10–9).” Analogies to the electoral college are obvious. The Pythagorean win expectation formula is:

Win% = PF^k / (PF^k + PA^k)

where PF is points scored, PA is points allowed, and the exponent k depends on how much luck is involved. Larger values of k mean that the higher-“quality” team wins more often. As I wrote in a previous post, the chance that a game was won by the best team or the luckiest depends on the sport. The best-fit exponents for different sports have been calculated:

  • English Premier League : 1.3
  • NHL : 2.15
  • NFL : 2.37
  • NBA : 13.91
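As a sketch, the formula is a one-liner; the point totals below are made-up, purely to illustrate how the exponent changes the picture:

```python
def pythagorean(points_for: float, points_against: float, k: float) -> float:
    """Expected win fraction from points scored and allowed."""
    return points_for ** k / (points_for ** k + points_against ** k)

# Equal scoring always gives 50%, regardless of the exponent.
print(pythagorean(100, 100, 13.91))  # 0.5

# The same 10% scoring edge matters far more in the NBA (k = 13.91)
# than in the EPL (k = 1.3), where luck plays a bigger role.
print(pythagorean(110, 100, 13.91))
print(pythagorean(110, 100, 1.3))
```

The higher the exponent, the more a given scoring edge translates into wins – which is exactly the sense in which basketball rewards skill more reliably than soccer.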

Surprisingly, the Pythagorean win expectation can be better at predicting a team’s future win/loss record than even its actual record to date. Compare this formula with the Hill equation in biology, which models cooperative behavior like the binding of oxygen to hemoglobin.
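The parallel is purely formal, but worth making explicit: the Hill equation θ = L^n / (K^n + L^n) has the same functional shape as the Pythagorean expectation, with the Hill coefficient n playing the role of the exponent k. A quick numerical check, with arbitrary made-up values:

```python
def pythagorean(pf, pa, k):
    """Pythagorean win expectation."""
    return pf ** k / (pf ** k + pa ** k)

def hill(ligand, kd, n):
    """Hill equation: fraction of binding sites occupied."""
    return ligand ** n / (kd ** n + ligand ** n)

# Identical inputs give identical outputs: same curve, different labels.
print(pythagorean(110, 100, 2.0) == hill(110, 100, 2.0))  # True
```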

Consider the 2016 MLS Cup. The Seattle Sounders prevailed in penalty kicks despite not generating a single shot on goal for 120 minutes of regulation and extra time combined. In contrast, Toronto had seven shots on goal, including one that looked like a sure game-winner, except for an incredible save by Stefan Frei. Had it gone in, everyone would have congratulated Toronto on a dominating 1-0 victory. Instead, Seattle ended up with the cup.

Or how about Leicester City, who overcame 5000:1 odds to win the English Premier League last year? They benefited from poor showings by the traditional EPL powerhouses, and were also lucky enough to edge out quite a few close games.


While most expected a return to Earth after such a meteoric rise, I don’t think many expected such a fiery crash landing. This year, Leicester City is fighting to avoid relegation.

Ice hockey is (somewhat) better when it comes to rewarding the best team, but even then, games can be decided by a bounce of the puck. To help figure out whether an NHL team’s success is attributable to luck or skill, we can turn to Corsi, Fenwick, and PDO.

Unlike baseball and football, hockey doesn’t have well defined “states” to analyze. Instead, we can use shots on goal as a way to approximate puck possession. Corsi is the sum of shots on goal, missed shots and blocked shots. Fenwick is the same with blocked shots excluded. Why are shots so important?

The basic idea is that generating scoring chances takes skill, but whether a goal is actually scored is unpredictable. Just putting the puck on net allows good things to happen, like a deflection or rebound, even if the original shot doesn’t go in. Also, your opponent can’t score while you have the puck. So Corsi/Fenwick is a measure of skill independent of the “luck” of goals going in. Conversely, the sum of shooting percentage and save percentage is called PDO. The thought is that these values are the luck portion, which should tend to regress to 100% over time. Of course, PDO can remain high if you have an exceptionally good goalie or sharpshooting skaters.
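As a rough sketch (all shot and goal counts below are invented for illustration), the three stats reduce to simple arithmetic on a team’s totals:

```python
def corsi(shots_on_goal, missed, blocked):
    """All shot attempts: on goal + missed + blocked."""
    return shots_on_goal + missed + blocked

def fenwick(shots_on_goal, missed):
    """Unblocked shot attempts only."""
    return shots_on_goal + missed

def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting % plus save %, scaled so 'neutral luck' sits at 100."""
    shooting_pct = goals_for / shots_for
    save_pct = 1 - goals_against / shots_against
    return 100 * (shooting_pct + save_pct)

# A team shooting 8% in front of a .920 goalie lands exactly at 100:
# no luck pushing it either way.
print(pdo(goals_for=8, shots_for=100, goals_against=8, shots_against=100))
```

A PDO well above 100 suggests a team has been riding hot shooting or hot goaltending that is unlikely to persist.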



From the author who brought you Moneyball and The Big Short comes the Hollywood-ready story of an epic scientific bromance that overturned decades of economic thinking.

Michael Lewis starts his new book, The Undoing Project, with the caveat that it is something like the inverse of Moneyball. Instead of focusing on how people might use pure data and analytics to compensate for the fallibility of human judgement, he writes about the study of those foibles themselves. Although I had already read “Thinking, Fast and Slow,” Nobel laureate Daniel Kahneman’s magnum opus on how the human brain uses heuristics, not pure rationality, to make decisions, I was still riveted by the narrative of how he and his collaborator Amos Tversky worked together on these ideas with very different, but complementary, styles.

Similar to using optical illusions to understand how vision works, Kahneman and Tversky used surveys to demonstrate mental blind spots hardwired into the brain.

Even if you know about it, the conjunction fallacy is very hard to resist:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
(A) Linda is a bank teller.
(B) Linda is a bank teller and is active in the feminist movement.

Since the set of all bank tellers includes the bank tellers who are feminists, (A) must be more likely. Adding more detail (“active in the feminist movement”) tricks us into thinking the description is more probable. Another example: estimate the likelihood that at least 1,000 people will have to evacuate from California this year. Now estimate the likelihood that a forest fire will start in Southern California and 1,000 people will have to evacuate this year. Again, the brain uses the representativeness of the narrative as an imperfect proxy for how probable we should think something is.
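The underlying rule is just set inclusion: for any events A and B, P(A and B) ≤ P(A). A tiny enumeration, with entirely made-up population counts, makes it concrete:

```python
# Hypothetical population: the counts are invented purely for illustration.
population = 1000
bank_tellers = 50            # everyone who is a bank teller
feminist_bank_tellers = 20   # bank tellers who are also feminists

p_a = bank_tellers / population                  # P(teller)
p_a_and_b = feminist_bank_tellers / population   # P(teller AND feminist)

# The conjunction can never be more probable than either conjunct,
# because the feminist bank tellers are a subset of the bank tellers.
assert p_a_and_b <= p_a
print(p_a, p_a_and_b)  # 0.05 0.02
```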

The main principle is that real people are affected by the way choices are framed – whether as a loss or a gain – or by how representative a description sounds. While we might now consider some of these findings obvious – people are clearly not the omniscient, perfectly rational, purely selfish members of homo economicus – behavioral economics upset a great deal of economic theory. This is because mathematical models of economic behavior rely on assumptions like the stable preferences of rational actors. The idea of “bounded rationality” makes everything a muddle. And if people are Predictably Irrational, then the errors are not just random noise, but rather a systematic bias that won’t even wash out on average.

[Parenthetically, this is another great example of the difference between uncorrelated errors, which can be improved with aggregation, versus systematic bias, which cannot.]

The upside, however, is that if real people are not perfectly rational, they can be nudged into doing the right thing, like saving for retirement, with simple changes to the “choice architecture” – how the options are framed and presented.


Appendix – Behavioral Economics Bibliography:

Animal Spirits
Misbehaving: The Making of Behavioral Economics
Predictably Irrational
Stumbling on Happiness
Thinking, Fast and Slow
The Undoing Project
The Upside of Irrationality


Who Will Win?

Today is election day. The big questions everyone is asking are: who will be the next leader of the free world, and who called it?

Nate Silver made his 538 website a household name in the wake of the 2008 Presidential election, for which his model correctly predicted every state. Silver’s approach was to disdain the “hot takes” of pundits and rely instead on complicated algorithms that crunch hard poll numbers. In particular, the 538 model uses national data to compensate for missing state polls, and also accounts for correlations between states.


While Silver’s reputation continues to be sterling, he has caused some consternation among the Hillary faithful during the home stretch of the election by projecting much higher chances for Trump (above 1 in 3 for a while) than the other forecasting sites that have cropped up in the past few election cycles. For example, the New York Times forecast gives Trump a 16% chance on election day:


While some sites were even more sanguine for Clinton:


In fact, the battle-lines have shifted from Silver vs. Taleb (as I wrote about here), to Silver vs. Sam Wang and the Huffington Post. Silver is highly skeptical, based on the possibility of polling error, that anyone can say that a Clinton victory is more than 98% likely.


A (very) naive approach that takes all states as independent will lead to a lot of false certainty, since multiplying the win probabilities of every state Trump needs leaves him only a minuscule path. However, we know that states are not independent spins: news that flips one state is likely to have an effect on other states as well. Nevertheless, if the lead is big enough, sites like the Huffington Post that assume relatively weak correlations give Clinton a 98% slam dunk.
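A minimal simulation shows how a shared national polling error fattens the underdog’s chances relative to the naive independent calculation. The leads and error sizes below are made-up round numbers, not real polling data:

```python
import random
from statistics import NormalDist

random.seed(42)

# Hypothetical: the favorite leads by 2 points in five decisive swing states.
LEADS = [2.0] * 5
STATE_SIGMA = 3.0     # independent per-state polling error (points)
NATIONAL_SIGMA = 3.0  # shared national error, felt by every state at once

def favorite_sweeps_naive():
    """Treat states as independent and multiply their win probabilities."""
    total_sigma = (STATE_SIGMA ** 2 + NATIONAL_SIGMA ** 2) ** 0.5
    p = 1.0
    for lead in LEADS:
        p *= NormalDist(0, total_sigma).cdf(lead)  # P(lead + error > 0)
    return p

def favorite_sweeps_correlated(trials=200_000):
    """Monte Carlo with one shared national error per simulated election."""
    wins = 0
    for _ in range(trials):
        national = random.gauss(0, NATIONAL_SIGMA)
        if all(lead + national + random.gauss(0, STATE_SIGMA) > 0
               for lead in LEADS):
            wins += 1
    return wins / trials

# Correlation makes both a clean sweep and a clean upset more likely than
# the naive product suggests -- the states rise and fall together.
print(favorite_sweeps_naive())
print(favorite_sweeps_correlated())
```

With these numbers the naive product puts a sweep near 15%, while the correlated simulation puts it far higher: the same mechanism that keeps 538 from ever being too sure of anything.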

Conversely, my feeling is that the gears and levers under the hood of 538’s model have created such a non-linear system that even a modest margin of error – especially if you allow for the chance of a systematic bias – can tip an almost-certainty into serious doubt. Remember, 538 runs nationwide simulations in which state-to-state correlations are highly relevant, on top of using national polls to adjust state numbers. This magnifies uncertainty in a close race, as states hover on the edge of being flipped.

So who is right? From a philosophical point of view, we may never be able to know the “correct” probability of a one-time event, which depends so much on the assumptions used and cannot be repeated to build up statistics. However, if a model predicts many events – like state-by-state results – Jordan Ellenberg suggests the Brier score may be of use in evaluating its performance. The formula is:

BS = (1/N) Σ (f_t − o_t)²

where f_t is the forecast probability, o_t is the actual outcome (1 if the event happens, 0 if not), and N is the number of forecasts. The score rewards both accuracy and risk-taking. From Wikipedia:

Suppose that one is forecasting the probability P that it will rain on a given day. Then the Brier score is calculated as follows:

  • If the forecast is 100% (P = 1) and it rains, then the Brier Score is 0, the best score achievable.
  • If the forecast is 100% and it does not rain, then the Brier Score is 1, the worst score achievable.
  • If the forecast is 70% (P=0.70) and it rains, then the Brier Score is (0.70-1)² = 0.09.
  • If the forecast is 30% (P=0.30) and it rains, then the Brier Score is (0.30-1)² = 0.49.
  • If the forecast is 50% (P=0.50), then the Brier score is (0.50-1)² = (0.50-0)² = 0.25, regardless of whether it rains.

Sticking with a maximally uninformed prior (50/50) always gives the same score. To improve on it, a model has to both get it right and assign useful probabilities. Notice that even a perfect model will be “wrong” a lot of the time. For example, an event correctly predicted to have a 15% chance will still happen… 15% of the time. The quality of the model can thus only be assessed over many predictions.
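The score is a direct transcription of its definition, and the rain examples above serve as ready-made sanity checks:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities (0..1)
    and what actually happened (1 = event occurred, 0 = it did not).
    Lower is better; 0 is a perfect score."""
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

print(brier_score([1.0], [1]))  # 0.0  -- confident and right
print(brier_score([1.0], [0]))  # 1.0  -- confident and wrong
print(brier_score([0.7], [1]))  # ~0.09
print(brier_score([0.5], [1]))  # 0.25 -- the coin-flip forecaster
```

Averaged over a whole slate of state-level calls, it gives a single number with which to compare forecasters after the fact.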

Curse of the Butterfly

Butterflies are already under suspicion for many catastrophes, as in: “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” This so-called butterfly effect, in which the final outcome is very sensitive to the exact initial conditions, is one of the basic findings of chaos theory. We may never know which butterfly to blame for Hurricane Matthew, but the storm sent me on an unexpected trip from South Florida to Tallahassee.


While visiting the museum inside Florida’s old capitol, I came upon the exhibit for the 2000 presidential election, which I remember distinctly, since it was the first I was old enough to vote in. Of course, the exhibit put the most pro-Florida spin possible on the debacle that demonstrated “the resilience of our democracy,” including the fact that the State Supreme Court ordered a full recount, only to be overruled by the US Supreme Court. The size of the victory either way was so razor-thin that it was certainly swamped by the margin of error. Mathematician John Allen Paulos suggested that flipping a coin was the fairest way to resolve the election, rather than trying to measure “bacteria with a yardstick.” Yet the need to have a definitive answer caused us to scrutinize “hanging chads.” Multiple factors converged to swing the election. Primarily, a kind of butterfly was blamed: the “butterfly ballot.”


The museum wisely placed it behind protective glass, lest it be attacked by vengeful Gore supporters.


The irony was that the whole purpose of the unconventional ballot was to help older voters by increasing the size of the font. Combined with the many ballot issues that year, this led to the decision to put candidate names on both sides. However, it confused some voters who wanted to vote for Gore but mistakenly punched the second hole from the top, registering a vote for Pat Buchanan. While we can’t know for sure, there is strong evidence that this factor alone cost Gore thousands of votes – certainly many, many more than 537, the ultimate official margin of victory (2,912,790 to 2,912,253). During the weeks of uncertainty, many Bush voters wondered how anyone could miss the obvious, large arrows that clearly mark the proper hole. The curse of knowledge was at work: once you know the right answer, it is hard to put yourself back in the mindset of not knowing and imagine bumbling it. However, it is very plausible that someone seeing the ballot for the first time, and told to adhere to the five-minute time limit, just saw the left page and punched accordingly.

Says Who?

Few things are more exciting to me than an intellectual throw-down between minds I respect on topics of fundamental importance. Recently, Nate Silver, who makes his living building prediction models, came under fire from Nassim Nicholas Taleb, who makes his living breaking them down. At issue was the foundational epistemological question of how much we really know – or rather, how confident we may be in what we think we know. Taleb – famous for his books Fooled by Randomness, The Black Swan, and Antifragile – lives by the watchword “incerto,” which means we should be much more circumspect about what we think we understand. He popularized the concept of a “Black Swan,” a momentous event that could not have been anticipated based on all previous observations. Consistent with this, Taleb has successfully implemented strategies that benefit from extreme, unexpected events whose likelihood has been undervalued. Taleb has long inveighed against the widespread misuse of Gaussian distributions, especially those implicated in the 2008 financial crisis. Nate Silver’s models are much more sophisticated, so I was interested when Taleb came after them:

The basic idea is that the prediction probabilities at 538 for each candidate to win are too volatile to be right. Instead, Taleb suggested a model based on his specialty, pricing options.


The result is a much more parsimonious model, in which the chances are stuck at 50/50 until right before election day, at which time they jump to near certainty:


Silver’s podcast riposte was swift and harsh: when properly modeled – particularly by correctly accounting for the correlations between states – data from polls are actually very good predictors of the final election result. Especially after the conventions, the probability for a candidate to win can be forecast with at least some confidence.

This debate reminded me of Sean Carroll’s new book, The Big Picture. In it, he makes the astounding claim that the “Core Theory” of physics, which includes quantum field theory and relativity, can explain every experiment ever performed on Earth.

The reason for such incredible predictive power is that quantum field theory itself provides a recipe for including progressively smaller correction terms, which converge to a very good degree of accuracy. Carroll says that physics is “simple” compared with other fields, like biology or economics, in that reductionism works fantastically well, and every particle, and interaction between them, can be understood separate from the rest of the Universe. This is why the dimensionless magnetic moment of the electron is known to an uncertainty of a few parts per trillion.

This is, of course, not to say that we have (or will ever have) the functional omniscience of Laplace’s Demon, who can turn perfect knowledge of the laws of nature and the configuration of all particles at any time into perfect predictions of their positions at every other time. The wild undulations of chaos theory, and the inherent uncertainty in quantum mechanics, preclude this. So perhaps the best approach is the one Carroll himself takes in “The Big Picture” – a Bayesian framework in which we are aware of the credences we assign to various propositions and work to keep them up to date as new information becomes available.

Who ‘ya gonna call? Scientists!

A big change has occurred in the way our heroes are portrayed in fiction.

A good example is the 1984 film Ghostbusters and its recent update.

In addition to the obvious gender change, a more substantive alteration to the characters is their profession: in the original they are parapsychologists, but now they are physicists (or engineers, plus a history buff). The main tension throughout the film is their effort to convince the world that the study of ghosts is real “science.” Thanks to the MIT consultants, the movie is filled with real quantum mechanics equations.


The science doesn’t end there. The proton packs are explained to be mini particle accelerators, complete with “quadrupole” superconducting magnets. As in the original, ghosts are sorted into “classes,” but now they are summoned with blue glowing devices instead of rituals performed by demonic demigods.

I see a larger trend of “scientization” of the paranormal in fiction. That is, heroes combat “the unknown” with science. Ghosts are fearsome exactly because they appear to represent a “cosmic horror” that is dangerous and beyond human understanding. But with science, they become just another, albeit hazardous, phenomenon to poke, prod, and categorize.

Do not all charms fly
At the mere touch of cold philosophy?
There was an awful rainbow once in heaven:
We know her woof, her texture; she is given
In the dull catalogue of common things.
Philosophy will clip an Angel’s wings,
Conquer all mysteries by rule and line,
Empty the haunted air, and gnomèd mine—
Unweave a rainbow, as it erewhile made
The tender-person’d Lamia melt into a shade.

-“Lamia” by John Keats

This trend shows up in other media. In Rick and Morty, even the Devil is bested by science.

And the whole premise of Gravity Falls, a very binge-worthy show, is the rational study of the “weirdness” surrounding the Mystery Shack.

In fact, in the very first minute of the series, we get the introduction:

“My name is Dipper. The girl about to puke is my sister Mabel. You may be wondering what we’re doing in a golf cart, fleeing from a creature of unimaginable horror…Rest assured, there’s a perfectly logical explanation.”

Once again, a “cosmic horror” is cut down to manageable size, and ultimately defeated, with some rational thinking. Does this reflect our collective experience of having battled real-life problems – germs, famine, natural disasters – with science?