Supposedly Nate Silver’s credibility took a major hit last November, which will no doubt discourage many potential readers of his book. This interpretation is wrong, but palatable, because the sorts of commentators who would come to such conclusions shouldn’t be trusted with it. This book is about how to be more intelligent when making predictions and be wrong less often. Such an attitude is not common—most “predictions” are political pot-shots or, as discussed previously, avaricious attempts to put the cart before the horse.
Let’s begin with a discussion of a few major tips. Most of these things should be taught in high school civics (how can you responsibly vote without a concept of base rates?!), but aren’t. Perhaps the most important thing is to limit the number of predictions made, so you can easily come back and score them. Calibration is recommended—nine out of ten predictions made with 90% confidence should come true.
Political pundits are terrible about these sorts of things. Meteorologists are actually great at it. Now your local weatherman is regularly wrong, but the National Weather Service makes almost perfectly calibrated forecasts1. This is, in part, because their models are under constant refinement, always seeking more accuracy. And it pays off: NWS predictions have improved drastically over the last few decades, due to improved models, more data collection, and faster computers. But more on that later.
Local meteorologists, on the other hand, are incentivized to make outlandish forecasts which drive viewership (and erode trust in their profession). One might see this as evidence that public entities make better predictions than private ones, but we quickly see that that is no panacea when we turn to seismology and epidemiology.
Part of the problem, in those fields, is that government and university researchers are under considerable pressure from their employers to develop new models which will enable them to predict disasters. This is a reasonable enough desire, but a desire alone does not a solution make. We can quite easily make statistical statements about approximately how frequently certain locations will experience earthquakes, for instance. But attempts beyond a simple logarithmic regression have so far been fruitless, not just failing to predict major earthquakes but specifically prediction that some of the most destructive earthquakes in recent memory would not occur.
Silver’s primary case study in this comes from the planning for Fukushima Daiichi Nuclear Power Plant. When engineers were designing it in the 1960s, it was necessary to extrapolate what sort of earthquake loads it might need to withstand. Fortunately, the sample size of the largest earthquakes is necessarily low. Unfortunately, there was a small dogleg in the data, an oh-so-tempting suggestion that the frequency of extremely large earthquakes was exceedingly low. The standard Gutenberg-Richter model suggests that a 9.0-magnitude earthquake would occur in the area about once every 300 years; the engineers’ adaptation suggested every 13,000. They constructed fantastical rationalizations for their model and a power station able to withstand 8.6. In March of 2011 a 9.0-magnitude earthquake hit the coast of Japan and triggered a tsunami. The rest, as they say, is history.
The problem in seismology comes from overfitting. It is easy, in the absence of hard knowledge, to underestimate the amount of noise in a dataset and end up constructing a model which predicts random outliers. Those data points don’t represent the underlying reality; rather, they are caused by influences outside the particular thing you’re wishing to study (including the imprecision of your instruments).
And it can take awhile to realize that this is the case, if the model is partially correct or if the particular outlier doesn’t appear frequently. An example would be the model developed by Professor David Bowman at California State University-Fullerton in the mid-2000s, which identified high-risk areas, some of which then experienced earthquakes. But the model also indicated that an area which soon thereafter experienced an 8.5 was particularly low-risk. Dr. Bowman had the humility to retire the model and admit to its faults. Many predictors aren’t so honest.
On the other hand, we see overly cautious models. For instance, in January of 1976, Private David Lewis of the US Army died at Fort Dix of H1N1, the same flu virus which caused the Spanish Influenza of 1918. The flu always occurs at military bases in January, after soldiers have been spread across the country for Christmas and New Year’s. The Spanish Influenza had also first cropped up at a military base, and this unexpected reappearance terrified the Center for Disease Control. Many feared an even worse epidemic. President Ford asked Congress to authorize a massive vaccination program at public expense, which passed overwhelmingly.
The epidemic never materialized. No other cases of H1N1 were confirmed anywhere in the country and the normal flu strain which did appear was less intense than usual. We still have no idea how Private Lewis contracted the deadly disease.
Alarmism, however, broke public confidence in government predictions generally and on vaccines particularly. The vaccination rate fell precipitously in the following years, opening the way to more epidemics later on.
Traditionally, this category of error was known as crying wolf. Modern writers have forgotten it and have to be reminded to not do that. Journalists and politicians make dozens if not hundreds of “predictions” each year, few if any of which are scored, in no small part because most of them turn out wrong or even incoherent.
Sadly, the pursuit of truth and popularity are uncorrelated at best. As Mr. Silver has learned, striving for accuracy and against premature conclusions is a great way to get yourself berated2. Forecasting is not the field for those seeking societal validation. If that’s your goal, skipping this book is far better than trying to balance its lessons and the public’s whim.
But let’s suppose you do want to be right. If you do, then this book can help you in that quest, though it is hardly a comprehensive text. You’ll need to study statistics, history, economics, decision theory, differential equations, and plenty more. Forecasting could be an education in its own right (though regrettably is not). The layman, however, can improve vastly by just touching on these subjects.
First and foremost is an understanding of probability, specifically Bayesian statistics. Silver has the courage to show us actual equations, which is more than can be said for many science writers. Do read this chapter.
Steal an example from another book, suppose two taxi companies operate in a particular region, based on color. Blue Taxi has a larger market share. If you think you see a Green Taxi, there’s a small chance that it’s really Blue and you’re mistaken (and a smaller chance if you see Blue, it’s really Green). The market share is the base rate, and you should adjust up or down based on the reasons you might feel uncertain. For instance, if the lighting is poor and you’re far away, your confidence should be lower that if you’re close by at mid-day. Try thinking up a few confounders of your own.
To better develop your Bayesian probability estimate of a given scenario, you need to assess what information you possess and what information you don’t possess. These will be your Known Knowns and Known Unknowns. The final category is Unknown Unknowns, the thing you aren’t aware are even a problem. A big part of rationality is trying to consider previously ignored dangers and trying to mitigate risk from the unforeseen.
This is much easier to do ex post facto. By that point, the signal you need to consider stands out against hundreds you can neglect. Beforehand, though, it’s difficult to determine which is the most important. Often, you’re not even measuring the relevant quantity directly but rather secondary and tertiary effects. Positive interference can create a signal where none exists. Negative interference can reduce clear trends to background noise. There’s a reason signal processing pays so well for electrical engineers.
The applications range from predicting terrorist attacks to not losing your shirt gambling. An entire chapter discusses the Poker Bubble and how stupid players make the game profitable for the much smaller pool of cautious ones. In addition to discussing the mechanics and economics of the gambling, I got a decent explanation of how poker is played. Certainly interesting.
Another chapter tells the story of how Deep Blue beat Gary Kasparov. Entire books have been written on the subject, but Silver gives a good overview of the final tournament and what makes computers so powerful in the first place.
Computers aren’t actually very smart. Their strength comes from solving linear equations very, very quickly. They don’t make the kinds of arithmetic mistakes which humans make, especially when the iterations run into the millions. Chess is a linear game, however, so it was really a matter of time until algorithms could beat humans. There’s certainly a larger layer of complexity and strategy than many simpler games, but it doesn’t take a particularly unique intelligence to look ahead and avoid making mistakes in the heat of the moment.
Furthermore, the stating position of chess is always the same. This is not the case for many other linear systems, let alone nonlinear ones. Nonlinear systems exhibit extreme sensitivity to initial conditions; the weather a classical example. The chapter on meteorology discusses this in detail—we have very good models of how the atmosphere behaves, but because we don’t know every property at every location, we’re stuck making inferences about the air in-between sampling points. Add to this finite computing power, and the NWS can only (only!) predict large-scale weather systems with extreme accuracy a few days ahead.
With more sampling points, more computing capacity, or more time, we could get better predictions, but all of these factors play off one another. This dilemma arises throughout prediction. More research will allow for more accurate results but delays your publication data. (This assumes that the data you need is even available: frequently, it isn’t3.)
Producing useful predictions is not about having the best data or the most computing power (though they certainly help). It is primarily about constraining your anticipation to what the evidence actually implies. Nate Silver lays out several techniques for pursuing this goal, with examples. It’s a good introduction for us laymen; experienced statisticians will probably find little they didn’t already know.
I would not recommend this book, however, unless you’re willing to do the work. Prediction is a difficult skill to master, and those without the humility to accept their inexperience can get into a lot of trouble. Should you want to test your abilities, try doing calibrated predictions and see how accurate you are. Julia Galef has a number of mostly harmless suggestions for trying this out.
If you are serious, however, The Signal and the Noise offers a quality primer on several important rationality techniques, and a good deal of information about a variety of other topics. I found it an enjoyable read and hope Nate Silver writes more books in the future.
1Major aggregators like the Weather Channel and AccuWeather tend to take the NWS predictions and paste an additional layer of modelling on top of it, for better or for worse.
2In the week before the 2016 election, several liberal commentators accused Mr. Silver of throwing the nation into unwarranted fear for only having Hillary Clinton’s odds of winning at ~70%. As it turns out, his model was one of the most balanced of mainstream predictions, yet everyone then acted as if he had reason to be ashamed for getting it wrong.
3The data may be concealed in confidential documents, nominally available but out of sight, or sitting right under your nose. Most often, however, it’s hiding in the noise. Economic forecasts suffer from this last problem. There’s econometric data everywhere, but basically no one has found more than rudimentary ways to make predictions with it. Perverse incentives complicate matters for private sector analysts, who often then ignore the few semi-reliable indicators we’ve got.