Nate Silver’s Value Added and Systematic Forecast Errors

Nate Silver had a very good night on November 6th.  He forecast that President Obama would win 313 electoral votes and was within 5.7% of the 332 electoral votes received by the President.  He proved to be more accurate than many individual pollsters.  Silver bases his projections on state poll averages, national polls, “state fundamentals” and trends in polls.  Anyone could have forecast the election outcome in at least 35 states, so the best way to understand Silver’s contribution is to focus on states where the election was expected to be close.  Both state poll averages and Silver’s projections underestimated President Obama’s strength in swing states, but Silver’s 538 model had substantially less systematic bias than state poll averages.

Nate Silver has brought statistical analysis of elections to the masses (or at least to the readers of the New York Times).  His blog makes it clear that when an average of polls prior to the election shows a candidate has a 55-45 advantage, it doesn’t mean that she will lose 45% of the time.  It means instead that she will almost certainly win the election.  Poll averages improve the precision of forecasts because errors due to sampling variation are decreased.  More importantly, the 538 model incorporates “state fundamentals” and poll trends to improve the precision of  forecasts. 
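To see why a 55-45 lead is not a 45% chance of losing, here is a minimal sketch that treats sampling error as the only source of uncertainty and uses a normal approximation. The poll size (600 respondents) and the number of polls pooled (6) are hypothetical inputs chosen for illustration, not figures from the 538 model.

```python
import math

def prob_true_lead(share, respondents):
    """P(true two-party share > 50%) under a normal approximation to sampling error."""
    se = math.sqrt(share * (1 - share) / respondents)
    z = (share - 0.5) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical inputs: one poll of 600 respondents vs. six such polls pooled.
single = prob_true_lead(0.55, 600)
pooled = prob_true_lead(0.55, 6 * 600)
print(f"one poll: {single:.4f}   six polls pooled: {pooled:.6f}")
```

Pooling shrinks the standard error by the square root of the number of polls, which is the precision gain from averaging. The sketch says nothing about errors that all polls share, such as a common misjudgment of turnout.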

Consider the following comparison of state poll averages, Silver’s projections and election outcomes in 11 states that were expected to be close on election day:

| State | State Poll Avg. | Silver’s Projection | Election Outcome |
|---|---|---|---|
| Colorado | DEM: 1.9 | DEM: 2.5 | DEM: 4.7 |
| Florida | REP: 0.7 | D/R: 0.0 | DEM: 0.9 |
| Iowa | DEM: 2.6 | DEM: 3.2 | DEM: 5.6 |
| Michigan | DEM: 4.7 | DEM: 7.1 | DEM: 9.5 |
| Nevada | DEM: 3.6 | DEM: 4.5 | DEM: 6.6 |
| New Hampshire | DEM: 2.6 | DEM: 3.5 | DEM: 5.8 |
| North Carolina | REP: 1.9 | REP: 1.7 | REP: 2.2 |
| Ohio | DEM: 3.0 | DEM: 3.6 | DEM: 1.9 |
| Pennsylvania | DEM: 4.6 | DEM: 5.9 | DEM: 5.2 |
| Virginia | DEM: 1.3 | DEM: 2.0 | DEM: 3.0 |
| Wisconsin | DEM: 4.3 | DEM: 5.5 | DEM: 6.7 |

Notice that in every one of these states Silver’s 538 model predicted that the President would outperform the polls. Although this infuriated many conservatives, Silver was correct, on average. The President’s margin in the states listed above was, on average, 2.0 percentage points higher than forecast by poll averages. The President even outperformed Silver’s forecast, with a margin that was, on average, 1.1 percentage points higher than predicted by the 538 model. This occurred despite the fact that Mitt Romney exceeded Silver’s expectations in North Carolina, Ohio and Pennsylvania.

The standard deviation of the forecast error was 1.7 percentage points for state poll averages and 1.4 percentage points for Silver’s model (about 18% lower) in the eleven swing states listed above. Thus, in the states where the election was contested, Silver’s simulations were slightly more accurate than a simple average of state polls. However, the most important contribution of the 538 model in 2012 was that it substantially reduced the systematic underestimate of President Obama’s margin, from 2.0 to 1.1 percentage points.
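The averages and standard deviations quoted above can be reproduced from the table. The sketch below encodes each entry as a Democratic margin in percentage points (Republican leads are negative, an encoding chosen here for convenience) and uses the sample standard deviation, which is the version that matches the rounded figures in the text.

```python
from statistics import mean, stdev

# (state, poll average, 538 projection, outcome), Democratic margin in points.
rows = [
    ("Colorado",        1.9,  2.5,  4.7),
    ("Florida",        -0.7,  0.0,  0.9),
    ("Iowa",            2.6,  3.2,  5.6),
    ("Michigan",        4.7,  7.1,  9.5),
    ("Nevada",          3.6,  4.5,  6.6),
    ("New Hampshire",   2.6,  3.5,  5.8),
    ("North Carolina", -1.9, -1.7, -2.2),
    ("Ohio",            3.0,  3.6,  1.9),
    ("Pennsylvania",    4.6,  5.9,  5.2),
    ("Virginia",        1.3,  2.0,  3.0),
    ("Wisconsin",       4.3,  5.5,  6.7),
]

# Forecast error = outcome minus forecast; positive means Obama beat the forecast.
poll_err = [out - poll for _, poll, _, out in rows]
silver_err = [out - proj for _, _, proj, out in rows]

for label, errs in (("poll average", poll_err), ("538 model", silver_err)):
    print(f"{label}: mean error {mean(errs):.1f}, std dev {stdev(errs):.1f}")
```

The mean error measures the systematic lean of each forecast, while the standard deviation measures how much the misses scatter around that lean from state to state.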

The systematic gap between vote totals and state polls has little to do with sampling variation and more to do with misestimation of voter enthusiasm and turnout. While averages of state polls provide more efficient forecasts than individual polls in a given election, there will be systematic errors across states because of the difficulty in forecasting voter turnout and assessing voter enthusiasm. Based on the past four presidential elections, the systematic poll gap favors Democrats in some elections and Republicans in others, and is likely to be similar in magnitude to the 2.0-point difference observed in 2012. If the 2.0-point systematic gap in poll forecasts had favored Mitt Romney on Tuesday, he would have received 266 electoral votes and lost the presidency by fewer than 5,000 votes in New Hampshire.
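One rough way to check that the gap is systematic rather than noise is to compare the average miss with its standard error across the eleven states. The sketch below uses forecast errors computed from the table above (outcome minus forecast, in points of Democratic margin) and assumes, purely for the sake of the check, that state errors are independent draws centered on zero. The function name `t_stat` is introduced here for illustration.

```python
import math
from statistics import mean, stdev

# Forecast errors derived from the swing-state table (outcome minus forecast).
poll_err = [2.8, 1.6, 3.0, 4.8, 3.0, 3.2, -0.3, -1.1, 0.6, 1.7, 2.4]
silver_err = [2.2, 0.9, 2.4, 2.4, 2.1, 2.3, -0.5, -1.7, -0.7, 1.0, 1.2]

def t_stat(errors):
    """Mean error divided by its standard error, assuming independent errors."""
    return mean(errors) / (stdev(errors) / math.sqrt(len(errors)))

print(f"poll averages: t = {t_stat(poll_err):.1f}")
print(f"538 model:     t = {t_stat(silver_err):.1f}")
```

The ratios come out at roughly 3.8 for the poll averages and 2.4 for the 538 model, so both average misses are larger than independent state-level noise would typically produce, with the poll averages further out.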

Nate Silver’s simulations are valuable when they reduce the magnitude of the systematic gap between state poll averages and election outcomes, as they did in 2012. His work correctly identified that polls were underestimating President Obama’s support even though the 538 model also contained systematic bias. This source of forecast error can’t be reduced by taking more polls but can be mitigated somewhat by supplementing poll averages with additional information.

Nate Silver and the Accuracy of Late Presidential Polls

In my blog post yesterday I explained why Nate Silver’s projection that President Obama has more than an 85% chance of winning the election tomorrow might be inaccurate. Yesterday Mr. Silver listed 60 state poll averages over the three presidential elections from 2000 to 2008 in his New York Times blog. The average state on his list had been the subject of between 6 and 7 late polls leading up to the election. The average prediction error, the difference between the actual election outcome and the average of state polls, was about one-quarter of a percentage point averaged across all states and all years. The margin of error, or 95% confidence interval, for the prediction error in a single state and election was plus or minus 6.38 percentage points. Averages of state polls are accurate, on average, but can be wildly inaccurate from one state/election to the next.

Mitt Romney needs to win the bulk of swing states in order to reach 270 electoral votes. This is more difficult to do if unexpected election outcomes are statistically independent across states. For example, if nine swing states are toss-ups and the challenger needs to win at least seven of them to win the electoral college, the challenger will lose 91% of the time with statistical independence across states. If, however, success in one state is correlated with success in another, the challenger’s likelihood of winning can be very different from 9%. Everyone, including Nate Silver, knows that election surprises are correlated across states. Nationwide differences in voter enthusiasm, turnout, late-deciding voters and under-represented voters can all make prediction errors positively correlated across states. For Mitt Romney to win tomorrow, the nationwide component of the prediction error in polls must break his way and be large enough for him to carry the bulk of the swing states.
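The nine-state example can be worked out directly, and a small simulation shows how correlation changes the answer. The independent case below is the exact binomial calculation. The correlated case is only a sketch under an assumed error structure: each state's surprise is a shared national shock plus state-specific noise, with `rho` (a hypothetical parameter, not an estimate from any model) setting the correlation between any two states.

```python
import random
from math import comb

def p_win_independent(states=9, needed=7, p=0.5):
    """Exact chance of carrying at least `needed` of `states` independent toss-ups."""
    return sum(comb(states, k) * p**k * (1 - p)**(states - k)
               for k in range(needed, states + 1))

def p_win_correlated(rho, states=9, needed=7, trials=50_000, seed=0):
    """Monte Carlo estimate when state surprises share a national component.

    Assumed structure: surprise = sqrt(rho) * national + sqrt(1 - rho) * local,
    so any two states have correlation rho; a positive surprise carries the state.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        national = rng.gauss(0, 1)
        carried = sum(
            rho**0.5 * national + (1 - rho)**0.5 * rng.gauss(0, 1) > 0
            for _ in range(states)
        )
        wins += carried >= needed
    return wins / trials

print(f"independent: {p_win_independent():.3f}")   # 46/512
for rho in (0.0, 0.5, 0.8):
    print(f"rho = {rho}: {p_win_correlated(rho):.3f}")
```

With `rho = 0` the simulation should land close to the exact 46/512, or about 9%. As `rho` rises, the challenger's chance climbs toward the 50% that would hold if all nine toss-ups moved together.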

Nate Silver believes his model has accurately captured the true correlation in prediction errors, or unexpected election outcomes, across states and polls in 2012.  This is a very difficult task because: (1) we won’t know the 2012 prediction errors until the election is over, (2) polls differ in their methodology and possible biases and (3) correlations in prediction errors among states and polls in previous elections need not hold today.  Mr. Silver is savvy enough to recognize that regardless of the election outcome his statement that Mitt Romney is more than a 6:1 longshot will be hard to refute.   By late tomorrow, however, it will be easier to evaluate his prediction that the President will win 307 electoral votes.

Forecasting the Presidential Election: Is it Wise to Bet Against Nate Silver?

In today’s New York Times Nate Silver establishes President Obama as a prohibitive favorite in Tuesday’s electoral vote contest.  His projection is based largely on the information in state polls in key swing states.  While national polls suggest that the popular vote total will be close, Silver’s simulations indicate that President Obama has an 85% chance of winning re-election on Tuesday.  Silver’s projections, which have come under attack from Romney supporters in recent weeks, rely on a combination of state, national and tracking polls.  His state-by-state forecasts use a sophisticated algorithm to weight information based on the accuracy of previous polls, the number of undecided voters in each state, and recent trends.

I have no doubt that the President is the favorite to win re-election on Tuesday because Mitt Romney needs to win the vast majority of swing states in order to reach 270 electoral votes.  Silver, a successful former poker player, has established Mitt Romney as an 11:2 longshot.  Given the closeness of the national polls it may seem tempting to some to question Silver’s forecast.  Could Silver possibly be wrong?  The answer rests with his reliance on state polls, and the accuracy of state polls in swing states where the races are fairly close.
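For readers less used to betting odds, the conversion between an 11:2 longshot and a win probability is simple arithmetic. The sketch below also shows the break-even point for a bettor who takes those odds. The function names are introduced here for illustration.

```python
from fractions import Fraction

def implied_probability(against, in_favor):
    """Win probability implied by odds of `against`:`in_favor` against an outcome."""
    return Fraction(in_favor, against + in_favor)

def expected_profit(true_prob, against, in_favor):
    """Expected profit from staking `in_favor` units to win `against` units."""
    return true_prob * against - (1 - true_prob) * in_favor

romney = implied_probability(11, 2)
print(f"11:2 against implies {romney} = {float(romney):.1%} for Romney")
print(f"break-even check: {expected_profit(romney, 11, 2)}")
print(f"profit per 2 staked if the true chance is 25%: {expected_profit(Fraction(1, 4), 11, 2)}")
```

Odds of 11:2 against correspond to a 2/13 chance for Romney, about 15%, which lines up with the roughly 85% chance given to the President. A bet at those odds has positive expected value only if Romney's true chance is above 2/13.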

In today’s column Silver makes a convincing case that state polls are accurate, on average, especially as we get close to election day.  He states:

“Of the 77 states with at least three late polls [since 1988], the winner was called correctly in 74 cases …There has been little tendency for the state polling averages to overrate either Democrats or Republicans, or either incumbents or challengers.”

I examine only the 60 state poll averages since 2000 listed by Silver (and ignore the small group of state polls he lists from 1988-1996). Silver is correct that state polls are accurate, on average. The average difference between the election outcome and the late poll average in a state is about one-quarter of one percentage point. Nonetheless, state polls tend to lean one way in one election and another way in a different election. In 2000, Al Gore outperformed state poll projections by an average of 2.14 percentage points. In 2004, George W. Bush outperformed state poll projections by an average of 1.52 percentage points, a swing of 3.66 percentage points in just 4 years. In addition, John McCain outperformed state polls by 0.70 percentage points in 2008. If the prediction errors in state polls were truly independent of all factors, we would expect to see such large differences in average prediction errors across elections less than one time in three hundred (less than once every millennium).
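A back-of-the-envelope version of this kind of test can be sketched as follows. It is not necessarily the calculation behind the one-in-three-hundred figure. It assumes the 60 state averages split evenly into 20 per election (an assumption, since the actual split is not given here), takes the single-state standard deviation implied by the plus-or-minus 6.38-point 95% interval reported in the companion post, and asks how surprising the three yearly averages would be if every state error were an independent draw centered on zero.

```python
import math

sigma = 6.38 / 1.96        # single-state SD implied by a +/-6.38-point 95% interval
states_per_year = 20       # ASSUMPTION: 60 state averages split evenly over 3 elections
se = sigma / math.sqrt(states_per_year)   # SE of a yearly average under independence

# Average amount by which the Democrat beat the state polls, in points.
yearly_miss = {2000: 2.14, 2004: -1.52, 2008: -0.70}
chi2 = sum((miss / se) ** 2 for miss in yearly_miss.values())

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3df(chi2)
print(f"chi-square = {chi2:.1f}, p = {p_value:.4f}, about 1 in {1 / p_value:.0f}")
```

Under these assumptions the tail probability comes out at a few in a thousand, the same order of magnitude as the figure quoted above. The result is sensitive to the assumed split and to the use of the overall standard deviation, so it is best read as a plausibility check rather than a replication.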

What does it mean that state polls lean in different directions from election to election? As a practical matter, it means that once we observe the difference between actual election outcomes and late state polls in a state such as New Hampshire on Tuesday, we will have a better idea what will actually happen in Florida, Ohio, Virginia and other swing states. In the language of statistics, the forecast errors from polls are correlated among states in the same election cycle. These correlations within the same election cycle mean that the candidate who exceeds expectations in one state is likely to exceed expectations in other states as well.

State polls are clearly missing something that varies from election to election. I have no idea whether the late state polls in 2012 will miss the outcome because of turnout, momentum, the political preferences of late deciders or how voters who refuse to answer pollsters will actually cast their ballots. I am also not sure whether the forecast errors will favor the President or Mitt Romney. What does seem plausible, however, is that the difference between late state polls and actual vote totals will be more than one-quarter of a percentage point, the overall average since 2000.

When Silver establishes the President as an overwhelming favorite it is based on a model that has been estimated over a relatively small number of situations in recent elections where the race was expected to be close at the state level.  The projected margin of victory/loss in state poll averages was within 6 percentage points only 28 times in the three presidential elections from 2000 to 2008.  To statisticians, that is a fairly small sample.  But more importantly, when one state poll tends to underestimate the strength of a candidate, whether Democrat or Republican, other state polls tend to make the same mistake.  Silver’s forecast is betting that state polls will be very accurate in 2012, which is different from saying they have been accurate, on average, across several elections.  So if you see Nate Silver and he offers you 11:2 odds on a bet that Mitt Romney wins the election, you might want to take it.  But be careful of his poker face.
