In today’s New York Times, Nate Silver establishes President Obama as a prohibitive favorite in Tuesday’s electoral vote contest. His projection is based largely on state polls in key swing states. While national polls suggest that the popular vote will be close, Silver’s simulations indicate that President Obama has an 85% chance of winning re-election on Tuesday. Silver’s projections, which have come under attack from Romney supporters in recent weeks, rely on a combination of state, national and tracking polls. His state-by-state forecasts use a sophisticated algorithm that weights information by the accuracy of previous polls, the number of undecided voters in each state, and recent trends.

I have no doubt that the President is the favorite to win re-election on Tuesday, because Mitt Romney needs to win the vast majority of swing states to reach 270 electoral votes. Silver, a successful former poker player, has made Mitt Romney an 11:2 longshot. Given how close the national polls are, some may be tempted to question Silver’s forecast. Could Silver possibly be wrong? The answer rests on his reliance on state polls, and on how accurate those polls are in swing states where the races are fairly close.
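For readers who don't think in betting odds, here is the arithmetic behind an "11:2 longshot" as a minimal Python sketch. Odds of 11:2 against mean 11 units staked against every 2 in favor, so the implied probability is 2/13, about 15.4%, close to the 15% complement of Silver's 85%. The function names are illustrative and not taken from any betting library.

```python
def odds_against_to_probability(against: int, for_: int) -> float:
    """Implied probability of an event quoted at odds of against:for_."""
    return for_ / (against + for_)


def probability_to_odds_against(p: float) -> float:
    """Units staked against the event for every one unit in favor."""
    return (1 - p) / p


# 11:2 against Romney implies roughly a 15.4% chance of a Romney win.
print(round(odds_against_to_probability(11, 2), 3))
# A 15% Romney chance (85% Obama) corresponds to roughly 5.67:1 against.
print(round(probability_to_odds_against(0.15), 2))
```

The two figures are close but not identical: 11:2 is 5.5:1, slightly shorter than the 5.67:1 implied by exactly 85%.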

In today’s column Silver makes a convincing case that state polls are accurate, *on average*, especially as we get close to election day. He states:

“Of the 77 states with at least three late polls [since 1988], the winner was called correctly in 74 cases …There has been little tendency for the state polling averages to overrate either Democrats or Republicans, or either incumbents or challengers.”

I examine only the 60 state poll averages since 2000 listed by Silver (and ignore the small group of state polls he lists from 1988-1996). Silver is right that state polls are accurate, *on average*. The average difference between the election outcome and the late poll average in a state is about one-quarter of one percentage point. Nonetheless, state polls tend to lean one way in one election and the other way in another. In 2000, Al Gore outperformed state poll projections by an average of 2.14 percentage points. In 2004, George W. Bush outperformed state poll projections by an average of 1.52 percentage points, a swing of 3.66 percentage points in just four years. In 2008, John McCain outperformed state polls by 0.70 percentage points. If the prediction errors in state polls were truly independent across states and elections, we would expect differences this large in average prediction errors across elections less than one time in three hundred (with an election every four years, less than once a millennium).
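One way to check a claim like "less than one time in three hundred" is a permutation test: if state errors were unrelated to the election year, the year labels would be interchangeable, so shuffling them shows how often chance alone produces year-to-year differences in mean error as large as those observed. The sketch below is a permutation version of a one-way analysis of variance and assumes you supply the per-state errors (actual result minus late poll average) grouped by year. Silver's state-level figures are not reproduced here, and this is not necessarily the calculation behind the figure above.

```python
import random
from statistics import mean


def spread_of_year_means(errors_by_year):
    """Between-year sum of squares: size-weighted squared gaps between each year's mean and the grand mean."""
    pooled = [e for errs in errors_by_year.values() for e in errs]
    grand = mean(pooled)
    return sum(len(errs) * (mean(errs) - grand) ** 2 for errs in errors_by_year.values())


def permutation_p_value(errors_by_year, n_perm=10_000, seed=0):
    """Share of random year relabelings whose between-year spread is at least the observed one.

    errors_by_year maps a year to a list of per-state errors (actual minus poll).
    Group sizes are kept fixed; only the assignment of states to years is shuffled.
    """
    rng = random.Random(seed)
    observed = spread_of_year_means(errors_by_year)
    sizes = {year: len(errs) for year, errs in errors_by_year.items()}
    pooled = [e for errs in errors_by_year.values() for e in errs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for year, size in sizes.items():
            shuffled[year] = pooled[start:start + size]
            start += size
        if spread_of_year_means(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the total sum of squares does not change under shuffling, ranking relabelings by the between-year spread is equivalent to ranking them by the usual F statistic.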

What does it mean that state polls lean in different directions from election to election? As a practical matter, it means that once we observe the difference between the actual outcome and the late state polls in a state such as New Hampshire on Tuesday, we will have a better idea of what will actually happen in Florida, Ohio, Virginia and other swing states. In the language of statistics, the forecast errors from polls are correlated across states in the same election cycle. These within-cycle correlations mean that a candidate who exceeds expectations in one state is likely to exceed expectations in other states as well.
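The correlation argument can be made concrete with a simple error model, assumed here for illustration: each state's polling error is a common election-wide component E plus an independent state component u. Under that model the correlation between any two states' errors is sigma_E^2 / (sigma_E^2 + sigma_u^2), and observing New Hampshire's error shifts the expected error in Ohio by that same fraction. The spreads sigma_E = 1.5 and sigma_u = 2.0 points are hypothetical.

```python
import random
from statistics import fmean


def implied_correlation(sigma_e, sigma_u):
    """Correlation of two states' errors when each error is E + u, with E and u independent."""
    return sigma_e**2 / (sigma_e**2 + sigma_u**2)


def expected_error_elsewhere(observed_error, sigma_e, sigma_u):
    """Expected error in a second state given the error observed in a first state.

    Both errors have mean zero and equal variance, so under joint normality
    the conditional mean is the correlation times the observed error.
    """
    return implied_correlation(sigma_e, sigma_u) * observed_error


def simulated_correlation(sigma_e, sigma_u, n=100_000, seed=1):
    """Monte Carlo check of implied_correlation for two states sharing the same E."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        common = rng.gauss(0, sigma_e)              # election-wide error, shared by both states
        xs.append(common + rng.gauss(0, sigma_u))   # e.g. New Hampshire
        ys.append(common + rng.gauss(0, sigma_u))   # e.g. Ohio
    mx, my = fmean(xs), fmean(ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = fmean((x - mx) ** 2 for x in xs)
    var_y = fmean((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical spreads, in percentage points.
print(implied_correlation(1.5, 2.0))             # about 0.36
print(expected_error_elsewhere(2.0, 1.5, 2.0))   # about 0.72
print(simulated_correlation(1.5, 2.0))
```

Under these assumed spreads, a candidate who beats the New Hampshire polls by 2 points would be expected to beat the Ohio polls by roughly 0.7 points.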

State polls are clearly missing something that varies from election to election. I have no idea whether the late state polls in 2012 will miss because of turnout, momentum, the political preferences of late deciders, or the way voters who refuse to answer pollsters actually cast their ballots. Nor am I sure whether the forecast errors will favor the President or Mitt Romney. What does seem plausible, however, is that the gap between late state polls and actual vote totals will be larger than one-quarter of a percentage point, the overall average since 2000.

When Silver makes the President an overwhelming favorite, he relies on a model estimated over a relatively small number of recent situations in which the race was expected to be close at the state level. The projected margin of victory or loss in state poll averages was within 6 percentage points only 28 times in the three presidential elections from 2000 to 2008. To statisticians, that is a fairly small sample. More importantly, when one state poll tends to underestimate the strength of a candidate, whether Democrat or Republican, other state polls tend to make the same mistake. Silver’s forecast is betting that state polls will be very accurate in 2012, which is different from saying they have been accurate, *on average*, across several elections. So if you see Nate Silver and he offers you 11:2 odds on Mitt Romney winning the election, you might want to take the bet. But be careful of his poker face.
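How small is the sample? Treat the three yearly mean errors quoted above (Gore by 2.14 in 2000, Bush by 1.52 in 2004, McCain by 0.70 in 2008, written as Republican overperformance) as three draws of an election-wide error. A textbook chi-square interval for the standard deviation of that error is then extremely wide. This is a rough sketch that assumes normal errors and ignores the fact that each yearly mean also contains state-level noise.

```python
import math
from statistics import stdev

# Mean Republican overperformance of late state poll averages (actual minus poll),
# in percentage points, as quoted in this post. Gore beat his polls in 2000.
yearly_e = {2000: -2.14, 2004: 1.52, 2008: 0.70}

values = list(yearly_e.values())
s = stdev(values)          # sample standard deviation
df = len(values) - 1       # 2 degrees of freedom


def chi2_df2_quantile(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the chi-square distribution is exponential with
    mean 2, so the quantile has the closed form -2 * ln(1 - p).
    """
    return -2 * math.log(1 - p)


# Standard chi-square confidence interval for the standard deviation of a normal variable.
lower = math.sqrt(df * s**2 / chi2_df2_quantile(0.975))
upper = math.sqrt(df * s**2 / chi2_df2_quantile(0.025))
print(f"s = {s:.2f}; 95% interval for sigma_E: {lower:.2f} to {upper:.2f}")
```

With these three numbers the sample standard deviation comes out near 1.9 points, and the 95% interval runs from roughly 1 to 12 points. Three elections tell us very little about how large the 2012 error could be.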

The 11:2 odds are _after_ taking the above into account – in fact, the “2” in that ratio consists almost entirely of elections in which systematic statistical bias is disadvantaging Romney. That’s basically the entire point of his post from yesterday: http://fivethirtyeight.blogs.nytimes.com/2012/11/03/nov-2-for-romney-to-win-state-polls-must-be-statistically-biased/

You miss my point. The forecast errors in the state-specific polls are not statistically independent. It doesn’t matter how accurate the polls are when averaged across 3 or 6 different elections. Between 2000 and 2004 the average prediction error by state changed by 3.66 percentage points.

Agreed – I think Silver makes those sorts of points (about polling means being frequently correct even though his sample data doesn’t align particularly well with this year’s data) because they are easier to grasp for most people.

Mostly I just wanted to make the point that you’re not giving Silver enough credit: his model does include the uncertainty that your post implies is missing, though (as you point out) we have little data on which to estimate the impact of this uncertainty. So it’s very sensitive to whatever priors you choose regarding the probability distribution of values of E. Even so: we have tentative evidence that the probability distribution is becoming narrower over time (possibly as a function of increased polling).
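To see how sensitive the answer is to the prior on E, consider a deliberately crude version: Romney wins only if the common error E exceeds Obama's lead in the tipping-point state, with E drawn from a normal distribution centered on zero. The 2-point lead below is hypothetical, and the model ignores state-specific noise entirely, so treat the numbers as an illustration of sensitivity rather than a forecast.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def romney_win_probability(tipping_point_lead, sigma_e):
    """P(E > lead) when the election-wide error E ~ Normal(0, sigma_e).

    Positive E means the Republican beats his polls; the lead is Obama's
    margin in the tipping-point state's poll average (hypothetical here).
    """
    return 1 - STD_NORMAL.cdf(tipping_point_lead / sigma_e)


def implied_sigma(tipping_point_lead, win_probability):
    """Spread of E at which P(E > lead) equals win_probability."""
    return tipping_point_lead / STD_NORMAL.inv_cdf(1 - win_probability)


HYPOTHETICAL_LEAD = 2.0  # points; illustrative, not Silver's estimate

for sigma in (1.0, 1.5, 2.0, 3.0):
    print(f"sigma_E = {sigma}: P(Romney) = {romney_win_probability(HYPOTHETICAL_LEAD, sigma):.3f}")
print(f"sigma_E implied by a 16% Romney chance: {implied_sigma(HYPOTHETICAL_LEAD, 0.16):.2f}")
```

Under these assumptions, a 16% chance for Romney corresponds to a standard deviation of E of about 2 points, while halving that spread drops Romney's chance to roughly 2%.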

Also unknown: the spread of the bias between states (i.e. what is the standard deviation of the polling-average bias across states within a given year?). Call this E’. I would guess that this, too, is low and declining.

Fair points. When I wrote the post I was agnostic about how he treated the key correlation between states. I assumed he did something because he is smart. I would have given him a bit more credit had I seen the quote you provided. I don’t believe the term “bias” is helpful because it makes it seem intentional or systematic even though in the long run I would expect a zero mean. You would have been a good referee had this been a journal.

I defined E to be a constant across all states in a given year – sorry if that wasn’t clear. E is the mean error: the actual result minus the mean polling result (for the Republican, in my definition, so you expect the Republican to overperform the polling mean in each state by E, on average), just what you describe in your post. The noted correlation between poll means in all states (within a given year) was the reason I defined it this way.

From the linked fivethirtyeight post, in which Nate Silver already explained all this but apparently nobody read it: “That leaves only the final source of polling error, which is the potential that the polls might simply have been wrong all along because of statistical bias…If there is such a bias, furthermore, it is likely to be correlated across different states, especially if they are demographically similar. If either of the candidates beats his polls in Wisconsin, he is also likely to do so in Minnesota…Based on the historical reliability of polls, we put the chance that they will be biased enough to elect Mr. Romney at 16 percent.”

Oh I see. Yes, that is the point. An average prediction error of 2 percent favoring one candidate or the other seems possible, but who knows. We have a really small sample size for understanding how E varies from one election to the next. But if you are making the wager on Romney, you are gambling on E. The quote you identified is a lot more relevant than Silver’s blather about 74/77 polls picking the winner in the past, when most of the states weren’t even expected to be close.

Unless I’m quite confused, I understand your point exactly: in any given election year, the mean error across all state poll averages will differ significantly from zero, in favor of one candidate or party. Actual results compared to state poll averages were D+2.14 in 2000, R+1.52 in 2004, and R+0.70 in 2008. Call this error E, where E = mean(actual_GOP - polling_mean_GOP) across all states (i.e. the mean amount by which the Republican overperforms the state polling mean).
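Written as formulas, with a the Republican's actual result in state s and year y, p the late polling average for the Republican in that state, and N the number of states averaged that year, E and the within-year spread E’ raised in another comment look like this. Writing E’ as a sample standard deviation is an assumption; the comments do not specify it.

```latex
E_y = \frac{1}{N_y} \sum_{s=1}^{N_y} \left( a_{s,y} - p_{s,y} \right),
\qquad
E'_y = \sqrt{ \frac{1}{N_y - 1} \sum_{s=1}^{N_y} \left( a_{s,y} - p_{s,y} - E_y \right)^2 }

E_{2000} = -2.14, \qquad E_{2004} = 1.52, \qquad E_{2008} = 0.70
```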

Every night Nate Silver runs a few thousand simulations with varying values of E. In all of the 15% of these simulations in which Mitt Romney wins, E is high.
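A toy version of that procedure shows why the common component matters. It is not Silver's model: the electoral votes are the 2012 allocations, but the poll leads, the 237 "safe" Obama electoral votes and the error sizes are made up. Each run draws one election-wide error E shared by every swing state plus independent state noise, then records whether Obama falls short of 270. Holding each state's total error variance fixed, moving part of it into E should raise Romney's chances, and the runs in which Obama falls short should be concentrated at high values of E.

```python
import random

# Hypothetical inputs: 2012 electoral votes, made-up Obama leads in state poll averages (points).
SWING_STATES = {
    "FL": (29, 0.5), "OH": (18, 2.5), "VA": (13, 1.5), "NC": (15, -2.0),
    "CO": (9, 1.5), "IA": (6, 2.0), "NH": (4, 2.0), "NV": (6, 3.0), "WI": (10, 4.0),
}
SAFE_OBAMA_EV = 237  # hypothetical electoral votes treated as safe for Obama


def simulate(sigma_e, sigma_u, n_sims=20_000, seed=42):
    """Return (share of runs where Obama falls short of 270, mean E in those runs).

    Each run draws one election-wide error E ~ N(0, sigma_e) shared by every
    state, plus an independent state error u ~ N(0, sigma_u). Positive errors
    favor Romney: Obama's actual margin = poll lead - (E + u).
    A 269-269 tie is counted as Obama falling short.
    """
    rng = random.Random(seed)
    short_runs, e_in_short_runs = 0, []
    for _ in range(n_sims):
        common = rng.gauss(0, sigma_e)
        obama_ev = SAFE_OBAMA_EV
        for ev, lead in SWING_STATES.values():
            if lead - (common + rng.gauss(0, sigma_u)) > 0:
                obama_ev += ev
        if obama_ev < 270:
            short_runs += 1
            e_in_short_runs.append(common)
    mean_e = sum(e_in_short_runs) / len(e_in_short_runs) if e_in_short_runs else float("nan")
    return short_runs / n_sims, mean_e


# Same total error per state (standard deviation 2.5 points), split differently.
print("correlated: ", simulate(sigma_e=1.5, sigma_u=2.0))
print("independent:", simulate(sigma_e=0.0, sigma_u=2.5))
```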

Did I miss anything?

(Not pictured: an apparent, though by no means certain, decrease in the magnitude of E over time).

You are right that state poll averages may be getting more accurate over time (probably because more polls are included in each average). Using your terminology, E in New Hampshire is correlated with E in Ohio. So if there is a 2012-specific component of the error term, it will help one candidate or the other. I agree with Krugman’s post today that at this point we don’t know. But the swings he mentions (an average prediction error of 2%) are a possibility. We encounter this type of correlation in error terms all the time in econometrics. I am just not sure that Silver has taken it into account. If he hasn’t, 11:2 is a good bet.
