If the majority of polls in 2014 were accurate, this guy would be Governor of Kansas.
Next week, you can expect to see a piece reviewing the polling community's performance in the 2014 cycle. It is the third time I have taken on this particular task; you can see the efforts from 2012 and 2010 by clicking on the appropriate links.
You might note that I changed the formula for the rankings between 2010 and 2012. That's because in 2010, the focus of the study was a bit more specific (the notion of whether there was a left-leaning or right-leaning "bias" among the more prolific pollsters). In 2012, we went for a little more comprehensive rating.
The plan, for 2014, was to try to generate some continuity by employing the same formula.
That is still the plan. But ... whoo boy. Not to give away the ending, but applying the 2012 formula to the 2014 numbers put some folks at the front of the pack who were generally acknowledged to be cruddy; indeed, the result was nearly a reversal of the 2012 ratings. What's more: a quick look at the criteria from 2012 points to a problem. There is something in each of those parameters that can be critiqued.
When all is said and done, the more I dive into the matter, the quicker I come to a single conclusion: there is no "one best way" to measure accuracy in polling. Follow me across the fold as I explain why.
For those who do not remember how this went in 2012, a quick refresher course:
Given my background (I am a polls guy, but from the political angle, not necessarily the math angle), I decided to do a very Algebra I approach to grading the pollsters.
Here's how it worked:
1. I made two lists of pollsters. The first list was every pollster that released polling in at least five separate races (not counting national polls). That wound up being a grand total of 34 different pollsters. Then I did a secondary list, which was the "major pollsters" list. Here, I excluded two groups: pollsters who primarily worked for campaigns, and pollsters that only worked in 1-2 states. This left us with a list of 17 "major" pollsters.
2. I then excluded duplicate polls. Therefore, pollsters were only assessed by their most recent poll in each race. Only polls released after October 1st were considered in the assessment process.
3. I graded each of the pollsters on three criteria (a short code sketch of the scoring math follows the three descriptions below):
The first criterion was a simple one: in how many contests did the pollster pick the correct winner? If the pollster forecast a tie, that counted as one-half of a correct pick. I then rounded to the nearest whole percent, for a score between 0 and 100.
The second criterion was a simple assessment of error. I rounded each election result to the nearest whole number, did the same with the polling numbers, and then calculated the difference between the two margins. For example, if the November 5th PPP poll out of North Carolina was 49-49, and Romney eventually won 50-48, the "simple error" would be two points.
I then gave each pollster an overall "error score" based on how little average error there was in their polling. The math here is painfully simple. No error at all would yield 100 points, while an average error of ten points would get you zip, zero, nada. By the way, if you think 10 points was too generous, bear this in mind: two GOP pollsters had an average error in 2012 of over ten points.
The math here was basic: for every tenth of a point of average error, I deducted one point from the 100-point perfect score. Therefore, the clubhouse leaders on this measurement (Democratic pollsters Lake Research and the DCCC's own in-house IVR polling outfit, who tied) each had an average error of just 2.0 percent. That yielded them a score of 80.
The third measurement sought to reward those who did not show a strong partisan lean. This was called the "partisan error" score. Here, we took the error number from criterion two and added an element. The question: did the pollster overestimate the Democratic performance, or the Republican one? The total points of error in each party's favor were added up, the difference between the two was taken, and that difference was divided by the number of polls. This led to a number that (usually) was lower than the "error" score, because a good pollster won't miss in favor of just one party every single time.
Interestingly, virtually every pollster had an average error that overestimated the performance of the GOP. This echoes the national polls we saw, which tended to lowball the lead that President Obama held over Mitt Romney.
For this criterion, the 0-100 score was calculated the same way. For example, Rasmussen, on average, erred in favor of the GOP by 3.5 percent (you'd have thought it would be higher, but they had a couple of big misses in blowouts like the North Dakota gubernatorial election, and those muted their GOP swing). Therefore, their "partisan error" score would be 65.
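To make the arithmetic concrete, here is a minimal sketch of the three scores in Python. The data format (one signed Democrat-minus-Republican margin per race, final poll only) and every name in it are my assumptions for illustration, not anything drawn from the actual spreadsheets:

```python
# A sketch of the 2012 scoring math described above. Assumption: each
# poll is reduced to a signed margin (Democrat minus Republican), with
# one (poll_margin, actual_margin) pair per race, final poll only.

def grade_pollster(polls):
    """polls: list of (poll_margin, actual_margin) pairs, one per race."""
    n = len(polls)

    # Criterion 1: share of correct winner calls; a projected tie earns
    # half credit, per the rules above.
    picks = 0.0
    for poll_margin, actual_margin in polls:
        if poll_margin == 0:
            picks += 0.5
        elif (poll_margin > 0) == (actual_margin > 0):
            picks += 1.0
    winner_score = round(100 * picks / n)

    # Criterion 2: average absolute error, docked one point per tenth of
    # a point of error, so a 10-point average error (or worse) scores 0.
    avg_error = sum(abs(p - a) for p, a in polls) / n
    error_score = max(0, round(100 - 10 * avg_error))

    # Criterion 3: net signed lean (misses toward one party net of
    # misses toward the other, divided by poll count), same 0-100 scale.
    net_lean = sum(p - a for p, a in polls) / n
    partisan_score = max(0, round(100 - 10 * abs(net_lean)))

    return winner_score, error_score, partisan_score

# Sanity checks against the examples above: a 2.0-point average error
# yields an error score of 80; a 3.5-point net lean yields 65.
```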
Let's examine these criteria.
The first part is utterly noncontroversial, and remains the same for this cycle. The pollsters examined had to do a minimum of five distinct races. As in 2012, there were two lists created: one for any pollster who did five races, and the other ("major pollsters") for those who did not focus on one or two states or work for campaigns/PACs.
The only difference in 2014, perhaps predictably, is that both lists were smaller. In 2012, there were 34 pollsters that shouldered that workload. In 2014, it was 23. The "major pollster" list varied only slightly (down to 16 from 17), although one pollster (Univ. of New Hampshire) dropped from the "major" list to the larger, all-inclusive list, because their late polling focused solely on two states (Maine and New Hampshire).
In the name of continuity, I also stuck with the "final poll only" provision, but ... damn ... that one gives me pause, and it is the first truly assailable point in our discussion. Only counting the final poll gives cover to some really bad polling. Consider:
What you are seeing, of course, is one of the more notorious pollsters among readers in the Daily Kos Elections universe: the aforementioned University of New Hampshire. You will see, when the numbers are crunched and released next week, that they had a very solid 2014, in the final analysis. They picked the winner in six of the seven races they polled, and deemed the seventh a tie. What's more, their average error was better than most. But that is only because their final polling hit the fairway. Prior to that, as demonstrated in the above graphic, they were spraying balls into the woods left and right, often offering wildly disparate results from polls that were in the field only a handful of days apart.
This, of course, is somewhat of a tradition with UNH. But it isn't reflected in the pollster grades, as we currently assess them, because only that final effort is taken into account.
Now, on to the three criteria themselves.
The first leg of the pollster triathlon is based, in short, on the answer to a relatively simple question: Did you forecast the winner correctly?
Because pollsters do sometimes show races completely deadlocked, that has to be accounted for (1/2 win, 1/2 loss, of course). And then a straight percentage is taken. 100 percent winners equals 100 points.
Last time around, I considered this the least controversial measurement. And then I saw Mason Dixon in 2014. M-D, after a godawful 2012, lay relatively low in 2014. Indeed, they barely made the cut by polling precisely five races. Yet they scored near the top of the charts. Why? Because they were a perfect five-and-zero on siding with the winners.
But, on closer review, how could they not? The five races they polled were never in doubt. The closest of the five (the Minnesota gubernatorial race) came in a little closer than expected, but still was a relatively comfortable 50-45 re-election win for Democratic Gov. Mark Dayton. The average margin of victory in the five races? A comfy 22 points.
Meanwhile, there was no shortage of close races with quirks in them that made them tougher to poll. Three come to mind immediately: Alaska's gubernatorial election (where the Democratic nominee elected to stand down and become the lieutenant governor nominee of a nascent, and ultimately successful, Independent candidate), and the two races in Kansas.
The Kansas races presented a classically confounding dilemma for pollsters. How do you model an electorate where the traditional red-tinted political headwinds were, as always, clear, but there was also a huge undercurrent of uncertainty, given late candidate withdrawals on the Senate side and a toxic GOP incumbent on the gubernatorial side?
Trying to navigate those crosswinds proved difficult for most pollsters, and the bulk of polling erred in assuming a relatively strong performance for the non-Republicans in the race. But, if you were a pollster who managed to avoid that race, your batting average was likely to be untainted by the pain of trying to forecast a particularly difficult race.
The second criterion is simple to understand, but often deceptively difficult to gauge. That parameter is average error: the simple average margin by which the pollster's result deviated from the actual final result.
Here, too, there are complications. Consider the pollster in our study that had the toughest cycle: Hendrix College, a local scholastic outfit that polls in their home state of Arkansas. They had the highest average error of all 23 pollsters: an average miss of 9.6 percent. That, of course, is enormous.
But consider that they were hamstrung by two factors. The first, and it defies easy explanation, is that some states wind up confounding everyone on Election Day. In 2010, those states were Connecticut and Hawaii. In 2014, without question, one of them was Arkansas. Nobody really envisioned Democrats Mark Pryor or Mike Ross actually scoring victories in November, but no one had them both getting thumped by double digits, either. And if you are Hendrix College, and this is the only state you poll, you get dinged, while others see their huge misses in Arkansas balanced out by states elsewhere that perform a bit more predictably.

The second problem is that, because Hendrix only polled Arkansas, four of their six races were House races. For a variety of reasons, House races tend to produce bigger polling misses than statewides. Whether it is a tendency to break later because candidates define themselves later, or an inability of pollsters to herd because of the lack of volume (if you are an adherent of the "herding" theory), House races are just a tougher target: they made up over 17 percent of the races where our pollsters missed by double digits, but only 9.8 percent of the races where our pollsters got close to the bullseye (0-4 points).
Finally, the third criterion is the one where I appreciate the premise the most, and have the most trouble quantifying. From the very beginning, I have wanted to add a component that appreciated the partisan element of polling. I will freely admit that, back in 2010, one of the main hypotheses I endeavored to prove was that Rasmussen's results were actually more partisan than PPP's (this was back in the halcyon days when political media outlets routinely dismissed PPP as a partisan pollster, but presumed that Rasmussen did not have a similar partisan axe to grind).
The problem is finding a decent way to quantify that. In 2012, I went with simplicity, as you can read above. However, something occurred to me: a pollster could be wildly errant on one poll in favor of one party, and that single miss could offset a litany of less errant polls that uniformly leaned toward the other party. This happened in 2012, actually, when Rasmussen markedly oversold Democratic prospects in two races (ND-Gov and NM-Sen). That offset a strong GOP lean in many of their other 40-plus polls, and made their overall partisan "score" look fairly moderate (a GOP lean of 3.5 points).
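To see the problem in miniature, here is an invented example in the same hedged spirit as the sketch above; the numbers are hypothetical, not Rasmussen's actual 2012 polls:

```python
# Hypothetical illustration of the offset problem: nine polls that each
# miss 3 points toward the GOP, plus one 25-point miss toward the Dems.

signed_misses = [3.0] * 9 + [-25.0]   # positive = overestimated the GOP

net_lean = sum(signed_misses) / len(signed_misses)
print(net_lean)    # 0.2 -- the signed average looks nearly unbiased

# Yet nine of the ten polls missed in the same direction. Checking the
# share of misses that favored one party exposes the lean that the
# simple signed average hides.
gop_share = sum(m > 0 for m in signed_misses) / len(signed_misses)
print(gop_share)   # 0.9
```

A direction-based check like that is one possible fix, though it has its own blind spot: it ignores the size of each miss.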
So, the partisan component is still on the drawing board. And, honestly, for 2014, the impact is muted anyway. All but two of the 23 pollsters, on balance, erred in the direction of the Democrats (hence the enhanced pain for Democrats when election night eventually rolled around). As a result, the two "least partisan" pollsters by the old standard were two GOP outfits: Public Opinion Strategies and Vox Populi. But the proper reaction to that news probably shouldn't be "gee, look how fair they are." The proper reaction should probably be "wow, what are their numbers going to look like when the GOP doesn't have a surprisingly strong wave election?"
So, the number crunching continues. Check back to Daily Kos/Daily Kos Elections next weekend to see what the final numbers look like. And then, gratefully, I will be putting the 2014 election cycle to rest.