Well, we are almost to the election, which means an end, finally, to the interminable polls and projections. OK, I actually like statistics, but I suspect that political polls play a little fast and loose with the assumptions needed to compute confidence intervals. I love it when the election goes the wrong way and they have to come up with scenarios for WHY they blew it. Of course, with so many of them out there making 95% confidence intervals, about five percent of the ones you hear SHOULD be wrong... but I think there is more to the problem than just that.
I came across a blog post from Iowahawk (I didn't provide a link because my students come here, and some of his language is not the sort of thing I display for my students; they know all the words anyway, but they won't hear them from me) that had a nice expression of what I felt, so I stole parts of it shamelessly...
Statisticians love balls and urns. A typical Stats 101 midterm, for example, usually includes a question along these lines:
"You take a simple random sample of 1000 balls from an urn containing 120,000,000 red and blue balls, and your sample shows 450 red balls and 550 blue balls. Construct a 95% confidence interval for the true proportion of blue balls in the urn."
From this the typical intro stats student can deduce that they are 95% certain the real proportion of blue balls in that urn is 55%, plus or minus 3.1%.
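For anyone who wants to check that arithmetic, here is a minimal Python sketch of the usual normal-approximation interval for a proportion, p ± z·sqrt(p(1 − p)/n). The function name `proportion_ci` and the default z = 1.96 (the standard multiplier for 95% confidence) are my own illustrative choices, and the sketch assumes a true simple random sample, which is exactly the assumption in question here.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return (sample proportion, margin of error) for a simple random sample.

    Uses the normal approximation: margin = z * sqrt(p * (1 - p) / n).
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# The urn problem: 550 blue balls in a sample of 1000.
p_hat, margin = proportion_ci(550, 1000)
print(f"{p_hat:.1%} +/- {margin:.1%}")  # 55.0% +/- 3.1%
```

Notice that the 120,000,000 balls in the urn never enter the calculation. With a population that large, only the sample size matters.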
This is, for all intents and purposes, how political pollsters compute the mysterious "margin of error," which has everything to do (and only to do) with pure mathematical sampling error. If you take the standard formula for that interval, plug in the worst case of a 50/50 split, and round it just a smidge, you get a simple rule of thumb for the margin of error of a sampled proportion:
Margin of Error = 1 / sqrt(n)
So if the sample size is 400, the margin of error is 1/20 = 5%; if the sample size is 625 the margin of error is 1/25 = 4%; if the sample size is 1000, it's about 3%.
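To see how much of a smidge gets rounded away, this short Python sketch compares the rule of thumb with the unrounded 95% margin at a 50/50 split, where p(1 − p) is at its largest. The function names `rule_of_thumb` and `worst_case_margin` are hypothetical, chosen just for this illustration.

```python
import math

def rule_of_thumb(n):
    """Quick margin of error: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def worst_case_margin(n, z=1.96):
    """Normal-approximation margin at p = 0.5, where p * (1 - p) peaks at 0.25."""
    return z * math.sqrt(0.25 / n)

for n in (400, 625, 1000):
    print(f"n={n}: rule of thumb {rule_of_thumb(n):.2%}, "
          f"unrounded {worst_case_margin(n):.2%}")
```

The rule of thumb amounts to rounding 1.96 up to 2, so it always runs slightly high, by about two percent of the margin itself.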
"It works pretty well if you're interested in hypothetical colored balls in hypothetical urns, or survival rates of plants in a controlled experiment, or defects in a batch of factory products. It may even work well if you're interested in blind cola taste tests. But what if the thing you are studying doesn't quite fit the balls & urns template?"
What if 40% of the balls have personally chosen to live in an urn that you legally can't stick your hand into?
What if 50% of the balls who live in the legal urn explicitly refuse to let you select them?
What if the balls inside the urn are constantly interacting and talking and arguing with each other, and can decide to change their color on a whim?
What if you have to rely on the balls to report their own color, and some unknown number are probably lying to you?
What if you've been hired to count balls by a company that has endorsed blue as its favorite color?
What if you have outsourced the urn-ball counting to part-time temp balls, most of whom happen to be blue?
What if the balls inside the urn are listening to you counting out there, and it affects whether they want to be counted, and/or which color they want to be?
If one or more of the above is true, then the formula for margin of error simplifies to:
Margin of Error = Who the heck knows?