Thursday, November 4, 2010

Strange beliefs involving numbers: When is a group overrepresented?

One of my hobbies is collecting bizarre and unusual examples of confused quantitative reasoning. Some innumeracies are very common (such as thinking that if an item's price has been discounted twice, first by 50% and then by 20%, the overall discount is 70%); others, not so much. Here's an example of the latter kind.
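A quick aside on the double-discount case, since the arithmetic is easy to check: successive discounts multiply the remaining price, they don't add. Here is a minimal Python sketch; the function name `combined_discount` is just my own illustrative helper.

```python
def combined_discount(*discounts):
    """Overall discount from applying each fractional discount in turn."""
    remaining = 1.0  # fraction of the original price still to be paid
    for d in discounts:
        remaining *= 1.0 - d  # each discount applies to the already-reduced price
    return 1.0 - remaining


# 50% off, then 20% off what is left
print(f"{combined_discount(0.50, 0.20):.0%}")
```

After the first discount you pay half; the second discount takes 20% off that half, so you pay 40% of the original price. The overall discount is 60%, not 70%.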

Poland held a presidential election a few months ago. There were two candidates in the second (and final) round, since Poland has a two-candidate runoff system. On election night, after the polls had closed but before the votes were fully counted, the media were of course talking about election-day polls. Those polls showed a stark contrast between the candidates' relative support among rural and urban voters. About 25% of one candidate's (Komorowski) electorate was rural, whereas the other candidate's (Kaczynski) electorate was reported to be 48% rural. The media concluded that Komorowski was overrepresented among urban voters while Kaczynski was overrepresented among rural voters. One blogger took issue with this interpretation, offering a quite creative argument against it:
The fact is that Kaczynski is represented equally by the whole country. Exactly equally. Because those 2 percent are just statistical error. Rural and urban electorates support Kaczynski equally strongly.
What's bizarre (and, to be honest, quite stupid) about this argument is the implicit assumption that a candidate's representation among different groups is equal if those groups' shares in his electorate are equal. Which is absurd, of course; I mean, if some candidate's support were split 50%-50% between people under and over the age of 80, would you say that the candidate is equally supported by young and old voters? In a one-dimensional case, to claim that support is equal, each group's share of the candidate's electorate has to be roughly proportional to that group's base rate. Since about 62% of Poles live in cities, only about 38% live in rural areas, so with a 48% rural electorate Kaczynski is indeed overrepresented in rural areas.
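The comparison against the base rate can be sketched in a few lines of Python. This is only an illustration under a loud assumption: I treat the 62% urban population share as if it were also the urban share of actual voters, which ignores any turnout differences. The helper `representation_ratio` is my own hypothetical name.

```python
def representation_ratio(share_in_electorate, base_rate):
    """Ratio > 1 means the group is overrepresented in the candidate's electorate."""
    return share_in_electorate / base_rate


# Assumption: population shares stand in for shares of voters.
URBAN_BASE_RATE = 0.62
RURAL_BASE_RATE = 1.0 - URBAN_BASE_RATE

# Rural share of each candidate's electorate, as reported in the polls.
rural_shares = {"Komorowski": 0.25, "Kaczynski": 0.48}

for name, rural in rural_shares.items():
    rural_ratio = representation_ratio(rural, RURAL_BASE_RATE)
    urban_ratio = representation_ratio(1.0 - rural, URBAN_BASE_RATE)
    print(f"{name}: rural {rural_ratio:.2f}, urban {urban_ratio:.2f}")
```

Under that assumption, Kaczynski's rural ratio comes out at roughly 1.26 and Komorowski's at roughly 0.66. A 50%-50% split would count as "equal" only if the two groups were themselves equally large.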

Interestingly (or perhaps not), in the short passage I quote, the blogger makes two additional quantitative mistakes. First, he writes "2 percent" where he means "two percentage points." Second, he assumes that because the sampling error in the poll is at least 2 points, the true rate must be 50% rather than 48%. Sure, it could be 50%. But it's equally likely that it's 46%.
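The symmetry behind that last point can be shown with a rough sketch. I assume here that sampling error is approximately normal with a fixed standard error of 1 percentage point; both the normal approximation and the value of `sigma` are my assumptions, not figures from the poll.

```python
from statistics import NormalDist


def relative_likelihood(observed, true_a, true_b, sigma):
    """How much more likely the observed value is under true_a than under true_b."""
    like_a = NormalDist(mu=true_a, sigma=sigma).pdf(observed)
    like_b = NormalDist(mu=true_b, sigma=sigma).pdf(observed)
    return like_a / like_b


# Observed rural share: 48%. Compare a true rate of 50% with a true rate of 46%.
# sigma=1.0 is a hypothetical standard error in percentage points.
print(relative_likelihood(48.0, 50.0, 46.0, sigma=1.0))
```

The ratio is 1: an observed 48% is exactly as compatible with a true 46% as with a true 50%. The sampling error gives no reason to round toward the number that happens to suit the argument.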

Three staggering mistakes in three sentences! (Yes, I do mean three. The second period mark is artificial.)
