Saturday, December 26, 2009

We like ladders better than trees

Sports fans are usually obsessed with rankings (be they player or team rankings); soccer fans are no different. FIFA (soccer's international governing body) has its own rankings of national teams, but given how flawed those are, many soccer statisticians are trying to come up with their own. The best ones seem to be Voros McCracken's (his soccer blog is also very interesting) and Nate Silver's (the statistician who runs the famous FiveThirtyEight.com website).

The problem with rankings is that they imply a linear order that sometimes just doesn't exist in reality. The usefulness of ranking soccer teams is presumably this: if we see that, say, Spain is ranked no. 2 in the world whereas Denmark is ranked no. 12, then Spain is the more likely winner in a Spain vs. Denmark game. But linear rankings require transitivity (i.e., transitivity is a necessary condition for a relation to be a linear order). The transitive property is easy to explain. Suppose Spain is a better soccer team than Denmark, and Denmark is better than Venezuela. Then it seems common sense that Spain is also better than Venezuela. If this is indeed the case and if, in fact, the same holds for any three teams we choose, then the relation "better soccer team than" is transitive. If there's no transitivity, we cannot rank soccer teams from best to worst.

And in reality it isn't there. For example, all major soccer rankings put Mexico ahead of the U.S. But, at least over the past eight years, the U.S. has had the better head-to-head record against Mexico. It seems as though Mexico's record is superior to that of the U.S.--except when those two teams play each other. But this means there's no transitivity. Suppose, say, that Mexico is ahead of Denmark and Denmark is ahead of the U.S. Transitivity would then put Mexico ahead of the U.S., and yet head to head the U.S. is ahead of Mexico. The three teams form a cycle, and the linearity of the order breaks down.
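To make the transitivity test concrete, here is a minimal Python sketch. The function name is_transitive and the two sets of "is ahead of" pairs are hypothetical illustrations that mirror the examples above; they are not actual match data or anyone's ranking method.

```python
from itertools import permutations

def is_transitive(ahead_of):
    """Check that a > b and b > c always imply a > c.

    `ahead_of` is a set of (better, worse) pairs. Only triples of
    distinct teams are checked, which is enough when no team is
    listed as ahead of itself.
    """
    teams = {team for pair in ahead_of for team in pair}
    return all(
        (a, c) in ahead_of
        for a, b, c in permutations(teams, 3)
        if (a, b) in ahead_of and (b, c) in ahead_of
    )

# Hypothetical pairs mirroring the Spain/Denmark/Venezuela example.
consistent = {
    ("Spain", "Denmark"),
    ("Denmark", "Venezuela"),
    ("Spain", "Venezuela"),
}

# Hypothetical pairs mirroring the Mexico/Denmark/U.S. cycle.
cyclic = {
    ("Mexico", "Denmark"),
    ("Denmark", "U.S."),
    ("U.S.", "Mexico"),
}

print(is_transitive(consistent))  # True
print(is_transitive(cyclic))      # False
```

Any set of pairs containing a cycle like the second one fails the check, which is exactly why no best-to-worst list can respect all of its pairwise results.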

This is not to say that statistical methods of measuring team strength are useless. Far from it. Both Silver's and McCracken's systems, for example, are capable of producing the odds for essentially any given game. That's extremely useful--but very different from providing a consistent linear ordering of all teams in the world. The latter just does not exist in reality.

Incidentally, there are many more examples of us trying to impose a linear order where there is none. It's often the case that we see a total order in situations where the true order is partial--i.e., we're trying to put things on a ladder where the true underlying structure is that of a branching tree. (Note: unlike the soccer rankings situation, a partial order is still transitive. It's just not linear: some pairs of items cannot be compared at all.) For example, you sometimes hear the question: "If humans evolved from chimps, why are there still chimps?" The confusion behind this question is that of treating the relation "evolved from" as a linear order, whereas in reality it's a partial one (evolution is not a ladder but a tree; chimps are not our "parents" but our "cousins": our "parents" and chimps' "parents" were "siblings"). For another example, the way we teach grammar in middle schools is based on an implicit assumption that grammatically correct sentences can be derived from rules if sentences are treated as strings of words. This assumption is incorrect: as shown by Noam Chomsky, in order for us to be able to provide algorithmic rules for generating syntactically correct sentences of any human language, those sentences necessarily have to be treated as trees, not strings, of words and phrases.
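The ladder-versus-tree distinction in the evolution example can be sketched in a few lines of Python. The tree below is a hypothetical and heavily simplified illustration (the node names are placeholders, not real taxonomy), and the helpers evolved_from and comparable are names made up for this sketch.

```python
# Hypothetical, heavily simplified family tree: each node maps to its parent.
parent = {
    "humans": "human-line ancestor",
    "chimps": "chimp-line ancestor",
    "human-line ancestor": "common ancestor",
    "chimp-line ancestor": "common ancestor",
}

def evolved_from(descendant, ancestor):
    """True if `ancestor` lies on the path from `descendant` up to the root."""
    node = parent.get(descendant)
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False

def comparable(a, b):
    """In a total (linear) order every pair of distinct items is comparable."""
    return evolved_from(a, b) or evolved_from(b, a)

print(evolved_from("humans", "common ancestor"))  # True
print(evolved_from("humans", "chimps"))           # False
print(comparable("humans", "chimps"))             # False
```

The relation is still transitive (an ancestor of an ancestor is an ancestor), but humans and chimps are simply not comparable under it, which is what makes the order partial rather than linear.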
