Saturday, July 16, 2011

Bill Simmons' Rating Nihilism

Earlier he claimed, on scarce evidence, that "Rotten Tomatoes scares me as a metric," because "people are idiots," and that "their 'top critics' rating is much more useful."[1]

Now he explains that:
I believe Michael Jordan is the greatest basketball player ever, and I can prove it. I believe The Breaks of the Game is the greatest sports book ever, but I can't prove it. Books can't be measured that way — they hit everyone differently, so when we're evaluating them, we can only say, "You can't mention the greatest books (or albums, paintings, TV shows, movies or whatever) without mentioning that one." That's as far as you can go.
I don't get it. He thinks you can sort of rate movies (if you trust only the experts), but books can never be rated except to say which ones are "among the best"? This is inconsistent.

He's right that there is a key difference between sports and film/writing, although it is not, as he claims, that the latter "hit[s] everyone differently." The difference is that in sports there is a known goal: for the team to win. That means that, at least theoretically, it is possible to tease out which player stats tend to correlate with winning, and then use those stats to evaluate players.

But notice the causality here. We can't evaluate individual players well until we know which stats are generally good indicators that a player will help eir team win. Intuition does not necessarily serve well here, an insight upon which books have been written and careers have been made.
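To make this concrete, here is a minimal sketch of the idea, using entirely invented season numbers (the teams, stats, and win totals below are synthetic, not real data): compute how strongly each candidate stat correlates with team wins, and keep the stats that correlate well.

```python
# Hypothetical sketch with synthetic season data: which team stat
# correlates with winning? All numbers below are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic seasons: (points per game, turnovers per game, season wins)
seasons = [
    (102, 15, 55), (98, 14, 48), (95, 17, 40),
    (104, 12, 60), (91, 18, 33), (99, 16, 45),
]
ppg  = [s[0] for s in seasons]
tov  = [s[1] for s in seasons]
wins = [s[2] for s in seasons]

print(f"wins vs. points/game:    r = {pearson(ppg, wins):+.2f}")
print(f"wins vs. turnovers/game: r = {pearson(tov, wins):+.2f}")
```

In this toy data, points per game correlates strongly positively with wins and turnovers correlate negatively, so a player evaluation built on these seasons would weight scoring up and turnovers down. The real work (the careers and books alluded to above) is doing this on actual data, where intuition often picks the wrong stats.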

In film/writing there is no such clear objective, so the ratings by individuals who have seen or read a work must be subjective. So instead of evaluating statistics by how well they correlate with the objective of winning, we must instead evaluate rating systems by their inter-rater reliability. The goal is that if you added more independent ratings by unbiased raters, there should be as small a deviation as possible between the new aggregate and the old one.

The obvious suggestion is that, if we want better opinions, we need more of them to average out more of our random biases, like how hungry we were when we first saw the movie.
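That averaging claim is just the law of large numbers, and a quick simulation shows it. This is a sketch with invented numbers: assume a film has some "true" quality and each rater reports it plus independent random noise (hunger, mood, whatever).

```python
# Hypothetical sketch: average many independent noisy ratings and watch
# the estimate converge on the "true" quality. All numbers are invented.
import random

random.seed(0)
TRUE_QUALITY = 7.0  # assumed underlying quality on a 0-10 scale

def average_rating(n_raters):
    # Each rater reports the true quality plus personal noise in [-2, 2].
    scores = [TRUE_QUALITY + random.uniform(-2, 2) for _ in range(n_raters)]
    return sum(scores) / len(scores)

for n in (1, 10, 100, 10_000):
    est = average_rating(n)
    print(f"{n:>6} raters -> average {est:.3f} "
          f"(error {abs(est - TRUE_QUALITY):.3f})")
```

One critic is a single noisy sample; ten thousand independent raters cancel each other's random biases, and the error of the average shrinks roughly like one over the square root of the number of raters. The catch, of course, is "independent and unbiased," which is where aggregator design actually matters.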

Again, the input must be subjective. But once we've decided upon the best rating system, its output is objectively our best estimate of that film/book's quality. Just as in sports, personal intuition is not the best estimate of quality, and to believe otherwise is simply hubris.

Perhaps it should not surprise us that a key opinion maker is arguing that we should only trust key opinion makers, instead of wide-scale opinion aggregators. But the rest of us don't have to buy it.


[1]: Rotten Tomatoes ratings have real problems, like the fact that they threshold each review into "good" or "bad" and report the percentage of "good" ones instead of employing a continuous scale. But attacking those flaws is a straw man; it does nothing to show that open, aggregated movie ratings are bad in general.