Thursday, January 15, 2009

Tuesday Statisticz: Foreign Ratings of US Comedies Part I

In Creative Destruction, Tyler Cowen argues that
Movie producers know that action films are easiest to export to many different countries. Heroism, excitement, and violence do not vary so much across cultures. Comedies, with their nuances of dialogue and their culturally specific references, are the hardest to sell abroad. A global market in cinema therefore encourages action films more than it does sophisticated comedy.
Previously a claim like this would have rested on intuition, but with IMDb's huge data corpus we can now test it empirically. The null hypothesis is that the mean ratings from US and non-US voters do not differ. Should we reject it?

So as not to bias my sample toward the trash that my friends and I enjoy watching, I randomly sampled 2 movies from each decade since the 1960s from Wikipedia's list of US comedies using this random integer generator. I re-drew any movie that had fewer than 2000 total votes. I also drew 10 US movies from the 1960s on from the action category, using Wikipedia's list here.
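The drawing procedure can be sketched roughly as follows. This is only an illustration: the catalog, the titles, and the vote counts are made up, and the function name is hypothetical. It draws without replacement, which is one reasonable reading of "re-draw until the movie qualifies".

```python
import random

# Hypothetical catalog: decade -> list of (title, total votes).
# Titles and vote counts are placeholders, not real data.
HYPOTHETICAL_CATALOG = {
    "1960s": [("Comedy A", 5400), ("Comedy B", 800),
              ("Comedy C", 2100), ("Comedy D", 12000)],
    "1970s": [("Comedy E", 1500), ("Comedy F", 3300),
              ("Comedy G", 9100), ("Comedy H", 2600)],
}

def sample_per_decade(movies_by_decade, per_decade=2, min_votes=2000, rng=None):
    """Draw movies at random per decade, re-drawing any with too few votes."""
    rng = rng or random.Random()
    picked = {}
    for decade, movies in movies_by_decade.items():
        remaining = list(movies)
        chosen = []
        while len(chosen) < per_decade and remaining:
            # Random integer in [0, len(remaining)), like an online generator.
            idx = rng.randrange(len(remaining))
            title, votes = remaining.pop(idx)
            if votes >= min_votes:  # otherwise discard and draw again
                chosen.append(title)
        picked[decade] = chosen
    return picked

print(sample_per_decade(HYPOTHETICAL_CATALOG))
```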

The preliminary data suggest that there is no significant difference between the ratings. The mean US rating for these comedies is 6.91 and the mean non-US rating is 6.71, but because of the high standard deviations (.80 and .89), the difference is not significant in a two-tailed t-test (p=.604). The mean US rating for the action films is 6.22 (SD 1.097) and the mean non-US rating is 6.13 (SD 1.025), a difference that is clearly not significant either.
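For anyone who wants to check the arithmetic, here is a minimal sketch of a pooled two-sample t-test computed from the summary statistics alone. It assumes 10 movies per group (2 per decade over five decades for the comedies, 10 for the action films) and treats the US and non-US ratings as independent samples. The function names are made up for illustration, and the p-value comes from numerically integrating the t density rather than from a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on the density over [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t-test with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_tailed_p(t, df)

# n = 10 per group is an assumption, not a number stated with the results.
print("comedies:", pooled_t_test(6.91, 0.80, 10, 6.71, 0.89, 10))
print("action:  ", pooled_t_test(6.22, 1.097, 10, 6.13, 1.025, 10))
```

Under those assumptions the comedy comparison should come out at about p = .60, in line with the figure above, and the action comparison should land well above any conventional significance threshold.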

My work is not nearly complete, because I think there are some serious flaws in the methodology. These are the questions I will be contemplating in the upcoming weeks:

1) Should I conduct a t-test for each of the different movies and then conduct a meta-test on all of them, or should I sum their ratings first?

2) Given a large sample like the non-US ratings for Logan's Run here, how do I calculate the standard deviation? I know the formula and I can do it for small samples where I can input each value but this larger one is giving me trouble.

3) Would a sample based on only the 2000s be more indicative of current cultural trends? I think it might, because so many of the old movies have pre-existing cultural status that US or non-US raters might be especially biased before watching.

4) Is the world a mere hotch-potch of random cohesions and dispersions, or is it a unity of order and providence? Should I care?
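On question 2, one possible approach, assuming the ratings page reports how many votes were cast at each score from 1 to 10, is to compute the standard deviation from those counts instead of typing in every individual vote. The sketch below uses made-up counts (not the real Logan's Run figures) and a hypothetical function name.

```python
import math

def mean_and_sd_from_counts(counts, sample=True):
    """Mean and standard deviation from a {rating: number_of_votes} table.

    With sample=True the divisor is n - 1, otherwise n (population SD).
    """
    n = sum(counts.values())
    mean = sum(rating * c for rating, c in counts.items()) / n
    # Each rating value contributes its squared deviation once per vote.
    sum_sq = sum(c * (rating - mean) ** 2 for rating, c in counts.items())
    variance = sum_sq / (n - 1 if sample else n)
    return mean, math.sqrt(variance)

# Hypothetical vote counts per rating, for illustration only.
example_counts = {1: 40, 2: 35, 3: 60, 4: 110, 5: 260,
                  6: 520, 7: 610, 8: 330, 9: 120, 10: 150}

mean, sd = mean_and_sd_from_counts(example_counts)
print(f"mean = {mean:.2f}, SD = {sd:.2f}")
```

For samples in the thousands, the choice between dividing by n and by n - 1 barely changes the result.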