Tuesday, May 13, 2008

Tuesday Statisticz: Breaking down the success of superhero movies

Obviously, Jon Favreau's Iron Man has been a huge hit at the box office thus far, and it has done surprisingly well on imdb too. It is currently rated 8.3 and sits at #133 on the Top 250, but those numbers will drop after a few months, as they always do with new movies.

I naturally began to wonder which specific factors lead superhero movies to do well at the box office. I narrowed my search to movies from the past 6 years that feature one main superhero, which excludes X-Men and the Fantastic Four, as well as any wack older movies. Here's a chart with imdb's weighted rating (accessed May 13 on imdb) on the y-axis and total US gross box office numbers (found here) on the x-axis:

There's a definite positive correlation here, with an r squared of 0.356, meaning the imdb rating accounts for roughly a third of the variation in gross. But some movies deviate from the trendline. Hellboy made much less money than we'd expect from its imdb rating (rated 6.7, grossed $59,000,000), while Spider-Man 3 made substantially more than we'd expect (rated 6.6, grossed $336,000,000).
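For anyone who wants to reproduce numbers like these, here is a minimal Python sketch of the ordinary least-squares trendline and r squared that a spreadsheet trendline reports. The five data points are made up for illustration; they are not the movies in the chart (only Hellboy's and Spider-Man 3's figures appear in this post). One handy property: for a single straight-line fit, r squared comes out the same whichever variable you put on which axis.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variance in ys explained by the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# HYPOTHETICAL data for illustration only, one entry per movie.
gross = [59, 120, 200, 336, 400]       # US gross, $ millions (x-axis)
rating = [6.7, 6.9, 7.1, 6.6, 7.8]     # imdb weighted rating (y-axis)

print(round(r_squared(gross, rating), 3))
print(round(r_squared(rating, gross), 3))  # same value with the axes swapped
```

Read an r squared of 0.356 the same way: a line through rating versus gross explains about 36% of the spread, and the rest comes from everything else (release date, marketing, sequel status, and so on).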

Why did Spider-Man 3 make so much more? A quick glance at the two movies' demographic breakdowns (found here and here) shows that Spider-Man 3 did much better among females under 18. While they are not huge voters on imdb, experience tells us that young women go to the movies in droves. So here's a chart with the ratings from females under 18 versus total US gross box office numbers:

As you can see, there is a correlation here as well (with an r squared of 0.399), although that is to be expected, since females under 18 are one constituency of imdb's total votes. This chart suggests we may be on the right track, but what we are really looking for is the difference between the ratings of females under 18 and the overall ratings, plotted against each movie's deviation from the trendline in our first chart. We want to find out whether the ratings of females under 18 can explain those deviations. So here's a chart:
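The comparison described above can be sketched in Python as follows. This is one reading of the method, not necessarily exactly how the chart was built: it assumes the first chart's trendline is the least-squares line rating = slope * gross + intercept, and it measures a movie's "deviation" as how far its actual gross lies from the gross that line expects for its rating. The helper name `extra_gross` and all of the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Share of the variance in ys explained by the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def extra_gross(gross, rating):
    """Hypothetical helper: actual gross minus the gross the rating trendline
    expects ($ millions). Positive means more money than the rating predicts.
    Assumes the fitted slope is nonzero."""
    slope, intercept = fit_line(gross, rating)
    return [g - (r - intercept) / slope for g, r in zip(gross, rating)]


# HYPOTHETICAL inputs, one entry per movie.
gross = [59, 120, 200, 336, 400]       # US gross, $ millions
overall = [6.7, 6.9, 7.1, 6.6, 7.8]    # overall weighted rating
girls = [6.9, 7.0, 7.3, 7.4, 8.0]      # rating from females under 18

gap = [f - o for f, o in zip(girls, overall)]   # girls' rating minus overall
deviation = extra_gross(gross, overall)
print(round(r_squared(gap, deviation), 3))
```

Measuring the deviation horizontally (in dollars) or vertically (in rating points) gives the same r squared, because the dollar gap is just the vertical residual multiplied by -1/slope, and rescaling one variable does not change r squared.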

While the r squared value of 0.177 is lower here than in the previous two charts, it is more interesting because it's counterintuitive. Who would have thought that young girls would have such an impact on the box office numbers of movies based on comic book characters, which you would assume target a male audience? The other possibility is that females under 18 on imdb are one of the few groups that don't self-censor their ratings. Perhaps they vote based on their own enjoyment instead of considering how others will judge them in the future.

The trendline in the first chart predicts that Iron Man will eventually gross $333 million. I expect the final number to be a little lower, because its rating will eventually drop from 8.3, and because females under 18 don't love it, giving it only a 7.5. So my official guess is $300 million, up from its current total of $178 million. Check back in a few months for the results of one of my first statistical predictions.
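Because the first chart puts rating on the y-axis, turning its trendline into a box office guess means solving the line for gross. Here is a minimal Python sketch of that step. The slope and intercept are hypothetical placeholders, not the coefficients behind the $333 million figure, and `gross_from_rating` is an illustrative name.

```python
# ASSUMPTION: the trendline has the form rating = slope * gross + intercept,
# with gross in $ millions. SLOPE and INTERCEPT are HYPOTHETICAL placeholders.

def gross_from_rating(rating, slope, intercept):
    """Solve the rating trendline for gross ($ millions)."""
    if slope == 0:
        raise ValueError("a flat trendline cannot be inverted")
    return (rating - intercept) / slope


SLOPE = 0.005      # hypothetical: rating points per $1 million of gross
INTERCEPT = 6.5    # hypothetical: rating the line expects at $0 gross

# A rating that drifts down over time lowers the predicted gross.
for r in (8.3, 8.0, 7.8):
    print(r, round(gross_from_rating(r, SLOPE, INTERCEPT)))
```

With a positive slope, every tenth of a point the rating loses pulls the predicted gross down, which is the reasoning behind shading the guess below the trendline's figure.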