Tuesday, July 24, 2012

Book Review: Great Flicks By Dean Simonton

Attention conservation notice: Review and notes from a book discussing an academic topic that will likely only interest you insofar as it generalizes to other topics, unless you are both a huge stats and film nerd.

I'm fascinated by movie ratings and what they tell us about: 1) the best ways to use rigorous methods to study the quality of a subjective output, 2) how variable people's assessments of quality are, and 3) how people conceptualize their own opinion in the context of everyone else's. Dean Simonton is a giant in the psychology of creativity, and I loved his book Creativity in Science. So, as soon as I saw this one, I clicked "buy it now" on its Amazon page.

My typical gripe against academic investigations of movie ratings is that they discount imdb.com, a huge resource with millions of data points, segmented by age, gender, and geographical location, on an incredibly rich array of movies. So, soon after buying the book, I searched in the Kindle app for "imdb" and found very few results. This predisposed me to disliking it.

A few of my other gripes: 

1) It takes a while to get used to Simonton's academic writing style. 

2) The book takes few risks stylistically. Each chapter feels like it could be its own separate article. Thus, he does not take full advantage of the long-form medium. 

3) When he discusses a few of the measures (such as the correlation between different award shows), I felt there were some issues with his account of the causality. Surely there is some non-negligible probability that people take the ratings of others into account when they make their own judgments. He mentions this occasionally, but not enough for me, and ideally he'd come up with some creative way to try to get around it.

4) Finally, there are a few typos. I actually like seeing typos, because it makes me think that I am learning from a more niche source that others are less likely to appreciate, but YMMV.

By midway through the book, Simonton had won me back to a large extent. His analyses of his data were very well done, and he supplies tables so you can look at the regression coefficients yourself. And there are many good nuggets, such as: 

- the best predictors of higher ratings are awards for better stories (e.g., best screenplay and best director), as opposed to visual or musical awards
- having individuals on the production team who play multiple roles (such as writer, cinematographer, and editor all at once) makes the film more successful, presumably due to creative freedom 
- some amount of repeated collaboration over multiple films with the same individuals, but not too much, is optimal for winning awards (i.e., there is a trade-off between stimulation and stagnation) 
- higher box office returns are inversely correlated with success at awards shows
- the typical film is unprofitable; "about 80% of Hollywood's entire profit can be credited to just a little over 6% of the movies"  
- the curse of the superstar: "if a star is paid the expected increase in revenue associated with his or her performance in a movie then the movie will almost always lose money" (this is because revenue is so positively skewed) 
- he divides movies into two types: those that are extremely successful commercially, and those that are extremely successful artistically (studios often use the former to subsidize the latter) 
- negative critic reviews hurt box office returns more than positive reviews help them
- on ratings, critics and consumers have similar tastes, although consumers' tastes are more difficult to predict, presumably because their proclivities are more diverse
- for a consumer, the most important factor for whether they will watch a movie is its genre (#2 is word of mouth) 
- dramas do worse at the box office, better at the awards shows; comedies are the reverse
- PG-13 movies make the most money; some romance, but no actual nudity, is best (and lots of action but no gore) 
- on average, sequels do far worse in ratings and awards than the original movies
- the greater the involvement of the author in an adapted movie, the less money it will make (they interfere more and might care more about "artistic integrity" than making money) 
- directors tend to peak in their late 30s; they have more success in their late 20s than their late 50s, on average
- he divides directors into two types: conceptual (innovative and imaginative; think Welles) and experimental (technical and exacting; think Hitchcock)
- conceptual directors express ideas through visual imagery and emotions, often leave behind one defining film, and decline quickly
- experimental directors emphasize more realistic themes, slowly improve their methods, and their best films often occur towards (but almost never *at*) the end of their careers
- female actors make less money than their male counterparts, and the best picture award correlates much better with best male actor than best female actor
- awards for scores are much better predictors of a film's quality than awards for songs
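The "curse of the superstar" claim above rests on a purely statistical fact: when revenue is positively skewed, the mean sits far above the median, so a star paid the *mean* incremental revenue will earn less than that in *most* films. A toy simulation sketches this (the lognormal shape and its parameters are my illustrative assumptions, not figures from the book):

```python
import random
import statistics

random.seed(0)

# Hypothetical film revenues drawn from a lognormal distribution --
# a standard stand-in for heavily right-skewed money data.
# mu and sigma are made-up illustrative parameters.
revenues = [random.lognormvariate(mu=17, sigma=1.5) for _ in range(10_000)]

mean_rev = statistics.mean(revenues)
median_rev = statistics.median(revenues)

# A star paid the mean revenue bump loses money on every film
# whose actual revenue falls below that mean.
share_below_mean = sum(r < mean_rev for r in revenues) / len(revenues)

print(f"mean revenue:   ${mean_rev:,.0f}")
print(f"median revenue: ${median_rev:,.0f}")
print(f"share of films earning less than the mean: {share_below_mean:.0%}")
```

With this much skew, roughly three-quarters of simulated films earn less than the mean, which is the mechanism behind "almost always lose money."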

All in all, this book is far from perfect, but it is likely the best full-length treatment of quantitative movie ratings available. If you are interested in the topic, and occasionally find yourself doing things like browsing the rating histograms on imdb, then this is essential reading. 

Friday, July 20, 2012

Statistics Is Like Medicine, Not Software

Stats questions--even when they're pure cut-and-dried homework--require dialog. Medicine might be a better analogy than software: what competent doctor will prescribe a remedy immediately after hearing the patient's complaint? One of our problems is that the [Stack Exchange] mechanism is not ideally suited to the preliminary dialog that needs to go on.
That's from the ever-erudite William Huber, in this chat about why the statistics Q&A site has problems that the software Q&A site does not. Some users argue that a high proportion of questions on the stats site should not be answered unless they are disambiguated further.

You might assume that "more answers are better," but answering an ill-posed question adds more noise to the internet. When searching to clarify an ambiguous term, somebody might find that question, read the answer, and end up even more confused. Recall that this is a field already stricken by diametric ideology and short-term incentives.

Here is my previous post on the wisdom of Huber.

Sunday, July 8, 2012

The Psychosocial Costs Of Ambition

If I had accepted that leadership role, there would have been a lot of pressure on me to do something really exciting. I can sometimes do exciting things, but I can't do them on demand. My energy level waxes and wanes. My creativity is irregular. When I do have an idea, sometimes it catches on and other times people just stare at me and think "What's wrong with him?" 
To be ambitious is a commitment. It's saying "Take a chance on me, and I will continue being creative and exciting and dependable for the foreseeable future." It's promising people that you're never going to let them down. If you act ambitious, and then when push comes to shove you say "Nah, I don't need the aggravation", then you don't look ambitious and high-status, you look like a flake.
Scott's lucid explanation of the costs of ambition is a great example of hot and cold decision-making. We tend to overestimate the stability of our beliefs, so when we feel ambitious and full of energy (like, at the start of a project) we assume that this feeling will continue indefinitely.

Of course, this is unlikely. Most of us have daily circadian rhythms, some less-well understood medium-term fluctuations, and gradual decays in interest.

So, as Scott learned early in life, it is prudent to anticipate and explicitly correct for your likely decline in motivation towards a topic when you present yourself to the world. The insidious part of this is that if you want to obtain resources you need to actually complete a task (e.g., a job, a grant, collaborators), you will have to sell yourself. This is why trade-offs aren't fun; they're just real.

Friday, July 6, 2012

Is *Any* Human Activity Long-Run Sustainable?

Intensive rice agriculture began in the Yangtze basin about 8,000 years BP, a sustainable model for agriculture by any reasonable standard. The extensive water infrastructure network around Chengdu, China, has diverted part of the Min River through the Dujiangyan for both flood control and irrigation without restricting fish connectivity since 256 BC, while some forests in India have been actively managed by surrounding communities for even longer periods.
That's from a guardedly optimistic article by Matthews and Boltz. It's academic but contains most of the good aspects of scholarly writing (copious references, measured tone) without most of the bad ones (appeals to authority, unwillingness to point out the obvious). It's 2,650 words, so you should expect to read it in about 12 minutes. Assuming you are an average blog reader (well, above average if you're reading this one), I recommend it, unless you haven't stared at the Vostok ice core data recently, in which case you should do that for 15 seconds first. 

Wednesday, July 4, 2012

The Meaning Of The Mean

Bob Carpenter has a few enlightening thoughts on the distinctions between 1) the sample mean, 2) the expected value of a random variable, and 3) the mean of a distribution. I've long been confused by the difference between the mean and expected value, and his trichotomy helps alleviate my confusion. With that in mind, I changed the intro of the Wikipedia article on the mean, which had remained static since the dawn of time (2001, when the page was created).
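Carpenter's trichotomy can be made concrete with a toy sketch (the fair-die example here is my own illustration, not his):

```python
import random
import statistics

random.seed(42)

faces = list(range(1, 7))  # a fair six-sided die

# 1) The mean of a distribution is a fixed parameter of the distribution:
#    for a fair die, (1 + 2 + ... + 6) / 6.
distribution_mean = sum(faces) / len(faces)

# 2) The expected value E[X] is an operator applied to a random variable:
#    sum of (value * probability). For this die it coincides with (1).
expected_value = sum(x * (1 / 6) for x in faces)

# 3) The sample mean is a statistic computed from observed data; it is
#    itself random, varying from sample to sample, and merely *estimates*
#    the parameter in (1).
sample = [random.randint(1, 6) for _ in range(1000)]
sample_mean = statistics.mean(sample)

print(distribution_mean, expected_value, sample_mean)
```

The first two are fixed numbers (3.5 exactly); the third wobbles around 3.5 and would come out differently on a different run, which is the whole distinction.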

Of the major subjects on Wikipedia, statistics seems to be the most convoluted. My two explanations for this are that 1) there are so many schools that disagree on fundamental interpretations (likelihood, non-parametric, empirical Bayesian, objective Bayesian, etc.), and 2) many practitioners are so busy with applications that they don't have time to reconcile their disagreements.