The Mentaculus

Sunday, June 30, 2013

Feed Changes

1) In case you find yourself in an acute panic looking for a last-minute alternative to Google Reader, I am now using Feedly as my RSS reader. I haven't used it enough to recommend it one way or another, but import at least is easy.

2) In an effort to consolidate various interests, I am now blogging exclusively here. If you're interested in keeping up with me, please follow that link and/or paste it (http://andrewtmckenzie.com/news/) into your RSS reader of choice. Even if you don't want to subscribe to the new blog, thanks for reading.

Wednesday, September 12, 2012

The Embryology Of Spin

Yavchitz et al. looked at which factors correlated with the presence of "spin" in the reporting of medical randomized controlled trials. Spin is emphasizing the benefits of a treatment more than is appropriate on the basis of the data. They cooked up a multivariate regression with the explanatory variables of journal type, funding source, sample size, type of treatment (drug or other), results of the primary outcomes (all nonstatistically significant versus other), author of the press release, and the presence of “spin” in the abstract conclusion.

In their sample (N = 41), the only factor that correlated significantly with spin in the press release and news article was spin. Spin in the abstract conclusions of a study is associated with a relative risk of 5.6 (95% CI 2.8–11.1) for spin in the press release and news reports. So, to the extent that we care about curbing vicious information cascades, it's essential for authors and editors to be conscientious about word choice and framing in the abstract.
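A quick sanity check on that interval: confidence intervals for relative risks are usually built on the log scale, so the bounds should sit roughly symmetrically around the point estimate once you take logs. The sketch below uses only the three numbers quoted above. The function name and the assumption of a standard 95% Wald interval (z = 1.96) are mine, not something the paper states.

```python
import math

def implied_log_se(lower, upper, z=1.96):
    """Standard error of log(RR) implied by a confidence interval,
    assuming a symmetric Wald interval on the log scale (an assumption)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# The three numbers quoted in the post.
rr, lower, upper = 5.6, 2.8, 11.1

se = implied_log_se(lower, upper)
# Geometric midpoint of the interval; on the log scale this is the center.
midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)

print(f"implied SE of log(RR): {se:.3f}")
print(f"geometric midpoint of the CI: {midpoint:.2f} (reported RR: {rr})")
```

The geometric midpoint lands very close to the reported 5.6, which is what you would expect if the interval was computed on the log scale and then rounded.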

Tuesday, September 4, 2012

The Trade-Offs Of Publicizing Your Goals

In Ben Casnocha's reflections on writing his book The Start-Up Of You, he mentions this tidbit:

When you embark on a project that’s going to take a while, you have to decide how much to publicize the fact – on your blog, to your friends – that you’ve started. ... When you publicly announce that you’re starting toward a goal, you can benefit from the self-fulfilling prophecy effect, you can collect feedback from your network, and be held accountable to lots of external people tracking your progress. On the flip side, when you announce a goal, you risk tricking your mind into believing you’ve already partially accomplished it when in fact you’ve done nothing. (Derek Sivers says: “Keep your goals to yourself.”) Plus, external accountability of the wrong kind can add unhealthy pressure.

This is a complicated and thus interesting trade-off. It involves an interaction between managing your own psychology and providing others the context they need to offer you help. The best-case scenario (not necessarily possible) would be for others to know what you are attempting to do without you knowing that they know. On the other hand, the worst-case scenario (much more plausible) is that you think others know when they don't.

In science, there are often norms against sharing too much of your project with outsiders, to prevent it from being "scooped". It strikes me now that these norms serve the dual purpose of preventing you from taking mental credit for something that you haven't yet done the grunt work to accomplish.

Anyway, I certainly don't have any general solutions to this trade-off, and it is something I worry about too.

I am probably slightly biased about the book because Ben is a friend, but I recommend it highly.

Thursday, August 30, 2012

In Praise Of The Obvious, Pt 2

Scott Aaronson explains the usefulness of the Church-Turing thesis in a way that makes intuitive sense to me, a newbie to TCS. Awesome! That kind of post is why I love subscribing to his blog. The commenter Keith apparently disagrees, saying,

It occurs to me that if you’re taking positions in arguments that I, a layman, could easily take, then you’re either wasting your time or indulging a hobby.

His attitude exemplifies why it's so important to relentlessly praise those who understand the minutiae in a field yet take the time and status hit to point out the obvious. If the pressure to complicate is manifest in an academic's blog posts, just imagine what one would feel while writing an illustrious journal article.

Tuesday, July 24, 2012

Book Review: Great Flicks By Dean Simonton

Attention conservation notice: Review and notes from a book discussing an academic topic that will likely only interest you insofar as it generalizes to other topics, unless you are both a huge stats and film nerd.

I'm fascinated by movie ratings and what they tell us about: 1) the best ways to use rigorous methods to study the quality of a subjective output, 2) how variable people's assessments of quality are, and 3) how people conceptualize their own opinion in the context of everyone else's. Dean Simonton is a giant in the psychology of creativity, and I loved his book Creativity in Science. So, as soon as I saw this one, I clicked "buy it now" on its Amazon page.

My typical gripe against academic investigations of movie ratings is that they discount imdb.com, a huge resource with millions of data points, segregated by age, gender, geographical location, on an incredibly rich array of movies. So, soon after buying the book, I searched in the Kindle app for "imdb" and found very few results. This predisposed me to disliking it.

A few of my other gripes:

1) It takes a while to get used to Simonton's academic writing style.

2) The book takes few risks stylistically. Each chapter feels like it could be its own separate article. Thus, he does not take full advantage of the long-form medium.

3) When he discussed a few of the measures (such as the correlation between different award shows), I felt that there were some issues with his account of the causality. Surely there is some non-negligible probability that people take the ratings of others into account when they make their own judgments. He mentions this sometimes, but not enough for me, and ideally he'd come up with some creative way to try to get around it.

4) Finally, there are a few typos. I actually like seeing typos, because it makes me think that I am learning from a more niche source that others are less likely to appreciate, but YMMV.

By midway through the book, Simonton had won me back to a large extent. His analyses of his data were very well done and he supplies tables so you can look at the regression coefficients yourself. And there are many good nuggets, such as:

- the best predictors of higher ratings are awards for better stories (e.g., best screenplay and best director), as opposed to visual or musical awards

- having individuals on the production team who play multiple roles (such as writer, cinematographer, and editor all at once) makes the film more successful, presumably due to creative freedom

- some amount of repeated collaboration over multiple films with the same individuals, but not too much, is optimal for winning awards (i.e., there is a trade-off between stimulation and stagnation)

- higher box office returns are inversely correlated with success at awards shows

- the typical film is unprofitable; "about 80% of Hollywood's entire profit can be credited to just a little over 6% of the movies"

- the curse of the superstar: "if a star is paid the expected increase in revenue associated with his or her performance in a movie then the movie will almost always lose money" (this is because revenue is so positively skewed)

- divides movies into two types: those that are extremely successful commercially, and those that are extremely successful artistically (people often use the former to subsidize the latter)

- negative critic reviews have a more detrimental impact than positive reviews have a boosting effect on box office returns

- on ratings, critics and consumers have similar tastes, although consumers' tastes are more difficult to predict, presumably because their proclivities are more diverse

- for a consumer, the most important factor for whether they will watch a movie is its genre (#2 is word of mouth)

- dramas do worse in the box office, better at the awards shows; comedies are the reverse

- PG-13 movies make the most money; some romance, but no actual nudity, is best (and lots of action but no gore)

- on average, sequels do far worse in ratings and awards than the original movies

- the greater the involvement of the author in an adapted movie, the less money it will make (they interfere more and might care more about "artistic integrity" than making money)

- directors tend to peak in their late 30s; they have more success in their late 20s than their late 50s, on average

- divides directors into two types: conceptual (innovative and imaginative; think Welles) and experimental (technical and exacting; think Hitchcock)

- conceptual directors express ideas through visual imagery and emotions, often leave behind one defining film, and decline quickly

- experimental directors emphasize more realistic themes, slowly improve their methods, and their best films often occur towards (but almost never *at*) the end of their careers

- female actors make less money than their male counterparts, and the best picture award correlates much better with best male actor than best female actor

- awards for scores are much better predictors of a film's quality than awards for songs

All in all, this book is far from perfect, but it is likely the best full-length treatment of quantitative movie ratings available. If you are interested in the topic, and occasionally find yourself doing things like browsing the rating histograms on imdb, then this is essential reading.

Friday, July 20, 2012

Statistics Is Like Medicine, Not Software

Stats questions--even when they're pure cut-and-dried homework--require dialog. Medicine might be a better analogy than software: what competent doctor will prescribe a remedy immediately after hearing the patient's complaint? One of our problems is that the [Stack Exchange] mechanism is not ideally suited to the preliminary dialog that needs to go on.

That's from the ever erudite William Huber, in this chat about why the statistics Q&A site has problems that the software Q&A site does not. Some users argue that a high proportion of questions on the stats site should not be answered unless they are disambiguated further.

You might assume that "more answers are better," but answering an ill-posed question adds more noise to the internet. When searching to clarify an ambiguous term, somebody might find that question, read the answer, and end up even more confused. Recall that this is a field already stricken by diametric ideology and short-term incentives.

Here is my previous post on the wisdom of Huber.

Sunday, July 8, 2012

The Psychosocial Costs Of Ambition

If I had accepted that leadership role, there would have been a lot of pressure on me to do something really exciting. I can sometimes do exciting things, but I can't do them on demand. My energy level waxes and wanes. My creativity is irregular. When I do have an idea, sometimes it catches on and other times people just stare at me and think "What's wrong with him?"

To be ambitious is a commitment. It's saying "Take a chance on me, and I will continue being creative and exciting and dependable for the foreseeable future." It's promising people that you're never going to let them down. If you act ambitious, and then when push comes to shove you say "Nah, I don't need the aggravation", then you don't look ambitious and high-status, you look like a flake.

Scott's lucid explanation of the costs to ambition is a great example of hot and cold decision-making. We tend to overestimate the stability of our beliefs, so when we feel ambitious and full of energy (like, at the start of a project) we assume that this feeling will continue indefinitely.

Of course, this is unlikely. Most of us have daily circadian rhythms, some less well-understood medium-term fluctuations, and gradual decays in interest.

So, as Scott learned early in life, it is prudent to anticipate and explicitly correct for your likely decline in motivation towards a topic when you present yourself to the world. The insidious part of this is that if you want to obtain the resources you need to actually complete a task (e.g., a job, a grant, collaborators), you will have to sell yourself. This is why trade-offs aren't fun; they're just real.

Friday, July 6, 2012

Is Any Human Activity Long-Run Sustainable?

Intensive rice agriculture began in the Yangtze basin about 8,000 years BP, a sustainable model for agriculture by any reasonable standard. The extensive water infrastructure network around Chengdu, China, has diverted part of the Min River through the Dujiangyan for both flood control and irrigation without restricting fish connectivity since 256 BC, while some forests in India have been actively managed by surrounding communities for even longer periods.

That's from a guardedly optimistic article by Matthews and Boltz. It's academic but contains most of the good aspects of scholarly writing (copious references, measured tone), without most of the bad ones (argument from authority, unwillingness to point out the obvious). There are 2650 words so you should expect to read it in about 12 minutes. Assuming you are an average blog reader (well, above average if you're reading this one), I recommend it, unless you haven't stared at the Vostok ice core data recently, in which case you should do that for 15 seconds first.

Wednesday, July 4, 2012

The Meaning Of The Mean

Bob Carpenter has a few enlightening thoughts on the distinctions between 1) the sample mean, 2) the expected value of a random variable, and 3) the mean of a distribution. I've long been confused by the difference between the mean and expected value, and his trichotomy helps alleviate my confusion. With that in mind, I changed the intro of the Wikipedia article on the mean, which had remained static since the dawn of time (2001, when the page was created).
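For what it's worth, here is how I would sketch the three ideas in Python. This is my own reading of the trichotomy, not Carpenter's code; the fair die, the function names, and the sample size are all assumptions chosen for illustration.

```python
import random
from fractions import Fraction

# A distribution: a fair six-sided die (chosen purely for illustration).
die = {face: Fraction(1, 6) for face in range(1, 7)}

def distribution_mean(pmf):
    """Mean of a distribution: a fixed property of the pmf itself."""
    return sum(x * p for x, p in pmf.items())

def expected_value(f, pmf):
    """Expected value E[f(X)] of a random variable f(X), where X follows pmf."""
    return sum(f(x) * p for x, p in pmf.items())

def sample_mean(data):
    """Sample mean: the plain average of observed data, no distribution needed."""
    return sum(data) / len(data)

rng = random.Random(0)  # seeded so the draws are reproducible
rolls = [rng.randint(1, 6) for _ in range(1000)]

print("mean of the distribution:", distribution_mean(die))
print("expected value of X squared:", expected_value(lambda x: x * x, die))
print("sample mean of 1000 rolls:", sample_mean(rolls))
```

The first two numbers are exact and never change; the third depends on which rolls you happened to observe and only approaches the distribution's mean as the sample grows.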

Of the major subjects on Wikipedia, statistics seems to be the most convoluted. My two explanations for this are that 1) there are so many schools that disagree on fundamental interpretations (likelihood, non-parametric, empirical Bayesian, objective Bayesian, etc.), and 2) many practitioners are so busy with applications that they don't have time to reconcile their disagreements.

Wednesday, June 20, 2012

When You Should Be Most Skeptical

One of the hardest things we can do as readers is disagree with the methods of authors we agree with ideologically. It makes us feel good to find authors who agree with us, but this is when we should be at our most skeptical. Searching the world for self-justification is not a worthwhile goal, it simply turns you into another short-sighted, argumentative know-it-all.

That's from Keely's scathing, analytical review of The Giver. I like the idea that we should be especially skeptical of the arguments of those we agree with, to counteract our natural tendency to the contrary.

Wednesday, June 13, 2012

When More Data Trumps Logic

A difficulty with the “more data is better” point of view is that it’s not clear how to determine what the tradeoffs are in practice: is the slope of the curve very shallow (more data helps more than better algorithms), or very steep (better algorithms help more than more data). To put it another way, it’s not obvious whether to focus on acquiring more data, or on improving your algorithms. Perhaps the correct moral to draw is that this is a key tradeoff to think about when deciding how to allocate effort. At least in the case of the AskMSR system, taking the more data idea seriously enabled the team to very quickly build a system that was competitive with other systems which had taken much longer to develop.

That's Michael Nielsen in an interesting post describing how machine learning question-and-answer systems work. I completely agree that identifying trade-offs is one of the most useful ways to decide how to proceed on a problem. That's why I think the general study of trade-offs, across fields, is underrated.

Monday, May 28, 2012

GATCACA

In many American states it is legal to screen and select on the basis of sex, for non-medical reasons. In fact, a 2006 study (see below) found that 9% of [preimplantation genetic diagnosis] procedures carried out in [in-vitro fertilization] clinics in the U.S. were performed for this reason. Other reasons include screening for an embryo with the same immune type (“HLA type”) as a current child who is ill and requires a transplant of some sort. Screening for these “savior siblings” was done in 1% of PGD procedures. And 3% used it for a reason I personally find jarring – to specifically select embryos with a mutation causing a genetic condition. This is usually in cases where both parents have either deafness or dwarfism and they want their child to be similarly affected. This gets into the political movement objecting to society labelling conditions as “disabilities”. I can sympathise with that to some degree – more for some conditions than others – but I think, if it were my child, I would still rather he or she could hear.

That's Kevin Mitchell, discussing GATTACA, an entertaining sci-fi movie with a respectable 7.8 imdb rating. Spoiler alert, the premise of the movie is that at some point in the future there will be strong stratification of people into two classes, the "valids" and the "invalids", based on whether they had healthy traits selected for via preimplantation genetic diagnosis.

It seems to me highly unlikely (<0.01%) that a nightmare scenario of this sort would actually occur. One of the main reasons is the large plurality of values among parents, as seen above. A prevailing reason people have kids is to propagate a form of themselves into the future, and in many ways it defeats the purpose when you select against certain traits or even perform some sort of genetic engineering.

The other reason is something we know now better than we did 15 years ago, when GATTACA was released. And that is that DNA doesn't actually explain all that much of physiology and behavior--there are also strong epigenetic effects as well as stochastic effects of gene expression.

Sunday, May 27, 2012

No Darkness But Ignorance

Here's Nancy Kanwisher's suggestion on how to improve the field of neuroimaging:

NIH sets up a web lottery, for real money, in which neuroscientists place bets on the replicability of any published neuroimaging paper. NIH further assembles a consortium of respected neuroimagers to attempt to replicate either a random subset of published studies, or perhaps any studies that a lot of people are betting on. Importantly, the purchased bets are made public immediately (the amount and number of bets, not the name of the bettors), so you get to see the whole neuroimaging community’s collective bet on which results are replicable and which are not. Now of course most studies will never be subjected to the NIH replication test. But because they MIGHT be, the votes of the community are real....

First and foremost, it would serve as a deterrent against publishing nonreplicable crap: If your colleagues may vote publicly against the replicability of your results, you might think twice before you publish them. Second, because the bets are public, you can get an immediate read of the opinion of the field on whether a given paper will replicate or not.

This is very similar to Robin Hanson's suggestion, and since I assume she came up with the idea independently, it bodes well for its success. Both Hanson and Kanwisher are motivated to promote an honest consensus on scientific questions.

When John Ioannidis came to give a talk at the NIH (which was interesting), I asked him (skip to 101:30) for his thoughts on this idea. He laughed and said that he has proposed something similar.

Could this actually happen? Over the next ten years, I'd guess almost certainly not in this precise form; first, real-money betting of this kind is largely illegal in the US, and second, the markets seem unlikely to scale all that well.

However, the randomized replication portion of the idea seems doable in the near term. This is actually now being done for psychology, which is a laudable effort. It seems to me that randomized replications are likely precursors to any prediction markets, so this is what interested parties should be pushing now.

One objection is that these systems might encourage scientists to undertake more iterative research, as opposed to game-changing research. I have two responses. First, given the current incentives in science (i.e., the primacy of sexy publications), this might actually be a useful countervailing force.

Second, it seems possible (and useful) to set up long-standing prediction markets for a field, such as, "will the FDA approve an anti-amyloid antibody drug to treat Alzheimer's disease in the next ten years?". This would allow scientists to point to the impact that their work had on major questions, quantified by (log) changes in the time series of that market after a publication.
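Here is a rough sketch of what that measurement could look like. Everything in it is an assumption on my part: I treat the market price as a probability, measure impact as the change in log-odds between a short window before and after publication, and use made-up prices. A real analysis would also need to separate the paper's effect from everything else that moved the market that week.

```python
import math

def log_odds(p):
    """Convert a market price, read as a probability in (0, 1), to log-odds."""
    return math.log(p / (1 - p))

def publication_impact(prices, pub_index, window=3):
    """Change in log-odds between the average price in the `window` periods
    before publication and the `window` periods starting at publication."""
    if pub_index < 1 or pub_index >= len(prices):
        raise ValueError("need at least one price before and after publication")
    before = prices[max(0, pub_index - window):pub_index]
    after = prices[pub_index:pub_index + window]
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return log_odds(mean_after) - log_odds(mean_before)

# Hypothetical daily prices for an "FDA approves within ten years" contract;
# the paper is published on day 3 (zero-indexed).
prices = [0.20, 0.21, 0.19, 0.30, 0.31, 0.29]
impact = publication_impact(prices, pub_index=3)
print(f"log-odds shift after publication: {impact:.3f}")
```

Working in log-odds rather than raw price means a move from 1% to 2% counts for about as much as a move from 50% to 67%, which seems closer to how much the paper actually changed people's minds.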

Saturday, May 26, 2012

Evaluating The Regret Heuristic, Part II

In a comment to my post on how our regrets change over time, Eric Schwitzgebel asks,

But why adopt regret minimization as a goal at all? Regret seems distorted by hindsight bias, status quo bias, and sunk cost bias, at least.

I've written before that projecting your future views about your present actions can be a good way to make decisions. So, Eric's prompting is a good occasion to re-evaluate that.

Given perfect information, the theoretically best way to make decisions is to 1) calculate the costs and benefits of each possible outcome, 2) estimate how your choice affects the relative probability of those outcomes, 3) use the costs and benefits as inputs to some sort of valuation function, and 4) make the decision with the highest probabilistic value.

Cost-benefit analysis is a common way to implement this, with, say, QALYs as the value measure. If you have perfect information, this is just math.
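To make the "just math" part concrete, here is a minimal sketch of the four steps above. All of the outcomes, probabilities, costs, and benefits are made up for illustration, the function names are hypothetical, and the valuation function is the simplest one I could think of (benefit minus cost).

```python
def net_value(benefit, cost):
    """Step 3: a deliberately simple valuation function (an assumption)."""
    return benefit - cost

def expected_value(outcome_probs, outcome_values):
    """Probability-weighted value of one choice."""
    return sum(p * outcome_values[o] for o, p in outcome_probs.items())

def best_choice(choices, outcome_values):
    """Step 4: pick the choice with the highest expected value."""
    return max(choices, key=lambda c: expected_value(choices[c], outcome_values))

# Step 1: costs and benefits of each possible outcome (hypothetical units).
outcome_values = {
    "good": net_value(benefit=10, cost=2),
    "bad": net_value(benefit=1, cost=2),
}

# Step 2: how each choice shifts the probability of those outcomes.
choices = {
    "act": {"good": 0.7, "bad": 0.3},
    "wait": {"good": 0.4, "bad": 0.6},
}

for name, probs in choices.items():
    print(name, round(expected_value(probs, outcome_values), 2))
print("best:", best_choice(choices, outcome_values))
```

The code is trivial; all of the difficulty is hidden in the numbers you feed it.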

But as Ben Casnocha says, if you don't have enough information, that framework can break down. In particular, even when #2 is pretty straightforward, #1 can still be very tricky. For example, although studying for the LSAT makes it much more likely that I will earn a JD, it's still hard to quantify the precise costs and benefits of earning that degree.

Here is where the regret heuristic can be useful. Instead of explicitly tallying each cost and benefit, it asks: in total, which would you regret more: studying or not studying?

This is in fact a simplifying measure, but there remain oodles of freedom in how you perform the regret estimation. For example, you can:

- use as reference classes your current regrets about your previous, similar actions and/or the regrets of other people who have made similar decisions, or just make it up;

- integrate your regret over all of your possible future states (e.g., ages) weighted by their probability of occurring, or just choose one arbitrary time point;

- extend the regret over broad classes of choices you could make, or keep it local;

- apply systematic techniques to adjust for various biases (e.g., impact, confirmation, status quo, hindsight, planning, sunk cost), or not.
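To show how much one of these micro-decisions can matter, here is a small sketch of the second option: averaging regret over future ages, weighted by the probability of reaching each age, versus picking a single time point. The ages, probabilities, and regret scores (on an arbitrary 0 to 10 scale) are all invented for illustration, and the function name is hypothetical.

```python
def weighted_regret(regret_by_age, prob_reaching_age):
    """Average regret across future ages, weighted by the probability
    of being alive at each age."""
    total_weight = sum(prob_reaching_age[age] for age in regret_by_age)
    weighted_sum = sum(r * prob_reaching_age[age] for age, r in regret_by_age.items())
    return weighted_sum / total_weight

# Hypothetical probability of reaching each age.
prob_reaching_age = {30: 1.0, 50: 0.9, 70: 0.6}

# Hypothetical regret scores for each choice at each age.
regret = {
    "not studying": {30: 2, 50: 5, 70: 8},
    "studying": {30: 4, 50: 2, 70: 1},
}

for choice, by_age in regret.items():
    at_30 = by_age[30]
    integrated = weighted_regret(by_age, prob_reaching_age)
    print(f"{choice}: regret at 30 = {at_30}, weighted over ages = {integrated:.2f}")
```

With these made-up numbers, looking only at age 30 favors not studying, while the weighted average favors studying, so the choice of time horizon alone flips the recommendation.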

Ultimately, I still think that the regret heuristic can be a useful one. But tread carefully, as there are many crucial micro-decisions to make; it's not magic.