The Mentaculus: March 2011

Monday, March 28, 2011

Trade Offs In Self-Identifying

Katja Grace has written a fascinating mini-dialogue between two of her mental modules, one of which is irrationally anxious and one of which is rationally calm. She concludes,

Identifying with being rational is a useful trick because it provides a convenient alternative emotional imperative – to follow the directions of the more reasonable part of oneself – in any situation where the irrational mental module can picture a rationalist.

She links to, and seems to conflict with, Paul Graham's famous advice to keep your identity small. His idea is that since "people can't think clearly about anything that has become part of their identity, then all other things being equal, the best plan is to let as few things into your identity as possible."

As I wrote back in Feb '09 when his article was published, there has to be a trade-off to this, because there are trade-offs to everything. My stab at it was that self-identification is a useful shortcut:

If you identify yourself with fewer things, then you will constantly have to make decisions.... [W]e can only make so many decisions before we become tired and revert to shortcuts that expend the least possible energy. So you can't keep your identity small, because you will be worn out by making trivial decisions throughout the day. But what you can do is loosely identify with various identities and be constantly open to change.

But Katja's internal dialogue suggests why identifying yourself with particular stances is a useful shortcut. That is, self-identifying allows you to save the time spent explaining to yourself precisely why you should take a certain stance every time you encounter a slightly novel scenario.

Now, Paul is right that there are still advantages to keeping your identity small. Even definable forms of rationality have their failures. So, this seems like a specific case of the more general trade-off, plasticity vs specialization. You can either pay a cost in time spent re-explaining your stances to yourself, and so remain plastic and able to change, or you can specialize and potentially bias yourself, but free up resources for use elsewhere.

George Ainslie has written extensively about intrapersonal bargaining, so if you are interested in these issues you should check out one of his books (here and here). Lots of terminology (e.g., you will def learn what hyperbolic discounting means), but worth the investment.

Saturday, March 26, 2011

Commonly Held But False Beliefs Impede Tech Development

That is the stance of George Church, who works on new methods of DNA synthesis and, more generally, genetics. In this Nature Biotech article, he also says, of his conversations with fellow scientists:

Sometimes they'll say something is impossible, but if you drill down they say that it's really expensive, and if you drill further, they say it's really expensive now.

One of the issues is that people mean different things when they say words like "impossible." They are referring to different time scales, different probability measures (does impossible mean 1 in 1000 or 1 in 1,000,000 worlds?), and different reasons why (intrinsic to the tech, coordination problems, regulation, etc). Really what we want is incentives to make people more precise about their predictions.

Saturday, March 19, 2011

What How You Read A Paper Says About You

abstract only = lazy bum

intro only = n00b

methods only = nerd

results only = conceited

discussion only = naive

intro + methods = tease

intro + results = arrogant

intro + discussion = proud owner of a Jump to Conclusions Mat

methods + results = overly-technical

methods + discussion = gullible

results + discussion = impatient

intro + methods + results = somebody who is no fun at parties

intro + results + discussion = somebody who is way too much fun at parties

methods + results + discussion = somebody who doesn't go to parties

figures only = honorary member of the Derek Zoolander Center For Kids Who Can't Read Good And Wanna Learn To Do Other Stuff Good Too

text only = too lazy to deal with pop-up windows

figures first, then full text = overachiever

Thursday, March 17, 2011

The Seven Most Discussed Scientific Biases

From David Chavalarias and John Ioannidis (pubmed, doi:10.1016/j.jclinepi.2009.12.011):

confounding bias = when you think you are measuring the effect of variable X on variable Y, but in reality there is another variable Z that correlates with X and also affects Y, which you haven't considered.
selection bias = when you think that all the various sub-groups of the population are represented in your sample in proportion to their size, but in reality certain groups are over-represented, because of the way you collect your data.
publication bias = when you are more likely to publish or tell others about your results if they 1) conform to what you expect, or 2) are what you think others would prefer to hear.
response bias = when respondents answer your questions in the way they think you want them to answer, rather than according to their true beliefs; this could also happen in animal research if you reward animals for responding in a certain way outside of the main test.
attention bias = when you focus only on data that supports your hypothesis and ignore data that would make your hypothesis less likely.
recall bias = when respondents are more likely to remember the content of your question if they hold a certain belief on it.
sampling bias = when you think your sample is representative of the population, but really it is not, because it is skewed in ethnicity, attractiveness, age, gender, etc., casting doubt on your generalizations from the sample to the population. (this is actually a sub-category of selection bias, with the distinction of external vs internal validity that sounds cool but also troublesomely postmodern)
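
To make the first definition concrete, here is a minimal Python sketch of confounding on simulated data. Everything in it is an assumption made for illustration and not something taken from the paper: the variable names, the effect sizes, the sample size, and the helper functions `simulate`, `slope`, and `residuals`. In this toy setup Z drives both X and Y, and X has no direct effect on Y at all.

```python
import random

def simulate(n=20000, seed=0):
    """Hypothetical data: Z drives both X and Y; X has no direct effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = z + rng.gauss(0, 1)          # X correlates with Z
        y = 2.0 * z + rng.gauss(0, 1)    # Y depends on Z only
        rows.append((x, y, z))
    return rows

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

def residuals(ys, zs):
    """What is left of ys after regressing out zs (with an intercept)."""
    b = slope(zs, ys)
    mz = sum(zs) / len(zs)
    my = sum(ys) / len(ys)
    return [y - my - b * (z - mz) for y, z in zip(ys, zs)]

xs, ys, zs = zip(*simulate())

# Naive analysis: regress Y on X and ignore Z.
naive = slope(xs, ys)

# Adjusted analysis: remove Z from both X and Y first, then regress.
adjusted = slope(residuals(xs, zs), residuals(ys, zs))

print(f"naive slope:    {naive:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

Under these assumptions the naive slope is about 1 in expectation even though X does nothing, while the slope after adjusting for Z is about 0. That gap is the confounding bias.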

The legendary Ioannidis apparently had a dream of writing a "book of bias," but settled for this paper. They also have a cool cluster map of how all the biases relate to one another, but unfortunately this paper is not open access so I can't post it here for your viewing pleasure. The above is based on text-mining from pubmed, which of course has its own biases.

Added 3/21: Eduardo Zugasti has translated this post to Spanish here.

Tuesday, March 15, 2011

Trade Off #19: Exploration vs Exploitation

When an agent first enters a system, it will have to decide whether it should stay still and try to understand its surroundings, or begin to move towards what it initially considers the most attractive direction. And indeed, at any point in its existence in that system, it will still face this trade-off: should it "explore" by gaining more info about the system, or should it "exploit" its knowledge by devoting resources to mining the option that currently has the highest expected value?

The downside to the strategy of "all exploration" is that you never reap the benefits of your information, while the downsides to "all exploitation" are that 1) you can get locked into local maxima, and 2) you can't adapt when circumstances change. There are many domains in which we find this trade-off:

One might think of basic science research as "exploration" and of applied science research as "exploitation." That is, investing in basic science now takes scarce resources, but makes future applied efforts more powerful. (see article, pdf)
In the multi-armed bandit problem, a gambler is faced with multiple levers to pull from, each of which has an unknown but unique distribution of payouts. He can either sample a diverse set of levers to discover more info about their expected reward, or he can just choose the lever with the highest current expected payout. (see here)
Organizations can "exploit" their workers' knowledge by forcing them to socialize to their particular code more quickly, but this leads to less "exploration" and ultimately a lower equilibrium knowledge level for the org. (see article, pdf--this is a classic paper, with 6000+ citations)
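
The bandit example above can be sketched in a few lines of Python. This uses epsilon-greedy, which is just one simple strategy and not something the linked sources are known to use here: with probability `epsilon` the gambler pulls a random lever (explore), otherwise the lever with the best estimated payout so far (exploit). The Gaussian payouts, the lever means, and the function name `epsilon_greedy` are all assumptions made for illustration.

```python
import random

def epsilon_greedy(true_means, epsilon, pulls, seed=0):
    """Play a bandit with hypothetical Gaussian levers; return (average reward, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each lever was pulled
    estimates = [0.0] * n_arms   # running mean payout per lever
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean for this lever.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / pulls, counts

levers = [0.0, 1.0, 2.0]  # assumed true mean payouts, unknown to the gambler
avg_mixed, counts_mixed = epsilon_greedy(levers, epsilon=0.1, pulls=5000)
avg_explore, counts_explore = epsilon_greedy(levers, epsilon=1.0, pulls=5000)

print("mostly exploit:", round(avg_mixed, 2), counts_mixed)
print("all exploration:", round(avg_explore, 2), counts_explore)
```

Setting `epsilon=1.0` is the "all exploration" strategy: it learns about every lever but never cashes in. Setting `epsilon=0.0` is "all exploitation", which can lock onto whichever lever happened to pay out first.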

In previous revisions of the canon, this trade off was subsumed by switching costs vs change gains. But now I think that they really are distinct, because this trade-off describes a continuous case whereas switching costs vs change gains is binary. But yes this is confusing, and yes that is a bad sign: building a taxonomy of trade-offs has not proven as easy as I initially thought!

Of the two downsides to the "all exploitation" strategy, the local maxima problem seems like it could be surmounted, if one became very good at prediction. But the utility of exploration in allowing an agent to adapt to a changing environment seems to be very robust.

The one way that the "all exploration" strategy might break down is if one's end goal was merely and wholly understanding--if, in other words, it were the case that "the journey is the destination." So if you've always thought that phrase sounded like commie fiddlesticks, then this trade off might be right up your alley.

(photo credit to flickr user Ronny R)

Saturday, March 12, 2011

The Defects Of Others

First here's Anne Lamott in her book of advice, Bird by Bird (via Ben Casnocha):

A person's faults are largely what make him or her likable. I like for narrators [of novels] to be like the people I choose for friends, which is to say that they have a lot of the same flaws as I. Preoccupation with self is good, as is a tendency toward procrastination, self-delusion, darkness, jealousy, groveling, greediness, addictiveness. They shouldn't be too perfect; perfect means shallow and unreal and fatally uninteresting.

Then here's Santiago Ramon y Cajal in the foreword to a later edition of his Advice for a Young Investigator (Swanson translation):

However, we decided not to undertake a detailed editing of this modest little product of youth. Whether good or bad, every book has a spiritual personality. The public knows this and demands that the author respect it; they do not want it replaced under the guise of improvement. And this could very easily happen today, when, on the threshold of old age, we appear (and occasionally are) somehow defective. It is precisely this feature that attracts the reader's attention and gains his sympathy--just as with men, we admire and respect books for their good qualities; but we can only love them for certain faults that they display.

It seems legit that in both friends and books (and is not a book a sort of friend?) we tend to prefer those with some defects, some sort of burden. Why might this be? Possibly it's because just anyone could like perfection, but it takes one's uniqueness to accept and even appreciate defects.

Tuesday, March 8, 2011

How To Find Which Basketball Stats Matter

Dave Johns today summarized whether basketball stats can actually tell how much a player is helping his team. It seems that there are two key problems:

1) The one stat we know must be good in the long run, adjusted plus-minus, is too noisy to make good inferences based on short run data.

2) The stats that we do have lots of short-run game data on, like points, rebounds, and FG%, we don't know how to interpret in terms of how much they actually help the team. For example, Kevin Love is piling up boards, but does that actually help the Timberwolves win ballgames?

One approach to solve these problems would be to use a large set of training data for both adjusted +/- and summary stats, spanning many years. For each player and each game (or even each quarter), you try to use the statistics from (2), like points and rebounds, as features to try to predict the player's adjusted +/- in that time period. Some of the statistics will be able to predict the +/- really well, whereas others won't. So going forward, we'll be able to say which of the stats are good short-run proxies for long-run +/- and which are not. That's it. It will be beautiful.
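
Here is a minimal Python sketch of that idea, run on synthetic data since I don't have the real numbers. All of it is hypothetical: the feature names (`stat_strong`, `stat_weak`, `stat_noise`), the weights used to generate the fake adjusted +/-, and the helper functions. It also simplifies the approach by scoring each stat on its own with a Pearson correlation; a serious attempt would fit all the stats jointly and check predictions on held-out games.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def make_synthetic_games(n=5000, seed=1):
    """Fake player-game rows; the weights below are made up for illustration."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        row = {
            "stat_strong": rng.gauss(0, 1),
            "stat_weak": rng.gauss(0, 1),
            "stat_noise": rng.gauss(0, 1),
        }
        # Assumed relationship: one stat matters a lot, one a little, one not at all.
        row["adj_plus_minus"] = (2.0 * row["stat_strong"]
                                 + 0.5 * row["stat_weak"]
                                 + rng.gauss(0, 1))
        games.append(row)
    return games

def rank_features(games, target="adj_plus_minus"):
    """Rank each stat by how strongly it correlates with the target."""
    ys = [g[target] for g in games]
    features = [k for k in games[0] if k != target]
    scores = {f: pearson([g[f] for g in games], ys) for f in features}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

ranking = rank_features(make_synthetic_games())
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

With real data, the stats at the top of the ranking would be the candidate short-run proxies for long-run +/-.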

For a few minutes I thought about trying to do this myself, but I couldn't find easy enough access to raw +/- data. That is to say, screen scraping nba.com does not sound like much fun. If anyone knows of a nice and clean data set, holla acha boy.

Frankly I was surprised Johns didn't mention this approach in his article (thus this post), but I assume that it's what teams using bball sabermetrics are doing. The approach is similar to the Netflix Prize or to many articles in machine learning, like Burstein et al '09, who predict the function of proteins based on training features. I'm pasting their figure 1 below for a schematic of the process, although note that theirs predicts a binary label whereas our target, adjusted +/-, would be continuous (so regression rather than classification). Think of the "features" as either simple stats like assists, or more complicated ones like under-/over-shooting, and think of "classification algorithms" as either naive things like "how many points did the player score?", or more complicated things like John Hollinger's PER.

doi:10.1371/journal.ppat.1000508

Tuesday, March 1, 2011

Use Words To Convey Probabilities

William Strunk, author of what is surely the most highly cited text on writing, The Elements of Style, favors the use of words as vehicles for bold statements. From the foreword,

[H]is original Rule 11 was "Make definite assertions." That was Will all over. He scorned the vague, the tame, the colorless, the irresolute. He felt it was worse to be irresolute than to be wrong.

But what if you are not certain of a belief, something that should happen to reasonable people nearly all of the time? Apparently Strunk suggests you should hide this, as it makes you look "colorless" and low status.

On the contrary, we should employ the wide range of our language to calibrate our words to the probability that we assign to events. However, we want to use words that unambiguously assign probabilities. Ideally, we'd have a clear mapping between our words and the probabilities we assign to the events those words describe. There have been at least two attempts to do this, as described by Wikipedia. Combining what I see as the benefits of both of these scales, going forward I'll try to use the following system:

"Surely" = > 99% probability
"Likely" = ~ 90 - 99% probability
"Probable" = ~ 60 - 90% probability
"Chances about even" = ~ 40 - 60% probability
"Improbable" = ~ 10 - 40% probability
"Unlikely" = ~ 1 - 10% probability
"Surely not" = < 1% probability

One key arbitrary choice here is putting "likely" above "probable" in the hierarchy, which feels right but doesn't have much precedent. Any suggestions?
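
For anyone who wants to apply the scale mechanically, here is a small Python sketch of it as a lookup. The function name `probability_word` is hypothetical, and since the ranges in the list share their endpoints (90% shows up in two of them), the sketch assumes each shared boundary belongs to the higher band.

```python
def probability_word(p):
    """Map a probability in [0, 1] to a word from the scale above.

    Assumption: a value exactly on a shared boundary (0.01, 0.10, 0.40,
    0.60, 0.90) goes to the higher band.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p > 0.99:
        return "Surely"
    if p >= 0.90:
        return "Likely"
    if p >= 0.60:
        return "Probable"
    if p >= 0.40:
        return "Chances about even"
    if p >= 0.10:
        return "Improbable"
    if p >= 0.01:
        return "Unlikely"
    return "Surely not"

for p in (0.995, 0.95, 0.75, 0.5, 0.25, 0.05, 0.001):
    print(p, "->", probability_word(p))
```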