Tuesday, March 15, 2011

Trade Off #19: Exploration vs Exploitation


When an agent first enters a system, it has to decide whether to stay still and try to understand its surroundings, or to begin moving in what it initially considers the most attractive direction. And indeed, at any point in its existence in that system, it will still face this trade-off: should it "explore" by gaining more info about the system, or should it "exploit" its knowledge by devoting resources to mining the option that currently has the highest expected value?

The downside to the strategy of "all exploration" is that you never reap the benefits of your information, while the downsides to "all exploitation" are that 1) you can get locked into local maxima, and 2) you can't adapt when circumstances change. There are many domains in which we find this trade-off:
  • One might think of basic science research as "exploration" and of applied science research as "exploitation." That is, investing in basic science now takes scarce resources, but makes future applied efforts more powerful. (see article, pdf)
  • In the multi-armed bandit problem, a gambler faces multiple levers, each of which pays out according to its own unknown distribution. He can either sample a diverse set of levers to learn more about their expected rewards, or he can just pull the lever with the highest current expected payout. (see here)
  • Organizations can "exploit" their workers' knowledge by forcing them to socialize to the organization's particular code more quickly, but this leads to less "exploration" and ultimately a lower equilibrium knowledge level for the org. (see article, pdf; this is a classic paper, with 6000+ citations)
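
To make the bandit example concrete, here is a minimal sketch of one common way to balance the two: the "epsilon-greedy" rule, which explores a random lever with probability epsilon and otherwise exploits the lever with the best estimate so far. This is my illustration, not something taken from the linked sources. The function name, the 0/1 payouts, and the payout probabilities are all assumptions chosen for simplicity.

```python
import random

def epsilon_greedy(true_means, epsilon, pulls, seed=0):
    """Play a Bernoulli bandit with the epsilon-greedy rule.

    true_means: hypothetical payout probability of each lever (unknown to the gambler).
    epsilon:    probability of exploring a random lever on any given pull.
    Returns (total reward, pulls per lever, estimated mean per lever).
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each lever has been pulled
    estimates = [0.0] * n   # running average payout observed per lever
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick any lever at random
        else:
            # exploit: pick the lever with the highest current estimate
            # (ties go to the lowest-numbered lever)
            arm = max(range(n), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running average for this lever
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts, estimates
```

Calling `epsilon_greedy([0.2, 0.5, 0.8], 0.0, 1000)` is the "all exploitation" extreme: the gambler never leaves the first lever, since no other lever's estimate ever rises above zero. Setting epsilon to 1.0 is "all exploration": every lever gets sampled, but what is learned is never used. Values in between trade one off against the other.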
In previous revisions of the canon, this trade-off was subsumed by switching costs vs change gains. But now I think that they really are distinct, because this trade-off describes a continuous case whereas switching costs vs change gains is binary. But yes, this is confusing, and yes, that is a bad sign: building a taxonomy of trade-offs has not proven as easy as I initially thought!

Of the two downsides to the "all exploitation" strategy, the local maxima problem seems like it could be surmounted, if one became very good at prediction. But the utility of exploration in allowing an agent to adapt to a changing environment seems to be very robust.

The one way that the downside to the "all exploration" strategy might break down is if one's end goal were merely and wholly understanding: if, in other words, it were the case that "the journey is the destination." So if you've always thought that phrase sounded like commie fiddlesticks, then this trade-off might be right up your alley.

(photo credit to flickr user Ronny R)