The Mentaculus: The Internet Echo Chamber?

Some of the comments on Charlie Hoehn's recent post focused on whether the internet is merely an echo chamber or if intrinsic quality plays a larger role. As always in these "nature / nurture proxy" debates the winning answer is "somewhere in the middle," and the more useful question is how much each variable can explain.

To the extent that folks' behavior in listening to and downloading music reflects their propensity to e-mail, re-blog, or re-tweet articles*, Matthew Salganik and Duncan Watts's two studies of web-based music listening and downloading, here and here, may help resolve this debate.

The researchers created a music downloading web site and uploaded 48 songs by unknown bands. They then recruited somewhat tech-savvy individuals to listen to, rate, and possibly download the songs. On average, participants downloaded 1 out of every 7 songs they listened to, indicating a modicum of selectivity.

In one study, the researchers assigned each incoming visitor either to the "social influence" condition, in which they could see other participants' ratings and downloads, or to the "independent" condition, in which they could not. Within the social influence condition, visitors were further assigned to one of a few initially identical "worlds," whose rating and downloading trends could diverge through random chance alone.

Social influence was strongest when the songs were presented in a single column sorted by popularity. Participants listened to the most downloaded song ~45% of the time and to the second and third most downloaded songs ~30% of the time, but listened to songs with an average number of downloads only ~5% of the time.
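To make the social influence mechanism concrete, here is a toy cumulative-advantage simulation. It is a minimal sketch under assumptions of my own, not the researchers' model: each hypothetical song has a fixed download probability (`appeal`), each visitor listens to exactly one song, and in the social condition a song is picked with probability proportional to its download count plus one. Running the social condition with different seeds plays the role of separate "worlds."

```python
import random


def simulate_market(appeal, n_visitors, social, seed=0):
    """Toy cumulative-advantage market (an illustrative assumption,
    not the model used by Salganik and Watts).

    appeal[i] is a hypothetical probability that a listener downloads song i.
    Each visitor listens to one song: chosen uniformly in the independent
    condition, or with weight (downloads + 1) in the social condition.
    """
    rng = random.Random(seed)
    downloads = [0] * len(appeal)
    songs = range(len(appeal))
    for _ in range(n_visitors):
        if social:
            # Popular songs are sampled more often; the +1 keeps unplayed songs reachable.
            weights = [d + 1 for d in downloads]
        else:
            weights = [1] * len(appeal)
        song = rng.choices(songs, weights=weights, k=1)[0]
        # The visitor downloads the song with probability equal to its appeal.
        if rng.random() < appeal[song]:
            downloads[song] += 1
    return downloads


appeal = [0.05, 0.10, 0.15, 0.20, 0.25]
for world in range(3):
    # Same songs and appeal, different random histories: one "world" per seed.
    print("world", world, simulate_market(appeal, 2000, social=True, seed=world))
print("independent", simulate_market(appeal, 2000, social=False, seed=0))
```

Comparing the social runs with each other and with the independent run lets you see how far download totals can drift between worlds that start out identical. The +1 smoothing is an arbitrary choice.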

Salganik and Watts then used the download trends of individuals in the independent condition to predict the download trends of individuals in the social influence condition. In experiment 2, knowing the independent data reduced naive prediction errors for the social influence condition by 16%. In experiment 3, with an older and more international pool of participants, it reduced them by 38%. These average out to 27%, a rough proxy for how useful independent appeal data is for predicting which songs will be successful under social influence. Not great, but not that bad!
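The 27% figure is just the unweighted mean of the two reported reductions. The helper below assumes that "reduced prediction error by X%" means a proportional reduction relative to the naive error; that definition and the error values in the check are illustrative assumptions, since the exact error metric isn't spelled out here.

```python
def error_reduction(naive_error, informed_error):
    """Proportional drop in prediction error (assumed definition)."""
    return (naive_error - informed_error) / naive_error


# Hypothetical errors: a naive error of 1.0 cut to 0.84 is a 16% reduction.
print(f"{error_reduction(1.0, 0.84):.0%}")  # 16%

# Reported reductions for experiments 2 and 3.
reductions = [0.16, 0.38]
average = sum(reductions) / len(reductions)
print(f"average reduction: {average:.0%}")  # average reduction: 27%
```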

In the next study, the researchers used a similar setup, but in two of their social influence conditions they added an intervention: after 752 visitors (~27% of the total) had visited the site, they inverted the download rankings. This immediately increased downloads of the previously low-ranked songs, but eventually some of the previously top-ranked ones began to climb back.
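The inversion can be sketched as a remapping of displayed counts: the most downloaded song is shown with the lowest count, the second most downloaded with the second lowest, and so on. This is a hypothetical illustration; the function name and song labels are mine, and ties are broken arbitrarily by sort order.

```python
def invert_displayed_counts(downloads):
    """Return the download counts to display so that rank order is reversed.

    downloads maps song -> true download count (hypothetical data).
    """
    # Most downloaded song first.
    ranked = sorted(downloads, key=downloads.get, reverse=True)
    counts_desc = [downloads[s] for s in ranked]
    # Pair the i-th most downloaded song with the i-th smallest count.
    return dict(zip(ranked, reversed(counts_desc)))


true_counts = {"A": 40, "B": 25, "C": 10, "D": 3}
print(invert_displayed_counts(true_counts))
# {'A': 3, 'B': 10, 'C': 25, 'D': 40}
```

Only the displayed counts change; the true counts keep accumulating underneath, which is what allows the originally popular songs to climb back.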

This study also included a non-inverted social influence condition for comparison and an independent condition to measure intrinsic appeal. The correlation (r) between download ranks in the non-inverted social influence condition and the independent condition is a strikingly high 0.82, corresponding to an explained variance of 67%. The inverted social influence conditions have much weaker correlations of 0.40 and 0.45 (explained variances of 16% and 20%), but they show that even when social influence is deliberately set against what folks independently prefer, there is still a positive relationship between intrinsic appeal and download trends.
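The explained-variance figures are simply the squared correlations. A correlation between download ranks is a Pearson correlation computed on rank positions, which equals Spearman's rho when there are no ties. The rank lists at the bottom are hypothetical and only show how such a number is computed.

```python
def pearson(x, y):
    """Pearson correlation; applied to rank lists it gives Spearman's rho (no ties)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


# Explained variance is r squared for each reported correlation.
for label, r in [("non-inverted", 0.82), ("inverted 1", 0.40), ("inverted 2", 0.45)]:
    print(f"{label}: r = {r:.2f}, explained variance = {r ** 2:.0%}")

# Hypothetical ranks of five songs in one social world vs. the independent condition.
social_ranks = [1, 2, 3, 4, 5]
independent_ranks = [2, 1, 3, 5, 4]
print(f"rank correlation: {pearson(social_ranks, independent_ranks):.2f}")  # 0.80
```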

Salganik and Watts also mention the rating incompleteness theorem (see here): "On the one hand, by revealing the existing popularity of songs to individuals, the market provides them with real, and often useful, information; but on the other hand, if they actually use this information, the market inevitably aggregates less useful information." So it's hard to keep people from being biased by others' preferences, because looking at those preferences is often a rational choice that saves precious time. In other words, it's hard to nudge people away from a Nash equilibrium.

* This is not necessarily an apt comparison. Music downloading is much more private and personal, whereas what you choose to blog or tweet about is visible and thus subjects you to more public judgment. On the other hand, reading and discussing articles on the internet is much nerdier than listening to music, so participants may be less emotionally attached, leading to more quality-driven preferences. I don't know of any more applicable experiments, but please get at me if you do.

Bottom Line: To say that "the internet is an echo chamber, full stop" is foolhardy. Based on these music download experiments, it seems that around 25 to 70% of folks' decisions are based on the intrinsic appeal of the material. There is also reason to expect this percentage to be higher if the download data were less public and popularity estimates were noisier, as they are in real life.

Monday, January 11, 2010
