Monday, May 30, 2011

Which Parts Of Crowds Are Wise?

Peter Freed has written a pretty ambitious critique of Jonah Lehrer's summary of this study (pdf) on the wisdom of the crowds. The crux is that:
But now that I realized he really meant median, and that maybe he didn’t know what median meant.  Because median guesses are not guesses by a crowd, as Lehrer states.  They are guesses by a single person... [Lehrer] is talking about that 0.7% single-person data point: one person, selected after giving their answer, got close to the correct answer on one of six questions.  One person guessed 10,000 when the answer was 10,067.  That’s one hit out of 144 x 6 = 864 attempts.  That seems about right to me, from a common sense perspective. Which is to say, that is a shitty batting average.
Scrolling through the comments, I was pleased to see Ian Sample point out the critique of Freed's critique that I was going to make:
In Wisdom of Crowds studies you can look at the mean and / or the median. The median usually gives the best result if the guesses *do not* follow a normal distribution. The mean, of course, exploits the error-cancelling advantage that WOC is known for, that is, as many people under-estimate as over-estimate the right answer, so averaging cancels all but systematic biases. But to my point. To dismiss the median answer – one guy’s response – misses the fact that without the crowd you have no median answer to dismiss. Without the crowd, you do not know which value to pick. That’s the whole point. The crowd steers you to the median value, which in many cases outperforms the mean.
The median is indeed generated by only one person, but it becomes interesting only in the context of all the other estimates. It is useful here because it offers resistance to outliers. For example, some less numerate soul might have guessed 1,000,000, which is way off from the true value of ~ 10,000, thus skewing the arithmetic mean. In that case you'd much prefer a more robust statistic like the trimmed mean or the median.
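To make this concrete, here's a minimal simulation (the crowd of 144 matches the study's sample size from Freed's arithmetic; the spread of the guesses is my own assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 143 guesses scattered around the true value of ~10,000, plus one
# wildly high guess of 1,000,000 from our less numerate soul.
guesses = np.append(rng.normal(10_000, 2_000, 143), 1_000_000)

print(f"mean:         {np.mean(guesses):>11,.0f}")    # dragged far above 10,000
print(f"median:       {np.median(guesses):>11,.0f}")  # barely moves
print(f"trimmed mean: {stats.trim_mean(guesses, 0.1):>11,.0f}")  # drops the extreme 10% at each end
```

The single outlier pulls the mean up by thousands, while the median and trimmed mean stay put near the truth.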

In prediction markets, the price of the most recent transaction doesn't always best represent the market's current beliefs; there's more information in the whole distribution of outstanding orders. Similarly, it is unfair of Freed to dismiss the whole data set just because one type of estimator looks weak. This is one of the coolest parts of statistics: using potentially counter-intuitive methods to extract useful information from data, to find the wisdom in the crowds.
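As a toy sketch of that order-book point (every number here is invented): a single thin trade at an extreme price can sit far from where most of the standing money actually is.

```python
# Toy order book for one contract: (price, size) pairs of standing orders.
# All numbers are invented for illustration.
orders = [(0.52, 10), (0.55, 40), (0.56, 200), (0.57, 35), (0.90, 5)]
last_trade = 0.90  # one small transaction at an extreme price

# Expand orders into per-share prices and take the size-weighted median.
prices = sorted(p for p, size in orders for _ in range(size))
book_median = prices[len(prices) // 2]

print(f"last trade price:  {last_trade:.2f}")   # looks like the market says 90%
print(f"order-book median: {book_median:.2f}")  # most of the money sits near 56%
```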

Sunday, May 15, 2011

Fighting The Lernaean Hydra Bias

I'll just mention the heads I do cut off

In one Greek myth, Hercules takes on the task of killing a serpent-like, many-headed beast. This is made more difficult by the fact that its heads regenerate, so even if Hercules chops one off with his sword, another will simply sprout in its place. John Ioannidis uses this frustrating scenario as an analogy for a problem in the world of scientific publishing in his discussion of meta-analyses (doi:10.1002/jrsm.19).

The example Ioannidis employs to explain this problem is his experience doing a meta-analysis on the pharmacogenetics of certain polymorphisms for asthma treatment (doi:10.1097/01.fpc.0000236332.11304.8f). There were many studies that fit the criteria, but each evaluated its own endpoints and genetic contrasts. That is, in most of the studies, the vast majority of possible genotype-phenotype correlations that could have been tested with the data were either never tested or never reported.

So the surface problem, insofar as this case generalizes to others, is that published studies are not as exhaustive as they could be. But the central, troubling implication is that these studies fail to be exhaustive not because of time or computational constraints, but because the researchers want to emphasize the usefulness and/or interestingness of their results. This is more insidious; this is why the hydra heads regenerate.
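To see why those selectively reported contrasts are so insidious, consider a toy simulation (the 50 contrasts and 30 subjects per group are my assumptions, not Ioannidis's numbers): even when no association exists at all, a few "significant" hits appear by chance, and those are the ones that get written up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 50 candidate genotype contrasts, 30 subjects per group, and NO true
# effect anywhere: every "association" found below is pure chance.
n_contrasts, n = 50, 30
pvals = [stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
         for _ in range(n_contrasts)]

hits = sum(p < 0.05 for p in pvals)
print(f"{hits} 'significant' associations out of {n_contrasts}")  # expect ~2-3
# Report only the hits, and the hydra grows another head.
```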

Now, one can use meta-analysis to retrospectively "chop off" findings that are truly insignificant by combining the results of many different data sets. But meta-analysis itself can be biased in many ways (e.g., during study selection), and moreover, later researchers can just come back to the issue and cherry pick more novel associations, thus "sprouting" more statistically significant findings.
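Here's a minimal sketch of that "chopping" step, using fixed-effect inverse-variance pooling (the five effect estimates and standard errors are invented): studies that look exciting one at a time can pool to an interval that comfortably includes zero.

```python
import numpy as np

# Hypothetical effect estimates (log odds ratios) and standard errors
# from five small studies; all values are invented for illustration.
effects = np.array([0.80, 0.10, 0.55, -0.05, 0.30])
ses     = np.array([0.40, 0.35, 0.45,  0.30, 0.38])

# Fixed-effect (inverse-variance) pooling: weight each study by 1/SE^2.
w = 1 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))

lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect: {pooled:.2f}  (95% CI {lo:.2f} to {hi:.2f})")
# Individually "exciting" studies can pool to an interval straddling zero.
```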

When faced with the hydra, Hercules knew he couldn't go it alone, so he called on his nephew, who suggested that they cauterize each stump with fire before a new head could regrow. An analogy to this strategy might be to post warnings on the electronic copies of papers that have been called into question by later studies. Such a warning would be much milder, and hopefully less political, than a retraction, which typically implies some sort of error. Publishing a potentially informative result that is eventually overturned is still laudable.

But instead of this kind of patchwork fix, a more fundamental approach seems fruitful. In the original myth, only one of the hydra's heads was truly immortal, and that was the one Hercules needed to sever to finally defeat the beast. The immortal head of the scientific publishing hydra is the incentive structure that pushes researchers towards significance hunting in the first place.

Reworking these incentives is what Ioannidis is fundamentally arguing for, as the way to kill the Lernaean hydra bias once and for all: more standardization, more consortia, and more of a push towards openness and replicability. Every study might combine previous data with its own to estimate the posterior probability of the parameters it examines, and all research might be seen as one continuous, cumulative meta-analysis. Maybe one day.
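A minimal sketch of that continuous meta-analysis, assuming a simple normal-normal model (all effect estimates and standard errors invented): each study's posterior becomes the prior for the next.

```python
import numpy as np

# Start from a diffuse prior belief about the effect size.
prior_mean, prior_var = 0.0, 1.0

# (estimate, standard error) pairs from a sequence of hypothetical studies.
studies = [(0.80, 0.40), (0.10, 0.35), (-0.05, 0.30)]

for est, se in studies:
    like_var = se ** 2
    # Conjugate normal-normal update: a precision-weighted average.
    post_var = 1 / (1 / prior_var + 1 / like_var)
    post_mean = post_var * (prior_mean / prior_var + est / like_var)
    prior_mean, prior_var = post_mean, post_var  # posterior -> next prior
    print(f"after study ({est:+.2f} ± {se:.2f}): "
          f"effect ≈ {post_mean:+.2f} ± {np.sqrt(post_var):.2f}")
```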

(photo credit to Frank Rafik)

Saturday, May 14, 2011

Color Me Old-Fashioned

A fantastic idea from Risto Saarelma on how to redesign the comment section of the website Less Wrong:
Provide an ambient visual cue on how old a comment is. First idea is to add a subtle color tint to the background of each comment, that goes by the logarithm of the comment's age from reddish ("hot", written in the last couple of hours) to bluish ("cold", written several months or more ago). Old threads occasionally get new comments and get readers in via them, and the date strings in the comments require some conscious parsing compared to being able to tell between "quite recent" and "very old" comments in the same thread by glance.
Too true. Who takes the time to read the actual date of a comment? This way you wouldn't have to.
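A quick sketch of how the tint might be computed (the six-month cutoff and the pastel strength are my own guesses, not part of Saarelma's proposal):

```python
import math

def age_tint(age_hours: float, max_hours: float = 24 * 180) -> str:
    """Background tint for a comment: red ("hot") fading to blue ("cold")."""
    # Position on a log scale from "one hour old" to "~six months old".
    t = math.log(max(age_hours, 1.0)) / math.log(max_hours)
    t = min(max(t, 0.0), 1.0)
    strength = 24  # how far from pure white the tint strays (out of 255)
    r = 255 - int(strength * t)        # red fades as the comment ages
    g = 255 - strength                 # held constant to keep it pastel
    b = 255 - int(strength * (1 - t))  # blue grows as the comment ages
    return f"#{r:02x}{g:02x}{b:02x}"

print(age_tint(2))        # a fresh comment: faint red
print(age_tint(24 * 90))  # three months old: faint blue
```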

This sort of subtle clue is something that people will appreciate and pick up on quickly. For example, on the blog Marginal Revolution you can always tell whether Tyler or Alex is posting because Tyler only capitalizes the first word in the title of his posts whereas Alex capitalizes all of the words in his titles. Knowing this, you won't have to waste time scanning the byline as you plow through your RSS feeds because you'll already know who wrote it from the title.
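In the same spirit, the capitalization rule is simple enough to write down (a toy sketch; the list of lowercase connector words is my own assumption about standard title case):

```python
def likely_author(title: str) -> str:
    """Guess the Marginal Revolution poster from a title's capitalization."""
    # Words that title case usually leaves lowercase (my assumption).
    minor = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to"}
    # The first word is capitalized either way, so it carries no signal.
    words = [w for w in title.split()[1:] if w.lower() not in minor]
    if words and all(w[:1].isupper() for w in words):
        return "Alex"   # Every Word Capitalized
    return "Tyler"      # only the first word capitalized

print(likely_author("Assorted Links From Around The Web"))  # -> Alex
print(likely_author("What I've been reading"))              # -> Tyler
```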