Sunday, April 17, 2011

The Wisdom Of Whuber

That's William Huber, whuber for short, and his wisdom is dispensed in his answers at the relatively new stats Q&A site, Cross Validated. His answers are the best on there, judging by reputation normalized by number of answers (with shrinkage). Here he writes about whether the median is a better summary stat than the mean:
Statistics does not provide a good answer to this question, IMO. A mean is ok to use, too, and is relevant in mortality studies for example. But ages are not as easy to measure as you might think: older people, illiterate people, and people in some third-world countries tend to round their ages to a multiple of 5 or 10, for instance. The median is more resistant to such errors than the mean....  Thus, for demographic, not statistical, reasons, a median appears more worthy of the role of an omnibus value for summarizing the ages of relatively large populations of people.
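To make the rounding point concrete, here is a toy sketch of my own (not whuber's). The ages are made up, and the rule that only people aged 60 and over round to the nearest multiple of 10 is an assumption chosen purely for illustration; `reported_age`, `cutoff`, and `base` are hypothetical names.

```python
from statistics import mean, median

# Made-up true ages (hypothetical data, for illustration only).
true_ages = [22, 25, 31, 34, 38, 41, 47, 63, 66, 78, 87]


def reported_age(age, cutoff=60, base=10):
    """Assumed heaping rule: people at or above `cutoff` round to the nearest `base`."""
    if age < cutoff:
        return age
    # Integer arithmetic rounds to the nearest multiple of `base`.
    return ((age + base // 2) // base) * base


reported_ages = [reported_age(a) for a in true_ages]

# Compare each summary before and after the rounding is applied.
print("mean:  ", mean(true_ages), "->", mean(reported_ages))
print("median:", median(true_ages), "->", median(reported_ages))
```

In this toy sample the rounding only moves values that sit above the middle of the data, so the median stays put while the mean shifts. If the rounding hit people near the middle of the distribution, the median could move as well, so treat this as an illustration of the idea rather than a general result.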
Here he writes about the biggest questions in statistics, from which I'll reproduce two (emphasis his):
  • Coping with scientific publication bias. Negative results are published much less simply because they just don't attain a magic p-value. All branches of science need to find better ways to bring scientifically important, not just statistically significant, results to light. (The multiple comparisons problem and coping with high-dimensional data are subcategories of this problem.)
  • Probing the limits of statistical methods and their interfaces with machine learning and machine cognition. Inevitable advances in computing technology will make true AI accessible in our lifetimes. How are we going to program artificial brains? What role might statistical thinking and statistical learning have in creating these advances? How can statisticians help in thinking about artificial cognition, artificial learning, in exploring their limitations, and making advances?
And here he writes about whether you should use a normal distribution to assign student grades:
I think that if any of those 800 students were to read this question, they might be offended. How well did they perform? How much learning was accomplished? That is what a grade should reflect, not some arbitrary statistical summary of their position in a group. IMHO this question should be recast in terms of teaching objectives, not statistical procedure, such as "what is a good way to convert raw scores to grades in a way that respects student accomplishments and advances the learning objectives of this class?" Statistics can help, but blind statistics--like standardization--will not.
Although they are often quite quantitative, his answers show how good statistics relies on far more than just math.