Sunday, April 29, 2012

A Brief History Of Bioinformatics, 1996-2011


That's from an interesting article by Christos Ouzounis. Here he discusses the "adolescence" period:
One factor in policymakers' high expectations might have been a certain lack of milestones: due to the field's dual nature, that of science and engineering, computational biology rarely has the “eureka” moment of a scientist's discovery and is grounded in the laborious yet inspired process of an engineer's invention. 
And there's this bit, too:
The notion of computing in biology, virtually a religious argument just 10 years ago, is now enthroned as the pillar of new biology.
So why has "bioinformatics" become less discussed? In part, because it has been so successful. 

Friday, April 13, 2012

Schelling Points And Bioinformatics

A lot of what I think about when I do bioinformatics is how to set parameters non-arbitrarily. Basically I am looking for Schelling points: round, clear numbers that are easy to justify. The classic case is setting a p-value threshold to 0.05, which has been around for over eighty years and is still going strong, despite the haters. Other examples are setting e-value thresholds to 0.01 and treating a Bayes factor of 10 as the lower bound for "strong" evidence. Like any threshold, these are arbitrary, but following the paradigm of statistics as rhetoric, their staying power makes sense insofar as scientists need standard procedures they can resort to in order to settle debates. Anyway, I have no profound point here, I just think it's cool that a seemingly esoteric topic affects what I actually do on a day-to-day basis.
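
As a rough illustration (the hit data and the dictionary fields below are made up, not output from any real tool), here is a minimal Python sketch of how these conventional "Schelling point" cutoffs end up hard-coded as constants in everyday filtering code:

```python
# A minimal sketch with made-up numbers: conventional threshold values
# become hard-coded constants in routine filtering code.

P_VALUE_CUTOFF = 0.05        # Fisher's conventional significance level
E_VALUE_CUTOFF = 0.01        # a common e-value cutoff for sequence hits
BAYES_FACTOR_STRONG = 10.0   # Jeffreys' lower bound for "strong" evidence

# Hypothetical hits, not output from any real pipeline.
hits = [
    {"id": "geneA", "e_value": 1e-30, "p_value": 0.001, "bayes_factor": 45.0},
    {"id": "geneB", "e_value": 0.2,   "p_value": 0.30,  "bayes_factor": 1.2},
    {"id": "geneC", "e_value": 0.005, "p_value": 0.04,  "bayes_factor": 12.0},
]

# Keep only hits that clear every conventional threshold.
kept = [
    h for h in hits
    if h["e_value"] <= E_VALUE_CUTOFF
    and h["p_value"] <= P_VALUE_CUTOFF
    and h["bayes_factor"] >= BAYES_FACTOR_STRONG
]

for h in kept:
    print(h["id"], "passes all three conventional cutoffs")
```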


Is It Possible, In Principle, To Do Methodologically Sound Research?

In a paper published 31 years ago, Joseph McGrath argues (html, pdf) that the answer is no. Specifically, he claims that any research design faces two trade-offs: 1) being obtrusive vs unobtrusive (which maps to my terminology as acquiring info vs altering subject), and 2) being generalizable vs context-cognizant (which maps to my terminology as precision vs simplicity).

In his terminology, these trade-offs allow for the optimization of three distinct values (generalizability of samples to populations; precision in measuring variables; and context realism for the participants). Initially, I disagreed with this. My intuition suggests that when you consider the intersection of two trade-offs, there should be four points that each maximize some quality: one in each corner of the 2-d space.

One way to get around this is to claim that, in the context of this decision (study design), the trade-offs are not independent. For example, it might be very difficult for a design to be both highly generalizable and highly obtrusive.

Below I've drawn an example. Think of the dots as realizations of actually feasible study designs sampled from someone's mental generation process; i.e., they are probably not at the absolute extremes of the theoretical distribution, but with enough realizations they would come close.


I'm not sure that I agree with this exact distribution, and it would need some justification of its own, but assuming non-independence like this seems like the only way to justify his three-pronged rather than four-pronged set-up.
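
To make the sampling idea concrete, here is a rough Python sketch under my own assumption (not McGrath's) that the "mental generation process" is a negatively correlated bivariate normal over obtrusiveness and generalizability; the only point is that correlated trade-offs leave some corners of the 2-d space almost empty:

```python
# Sketch of the sampling idea above. Assumption (mine, not McGrath's):
# feasible study designs come from a negatively correlated bivariate normal,
# so designs that are very obtrusive tend not to be very generalizable.
import numpy as np

rng = np.random.default_rng(0)

mean = [0.0, 0.0]            # axis 1: obtrusiveness, axis 2: generalizability
cov = [[1.0, -0.8],
       [-0.8, 1.0]]          # strong negative correlation between the two

designs = rng.multivariate_normal(mean, cov, size=1000)
obtrusiveness, generalizability = designs[:, 0], designs[:, 1]

# Fraction of sampled designs landing in each corner (beyond 1 sd on both axes).
corners = {
    "obtrusive & generalizable":   np.mean((obtrusiveness > 1) & (generalizability > 1)),
    "obtrusive & context-bound":   np.mean((obtrusiveness > 1) & (generalizability < -1)),
    "unobtrusive & generalizable": np.mean((obtrusiveness < -1) & (generalizability > 1)),
    "unobtrusive & context-bound": np.mean((obtrusiveness < -1) & (generalizability < -1)),
}
for corner, frac in corners.items():
    print(f"{corner}: {frac:.3f}")
```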