In clinical practice, biomarkers are critical for guiding treatment and predicting outcomes. However, deciding which biomarkers ought to constitute a signature is a non-trivial problem. Truly universal signatures are lacking: many reported signatures fail to reproduce across studies, and some even do no better than random or irrelevant ones.
My current research addresses this failure from three angles: the right hypothesis, the right assumptions, and the right statistics. In this regard, there are two main prongs in my work:
In the first, in the specialized context of systems and molecular biology, I am developing methodologies that more deeply integrate biological background knowledge into a (statistical) analysis.
In the second, in more general contexts, I am developing methodologies to identify the main cause of a given failure (wrong hypothesis, wrong assumptions, or wrong statistics) and to correct for it.
Besides these two main prongs, which are more theoretical, I am also looking into building practical analytic systems that are self-diagnosing, self-correcting, and helpful, and that offer users natural interactions (e.g., spreadsheet-style interactions).
I welcome students interested in data science, data mining, statistics, and computational biology to join my projects.