C Cheng, K Shen, C Song, J Luo and G. C. Tseng
Bioinformatics, 2009, 25(13), 1655-1661. DOI: 10.1093/bioinformatics/btp292
Motivation: Reproducibility analyses of biologically relevant microarray studies have mostly focused on overlap of detected biomarkers or correlation of differential expression evidences across studies. For clinical utility, direct inter-study prediction (i.e. to establish a prediction model in one study and apply to another) for disease diagnosis or prognosis prediction is more important. Normalization plays a key role for such a task. Traditionally, sample-wise normalization has been a standard for inter-array and inter-study normalization. For gene-wise normalization, it has been implemented for intra-study or inter-study predictions in a few papers while its rationale, strategy and effect remain unexplored.
Results: In this article, we investigate the effect of gene-wise normalization in microarray inter-study prediction. Gene-specific intensity discrepancies across studies are commonly found even after proper sample-wise normalization. We explore the rationale and necessity of gene-wise normalization. We also show that the ratio of sample sizes in normal versus diseased groups can greatly affect the performance of gene-wise normalization and an analytical method is developed to adjust for the imbalanced ratio effect. Both simulation results and applications to three lung cancer and two prostate cancer data sets, considering both binary classification and survival risk predictions, showed significant and robust improvement of the new adjustment. A calibration scheme is developed to apply the ratio-adjusted gene-wise normalization for prospective clinical trials. The number of calibration samples needed is estimated from existing studies and suggested for future applications. The result has important implication to the translational research of microarray as a practical disease diagnosis and prognosis prediction tool.