Claudio A. Gelmi, Purusharth Prakash, Jeremy S. Edwards and Babatunde A. Ogunnaike
With the primary objective of developing fundamental probability models that can be used for drawing rigorous statistical inference from microarray data, we presented, in a previous publication, theoretical results for characterizing the entire microarray data set as an ensemble. Specifically, we established from first principles that, under reasonable assumptions, the distribution of microarray intensities follows the gamma model, and consequently that the underlying theoretical distribution for the entire set of fractional intensities is a mixture of beta densities. This probabilistic framework was then used to develop a rigorous statistical inference methodology whose outcome, for each gene, is an ordered triplet: a raw computed fractional (or relative) change in expression level; an associated probability that this number indicates lower, higher, or no differential expression; and a measure of confidence associated with the stated result. In this paper we validate the probabilistic framework and the associated statistical inference methodology through confirmatory experimental studies of gene expression in Saccharomyces cerevisiae using Affymetrix GeneChips®. The array data were analyzed using the probabilistic framework, and nine genes were selected for higher-precision characterization. These genes had indeterminate expression status according to the standard 2-fold change criterion, but our probabilistic method assigned them high expression status probabilities. In particular, for genes CGR1, GOS1, ICS2, PCL5 and PLB1, the high probabilities of being differentially expressed (up or down) were found to be in excellent agreement with the expression status determined by the independent, high-precision confirmatory experiments.
These confirmatory experiments, carried out with the high-precision, medium-throughput polonies technique, confirmed that the probabilistic framework performs well in correctly identifying the expression status of genes in general, and especially of differentially expressed genes that could not otherwise have been identified using the standard 2-fold change criterion.
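
The step from gamma-distributed intensities to beta-distributed fractional intensities can be illustrated with a small simulation. The sketch below is not the authors' procedure: it relies on the standard result that, for independent gamma variables X and Y with a common scale, X / (X + Y) follows a Beta distribution whose parameters are the two gamma shapes. The definition of fractional intensity as X / (X + Y), the function names, and the shape and scale values are all assumptions made for illustration, and the sketch covers a single beta component rather than the full mixture.

```python
import random


def fractional_intensities(shape_a, shape_b, scale, n, seed=0):
    """Simulate n fractional intensities x / (x + y).

    x and y are independent gamma draws with a common scale, standing in
    for one gene's intensities under two conditions (an assumption; the
    abstract does not give the exact definition).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n):
        x = rng.gammavariate(shape_a, scale)  # intensity, condition 1
        y = rng.gammavariate(shape_b, scale)  # intensity, condition 2
        fractions.append(x / (x + y))
    return fractions


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical parameter values chosen only for the demonstration.
samples = fractional_intensities(2.0, 3.0, scale=50.0, n=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
theory_mean, theory_var = beta_mean_var(2.0, 3.0)

print(f"sample mean {sample_mean:.4f} vs Beta mean {theory_mean:.4f}")
print(f"sample var  {sample_var:.4f} vs Beta var  {theory_var:.4f}")
```

The sample moments should sit close to those of Beta(2, 3), and changing the common scale leaves them unchanged, which is why the fractional intensity is a convenient scale-free quantity. Genes with different shape parameters would each contribute their own beta component, which is one way to read the mixture described above.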