Estimating probabilities of enrichment

4 January 2012

Z. Yang, Z. Li, and D. R. Bickel, “Empirical Bayes estimation of posterior probabilities of enrichment,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1201.0153 (2011). Full preprint | 2010 seed

This paper adapts novel empirical Bayes methods to the problem of detecting enrichment, that is, the differential representation of genes associated with a biological category in a list of genes identified as differentially expressed. A microarray case study illustrates the methods using Gene Ontology (GO) terms, and a simulation study compares their performance. We report that the best-performing enrichment method depends strongly on how many GO terms or other biological categories are of interest.

Combining inferences from different methods

28 November 2011

D. R. Bickel, “Resolving conflicts between statistical methods by probability combination: Application to empirical Bayes analyses of genomic data,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1111.6174 (2011). Full preprint

This paper proposes a solution to the problem of combining the results of differing statistical methods that may legitimately be used to analyze the same data set. The motivating application is the combination of two estimators of the probability of differential gene expression: one uses an empirical null distribution, and the other uses the theoretical null distribution. Since there is usually not any reliable way to predict which null distribution will perform better for a given data set and since the choice between them often has a large impact on the conclusions, the proposed hedging strategy addresses a pressing need in statistical genomics. Many other applications are also mentioned in the abstract and described in the introduction.

Minimax strength of statistical evidence

24 November 2011

D. R. Bickel, “A predictive approach to measuring the strength of statistical evidence for single and multiple comparisons,” Canadian Journal of Statistics 39, 610–631 (2011). Full text | Revised preprint | 2010 draft

This paper introduces a novel approach to the multiple comparisons problem by generalizing a promising method of model selection developed by information theorists. The first two sections present that method and its main advantages over conventional approaches without burdening statisticians with unfamiliar terms from coding theory. A quantitative proteomics case study facilitates application of the new method to the analysis of data sets involving multiple biological features. The theorems describe its operating characteristics.

The cited medium-scale paper presented previous minimum description length (MDL) methods. Unlike those methods, the new MDL methods of the current paper are based on a conflation of the normalized maximum likelihood (NML) with the weighted likelihood (WL). The previous MDL methods are used in the CJS article for comparison with its NML/WL methods.

Analysis of -omics data

4 November 2011

In the post-genomic era, sophisticated computational and statistical methods of analyzing transcriptomics and proteomics data are increasingly used to generate hypotheses and to draw scientific conclusions. Consequently, students in biochemistry and other life sciences need familiarity with such methods in order to critically read much of the literature. In addition, exposure to these techniques may help students interpret their own data in graduate studies and in future research careers.

The Biochemistry Graduate Program of the University of Ottawa is offering BCH5101 for Winter 2012 to meet this growing need. For more information, see Analysis of -omics data (BCH5101).

Categories: Education

Degree of caution in inference

26 September 2011

D. R. Bickel, “Controlling the degree of caution in statistical inference with the Bayesian and frequentist approaches as opposite extremes,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1109.5278 (2011). Full preprint

This paper’s framework of statistical inference is intended to facilitate the development of new methods to bridge the gap between the frequentist and Bayesian approaches. Three concrete examples illustrate how such intermediate methods can leverage strengths of the two extreme approaches.

Software for local false discovery rate estimation

15 August 2011

LFDR-MLE is a suite of R functions for the estimation of local false discovery rates by maximum likelihood under a two-group parametric mixture model of test statistics.
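As a rough illustration of the idea, and not of the LFDR-MLE functions themselves, the following Python sketch fits a two-group mixture by maximum likelihood and then computes local false discovery rates. Everything specific here is an assumption made for the example: the null group is taken to be N(0, 1), the alternative group N(mu, 1), the likelihood is maximized with a plain EM iteration, and the names `fit_two_group` and `local_fdr` are hypothetical.

```python
import math


def normal_pdf(z, mean=0.0):
    """Density of a normal distribution with unit variance."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fit_two_group(z_values, pi0=0.9, mu=2.0, iterations=200):
    """Fit f(z) = pi0 * N(0, 1) + (1 - pi0) * N(mu, 1) by EM.

    Assumed model for illustration only. Returns the estimates
    (pi0, mu), where pi0 is the proportion of null features.
    """
    for _ in range(iterations):
        # E-step: probability that each statistic belongs to the null group.
        null_weights = []
        for z in z_values:
            f0 = pi0 * normal_pdf(z)
            f1 = (1.0 - pi0) * normal_pdf(z, mu)
            null_weights.append(f0 / (f0 + f1))
        # M-step: update the null proportion and the alternative mean.
        alt_total = sum(1.0 - w for w in null_weights)
        if alt_total < 1e-12:
            break  # no mass left in the alternative group
        pi0 = sum(null_weights) / len(z_values)
        mu = sum((1.0 - w) * z for w, z in zip(null_weights, z_values)) / alt_total
    return pi0, mu


def local_fdr(z, pi0, mu):
    """Estimated posterior probability that a feature with statistic z is null."""
    f0 = pi0 * normal_pdf(z)
    f1 = (1.0 - pi0) * normal_pdf(z, mu)
    return f0 / (f0 + f1)


# Made-up test statistics: mostly near zero, two clearly shifted.
z_values = [-0.5, 0.1, 0.3, -1.2, 0.8, -0.2, 3.1, 3.5, 0.4, -0.7]
pi0_hat, mu_hat = fit_two_group(z_values)
lfdr = [local_fdr(z, pi0_hat, mu_hat) for z in z_values]
```

EM only finds a local maximum of the likelihood, so a more careful implementation would try several starting values for `pi0` and `mu`.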

Categories: empirical Bayes, MDL, software

Extending the likelihood paradigm

10 August 2011

D. R. Bickel, “The strength of statistical evidence for composite hypotheses: Inference to the best explanation,” to appear in Statistica Sinica (2011). 2010 Preprint

Postdoctoral training in Bayesian genomics

27 July 2011

Reliable interpretation of genomic information makes unprecedented demands for innovations in statistical methodology and its application to biological systems. This unique opportunity drives research at the Statomics Lab of the Ottawa Institute of Systems Biology (http://www.statomics.com) to marshal strengths of robust Bayesian, empirical Bayes, and frequentist frameworks. The lab seeks a postdoctoral fellow who will collaboratively develop and apply novel methods of Bayesian inference to overcome current challenges in learning from genome-wide association data, high-dimensional gene expression data, and other data related to genomics.

Experience in computationally intensive data analysis is essential, as is the ability to quickly design and code reliable software implementing Markov chain Monte Carlo algorithms. Strong initiative, excellent communication skills, and receipt of a PhD or equivalent doctorate in statistical genetics, statistics, bioinformatics, computer science, mathematics, physics, any field of engineering, or an equally quantitative field within four years prior to the start date are also required. The following qualities are desirable but not required: working knowledge of statistical genetics or genomics; familiarity with R, S-PLUS, Mathematica, C, Fortran, and/or LaTeX; experience in a UNIX or Linux environment.

To apply, send a PDF CV that has contact information of three references to dbickel@uottawa.ca, with “Bayes Postdoc” and the year of your graduation or anticipated graduation in the subject field of the message. In the message body, concisely present evidence that you meet each requirement for the position and describe your most significant papers and software packages with summaries of your contributions to them. All applicants are thanked in advance; only those selected for further consideration will receive a response.

Meeting: information theory & statistics

22 July 2011
Categories: Meetings

Bayes/non-Bayes blended inference

13 July 2011

D. R. Bickel, “Blending Bayesian and frequentist methods according to the precision of prior information with an application to hypothesis testing,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1107.2353 (2011). Full preprint

This framework of statistical inference facilitates the development of new methodology to bridge the gap between the frequentist and Bayesian theories. As an example, a simple and practical method for combining p-values with a set of possible posterior probabilities is provided.

In this new approach to statistics, Bayesian inference is used when the prior distribution is known, frequentist inference is used when nothing is known about the prior, and both types of inference are blended according to game theory when the prior is known to be a member of some set. (The robust Bayes framework represents knowledge about a prior in terms of a set of possible priors.) If the benchmark posterior that corresponds to frequentist inference lies within the set of Bayesian posteriors derived from the set of priors, then the benchmark posterior is used for inference. Otherwise, the posterior within that set that is closest to the benchmark posterior is used for inference.
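The decision rule in the preceding paragraph can be sketched for the simplest case of a single null hypothesis. The sketch makes several assumptions that are not stated above: the set of Bayesian posteriors is summarized by an interval of posterior probabilities of the null hypothesis, the frequentist benchmark is a p-value, and "closest" is measured by absolute difference. The paper may define closeness differently, and the function name `blended_posterior` is hypothetical.

```python
def blended_posterior(p_value, lower, upper):
    """Blend a frequentist benchmark with a set of Bayesian posteriors.

    Assumed setting for illustration: the set of posterior probabilities
    of the null hypothesis is the interval [lower, upper], and the
    benchmark is the p-value. If the benchmark lies in the interval it is
    used as is; otherwise the nearest endpoint of the interval is used.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("need 0 <= lower <= upper <= 1")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if lower <= p_value <= upper:
        return p_value  # benchmark is compatible with the prior knowledge
    return lower if p_value < lower else upper  # closest member of the set


# Benchmark below, inside, and above the interval [0.1, 0.4].
below = blended_posterior(0.03, 0.1, 0.4)   # nearest endpoint: 0.1
inside = blended_posterior(0.2, 0.1, 0.4)   # benchmark kept: 0.2
above = blended_posterior(0.9, 0.1, 0.4)    # nearest endpoint: 0.4
```

The two limiting cases match the description: when the interval shrinks to a single point (a known prior), the result is the Bayesian posterior, and when the interval is [0, 1] (nothing known about the prior), the result is the frequentist benchmark.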
