Abstracts

Fall 2011

September 7, 2011

Michael S. Noble
TCGA GDAC Pipeline Manager
Broad Institute
Cambridge, Massachusetts

The Broad Institute Firehose Sequencing Data Management System: Insights from Early TCGA Use

September 8, 2011

Roderick J.A. Little, Ph.D.
Richard D. Remington Professor of Biostatistics
University of Michigan School of Public Health
Ann Arbor, Michigan

Bayes and Multiple Imputation for Assay Data Subject to Measurement Error
Existing methods for the analysis of data involving assay data subject to measurement error are deficient. In particular, classical calibration methods have been shown to yield invalid inferences unless the measurement error is small. Regression calibration, a form of conditional mean imputation, has better properties, but is not well suited to adjusting for heteroscedastic measurement error. Bayesian multiple imputation is less common for measurement error problems than for missing data, but it represents an attractive option, providing superior inferences to existing methods and a convenient way of adjusting for measurement error using simple complete-data methods and multiple imputation combining rules. It also provides a convenient approach to limit-of-quantification issues, for which current approaches are also arguably deficient. I review some recent work with Ying Guo that develops multiple imputation methods for assay data, focusing particularly on three key aspects: internal versus external calibration designs, the role of the nondifferential measurement error assumption in these designs, and heteroscedastic measurement error.
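The abstract refers to "multiple imputation combining rules" without naming them. The sketch below assumes the standard rules due to Rubin: average the per-imputation estimates, then add the within-imputation variance to an inflated between-imputation variance. The function name `combine_rubin` and the numbers are hypothetical and illustrative only; they are not taken from the talk.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m complete-data results with Rubin's rules (assumed here).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)             # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Illustrative values only: three imputations of one regression coefficient.
pooled, total_var = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation term is what carries the extra uncertainty due to measurement error or missingness; a single conditional-mean imputation has no such term.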

October 5, 2011

Michael Berger, Ph.D.
Assistant Professor, Department of Pathology
Memorial Sloan-Kettering Cancer Center, New York, New York

Massively Parallel Sequencing for Discovery and Diagnostics in Cancer
Efforts to understand cancer at the molecular level have revealed genetic biomarkers that reflect the nature and course of disease and, in some cases, predict the likelihood that a patient will benefit from a particular treatment. Massively parallel "next-generation" sequencing techniques are being applied broadly across cancers to discover novel oncogenes and tumor suppressor genes. To be truly transforming, however, mutations in key cancer genes must be profiled systematically and reliably on formalin-fixed paraffin embedded (FFPE) tumor samples that are routinely encountered in the clinic and in archived tumor banks. I discuss our group's progress in developing and applying a targeted, massively parallel sequencing platform for both translational research and clinical diagnostics at Memorial Sloan-Kettering Cancer Center.

October 5, 2011

Clyde F. Martin, Ph.D.
P. W. Horn Professor, Department of Mathematics and Statistics
Texas Tech University, Lubbock, Texas

Dynamic Clinical Trials and Control Theory
Statistics has played a role in clinical trials since their first occurrence. With the advent of dynamic clinical trials, dynamic programming became an important optimization tool. Dynamic programming dates back to the work of Bellman in the 1950s. It is interesting to note that he was well known for his work in control theory. I discuss an important area of control theory that seems innately suited for use as a model in dynamic clinical trials. Using a switching system, we can think of two systems as representing some physiological variables of a subject in response to two different treatments. There is a rich body of literature that studies these systems. They have been used quite successfully in a variety of applications, including some medical applications. I will look at the detailed application of these systems in modeling an intervention studied by Susan Murphy. This work has been done with the assistance of several graduate students at Texas Tech University: Yining Du (UTSPH), Siming Li (TTU), Ning Wang (VT), Masaki Ogura (TTU), Han Liu (UW), and others.

October 18, 2011

Rajarshi Guhaniyogi
Doctoral Candidate
Division of Biostatistics, School of Public Health
University of Minnesota, Minneapolis, Minnesota

Some Recent Developments on Bayesian Hierarchical Low-rank Spatial Process Models for Large Datasets
Over the past decade, technological and computational advances have created large spatially indexed datasets that provide extraordinary opportunities to answer complex questions encountered in environmental and natural sciences. The use of Bayesian hierarchical spatial models for analyzing these datasets is undermined by onerous computational burdens associated with parameter estimation. Low-rank spatial process models, such as the predictive process model, attempt to resolve this problem by projecting spatial effects onto a lower-dimensional subspace determined by a judicious choice of “knots” or locations that are fixed a priori. While carrying out an analysis in the lower-dimensional subspace, “low-rank models” often face problems, primarily due to biases in the residual variance components as a result of oversmoothing or model misspecification and due to suboptimal choices of the lower-dimensional subspace. I characterize these biases, demonstrate their presence as systemic phenomena, study inferential impacts from both theoretical and data-analytic points of view, and explore remedial models that circumvent the oversmoothing and allow stochastic modeling of the “knots.” In addition, I propose a multivariate extension of low-rank models with space-varying correlation among the univariate components. I illustrate this work using synthetic experiments and through applications to environmental and soil sciences. Part of this work has been done jointly with Dr. Sudipto Banerjee, Dr. Andrew Finley, and Dr. Alan Gelfand.

October 19, 2011

Roger Klein, Ph.D.
Professor of Economics
Rutgers University, New Brunswick, New Jersey

Semiparametric Selection Models with Binary Outcomes
Without making distributional assumptions, I address the estimation of a semiparametric sample selection model for which both the selection rule and the outcome variable are binary. Because the marginal effects are often of primary interest and are difficult to recover in a semiparametric setting, I develop estimators for both the marginal effects and the underlying model parameters. The marginal effect estimator uses only observations that are members of a high-probability set in which the selection problem is not present. A key innovation is that this probability set is data dependent. The model parameter estimator is a quasi-likelihood estimator based on regular kernels with bias corrections. I establish the large-sample properties of both estimators and provide simulation evidence confirming that they perform well in finite samples.

October 28, 2011

Dongseok Choi, Ph.D.
Associate Professor
Public Health and Preventive Medicine
Oregon Health and Science University
Portland, Oregon

Detecting Subclusters in Outliers
Medical researchers are often interested in finding subgroups within an outlier group. For example, a certain medical condition can occur more frequently in a small group that is different from the majority of the population. The use of cluster analysis is one approach to finding groups within a data set. Cluster analysis has been a popular tool for exploring potential group structures in complex data, and has received greater attention in recent years due to data mining and high-dimensional data, such as microarrays. I introduce a split-and-recombine procedure and demonstrate its application to a medical data set. In addition, I discuss the results of using other clustering methods to analyze the same data.

November 8, 2011

A. Jeffrey Goldsmith
Doctoral Candidate
Department of Biostatistics
Bloomberg School of Public Health
The Johns Hopkins University
Baltimore, Maryland

Cross-sectional and Longitudinal Penalized Functional Regression
I discuss fast-fitting methods for generalized functional linear models. The functional predictor is projected onto a large number of smooth eigenvectors and the coefficient function is estimated using penalized spline regression. This method can be applied to many functional data designs, including functions measured with and without error, sparsely or densely sampled over regular or irregular grids. The methods are also extended to the increasingly relevant longitudinal case, in which functional predictors and scalar outcomes are recorded over multiple visits. The approach can be implemented using standard mixed effects software or in a Bayesian framework. This work is motivated by a study of white matter demyelination via diffusion tensor imaging in which various cerebral white matter tract properties are used to predict cognitive and motor decline in patients diagnosed with multiple sclerosis. All methods are implemented in the “refund” package available on CRAN.

November 9, 2011

Xinyi Xu, Ph.D.
Assistant Professor, Department of Statistics
The Ohio State University, Columbus, Ohio

Calibrated Bayes Factors for Model Comparison
The Bayes factor is a widely used tool for Bayesian hypothesis testing and model comparison. Its value can be greatly affected by the prior elicitation for the model parameters. When the prior information is weak, people often use proper priors with large variances. In this work, I show that when the models under comparison differ in dimensions, the use of Bayes factors under convenient diffuse priors can be very misleading. Therefore, I propose an innovative method, calibrated Bayes factor, which uses data to calibrate the prior distributions before computing the Bayes factors. I show that this method provides reliable and robust model preferences under various true models. It is applicable to a large variety of model comparison problems because it makes no assumption about model forms (parametric or nonparametric) and can be used for both proper and improper priors.
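The sensitivity to diffuse priors can be seen in a textbook setting that is assumed here for illustration and is not the calibrated method of the talk: a normal mean with known variance, comparing H0: mu = 0 against H1: mu ~ N(0, tau^2). The sample mean is N(0, sigma^2/n) under H0 and N(0, sigma^2/n + tau^2) marginally under H1, so the Bayes factor is a ratio of two normal densities. The function names and data values below are hypothetical.

```python
import math


def normal_pdf(x, var):
    """Density at x of a zero-mean normal distribution with variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def bf01(ybar, n, sigma2, tau2):
    """Bayes factor of H0 (mu = 0) over H1 (mu ~ N(0, tau2)).

    ybar is the mean of n observations with known variance sigma2.
    """
    return normal_pdf(ybar, sigma2 / n) / normal_pdf(ybar, sigma2 / n + tau2)


# Same data (three standard errors from zero), increasingly diffuse priors.
for tau2 in (1.0, 100.0, 1e4, 1e6):
    print(tau2, bf01(ybar=0.3, n=100, sigma2=1.0, tau2=tau2))
```

With these illustrative numbers the Bayes factor favors H1 when tau2 = 1 and favors H0 when tau2 = 1e6, although the data are unchanged. Only the prior variance of the extra parameter moved.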

November 16, 2011

Richard J. Chappell, Ph.D.
Professor, Department of Biostatistics & Medical Informatics
Department of Statistics
University of Wisconsin-Madison

Delta What? Choice of Outcome Scale in Non-inferiority Trials
Equivalence trials are experiments that attempt to show that one intervention is not too much inferior to another on some quantitative scale. The cutoff value is commonly denoted as Delta. For example, one might wish to show that the hazard ratio of disease-free survival among patients given an experimental chemotherapy versus a currently approved regimen is Delta = 1.3 or less, especially if the former is thought to be less toxic than or otherwise advantageous over the latter. Naturally, a lot of attention is given to the choice of Delta. In addition to this, I assert that, even more than in superiority clinical trials, the scale of Delta in equivalence trials must be carefully chosen. Since null hypotheses in superiority studies generally imply no effect, they are often identical or at least compatible when formulated on different scales. However, nonzero Deltas on one scale usually conflict with those on another. For example, the four hypotheses of arithmetic or multiplicative differences of either survival or hazard in general all mean different things unless Delta = 0 for differences or 1 for ratios. This can lead to problems in interpretation when the clinically natural scale is not a statistically convenient one. In addition to this topic, I discuss background issues in non-inferiority studies.
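A small numerical sketch of the scale conflict follows. It assumes proportional hazards, under which survival in the experimental arm equals control-arm survival raised to the power of the hazard ratio. The function name and the control survival values are hypothetical and chosen only for illustration.

```python
def survival_difference(control_survival, hazard_ratio):
    """Absolute survival difference (control minus experimental) at a fixed time.

    Assumes proportional hazards, so S_exp(t) = S_ctrl(t) ** hazard_ratio.
    """
    return control_survival - control_survival ** hazard_ratio


# The same hazard-ratio margin maps to different survival-difference margins
# depending on the control arm's survival at the time point of interest.
for s_ctrl in (0.9, 0.7, 0.5):
    print(s_ctrl, survival_difference(s_ctrl, hazard_ratio=1.3))
```

A hazard ratio of 1 gives a difference of 0 at every control survival level, which is why the scales agree for superiority nulls but not for a nonzero margin.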

Updated October 12, 2011


© 2012 The University of Texas MD Anderson Cancer Center