Abstracts
Fall 2009
Friday, October 23, 11:00 a.m.
FCT4.6057
Simon Lunagomez, Ph.D.
Department of Statistical Science
Duke University
Geometric Representations of Hypergraphs for Prior Specification and Posterior Sampling
A parametrization of hypergraphs based on the geometry of points in R^m is developed. Given this parametrization, informative prior distributions on hypergraphs are induced by priors on point configurations via spatial processes. This prior specification is used to infer conditional independence models or the Markov structure of a multivariate distribution. Specifically, we can recover both the factorization and the hyper Markov law. The advantages of this approach are greater control of the distribution of graph features than with Erdős-Rényi random graphs, inference of factorizations that cannot be retrieved by a graph alone, and Markov chain Monte Carlo algorithms that allow for local and global moves in graph space. We illustrate the utility of this parametrization and prior specification using simulations.
Monday, October 12, 11:00 a.m.
CPB8.3059
Wolfgang Maass, Ph.D.
Visiting Professor
Department of Bioinformatics and Computational Biology
M. D. Anderson Cancer Center
Smart Drugs: Leveraging Ubiquitous Computing and Semantic Technologies in Healthcare Environments
Wrong medications and adverse effects often result from functional and relational problems directly related to misunderstandings of drug information by patients, nurses and physicians. Furthermore, communication flows between different stakeholders in healthcare situations are sparse, which is a primary cause of patient dissatisfaction, mistakes, and unnecessary time and effort.
The vision of Ubiquitous Computing is about full integration of information services into physical worlds, while Semantic Technologies target explicit representations of meanings so that data can be automatically processed by web-based applications unknown at the outset of data creation.
This talk discusses how Ubiquitous Computing and Semantic Technologies can be used to improve communication in various environments, and in healthcare situations in particular. In this approach, physical objects become Smart Products by adding instantiations of the pattern-based Semantic Product Description Object Model (SPDO). A dedicated middleware (Tip ‘n Tell) manages interactions between Smart Products and various services. The talk shows how this approach can be applied to drugs (Smart Drugs) within the context of a reference model for ambient healthcare environments, and tentatively discusses how Smart Drugs can be used as service integration points. This approach is presented as a means of lessening drug-related functional and relational problems.
Wednesday, October 7, 11:00 a.m.
FCT5.5049
Heejung Shim
Department of Statistics
University of Wisconsin - Madison
Bayesian Co-estimation of Alignment and Tree
Traditionally, phylogeny and sequence alignment are estimated separately: first a multiple sequence alignment is estimated, and then a phylogeny is inferred from that alignment. However, uncertainty in the alignment estimation is ignored, possibly resulting in overstated certainty in phylogeny estimates. We develop a joint model for co-estimating phylogeny and sequence alignment, which improves estimates from the traditional approach by accounting for uncertainty in the alignment in phylogenetic inferences. Unlike alternative models for joint estimation of alignment and phylogeny, our indel model allows (1) arbitrary-length overlapping insertion and deletion (indel) events, (2) a general distribution for indel fragment size and (3) the expectation of the number of indel events to vary with branch length. We employ a Bayesian approach using MCMC to estimate the joint posterior distribution of a phylogenetic tree and a multiple sequence alignment. The state space of our Markov chain consists of a tree and a complete history of indel events mapped onto the tree, whereas previous alternative approaches use a tree and an alignment. A large state space containing a complete history of indel events makes our MCMC approach more challenging, but it enables us to infer more information about the indel process. The performance of this joint method and of traditional sequential methods is compared using simulated data as well as real data. Software named BayesCAT (Bayesian Co-estimation of Alignment and Tree) is expected to be available for public use.
Wednesday, September 30, 1:00 p.m.
Pickens Academic Tower, Conference Rooms 4 & 5
Rob Scharpf, Ph.D.
Postdoctoral Fellow
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
A Multilevel Model to Address Batch Effects in Copy Number Estimation Using SNP Arrays
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of base pairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects, as the logistics of preparing DNA and processing thousands of arrays often involve multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. The software is open source and implemented in the R package crlmm available at Bioconductor (http://www.bioconductor.org).
Wednesday, September 23, 11:00 a.m.
FCT5.5049
Riten Mitra
Research Assistant
Department of Biostatistics
University of North Carolina at Chapel Hill
Bayesian Models for Sequence-based Nucleosome Detection
Recent research has established that particular DNA sequences have inherently different properties with respect to nucleosome formation. This has motivated researchers to exploit sequence features for improved classification. To date, however, the relationship has been explored only on the basis of first obtaining nucleosome predictions, and then inferring potential influential sequence features on the basis of exploratory methods. We propose a novel general model framework that directly incorporates relevant sequence features, not restricted to dinucleotides, into nucleosome detection. For this purpose we extend a base continuous-time HMM with (i) 'emission' and (ii) 'transition' models that allow the sequence features to influence the length of the hidden states and the distribution of the intensity data. The model was applied to FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) data from the yeast genome, and incorporating nucleotides of higher order was shown to lead to greater predictive power and smaller misclassification rates. We also explored the identifiability issues associated with such models and established asymptotic results (normality and contiguity properties) in this framework. Lastly, an improved motif nucleosome detection algorithm is proposed that combines the nucleosome position information with the strength of the sequence data signal for motif search.
Friday, September 18, 11:00 a.m.
Pickens Academic Tower, Rooms 3 & 6
Ana Tereza Vasconcelos
Head, Bioinformatics Laboratory
National Laboratory of Scientific Computation
Petrópolis, Rio de Janeiro, Brazil
Some Features of Genomics and Bioinformatics in Brazil
We will present the main genomic activities developed at the Bioinformatics Lab (Labinfo) of the National Laboratory of Scientific Computation, an interdisciplinary research group that includes biologists, computational scientists and mathematicians, all dedicated to developing computational and statistical methodologies that can be applied to Bioinformatics and Computational Biology. We will describe our main activities, focusing this presentation on the preliminary results of the Brazilian Network for Cancer Research, for which the first project is Breast Cancer Sequencing.
Wednesday, September 2, 11:00 a.m.
FCT5.5049
Carl Matthew DiCasoli
Doctoral Candidate
Department of Statistics
North Carolina State University, Raleigh, North Carolina
Bayesian Regression Methods for Crossing Survival Curves
In survival data analysis, the proportional hazards (PH), accelerated failure time (AFT), and proportional odds (PO) models are commonly used semiparametric models for the comparison of survivability in subjects. These models assume that the survival curves do not cross. However, in some clinical applications, the survival curves pertaining to the two groups of subjects under study may cross each other, especially for long-duration studies. Hence, the three models stated above may no longer be suitable for making inference. Yang and Prentice (2005) proposed a model which separately models the short-term and long-term hazard ratios nesting both PH and PO. This feature allows for the survival functions to cross. First, we study the estimation procedure in the Yang-Prentice model with regard to the two-sample case. We propose two different approaches: (1) Bayesian Bootstrap and (2) Smoothing Methods. The first approach involves Bayesian Bootstrap with likelihoods corresponding to Binomial and Poisson forms while the second approach involves kernel smoothing methods as well as smoothing spline methods. A simulation is conducted to compare various methods under the two-sample case. Next, we extend the Yang-Prentice model to a regression version involving predictors and examine three likelihood approaches including Poisson form, pseudo-likelihood, and Bayesian smoothing. The effects of model misspecification on asymptotic relative efficiency are also studied empirically. The results from simulation studies indicate that the PH, AFT, and PO models are not robust to model misspecifications when the survival functions are allowed to cross.
Finally, we calculate the marginal density via variational methods to determine the Bayes factor. Either a full Bayesian or Bayesian approach is implemented to perform model selection. Both approaches accurately identify the correct model, even under slight misspecification, and are computationally more efficient than MCMC techniques.
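As a rough illustration of why the Yang-Prentice model permits crossing survival curves, the sketch below uses the two-sample hazard-ratio form commonly attributed to Yang and Prentice (2005), in which the treatment-to-control hazard ratio moves from a short-term value to a long-term value as control survival falls from 1 to 0. The function names, the parameter names theta_short and theta_long, and the numeric values are assumptions made for this example; the closed-form treatment survival is obtained here by integrating that hazard ratio and is not taken from the abstract. This is not the speaker's estimation procedure.

```python
def hazard_ratio(s_control, theta_short, theta_long):
    """Treatment/control hazard ratio at a time where control survival is s_control.

    Equals theta_short when s_control = 1 (time zero) and
    theta_long when s_control = 0 (long follow-up).
    """
    denom = theta_short + (theta_long - theta_short) * s_control
    return theta_short * theta_long / denom


def treatment_survival(s_control, theta_short, theta_long):
    """Treatment survival implied by integrating the hazard ratio above.

    Reduces to s_control ** theta when theta_short == theta_long (PH),
    and to a proportional odds relationship when theta_long == 1.
    """
    denom = theta_short + (theta_long - theta_short) * s_control
    return (theta_long * s_control / denom) ** theta_long


if __name__ == "__main__":
    # Hypothetical values: treatment is worse early (ratio 2) and better late (ratio 0.5).
    theta_short, theta_long = 2.0, 0.5
    for s in (0.9, 0.5, 0.1):
        hr = hazard_ratio(s, theta_short, theta_long)
        st = treatment_survival(s, theta_short, theta_long)
        print(f"control survival {s:.1f}: hazard ratio {hr:.3f}, treatment survival {st:.3f}")
```

With these hypothetical values, treatment survival lies below control survival early on and above it later, which a PH or PO model with a single ratio parameter cannot reproduce.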
Updated October 16, 2009

