Building Models Helps Find Answers

Annual Report - Winter 2011

Statistics experts open door to cancer interventions

By Scott Merville

“You’re a statistics person. What are you doing working in a hospital?”

Sanjay Shete, Ph.D., professor in MD Anderson’s Department of Epidemiology, occasionally fields this question.

He’s here because the causes of cancer are complex, interrelated and buried in mountains of data that require sharp statistical analyses to dig them out.

“I feel that statistics are at the forefront of science,” Shete says, “because we have so many factors that play a role in cancer risk: the influence of multiple genes, personal behavior such as diet, exercise and tobacco use, and then the genetic makeup of the tumor. How do you put all of that together?”

It’s done by statistically modeling behavioral and genetic factors to predict a person’s cancer risk and then developing interventions to try to prevent the disease.

“You can’t change a person’s genome (hereditary information), so interventions will most likely come through behavioral changes,” Shete says.

GWAS looks for genetic factors

Shete is principal investigator on a genome-wide association study, called a GWAS, for head and neck cancer. The project analyzes the genomes of nearly 1,500 head and neck cancer patients to find genetic factors that raise cancer risk.

Fifteen years ago, a genome-wide scan might base its analysis on 300 spots in the genome. Now Shete and colleagues can scrutinize 660,000 single-point variations in each patient.

If a GWAS is the applied work of genetic epidemiology, Shete’s colleague, Paul Scheet, Ph.D., assistant professor in the Department of Epidemiology, focuses more on the basic research aspect of risk assessment.

Sanjay Shete, Ph.D. (left), and Paul Scheet, Ph.D., focus on cancer risk assessment. Photo: F. Carter Smith

“To know what’s unusual in the genome, you need to know what’s normal,” Scheet says.

What makes a successful model?

To understand patterns of inheritance by analyzing the genealogy of a chromosome — how it has changed over time — a successful model needs to fill in the genotypes of portions of the chromosome that aren’t directly observed.

As a graduate student at the University of Washington, Scheet wrote a computer program to estimate missing genotypes and haplotypes, the portions of chromosomes that are inherited together. The program, called fastPHASE, was his doctoral thesis and is widely used in chromosomal analysis.

Scheet also works on disease-specific projects, collaborating on a study in the Netherlands on the genetics of human behavior, which includes analysis of such conditions as hyperactivity, autism and schizophrenia.

Scheet and Shete, who have offices next to each other, have started a course about statistical genetics at The University of Texas Graduate School of Biomedical Sciences, where Shete heads the program in biomathematics and biostatistics.
