
Illumina Next-Generation Sequencing

Next Generation Sequencing and Data Analysis


Next Generation Sequencing

The ATGC provides a complete next-generation sequencing (NGS) service. Investigators provide the facility with genomic DNA, total RNA, or ChIP DNA, depending on the requested application, and the facility handles all sample processing. The deliverables of the NGS service are FASTQ files. Data analysis is available for full-service submissions (library preparation and sequencing).
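Because the service delivers FASTQ files, a small reader is handy for spot-checking reads and base qualities locally. The sketch below is a minimal illustration and makes several assumptions. It expects uncompressed records of four lines each, with Phred+33 quality encoding, which is the convention used by current Illumina software. For gzipped files, pass `gzip.open(path, "rt")` as the handle. The function names are illustrative and are not part of any facility tool.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed, four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode Phred+33 quality characters into integer quality scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Usage with a tiny in-memory example (hypothetical read).
demo = io.StringIO("@read1\nACGT\n+\nII5#\n")
for record in read_fastq(demo):
    print(record.name, record.sequence, phred_scores(record.quality))
    # prints: read1 ACGT [40, 40, 20, 2]
```

In this encoding, a quality score of 40 corresponds to an estimated error probability of 1 in 10,000.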

NGS services include:

1. Project consultation and budget planning with a facility representative and a UT MD Anderson faculty bioinformatician.

2. Library Preparation: An NGS library is a collection of random fragments that together represent the entire sample. It is created by shearing DNA into fragments of 150-400 bases and ligating specific adapters to the fragment ends. Library fragments of the appropriate size, which depends on the application, are then selected and isolated. After a cleanup step, the library is quantified by qPCR, and its size distribution is checked on the Agilent TapeStation. The ATGC has automated library preparation for most applications using Eppendorf epMotion 5075 and Agilent Bravo liquid handlers.

3. Cluster Generation: Library fragments are bound to a flow cell by hybridizing them to a lawn of oligonucleotides complementary to the adapter sequences. The bound fragments are clonally amplified by bridge amplification. This creates millions of dense clusters, and each cluster contains copies of a single original fragment. Cluster generation occurs on-instrument, in a closed environment, on the NovaSeq6000, NextSeq500, and iSeq 100 instruments.

4. Illumina Sequencing: Sequencing on the flow cell uses Illumina's well-established sequencing-by-synthesis (SBS) chemistry. SBS uses four fluorescently labeled reversible terminator nucleotides, each carrying a chemically blocked 3' hydroxyl group. The NovaSeq6000 and NextSeq500 distinguish the four bases with two dyes (two-channel chemistry), while the MiSeq uses four dyes (four-channel chemistry). To begin sequencing, primers are hybridized to the single-stranded templates covalently bound to the flow cell. The labeled nucleotides are then flowed across the flow cell, where they compete for incorporation into the growing DNA strands. Because each nucleotide carries a 3' block, only one complementary base is added to each strand per cycle. The result is a simultaneous one-base extension across millions of clusters. The incorporated nucleotides are excited by a laser and emit their characteristic fluorescence. In two-channel chemistry, one base emits no fluorescence at all. This signal is detected and recorded in an imaging step. After imaging, the fluorescent dye is cleaved and the 3' hydroxyl block is chemically removed, allowing extension to continue. The cycle is repeated 50 to 500 times, generating a series of images.

5. Data Analysis: The raw images are processed and base-called before sequence analysis begins. The resulting sequences are demultiplexed and transferred to an institutional server, where UT MD Anderson bioinformaticians access the data. Data analysis is performed in collaboration with faculty from Bioinformatics.
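Step 2 above pairs a quantification with a TapeStation size check. The two measurements work together because molar concentration depends on fragment length as well as mass. The facility quantifies by qPCR, but labs that measure mass concentration (for example, with a fluorometric assay) commonly convert it to molarity using the average fragment size. The sketch below is a minimal version of that conversion. It assumes the widely used approximation of 660 g/mol per double-stranded base pair. It also assumes that the average size is that of the final adapter-ligated library. The input values are hypothetical.

```python
# Average molar mass of one double-stranded base pair (g/mol).
# 660 is a widely used approximation; the exact value depends on sequence and counter-ions.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * average fragment length in bp)
    """
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must be non-negative")
    if avg_fragment_bp <= 0:
        raise ValueError("average fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * avg_fragment_bp)


# Hypothetical library: 2 ng/uL with a 400 bp average fragment size.
print(f"{library_molarity_nM(2.0, 400):.2f} nM")  # prints: 7.58 nM
```

At the same mass concentration, a library with longer fragments has a lower molarity. This is why the size distribution matters for loading.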
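Step 4 notes that one base can produce no fluorescence. This follows from Illumina's two-channel scheme, in which four bases are encoded with two dyes: A is labeled in both channels, C in red only, T in green only, and G is unlabeled ("dark"). The sketch below decodes per-cycle intensities for a single cluster under that encoding. Real base calling is considerably more involved, with cluster-specific normalization, phasing correction, and quality scoring. The fixed threshold and intensity values here are hypothetical simplifications.

```python
from typing import Iterable, Tuple


def call_two_channel(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from normalized red/green intensities for a single cluster.

    Two-channel encoding: A = red + green, C = red only, T = green only, G = dark.
    """
    red_on = red >= threshold
    green_on = green >= threshold
    if red_on and green_on:
        return "A"
    if red_on:
        return "C"
    if green_on:
        return "T"
    return "G"  # no signal in either channel


def call_read(cycles: Iterable[Tuple[float, float]], threshold: float = 0.5) -> str:
    """Concatenate per-cycle calls for one cluster into a read."""
    return "".join(call_two_channel(r, g, threshold) for r, g in cycles)


# Hypothetical normalized intensities for one cluster over four cycles.
print(call_read([(0.9, 0.8), (0.9, 0.1), (0.1, 0.9), (0.05, 0.02)]))  # prints: ACTG
```

With four-channel chemistry, as on the MiSeq, each base has its own dye, so no base is identified by the absence of signal.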
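Step 5 mentions demultiplexing, which means assigning each read to its sample by the index sequence read alongside it. The sketch below shows the general idea under several assumptions. It tolerates up to one mismatch per index, which is a common setting. It sends reads that match no sample, or more than one sample, to a bin labeled "Undetermined". The sample names and index sequences are hypothetical, and the facility's actual software and settings are not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_indexes: Dict[str, str],
                  max_mismatches: int = 1) -> str:
    """Return the unique sample whose index is within max_mismatches, else 'Undetermined'."""
    hits = [
        sample
        for sample, index in sample_indexes.items()
        if len(index) == len(index_read) and hamming(index, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "Undetermined"


def demultiplex(reads: Iterable[Tuple[str, str]], sample_indexes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Group read IDs by sample based on each read's index sequence."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, index_read in reads:
        bins[assign_sample(index_read, sample_indexes, max_mismatches)].append(read_id)
    return dict(bins)


# Hypothetical sample sheet and index reads.
samples = {"S1": "ACGTACGT", "S2": "TGCATGCA"}
reads = [("r1", "ACGTACGT"), ("r2", "ACGTACGA"), ("r3", "TGCATGCA"), ("r4", "GGGGGGGG")]
print(demultiplex(reads, samples))
# prints: {'S1': ['r1', 'r2'], 'S2': ['r3'], 'Undetermined': ['r4']}
```

Index sets are designed to differ at several positions so that a one-mismatch tolerance never makes a read ambiguous between two samples.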

Data Analysis

Data analysis is available to investigators with full-service submissions (library preparation and sequencing) on a fee-for-service basis.

Application	Services Included	Price	Estimated Turnaround Time (Analysis Only)
Bulk mRNA and Total RNA-Seq	(a) BAM files; (b) FPKM values or normalized count data; (c) differential expression analysis comparing two groups; (d) gene set enrichment analysis (GSEA) comparing two groups	$90/sample	2 weeks (<20 samples); 3–4 weeks (20–50 samples)
Whole Exome	(a) BAM files; (b) variant detection, including somatic mutation detection by GATK; (c) copy number alteration detection, including GISTIC analysis	$120/tumor-normal (T/N) pair	2 weeks (<20 samples); 3–4 weeks (20–50 samples)
Bulk ChIP-Seq	(a) MACS-based peak calling; (b) differential peak detection by diffReps; (c) motif discovery on differential peaks by HOMER; (d) super-enhancer/enhancer detection for H3K27ac ChIP-Seq by ROSE	$80/sample	2 weeks (<20 samples); 3–4 weeks (20–50 samples)
scRNA-Seq	(a) preprocessing, including pre-filtering, normalization, and batch-effect correction; (b) *automated annotation of cell types and subtypes by clustering, with marker gene detection for each cluster; (c) comparison of cell-type composition between case and control; (d) differential expression, GSVA, and GSEA analyses for cell subpopulations of interest; (e) cell trajectory and pseudotime analysis; (f) SCENIC analysis to identify transcription factors (TFs)	$400/sample	4–6 weeks
Whole Genome	(a) BAM files; (b) somatic mutation detection by GATK/Mutect2; (c) copy number alteration detection, including GISTIC analysis; (d) structural variant detection by GATK-SV, with annotation by AnnotSV	$180/tumor-normal (T/N) pair	4 weeks (<20 samples); 6–8 weeks (20–50 samples)

*Single-cell data analysis is an interactive process that requires active participation from the requesting lab. The requesting lab performs manual cell-type identification using supporting data provided by the ATGC.
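The bulk RNA-Seq deliverables include FPKM values. FPKM stands for fragments per kilobase of transcript per million mapped fragments, and it normalizes raw counts for both gene length and sequencing depth. The sketch below applies the standard FPKM definition to a hypothetical pair of genes. The facility's pipeline may use its own length definitions (for example, exonic or effective length) or report other normalized counts instead, so treat this as an illustration of the formula only.

```python
from typing import Dict, Optional


def fpkm(fragment_counts: Dict[str, int], gene_lengths_bp: Dict[str, int],
         total_mapped_fragments: Optional[int] = None) -> Dict[str, float]:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)
    If total_mapped_fragments is omitted, the sum of the given counts is used.
    """
    total = (total_mapped_fragments if total_mapped_fragments is not None
             else sum(fragment_counts.values()))
    if total <= 0:
        raise ValueError("total mapped fragments must be positive")
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total)
        for gene, count in fragment_counts.items()
    }


# Hypothetical counts for two genes in a library with 10 million mapped fragments.
counts = {"GENE_A": 500, "GENE_B": 1500}
lengths = {"GENE_A": 2000, "GENE_B": 1000}
print(fpkm(counts, lengths, total_mapped_fragments=10_000_000))
# prints: {'GENE_A': 25.0, 'GENE_B': 150.0}
```

GENE_B has three times the fragments of GENE_A but only half the length, so its FPKM is six times higher.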

ATGC Director of Bioinformatics

Xiaoping Su, Ph.D.

Data Scientist

Yunxin Chen

Associate Data Scientist

Lijin Joo, Ph.D.

Getting Started

Project Consultation
The ATGC provides budget planning, technology consultations, and project planning to cancer center members. We strongly recommend that first-time NGS service users and investigators with large-scale projects schedule a meeting before starting a project. To schedule a consultation, please contact Erika Thompson at ejthomps@mdanderson.org.

Sample Submission
All samples must be accompanied by a completed sample submission form. To obtain a form, please contact Erika Thompson. Sample submission requirements, such as the minimum quantity and recommended sequence length, vary by service and sample type. Submit completed forms to NGSsubmissions@mdanderson.org.