Illumina Next-Generation Sequencing
Next Generation Sequencing and Data Analysis
Next Generation Sequencing
The ATGC provides a complete next generation sequencing service. Investigators provide the facility with genomic DNA, total RNA or ChIP DNA (depending on the requested application) and the facility provides complete sample processing. The output of the NGS service is FASTQ files. Data analysis is available for full service (library preparation and sequencing) submissions.
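Since the service delivers FASTQ files, it can help to see what one record looks like when read programmatically. The following is a minimal sketch, assuming the common four-line-per-record layout and Phred+33 quality encoding; the example reads and the function names `read_fastq` and `mean_phred` are illustrative only and are not part of the facility's tooling.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(char) - offset for char in quality)


# Two made-up reads, used only to exercise the parser.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))

for read_id, sequence, quality in records:
    print(read_id, sequence, mean_phred(quality))
```

For real data, an established parser (for example one from a maintained bioinformatics library) is a safer choice than a hand-written reader, since delivered files are usually gzip-compressed and very large.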
NGS services include:
1. Project consultation and budget planning with a facility representative and an MD Anderson faculty bioinformatician.
2. Library Preparation: An NGS library is made up of random fragments that represent the entire sample. It is created by shearing DNA into 150-400 base fragments. These fragments are ligated to specific adapters. Library fragments of the appropriate size are then selected (size is application dependent) and isolated. Following a sample cleanup step, the resultant library is quantified by qPCR and checked for size distribution using the Agilent TapeStation. The ATGC has automated library preparation for most applications using the Eppendorf epMotion 5075 and Agilent Bravo liquid handlers.
3. Cluster Generation: Library fragments are bound to a flow cell by hybridizing the fragments to a lawn of oligonucleotides complementary to the adapter sequences. Bound fragments are clonally amplified by bridge amplification to create millions of individual dense clusters of clones. Cluster generation occurs on-instrument, in a closed environment, on the NovaSeq 6000, NextSeq 500 and iSeq 100 instruments.
4. Illumina Sequencing: Sequencing on the flow cell employs Illumina’s well-established sequencing-by-synthesis chemistry. This chemistry uses four reversible terminator nucleotides, each with a chemically blocked 3’ hydroxyl group, which are detected with two-channel (NovaSeq 6000, NextSeq 500) or four-channel (MiSeq) imaging. To begin sequencing, primers are hybridized to the single-stranded, covalently bound templates on the flow cell. Fluorescently labeled nucleotides are then flowed across the flow cell. During chain extension the nucleotides compete for incorporation into the growing DNA chain. A single complementary nucleotide is incorporated into each strand, terminating the chain and extending millions of DNA clusters by one base simultaneously. The incorporated nucleotides are excited by a laser and emit their characteristic fluorescence (or, in two-channel chemistry, no fluorescence for one of the bases). This fluorescence is detected and recorded in an imaging step. Following base detection, the fluorescent dye is cleaved and the 3’ hydroxyl block is chemically removed, allowing chain extension to continue. This cycle is repeated 50 to 500 times, generating a series of images.
5. Data Analysis: The raw images are processed and base-called before sequence analysis begins. The resulting sequences are de-multiplexed and transferred to an institutional server, where MD Anderson bioinformaticians access the data. Data analysis is performed in collaboration with faculty from Bioinformatics.
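The de-multiplexing step mentioned above assigns each read to a sample according to its index (barcode) sequence. The sketch below illustrates the idea only; it is not the facility's pipeline, which would normally rely on the instrument vendor's conversion software. The sample names, index sequences, and the one-mismatch tolerance are all hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the single sample whose index is within max_mismatches, else None."""
    hits = [
        name
        for name, index in sample_indexes.items()
        if len(index) == len(index_read)
        and hamming(index, index_read) <= max_mismatches
    ]
    # Reads matching no sample, or more than one, stay unassigned.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Group (index_read, sequence) pairs into per-sample bins."""
    bins = {name: [] for name in sample_indexes}
    bins["undetermined"] = []
    for index_read, sequence in reads:
        sample = assign_sample(index_read, sample_indexes, max_mismatches)
        bins[sample or "undetermined"].append(sequence)
    return bins


# Hypothetical sample sheet and reads, for illustration only.
SAMPLE_INDEXES = {"tumor": "ACGTAC", "normal": "TGCATG"}
READS = [
    ("ACGTAC", "AAAA"),  # exact match to the tumor index
    ("ACGTAA", "CCCC"),  # one mismatch from the tumor index
    ("TGCATG", "GGGG"),  # exact match to the normal index
    ("GGGGGG", "TTTT"),  # matches neither index
]
bins = demultiplex(READS, SAMPLE_INDEXES)
print(bins)
```

Allowing a single mismatch recovers reads with one index sequencing error, provided the indexes in a pool differ from each other at enough positions that no read can match two samples.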
Data Analysis
Data analysis is available to investigators with full-service submissions (library preparation and sequencing) on a fee-for-service basis.
| Application | Service Description | Price | Estimated Turnaround Time (Analysis only) |
|---|---|---|---|
| Bulk mRNA and Total RNA-Seq | a. BAM files b. FPKM value or normalized count data c. Differential expression analysis for comparison of two groups d. Gene set enrichment analysis (GSEA) for comparison of two groups | $90/sample | 2 weeks: <20 samples; 3–4 weeks: 20–50 samples |
| Whole Exome | a. BAM files b. Variation detection including somatic mutation detection by GATK c. Copy number alteration detection including GISTIC analysis | $120 per T/N pair | 2 weeks: <20 samples; 3–4 weeks: 20–50 samples |
| Bulk ChIP-Seq | a. MACS-based peak calling b. Differential peak detection by diffReps c. Motif discovery on differential peaks by Homer d. Super-enhancer/enhancer detection for H3K27ac ChIP-Seq by ROSE | $80/sample | 2 weeks: <20 samples; 3–4 weeks: 20–50 samples |
| scRNA-Seq | a. Pre-processing including pre-filtering, normalization and batch bias correction b. *Auto-annotation of cell types and subtypes by clustering with marker gene detection for each cluster c. Cellular composition analysis of cell type between case and control d. Differential expression analysis for cell subpopulations of interest, GSVA analysis, and GSEA analysis e. Cell trajectory analysis and pseudotime analysis f. SCENIC analysis for the identification of transcription factors (TFs) | $400/sample | 4–6 weeks |
| Whole Genome Analysis | a. BAM files b. Somatic mutation detection by GATK/Mutect2 c. Copy number alteration detection including GISTIC analysis d. Structural variant detection by GATK-SV and AnnotSV | $180 per T/N pair | 4 weeks: <20 samples; 6–8 weeks: 20–50 samples |
*sc data analysis is an interactive process that requires active participation by the requesting lab. Manual cell identification is performed by the requesting lab using supporting data provided by the ATGC.
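The RNA-Seq deliverables in the table include FPKM values. As a reminder of what that number represents, here is a minimal sketch of the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). It is not a description of the facility's pipeline; the gene names, counts, and lengths are made up, and the total is taken as the sum of the counts supplied.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: count * 1e9 / (gene length in bp * total mapped fragments)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


# Hypothetical fragment counts and gene lengths, for illustration only.
COUNTS = {"geneA": 500, "geneB": 1500}
LENGTHS_BP = {"geneA": 1000, "geneB": 3000}

values = fpkm(COUNTS, LENGTHS_BP)
print(values)
```

In this toy example geneB has three times the fragments of geneA but is also three times as long, so both genes receive the same FPKM. This length correction is what separates FPKM from raw counts.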
ATGC Director of Bioinformatics
Xiaoping Su, Ph.D.
Data Scientist
Yunxin Chen
Associate Data Scientist
Lijin Joo, Ph.D.
Getting started
Project Consultation
The ATGC provides budget planning, technology consultations and project planning to cancer center members. We strongly recommend that first-time NGS service users and investigators with large-scale projects schedule a meeting before initiating a project. To schedule a consultation meeting, please contact Erika Thompson at ejthomps@mdanderson.org.
Sample Submission
All samples should be accompanied by a completed sample submission form. Sample submission requirements (minimum quantity and recommended sequence length) vary based on the service and sample type.