Illumina Next Generation Sequencing
Overview
The Sequencing and Microarray Facility (SMF) offers massively parallel next generation sequencing services on two Illumina HiSeq2000 sequencers. The HiSeq2000 is Illumina’s newest and most advanced sequencing platform. It operates on Illumina’s well-established reversible terminator-based sequencing-by-synthesis chemistry and generates more than 500 gigabases (typically 550-600 Gb) of sequence per instrument run (100 nt paired end).
The SMF provides comprehensive next generation sequencing services. Investigators provide the facility with genomic DNA, total RNA or ChIP DNA (depending on the requested application) and the facility provides complete sample processing.
The Technology and Workflow
The HiSeq2000 workflow can be divided into four parts: library preparation, cluster generation, sequencing by synthesis and data analysis.
Library Preparation
An NGS library is made up of random fragments that represent the entire sample. It is created by shearing DNA (Covaris S220) into 150-400 bp fragments. These fragments are ligated to specific adapters. Library fragments of the appropriate size are then selected (size is application dependent) and isolated. Following a sample cleanup step, the resultant library is quantified by qPCR and checked for quality using the Agilent Bioanalyzer. The SMF has automated library preparation for most applications using the Beckman SPRIworks system.
Cluster Generation
Library fragments are bound to a flow cell by hybridizing the fragments to a lawn of oligonucleotides complementary to the adapter sequences. Bound fragments are clonally amplified by bridge amplification to create millions of individual, dense clonal clusters. Cluster generation occurs in a closed environment on the Illumina cBot instrument.
HiSeq2000 Sequencing
Sequencing on the flow cell employs Illumina’s well-established sequencing-by-synthesis chemistry. This chemistry utilizes four reversible terminator nucleotides, each possessing a different fluorescent dye and a chemically blocked 3’ hydroxyl group. To begin sequencing, primers are hybridized to single-stranded, covalently bound templates on the flow cell. Fluorescently labeled nucleotides are then flowed across the flow cell. During chain extension the fluorescent nucleotides compete for incorporation into the growing DNA chain. A single complementary nucleotide is incorporated into each DNA molecule, terminating the chain and resulting in the simultaneous one-base extension of millions of DNA clusters. The incorporated nucleotides are excited by a laser and emit their characteristic fluorescence. This fluorescence is detected and recorded in an imaging step. Following base detection the fluorescent dye is cleaved and the 3’ hydroxyl block is chemically reversed, allowing chain extension to continue. This cycle is repeated 36 to 100 times, generating a series of images.
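The cycle-by-cycle readout described above can be illustrated with a toy base caller. This is a simplified sketch, not the instrument's software: it assumes each cycle yields one intensity per dye channel for a cluster and simply calls the brightest channel, ignoring the corrections a production base caller applies. The intensity values are hypothetical.

```python
# Toy model of per-cycle base calling for a single cluster.
# Assumption: each cycle gives four intensities in the order A, C, G, T.
CHANNELS = "ACGT"


def call_bases(intensities):
    """Return the read called from a list of per-cycle (A, C, G, T) intensities."""
    read = []
    for cycle in intensities:
        # The incorporated nucleotide is taken to be the brightest channel.
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for four sequencing cycles of one cluster.
example = [
    (0.90, 0.05, 0.03, 0.02),
    (0.10, 0.08, 0.75, 0.07),
    (0.02, 0.04, 0.06, 0.88),
    (0.05, 0.85, 0.05, 0.05),
]
print(call_bases(example))  # AGTC
```

A 36-cycle run would pass 36 such tuples per cluster, one per imaging step.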
Data Analysis
Raw images are processed and bases are called before sequence analysis begins. The resulting sequences are de-multiplexed, aligned to a reference genome and transferred to an institutional server, where they are accessed by MDACC bioinformaticians. Data analysis is performed in collaboration with faculty from the Department of Bioinformatics.
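As a rough illustration of the de-multiplexing step, the sketch below groups reads by their index (barcode) sequence. It is a minimal example under stated assumptions, not the facility's pipeline: the barcode-to-sample table, the sample names and the reads are all hypothetical, and only exact barcode matches are accepted.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group (index_sequence, read_sequence) pairs by sample.

    Reads whose index does not match any known barcode exactly are
    collected under the key "undetermined".
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = barcode_to_sample.get(index_seq, "undetermined")
        bins[sample].append(read_seq)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
reads = [
    ("ATCACG", "GGCTA"),
    ("CGATGT", "TTAGC"),
    ("ATCACG", "CCGAT"),
    ("NNNNNN", "ACGTA"),
]
result = demultiplex(reads, barcodes)
for sample, sequences in sorted(result.items()):
    print(sample, sequences)
```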

Paired End Runs
The paired end module is used to perform sequencing from both ends of the adapter-ligated fragments. Using the paired end module, fragments are first sequenced from one end, then essentially flipped and sequenced from the other direction. This doubles the amount of sequence data obtained from each cluster and may provide positional information.
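The doubling of output can be checked with simple arithmetic. In the sketch below, the figure of 150 million clusters per lane is an assumption, chosen because it reproduces the lower bound of the facility's per-lane estimate for 100 nt single reads (15 Gb); actual cluster counts vary from run to run.

```python
def lane_output_gb(clusters, read_length_nt, paired_end=False):
    """Estimate lane output in gigabases from cluster count and read length."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * read_length_nt * reads_per_cluster / 1e9


# Assumption: about 150 million clusters pass filter per lane.
CLUSTERS_PER_LANE = 150_000_000

single = lane_output_gb(CLUSTERS_PER_LANE, 100)
paired = lane_output_gb(CLUSTERS_PER_LANE, 100, paired_end=True)
print(f"100 nt single read: {single:.1f} Gb")  # 15.0 Gb
print(f"100 nt paired end:  {paired:.1f} Gb")  # 30.0 Gb
```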
Services Provided
The Illumina HiSeq2000 sequencer is a very flexible platform, enabling a wide variety of applications that differ only in sample preparation and downstream data analysis.
Sample Preparation Services:
- Library preparation using the SPRIworks system: includes sample QC and quantification, sample fragmentation (Covaris ultrasonicator), library QC, qPCR library quantification and cluster generation.
- Sample indexing: To reduce cost, we use barcodes that identify individual samples, which are then mixed together for sequencing in a single lane. We can multiplex:
  - 4+ exomes/lane (70x-80x coverage)
  - 6+ ChIP-seq/lane
  - 2-8 RNAseq/lane
- Exome/custom target enrichment using the NimbleGen EZ-exome, Agilent Custom Target Enrichment and the Agilent Haloplex system.
- cDNA synthesis from total RNA.
- Reduced representation bisulfite library preparation: Investigators provide Msp1-digested genomic DNA. The SMF will ligate methylation adapters, perform size selection, perform bisulfite treatment and evaluate conversion.
Supported Applications Include
Whole genome sequencing of Human, Mouse, Rat, Yeast, Monkey, Viral, Bacterial and other genomes. For applications in cancer research, the SMF provides sequencing of matched tumor and normal samples.
Transcriptome Analysis
Transcriptome analysis may be quantitative (gene expression analysis) and/or qualitative (transcript discovery, splice variant identification, coding SNP validation). The SMF offers several options for transcriptome analysis. The choice of sample preparation method is based on the investigator's experimental objective and should be decided in conjunction with a bioinformatician.
mRNAseq
Uses oligo dT based capture for Poly (A) enrichment followed by cDNA synthesis using random and oligo dT priming. Sequences generated map to coding regions of the genome.
RNAseq
Here rRNA depletion is performed (no Poly (A) enrichment) followed by cDNA synthesis utilizing oligo-d(T) and random hexamers. This method allows the sequencing of mRNA and non-polyadenylated RNA including histone mRNAs, precursors for Cajal body related small RNAs, and lncRNAs. Sequences map to exons and intergenic regions.
Strand-specific RNAseq
Preserves strand information. In addition to the information provided by traditional RNAseq, strand-specific RNA-seq identifies antisense transcripts, determines the transcribed strand of non-coding RNAs and may help to demarcate the boundaries of overlapping genes.
MicroRNA-seq
Used to profile and identify changes in microRNA expression and to identify novel microRNAs.
ChIP-seq
Used to identify transcription factor (protein) binding sites in genomes and specific cell types. The investigator performs chromatin IP and provides antibody-captured DNA to the SMF. Both the ChIP sample and a mock or IgG control are required.
Exome Resequencing
The human genome comprises approximately 3 billion base pairs, of which only 1.2%-1.6% is coding. Exome resequencing selectively enriches for and sequences the coding regions. The SMF provides exome capture using solution-based capture methods.
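The coding fraction quoted above translates into an absolute size with one multiplication. The short sketch below works through the arithmetic using only the figures in the paragraph; the result is the approximate size of the coding sequence, not the footprint of any particular capture kit.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size from the text


def coding_size_mb(fraction):
    """Size of the coding portion of the genome in megabases."""
    return GENOME_BP * fraction / 1e6


low = coding_size_mb(0.012)
high = coding_size_mb(0.016)
print(f"Coding sequence: about {low:.0f}-{high:.0f} Mb")  # about 36-48 Mb
```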
Targeted Resequencing
Selectively enriches for and sequences investigator defined regions of interest. The SMF provides targeted capture using solution-based methods, long range PCR and the Agilent Haloplex system.
Reduced Representation Bisulfite Sequencing and Whole Genome Methylation Sequencing
Provide genome wide views of DNA methylation. Both methods detect variations in methylation signatures with single-base resolution. Whole genome bisulfite sequencing provides comprehensive methylation analysis by sequencing the entire bisulfite treated genome. Reduced representation bisulfite sequencing enriches for CpG islands using Msp1 digestion of genomic DNA followed by size selection. This provides a genome wide but not comprehensive view of methylation.
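The enrichment step in reduced representation bisulfite sequencing can be sketched as an in-silico digest. Msp1 recognizes CCGG and cuts between the first C and the CGG, so every internal fragment begins and ends within a CpG-containing site. The example below is a toy illustration: the input sequence and the size window are hypothetical, and real size selection works on fragments of tens to hundreds of base pairs.

```python
def msp1_digest(sequence):
    """Cut a DNA string at every CCGG site, between the first C and CGG."""
    fragments = []
    start = 0
    pos = sequence.find("CCGG")
    while pos != -1:
        cut = pos + 1  # the cut falls after the first C of the site
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find("CCGG", pos + 1)
    fragments.append(sequence[start:])
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls within the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy sequence with two Msp1 sites.
toy = "AACCGGTTTTCCGGA"
pieces = msp1_digest(toy)
print(pieces)                     # ['AAC', 'CGGTTTTC', 'CGGA']
print(size_select(pieces, 4, 8))  # ['CGGTTTTC', 'CGGA']
```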
Getting Started
Project Consultation
The SMF provides budget planning, technology consultations and project planning with an NGS specialist and an MDACC faculty bioinformatician. We strongly recommend that first-time NGS service users and investigators with large-scale projects schedule a meeting before initiating a project. To schedule a consultation meeting, please contact Erika Thompson (ejthomps@mdanderson.org).
Sample Submission
All samples should be accompanied by a completed sample submission form.
HiSeq2000 Submission Form (doc)
Sample Requirements
| Service | Sample Type | Minimum Quantity | Recommended Sequence Length |
|---|---|---|---|
| ChIP-Seq | ChIP DNA | 20ng | 36 nt single read |
| RNA-Seq | Total RNA | 2ug | 76 nt paired end, 100 nt paired end |
| RNA-Seq-strand specific | Total RNA | 4ug | 76 nt paired end |
| mRNA-seq | Total RNA | 1ug | 36, 50 or 76 nt single read or paired end |
| microRNA-Seq | Total RNA | 3ug | 36 nt single read |
| Exome Sequencing | genomic DNA | 2ug | 76 nt paired end |
| Targeted Sequencing | genomic DNA | 2ug | 76 nt paired end |
| Human Whole Genome | genomic DNA | 5ug | 100 nt paired end |
| Human Whole Genome-Bisulfite | genomic DNA | 10ug | 100 nt paired end |
| Reduced Representation-Bisulfite | genomic DNA | 5ug | 100 nt paired end |
Sample Preparation Services
| Service | Cancer Center Member (MDACC) | Non-Cancer Center Member (non-MDACC) |
|---|---|---|
| Library Preparation (required) | $200/sample | $410/sample |
| Sample indexing | $30/sample | $61.50/sample |
| microRNA-Seq | $380/sample* | $779/sample |
| mRNA-Seq | $300/sample* | $615/sample |
| NuGen cDNA synthesis from total RNA (required for RNA-seq) | $300/sample | $615/sample |
| NimbleGen Target Capture (up to 50 Mb) | Request a quote | Request a quote |
| NimbleGen Exome Capture | $700/sample | $1435/sample |
| NimbleGen Exome Capture, multiplex 4/capture | $225/sample | $461/sample |
*Price includes library preparation.
HiSeq2000 Sequencing Cost
Note: Library preparation is not included in the sequencing price.
| HiSeq2000 sequencing | Estimated Output/lane (gigabases), V3 chemistry* | MDACC Investigators | Non-MDACC Investigators |
|---|---|---|---|
| NGS-36 nt Single Read | 5.4-5.6 Gb | $950/lane | $1948/lane |
| NGS-36 nt Paired End | 1.8-1.8 Gb | $1500/lane | $3075/lane |
| NGS-50 nt Single Read | 7.5-7.75 Gb | $1020/lane | $2091/lane |
| NGS-50 nt Paired End | 15-17.5 Gb | $1680/lane | $3444/lane |
| NGS-76 nt Single Read | 11.4-13.3 Gb | $1270/lane | $2604/lane |
| NGS-76 nt Paired End | 22-26.6 Gb | $1950/lane | $3998/lane |
| NGS-100 nt Single Read | 15-15.5 Gb | $1500/lane | $3075/lane |
| NGS-100 nt Paired End | 30-35 Gb | $2200/lane | $4510/lane |
| NGS custom/partial service | | Request a quote | Not available |
*Estimate only; sequence output is not guaranteed.
Multiplex Samples Using Barcodes
To reduce cost we are using barcodes that identify individual samples, which are then mixed together for sequencing in a single lane.
Multiplex:
- 4 exomes/lane (70x-80x coverage)
- 4-6+ ChIP-seq/lane
- 4-8 RNA-seq/lane
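To see how multiplexing trades samples per lane against depth, mean coverage can be estimated from lane output. The sketch below is illustrative only: the 22 Gb lane output is the lower estimate for a 76 nt paired end lane, while the 50 Mb capture target and the 70% usable fraction (reads that are on target and not duplicates) are hypothetical values chosen for the example, not facility specifications.

```python
def mean_coverage(lane_output_gb, samples_per_lane, target_mb, usable_fraction):
    """Estimate mean per-sample coverage over a capture target."""
    bases_per_sample = lane_output_gb * 1e9 / samples_per_lane
    return bases_per_sample * usable_fraction / (target_mb * 1e6)


# Assumptions: 50 Mb target and 70% usable bases are hypothetical.
cov = mean_coverage(lane_output_gb=22, samples_per_lane=4,
                    target_mb=50, usable_fraction=0.7)
print(f"Estimated mean coverage: {cov:.0f}x")  # 77x
```

Doubling the number of samples per lane halves the estimate, which is why the achievable multiplex level depends on the application.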
Illumina Next Generation Sequencing Prices (In Practical Terms)
Exome Resequencing
Multiplex: 4 exomes/lane
Multiplex capture (70x-80x coverage)
Sequence Type: 76nt PE
Price: $950/exome
ChIP-seq
Multiplex: 6 ChIP-seq samples/lane
Sequence Type: 36nt SR
Price: $388/sample
RNAseq
Multiplex: 8 RNAseq samples/lane
Sample Preparation: NuGen cDNA
Sequence Type: 76nt PE
Price: $775/sample
mRNAseq
Multiplex: 8 RNA-seq samples/lane
Sequence Type: 76nt PE, Tru-seq
Price: $550/sample
OR
Sequence Type: 36nt SR, Tru-seq
Price: $425/sample
ssRNAseq
Multiplex: 8 RNAseq samples/lane
Sequence Type: 76nt PE
Price: $670/sample
OR
Sequence Type: 36nt SR
Price: $545/sample
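The per-sample prices above can be approximated as a share of the lane price plus per-sample preparation fees. The sketch below reproduces the ChIP-seq figure from the MDACC rates ($950 per 36 nt single read lane, $200 library preparation, $30 indexing). Which fees are bundled into each listed price is an assumption; the other listed prices may reflect rounding or a different combination of fees.

```python
def per_sample_price(lane_price, samples_per_lane, prep_fees):
    """Share of the lane price plus the per-sample preparation fees."""
    return lane_price / samples_per_lane + sum(prep_fees)


# ChIP-seq at MDACC rates: 36 nt single read lane shared by 6 samples,
# plus library preparation ($200) and sample indexing ($30).
chip_seq = per_sample_price(950, 6, [200, 30])
print(f"ChIP-seq: ${round(chip_seq)}/sample")  # $388/sample
```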
Data Analysis
Bioinformatics Faculty Collaborators
- ChIP-Seq: Shoudan Liang, PhD.
- All other services: Xiaoping Su, PhD.

