
Illumina Next Generation Sequencing

Overview

The Sequencing and Microarray Facility (SMF) offers massively parallel next generation sequencing services on two Illumina HiSeq2000 sequencers. The HiSeq2000 is Illumina’s newest and most advanced sequencing platform. It uses Illumina’s well-established reversible terminator-based sequencing-by-synthesis chemistry and generates more than 500 gigabases (typically 550-600 Gb) of sequence per instrument run (100-nucleotide paired-end reads).

The SMF provides comprehensive next generation sequencing services. Investigators provide the facility with genomic DNA, total RNA or ChIP DNA (depending on the requested application), and the facility provides complete sample processing.

The Technology and Workflow

The HiSeq2000 workflow can be divided into four parts: library preparation, cluster generation, sequencing by synthesis and data analysis.

Library Preparation
An NGS library is a collection of random fragments that together represent the entire sample. It is created by shearing DNA (Covaris S220) into 150-400 bp fragments, which are then ligated to specific adapters. Library fragments of the appropriate size (application dependent) are selected and isolated. Following a sample cleanup step, the resulting library is quantified by qPCR and checked for quality on the Agilent Bioanalyzer. The SMF has automated library preparation for most applications using the Beckman SPRIworks system.

Cluster Generation 
Library fragments are bound to a flow cell by hybridizing them to a lawn of oligonucleotides complementary to the adapter sequences. Bound fragments are clonally amplified by bridge amplification to create millions of individual, densely packed clonal clusters. Cluster generation takes place in a closed environment on the Illumina cBot instrument.

HiSeq2000 Sequencing
Sequencing on the flow cell uses Illumina’s well-established sequencing-by-synthesis chemistry. This chemistry relies on four reversible terminator nucleotides, each carrying a different fluorescent dye and a chemically blocked 3’ hydroxyl group. To begin sequencing, primers are hybridized to the single-stranded, covalently bound templates on the flow cell. Fluorescently labeled nucleotides are then flowed across the flow cell, where they compete for incorporation into the growing DNA chains. Because of the 3’ block, only a single complementary nucleotide is incorporated into each DNA molecule per cycle, resulting in the simultaneous one-base extension of millions of DNA clusters. The incorporated nucleotides are excited by a laser and emit their characteristic fluorescence, which is detected and recorded in an imaging step. Following base detection, the fluorescent dye is cleaved and the 3’ hydroxyl block is chemically removed, allowing chain extension to continue. This cycle is repeated 36 to 100 times (one cycle per base of read length), generating a series of images.
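To make the cycle structure concrete, here is a toy Python model of a single cluster. It assumes idealized chemistry (no phasing, no sequencing errors, every strand in the cluster extending in lockstep) and only illustrates why the number of cycles sets the read length; `sbs_read` is a hypothetical helper, not part of any Illumina software.

```python
# Toy model of sequencing by synthesis (SBS) on one cluster.
# Assumption: ideal chemistry; each cycle adds exactly one reversibly
# terminated nucleotide to every strand in the cluster.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_read(template, cycles):
    """Return the bases imaged over `cycles` SBS cycles.

    `template` is written 3'->5' starting just downstream of the
    hybridized primer, so the new strand grows one complementary
    base per cycle in the same order.
    """
    called = []
    for cycle in range(min(cycles, len(template))):
        # Incorporation: the complementary terminator nucleotide is added;
        # its blocked 3' hydroxyl prevents a second addition this cycle.
        base = COMPLEMENT[template[cycle]]
        # Imaging: the dye colour identifies the base (modelled as the letter).
        called.append(base)
        # Dye cleavage and 3' unblocking have no counterpart in this model;
        # they simply allow the loop to continue to the next cycle.
    return "".join(called)


print(sbs_read("TACGGATC", 6))
```

In this model a 36-cycle run yields a 36-base read and a 100-cycle run a 100-base read, which is the relationship the paragraph above describes.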

Data Analysis 
The raw image data are processed and bases are called before sequence analysis begins. The resulting sequences are de-multiplexed, aligned to a reference genome and transferred to an institutional server, where they are accessed by MDACC bioinformaticians. Data analysis is performed in collaboration with faculty from the Department of Bioinformatics.
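De-multiplexing assigns each read to a sample according to the barcode (index) sequenced alongside it. In practice this is done by Illumina’s own software (for example bcl2fastq); the Python sketch below only illustrates the idea. The function name, the one-mismatch tolerance and the "undetermined" bin are illustrative assumptions, not the facility’s settings.

```python
from collections import defaultdict


def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Group reads by sample using their index (barcode) sequence.

    reads: iterable of (index_sequence, read_sequence) tuples.
    sample_barcodes: dict mapping sample name -> barcode sequence.
    A read is assigned only if exactly one barcode matches within
    `max_mismatches`; otherwise it goes to "undetermined".
    """
    bins = defaultdict(list)
    for index_seq, seq in reads:
        hits = [
            name
            for name, barcode in sample_barcodes.items()
            if len(barcode) == len(index_seq)
            and sum(a != b for a, b in zip(barcode, index_seq)) <= max_mismatches
        ]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return dict(bins)


barcodes = {"S1": "ACGTAC", "S2": "TGCATG"}  # hypothetical barcodes
reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("TGCATG", "GGGG"), ("NNNNNN", "TTTT")]
print(demultiplex(reads, barcodes))
```

Barcodes are normally chosen to differ at several positions, so that allowing a single mismatch rescues reads with one sequencing error without assigning them to the wrong sample.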

Figure: Illumina GAII process diagram

Paired End Runs

The paired end module is used to sequence both ends of the adapter-ligated fragments. With the paired end module, fragments are first sequenced from one end, then effectively flipped and sequenced from the other end. This doubles the amount of sequence data obtained from each cluster, and because both reads come from the same fragment, their approximate spacing can provide positional information (for example, to aid alignment).
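The relationship between the two reads of a pair can be sketched in Python for an idealized fragment (no adapters, no errors). `paired_reads` and `reverse_complement` are illustrative helpers: read 1 comes from the start of the top strand, and read 2, sequenced from the other end, is the reverse complement of the fragment’s last bases.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment, read_length):
    """Return (read1, read2) for an idealized paired-end run."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


r1, r2 = paired_reads("AACCGGTTAC", 3)
print(r1, r2)
```

Reverse-complementing read 2 recovers the end of the fragment, so an aligner can place the two reads on opposite strands roughly one insert length apart, which is the positional information mentioned above.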

Services Provided

The Illumina HiSeq2000 sequencer is a highly flexible platform that supports a wide variety of applications, which differ only in sample preparation and downstream data analysis.

Sample Preparation Services:

  1. Library preparation using the SPRIworks system: includes sample QC and quantification, sample fragmentation (Covaris ultrasonicator), library QC, qPCR library quantification and cluster generation.
  2. Sample indexing: to reduce cost, we use barcodes that identify individual samples, which are then pooled for sequencing in a single lane. We can multiplex:
       • 4+ exomes/lane (70x-80x coverage)
       • 6+ ChIP-seq/lane
       • 2-8 RNAseq/lane
  3. Exome/custom target enrichment using NimbleGen EZ-exome, Agilent Custom Target Enrichment and the Agilent HaloPlex system.
  4. cDNA synthesis from total RNA.
  5. Reduced representation bisulfite library preparation: investigators provide MspI-digested genomic DNA. The SMF ligates methylation adapters, performs size selection, performs bisulfite treatment and evaluates conversion.
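The digestion and size-selection steps of reduced representation library preparation can be sketched in Python. MspI recognizes CCGG and cuts between the two cytosines (C^CGG), so every internal fragment begins with a CpG. The function names and the size window used in the example are illustrative assumptions; the actual window is set by the facility’s protocol.

```python
import re


def mspi_digest(seq):
    """Split a top-strand sequence at every MspI site (C^CGG)."""
    # The lookahead also finds overlapping sites; the cut is after the first C.
    cut_positions = [m.start() + 1 for m in re.finditer("(?=CCGG)", seq)]
    bounds = [0] + cut_positions + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep only fragments inside the chosen size window (inclusive)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


fragments = mspi_digest("AACCGGTTCCGGAA")
print(fragments, size_select(fragments, 4, 6))
```

Because MspI sites contain a CpG, short fragments are most frequent where CpGs are dense, so selecting a short size window concentrates sequencing on CpG-rich regions.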

Supported Applications Include

Whole-genome sequencing of human, mouse, rat, yeast, monkey, viral, bacterial and other genomes. For cancer research applications, the SMF provides sequencing of matched tumor and normal samples.

Transcriptome Analysis
Transcriptome analysis may be quantitative (gene expression analysis) and/or qualitative (transcript discovery, splice variant identification, coding SNP validation). The SMF offers several options for transcriptome analysis. The choice of sample preparation method depends on the investigator’s experimental objective and should be decided in consultation with a bioinformatician.

mRNAseq
Uses oligo(dT)-based capture for poly(A) enrichment, followed by cDNA synthesis using random and oligo(dT) priming. The resulting sequences map mainly to the coding regions (exons) of the genome.

RNAseq
Here, rRNA depletion is performed instead of poly(A) enrichment, followed by cDNA synthesis using oligo(dT) and random hexamer primers. This method allows sequencing of mRNA and non-polyadenylated RNA, including histone mRNAs, precursors of Cajal body-related small RNAs and lncRNAs. Sequences map to exons and intergenic regions.

Strand-specific RNAseq 
Preserves strand information. In addition to the information provided by standard RNAseq, strand-specific RNAseq identifies antisense transcripts, determines the transcribed strand of non-coding RNAs and can help demarcate the boundaries of overlapping genes.

MicroRNA-seq
Used to profile microRNA expression, identify changes in microRNA expression and discover novel microRNAs.

ChIP-seq
Used to identify transcription factor (protein) binding sites in genomes and specific cell types. The investigator performs chromatin immunoprecipitation and provides the antibody-captured DNA to the SMF. Both the ChIP sample and a mock or IgG control sample are required.
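The control sample is needed because reads also pile up outside true binding sites (for example in open chromatin or amplified regions). A minimal Python sketch of the comparison, assuming read counts in one genomic window have already been obtained for both samples: the ChIP and control counts are each normalized by total sequencing depth before taking the ratio. `fold_enrichment` and its pseudocount are illustrative; peak callers such as MACS use more elaborate statistics (for example a local Poisson background model).

```python
def fold_enrichment(chip_count, control_count, chip_total, control_total, pseudocount=1.0):
    """Depth-normalized ChIP/control read ratio for one genomic window.

    chip_count, control_count: reads falling in the window.
    chip_total, control_total: total reads sequenced for each sample.
    The pseudocount avoids division by zero in empty control windows.
    """
    chip_rate = (chip_count + pseudocount) / chip_total
    control_rate = (control_count + pseudocount) / control_total
    return chip_rate / control_rate


# Hypothetical window: control sequenced twice as deeply as the ChIP sample.
print(fold_enrichment(99, 19, 1_000_000, 2_000_000))
```

Normalizing by depth matters when the ChIP and control libraries are sequenced to different depths, as can happen when they share multiplexed lanes.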

Exome Resequencing
The human genome comprises approximately 3 billion base pairs, of which only 1.2%-1.6% is coding. Exome resequencing selectively enriches for and sequences these coding regions. The SMF provides exome capture using solution-based capture methods.
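In absolute terms, the coding fraction quoted above corresponds to roughly 36 to 48 Mb of sequence:

```latex
\begin{aligned}
0.012 \times 3\times10^{9}\ \text{bp} &= 3.6\times10^{7}\ \text{bp} = 36\ \text{Mb},\\
0.016 \times 3\times10^{9}\ \text{bp} &= 4.8\times10^{7}\ \text{bp} = 48\ \text{Mb}.
\end{aligned}
```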

Targeted Resequencing
Selectively enriches for and sequences investigator-defined regions of interest. The SMF provides targeted capture using solution-based methods, long-range PCR and the Agilent HaloPlex system.

Reduced Representation Bisulfite Sequencing and Whole Genome Methylation Sequencing
Both methods provide genome-wide views of DNA methylation and detect variation in methylation signatures at single-base resolution. Whole-genome bisulfite sequencing provides comprehensive methylation analysis by sequencing the entire bisulfite-treated genome. Reduced representation bisulfite sequencing enriches for CpG islands by MspI digestion of genomic DNA followed by size selection. This gives a genome-wide, but not comprehensive, view of methylation.
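Single-base resolution comes from the chemistry shared by both methods: bisulfite treatment (followed by PCR) turns unmethylated cytosines into thymines, while methylated cytosines still read as C. The Python sketch below is simplified: it models the top strand only, assumes reads are already aligned base-for-base to the reference, and reports the methylated fraction at each CpG. The function names are hypothetical; dedicated bisulfite aligners such as Bismark handle alignment and methylation calling in practice.

```python
def bisulfite_convert(seq, methylated_positions):
    """Model bisulfite + PCR on the top strand: unmethylated C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_cpg_methylation(reference, reads):
    """Fraction of reads retaining C at each CpG cytosine of `reference`.

    Assumes every read is aligned to the reference with the same start
    and length (no indels).
    """
    levels = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            c = sum(r[i] == "C" for r in reads)  # protected: methylated
            t = sum(r[i] == "T" for r in reads)  # converted: unmethylated
            if c + t:
                levels[i] = c / (c + t)
    return levels


ref = "ACGTTCGA"  # CpGs at positions 1 and 5
reads = [bisulfite_convert(ref, {1, 5}), bisulfite_convert(ref, {1})]
print(call_cpg_methylation(ref, reads))
```

Cytosines outside CpG context are rarely methylated in mammalian DNA, so the fraction of those that still read as C is commonly used to estimate how completely the bisulfite conversion worked.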

Getting Started

Project Consultation
The SMF provides budget planning, technology consultations and project planning with an NGS specialist and an MDACC faculty bioinformatician. We strongly recommend that first-time NGS service users and investigators with large-scale projects schedule a meeting before initiating a project. To schedule a consultation meeting, please contact Erika Thompson (ejthomps@mdanderson.org).

Sample Submission
All samples should be accompanied by a completed sample submission form.

 HiSeq2000 Submission Form (doc)

Sample Requirements

Service | Sample Type | Minimum Quantity | Recommended Sequence Length
ChIP-Seq | ChIP DNA | 20 ng | 36 nt single read
RNA-Seq | Total RNA | 2 µg | 76 nt paired end or 100 nt paired end
RNA-Seq (strand specific) | Total RNA | 4 µg | 76 nt paired end
mRNA-Seq | Total RNA | 1 µg | 36, 50 or 76 nt, single read or paired end
microRNA-Seq | Total RNA | 3 µg | 36 nt single read
Exome Sequencing | Genomic DNA | 2 µg | 76 nt paired end
Targeted Sequencing | Genomic DNA | 2 µg | 76 nt paired end
Human Whole Genome | Genomic DNA | 5 µg | 100 nt paired end
Human Whole Genome Bisulfite | Genomic DNA | 10 µg | 100 nt paired end
Reduced Representation Bisulfite | Genomic DNA | 5 µg | 100 nt paired end


Sample Preparation Services

Service | Cancer Center Member (MDACC) | Non-Cancer Center Member (non-MDACC)
Library Preparation (required) | $200/sample | $410/sample
Sample Indexing | $30/sample | $61.50/sample
microRNA-Seq | $380/sample* | $779/sample
mRNA-Seq | $300/sample* | $615/sample
NuGen cDNA Synthesis from Total RNA (required for RNA-Seq) | $300/sample | $615/sample
NimbleGen Target Capture (up to 50 Mb) | Request a quote | Request a quote
NimbleGen Exome Capture | $700/sample | $1435/sample
NimbleGen Exome Capture, Multiplexed (4 samples/capture) | $225/sample | $461/sample

*Price includes library preparation.  


HiSeq2000 Sequencing Cost

Note: Library preparation is not included in the sequencing price.

HiSeq2000 Sequencing | Estimated Output/Lane (Gb), V3 Chemistry* | MDACC Investigators | Non-MDACC Investigators
NGS 36 nt Single Read | 5.4-5.6 Gb | $950/lane | $1948/lane
NGS 36 nt Paired End | 1.8-1.8 Gb | $1500/lane | $3075/lane
NGS 50 nt Single Read | 7.5-7.75 Gb | $1020/lane | $2091/lane
NGS 50 nt Paired End | 15-17.5 Gb | $1680/lane | $3444/lane
NGS 76 nt Single Read | 11.4-13.3 Gb | $1270/lane | $2604/lane
NGS 76 nt Paired End | 22-26.6 Gb | $1950/lane | $3998/lane
NGS 100 nt Single Read | 15-15.5 Gb | $1500/lane | $3075/lane
NGS 100 nt Paired End | 30-35 Gb | $2200/lane | $4510/lane
NGS Custom/Partial Service | - | Request a quote | Not available

*Estimate only; sequence output is not guaranteed.
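Dividing the lane price by the estimated output gives a rough cost per gigabase, which makes run types easier to compare. The Python sketch below uses MDACC prices and output estimates for the paired-end rows in the table; since output is not guaranteed, the results are only indicative. `LANES` and `cost_per_gb` are illustrative names.

```python
# (read length, layout): (Gb/lane low, Gb/lane high, MDACC price per lane in $)
LANES = {
    ("50 nt", "PE"): (15.0, 17.5, 1680),
    ("76 nt", "PE"): (22.0, 26.6, 1950),
    ("100 nt", "PE"): (30.0, 35.0, 2200),
}


def cost_per_gb(gb_low, gb_high, price):
    """Return (best case, worst case) price per gigabase for one lane."""
    return price / gb_high, price / gb_low


for (length, layout), (low, high, price) in LANES.items():
    best, worst = cost_per_gb(low, high, price)
    print(f"{length} {layout}: ${best:.0f}-${worst:.0f} per Gb")
```

By this measure, longer paired-end runs give the most sequence per dollar, although the appropriate run type is still dictated by the application (see the recommended sequence lengths under Sample Requirements).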

Multiplex Samples Using Barcodes

To reduce cost, we use barcodes that identify individual samples; the barcoded samples are then pooled and sequenced together in a single lane.

Multiplex:

  • 4 exomes/lane (70x-80x coverage)
  • 4-6+ ChIP-seq/lane
  • 4-8 RNA-seq/lane
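How many samples can share a lane depends on the depth each one needs. A rough Python sketch, assuming the lane’s output is split evenly among the pooled samples: raw depth is the bases each sample receives divided by the size of its target. The 50 Mb exome target in the example is an assumption (the upper capture size listed for NimbleGen target capture); actual exome designs vary, and `raw_coverage` is an illustrative name.

```python
def raw_coverage(lane_output_gb, samples_per_lane, target_mb):
    """Average raw depth per sample if lane output is split evenly.

    Ignores off-target reads, PCR duplicates and unmapped reads,
    all of which lower the usable on-target depth.
    """
    bases_per_sample = lane_output_gb * 1e9 / samples_per_lane
    return bases_per_sample / (target_mb * 1e6)


# 76 nt paired-end lane (22-26.6 Gb estimated) shared by 4 exomes,
# assuming a 50 Mb target.
for output_gb in (22.0, 26.6):
    print(f"{output_gb} Gb/lane -> {raw_coverage(output_gb, 4, 50):.0f}x raw")
```

This gives roughly 110x to 133x of raw sequence per exome, above the 70x-80x quoted for four exomes per lane; a gap of this size is consistent with the losses listed in the docstring, though the actual on-target yield depends on the capture.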

Illumina Next Generation Sequencing Prices (In Practical Terms)

    Exome Resequencing
    Multiplex: 4 exomes/lane
    Multiplex capture (70x-80x coverage)
    Sequence Type: 76 nt PE
    Price: $950/exome

    ChIP-seq
    Multiplex: 6 ChIP-seq samples/lane
    Sequence Type: 36nt SR
    Price: $388/sample

    RNAseq
    Multiplex: 8 RNAseq samples/lane
    Sample Preparation: NuGen cDNA
    Sequence Type: 76nt PE
    Price: $775/sample

    mRNAseq
    Multiplex: 8 RNA-seq samples/lane
    Sequence Type: 76nt PE, TruSeq
    Price: $550/sample
            OR
    Sequence Type: 36nt SR, TruSeq
    Price: $425/sample

    ssRNAseq
    Multiplex: 8 RNAseq samples/lane
    Sequence Type: 76nt PE
    Price: $670/sample    
            OR
    Sequence Type: 36nt SR
    Price: $545/sample
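Several of the bundled prices above appear to combine a share of one lane with the per-sample fees from the Sample Preparation Services table. For example, ChIP-seq: $950/6 + $200 (library) + $30 (indexing) ≈ $388. The Python sketch below reproduces that arithmetic; it is our reading of the tables, not a published pricing formula, and some bundles (for example mRNAseq with TruSeq) may combine fees differently. `price_per_sample` is an illustrative name.

```python
def price_per_sample(lane_price, samples_per_lane, *prep_fees):
    """Share of one lane plus per-sample preparation fees (MDACC rates)."""
    return lane_price / samples_per_lane + sum(prep_fees)


# ChIP-seq: 36 nt SR lane, 6 samples, library preparation + indexing.
chip = price_per_sample(950, 6, 200, 30)
# RNAseq: 76 nt PE lane, 8 samples, library + indexing + NuGen cDNA.
rna = price_per_sample(1950, 8, 200, 30, 300)
# Exome: 76 nt PE lane, 4 samples, library + indexing + multiplexed capture.
exome = price_per_sample(1950, 4, 200, 30, 225)
print(f"ChIP-seq ${chip:.2f}, RNAseq ${rna:.2f}, exome ${exome:.2f}")
```

The computed values ($388.33, $773.75 and $942.50) match the listed $388, $775 and $950 after rounding.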

Data Analysis

Bioinformatics Faculty Collaborators

  • ChIP-Seq: Shoudan Liang, PhD.
  • All other services: Xiaoping Su, PhD.

© 2014 The University of Texas MD Anderson Cancer Center