SOLiD™ Next-Generation Sequencing

What is SOLiD™?

SOLiD™ stands for “Sequencing by Oligonucleotide Ligation and Detection.” The SOLiD™ System: Next-Generation Sequencing (NGS) from Applied Biosystems is a highly accurate, massively parallel next-generation sequencing platform that supports a wide range of applications. The flexibility of two independent flow cells and multiplexing capability allow multiple experiments to be conducted in a single run. With unparalleled throughput and greater than 99.94% base-calling accuracy, the SOLiD™ System enables you to complete large-scale sequencing and tag experiments more cost effectively than previously possible.

What are the key features of SOLiD™ for genomic applications?

  1. Scalable: The SOLiD™ System’s open slide format and flexible bead densities enable increases in throughput with modest analysis, protocol and chemistry optimizations.
  2. Accurate: With system accuracy greater than 99.94%, due to 2-base encoding, the SOLiD™ System distinguishes itself by providing data that is significantly more accurate than other next-generation platforms for variation detection. 2-base encoding enables unique error checking capability, providing higher confidence in each call so scientists can focus on the biological significance of their results.
  3. Ultra High Throughput: The SOLiD™ 3 Plus System generates 60+ gigabases and >1 billion tags per run, which is more usable data than any other next-generation system available today. This level of throughput enables large scale resequencing and tag-based experiments to be completed very cost effectively.
  4. Flexible: The independent flow cell configuration of the SOLiD™ Analyzer enables you to run two completely independent experiments in a single run, essentially providing two instruments in one. The combination of multiple slide configurations and sample multiplexing capability enables you to cost-effectively analyze multiple samples for a variety of applications.
  5. Mate Pairs: The SOLiD™ System supports sample preparation for mate-paired libraries with insert sizes ranging from 600 bp up to 10 kbp. This broad range of insert sizes combined with ultra high throughput and flexible 2-flow cell configuration enables more precise characterization of structural variation across the genome.

What is the advantage of the SOLiD™ System compared to microarray for genomic applications?

Microarray data are fluorescence signal intensities (analogue data) measured from oligo probes of known sequence. Because of technical limitations, microarray data have issues with specificity, discrimination and sensitivity. In contrast, SOLiD™ NGS generates genome-wide sequence data as “digital data” at single-nucleotide resolution and does not have these specificity, discrimination and sensitivity issues. At the DNA level, the sequence data can readily be used to identify single-nucleotide deletions, insertions and polymorphisms, as well as translocations and copy number changes. In transcriptome-wide analysis, sequence data generated from RNA can readily be used to identify differences in transcript copy number between samples, splicing variants and single-nucleotide changes caused by RNA editing, and to discover transcribed non-coding RNAs.

What do I need to do for my research project in order to use the SOLiD™ System NGS technology?

First, design and plan your experiments carefully so that they address your scientific questions. Next, collect high-quality DNA or RNA material as planned. You may then submit a request to the ncRNA Program, together with your samples, for sequencing.

When should I expect to receive the sequence data?

The raw data generated by NGS are in color space and are converted into base space by the SOLiD™ raw data pipeline. While the pipeline is running, the base-space data are mapped to a reference sequence. For DNA sequences, the output includes the sequence, copy number and reference genome location. For RNA sequences, copy numbers are generated against reference expressed genes in the BLAST database (for example, RefSeq and miRBase).
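The color-space to base-space conversion mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly described 2-base encoding in which each color (0 to 3) is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), and it treats the first character of a read as a known leading base. The function names are hypothetical and this is not the SOLiD™ pipeline itself.

```python
# Two-bit codes for the bases; under the assumed encoding, a color call
# is the XOR of the codes of two adjacent bases.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_color_space(bases: str) -> str:
    """Encode a base-space sequence as its first base followed by color calls."""
    colors = [
        str(BASE_TO_CODE[a] ^ BASE_TO_CODE[b]) for a, b in zip(bases, bases[1:])
    ]
    return bases[0] + "".join(colors)


def decode_color_space(read: str) -> str:
    """Decode a read made of a known leading base plus color calls."""
    previous = BASE_TO_CODE[read[0]]
    decoded = [read[0]]
    for color in read[1:]:
        # Each color tells us how to step from the previous base to the next.
        previous ^= int(color)
        decoded.append(CODE_TO_BASE[previous])
    return "".join(decoded)


if __name__ == "__main__":
    read = encode_color_space("ATGGC")
    print(read)                      # A3103
    print(decode_color_space(read))  # ATGGC
```

Because every decoded base depends on the one before it, a single wrong color call changes all downstream bases in a naive decode like this one.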

What do I do with the sequence data and how do I translate it into meaningful biological data from downstream analysis?

An on-campus bioinformatician and biostatistician are available to provide downstream data analysis. Alternatively, off-campus computing laboratories offer these services as well.


What is the difference in total RNA required for mRNA and microRNA expression profiling?

Generally speaking, we require high-quality, intact total RNA as starting material for microarray processing in mRNA profiling. For mRNA expression profiling, we recommend RNeasy column purification after RNA isolation with TRIzol® to remove small RNAs and contaminating protein. For microRNA expression profiling, however, we need total RNA that has not been column purified, because mature microRNAs (19~22 nt) and their precursors (60~110 nt) can be lost during column purification.

What types of oligo microarrays are available from the ncRNA Program?

The microarrays used in the ncRNA Program are oligo-based Affymetrix and custom arrays. The ncRNA Program provides expression profiling and SNP genotyping on all commercially available Affymetrix products. In addition, we build in-house oligo microarrays for ncRNA and microRNA expression profiling.

What do I need to provide?

The research investigator provides high-quality total RNA only. The ncRNA Program takes care of biotin-labeled antisense RNA (aRNA) target preparation, target/chip hybridization, post-hybridization signal detection, chip scanning, chip gridding, data processing, preliminary data analysis and, if needed, data interpretation.

What expression profiling technology is offered by the ncRNA Program?

The technology for expression profiling is a single-color system on both Affymetrix and custom arrays. The oligo probes are either synthesized in situ (Affymetrix, 25-mer) or spotted (custom ncRNA array, 40-mer) and immobilized on the chip. The biological test sample is processed by linear amplification and biotin labeling during in vitro transcription (IVT) to generate single-stranded antisense RNA (aRNA) as the target for chip hybridization. The labeled aRNA is hybridized to a single chip, and the signal is detected with streptavidin-phycoerythrin in the post-hybridization process. The median-normalized data from control and test samples are then compared as fold changes to identify differentially expressed genes.
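As a rough illustration of the last step, the Python sketch below median-normalizes two single-color chips and computes a fold change per probe. It assumes that normalization means dividing each intensity by the chip median, which is one common scheme and may differ from the program's actual processing. The function names and probe IDs are hypothetical.

```python
from statistics import median


def median_normalize(intensities: dict[str, float]) -> dict[str, float]:
    """Divide every probe intensity by the chip's median intensity."""
    chip_median = median(intensities.values())
    return {probe: value / chip_median for probe, value in intensities.items()}


def fold_changes(
    control: dict[str, float], test: dict[str, float]
) -> dict[str, float]:
    """Return test/control fold change for probes present on both chips.

    Assumes all intensities are positive; a zero control intensity
    would raise ZeroDivisionError.
    """
    control_norm = median_normalize(control)
    test_norm = median_normalize(test)
    return {
        probe: test_norm[probe] / control_norm[probe]
        for probe in control_norm
        if probe in test_norm
    }


if __name__ == "__main__":
    control_chip = {"probe_1": 100.0, "probe_2": 200.0, "probe_3": 400.0}
    test_chip = {"probe_1": 400.0, "probe_2": 200.0, "probe_3": 100.0}
    for probe, change in fold_changes(control_chip, test_chip).items():
        print(probe, change)
```

With the example values, both chips have a median of 200, so probe_1 comes out 4-fold up, probe_2 unchanged and probe_3 4-fold down.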

How do I design my microarray experiments?

Because samples collected from different batches and individuals vary both technically and biologically, it is most economical to prepare multiple biological samples rather than request multiple chips for the same mRNA sample. Keep in mind that with 20K~45K spots, some spots may not hybridize equally on your control and treated chips for technical reasons (minor defects, fibers, etc.). For this reason, it is essential that you initially perform your experiment twice. You can then focus your initial analysis on the mRNAs that are altered in both experiments. If this initial experiment provides promising data, you will most likely want to perform a third experiment to facilitate statistical analysis.

It is important to note that running only one control chip and one treated chip without duplication is not efficient, because you are likely to spend significant time analyzing data that are artifacts. Typically, the ncRNA Program receives samples for comparing test samples against a control sample. Across the thousands of samples we have processed, we have noticed two strategies that most labs take:

  • The first strategy involves repeating the experiment only once and then using the hits common to both experiments for RT-PCR and whole-mount studies, with no initial plan to publish the chip results. These groups validated most of their hits and have papers in progress. Keep in mind that this approach will certainly miss hits because the number of replicate experiments is too low.
  • The second strategy involves doing three to four replicates followed by statistical analysis performed either at the investigator’s own institution or through our collaboration with the bioinformatics group. This approach allows you to identify a comprehensive list of statistically significant hits prior to a more detailed biological analysis.

While we recognize that many of you have limited resources that motivate opting for the first strategy, we strongly recommend the second strategy despite its higher initial cost.
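The idea of keeping only hits that are altered in both experiments can be sketched in Python as follows. The 2-fold cutoff, the function names and the gene labels are illustrative assumptions, not values used by the ncRNA Program.

```python
def altered_genes(
    fold_change: dict[str, float], threshold: float = 2.0
) -> tuple[set[str], set[str]]:
    """Split genes into up- and down-regulated sets at a fold-change cutoff."""
    up = {gene for gene, fc in fold_change.items() if fc >= threshold}
    down = {gene for gene, fc in fold_change.items() if fc <= 1.0 / threshold}
    return up, down


def common_hits(
    experiment_1: dict[str, float],
    experiment_2: dict[str, float],
    threshold: float = 2.0,
) -> set[str]:
    """Return genes altered in the same direction in both experiments."""
    up_1, down_1 = altered_genes(experiment_1, threshold)
    up_2, down_2 = altered_genes(experiment_2, threshold)
    return (up_1 & up_2) | (down_1 & down_2)


if __name__ == "__main__":
    first = {"gene_a": 3.0, "gene_b": 0.4, "gene_c": 2.5, "gene_d": 1.1}
    second = {"gene_a": 2.2, "gene_b": 0.3, "gene_c": 0.9, "gene_d": 4.0}
    print(sorted(common_hits(first, second)))  # ['gene_a', 'gene_b']
```

In this example gene_c and gene_d each pass the cutoff in only one experiment, so they are dropped. That is the kind of single-chip result the duplicate is meant to filter out.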

If you have questions about your experimental design or would like more information about bioinformatics analysis, please contact Dr. Chang-gong Liu prior to performing your experiment.

How do I isolate RNA for use in mRNA and microRNA microarray experiments?

The main objective is to generate RNA that is of high quality and sufficiently concentrated for use in a microarray experiment. Many researchers find that RNA isolation with TRIzol reagent gives RNA of good quality and quantity for these experiments. However, each type of starting material may yield RNA of different quality, and no one technique is likely to work for all samples. The quality of your data depends on the quality of the total RNA you provide.

How do I evaluate the RNA quality for expression profiling on microarray?

RNA quality can be evaluated by visualizing the RNA on a gel or Agilent Bioanalyzer 2100, as well as by calculating the A260/A280 ratio. On a denaturing gel or Bioanalyzer 2100 (or on an ordinary agarose gel in denaturing buffer), the RNA should appear as two bright distinct bands that represent the 28S and 18S ribosomal species. The 28S band should be brighter than the 18S band. Tailing of these major bands down the gel or a background smear behind these bands that gets heavier at lower molecular weights may indicate degradation of the RNA. In these cases, it is best to isolate RNA from fresh tissue because degraded RNA will produce high background and low signal intensities on a microarray. The presence of very sharp bands higher than the 28S ribosomal band can indicate the presence of excess DNA in the sample, which can be removed by treatment with RNase-free DNase I.

The A260/A280 spectrophotometric ratio also gives an indication of the purity and integrity of the RNA and should be as close to 2.0 as possible. Generally, ratios less than 1.7 indicate that the RNA may be contaminated with other material and should be re-purified, perhaps by running it through a column.
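The ratio check described above is simple arithmetic, sketched here in Python. The 1.7 cutoff comes from the text. The function names and absorbance readings are hypothetical.

```python
REPURIFY_BELOW = 1.7  # cutoff stated in the text above


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio from two absorbance readings."""
    return a260 / a280


def purity_flag(ratio: float) -> str:
    """Flag a sample for re-purification when the ratio is below the cutoff."""
    return "re-purify" if ratio < REPURIFY_BELOW else "acceptable"


if __name__ == "__main__":
    for a260, a280 in [(1.0, 0.5), (0.75, 0.5)]:
        ratio = a260_a280_ratio(a260, a280)
        print(f"A260/A280 = {ratio:.2f}: {purity_flag(ratio)}")
```

The first example reading gives a ratio of 2.0 and passes. The second gives 1.5 and is flagged for re-purification.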

What should be the concentration of the total RNA sample for submission to the ncRNA Program?

The concentration of the test RNA should be at least 0.5 µg/µl in RNase-free H2O. The ncRNA Program requires 5~10 µg of total RNA for each test sample.
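These two requirements can be checked together before shipping a sample. The Python sketch below uses the 0.5 µg/µl and 5 µg minimums from the text. The function names are hypothetical, and the check is only an arithmetic aid, not a program requirement in itself.

```python
MIN_CONCENTRATION_UG_PER_UL = 0.5  # minimum concentration from the text
MIN_TOTAL_RNA_UG = 5.0             # lower end of the 5~10 µg requirement


def total_rna_ug(concentration_ug_per_ul: float, volume_ul: float) -> float:
    """Total RNA mass in µg = concentration (µg/µl) x volume (µl)."""
    return concentration_ug_per_ul * volume_ul


def meets_submission_requirements(
    concentration_ug_per_ul: float, volume_ul: float
) -> bool:
    """Check both the minimum concentration and the minimum total mass."""
    return (
        concentration_ug_per_ul >= MIN_CONCENTRATION_UG_PER_UL
        and total_rna_ug(concentration_ug_per_ul, volume_ul) >= MIN_TOTAL_RNA_UG
    )


if __name__ == "__main__":
    print(meets_submission_requirements(0.5, 10.0))  # True: 5 µg at 0.5 µg/µl
    print(meets_submission_requirements(0.4, 20.0))  # False: too dilute
    print(meets_submission_requirements(1.0, 4.0))   # False: only 4 µg
```

At the minimum concentration of 0.5 µg/µl, the 5~10 µg requirement corresponds to 10~20 µl per sample.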

What steps should I take prior to submission of samples to the ncRNA Program?

Please ensure that:

  1. The RNA quality has been properly evaluated. Provide an agarose gel image with the RNA samples, or request that the ncRNA Program perform a quality check on the Agilent Bioanalyzer 2100.
  2. The ncRNA Program Service Request form is completed with the appropriate information. Failure to provide complete information will delay the processing of your samples on chips.

How should I address and send the RNA?

External users should send the RNA on dry ice via an overnight carrier to the address listed on our contacts page. All RNA samples will be stored in a -80°C freezer at the facility until processed.

What are the costs associated with a microarray experiment?

We are a fee-for-service program. MD Anderson and Baylor College of Medicine investigators are subject to the same fee schedule. External institutions should contact Dr. Chang-gong Liu for detailed pricing.

After completion of an experiment, what information and data can I expect to receive?

You will receive all raw data files as *cel files, *data files, *tif files and *gpr files. In addition, the processed data exported from the image data, along with preliminarily analyzed data listing the fold changes of differentially expressed genes, will be provided in an Excel spreadsheet.

How do I retrieve the data?

Data can be retrieved in several ways. Most commonly, a username- and password-protected FTP site with a private account for your data is set up on the cancer center server. Your data will be transferred to that server, and you may access the site online to download the data to your own computer.

What is the turnaround time for experiments?

Work in the ncRNA Program is completed on a first come, first served basis. Generally, microarray experiments can be completed within one week of receipt of a small number of samples. However, the turnaround time for larger projects will be based on the number of samples and the queue in the laboratory.