Computational Genomic Infrastructure
Genomic data processing and analysis are both computationally intensive and time consuming, requiring the support of a well-designed and well-implemented hardware and software infrastructure. Working closely with MD Anderson's HPCC group and IBM scientists, we have developed an automated NGS data processing and analysis pipeline. The pipeline powers the sequencing efforts of the Moon Shots Program™ and the integration of genomic data into the institution's big data platform. Current research includes collaborations with IBM scientists to increase the pipeline's throughput through hardware and software enhancements and improved management of metadata.
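At its core, an automated pipeline of this kind runs a fixed set of processing stages in dependency order. The sketch below shows that scheduling idea with Kahn's topological sort; the stage names (align, sort, mark_dups, etc.) are illustrative placeholders, not the actual stages of the MD Anderson pipeline.

```python
from collections import deque

# Illustrative stage graph for an NGS pipeline: each stage lists the
# stages it depends on. These names are hypothetical examples.
STAGES = {
    "align": [],                    # map reads to the reference genome
    "sort": ["align"],              # coordinate-sort the alignments
    "mark_dups": ["sort"],          # flag PCR duplicates
    "call_variants": ["mark_dups"], # detect variants from the alignments
    "annotate": ["call_variants"],  # annotate the resulting variant calls
}

def schedule(stages):
    """Return the stages in an order that respects all dependencies
    (Kahn's algorithm); raise if the dependency graph has a cycle."""
    indegree = {s: len(deps) for s, deps in stages.items()}
    dependents = {s: [] for s in stages}
    for stage, deps in stages.items():
        for dep in deps:
            dependents[dep].append(stage)
    ready = deque(sorted(s for s, n in indegree.items() if n == 0))
    order = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        for nxt in dependents[stage]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(stages):
        raise ValueError("cycle in stage dependencies")
    return order

print(schedule(STAGES))
# → ['align', 'sort', 'mark_dups', 'call_variants', 'annotate']
```

A production system adds per-sample parallelism, retries, and metadata tracking on top of this ordering, but the dependency graph is the organizing principle.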
Algorithm and Software Development
While working on the computational infrastructure and with biologists and clinicians on various projects, we have identified areas where new computational approaches need to be developed and implemented as software that can be easily deployed to the production pipeline or to researchers in the community. Several widely used applications have been developed. We are currently working on algorithms and software for identifying rare genomic variants, analyzing NGS data derived from patient-derived xenograft (PDX) models, and exploiting personalized reference genomes to enhance alignment accuracy.
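A core difficulty in PDX data is that sequenced reads come from a mixture of human tumor and mouse stromal cells. One common approach is to align each read to both the human and the mouse reference and assign it to whichever scores higher. The sketch below illustrates that idea on toy data; the scoring margin and all values are hypothetical, not the group's actual method.

```python
# Hypothetical sketch of read disambiguation for PDX samples: each read
# has an alignment score against the human and the mouse reference, and
# the score difference decides its assigned origin.
def classify_read(human_score, mouse_score, margin=5):
    """Assign a read to 'human', 'mouse', or 'ambiguous' based on
    the difference between its two alignment scores."""
    if human_score - mouse_score >= margin:
        return "human"
    if mouse_score - human_score >= margin:
        return "mouse"
    return "ambiguous"

def disambiguate(reads, margin=5):
    """reads: mapping of read_id -> (human_score, mouse_score).
    Returns the read ids bucketed by inferred origin."""
    buckets = {"human": [], "mouse": [], "ambiguous": []}
    for read_id, (h, m) in reads.items():
        buckets[classify_read(h, m, margin)].append(read_id)
    return buckets

# Toy example: scores are illustrative, not real alignment output.
reads = {
    "r1": (60, 20),  # aligns much better to human: tumor origin
    "r2": (15, 58),  # aligns much better to mouse: stromal origin
    "r3": (40, 42),  # too close to call: ambiguous
}
print(disambiguate(reads))
# → {'human': ['r1'], 'mouse': ['r2'], 'ambiguous': ['r3']}
```

Real tools work from paired BAM files and more nuanced scoring, but the two-reference comparison shown here is the underlying principle.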
Customized Computational Data Analysis
With an automated production pipeline generating standardized analytical results, our computational scientists are able to devote greater effort to interacting with biologists and clinicians, understanding the biological or clinical questions being asked, and tailoring the downstream analysis accordingly. We have established long-term collaborations with faculty members in departments throughout the institution and provided customized computational support with excellent outcomes.