Welcome to the Du Lab


Our lab is part of the Department of Bioinformatics & Genomics, College of Computing and Informatics, University of North Carolina at Charlotte.

Our lab is located on the North Carolina Research Campus (NCRC). At NCRC, researchers from eight universities across North Carolina and from industry are advancing the fields of nutrition, agriculture, and biotechnology.


Our research focuses on developing novel computational and visual analytics algorithms for mass spectrometry-based proteomics and metabolomics studies. Our long-term research goal is to develop an integrated bioinformatics framework for large-scale -omics studies. Our ongoing research projects include:

1. Development of an Automated Data Analysis Platform for Structural Studies of Proteins Using Chemical Cross-Linking and Tandem Mass Spectrometry

Abstract: Chemical cross-linking combined with mass spectrometry provides a powerful method for identifying protein-protein interactions and probing the structure of protein complexes. A number of strategies have been reported that take advantage of the high sensitivity and high resolution of modern mass spectrometers. Approaches typically include synthesis of novel cross-linking compounds, and/or isotopic labeling of the cross-linking reagent and/or protein, and label-free methods. We report Xlink-Identifier, a comprehensive data analysis platform that has been developed to support label-free analyses. It can identify interpeptide, intrapeptide, and deadend cross-links as well as underivatized peptides. The software streamlines data preprocessing, peptide scoring, and visualization and provides an overall data analysis strategy for studying protein-protein interactions and protein structure using mass spectrometry. The software has been evaluated using a custom synthesized cross-linking reagent that features an enrichment tag. Xlink-Identifier offers the potential to perform large-scale identifications of protein-protein interactions using tandem mass spectrometry.

Link to paper 1

Link to paper 2

2. Development of an Automated Data Analysis Platform for Mass Spectrometry-based Metabolomics Studies

Abstract: Recent technological advances have made it possible to carry out high-throughput metabonomics studies using gas chromatography coupled with time-of-flight mass spectrometry. Large volumes of data are produced from these studies and there is a pressing need for algorithms that can efficiently process and analyze data in a high-throughput fashion as well. We present an Automated Data Analysis Pipeline (ADAP) that has been developed for this purpose. ADAP consists of peak detection, deconvolution, peak alignment, and library search. It allows data to flow seamlessly through the analysis steps without any human intervention and features two novel algorithms in the analysis. Specifically, clustering is successfully applied in deconvolution to resolve coeluting compounds that are very common in complex samples and a two-phase alignment process has been implemented to enhance alignment accuracy. ADAP is written in standard C++ and R and uses parallel computing via Message Passing Interface for fast peak detection and deconvolution. ADAP has been applied to analyze both mixed standards samples and serum samples and identified and quantified metabolites successfully. ADAP is available at http://www.du-lab.org.

Link to paper.
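
To give a feel for the library search step named in the abstract above, here is a minimal Python sketch that matches a query spectrum against reference spectra using cosine similarity. This is an illustration only and not ADAP's actual scoring: the abstract does not say which similarity measure ADAP uses, so cosine similarity is an assumed stand-in. The function names, the dictionary representation of a spectrum (integer m/z mapped to intensity), and the example library entries are all hypothetical.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return (name, score) of the library spectrum most similar to the query."""
    name, spectrum = max(
        library.items(), key=lambda item: cosine_similarity(query, item[1])
    )
    return name, cosine_similarity(query, spectrum)


# Hypothetical reference spectra, for illustration only.
library = {
    "compound_A": {73: 100.0, 147: 50.0},
    "compound_B": {57: 100.0, 71: 80.0},
}
query = {73: 90.0, 147: 40.0, 57: 5.0}
print(best_match(query, library))
```

In practice a score threshold would decide whether the best match is reported as an identification at all.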

3. Integration of Metabolomics Data with Transcriptomics and SNP Data

Our past projects include:

1. Estimation of the False Discovery Rate for Phosphopeptide Identifications

Abstract: The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pnl.gov/software/.

Link to paper
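
The last step of the pipeline described above converts per-identification p-values into q-values, from which the FDR of an accepted set is read off. The Python sketch below shows one common way to do that conversion, the Benjamini-Hochberg step-up adjustment. It is an assumption for illustration, not the method of the Phosphopeptide FDR Estimator, whose p-values come from the fitted score distributions and whose exact q-value procedure is not given here. The function name and the example p-values are hypothetical.

```python
def q_values(p_values):
    """Benjamini-Hochberg style q-values.

    For p-values sorted ascending as p_(1) <= ... <= p_(m), the q-value of
    the i-th smallest is min over j >= i of p_(j) * m / j.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q


# Hypothetical p-values for four phosphopeptide identifications.
p = [0.01, 0.04, 0.03, 0.20]
for p_val, q_val in zip(p, q_values(p)):
    print(f"p = {p_val:.2f}  q = {q_val:.4f}")
```

Accepting every identification with a q-value at or below a chosen cutoff (for example 0.05) gives a set whose estimated FDR is at most that cutoff.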


2. Development of a Computational Strategy to Analyze Label-Free Temporal Bottom-Up Proteomics Data

Abstract: Biological systems are in a continual state of flux, which necessitates an understanding of the dynamic nature of protein abundances. The study of protein abundance dynamics has become feasible with recent improvements in mass spectrometry-based quantitative proteomics. However, a number of challenges still remain related to how best to extract biological information from dynamic proteomics data, for example, challenges related to extraneous variability, missing abundance values, and the identification of significant temporal patterns. We have developed a strategy to address these issues.

Link to paper


