eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Generating functions of tandem mass spectra and their applications for peptide identifications

Abstract

Mass spectrometry (MS) has become the leading high-throughput technology for proteomics, the large-scale study of proteins. MS experiments generate tandem mass (MS/MS) spectra, each representing a peptide. Identifying peptides from MS/MS spectra is a basic and essential task in proteomics studies. At present, MS instruments and experimental protocols are advancing rapidly; however, the software tools to interpret MS/MS spectra are lagging behind, with many computational problems remaining unsolved. In this dissertation, we present a novel approach to interpreting MS/MS spectra, called the generating function approach, and show how this approach enables us to solve key computational problems in MS. First, we address the problem of estimating the statistical significance of Peptide-Spectrum Matches (PSMs). Since typically less than 30% of the generated spectra can be correctly interpreted, this problem is important in distinguishing between correct and incorrect PSMs. Using the generating function approach, we present the first analytical (rather than empirical) solution to this problem. Our MS-GF tool not only improves the accuracy of statistical significance estimates, but also increases the number of peptide identifications at a fixed error rate. Next, we present an alternative approach to peptide identification based on generating all plausible de novo interpretations of a spectrum (a spectral dictionary) and then quickly matching them against the protein database. Our MS-Dictionary tool enables proteogenomic searches against six-frame translations of genomic sequences, which may be prohibitively time-consuming with traditional methods. We also present spectral profiles, a new representation of tandem mass spectra that compactly represents spectral dictionaries. Spectral profiles can be used to generate gapped peptides that are as useful as full-length peptides and as accurate as the peptide sequence tags of length 3 traditionally used to speed up database searches.
Lastly, we present a new database search tool, MS-GF+, based on MS-GF. MS-GF+ is sensitive (it identifies more peptides than other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments, and different experimental protocols). We benchmark MS-GF+ using diverse types of spectral datasets and show that, for all these datasets, MS-GF+ significantly increases the number of identified peptides compared to state-of-the-art methods for peptide identification.
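
The idea of an analytical (rather than empirical) significance estimate can be illustrated with a small dynamic program. The sketch below is not the MS-GF implementation; it is a toy model under assumptions that do not come from the abstract: a three-letter amino acid alphabet with nominal integer masses, uniform amino acid probabilities, and a hypothetical additive score in which a peptide earns the value in `peak_scores` at each of its prefix masses. Under those assumptions, it computes the exact score distribution over all peptides of a given parent mass and the probability that a random peptide scores at least as well as a given match.

```python
from collections import defaultdict

# Assumption: toy alphabet with nominal (integer) residue masses.
AA_MASS = {"G": 57, "A": 71, "S": 87}


def score_distribution(peak_scores, parent_mass, aa_mass=AA_MASS):
    """Return {score: probability} over all peptides whose mass is parent_mass.

    peak_scores maps an integer prefix mass to the (hypothetical) integer
    score earned when a peptide has a prefix of exactly that mass.
    Amino acids are assumed to be equally likely.
    """
    aa_prob = 1.0 / len(aa_mass)
    # dist[m][s] = total probability of peptides with mass m and score s
    dist = [defaultdict(float) for _ in range(parent_mass + 1)]
    dist[0][0] = 1.0  # the empty peptide has mass 0 and score 0
    for m in range(1, parent_mass + 1):
        gain = peak_scores.get(m, 0)
        for mass in aa_mass.values():
            if mass <= m:
                # Extend every peptide of mass m - mass by one residue.
                for s, p in dist[m - mass].items():
                    dist[m][s + gain] += p * aa_prob
    return dict(dist[parent_mass])


def spectral_probability(dist, threshold):
    """Probability that a random peptide scores at least `threshold`."""
    return sum(p for s, p in dist.items() if s >= threshold)


# Usage: a toy spectrum with scored prefix masses 57 and 128.
toy_scores = {57: 2, 128: 1}
toy_dist = score_distribution(toy_scores, parent_mass=128)
toy_pvalue = spectral_probability(toy_dist, threshold=3)
```

Because the distribution is computed over every peptide of the parent mass rather than sampled from a decoy database, the tail probability is exact within the toy model, which is the sense in which such an estimate is analytical.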
