Over the last two decades, significant advances in technology and new methodologies have made proteomics an extremely powerful tool for protein scientists, biologists and clinical researchers. Scientific discoveries in proteomics depend largely on data generation and data analysis. Mass spectrometry (MS)-based proteomics has expanded significantly with improvements in software, data acquisition and algorithms, delivering more accessible and accurate data.
Direct protein analysis from tissue or biofluids, however, raises a variety of analytical challenges. Protein expression varies not only with the genetic background of an individual, but also with time, localisation and physiological responses to external stimuli. Moreover, because of the combined effects of alternative splicing, point mutations, post-translational modifications (PTMs) and endogenous proteolysis, a given protein can be expressed as many different proteoforms, each with a distinct biological activity.
These analytical challenges create workflow bottlenecks and extend the time required for accurate results. This article will examine how these issues can be overcome to improve efficiency and expand the capabilities of library-based approaches in quantitative proteomics.
Challenges in Proteomics
The core challenges in proteomics include converting the complex datasets produced by liquid chromatography (LC) coupled with MS into reliable peptide spectrum matches (PSMs) and peptide identifications, which can then be used for protein inference, quantification, PTM analysis and proteome sequence coverage within complex samples. Database search algorithms have transformed biological and medical research, yet as analytical instrumentation continues to become more sensitive, software developers must continuously expand these search algorithms to meet the demands of sample limitations in high-throughput protein quantitation.
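A database search begins by digesting every protein sequence in silico with the same enzyme used at the bench, most commonly trypsin, which cleaves after lysine (K) or arginine (R) except when the next residue is proline. The sketch below illustrates that cleavage rule on a toy sequence; the function name and the handling of missed cleavages are illustrative, not taken from any particular search engine:

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """Cut a protein sequence after K or R, but not before P (the classic
    trypsin rule), optionally re-joining neighbours as missed cleavages."""
    # Zero-width split: after K/R, unless the following residue is P
    fragments = re.split(r'(?<=[KR])(?!P)', sequence)
    peptides = set(fragments)
    # A missed cleavage is simply n+1 adjacent fragments joined together
    for n in range(1, missed_cleavages + 1):
        for i in range(len(fragments) - n):
            peptides.add(''.join(fragments[i:i + n + 1]))
    return sorted(peptides)

print(tryptic_digest("MKWVTFISLLFLFSSAYSRGVFRR", missed_cleavages=1))
```

Each resulting peptide is then fragmented in silico so that its predicted fragment ions can be compared with the experimental spectra.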
In the simplest scenario, database search algorithms match experimental precursor and fragment ion spectra against in silico predictions, suggesting the best fit and assigning a probability score. In many instances the best fit for a particular PSM has a probability score equal to, or only marginally better than, the next best fit, yet a single assignment is delivered. Inferring only one PSM, which may be incorrect given the reliance purely on the information contained in a fragment ion spectrum, increases the chance of false identifications. As such, the field of proteomics benefits from stricter acceptance criteria such as lower false discovery rates, increases in unique peptide counts, greater protein sequence coverage and biological or technical replicates.
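The false discovery rate mentioned above is commonly estimated with a target-decoy strategy: the search is run against real (target) and reversed or shuffled (decoy) sequences, and the proportion of decoy hits above a score cutoff estimates the error among accepted targets. A minimal sketch of that bookkeeping, with an illustrative function name and toy scores of my own choosing:

```python
def fdr_score_cutoff(psms, max_fdr=0.01):
    """psms: list of (score, is_decoy) pairs, one per PSM.
    Returns the lowest score cutoff at which the estimated FDR
    (decoys accepted / targets accepted) stays within max_fdr."""
    targets = decoys = 0
    cutoff = None
    # Walk from the best-scoring PSM downwards, tracking the running FDR
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if decoys / max(targets, 1) <= max_fdr:
            cutoff = score  # everything at or above this score is accepted
    return cutoff

# Toy example: five target PSMs and three decoy PSMs
psms = [(10, False), (9, False), (8, False), (7, True),
        (6, False), (5, False), (2, True), (1, True)]
print(fdr_score_cutoff(psms, max_fdr=0.25))  # prints 5
```

Tightening `max_fdr` raises the cutoff and shrinks the accepted set, which is exactly the trade-off behind the stricter acceptance criteria described above.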
Advances in Database Search Platforms
To break the data analysis bottleneck, proteomics software developers are integrating real-time database search and smart acquisition into software algorithms. A parallel search engine in real-time (PaSER) is a GPU-powered real-time database search platform providing parallel computing power and real-time database search results for bottom-up proteomics. Researchers can view intricate details of the data, from high-level experimental information down to a specific fragment ion spectrum of interest. Additionally, user-defined qualification criteria for proteins or peptides, evaluated at the end of each sample acquisition, determine whether the sample queue progresses, checking sample suitability while making the most of samples, consumables and instrument time.
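The queue-gating idea in the last sentence can be sketched as a simple check run when a sample finishes acquiring. The function, the accession numbers and the "panel of required proteins" framing below are all illustrative assumptions, not PaSER's actual interface:

```python
def should_continue_queue(identified_proteins, required_panel, min_hits):
    """Toy acquisition gate: let the sample queue progress only if enough
    proteins from a user-defined panel were identified in the finished run."""
    hits = identified_proteins & required_panel  # set intersection
    return len(hits) >= min_hits

# Accessions "identified" in the completed run (illustrative values)
run_ids = {"P02768", "P01857", "P00738"}
# User-defined qualification panel the run must satisfy
qc_panel = {"P02768", "P01857", "P61626"}

print(should_continue_queue(run_ids, qc_panel, min_hits=2))  # prints True
```

A run failing the gate could be flagged for re-injection instead of silently consuming the rest of the queue, which is the efficiency argument the text makes for samples, consumables and instrument time.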