SEQUENCE GAZER™ DOCUMENTATION

Introduction

Sequence Gazer™ is a tool for using MS/MS data to test hypotheses regarding the characterization of a protein of known sequence.  It allows the user to determine the likelihood that specific modifications exist at particular locations on the protein sequence.

 

Uses of Sequence Gazer™ range from confirming that a known PTM exists at a particular location to fully characterizing a poorly-known protein based on expert knowledge and theoretical expectation.  It provides the means to quantitatively measure the quality of a putative characterization against MS/MS experimental data.

Interface

Figure 1 - Sequence Gazer™ Interface

The Sequence Gazer™ interface consists of a search definition page and a solution refinement page.  To define a search, the user needs to provide the following information:

·        Precursor Mass Type: Monoisotopic or Average mass.

·        Fragment Mass Type: Monoisotopic or Average mass.

Sequence Gazer™ operates on lists of neutral masses.  Algorithms (such as THRASH) analyze the isotopic distribution in the m/z domain and calculate the average neutral mass (average of the isotopic distribution).  If there is sufficient information in the spectrum, the monoisotopic neutral mass (the mass resulting from only principal isotopes) may be extrapolated. 
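
The difference between the two mass types can be illustrated with a short sketch.  It is not part of Sequence Gazer™; the residue masses are commonly tabulated values, only the three residues needed for the example are listed, and the names RESIDUE_MASSES, WATER and neutral_mass are made up for illustration.

```python
# Commonly tabulated residue masses in Da: (monoisotopic, average).
# Only the residues needed for this example are listed (assumption).
RESIDUE_MASSES = {
    "M": (131.04049, 131.1926),
    "A": (71.03711, 71.0788),
    "R": (156.10111, 156.1875),
}
# One water accounts for the free N- and C-termini.
WATER = (18.01056, 18.01528)


def neutral_mass(sequence, mass_type="monoisotopic"):
    """Sum residue masses plus one water for an unmodified linear sequence."""
    index = 0 if mass_type == "monoisotopic" else 1
    return sum(RESIDUE_MASSES[aa][index] for aa in sequence) + WATER[index]


mono = neutral_mass("MAR", "monoisotopic")
avg = neutral_mass("MAR", "average")
print(f"monoisotopic: {mono:.4f} Da, average: {avg:.4f} Da")
```

The average mass is always the larger of the two, and the gap grows with the size of the molecule, which is why the mass type of the input lists must match the search settings.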

·        Fragment Tolerance: In Daltons (Da) or Parts Per Million (ppm).

This setting specifies the maximum difference between observed and theoretical fragment masses that are considered to match.
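
As a minimal sketch of how such a tolerance check could work (not the Sequence Gazer™ implementation), the function below compares one observed mass with one theoretical mass.  The name within_tolerance is hypothetical, and computing ppm relative to the theoretical mass is an assumption.

```python
def within_tolerance(observed, theoretical, tolerance, unit="Da"):
    """Return True if the two masses differ by no more than the tolerance."""
    delta = abs(observed - theoretical)
    if unit == "Da":
        return delta <= tolerance
    if unit == "ppm":
        # Assumption: ppm error is taken relative to the theoretical mass.
        return delta / theoretical * 1e6 <= tolerance
    raise ValueError(f"unknown tolerance unit: {unit}")


# A 0.01 Da difference at 1000 Da is 10 ppm.
print(within_tolerance(1000.01, 1000.0, 0.02, "Da"))
print(within_tolerance(1000.01, 1000.0, 5, "ppm"))
```

A fixed tolerance in Da is equally strict at every mass, while a ppm tolerance widens in absolute terms as the fragment mass grows.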

·        Δm Mode: On or off.

This mode analyzes the data twice.  The first pass uses the observed fragment list; the second uses a duplicate fragment list in which the current mass difference has been subtracted from each fragment.  By observing which fragments match in each pass, it is possible to localize an unknown modification to a particular fragment.
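
The two passes can be sketched as follows.  This is an illustration under stated assumptions, not the actual Sequence Gazer™ code: the function names, the fixed tolerance in Da and the example masses are all hypothetical.

```python
def match_fragments(observed, theoretical, tol_da):
    """Return the observed masses within tol_da of any theoretical mass."""
    return [m for m in observed
            if any(abs(m - t) <= tol_da for t in theoretical)]


def delta_m_search(observed, theoretical, delta_m, tol_da=0.01):
    """Run both passes and return (direct matches, shifted matches)."""
    # Pass 1: the observed fragment list as it is.
    direct = match_fragments(observed, theoretical, tol_da)
    # Pass 2: subtract the mass difference from each fragment, then match.
    # Matches are reported using the original observed masses.
    shifted = [m for m in observed
               if match_fragments([m - delta_m], theoretical, tol_da)]
    return direct, shifted


theoretical = [300.0, 500.0, 700.0]   # made-up theoretical fragment masses
observed = [300.0, 579.966, 779.966]  # made-up observed fragment masses
direct, shifted = delta_m_search(observed, theoretical, delta_m=79.966)
print(direct, shifted)
```

In this made-up example the first fragment matches directly, while the other two match only after the shift, which suggests that they carry the unexplained mass.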

·        Fixed Modifications: Select from list of available fixed modifications to apply.

Fixed modifications are mass transformations applied to all instances of particular residues according to preset rules.  They represent pre-treatment of the sample by various chemical means.  For instance, if the sample is pre-treated to derivatize all cysteines, the user must apply the appropriate cysteine fixed modification to their search.  If this is not done, the characterization will fail, as Sequence Gazer™ will attempt to explain transformed observed masses with non-transformed theoretical masses.
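
The effect of a fixed modification on theoretical masses can be sketched as below.  The example uses carbamidomethylation of cysteine (+57.02146 Da monoisotopic, the usual result of iodoacetamide treatment) purely as an illustration; the table and function names are hypothetical and do not reflect the list of fixed modifications offered by Sequence Gazer™.

```python
# Illustrative fixed modification table: residue -> monoisotopic shift in Da.
# Carbamidomethyl cysteine is used only as an example (assumption).
FIXED_MODS = {"C": 57.02146}


def fixed_mod_shift(sequence, fixed_mods=FIXED_MODS):
    """Total mass added when every listed residue in the sequence is modified."""
    return sum(fixed_mods.get(aa, 0.0) for aa in sequence)


shift = fixed_mod_shift("ACDCK")  # two cysteines, so the shift is applied twice
print(f"{shift:.5f} Da")
```

Every theoretical fragment that contains a cysteine is shifted in the same way, so omitting the fixed modification leaves those fragments unmatched.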

·        Sequence: A protein sequence in IUPAC standard one-letter nomenclature.

Sequence Gazer™ supports the 20 IUPAC standard one-letter amino acid codes, and a set of modified RESID identifiers to represent expected modifications.  To specify an expected modification to a residue, follow the residue’s one-letter code in the sequence with parentheses containing the RESID identifier of the modification, with the “AA” and leading zeroes stripped off.  For example, “M(21)AR” represents N-formyl-methionine - alanine - arginine.
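
The notation above can be read with a small parser such as the sketch below.  It is an illustration only; the names TOKEN and parse_sequence are hypothetical, and restoring the full identifier as “AA” plus four digits assumes the standard RESID identifier format.

```python
import re

# One standard residue letter, optionally followed by "(digits)".
TOKEN = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(?:\((\d+)\))?")


def parse_sequence(text):
    """Split a string such as 'M(21)AR' into (residue, RESID id or None) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        residue, mod = match.groups()
        # Restore the stripped "AA" prefix and leading zeroes, e.g. 21 -> AA0021.
        resid = f"AA{int(mod):04d}" if mod else None
        tokens.append((residue, resid))
        pos = match.end()
    return tokens


print(parse_sequence("M(21)AR"))
```

For “M(21)AR” the sketch yields methionine tagged with AA0021, followed by unmodified alanine and arginine.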

·        Precursor Mass List: A list of neutral masses, in Daltons.

·        Fragment Mass List: A list of neutral masses, in Daltons.

These lists of masses must match the mass type specified above.

Once a search is defined, the user is presented with the solution refinement page.  It displays the following sections:

·        Search Parameters: Precursor mass type, fragment mass type and tolerance, and Δm mode indicator for the current search.  These may be changed at any time during solution refinement.

·        Scores: Quantitative numerical measures of the solution quality, i.e., the degree to which the current solution explains the available data.

·        P_Score:  As presented by Meng et al., this score represents the probability of the protein match arising by chance, assuming a Poisson distribution of fragment masses[1].

·        PDE: As presented by Reid et al., this score takes into account the relative abundances of observed fragment ions, and the frequency of fragmentation at preferential cleavage sites[2].

·        Fragments Explained: Displays the percentage of fragments in the MS/MS data that are explained by the last set of scored modifications to the solution.  Below this number is a Rescore button.  Pressing this button commits the current modifications, computes the new scores and fragments-explained values, and refreshes the page.

·        Mass Difference: Displays the overall observed experimental mass, the current theoretical mass of the proposed solution, and the difference between the two.

·        Protein Sequence: Displays the protein sequence specified during search configuration, the B- and Y-ions computed from the MS/MS data, and the PTMs currently proposed.  Clicking a residue selects it and allows the user to modify the currently proposed PTM for that residue in the Residue section.

·        Residue: Displays two sections about the currently-selected residue:

·        Residue Information: Contains information about the residue’s position, display encoding and currently-selected modification.

·        PTM Choices: Contains information about the possible PTMs that could exist at the residue.  These PTMs are restricted to only those that are theoretically-possible based on information contained in RESID.  The user may select one of the possible modifications, no modifications, or a custom, arbitrary modification with a user-defined mass.

·        Fragments: Contains two expandable subsections:

·        Show Matching Fragments: Expands to a list containing information about the fragments that are explained by the currently proposed solution.

·        Show Non-Matching Fragments: Expands to a list containing information about the fragments that are not explained by the currently-proposed solution.
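
The idea behind a Poisson-based score such as the P_Score listed under Scores can be sketched generically: given the number of fragment matches expected by chance, compute the probability of seeing at least as many matches as were observed.  This is not the published formula from [1]; how the expected number of random matches is estimated is left out, and the function name and example values are hypothetical.

```python
import math


def poisson_tail(n_matches, expected_random_matches):
    """P(X >= n_matches) for X ~ Poisson(expected_random_matches)."""
    lam = expected_random_matches
    # Probability of fewer than n_matches chance matches.
    below = sum(math.exp(-lam) * lam**k / math.factorial(k)
                for k in range(n_matches))
    return 1.0 - below


# Hypothetical values: 10 matches where 0.5 would be expected by chance.
p = poisson_tail(10, 0.5)
print(f"{p:.3e}")
```

A smaller probability means the observed matches are less likely to be a coincidence.  Subtracting from 1 loses precision for extremely small probabilities, so a production implementation would sum the upper tail directly.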

Usage

Define Your Search

To use Sequence Gazer™, you first need data to analyze.  Sequence Gazer™ works on lists of neutral masses derived from MS/MS experiments.  It does not work on data in the m/z domain!  You may use algorithms such as THRASH or Decon to convert m/z data to neutral mass data.
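
For orientation, the final arithmetic of such a conversion is sketched below for a single peak whose charge state is already known.  It assumes positive-mode ions charged by protonation, and the function name is hypothetical.  Algorithms such as THRASH do considerably more, including determining the charge state from the isotopic distribution.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def neutral_mass_from_mz(mz, charge):
    """Neutral mass of an ion carrying `charge` protons (assumption: [M+zH]z+)."""
    return charge * (mz - PROTON_MASS)


# A peak at m/z 1001.007276 with charge 10 corresponds to about 10000 Da.
mass = neutral_mass_from_mz(1001.007276, 10)
print(f"{mass:.3f} Da")
```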

Figure 2 - Typical Input Data

 

Once you have a list of intact neutral masses and a list of fragment neutral masses, it is time to define your search.  Specify the type of mass that you are using (monoisotopic or average) for both lists, specify your fragment tolerance, and specify whether you want to use Δm mode.  If you aren’t sure what parameters to use, don’t worry - all of these parameters may be modified later on, while you are refining your solution.

 

You must also specify the sequence of the protein you are attempting to characterize.  This sequence should be as accurate as possible, since differences in sequence result in differences in mass, and lower the degree to which you can explain the data you obtained.

 

Specify any fixed modifications you used at this point.  This is your only opportunity to do so - they cannot be changed once you begin refining your solution, short of discarding all your work and starting a new search.

 

Finally, add the intact and fragment masses to the appropriate boxes, and click Submit.

Refine Your Solution

Figure 3 - Analysis View

You will initially be presented with a solution based on the unmodified protein sequence.  You may now place modifications on particular residues to increase the degree to which your solution explains the data.  Based on the fragmentation pattern of the protein, specify the residues you believe are modified, and the modifications on them.  Once you have done so, click Rescore.

If your explanation for the difference between the observed mass and the theoretical mass is valid, your overall score will improve and the percentage of matching fragments will go up.  If not, both will get worse.  Your goal is to continue characterizing the sequence until you have reduced the difference between the observed mass and the theoretical mass of your solution to within the measurement error of your instrument.

Based on your expert knowledge of the sample, you may wish to examine the lists of matching and non-matching fragments directly.  Clicking the arrows below the sequence display (“Show Matching Fragments”, “Show Non-Matching Fragments”) expands the corresponding list.

Figure 4 - Matching Fragment List

The m/z columns in Matching and Non-Matching Fragments, as well as the Intensity column in Non-Matching Fragments, are not used in this version of Sequence Gazer™.  They are used in NeuroProSight and ProSightPC™ when data is derived from a larger set of searches.

 



[1] Meng, F., B. J. Cargile, L. H. Miller, A. J. Forbes, J. R. Johnson and N. L. Kelleher. (2001) Informatics and multiplexing of intact protein identification in bacteria and the archaea. Nature Biotechnology 19: 952-957.

[2] Reid, G. E., H. Shang, J. M. Hogan, G. U. Lee and S. A. McLuckey. (2002) Gas-phase concentration, purification, and identification of whole proteins from complex mixtures. Journal of the American Chemical Society 124: 7353-7362.