SEQUENCE GAZER™ DOCUMENTATION
Introduction
Sequence Gazer™ is a tool for using MS/MS data to test
hypotheses regarding the characterization of a protein of known sequence. It allows the user to determine the
likelihood that specific modifications exist at particular locations on the
protein sequence.
Uses of Sequence Gazer™ range from confirming that a known
PTM exists at a particular location to fully characterizing a poorly-known
protein based on expert knowledge and theoretical expectation. It provides the means to quantitatively
measure the quality of a putative characterization against MS/MS experimental
data.
Interface

Figure 1 - Sequence Gazer™ Interface
The Sequence Gazer™ interface consists of a search
definition page and a solution refinement page.
To define a search, the user needs to provide the following information:
· Precursor Mass Type: Monoisotopic or Average mass.
· Fragment Mass Type: Monoisotopic or Average mass.
Sequence Gazer™ operates on lists of neutral masses. Algorithms such as THRASH
analyze the isotopic distribution in the m/z domain and calculate the average
neutral mass (the average of the isotopic distribution). If there is sufficient
information in the spectrum, the monoisotopic neutral mass (the mass resulting
from only the principal isotopes) may be extrapolated.
· Fragment Tolerance: In Daltons (Da) or Parts Per Million (ppm).
This setting specifies the maximum difference between an observed and a
theoretical fragment mass for the two to be considered a match.
· Δm Mode: On or off.
When on, this mode analyzes the data twice. The first pass uses the observed
fragment list; the second uses a duplicate fragment list with the current mass
difference subtracted from each fragment. By observing which fragments match
in each pass, it is possible to localize an unknown modification to a
particular fragment.
· Fixed Modifications: Select from the list of available fixed modifications
to apply.
Fixed modifications are mass transformations applied to all instances of
particular residues according to preset rules. They represent pre-treatment of
the sample by various chemical means. For instance, if the sample is
pre-treated to derivatize all cysteines, the user must apply the appropriate
cysteine fixed modification to the search. If this is not done, the
characterization will fail, as Sequence Gazer™ will attempt to explain
transformed observed masses with non-transformed theoretical masses.
· Sequence: A protein sequence in IUPAC standard one-letter nomenclature.
Sequence Gazer™ supports the 20 IUPAC standard one-letter amino acid codes,
and a set of modified RESID identifiers to represent expected modifications.
To specify an expected modification to a residue, follow the residue’s
one-letter code in the sequence with parentheses that contain the RESID
identifier of the modification, with the “AA” prefix and leading zeroes
stripped off. For example, “M(21)AR” represents
N-formyl-methionine - alanine - arginine.
· Precursor Mass List: A list of neutral masses, in Daltons.
· Fragment Mass List: A list of neutral masses, in Daltons.
These lists of masses must match the mass types specified above.
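The sequence notation described above can be read mechanically. The following
is a minimal sketch in Python, not Sequence Gazer™'s own parser: the function
name parse_annotated_sequence is hypothetical, and the sketch assumes that a
full RESID identifier is "AA" followed by a four-digit, zero-padded number
(so that "21" expands to "AA0021").

```python
import re

# One residue letter, optionally followed by "(digits)" holding the RESID
# number with its "AA" prefix and leading zeroes stripped.
_TOKEN = re.compile(r"([A-Z])(?:\((\d+)\))?")
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_annotated_sequence(text):
    """Return a list of (residue, resid_id_or_None) tuples (hypothetical helper)."""
    residues = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None or match.group(1) not in STANDARD_RESIDUES:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        letter, number = match.groups()
        # Assumption: restore the stripped prefix and zero padding, e.g. 21 -> AA0021.
        resid_id = f"AA{int(number):04d}" if number is not None else None
        residues.append((letter, resid_id))
        pos = match.end()
    return residues


print(parse_annotated_sequence("M(21)AR"))
# [('M', 'AA0021'), ('A', None), ('R', None)]
```

Returning the modification alongside each residue keeps the plain sequence
and the expected modifications available separately for later mass
calculations.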
Once a search is defined, the user is presented with the
solution refinement page. It displays
the following sections:
· Search Parameters: Precursor mass type, fragment mass type and tolerance,
and Δm mode indicator for the current search. These may be changed at any time
during solution refinement.
· Scores: Quantitative measures of the solution quality, i.e., the degree to
which the current solution explains the available data.
  · P_Score: As presented by Meng et al., this score represents the
  probability of the protein match arising by chance, assuming a Poisson
  distribution of fragment masses.
  · PDE: As presented by Reid et al., this score takes into account the
  relative abundances of observed fragment ions, and the frequency of
  fragmentation at preferential cleavage sites.
· Fragments Explained: Displays the percentage of fragments in the MS/MS data
that are explained by the last set of scored modifications to the solution.
Below this number is a Rescore button. Pressing this button commits the
current modifications, computes the new scores and fragments-explained values,
and refreshes the page.
· Mass Difference: Displays the overall observed experimental mass, the
current theoretical mass of the proposed solution, and the difference between
the two.
· Protein Sequence: Displays the protein sequence specified during search
configuration, the B- and Y-ions computed from the MS/MS data, and the PTMs
currently proposed. Clicking a residue selects it, and allows the user to
modify the current proposed PTM for that residue in the Residue section.
· Residue: Displays two sections about the currently selected residue:
  · Residue Information: Contains information about the residue’s position,
  display encoding and currently selected modification.
  · PTM Choices: Contains information about the possible PTMs that could
  exist at the residue. These PTMs are restricted to those that are
  theoretically possible based on information contained in RESID. The user
  may select one of the possible modifications, no modification, or a custom,
  arbitrary modification with a user-defined mass.
· Fragments: Contains two expandable subsections:
  · Show Matching Fragments: Expands to a list containing information about
  the fragments that are explained by the currently proposed solution.
  · Show Non-Matching Fragments: Expands to a list containing information
  about the fragments that are not explained by the currently proposed
  solution.
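To make the Scores and Fragments Explained entries more concrete, here is a
small Python sketch of the two kinds of arithmetic involved. It is an
illustration only: it is not the published P_Score formula of Meng et al.,
and the way Sequence Gazer™ derives the expected number of random matches is
not described here, so expected_random_matches is a hypothetical input. Both
function names are assumptions made for this example.

```python
import math


def percent_explained(matched_count, total_count):
    """Percentage of observed fragments explained by the current solution."""
    if total_count == 0:
        return 0.0
    return 100.0 * matched_count / total_count


def poisson_tail(n_matches, expected_random_matches):
    """P(X >= n_matches) for X ~ Poisson(expected_random_matches).

    Generic Poisson tail probability: the chance of seeing at least
    n_matches matches if matches occurred purely at random.
    """
    below = sum(
        math.exp(-expected_random_matches)
        * expected_random_matches ** k
        / math.factorial(k)
        for k in range(n_matches)
    )
    return 1.0 - below


print(percent_explained(12, 48))  # 25.0
print(poisson_tail(10, 1.0))      # a small probability: 10 matches are unlikely by chance
```

The smaller the tail probability, the less plausible it is that the observed
number of matching fragments arose by chance.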
Usage
Define Your Search
To use Sequence Gazer™, you first need data to analyze. Sequence Gazer™ works on lists of neutral masses derived from MS/MS
experiments. It does not work on data in the m/z domain! You may use algorithms such as THRASH or Decon to convert m/z data to neutral mass data.
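As a reminder of what separates the two domains, the sketch below converts a
single m/z value with a known charge state to a neutral mass. It assumes
positive-mode ions charged by protonation, and the function name is
hypothetical. It is not a substitute for THRASH or similar algorithms, which
also determine the charge state and interpret the isotopic distribution.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at m/z `mz` carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Multiply out the charge, then remove the mass of the added protons.
    return charge * (mz - PROTON_MASS)


# A doubly protonated ion at m/z 501.007276 corresponds to a 1000 Da neutral species.
print(round(neutral_mass(501.007276, 2), 4))
```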

Figure 2 - Typical Input Data
Once you have a list of intact neutral masses and a list of
fragment neutral masses, it is time to define your search. Specify the type of mass that you are using
(monoisotopic or average) for both lists, specify your fragment tolerance, and
specify whether you want to use Δm mode. If you aren’t sure what parameters to use,
don’t worry - all of these parameters may be modified later on, while you are
refining your solution.
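To see how the fragment tolerance and Δm mode affect matching, consider the
following Python sketch. It is an illustration under stated assumptions, not
Sequence Gazer™'s implementation: the function names, the choice to express
ppm relative to the theoretical mass, and the example masses are all
hypothetical.

```python
def within_tolerance(observed, theoretical, tolerance, unit="Da"):
    """True if the two masses differ by no more than the tolerance."""
    difference = abs(observed - theoretical)
    if unit == "Da":
        return difference <= tolerance
    if unit == "ppm":
        # Assumption: ppm is expressed relative to the theoretical mass.
        return difference / theoretical * 1e6 <= tolerance
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def match_fragments(observed, theoretical, tolerance, unit="Da", delta_m=None):
    """Return (direct, shifted) lists of (observed, theoretical) pairs.

    direct:  matches using the observed masses as given.
    shifted: matches from the second pass, where delta_m is subtracted from
             each observed mass before comparison (empty if delta_m is None).
    """
    direct = [(obs, theo) for obs in observed for theo in theoretical
              if within_tolerance(obs, theo, tolerance, unit)]
    shifted = []
    if delta_m is not None:
        shifted = [(obs, theo) for obs in observed for theo in theoretical
                   if within_tolerance(obs - delta_m, theo, tolerance, unit)]
    return direct, shifted


theoretical = [500.0, 1000.0]   # hypothetical theoretical fragment masses
observed = [500.004, 1079.97]   # hypothetical observed fragment masses
direct, shifted = match_fragments(observed, theoretical, 0.01, "Da", delta_m=79.966)
print(direct)   # [(500.004, 500.0)]
print(shifted)  # [(1079.97, 1000.0)]
```

In this example the second observed fragment matches only after the mass
difference is subtracted, which suggests that the unexplained mass lies on
that fragment.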
You must also specify the sequence of the protein you are
attempting to characterize. This
sequence should be as accurate as possible, since differences in sequence
result in differences in mass, and lower the degree to which you can explain
the data you obtained.
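The effect of a sequence error on mass can be checked with a short
calculation. The Python sketch below sums standard monoisotopic residue
masses (rounded to five decimal places) and adds one water for the intact
chain. The function name is hypothetical, and the sketch covers only
unmodified sequences.

```python
# Standard monoisotopic residue masses in Da, rounded to five decimal places.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # Da, added once for the free N- and C-termini


def monoisotopic_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide or protein chain."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[residue] for residue in sequence) + WATER


correct = monoisotopic_mass("MAR")
mistaken = monoisotopic_mass("MGR")  # a single A -> G error in the sequence
print(round(correct, 5), round(mistaken, 5), round(correct - mistaken, 5))
```

A single alanine-to-glycine error shifts the theoretical mass by about
14.016 Da, far more than a typical fragment tolerance, so every fragment
containing that position would fail to match.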
You may specify the fixed modifications you used here. This is your only opportunity to do so -
there is no way to do it after you begin refining your solution, aside from
discarding all your work and starting a new search.
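The size of the mismatch caused by a missing fixed modification is easy to
estimate. The sketch below is a hypothetical illustration, not the rule
format Sequence Gazer™ uses: it models a fixed modification as a mass added
to every occurrence of a residue, using cysteine carbamidomethylation
(+57.02146 Da monoisotopic) as the example.

```python
# Hypothetical rule table: residue letter -> mass added to every occurrence.
# +57.02146 Da is the monoisotopic shift of cysteine carbamidomethylation.
FIXED_MODIFICATIONS = {"C": 57.02146}


def fixed_modification_shift(sequence, rules):
    """Total mass added to `sequence` by applying `rules` to every matching residue."""
    return sum(rules.get(residue, 0.0) for residue in sequence)


# Two cysteines, so the theoretical mass must increase by twice the shift.
print(round(fixed_modification_shift("ACDC", FIXED_MODIFICATIONS), 5))
```

If the sample was derivatized but the rule is left out of the search, every
observed mass that includes a cysteine is heavier than its theoretical
counterpart by a multiple of this shift.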
Finally, add the intact and fragment masses to the
appropriate boxes, and click Submit.
Refine Your Solution

Figure 3 - Analysis View
You will initially be presented with a solution based on the
unmodified protein sequence. You may now
place modifications on particular residues to increase the degree to which your
solution explains the data. Based on the
fragmentation pattern of the protein, specify the residues you believe are
modified, and the modifications on them.
Once you have done so, click Rescore.
If your explanation for the
difference between the observed mass and the theoretical mass is valid, your
overall score will improve, and the percentage of matching fragments will go
up. If not, both will decrease. Your goal is to continue characterizing the
sequence until you have reduced the difference between the observed mass and
the theoretical mass of your solution to within the measurement error of your
instrument.
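Whether the remaining difference is within measurement error can be checked
with a short calculation. The Python sketch below is an illustration only:
the function name, the example masses and the 5 ppm error figure are
hypothetical, and the appropriate error limit depends on your instrument.

```python
def mass_error(observed, theoretical):
    """Return the mass difference in Da and the same difference in ppm."""
    difference = observed - theoretical
    return difference, difference / theoretical * 1e6


INSTRUMENT_ERROR_PPM = 5.0  # hypothetical measurement error of the instrument

difference_da, difference_ppm = mass_error(10000.02, 10000.0)
print(round(difference_da, 4), round(difference_ppm, 2))
print(abs(difference_ppm) <= INSTRUMENT_ERROR_PPM)  # True: nothing left to explain
```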
Based on your expert
knowledge of the sample, you may wish to examine the list of matching and
non-matching fragments directly. By
clicking on the arrows below the sequence display (“Show Matching Fragments”,
“Show Non-Matching Fragments”), you may expand those particular lists.

Figure 4 - Matching Fragment List
The m/z columns in
Matching and Non-Matching Fragments, as well as the Intensity column in
Non-Matching Fragments, are not used in this version of Sequence Gazer™. They are used in NeuroProSight
and ProSightPC™ when data is derived from a larger
set of searches.