Inspect: A Proteomics Search Toolkit
Copyright 2007, The Regents of the University of California
Table of Contents
Overview
Copyright information
Installation
Database
Searching
Analysis
Basic Tutorial
Advanced Tutorial
Unrestricted Search Tutorial
Analysis
Inspect writes search results to a tab-delimited file. Up to ten search hits are written for each spectrum,
but typically all but the first can be discarded.
The quality of each match is measured by the F-score, a weighted sum of two factors. The first is the
MQScore, or match quality score (in column 6). The second
is the delta-score (in column 14), the difference in MQScore between this match and the best alternative.
Because delta-score is highly dependent on database size and search parameters, Inspect takes the ratio of
the delta-score to the average delta-score for all top-scoring matches.
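The normalization described above can be sketched in a few lines of Python. This is only an illustration, not Inspect's own code: the function names are invented here, and the weights `w_mq` and `w_delta` are hypothetical placeholders, since the actual weights are not given in this document.

```python
from statistics import mean

def normalized_delta_scores(delta_scores):
    """Divide each delta-score by the average delta-score of all top-scoring matches."""
    avg = mean(delta_scores)
    if avg == 0:
        raise ValueError("average delta-score is zero; cannot normalize")
    return [d / avg for d in delta_scores]

def f_score_sketch(mq_score, norm_delta, w_mq=1.0, w_delta=1.0):
    """Weighted sum of MQScore and the normalized delta-score.

    w_mq and w_delta are hypothetical placeholders, not the weights Inspect uses.
    """
    return w_mq * mq_score + w_delta * norm_delta
```

Dividing by the average keeps the delta-score term on a comparable scale across searches that differ in database size or search parameters.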
The preferred method to compute the false discovery rate (FDR) for a collection of matches is to employ a decoy
database. This method requires you to generate shuffled protein records before searching, using the "ShuffleDB" script
(see the Database section for details). Then run ComputeFDR.jar to compute the empirical false discovery
rate for a given F-score cutoff.
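The idea behind a decoy-based FDR can be illustrated with a short Python sketch. It is not the code of ComputeFDR.jar. It assumes that shuffled (decoy) protein names start with the prefix "XXX", that higher scores are better, and that the FDR is estimated as the number of decoy matches divided by the number of target matches at or above the cutoff, which is one common convention. The function name is invented for this example.

```python
def empirical_fdr(matches, cutoff, decoy_prefix="XXX"):
    """Estimate the FDR among matches scoring at or above `cutoff`.

    matches: iterable of (protein_name, score) pairs, one per top-scoring hit.
    decoy_prefix: assumed marker at the start of shuffled protein names.
    """
    targets = 0
    decoys = 0
    for protein, score in matches:
        if score < cutoff:
            continue
        if protein.startswith(decoy_prefix):
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        # No target matches pass: report 0 if nothing passes at all, else 1.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)
```

Raising the cutoff until the returned value drops below a chosen threshold (for example 0.01) gives a filtered set of matches at roughly that FDR.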
As of January 3, 2012, the columns have been updated slightly. Below is a list of all the columns and their meanings:
SpectrumFile - The file searched
Scan# - The scan number within the file; this value is 0 for .dta files. For MGF files, the scan# is equivalent to the SpecIndex, but uses 0-based numbering.
Annotation - Peptide annotation, with prefix and suffix and (non-fixed) modifications indicated.
Example: K.DFSQIDNAP+16EER.E
Protein - The name of the protein this peptide comes from. (Protein names are stored in the .index file
corresponding to the database .trie file.)
Charge - Precursor charge. If "multicharge" is set, or if no charge is specified in the source file, Inspect
attempts to guess the charge.
MQScore - Match quality score, the main measure of match quality.
Length - The length of the matched peptide in amino acids.
TotalPRMScore - Summed score for break points (between amino acids), based upon a Bayesian network modeling
fragmentation propensities.
MedianPRMScore - Median score for break points.
FractionY - The fraction of charge 1 y ions detected.
FractionB - The fraction of charge 1 b ions detected.
Intensity - Fraction of high-intensity peaks which are b or y fragments. For a length-n peptide, the top n*3
peaks are considered.
NTT - Number of tryptic termini (or Unused, if no protease was specified). Note that the N- and C-terminus of
a protein are both considered to be valid termini.
InspectFDR - This is the FDR of all matches with F-score equal to or greater than this match. Since Inspect knows
nothing about a decoy database, it is often best to run ComputeFDR.jar to compute an empirical FDR.
DeltaScore - Difference between the MQScore of this match and the best alternative
DeltaScoreOther - Difference between the MQScore of this match and the best alternative from a different locus.
To see the difference between this and the previous column, consider a search that finds similar matches
of the form "M+16MALGEER" and "MM+16ALGEER". In such a case, DeltaScore would be very small, but DeltaScoreOther
might still be large.
RecordNumber - Index of the protein record in the database
DBFilePos - Byte-position of this match within the database
SpecFilePos - Offset, in the input file, of this spectrum; useful for passing to the "Label" script (see below)
PrecursorMZ - The precursor m/z given in the spectrum file.
PrecursorError - The difference (in m/z units) between the precursor m/z given in the file and the theoretical m/z of the identified peptide.
SpecIndex - The one-based index of the spectrum in the original spectrum file. Only MS2+ spectra are counted.
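As an illustration of working with these columns, the following Python sketch reads tab-delimited result lines, keeps only the first hit listed for each spectrum, and splits an Annotation into its parts. It rests on several assumptions: the columns appear in exactly the order listed above, lines beginning with "#" are header or comment lines, the first hit listed for a spectrum is the best one, and modifications in annotations are written as signed integers (as in the examples in this section). The names `COLUMNS`, `read_top_hits`, and `split_annotation` are invented for this example.

```python
import re

# Column order as listed above (assumed to match the output file).
COLUMNS = (
    "SpectrumFile", "Scan#", "Annotation", "Protein", "Charge", "MQScore",
    "Length", "TotalPRMScore", "MedianPRMScore", "FractionY", "FractionB",
    "Intensity", "NTT", "InspectFDR", "DeltaScore", "DeltaScoreOther",
    "RecordNumber", "DBFilePos", "SpecFilePos", "PrecursorMZ",
    "PrecursorError", "SpecIndex",
)

def read_top_hits(lines, columns=COLUMNS):
    """Yield the first listed hit for each spectrum as a dict keyed by column name.

    lines: any iterable of tab-delimited text lines, such as an open file.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed header/comment lines
        row = dict(zip(columns, line.split("\t")))
        # A spectrum is identified by its file and its offset within that file.
        key = (row["SpectrumFile"], row.get("SpecFilePos"))
        if key in seen:
            continue  # a later, lower-ranked hit for the same spectrum
        seen.add(key)
        yield row

def split_annotation(annotation):
    """Split 'K.DFSQIDNAP+16EER.E' into (prefix, peptide, suffix, bare sequence)."""
    prefix, rest = annotation.split(".", 1)
    peptide, suffix = rest.rsplit(".", 1)
    # Remove signed-integer modifications such as +16 or -17.
    bare = re.sub(r"[+-]\d+", "", peptide)
    return prefix, peptide, suffix, bare
```

All values are left as strings; convert fields such as MQScore with `float()` as needed.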
Post-processing
Python scripts for performing various analyses are included in the distribution.
Run a script with no command-line parameters to print a list of available arguments.
Label.py - Given a spectrum and a peptide annotation, label the spectrum peaks with
their associated fragments. Produces a .png image for a spectrum, with associated peptide interpretation. Requires
the Python Imaging Library (PIL). Sample command:
Label.py Shewanella.mzXML:6200392 R.SNGSIGQNQ+14TPGR.V
ComputeFDR.jar - Given Inspect output, filter to a user-determined FDR. ComputeFDR.jar can be used for many experiments, but a typical invocation for Inspect results is:
java -jar ComputeFDR.jar -f InspectResult.out 3 XXX -n 1 -p 2 -s 14 1 -fdr 0.01
Summary.py - Given Inspect output, produce an HTML-format summary of the results. The report provides
a "protein-level" look at the results. This script is also used when
producing a "second-pass" protein database containing the proteins identified with high confidence.
PTMAnalysis.py - This script examines output from MS-Alignment (Inspect run in "blind" mode) and
highlights the most plausible evidence for PTMs. The script iteratively selects the most common
post-translational modifications and reports the selections. These selections require manual curation
and/or validation.
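Returning to Label.py: its sample command pairs a spectrum file name with a number after a colon, followed by a peptide annotation. Assuming that number is the SpecFilePos value from the results file (the column description suggests this, but it is an inference), the two arguments for one result row could be assembled as in this Python sketch. The helper name `label_args` is invented, and `row` is a plain dict mapping column names to string values.

```python
def label_args(row):
    """Build the two Label.py arguments from one result row.

    row: dict with at least the keys 'SpectrumFile', 'SpecFilePos' and 'Annotation'.
    Returns [file:offset, annotation], mirroring the sample command's layout.
    """
    spectrum = "{}:{}".format(row["SpectrumFile"], row["SpecFilePos"])
    return [spectrum, row["Annotation"]]
```

For a row with SpectrumFile "Shewanella.mzXML", SpecFilePos "6200392" and Annotation "R.SNGSIGQNQ+14TPGR.V", this yields the same two arguments as the sample Label.py command.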