Update documentation and change log
alchemistmatt committed Mar 23, 2021
1 parent 2822fcf commit 3cf5d79
Showing 3 changed files with 88 additions and 31 deletions.
49 changes: 24 additions & 25 deletions ZippedReleases/ReferenceFiles/Syntax.txt
@@ -1,32 +1,32 @@

Usage: java -Xmx3500M -jar MSGFPlus.jar
-s SpectrumFile (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt)
Spectra should be centroided (see below for MSConvert example). Profile spectra will be ignored.
-d DatabaseFile (*.fasta or *.fa or *.faa)
[-conf ConfigurationFile] (Configuration file path; options specified at the command line will override settings in the config file)
Example parameter file is at https://github.com/MSGFPlus/msgfplus/blob/master/docs/examples/MSGFPlus_Params.txt
[-s SpectrumFile] (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt)
Spectra should be centroided (see below for MSConvert example). Profile spectra will be ignored.
[-d DatabaseFile] (*.fasta or *.fa or *.faa)
[-decoy DecoyPrefix] (Prefix for decoy protein names; Default: XXX)
[-o OutputFile (*.mzid)] (Default: [SpectrumFileName].mzid)
[-t PrecursorMassTolerance] (e.g. 2.5Da, 20ppm or 0.5Da,2.5Da; Default: 20ppm)
Use a comma to define asymmetric values.
E.g. "-t 0.5Da,2.5Da" will set 0.5Da to the left (ObservedPepMass < TheoreticalPepMass)
and 2.5Da to the right (ObservedPepMass > TheoreticalPepMass)
[-ti IsotopeErrorRange] (Range of allowed isotope peak errors; Default:0,1)
Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation.
The combination of -t and -ti determines the precursor mass tolerance.
E.g. "-t 20ppm -ti -1,2" tests abs(ObservedPepMass - TheoreticalPepMass - n * 1.00335Da) < 20ppm for n = -1, 0, 1, 2.
[-thread NumThreads] (Number of concurrent threads to be executed; Default: Number of available cores)
Use a comma to define asymmetric values.
E.g. "-t 0.5Da,2.5Da" will set 0.5Da to the left (ObservedPepMass < TheoreticalPepMass)
and 2.5Da to the right (ObservedPepMass > TheoreticalPepMass)
[-ti IsotopeErrorRange] (Range of allowed isotope peak errors; Default: 0,1)
Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation.
The combination of -t and -ti determines the precursor mass tolerance.
E.g. "-t 20ppm -ti -1,2" tests abs(ObservedPepMass - TheoreticalPepMass - n * 1.00335Da) < 20ppm for n = -1, 0, 1, 2.
[-thread NumThreads] (Number of concurrent threads to be executed; Default: Number of available cores)
This is best set to the number of physical cores in a single NUMA node.
Generally a single NUMA node is 1 physical processor.
The default will try to use hyperthreading cores, which can increase the amount of time this process will take.
This is because the part of Scoring param generation that is multithreaded is also I/O intensive.
[-tasks NumTasks] (Override the number of tasks to use on the threads; Default: (internally calculated based on inputs))
More tasks than threads will reduce the memory requirements of the search, but will be slower (how much depends on the inputs).
1 <= tasks <= numThreads: will create one task per thread, which is the original behavior.
tasks = 0: use default calculation - minimum of: (threads*3) and (numSpectra/250).
tasks < 0: multiply number of threads by abs(tasks) to determine number of tasks (i.e., -2 means "2 * numThreads" tasks).
One task per thread will use the most memory, but will usually finish the fastest.
2-3 tasks per thread will use comparably less memory, but may cause the search to take 1.5 to 2 times as long.
[-tasks NumTasks] (Override the number of tasks to use on the threads; Default: (internally calculated based on inputs))
More tasks than threads will reduce the memory requirements of the search, but will be slower (how much depends on the inputs).
1 <= tasks <= numThreads: will create one task per thread, which is the original behavior.
tasks = 0: use default calculation - minimum of: (threads*3) and (numSpectra/250).
tasks < 0: multiply number of threads by abs(tasks) to determine number of tasks (i.e., -2 means "2 * numThreads" tasks).
One task per thread will use the most memory, but will usually finish the fastest.
2-3 tasks per thread will use comparably less memory, but may cause the search to take 1.5 to 2 times as long.
[-verbose 0/1] (Console output message verbosity, Default: 0)
0 means Report total progress only
1 means Report total and per-thread progress/status
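The -tasks rules listed above can be sketched in Python as follows. This is only an illustration of the documented rules, not the MS-GF+ source: the function name `resolve_num_tasks` is hypothetical, integer division for numSpectra/250 is an assumption, and so is passing through a value larger than the thread count unchanged.

```python
def resolve_num_tasks(tasks: int, num_threads: int, num_spectra: int) -> int:
    """Return the task count implied by a -tasks value (illustrative sketch)."""
    if tasks < 0:
        # tasks < 0: multiply the number of threads by abs(tasks)
        return num_threads * abs(tasks)
    if tasks == 0:
        # tasks = 0: default calculation, minimum of (threads*3) and (numSpectra/250)
        # Assumption: the division is an integer division.
        return min(num_threads * 3, num_spectra // 250)
    if tasks <= num_threads:
        # 1 <= tasks <= numThreads: one task per thread (the original behavior)
        return num_threads
    # Assumption: more tasks than threads is used as given
    return tasks


if __name__ == "__main__":
    print(resolve_num_tasks(-2, 8, 100000))  # "-2" means 2 * numThreads tasks
    print(resolve_num_tasks(0, 8, 100000))   # default calculation
```

With 8 threads, -tasks -2 gives 16 tasks, which by the notes above should lower memory use at some cost in run time.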
@@ -38,18 +38,18 @@ Usage: java -Xmx3500M -jar MSGFPlus.jar
1 means CID
2 means ETD
3 means HCD
[-inst MS2DetectorID] (0: Low-res LCQ/LTQ (Default), 1: Orbitrap/FTICR/Lumos, 2: TOF, 3: Q-Exactive)
[-e EnzymeID] (0: Unspecific cleavage, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage)
[-protocol ProtocolID] (0: Automatic (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho, 4: TMT, 5: Standard)
[-ntt 0/1/2] (Number of Tolerable Termini, Default: 2)
E.g. For trypsin, 0: non-tryptic, 1: semi-tryptic, 2: fully-tryptic peptides only.
[-inst InstrumentID] (0: Low-res LCQ/LTQ (Default), 1: Orbitrap/FTICR/Lumos, 2: TOF, 3: Q-Exactive)
[-e EnzymeID] (0: unspecific cleavage, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage)
[-protocol ProtocolID] (0: Automatic (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho, 4: TMT, 5: Standard)
[-ntt 0/1/2] (Number of Tolerable Termini, Default: 2)
E.g. For trypsin, 0: non-tryptic, 1: semi-tryptic, 2: fully-tryptic peptides only.
[-mod ModificationFileName] (Modification file; Default: standard amino acids with fixed C+57; only if -mod is not specified)
[-minLength MinPepLength] (Minimum peptide length to consider; Default: 6)
[-maxLength MaxPepLength] (Maximum peptide length to consider; Default: 40)
[-minCharge MinCharge] (Minimum precursor charge to consider if charges are not specified in the spectrum file; Default: 2)
[-maxCharge MaxCharge] (Maximum precursor charge to consider if charges are not specified in the spectrum file; Default: 3)
[-n NumMatchesPerSpec] (Number of matches per spectrum to be reported; Default: 1)
[-addFeatures 0/1] (Include additional features in the output, Default: 0)
[-addFeatures 0/1] (Include additional features in the output (enable this to post-process results with Percolator), Default: 0)
0 means Output basic scores only (Default)
1 means Output additional features
[-ccm ChargeCarrierMass] (Mass of charge carrier; Default: mass of proton (1.00727649))
@@ -60,7 +60,6 @@ Example (high-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzML -d IPI_h

Example (low-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzML -d IPI_human_3.79.fasta -inst 0 -t 0.5Da,2.5Da -ntt 2 -tda 1 -o testMSGFPlus.mzid -mod Mods.txt


For Thermo .raw files, obtain a centroided .mzML file using MSConvert, which is part of ProteoWizard (http://proteowizard.sourceforge.net/)
MSConvert.exe DatasetName.raw --filter "peakPicking true 1-" --mzML --32
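
When searches are scripted, the low-precision example above can be assembled as an argument list. The sketch below is a hypothetical helper (`build_msgfplus_command` is not part of MS-GF+); it uses only flags that appear in the usage text and assumes `java` and `MSGFPlus.jar` are reachable from the working directory.

```python
import subprocess


def build_msgfplus_command(spectrum_file, database_file, options,
                           jar="MSGFPlus.jar", max_heap="3500M"):
    """Build the argument list for an MS-GF+ search (hypothetical helper)."""
    cmd = ["java", f"-Xmx{max_heap}", "-jar", jar,
           "-s", spectrum_file, "-d", database_file]
    # Options are appended in insertion order as "-flag value" pairs
    for flag, value in options.items():
        cmd += [f"-{flag}", str(value)]
    return cmd


# Options taken from the low-precision example in the usage text
low_precision = {
    "inst": 0,
    "t": "0.5Da,2.5Da",
    "ntt": 2,
    "tda": 1,
    "o": "testMSGFPlus.mzid",
    "mod": "Mods.txt",
}
cmd = build_msgfplus_command("test.mzML", "IPI_human_3.79.fasta", low_precision)

if __name__ == "__main__":
    print(" ".join(cmd))
    # To actually run the search (requires Java and the jar):
    # subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with values such as "0.5Da,2.5Da".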

8 changes: 8 additions & 0 deletions docs/Changelog.html
@@ -12,6 +12,14 @@ <h1 class="pagetitle">MS-GF+ ChangeLog</h1>
<p>
<a href="index.html">MS-GF+ Documentation home</a>
</p>

<p>
<b>v2021.03.22</b>
</p>
<ul>
<li>When displaying parameters, show the value for IgnoreMetCleavage</li>
<li>Update online documentation, including example parameter files</li>
</ul>

<p>
<b>v2021.01.15</b>
62 changes: 56 additions & 6 deletions docs/MSGFPlus.html
@@ -14,10 +14,6 @@ <h1 class="pagetitle">MS-GF+</h1>

<h1>MS-GF+</h1>

<p>
<a href="MS-GFDB.html" title="MS-GFDB">(How to migrate from MS-GFDB to MS-GF)</a>
</p>

<p>
<a href="Changelog.html" title="MS-GF+ ChangeLog">ChangeLog</a>
</p>
@@ -89,6 +85,8 @@ <h1>MS-GF+</h1>

<span class="code-keyword">[-ccm ChargeCarrierMass]</span> (Mass of charge carrier; <span class="code-object">Default: mass of proton (1.00727649)</span>)

<span class="code-keyword">[-ignoreMetCleavage 0/1]</span> (N-terminal methionine cleavage behavior; <span class="code-object">Default: 0</span>)

<span class="code-keyword">[-maxMissedCleavages Count]</span> (Exclude peptides with more than this number of missed cleavages from the search; <span class="code-object">Default: -1 (no limit)</span>)

<span class="code-keyword">[-numMods Count]</span> (Maximum number of dynamic (variable) modifications per peptide; <span class="code-object">Default: 3</span>)
@@ -121,6 +119,7 @@ <h3>Parameters:</h3>
</ul>
<p class="note"><code>MSConvert.exe --mzML --32 --filter "peakPicking true 1-" DatasetName.raw</code></p>
</li>

<li style="margin-bottom: 10px;">
<b>-d DatabaseFile</b> (*.fasta or *.fa or *.faa) - Required
<ul>
@@ -129,6 +128,7 @@ <h3>Parameters:</h3>
</ul>
<p class="note">If multiple MS-GF+ processes access the same database file, it is strongly recommended to index the database prior to the database search by <a href="BuildSA.html">running BuildSA</a>.</p>
</li>

<li style="margin-bottom: 10px;">
<b>-conf ConfigurationFile</b>
<ul>
@@ -159,6 +159,7 @@ <h3>Parameters:</h3>
<li>E.g. for the input spectrum file "test.mzML", the output will be written to "test.mzid" if this parameter is not specified.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-t PrecursorMassTolerance</b> (Default: 20ppm)
<ul>
@@ -167,6 +168,7 @@ <h3>Parameters:</h3>
<li>It is recommended to use a tight tolerance rather than a loose tolerance (e.g. for Orbitrap data, 10ppm or 20ppm usually identifies more spectra than 50ppm).</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-ti IsotopeErrorRange</b> (Default: 0,1)
<ul>
@@ -176,13 +178,15 @@ <h3>Parameters:</h3>
<li>E.g. <span class="code-keyword"><code>-t 20ppm -ti -1,2</code></span> tests abs(ObservedPepMass - TheoreticalPepMass - n * 1.00335Da) &lt; 20ppm for n = -1, 0, 1, 2</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-thread NumOfThreads</b> (Default: Number of available cores)
<ul>
<li>Number of concurrent threads to be executed together.</li>
<li>Default value is the number of available logical cores (e.g. 8 for quad-core processor with hyper-threading support).</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-tasks NumTasks</b> (Default: internally calculated based on inputs)
<ul>
@@ -197,6 +201,7 @@ <h3>Parameters:</h3>
<li>2-3 tasks per thread will use comparably less memory, but may cause the search to take 1.5 to 2 times as long with a 23MB fasta file.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-verbose 0/1</b> (Default: 0)
<ul>
@@ -250,6 +255,7 @@ <h3>Parameters:</h3>
<li>If the identifier is 4, MS/MS spectra from the same precursor ion (e.g. CID/ETD pairs, CID/HCD/ETD triplets) will be merged and the &quot;merged&quot; spectrum will be used for searching instead of individual spectra. See Kim et al., MCP 2010 for details.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-inst InstrumentID</b>
<ul>
@@ -269,12 +275,13 @@ <h3>Parameters:</h3>
<li>For other HCD spectra, use 1.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-e EnzymeID</b> (Default: 1)
<ul>
<li>Enzyme identifier.
<ul>
<li>0: unspecific cleavage</li>
<li>0: unspecific cleavage (cleave after any residue)</li>
<li>1: Trypsin (default)</li>
<li>2: Chymotrypsin</li>
<li>3: Lys-C</li>
@@ -291,6 +298,7 @@ <h3>Parameters:</h3>
<li>For more info, see <a href="examples/enzymes.txt">enzymes.txt</a></li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-p ProtocolID</b> (Default: 0)
<ul>
@@ -338,38 +346,44 @@ <h3>Parameters:</h3>
See an <a href="examples/Mods.txt">example MS-GF+ modification file</a>.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-minLength MinPepLength</b> (Default: 6)
<ul>
<li>Minimum length of the peptide to be considered.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-maxLength MaxPepLength</b> (Default: 40)
<ul>
<li>Maximum length of the peptide to be considered.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-minCharge MinPrecursorCharge</b> (Default: 2)
<ul>
<li>Minimum precursor charge to consider. This parameter is used only for spectra with no charge.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-maxCharge MaxPrecursorCharge</b> (Default: 3)
<ul>
<li>Maximum precursor charge to consider. This parameter is used only for spectra with no charge.</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-n NumMatchesPerSpec</b> (Default: 1)
<ul>
<li>Number of peptide matches per spectrum to report.</li>
<li>Expected false discovery rates (EFDRs) will be reported only when this value is 1.</li>
</ul>
</li>
<li>

<li style="margin-bottom: 10px;">
<b>-addFeatures 0/1</b> (Default: 0)
<ul>
<li>If 0, only basic scores are reported.</li>
@@ -383,6 +397,37 @@ <h3>Parameters:</h3>
</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-ccm ChargeCarrierMass</b> (Default: 1.00727649)
<ul>
<li>Override the default charge carrier mass</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-ignoreMetCleavage 0/1</b> (Default: 0)
<ul>
<li>0: consider cleavage of methionine from the protein's N-terminus, even when NTT=2</li>
<li>1: disable N-terminal methionine cleavage</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-maxMissedCleavages Count</b> (Default: -1, meaning no limit)
<ul>
<li>Exclude peptides with more than this number of missed cleavages</li>
</ul>
</li>

<li style="margin-bottom: 10px;">
<b>-numMods Count</b> (Default: 3)
<ul>
<li>Maximum number of dynamic (variable) modifications per peptide</li>
<li>If this value is large and multiple dynamic modifications are defined, the search will be slow (depending on FASTA file size)</li>
</ul>
</li>

</ul>
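
The precursor check described for -t and -ti, abs(ObservedPepMass - TheoreticalPepMass - n * 1.00335Da) &lt; tolerance, can be written out as a small Python sketch. The function name is hypothetical, and converting ppm to Da relative to the theoretical mass is an assumption; the documentation does not say which mass the ppm value is taken against.

```python
ISOTOPE_SPACING_DA = 1.00335  # isotope spacing quoted in the -ti description


def within_precursor_tolerance(observed, theoretical,
                               tol_ppm=20.0, isotope_range=(0, 1)):
    """Test the -t/-ti condition for one observed/theoretical mass pair (sketch)."""
    # Assumption: the ppm tolerance is relative to the theoretical peptide mass
    tol_da = theoretical * tol_ppm * 1e-6
    first, last = isotope_range
    # Try every allowed isotope error n in the inclusive range
    return any(
        abs(observed - theoretical - n * ISOTOPE_SPACING_DA) < tol_da
        for n in range(first, last + 1)
    )


if __name__ == "__main__":
    theoretical = 1500.0
    observed = theoretical + 2 * ISOTOPE_SPACING_DA  # second isotope peak selected
    print(within_precursor_tolerance(observed, theoretical))  # default -ti 0,1
    print(within_precursor_tolerance(observed, theoretical, isotope_range=(-1, 2)))
```

In this sketch a precursor picked two isotope peaks above the monoisotopic mass fails under the default range 0,1 and passes under -ti -1,2.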

<h3>MS-GF+ output</h3>
@@ -392,22 +437,27 @@ <h3>MS-GF+ output</h3>
<li style="margin-bottom: 5px;">
<b>MS-GF:RawScore</b>: MS-GF+ raw score of the peptide-spectrum match
</li>

<li style="margin-bottom: 5px;">
<b>MS-GF:DeNovoScore</b><b>:</b> the score of the optimal scoring peptide for the spectrum (not necessarily in the database)&nbsp;(MS-GF:RawScore &lt;= MS-GF:DeNovoScore)
</li>

<li style="margin-bottom: 5px;">
<b>MS-GF:SpecEValue</b>: spectral E-value (spectrum level E-value) of the peptide-spectrum match - the lower the better
</li>

<li style="margin-bottom: 5px;">
<b>MS-GF:EValue</b>: database level E-value (expected number of peptides in a random database having equal or better scores than the PSM score) - the lower the better
</li>

<li style="margin-bottom: 5px;">
<b>MS-GF:QValue</b>
<ul>
<li>PSM-level Q-value estimated using the target-decoy approach.</li>
<li>MS-GF:QValue is computed solely based on MS-GF:SpecEValue.</li>
</ul>
</li>

<li style="margin-bottom: 5px;">
<b>MS-GF:PepQValue</b>
<ul>

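For readers unfamiliar with the target-decoy approach mentioned for MS-GF:QValue, here is a generic Python sketch of how PSM-level Q-values can be derived from SpecEValue alone. It is not taken from the MS-GF+ source: the function name, the FDR estimate (decoys divided by targets) and the handling of ties are all assumptions made for illustration.

```python
def target_decoy_q_values(psms):
    """Compute q-values for (spec_e_value, is_decoy) pairs (generic sketch).

    Returns a list of q-values aligned with the input order.
    """
    # Rank PSMs from best (lowest SpecEValue) to worst
    order = sorted(range(len(psms)), key=lambda i: psms[i][0])

    # Estimated FDR at each rank: decoys / targets seen so far (assumed formula)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)

    # q-value: the smallest FDR at this rank or any worse rank
    q_values = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


if __name__ == "__main__":
    example = [(1e-12, False), (1e-10, False), (1e-9, True),
               (1e-8, False), (1e-7, True)]
    print(target_decoy_q_values(example))
```

Taking the running minimum from the worst rank upward makes the q-values non-decreasing in SpecEValue, so thresholding on a q-value is equivalent to thresholding on SpecEValue.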