PTM Stoichiometry #797

Draft: wants to merge 61 commits into base: master
Changes from 2 commits
Commits (61):
8bf52b1
Bug fix. Previous ParseModifications implementation could give negati…
pcruzparri Aug 27, 2024
dcede87
Saving draft implementation of a site-occupancy calculation.
pcruzparri Aug 27, 2024
0e59e82
Merge branch 'master' of https://github.com/smith-chem-wisc/mzLib int…
pcruzparri Sep 12, 2024
41ef6f4
Saving some initial progress on the occupancy calculation. Started Mx…
pcruzparri Sep 13, 2024
f06af28
temp
pcruzparri Sep 17, 2024
2ebe188
Merge branch 'master' of https://github.com/smith-chem-wisc/mzLib int…
pcruzparri Oct 11, 2024
b8fb4cb
PTM calculation implemented into FlashLFQ engine. Base method exists …
pcruzparri Oct 11, 2024
8d8658d
Removed the sandbox test Peter and changed the default arguments of P…
pcruzparri Oct 11, 2024
8fb7360
Added fixes to the FlashLFQResults and PositionFrequencyAnalysis impl…
pcruzparri Oct 11, 2024
74ed705
Fixed flipped logic in FlashLFQ/Peptide.GetTotalIntensity(). Cleaned …
pcruzparri Oct 14, 2024
6c18e9f
Transcriptomics Digestion and Fragmentation (#801)
nbollis Oct 15, 2024
68165b0
Refactored the PositionFrequencyAnalysis code to eliminate the nested…
pcruzparri Oct 18, 2024
58e6346
Bug fix. Previous ParseModifications implementation could give negati…
pcruzparri Aug 27, 2024
f0d67d0
Saving draft implementation of a site-occupancy calculation.
pcruzparri Aug 27, 2024
7b04937
Saving some initial progress on the occupancy calculation. Started Mx…
pcruzparri Sep 13, 2024
d2c240e
temp
pcruzparri Sep 17, 2024
af278f0
PTM calculation implemented into FlashLFQ engine. Base method exists …
pcruzparri Oct 11, 2024
ef3ec35
Removed the sandbox test Peter and changed the default arguments of P…
pcruzparri Oct 11, 2024
f577298
Added fixes to the FlashLFQResults and PositionFrequencyAnalysis impl…
pcruzparri Oct 11, 2024
f21d365
Fixed flipped logic in FlashLFQ/Peptide.GetTotalIntensity(). Cleaned …
pcruzparri Oct 14, 2024
f6caa30
Refactored the PositionFrequencyAnalysis code to eliminate the nested…
pcruzparri Oct 18, 2024
b146768
Merge branch 'ptm_stoich' of https://github.com/pcruzparri/mzLib into…
pcruzparri Oct 18, 2024
dc20e44
Neutral Mass Spectrum (#806)
nbollis Oct 29, 2024
7dcf9a9
Updated SpectraFileAveraging.cs to include the ScanFilter parameter (…
nbollis Oct 29, 2024
cb08d67
MSFragger Results Folder Reader (#792)
mzhastings Oct 30, 2024
b055693
Get Modifications from Full Sequence (#796)
nbollis Oct 30, 2024
e5cf73e
Changes to MBR within FlashLFQ (#802)
Alexander-Sol Nov 5, 2024
6411360
Fixed decoy order (#809)
Alexander-Sol Nov 18, 2024
848413d
saving progress on PeptideToProteinPTMOccupancy and updated Regex mod…
pcruzparri Dec 6, 2024
fed869f
Averaging slight adjustement for ensured thread safety (#810)
nbollis Dec 12, 2024
21c1702
Extended IBioPolymerWithSetMods interface, changed hashcodes (#811)
Alexander-Sol Dec 12, 2024
d98326b
IEquality Hotfix (#817)
nbollis Dec 16, 2024
90bd259
Changing the github actions workflow to test integration with MetaMor…
Alexander-Sol Dec 17, 2024
dc44773
IEquality hot fix hot fix (#819)
nbollis Dec 18, 2024
264521b
Db writer fix (#820)
Alexander-Sol Jan 10, 2025
5443e36
IsoDec Deconvolution Algorithm (#791)
nbollis Jan 10, 2025
331ee1d
Object Pooling (#822)
nbollis Jan 15, 2025
1b8b950
Digestion Consolidation and Optimization (#823)
nbollis Jan 17, 2025
5d9671a
Digestion: Fixed mod terminal fix and variable mod ordering (#825)
nbollis Jan 18, 2025
98ea879
Mzml writer now rounds to 4 decimal places (#821)
Alexander-Sol Jan 18, 2025
e33a478
Tims tof reader (#812)
Alexander-Sol Jan 19, 2025
de68239
Put those mods back where they came from or so help me (#827)
nbollis Jan 22, 2025
5d2b774
Changed rounding method (#826)
Alexander-Sol Jan 22, 2025
981fc57
Undo Rounding (#831)
Alexander-Sol Jan 31, 2025
6ed70c6
Change github action workflow to upload MM installer artifact (#829)
Alexander-Sol Jan 31, 2025
9835b78
Updated nuspec (#830)
Alexander-Sol Jan 31, 2025
7cdd05d
enable reading of lipid mods from protein xml with test (#828)
trishorts Jan 31, 2025
03f462b
Bug fix. Previous ParseModifications implementation could give negati…
pcruzparri Aug 27, 2024
e4d2853
Saving draft implementation of a site-occupancy calculation.
pcruzparri Aug 27, 2024
741dbd6
Saving some initial progress on the occupancy calculation. Started Mx…
pcruzparri Sep 13, 2024
82af7a6
temp
pcruzparri Sep 17, 2024
def5faa
PTM calculation implemented into FlashLFQ engine. Base method exists …
pcruzparri Oct 11, 2024
29ca983
Removed the sandbox test Peter and changed the default arguments of P…
pcruzparri Oct 11, 2024
282c1b5
Added fixes to the FlashLFQResults and PositionFrequencyAnalysis impl…
pcruzparri Oct 11, 2024
aa1aaec
Fixed flipped logic in FlashLFQ/Peptide.GetTotalIntensity(). Cleaned …
pcruzparri Oct 14, 2024
91dddfd
Refactored the PositionFrequencyAnalysis code to eliminate the nested…
pcruzparri Oct 18, 2024
cb3afd5
temp
pcruzparri Sep 17, 2024
4b3e4b9
PTM calculation implemented into FlashLFQ engine. Base method exists …
pcruzparri Oct 11, 2024
b81d218
Added fixes to the FlashLFQResults and PositionFrequencyAnalysis impl…
pcruzparri Oct 11, 2024
43f525b
saving progress on PeptideToProteinPTMOccupancy and updated Regex mod…
pcruzparri Dec 6, 2024
463d65a
rebased onto updated master
pcruzparri Feb 5, 2025
21 changes: 13 additions & 8 deletions mzLib/Omics/SpectrumMatch/SpectrumMatchFromTsv.cs
@@ -107,27 +107,32 @@ public static Dictionary<int, List<string>> ParseModifications(string fullSeq)
             //int patternMatches = regex.Matches(fullSeq).Count;
             Dictionary<int, List<string>> modDict = new();

+
+            // If there is a missed cleavage, then there will be a label on K and a Label on X modification.
+            // It'll be like [label]|[label] which complicates the positional stuff a little bit. Therefore,
+            // RemoveSpecialCharacters will remove the "|", to ease things later on.
+            RemoveSpecialCharacters(ref fullSeq);
             MatchCollection matches = regex.Matches(fullSeq);
-            int currentPosition = 0;
+            int captureLengthSum = 0;
             foreach (Match match in matches)
             {
                 GroupCollection group = match.Groups;
                 string val = group[1].Value;
                 int startIndex = group[0].Index;
                 int captureLength = group[0].Length;
                 int position = group["(.+?)"].Index;

                 List<string> modList = new List<string>();
                 modList.Add(val);
+
+                // The position of the amino acids is tracked by the positionToAddToDict variable. It takes the
+                // startIndex of the modification Match and removes the cumulative length of the modifications
+                // found (including the brackets). The difference will be the number of nonmodification characters,
+                // or the number of amino acids prior to the startIndex in the sequence.
+                int positionToAddToDict = startIndex - captureLengthSum;

                 // check to see if key already exist
-                // if there is a missed cleavage, then there will be a label on K and a Label on X modification.
-                // And, it'll be like [label]|[label] which complicates the positional stuff a little bit.
-                // if the already key exists, update the current position with the capture length + 1.
-                // otherwise, add the modification to the dict.

-                // int to add is startIndex - current position
-                int positionToAddToDict = startIndex - currentPosition;
                 if (modDict.ContainsKey(positionToAddToDict))
                 {
                     modDict[positionToAddToDict].Add(val);
@@ -136,7 +141,7 @@ public static Dictionary<int, List<string>> ParseModifications(string fullSeq)
                 {
                     modDict.Add(positionToAddToDict, modList);
                 }
-                currentPosition += startIndex + captureLength;
+                captureLengthSum += captureLength;
             }
             return modDict;
         }
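The position arithmetic in the ParseModifications hunk can be illustrated in isolation. What follows is a minimal sketch, not the mzLib implementation: the class name `ModPositionSketch`, the simplified pattern `\[(.+?)\]`, and the `Replace("|", "")` stand-in for `RemoveSpecialCharacters` are assumptions made for illustration, because the actual regex and helper are defined outside the lines shown in the diff.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ModPositionSketch
{
    // Assumed pattern: a modification is any bracketed run such as "[label]".
    // The regex used in mzLib is not visible in the diff and may differ.
    private static readonly Regex ModPattern = new Regex(@"\[(.+?)\]");

    public static Dictionary<int, List<string>> ParseModifications(string fullSeq)
    {
        var modDict = new Dictionary<int, List<string>>();

        // Hypothetical stand-in for RemoveSpecialCharacters: drop the '|'
        // that separates two labels on the same residue.
        fullSeq = fullSeq.Replace("|", "");

        // Total length of all modification captures seen so far, brackets included.
        int captureLengthSum = 0;
        foreach (Match match in ModPattern.Matches(fullSeq))
        {
            // Characters before this match that are not part of any modification,
            // i.e. the number of residues preceding the modification.
            int position = match.Index - captureLengthSum;

            if (!modDict.TryGetValue(position, out var mods))
            {
                mods = new List<string>();
                modDict[position] = mods;
            }
            mods.Add(match.Groups[1].Value);

            captureLengthSum += match.Length;
        }
        return modDict;
    }
}
```

For the input `PEP[a]TI[b]DE[c]K` the matches start at string indices 3, 8, and 13 and are each 3 characters long, so the sketch produces keys 3, 5, and 7. Applying the removed arithmetic from the diff to the same input (`startIndex - currentPosition`, followed by `currentPosition += startIndex + captureLength`) would give 3, 2, and -4 instead.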
112 changes: 112 additions & 0 deletions mzLib/Test/FileReadingTests/TestPsmFromTsv.cs
@@ -10,6 +10,21 @@
 using Omics.SpectrumMatch;
 using Proteomics;
 using Readers;
+using UsefulProteomicsDatabases;
+using MathNet.Numerics.Distributions;
+using static UsefulProteomicsDatabases.ProteinDbRetriever;
+using System.Windows.Input;
+using Readers.QuantificationResults;
+using Easy.Common.Extensions;
+using MzIdentML;
+using Proteomics.ProteolyticDigestion;
+using Omics.Modifications;
+using Proteomics.AminoAcidPolymer;
+using FlashLFQ;
+using System.Printing.IndexedProperties;
+using pepXML.Generated;
+using System.Diagnostics.Eventing.Reader;
+using Omics;

 namespace Test.FileReadingTests
 {
@@ -76,7 +91,104 @@ public static void ReadOGlycoPsmsLocalizedGlycans()
             }

             Assert.AreEqual(1, localGlycans.Count);
         }
+        [Test]
+        public static void Peter()
+        {
+            // load database file and quantified peak file.
+            string dbFilePath1 = @"C:\Users\student\Downloads\MetaMorpheusVignette\MetaMorpheusVignette\uniprot-mouse-reviewed-1-24-2018.xml.gz";
+            string dbFilePath2 = @"C:\Users\student\Downloads\MetaMorpheusVignette\MetaMorpheusVignette\uniprot-cRAP-1-24-2018.xml.gz";
+            string quantifiedPeakFilePath = @"C:\Users\student\Downloads\MetaMorpheusVignette\MetaMorpheusVignette\2024-07-25-11-27-37\Task1-SearchTask\AllQuantifiedPeaks.tsv";
+            string quantifiedProteinGroupsFilePath = @"C:\Users\student\Downloads\MetaMorpheusVignette\MetaMorpheusVignette\2024-07-25-11-27-37\Task1-SearchTask\AllQuantifiedPeaks.tsv";
+
+
+            List<Protein> proteinDb = ProteinDbLoader.LoadProteinXML(dbFilePath1, true, DecoyType.None, null, false, null, out var a);
+            proteinDb.AddRange(ProteinDbLoader.LoadProteinXML(dbFilePath2, true, DecoyType.None, null, false, null, out var b));
+            var proteinDbDict = proteinDb.ToEasyDictionary(n => n.Accession);
+
+            var peaksFile = new QuantifiedPeakFile(quantifiedPeakFilePath);
+            peaksFile.LoadResults();
+
+            var ProteinGroupsFile = new SpectrumMatchFromTsvFile(quantifiedProteinGroupsFilePath);
+            ProteinGroupsFile.LoadResults();
+
+            var occupancyDict = new Dictionary<String, Dictionary<int, Dictionary<string, double>>>();
+            var proteinSeqRangesSeen = new Dictionary<String, List<Tuple<int, int, double>>>();
+            var proteinsNotInDb = new List<string>();
+
+            // go through the quantified peaks
+            foreach (var peak in peaksFile)
+            {
+                // if ambiguous peptides for a peak, go through each peptide
+                var peptidesFull = peak.FullSequence.Split('|');
+                List<string> peptidesBase = new List<string>();
+                foreach (string peptide in peptidesFull)
+                    peptidesBase.Add(IBioPolymerWithSetMods.GetBaseSequenceFromFullSequence(peptide));
+
+                for (int i = 0; i < peptidesFull.Length; i++)
+                {
+                    var peptideMods = Omics.SpectrumMatch.SpectrumMatchFromTsv.ParseModifications(peptidesFull[i]);
+
+                    // go through each protein of the protein group
+                    foreach (string protein in peak.ProteinGroup.SplitAndTrim(new char[] { '|', ';' }))
+                    {
+                        int offset = 0;
+                        // try setting offset to the peptide start residue index, and catch when the protein is not in the database
+                        try
+                        {
+                            offset = proteinDbDict[protein].BaseSequence.IndexOf(peptidesBase[i]);
+
+                            // if the peptide is not a substring of the protein sequence, move on to the next protein
+                            if (offset < 0)
+                                break;
+                        }
+                        catch
+                        {
+                            if (!proteinsNotInDb.Contains(protein))
+                                proteinsNotInDb.Add(protein);
+                        }
+
+                        // add the protein to the occupancy dictionary if it has not been seen, yet
+                        if (!occupancyDict.ContainsKey(protein))
+                        {
+                            occupancyDict.Add(protein, new Dictionary<int, Dictionary<string, double>>());
+                            proteinSeqRangesSeen.Add(protein, new List<Tuple<int, int, double>>());
+                        }
+
+                        proteinSeqRangesSeen[protein].Add(new Tuple<int, int, double>(offset, offset + peptidesBase[i].Length, peak.PeakIntensity));
+
+                        // get the localized modifications from the peptide full sequence and add any amino acid/modification combination not
+                        // seen yet to the occupancy dictionary
+                        foreach (KeyValuePair<int, List<string>> aaWithModList in peptideMods)
[Review comment, Contributor] In situations like this, you can use "var aaWithModList" instead of specifying the actual class.
+                        {
+                            int aa = aaWithModList.Key + offset;
+                            if (!occupancyDict[protein].ContainsKey(aaWithModList.Key))
+                                occupancyDict[protein].Add(aaWithModList.Key, new Dictionary<string, double> { { "Total", 0 } });
+
+                            foreach (string mod in aaWithModList.Value)
+                            {
+                                if (!occupancyDict[protein][aaWithModList.Key].ContainsKey(mod))
+                                    occupancyDict[protein][aaWithModList.Key][mod] = 0;
+
+                                occupancyDict[protein][aaWithModList.Key][mod] += peak.PeakIntensity;
+                            }
+                        }
+                    }
+                }
+            }
+
+            // Add the total intensity for each aa of each protein with a modification
+            foreach (string protein in occupancyDict.Keys)
+            {
+                foreach (int aa in occupancyDict[protein].Keys)
+                {
+                    foreach ((int start, int end, double intensity) in proteinSeqRangesSeen[protein])
+                    {
+                        if ((start <= aa) && (aa <= end))
+                            occupancyDict[protein][aa]["Total"] += intensity;
+                    }
+                }
+            }
+        }

         [Test]
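The bookkeeping in the Peter test (intensity summed per site and modification, plus a total taken from every quantified peptide whose range covers the site) can be condensed into a small standalone sketch. Everything named here is hypothetical and not part of mzLib or FlashLFQ: the `OccupancySketch` class, the `QuantifiedPeptide` record, and the `SiteOccupancy` method. Dividing the modified intensity by the total to obtain a fraction is also an assumption about the intended stoichiometry, since the test shown only accumulates the sums. As a design choice, the sketch keys each site by its protein-level position (peptide start offset plus position within the peptide).

```csharp
using System;
using System.Collections.Generic;

public static class OccupancySketch
{
    // Hypothetical input type: one quantified peptide mapped onto a protein.
    // Start is the zero-based offset of the peptide in the protein sequence;
    // Mods maps a position within the peptide to the modifications found there.
    public sealed record QuantifiedPeptide(
        int Start, int Length, double Intensity, Dictionary<int, List<string>> Mods);

    // Returns, per protein site, the fraction of covering intensity that carries each modification.
    public static Dictionary<int, Dictionary<string, double>> SiteOccupancy(
        IReadOnlyList<QuantifiedPeptide> peptides)
    {
        // Pass 1: sum intensity per (site, modification).
        var modIntensity = new Dictionary<int, Dictionary<string, double>>();
        foreach (var peptide in peptides)
        {
            foreach (var entry in peptide.Mods)
            {
                int site = peptide.Start + entry.Key;
                if (!modIntensity.TryGetValue(site, out var byMod))
                {
                    byMod = new Dictionary<string, double>();
                    modIntensity[site] = byMod;
                }
                foreach (string mod in entry.Value)
                {
                    byMod.TryGetValue(mod, out double soFar); // soFar is 0 when the key is absent
                    byMod[mod] = soFar + peptide.Intensity;
                }
            }
        }

        // Pass 2: total intensity of all peptides covering each modified site,
        // using the same inclusive range check as the test (start <= site <= end).
        var occupancy = new Dictionary<int, Dictionary<string, double>>();
        foreach (var siteEntry in modIntensity)
        {
            int site = siteEntry.Key;
            double total = 0;
            foreach (var peptide in peptides)
            {
                if (peptide.Start <= site && site <= peptide.Start + peptide.Length)
                    total += peptide.Intensity;
            }

            var fractions = new Dictionary<string, double>();
            foreach (var modEntry in siteEntry.Value)
                fractions[modEntry.Key] = total > 0 ? modEntry.Value / total : 0;
            occupancy[site] = fractions;
        }
        return occupancy;
    }
}
```

With two peptides covering the same range, one modified at peptide position 2 with intensity 30 and one unmodified with intensity 70, the sketch reports a fraction of about 0.3 for that modification at protein site `Start + 2`.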