PTM Stoichiometry #797

Draft: wants to merge 61 commits into base: master


@pcruzparri pcruzparri commented Aug 27, 2024

This PR adds an mzLib method that calculates the stoichiometry (site occupancy) of PTMs from the intensity of each quantified peak. The current inputs are the path(s) to the protein database .xml file(s) and the path to the AllQuantifiedPeaks.tsv file. The output, occupancyDict, is currently a dictionary of nested dictionaries with the following structure:

{{string PROTEIN1, {{int MAA1, {{string MODNAME1, double INTENSITY}, 
                                {string MODNAME2, double INTENSITY},
                                ...,
                                {string "Total", double INTENSITY}}} 
                    {int MAA2, {...}}, 
                   ...}},
 {string PROTEIN2, {...}},
 ...}

where PROTEINX is the protein accession, MAAX is the modified amino acid at protein position X, and MODNAME1 is the full label of the modification. For each MAAX, there is a "Total" key (instead of a modification name) that holds the total intensity of that amino acid measured in the quantified peaks file, including modified and unmodified peptides with that specific residue.
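
As a rough illustration of that layout (a Python sketch, not the PR's C# code), the nested dictionary and a per-site occupancy derived from it might look like the following. The accession, positions, modification labels, and intensities are made up, and dividing each modification's intensity by the site's "Total" is an assumption about how the stoichiometry is meant to be read off the structure.

```python
# Hypothetical example data, shaped like occupancyDict:
# accession -> protein position -> {modification label or "Total" -> intensity}
occupancy_dict = {
    "P12345": {
        15: {"Phosphorylation on S": 2.0e6, "Total": 8.0e6},
        42: {"Acetylation on K": 1.5e6, "Methylation on K": 0.5e6, "Total": 4.0e6},
    },
}


def site_stoichiometry(occupancy, accession, position):
    """Return {modification label: fraction of the site's total intensity}."""
    site = occupancy[accession][position]
    total = site["Total"]
    if total <= 0:
        # No measured intensity for this residue, so no fraction can be reported.
        return {}
    return {
        mod: intensity / total
        for mod, intensity in site.items()
        if mod != "Total"
    }


print(site_stoichiometry(occupancy_dict, "P12345", 15))
```

Keeping "Total" in the same inner dictionary as the modification labels means every consumer has to filter it out, as the comprehension above does.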

The general approach is to first collect all of the modification intensities and record them in occupancyDict. At the same time, proteinSeqRangesSeen, a dictionary keyed by protein accession, stores a list of (STARTINDEX, ENDINDEX, INTENSITY) tuples for each protein, which keeps track of the index ranges seen for that protein. Once all of the mods have been parsed, every amino acid that falls into any of those ranges has its "Total" intensity increased by that range's intensity.
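
The two passes can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PR's C# implementation: the record layout, the one-based inclusive indices, and the function name `calculate_occupancy` are hypothetical, and the sketch only totals positions that carry at least one modification.

```python
from collections import defaultdict

# Hypothetical peptide records:
# (accession, start, end, intensity, {protein position: [modification labels]})
# start and end are assumed to be one-based, inclusive protein indices.
peaks = [
    ("P12345", 10, 20, 3.0e6, {15: ["Phosphorylation on S"]}),
    ("P12345", 10, 20, 5.0e6, {}),  # unmodified form of the same peptide
    ("P12345", 14, 30, 2.0e6, {}),  # overlapping peptide that also covers position 15
    ("P12345", 40, 50, 1.0e6, {}),  # does not cover position 15
]


def calculate_occupancy(peaks):
    occupancy = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    ranges_seen = defaultdict(list)  # accession -> [(start, end, intensity), ...]

    # Pass 1: record modification intensities and remember every range seen.
    for accession, start, end, intensity, mods in peaks:
        ranges_seen[accession].append((start, end, intensity))
        for position, labels in mods.items():
            for label in labels:
                occupancy[accession][position][label] += intensity

    # Pass 2: each modified position gets the summed intensity of all ranges covering it.
    for accession, sites in occupancy.items():
        for position, site in sites.items():
            site["Total"] = sum(
                intensity
                for start, end, intensity in ranges_seen[accession]
                if start <= position <= end
            )
    return occupancy


result = calculate_occupancy(peaks)
print(dict(result["P12345"][15]))
```

Scanning every range for every site is quadratic in the worst case. If the exact protein index of each peptide becomes readily available, sorting the ranges or using an interval structure would be one way to tighten this.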

From our discussion, I've added below some of the items I'd like to get opinions on. I made them a task list primarily so I can keep track of what I've figured out.

  • Where should this code live in mzLib? The most reasonable suggestions so far are FlashLFQResults and Readers/QuantificationResults.
  • To interface this nicely with MetaMorpheus, what should the inputs be? My goal now is to look into how and where this will be integrated into MM, but any suggestions on where to look to figure this out are appreciated.
  • I have some ideas for making the code more efficient and succinct, especially since a lot more information about the peaks should be readily available in MM (like the exact protein index for a peptide/peak). Any new ideas are welcome.

Thanks in advance!

…ve amino acid positions depending on the length for the modification string and its index. Current approach fixes that.

// get the localized modifications from the peptide full sequence and add any amino acid/modification combination not
// seen yet to the occupancy dictionary
foreach (KeyValuePair<int, List<string>> aaWithModList in peptideMods)
Contributor

In situations like this, you can use "var aaWithModList" instead of spelling out the full type.

Member

@nbollis nbollis left a comment

I think readers/Quant... is the best place for it. That way it can be used to find occupancy of the results from another software should that be desired.

To optimize the inputs and outputs of the function, you should break your test method into two: one test method that reads in all the data you need, and another method (not a test method) that gets called to calculate the occupancy. This will help you better understand what the method needs, and help us make recommendations.

…LibUtil method for calculating a generalized occupancy. The FlashLFQ calculation will call that and use intensity values for quantification.
…in MzLibUtil.PositionFrequencyAnalysis. ParseModifications and RemoveSpecialCharacters methods from Omics were moved to MzLibUtil. FlashLFQResults now implements a CalculatePTMOccupancy method that populates its ModInfo property. FlashLFQEngine calls the FlashLFQResults Method after the peptide and protein quantification. Still need to finish testing the FlashLFQResults and FlashLFQEngine outputs.
…arseModifications in the Omics folder to be consistent with previous testing.
@pcruzparri
Author

Requesting a second round of reviews! The second-to-last commit describes most of the changes in a little more detail. The main pending work is to create a small enough subset of the raw data for a test similar to the TestFlashLFQoutputRealData() test. More rigorous testing can be done with some of the identifications in the vignette data, since some base sequences have enough variation in fullSequence mods and positions to give better case coverage.

I'd be happy to hear about 1) code optimization, 2) the currently written tests, and 3) clarifications on code commenting. In a conversation, Nic suggested using objects for my main PTM calculation code rather than the 5-level-deep dictionary; thoughts on that would be useful as well. Ofc, anything else is useful. TIA!
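
For discussion, here is one way the object-based alternative to nested dictionaries could look. It is a Python sketch only: the class names `SiteOccupancy` and `ProteinOccupancy`, their fields, and the example values are hypothetical and are not taken from the PR's C# code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SiteOccupancy:
    """Intensities observed at one protein position (hypothetical type)."""
    position: int
    total_intensity: float = 0.0
    mod_intensities: Dict[str, float] = field(default_factory=dict)

    def fraction(self, mod_label: str) -> float:
        # Fraction of this site's total intensity carried by the given modification.
        if self.total_intensity <= 0:
            return 0.0
        return self.mod_intensities.get(mod_label, 0.0) / self.total_intensity


@dataclass
class ProteinOccupancy:
    """All quantified sites for one protein accession (hypothetical type)."""
    accession: str
    sites: Dict[int, SiteOccupancy] = field(default_factory=dict)

    def site(self, position: int) -> SiteOccupancy:
        # Create the site on first access so callers never index a missing key.
        if position not in self.sites:
            self.sites[position] = SiteOccupancy(position)
        return self.sites[position]


protein = ProteinOccupancy("P12345")
site = protein.site(15)
site.total_intensity = 8.0e6
site.mod_intensities["Phosphorylation on S"] = 2.0e6
print(site.fraction("Phosphorylation on S"))
```

Named fields replace the magic "Total" key and make each level of the former dictionary self-describing, at the cost of a few extra types to maintain.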

…ementations of the occupancy code due to issues with the PercolatorStyleIds (issue: peptide object did not have a base sequence) and MatchBetweenRuns (issue: peptide marked for quantification not stored with a Peptide object) tests. Noticed some of the averaging tests were failing (issue: cleanup problem due to new directory names in TestOutputToCustomDirectoryAndNameMzML()), so I patched that, too.

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 92.05500% with 104 lines in your changes missing coverage. Please review.

Project coverage is 76.51%. Comparing base (983c3b0) to head (b146768).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...zLib/Transcriptomics/Digestion/OligoWithSetMods.cs | 87.78% | 20 Missing and 7 partials ⚠️ |
| mzLib/UsefulProteomicsDatabases/ProteinDbWriter.cs | 88.23% | 9 Missing and 3 partials ⚠️ |
| mzLib/Transcriptomics/NucleicAcid.cs | 92.08% | 6 Missing and 5 partials ⚠️ |
| mzLib/MzLibUtil/PositionFrequencyAnalysis.cs | 89.79% | 7 Missing and 3 partials ⚠️ |
| ...ProteomicsDatabases/Transcriptomics/RnaDbLoader.cs | 93.71% | 3 Missing and 7 partials ⚠️ |
| mzLib/UsefulProteomicsDatabases/ProteinXmlEntry.cs | 79.54% | 6 Missing and 3 partials ⚠️ |
| mzLib/MzLibUtil/ClassExtensions.cs | 79.48% | 6 Missing and 2 partials ⚠️ |
| mzLib/Transcriptomics/ClassExtensions.cs | 93.84% | 1 Missing and 3 partials ⚠️ |
| ...zLib/Transcriptomics/Digestion/NucleolyticOligo.cs | 96.39% | 1 Missing and 3 partials ⚠️ |
| mzLib/FlashLFQ/Peptide.cs | 57.14% | 2 Missing and 1 partial ⚠️ |

... and 3 more
Additional details and impacted files

@@            Coverage Diff             @@
##           master     #797      +/-   ##
==========================================
+ Coverage   75.52%   76.51%   +0.99%     
==========================================
  Files         202      212      +10     
  Lines       30945    32091    +1146     
  Branches     3129     3304     +175     
==========================================
+ Hits        23371    24556    +1185     
+ Misses       7040     6969      -71     
- Partials      534      566      +32     
| Files with missing lines | Coverage Δ |
|---|---|
| mzLib/Chemistry/ClassExtensions.cs | 100.00% <100.00%> (ø) |
| mzLib/FlashLFQ/FlashLFQResults.cs | 92.02% <100.00%> (+0.17%) ⬆️ |
| mzLib/FlashLFQ/FlashLfqEngine.cs | 87.62% <100.00%> (+0.01%) ⬆️ |
| mzLib/MzLibUtil/MzLibException.cs | 100.00% <100.00%> (ø) |
| .../Fragmentation/Oligo/DissociationTypeCollection.cs | 100.00% <100.00%> (+100.00%) ⬆️ |
| ...ragmentation/Oligo/TerminusSpecificProductTypes.cs | 100.00% <100.00%> (ø) |
| mzLib/Omics/IBioPolymerWithSetMods.cs | 95.23% <ø> (ø) |
| mzLib/Omics/SpectrumMatch/SpectrumMatchFromTsv.cs | 97.05% <100.00%> (-0.29%) ⬇️ |
| ...ib/Transcriptomics/Digestion/RnaDigestionParams.cs | 100.00% <100.00%> (ø) |
| mzLib/Transcriptomics/Digestion/Rnase.cs | 100.00% <100.00%> (ø) |

... and 15 more

... and 4 files with indirect coverage changes

pcruzparri and others added 14 commits October 14, 2024 10:27
…imports of TestPsmFromTsv. Added modInfo test for FlashLFQResults.
* Added in base classes

* Implemented all tests

* Made initial tests pass

* Removed unnecessary namespaces

* Expanded test coverage

* Responded to Alex Comments

* Add RNA support: loading, parsing, and decoy generation

Introduced support for handling RNA data within the UsefulProteomicsDatabases project. Key changes include:

- Added `Transcriptomics\TestData` folder to `Test.csproj`.
- Changed access modifiers in `ProteinDbLoader.cs` to internal.
- Added `using` directives for `Transcriptomics` in `ProteinXmlEntry.cs`.
- Introduced methods `ParseRnaEndElement` and `ParseRnaEntryEndElement` in `ProteinXmlEntry.cs`.
- Modified `ParseAnnotatedMods` to check for RNA modifications.
- Added project reference to `Transcriptomics.csproj` in `UsefulProteomicsDatabases.csproj`.
- Added `ClassExtensions.cs` with `CreateNew` method for nucleic acids.
- Added `RnaDbLoader.cs` for RNA database loading.
- Added `RnaDecoyGenerator.cs` for generating decoy RNA sequences.

* Add new properties and caching to oligo digestion

Updated `using` directives in `TestDigestion.cs` and `OligoWithSetMods.cs` to include necessary namespaces. Added assertions in `TestDigestion.cs` for `SequenceWithChemicalFormulas` and `FullSequenceWithMassShift`. Changed `namespace` in `OligoWithSetMods.cs` to `Transcriptomics.Digestion`. Implemented and cached `SequenceWithChemicalFormulas` property in `OligoWithSetMods.cs`.

* Add RNA sequence and database handling and related test cases

- Added new files `ModomicsUnmodifiedTrimmed.fasta` and `ModomicsUnmodifiedTrimmed.fasta.gz` to `Test.csproj` with `CopyToOutputDirectory` set to `PreserveNewest`.
- Removed the `Transcriptomics\TestData` folder from `Test.csproj`.
- Introduced `Transcribe` method in `ClassExtensions.cs` for DNA to RNA transcription.
- Added summary comment to `NucleolyticOligo` class in `NucleolyticOligo.cs`.
- Added `ApplyRegex` method in `FastaHeaderFieldRegex.cs`.
- Introduced `ProteinDbWriter` class in `ProteinDbWriter.cs` for writing protein and nucleic acid databases.
- Modified `GetModsForThisProtein` to `GetModsForThisBioPolymer` in `ProteinDbWriter.cs`.
- Added `RnaDbLoader` class in `RnaDbLoader.cs` for RNA FASTA header detection and sequence loading.
- Updated user dictionary in `mzLib.sln.DotSettings` with new terms.
- Added test cases in `TestDbLoader.cs` for RNA database loading and header detection.
- Introduced `TestDecoyGeneration` class in `TestDecoyGenerator.cs` for RNA decoy generation tests.
- Added RNA sequence file `ModomicsUnmodifiedTrimmed.fasta` and its compressed version.

* Refactor and enhance RNA and oligo handling in tests

- Added `using` directives for `Transcriptomics.Digestion` and `UsefulProteomicsDatabases.Transcriptomics` in `TestDecoyGenerator.cs`.
- Introduced `TestCreateNew` in `TestDecoyGenerator.cs` to verify RNA and oligo creation.
- Added `using` directive for `MzLibUtil` in `TestDigestion.cs`.
- Added a test in `TestDigestion.cs` for exception handling with invalid sequences.
- Added `using` directives for `Omics` and related namespaces in `TestFragmentation.cs`.
- Modified `TestFragmentation_Modified` in `TestFragmentation.cs` to use `OligoWithSetMods` directly and added assertions.
- Updated `ClassExtensions.cs` to allow setting `isDecoy` in new `RNA` objects.
- Refactored `OligoWithSetMods.cs` to return a dictionary from `GetModsAfterDeserialization`.
- Updated `OligoWithSetMods.cs` to initialize `_allModsOneIsNterminus` using the returned dictionary.

* Broke out TerminusSpecificProductTypes class and removed unnecessary namespaces

* Update ProteinXmlEntry.cs

* Added gene name to RNA constructor

* Added gene name to RNA constructor

* Refactor and enhance exception handling and tests

Refactored constructors, improved exception handling, and added comprehensive tests across multiple files. Key changes include:

- `MzLibException.cs`: Updated constructor to include `innerException`.
- `TestDecoyGenerator.cs`: Added assertions for `CreateNew` method.
- `TestDigestion.cs`: Added assertions and new test for RNA digestion exception.
- Refactored modification lists and added various tests for modifications.
- `TestNucleicAcid.cs`: Refactored methods, adjusted precision, and updated terminus assignments.
- `NucleolyticOligo.cs`: Changed parameter types, updated comments, and improved variable names.
- `OligoWithSetMods.cs`: Enhanced exception messages and updated modification location checks.
- `NucleicAcid.cs`: Added `using` directive, changed exception type, and refactored methods.
- `mzLib.sln.DotSettings`: Updated user dictionary entries.

* Add test data files and methods for RNA sequence handling

Added new test data files (`20mer1.fasta`, `20mer1.fasta.gz`, `20mer1.xml`, `20mer1.xml.gz`) to the `Transcriptomics\TestData` directory in the `Test.csproj` file, ensuring they are copied to the output directory. Introduced `TestDbReadingDifferentExtensions` in `TestDbLoader.cs` to verify RNA database reading from various formats. Added `TestDigestionMaxIsoforms` in `TestDigestion.cs` to test RNA sequence digestion with max isoforms. Updated `WriteNucleicAcidXmlDatabase` in `ProteinDbWriter.cs` with remarks for future implementation. Added a TODO in `RnaDecoyGenerator.cs` regarding palindromic sequences' impact on fragment ions. Included new RNA sequence data in test files for validation.

* Added test coverage to the localize method within BioPolymerWithSetMods

---------

Co-authored-by: Nic Bollis <nbollis@wisc.edu>
… dictionaries using some data objects instead for code readability. Updated all of the previous tests (MzLibUtil and FlashLFQ) to accommodate the refactoring.
* began neutral mz spectrum

* Refactor visibility and clean up deconvolution code

Changed `ClassicDeconvolutionAlgorithm`, `DeconvolutionAlgorithm`, and `ExampleNewDeconvolutionAlgorithmTemplate` classes and their members from `public` to `internal` to restrict visibility within the assembly. Added summary comment to `DeconvolutionAlgorithm` class. Refactored `Deconvoluter` class to remove unnecessary `using` directives and simplify the `Deconvolute` method by removing switch-case logic. Updated `IsotopicEnvelope` class by removing `MassIndex` and `StDev` properties, and modified constructor and `ScoreIsotopeEnvelope` method accordingly. Updated `MzSpectrum` class to use `StandardDeviation` extension method from `Easy.Common.Extensions`. Removed various unnecessary `using` directives from multiple files.

* Finish NeutralMassSpectrum

- Added `InternalsVisibleTo` entries for "Development" and "Test" in `MassSpectrometry.csproj`.
- Changed `MostAbundantObservedIsotopicMass` to `internal` in `IsotopicEnvelope.cs`.
- Added a new constructor to `IsotopicEnvelope` with monoisotopic mass, intensity, and charge.
- Added XML documentation and changed `GeneratePeak` to `protected virtual` in `MzSpectrum.cs`.
- Removed unused `using` directives in `MzSpectrum.cs` and `NeutralMzSpectrum.cs`.
- Modified `NeutralMzSpectrum` constructor to validate array lengths.
- Added `Charges` property to `NeutralMzSpectrum` and initialized it in the constructor.
- Overrode `GeneratePeak` in `NeutralMzSpectrum` to convert to a charged spectrum using `Charges`.

* Refactor Deconvoluter and rename NeutralMzSpectrum

Added necessary using directives in Deconvoluter.cs.
Modified Deconvoluter class for short-circuit deconvolution.
Removed redundant lines in Deconvoluter.cs.
Renamed NeutralMzSpectrum to NeutralMassSpectrum.
Updated constructor and references accordingly.

* added neutral mass file bool

* Adjusted and tested neutral mass spectra

* Refactor Deconvoluter and add new tests

Refactored Deconvoluter.cs to use a foreach loop for yielding IsotopicEnvelopes. Reformatted multiple test methods in TestDeconvolution.cs for better readability. Added new test methods to validate Deconvolute with NeutralMassSpectrum, ensuring correct processing of spectra with various charge states and ranges.

* Make FirstX and LastX properties virtual; update tests

- Changed FirstX and LastX properties in MzSpectrum to virtual.
- Included MzLibUtil namespace in NeutralMassSpectrum class.
- Updated NeutralMassSpectrum constructor to set FirstX and LastX.
- Overrode FirstX and LastX in NeutralMassSpectrum class.
- Added test NeutralMassSpectrum_MzRange to validate m/z range.

* fixed nuspec

* Update mzLib.nuspec

---------

Co-authored-by: Nic Bollis <nbollis@wisc.edu>
nbollis and others added 30 commits December 16, 2024 13:03
* fixed bugs

* More Equality Tests

* One More Time

* Digestion Agent Equality and additional comments

* updated protease and expanded test coverage

---------

Co-authored-by: Nic Bollis <nbollis@wisc.edu>
…pheus (smith-chem-wisc#818)

* Added second job

* comment out first job temp

* dsd

* dsd

* dsd

* workflow edit

* Added parent accession check back to PeptideWithSetMods

* Added Test

* oopsies

* Refactor Equals methods and add IEquatable regions

Removed Equals from IBioPolymerWithSetMods interface in Omics.
Updated Equals in PeptideWithSetModifications to check OneBasedStartResidue and Parent?.Accession.
Updated Equals in OligoWithSetMods to check FullSequence and DigestionParams?.DigestionAgent.
Added IEquatable regions to PeptideWithSetModifications and OligoWithSetMods.
Organized Equals methods with #region IEquatable directives.

* workflow edit

* Even more equality tests

* adjusted oligo with set mods

* ugh

---------

Co-authored-by: Nic Bollis <nbollis@wisc.edu>
Co-authored-by: Alex <AlexSolivais@gmail.com>
* Started test

* Fixed equality by reference in IBioPolymer Equals

* Undid cherrypick, finished test

* Fixed ProteinDB Writer method to be deterministic

* Added comments

* More specific OrderBys

* Fixed broken summary comment
* Added in foundation for John to use

* removed charge from johnny decon parameters

* instantiated johhnydeconparams.decontype

* IsoDec incorporated!

* Did a little cleanup and made IsoDec run on my device

* Changed isodec to use the embedded dlls and resources

* changed around assembly references and added IsoDec to Deconvolution testing environment

* added test for negative mode

* updated nuspec to pack isodec resources

* Updated dll. Now just making monoisotopic errors but getting generally correct charge states

* IsoDec passes (updated) tests.

* IsoDecDeconvolutionParameters and Multiple Monoisos

* Refactor IsoDec classes and enhance parameters

Updated IsoDecAlgorithm to use generic DeconvolutionParameters.
Enhanced IsoDecDeconvolutionParameters with new properties.
Refactored constructor to use camelCase parameter names.
Removed unused using directives from IsoDecAlgorithm.cs.
Ensured correct casting in IsoDecAlgorithm.
Renamed Css_Threshold to CssThreshold for consistency.

* Bug Fixes and parameter cleanup

* Fixed broken unit test and assertion structure in test deconvolution

* Cleaned up isotopic Envelope

* Refactor IsoDec classes and update parameters

Updated the `MassSpectrometry` namespace in `IsoDecAlgorithm.cs` and `IsoDecDeconvolutionParameters.cs`. In `IsoDecAlgorithm.cs`, added a type check for `DeconvolutionParameters` and replaced redundant type casting with `deconParams`. In `IsoDecDeconvolutionParameters.cs`, removed unnecessary `using` directives, moved the class under the `MassSpectrometry` namespace, added user-accessible and hard-coded parameters with comments, updated the constructor to initialize new parameters, and removed the nested class declaration.

* help me

* Changed resources from content to none

* nuspec edit

* Refactor Deconvolute method and update variable handling

- Change `_phaseModelPath` to readonly static string to prevent modification after initial set.
- Remove `process_spectrum` method declaration from the class.
- Add `try-finally` block in `Deconvolute` to ensure `matchedPeaksPtr` memory is freed even if an exception occurs.
- Move allocation of `matchedPeaksPtr` inside `try` block to allocate only if needed.
- Invert check for `process_spectrum` result to return empty enumerable if result is <= 0.
- Reformat loop processing matched peaks for better readability.
- Ensure `Marshal.FreeHGlobal` is called in `finally` block to free `matchedPeaksPtr` if not zero.

* Fixed memory allocation/deallocation issues

* simple restructure of parameter handling

* Update namespaces, references, and version number

Removed unused using directives from IsoDecAlgorithm.cs and Deconvoluter.cs.
Updated IsoSettings namespace in IsoDecAlgorithm.cs.
Simplified NUnit assertions in TestDeconvolution.cs.
Updated MassSpectrometry.csproj with HintPath and PackagePath for DLLs.
Replaced phase_model.bin with isogenmass.dll.
Incremented version number in mzLib.nuspec to 5.2.35.
Added isogenmass.dll to mzLib.nuspec for net8.0 and net8.0-windows7.0 targets.

* idk man

* look mom, I did it

* Adjusted in response to merging in master

* revised from PR and added tests for GetPeakIndicesWithinTolerance.

* Removed unnecessary changes

* Added comments to isodec algorithm

---------

Co-authored-by: trishorts <mshort@chem.wisc.edu>
Co-authored-by: jgpavek <jpavek@arizona.edu>
Co-authored-by: Nic Bollis <nbollis@wisc.edu>
* Added basic object pools

* Refactor DigestionAgent to use HashSetPool for indices
* Added basic object pools

* Refactor DigestionAgent to use HashSetPool for indices

Added `using` directive for `MzLibUtil`. Introduced a static readonly `HashSetPool<int>` named `HashSetPool` to manage a pool of hash sets. Updated `DigestionAgent` constructor to initialize `HashSetPool`. Refactored `GetDigestionSiteIndices` to use a hash set from `HashSetPool` for storing indices, ensuring no duplicates. Explicitly added start and end of protein sequence as cleavage sites. Implemented `try-finally` block to return hash set to pool after use. Final list of indices is now sorted before returning.

* string interpolation in BPWSM extensions

* Adjusted IEnumerable return in Protease.GetUnmodified

* Digestion Optimizations

* Moved testing class to proper subdirectory

* Adjusted ModFits to have the correct localization for peptide and protein termini

* Cleaned up hashset return

* Digestion Agent Hashset Return Cleanup

* set fixed mods now modifies in place using a pooled dictionary

* Added comments to digestion

* Refactor code for readability and efficiency

- Simplified initial check for `possibleVariableModifications.Count` and replaced `yield return null` with `yield break`.
- Adjusted indentation and loop structure for clarity.
- Refactored nested loop to remove unnecessary braces and streamline logic.
- Simplified construction of `modificationPattern` dictionary by removing redundant checks and directly using `modIndex`.

* Added many comments

* set fixed mods namechange

* Eliminated IsN or IS5' in favor of unified method

* Extracted all variable modification combination generation to parent class

* removed fixed mods changes

* removed unnecessary namespace

* Extracted AppendFixedToVariabel

* Update mzLib.nuspec

* Added shortreed comment
…em-wisc#825)

* Added basic object pools

* Refactor DigestionAgent to use HashSetPool for indices

Added `using` directive for `MzLibUtil`. Introduced a static readonly `HashSetPool<int>` named `HashSetPool` to manage a pool of hash sets. Updated `DigestionAgent` constructor to initialize `HashSetPool`. Refactored `GetDigestionSiteIndices` to use a hash set from `HashSetPool` for storing indices, ensuring no duplicates. Explicitly added start and end of protein sequence as cleavage sites. Implemented `try-finally` block to return hash set to pool after use. Final list of indices is now sorted before returning.

* string interpolation in BPWSM extensions

* Adjusted IEnumerable return in Protease.GetUnmodified

* Digestion Optimizations

* Moved testing class to proper subdirectory

* Adjusted ModFits to have the correct localization for peptide and protein termini

* Cleaned up hashset return

* Digestion Agent Hashset Return Cleanup

* set fixed mods now modifies in place using a pooled dictionary

* Added comments to digestion

* Refactor code for readability and efficiency

- Simplified initial check for `possibleVariableModifications.Count` and replaced `yield return null` with `yield break`.
- Adjusted indentation and loop structure for clarity.
- Refactored nested loop to remove unnecessary braces and streamline logic.
- Simplified construction of `modificationPattern` dictionary by removing redundant checks and directly using `modIndex`.

* Added many comments

* set fixed mods namechange

* Eliminated IsN or IS5' in favor of unified method

* Extracted all variable modification combination generation to parent class

* removed fixed mods changes

* Fixed mod terminal adjustment

* removed unnecessary namespace

* Replaced List with Sorted Sets in Variable Mod Dictionary Pool

* Extracted AppendFixedToVariabel

* Refactor modification comparison handling

Encapsulated `ModificationComparer` within `DigestionProduct` by changing its access modifier to private. Simplified `PopulateVariableModifications` method by removing the `IComparer<Modification>` parameter and using a static `ModComparer` instance. Updated `ProteolyticPeptide` and `NucleolyticOligo` classes to reflect the new method signature, removing the unnecessary `modificationComparer` variable and its usage.

* supposedToBeDifferent

* dunno

* correct protein accession now

* j

* Adjusted shortreed Test

* Refactor modification comparison logic

Removed `ModificationComparer` and updated `Modification` and `ModificationMotif` classes to implement `IComparable` for custom comparison logic. Added new test cases to validate changes. Cleaned up unused code and adjusted methods to ensure proper functionality.

* Removed IComparable from ModificationMotif

* Renamed xml databases to be more verbose

---------

Co-authored-by: trishorts <mshort@chem.wisc.edu>
…sc#827)

* Changed x64;AnyCPU to only AnyCPU in all project files

* One more change

* Nuget - 3.
Me - 0

* Started initial structure

* Revert "Started initial structure"

This reverts commit f06cbc0.

* Reverted mod localization change.

* Enhance handling of terminal modifications in digestion

Improved logic in `DigestionProduct.cs` to ensure correct application of N-terminal and C-terminal modifications to biopolymers, preventing overwriting unless the new modification is more specific.

Updated assertions and modification order in test cases to reflect changes, enhancing accuracy and robustness of the digestion process.

---------

Co-authored-by: Nic Bollis <nbollis@wisc.edu>
Co-authored-by: Nic Bollis <nbollis@comcast.net>
* Undo Rounding

* reverted remaining test

* FixedFlashLFQ test
…chem-wisc#829)

* edited workflow

* edited workflow

* edited workflow

* Added second artifact

* Fix

---------

Co-authored-by: trishorts <mshort@chem.wisc.edu>
5 participants