From c78c74be71984b2355a67786d464f1058a5190c2 Mon Sep 17 00:00:00 2001
From: emosca-cnr
Introduction
https://github.com/emosca-cnr/margheRita
Citation: Ettore Mosca, Marynka Ulaszewska, Zahrasadat Alavikakhki, Edoardo Niccolò Bellini, Valeria Mannella, Gianfranco Frigerio, Denise Drago, Annapaola Andolfo. MargheRita: an R package for LC-MS/MS SWATH metabolomics data analysis and confident metabolite identification based on a spectral library of reference standards. bioRxiv 2024.06.20.599545; doi: https://doi.org/10.1101/2024.06.20.599545
Contacts:
The package requires a series of other R packages, which are availble -in CRAN, Bioconductor or github, namely:
+The package requires a series of other R packages, which are +available in CRAN, Bioconductor or github, namely:
## graphics, grDevices, stats, utils, clusterProfiler, pcaMethods, ComplexHeatmap, LSD, plotrix, pals, Hmisc, notame, Biobase, openxlsx, devtools
In most of the cases, the following instructions guarantee that all such dependencies are installed:
@@ -251,6 +258,14 @@
+The full version of the “Urine” dataset, which was used for margheRita assessment and to generate this documentation, is available at https://doi.org/10.5281/zenodo.11243781, files “Urine_RP_NEG_norm.txt” and “Urine_RP_POS_norm.txt”. The corresponding sample information files can be accessed as follows:
+
+sample_file_NEG <- system.file("extdata", "Urine_RP_NEG_norm_metadata.txt", package = "margheRita")
+sample_file_POS <- system.file("extdata", "Urine_RP_POS_norm_metadata.txt", package = "margheRita")
+ms <- as.metaboset(mRList)
## MetaboSet object with 303 features and 253 samples.
## 10 QC samples included
@@ -271,7 +286,7 @@ Inter-operability
## The object has the following parts (splits):
## FALSE: features
or as “PomaSummarizedExperiment” object, used by package “POMA” (Castellano-Escuder et al. 2021):
-
diff --git a/docs/reference/calc_ppm_err.html b/docs/reference/calc_ppm_err.html
index aced8cc..7772897 100644
--- a/docs/reference/calc_ppm_err.html
+++ b/docs/reference/calc_ppm_err.html
@@ -17,7 +17,7 @@
+se <- as.PomaSummarizedExperiment(mRList)
## class: SummarizedExperiment
## dim: 303 253
@@ -293,12 +308,12 @@ Filtering, imputation and normal
The function filtering() runs filters to exclude features/samples with many missing values and features with wrong m/z values and, lastly, performs imputation of missing values:
-mL <- filtering(mL)
-# Samples with >= 100 metabolites 243 / 243
-# Features occurring in >= 3 samples 604 / 604
-# Features with appropriate m/z values: 548
-# Features without appropriate m/z values: 56
-: imputation not performed. No NAs
+mL <- filtering(mL)
+# Samples with >= 100 metabolites 243 / 243
+# Features occurring in >= 3 samples 604 / 604
+# Features with appropriate m/z values: 548
+# Features without appropriate m/z values: 56
+: imputation not performed. No NAs
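The idea behind the missing-value filters and the imputation step can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions: the thresholds, the toy values and the half-minimum imputation rule are hypothetical and are not taken from the margheRita implementation.

```r
# Toy intensity matrix: rows are features, columns are samples (hypothetical values).
X <- matrix(c(10, NA, 12, 11,
              NA, NA, NA,  5,
               7,  8, NA,  9),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("F1", "F2", "F3"), c("S1", "S2", "S3", "S4")))

min_samples_per_feature <- 3  # hypothetical threshold
min_features_per_sample <- 1  # hypothetical threshold

# Keep features that are observed in enough samples.
keep_features <- rowSums(!is.na(X)) >= min_samples_per_feature
X_f <- X[keep_features, , drop = FALSE]

# Keep samples that contain enough observed features.
keep_samples <- colSums(!is.na(X_f)) >= min_features_per_sample
X_f <- X_f[, keep_samples, drop = FALSE]

# Hypothetical imputation rule: replace each NA with half the feature minimum.
X_imp <- t(apply(X_f, 1, function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE) / 2
  x
}))
```

In this toy case feature F2 is dropped because it is observed in a single sample, and the two remaining missing values are filled from the corresponding feature minima.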
These three steps can be called independently through the function
filter_NA()
,m_z_filtering()
andimputation()
, respectively. In particular, @@ -309,7 +324,7 @@Filtering, imputation and normal
The function
-heatscatter_chromatography()
creates a graphic overview of the mz and rt values in the dataset:diff --git a/docs/reference/calc_RI.html b/docs/reference/calc_RI.html index d2b0455..20c0a4e 100644 --- a/docs/reference/calc_RI.html +++ b/docs/reference/calc_RI.html @@ -17,7 +17,7 @@+
margheRita provides three ways for normalizing metabolite @@ -324,22 +339,22 @@
Filtering, imputation and normal function
calc_reference()
sets up such column using average metabolite values and medians of QC samples. For example, here’s a call tonormalize_profiles()
using pqn: -+<- normalize_profiles(mL, method = "pqn") - mL_norm - PQN normalizationcalc_reference() function... - No reference profile found, using Using QC...
<- normalize_profiles(mL, method = "pqn") + mL_norm + PQN normalizationcalc_reference() function... + No reference profile found, using Using QC...
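For readers unfamiliar with probabilistic quotient normalization, the following base R sketch shows the general technique with a reference profile built from QC samples. It is a generic illustration, not the code behind normalize_profiles(); the toy matrix and the use of the QC median as reference are assumptions, and the optional total-intensity pre-scaling described in some PQN formulations is omitted.

```r
# Toy matrix: features x samples; the last two columns play the role of QC samples.
X <- matrix(c(100, 200, 150, 150,
               50, 100,  75,  75,
               20,  40,  30,  30),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("F", 1:3), c("S1", "S2", "QC1", "QC2")))
is_qc <- c(FALSE, FALSE, TRUE, TRUE)

# Reference profile: per-feature median over the QC samples (assumption).
reference <- apply(X[, is_qc, drop = FALSE], 1, median)

# Feature-wise quotients to the reference; 'reference' is recycled down each column.
quotients <- X / reference

# Per-sample dilution factor: median quotient of that sample.
dilution <- apply(quotients, 2, median)

# Divide every sample (column) by its dilution factor.
X_pqn <- sweep(X, 2, dilution, "/")
```

Here S1 is a diluted and S2 a concentrated copy of the QC profile, so after normalization both columns coincide with the reference.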
The comparison of the coefficient of variation of a metabolite in relation to QC samples provides a means to exclude low quality features. In particular, only features that have a CV ratio between no-QC samples and QC sample higher than a given threshold (by default 1) are kept:
-mL_norm <- CV_ratio(mRList = mL_norm)
-Summary of CV ratio (samples / QC):
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
- 0.3593  0.8025  1.1032  1.4645  1.6516 13.4896
-# Metabolites with appropriate CV 303 / 539
+mL_norm <- CV_ratio(mRList = mL_norm)
+Summary of CV ratio (samples / QC):
+   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
+ 0.3593  0.8025  1.1032  1.4645  1.6516 13.4896
+# Metabolites with appropriate CV 303 / 539
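The CV ratio criterion can be illustrated with a few lines of base R. This is a minimal sketch on hypothetical data, not the internals of CV_ratio(); it assumes that QC columns can be recognized by a "QC" name prefix and uses the default threshold of 1 mentioned above.

```r
# Toy matrix: two features, three study samples and three QC samples (hypothetical).
X <- matrix(c(10, 20, 30, 19, 20, 21,
              10, 11, 12,  5, 10, 15),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("F1", "F2"),
                            c("S1", "S2", "S3", "QC1", "QC2", "QC3")))
is_qc <- grepl("^QC", colnames(X))  # assumption: QC samples are named QC*

# Coefficient of variation: standard deviation divided by mean.
cv <- function(x) sd(x) / mean(x)

cv_samples <- apply(X[, !is_qc, drop = FALSE], 1, cv)
cv_qc      <- apply(X[,  is_qc, drop = FALSE], 1, cv)

# Features varying more across study samples than across QCs are kept.
cv_ratio <- cv_samples / cv_qc
keep <- cv_ratio > 1
```

F1 varies much more across study samples than across QCs and is kept, whereas F2 is noisier in the QCs than in the samples and would be discarded.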
The distributions of metabolite relative log-abundances can be calculated and visualized by means of:
-
diff --git a/docs/reference/as.metaboset.html b/docs/reference/as.metaboset.html
index 47dfc31..058c330 100644
--- a/docs/reference/as.metaboset.html
+++ b/docs/reference/as.metaboset.html
@@ -17,7 +17,7 @@
+mL <- RLA(mRList = mL)
Typically, after normalization, the various samples should have similar distributions of relative log-abundances.
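As a rough illustration of what a relative log-abundance is, the sketch below log-transforms a toy matrix and subtracts the per-feature median. Whether RLA() uses the overall median or a within-group median, and which logarithm base it applies, is not stated here, so both choices in the sketch are assumptions.

```r
# Toy matrix: features x samples (hypothetical values).
X <- matrix(c(8, 16, 32,
              4,  4, 16),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("F1", "F2"), c("S1", "S2", "S3")))

# Log-transform (base 2 is an assumption).
logX <- log2(X)

# Relative log-abundance: deviation of each value from its feature median.
feature_median <- apply(logX, 1, median)
rla <- sweep(logX, 1, feature_median, "-")

# Per-sample medians of the RLA values; well-normalized samples are expected
# to have medians close to zero and similar spreads.
sample_medians <- apply(rla, 2, median)
```

Plotting one box per column of `rla` (for example with `boxplot(rla)`) gives the kind of per-sample distribution overview described above.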
@@ -353,7 +368,7 @@Principal Component Analysis
+mL_norm <- mR_pca(mRList = mL_norm, nPcs=5, scaling="uv", include_QC=FALSE)
The results are added to the mRList in the element
pca
. It also provides some plots, like the visualization of distribution of @@ -363,7 +378,7 @@Principal Component AnalysisPlot2DPCA() function. The argument
col_by
enables the choice of themRList$sample_ann
column to be used to color samples: -@@ -376,7 +391,7 @@+Plot2DPCA(mRList = mL_norm, pcx=1, pcy=2, col_by="class", include_QC=TRUE)
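The effect of unit-variance scaling followed by PCA can be reproduced on simulated data with base R. Note that this sketch swaps in stats::prcomp() for illustration and does not reproduce mR_pca(); the simulated matrix and the `sample_class` vector are hypothetical.

```r
set.seed(1)

# Simulated data: 20 features x 6 samples, two classes of three samples each.
X <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(paste0("F", 1:20), paste0("S", 1:6)))
sample_class <- rep(c("A", "B"), each = 3)

# Shift the first five features in class B so that the classes differ.
X[1:5, sample_class == "B"] <- X[1:5, sample_class == "B"] + 3

# PCA on samples (rows) with centering and unit-variance scaling of features.
pca <- prcomp(t(X), center = TRUE, scale. = TRUE)

# Scores on the first two components and fraction of variance explained.
scores <- pca$x[, 1:2]
explained <- pca$sdev^2 / sum(pca$sdev^2)
```

The `scores` matrix is what a 2D PCA plot displays, with samples colored by a column of the sample annotation such as `sample_class`.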
Removing samplesmRList. Here, for example we remove all “Blank” samples: -
diff --git a/docs/reference/as.PomaSummarizedExperiment.html b/docs/reference/as.PomaSummarizedExperiment.html
index 720ec69..629faf3 100644
--- a/docs/reference/as.PomaSummarizedExperiment.html
+++ b/docs/reference/as.PomaSummarizedExperiment.html
@@ -17,7 +17,7 @@
+mL <- remove_samples(mRList = mL, ids = "Blank", column = "class")
In this case, the function removes all samples with value “Blank” in the column “class” of sample annotation.
@@ -387,7 +402,7 @@ Collapsing technical replicates
The mean metabolite abundance of every biological replicate is computed by the collapse_tech_rep() function:
+mL_norm_bio <- collapse_tech_rep(mRList = mL_norm)
## AA_mealA_t00 AA_mealA_t01 AA_mealA_t02 AA_mealA_t03 AA_mealA_t04 ## F506 372.5212 314.9164 641.2731 328.3019 183.8177 @@ -413,16 +428,16 @@
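Averaging technical replicates amounts to a grouped mean over columns. The base R sketch below shows the operation on a toy matrix; the `bio_rep` vector that maps each column to its biological replicate is a hypothetical stand-in for the corresponding sample annotation column and is not the package's actual data structure.

```r
# Toy matrix: two features, two technical replicates for each of two
# biological replicates (hypothetical values).
X <- matrix(c(10, 12, 20, 22,
               5,  7,  1,  3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("F1", "F2"), c("a_1", "a_2", "b_1", "b_2")))

# Biological replicate of each column (hypothetical annotation).
bio_rep <- c("a", "a", "b", "b")

# For each biological replicate, average its technical replicates feature-wise.
X_bio <- sapply(split(seq_along(bio_rep), bio_rep),
                function(idx) rowMeans(X[, idx, drop = FALSE]))
```

The result has one column per biological replicate, which matches the shape of the collapsed table shown above.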
Statistical analysis
-mean_median_stdev_samples(mL_norm_bio)
-
-According to dataset size, this might take a few minutes.
-Calculating means...
-Calculating medians...
-Calculating standard deviations...
+mean_median_stdev_samples(mL_norm_bio)
+
+According to dataset size, this might take a few minutes.
+Calculating means...
+Calculating medians...
+Calculating standard deviations...
The function
-univariate()
performs dataset-wide statistical tests (Student t-tests, Wilcoxon test, Anova and Kruskal-Wallis test) between levels of a particular factor defined in the sample annotation:@@ -485,7 +500,7 @@+mL_norm_bio <- univariate(mL_norm_bio, test_method="anova", exp.levels = c("AA", "DD", "MM"), exp.factor = "class")
## F p q DD-AA MM-AA ## F506 56.603887 6.500895e-16 9.379863e-15 0.000000e+00 0.000000e+00 @@ -442,7 +457,7 @@
Statistical analysis
+significant_features <- select_sign_features(mL_norm_bio, test_method="anova", test_value = "q", cutoff_value = 0.05)
## [1] "F3957" "F18426" "F19199" "F10248" "F9507" "F958"
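A feature-wise ANOVA followed by selection on adjusted p-values can be sketched in base R as follows. The data are simulated, and the use of the Benjamini-Hochberg procedure for the q-values is an assumption, since the correction applied by univariate() is not specified here.

```r
set.seed(42)

# Simulated data: 4 features x 15 samples, three groups of five samples each.
groups <- factor(rep(c("AA", "DD", "MM"), each = 5))
X <- matrix(rnorm(4 * 15, mean = 100, sd = 5), nrow = 4,
            dimnames = list(paste0("F", 1:4), NULL))

# Make feature F1 clearly different in group DD.
X["F1", groups == "DD"] <- X["F1", groups == "DD"] + 50

# One-way ANOVA per feature; extract the p-value of the group effect.
p <- apply(X, 1, function(x) summary(aov(x ~ groups))[[1]][["Pr(>F)"]][1])

# Multiple-testing correction (Benjamini-Hochberg is an assumption).
q <- p.adjust(p, method = "BH")

# Features passing the cutoff on q.
significant <- names(q)[q < 0.05]
```

The shifted feature F1 is recovered as significant, mirroring the selection on `test_value = "q"` with `cutoff_value = 0.05` used above.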
Metabolite identification
In this example, we load the margheRita library in positive mode with retention times of RPShort columns and we discard all peaks with relative intensity less than 10:
diff --git a/docs/reference/annotate_univariate_results.html b/docs/reference/annotate_univariate_results.html
index c7ea659..b644c4e 100644
--- a/docs/reference/annotate_univariate_results.html
+++ b/docs/reference/annotate_univariate_results.html
@@ -17,7 +17,7 @@
+mR_library <- select_library(column = "RPShort", mode = "POS", accept_RI=10)
The resulting
@@ -537,7 +552,7 @@mR_library
is a list that contains information about precursorsMetabolite identificationfeatures specifies the features to be considered (all features if it is left
features=NULL
, as in the following example): -diff --git a/docs/reference/RLA.html b/docs/reference/RLA.html index 2e198e9..b649c11 100644 --- a/docs/reference/RLA.html +++ b/docs/reference/RLA.html @@ -17,7 +17,7 @@+mL_norm_bio <- metabolite_identification(mL_norm_bio, library_list = mR_library)
The function
metabolite_identification()
has a series of parameters that can be adjusted to optimize the identification process @@ -627,13 +642,13 @@Metabolite identification
The spectra from all the features that match a metabolite can be inspected creating the following plot through:
-
diff --git a/docs/reference/Plot2DPCA.html b/docs/reference/Plot2DPCA.html
index 87950a0..e8b3882 100644
--- a/docs/reference/Plot2DPCA.html
+++ b/docs/reference/Plot2DPCA.html
@@ -17,7 +17,7 @@
+visualize_associated_spectra(mRList = mL_norm_bio, mR_library = mR_library, metabolite_id = "L1660")
The function
-h_map_MSMS_comparison()
draws heatmaps to visually compare ppm errors and RI differences between feature and metabolite spectra:@@ -646,7 +661,7 @@+h_map_MSMS_comparison(mL_norm_bio, metab_id = "L1660", feature_id = "F10165")
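The ppm error compared in these heatmaps is the standard relative mass error expressed in parts per million. A minimal sketch follows; the helper name `ppm_error`, the example m/z values and the 20 ppm tolerance are hypothetical and are not part of the margheRita API.

```r
# Relative mass error in parts per million between an observed m/z value
# and a reference m/z value (hypothetical helper).
ppm_error <- function(observed_mz, reference_mz) {
  (observed_mz - reference_mz) / reference_mz * 1e6
}

# Example: observed peaks compared with a library peak at m/z 200.
observed <- c(200.002, 199.990)
err <- ppm_error(observed, 200)

# Peaks within a given tolerance (20 ppm is an arbitrary example value).
within_tolerance <- abs(err) <= 20
```

The first peak deviates by about 10 ppm and falls within the tolerance, while the second deviates by about 50 ppm and does not.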
Retrieving data
feature_stats
should be the name of any statistical test saved in themRList
or a custom data frame with Feature_ID as row names: -@@ -655,14 +670,14 @@+metab_stat <- annotate_univariate_results(mRList = mL_norm_bio, feature_stats = "anova")
The resulting data.frame is saved to file “data_stats_ann.csv”.
Metabolite abundance visualization
The function
-metab_boxplot()
draws boxplots of feature abundances grouped by the levels of a given factor:diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 294e61e..32e4165 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -3,5 +3,5 @@ pkgdown: 2.0.7 pkgdown_sha: ~ articles: margheRita: margheRita.html -last_built: 2024-06-19T14:05Z +last_built: 2024-06-28T16:15Z diff --git a/docs/reference/CV_ratio.html b/docs/reference/CV_ratio.html index 1232bf7..83b709e 100644 --- a/docs/reference/CV_ratio.html +++ b/docs/reference/CV_ratio.html @@ -21,7 +21,7 @@+metab_boxplot(mRList = mL_norm_bio, col_by="class", group="class", features = "F3957")
The function h_map() provides heatmaps based on package ComplexHeatmap (Gu, Eils, and Schlesner 2016). Here we show the abundance of the most significant metabolites according to the anova test:
@@ -75,7 +75,7 @@
+significant_features <- select_sign_features(mL_norm_bio, test_method="anova", test_value = "q", cutoff_value = 10e-10, feature_id = "Name")
+h_map(mL_norm_bio, scale_features=TRUE, features = significant_features, show_column_names=F, data.use = "data_ann")
Note that we extracted metabolite “Name” as feature_id and used @@ -682,13 +697,13 @@
Pathway analysis -
diff --git a/docs/index.html b/docs/index.html
index 8ea8a41..c0467ee 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -33,7 +33,7 @@
+significant_features <- select_sign_features(mRList = mL_norm_bio, test_method="anova", test_value = "q", cutoff_value = 10e-10, feature_id = "PubChemCID")
+all_PubChemCID <- unique(mL_norm_bio$metab_ann$PubChemCID[!is.na(mL_norm_bio$metab_ann$PubChemCID)])
+pa_res <- pathway_analysis(in_list = significant_features, type = "ora", universe = all_PubChemCID)
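Over-representation analysis (ORA) is commonly based on a hypergeometric test of the overlap between the significant identifiers and a pathway, relative to the universe. The sketch below shows that test in base R on made-up identifiers; it does not call pathway_analysis() and makes no claim about how the package computes its statistics.

```r
# Hypothetical identifiers: a universe of 100 compounds, one pathway of 10,
# and 10 significant compounds, 5 of which belong to the pathway.
universe <- paste0("CID", 1:100)
pathway  <- paste0("CID", 1:10)
hits     <- c(paste0("CID", 1:5), paste0("CID", 51:55))

overlap <- length(intersect(hits, pathway))

# P(X >= overlap) when drawing length(hits) compounds from the universe.
p_value <- phyper(overlap - 1,
                  m = length(pathway),
                  n = length(universe) - length(pathway),
                  k = length(hits),
                  lower.tail = FALSE)
```

Restricting the universe to the compounds actually annotated in the dataset, as done with `all_PubChemCID` above, keeps the background consistent with what could have been detected.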
In case of MSEA, the input is a named vector of scores for all PubChemCIDs in the dataset, ranked in decreasing order of importance:
-
@@ -73,13 +73,16 @@
diff --git a/docs/authors.html b/docs/authors.html
index c0c1751..ea1d802 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -17,7 +17,7 @@
+ranked_vector <- select_sign_features(mRList = mL_norm_bio, test_method="anova", test_value = "q", cutoff_value = Inf, feature_id = "PubChemCID", values = TRUE)
+ranked_vector <- sort(-log10(ranked_vector), decreasing = T)
+msea_res <- pathway_analysis(in_list = ranked_vector, type = "msea")
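The transformation applied to the q-values before MSEA can be checked on a toy named vector. The identifiers and values below are made up for illustration.

```r
# Hypothetical q-values named by compound identifier.
q <- c(CID1 = 1e-8, CID2 = 0.5, CID3 = 1e-3, CID4 = 0.04)

# Smaller q-values become larger scores; sorting in decreasing order puts
# the most significant compounds first.
ranked_vector <- sort(-log10(q), decreasing = TRUE)
```

The resulting vector keeps the identifiers as names, so the ranking and the scores travel together into the enrichment step.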
Citation
Mosca E, Ulaszewska M, Bellini EN, Alavikakhki Z, Frigerio G, Drago D, Mannella V, Andolfo A (2024). “MargheRita: an R package for LC-MS/MS SWATH metabolomics data analysis and confident metabolite identification based on a spectral library of reference standards.” -R package version 0.2.3. +bioRxiv. +doi: 10.1101/2024.06.20.599545.
-@Misc{,
+@Article{,
   title = {MargheRita: an R package for LC-MS/MS SWATH metabolomics data analysis and confident metabolite identification based on a spectral library of reference standards},
   author = {Ettore Mosca and Marynka Ulaszewska and Edoardo Niccolò Bellini and Zahrasadat Alavikakhki and Gianfranco Frigerio and Denise Drago and Valeria Mannella and Annapaola Andolfo},
+  journal = {bioRxiv},
   year = {2024},
-  note = {R package version 0.2.3},
+  publisher = {Cold Spring Harbor Laboratory},
+  doi = {10.1101/2024.06.20.599545},
 }
Documentation: https://emosca-cnr.github.io/margheRita
Source code: https://github.com/emosca-cnr/margheRita
-Citation: …
+Citation: Ettore Mosca, Marynka Ulaszewska, Zahrasadat Alavikakhki, Edoardo Niccolò Bellini, Valeria Mannella, Gianfranco Frigerio, Denise Drago, Annapaola Andolfo. MargheRita: an R package for LC-MS/MS SWATH metabolomics data analysis and confident metabolite identification based on a spectral library of reference standards. bioRxiv 2024.06.20.599545; doi: https://doi.org/10.1101/2024.06.20.599545
Contacts:
- @@ -87,10 +87,7 @@
Installation
-The package requires a series of other R packages, which are availble in CRAN, Bioconductor and github. In most of the cases, the following instructions guarantee that all such dependencies are installed:
-```{r, eval=FALSE}
-install.packages("devtools")
-devtools::install_github(c("pcastellanoescuder/POMA", "antonvsdata/notame"))
-
-if (!require("BiocManager", quietly = TRUE)){
-  install.packages("BiocManager")
-}
-BiocManager::install(c("clusterProfiler", "pcaMethods"))
-
-devtools::install_github("emosca-cnr/margheRita", dependencies = T)
-```
+See documentation at https://emosca-cnr.github.io/margheRita