
Using MetaNetwork

Austin Carr edited this page Aug 23, 2022 · 9 revisions

Overview

One of the most difficult parts of a proteomics experiment is interpreting the data. Because so many proteins are identified, the proteins underlying biological changes across experimental conditions must be singled out from the rest. Applying statistical and fold-change cut-offs can reduce the large list of proteins to a manageable size, but at the cost of discarding the majority of proteins.

In contrast, MetaNetwork clusters related proteins into "modules". Each module contains proteins grouped by their weighted adjacency, a modified form of the correlation coefficient in which the weighting reduces the number of spurious correlations. By examining how each module behaves across samples, important clusters of proteins can be identified. The "expression" of a module is summarized by its "module eigenprotein," the first principal component of the expression of all proteins in the module.
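
As a rough illustration, WGCNA commonly builds a weighted adjacency by "soft thresholding": the absolute Pearson correlation between two proteins is raised to a power β. The sketch below assumes this unsigned form and a proteins x samples NumPy matrix; this page does not say whether MetaNetwork uses the signed or unsigned variant, and `weighted_adjacency` is a hypothetical helper name.

```python
import numpy as np


def weighted_adjacency(expr: np.ndarray, beta: float) -> np.ndarray:
    """Unsigned soft-thresholded adjacency, a_ij = |cor(i, j)| ** beta.

    expr: proteins x samples matrix of log2 intensities (assumed layout).
    """
    corr = np.corrcoef(expr)      # protein-by-protein Pearson correlation
    adj = np.abs(corr) ** beta    # weak correlations shrink far more than strong ones
    np.fill_diagonal(adj, 0.0)    # ignore self-connections
    return adj


expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 1.0, 3.0, 2.0],
])
print(np.round(weighted_adjacency(expr, beta=6), 3))
```

With β = 6, a correlation of 0.95 keeps an adjacency of about 0.74, while a correlation of 0.4 drops to about 0.004, which is how the weighting suppresses weak, likely spurious correlations.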

After identifying modules whose module eigenprotein expression differs significantly across experimental conditions, it is useful to determine the biological role of each module, which can be done with MetaNetwork's functional annotation enrichment features. The most enriched functional annotation terms can quickly point to the biological processes a module is involved in, enabling hypothesis generation or a more streamlined approach to data analysis using differential expression.
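
This page does not state which statistical test MetaNetwork uses for enrichment. A common choice is a one-sided hypergeometric test: with N annotated background proteins, K of which carry a term, and a module of n proteins, k of which carry the term, the p-value is P(X >= k). The sketch below computes that tail with the standard library; it illustrates the idea and is not MetaNetwork's implementation, and `hypergeom_enrichment_p` is a hypothetical name.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n drawn)."""
    if not (0 <= n <= N and 0 <= K <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts must satisfy 0 <= k <= min(n, K) and n, K <= N")
    total = comb(N, n)  # number of ways to draw a module of size n
    # Sum the probability of seeing k or more annotated proteins in the module.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts: 12 of 40 module proteins carry a term held by 100 of 2000 background proteins.
print(hypergeom_enrichment_p(k=12, n=40, K=100, N=2000))
```

When many terms are tested per module, the p-values would normally be corrected for multiple testing before ranking terms.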

Glossary

Sample- A biological replicate in any experimental condition.

Module- A group of proteins clustered by weighted adjacency.

Module Eigenprotein (ME)- A value analogous to module expression, calculated by taking the first principal component of protein expression across samples in each module.
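
Assuming a module's proteins are the rows of a proteins x samples matrix, the first principal component can be sketched with NumPy's SVD as below. Standardizing each protein first and orienting the arbitrary SVD sign to agree with average expression are common conventions (WGCNA's `moduleEigengenes` does both), but this is an illustration rather than MetaNetwork's code; `module_eigenprotein` is a hypothetical name.

```python
import numpy as np


def module_eigenprotein(expr: np.ndarray) -> np.ndarray:
    """First principal component of one module, one value per sample.

    expr: proteins x samples matrix containing only the module's proteins.
    """
    # Standardize each protein across samples (zero mean, unit variance).
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, ddof=1, keepdims=True)
    # Rows of vt are principal axes in sample space; vt[0] is the first PC.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    me = vt[0]
    # The SVD sign is arbitrary; flip it so the eigenprotein tracks mean module expression.
    if np.corrcoef(me, z.mean(axis=0))[0, 1] < 0:
        me = -me
    return me


module = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.5, 2.5, 2.8, 4.1, 5.2],
])
print(np.round(module_eigenprotein(module), 3))
```

Proteins with constant intensity across samples would need to be removed first, since their standard deviation is zero.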

WGCNA Workflow

MetaNetwork requires three files: a data file, an experimental groups file, and a UniProt database file containing protein accessions, gene symbols, and protein names.

1. Data file format

The data file should be structured so that the first column contains UniProt accessions. The columns immediately after it can contain any identifying information, notes, etc. The Non-data columns parameter tells MetaNetwork how many of these leading columns, including the accession column, come before the intensity data. In the vignette data folder, the ProstateCancerDataUpload.csv file contains imputed, log2-transformed, normalized data to be analyzed. Its only non-data column is the first column of UniProt accessions, so we would set the Non-data columns parameter to 1. If the second column contained something like Gene Symbols and the third column contained Entrez IDs, we would set the Non-data columns parameter to 3.

All subsequent columns should contain protein intensity values, one column per sample. Each column should have a unique column name.
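
The column layout can be sketched with Python's `csv` module. The function name, the example file contents, and the return shape are hypothetical; the only MetaNetwork concept used is the Non-data columns parameter, here `non_data_columns`, which counts every leading column (accession included) that precedes the intensities.

```python
import csv
import io


def read_data_file(handle, non_data_columns: int):
    """Split a data CSV into sample names, accessions, annotations and intensities.

    non_data_columns counts the leading columns that are NOT intensities,
    including the UniProt accession column (the vignette file would use 1).
    """
    reader = csv.reader(handle)
    header = next(reader)
    sample_ids = header[non_data_columns:]
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample column names must be unique")
    accessions, annotations, intensities = [], [], []
    for row in reader:
        accessions.append(row[0])                    # first column: UniProt accession
        annotations.append(row[1:non_data_columns])  # optional extra identifiers or notes
        intensities.append([float(v) for v in row[non_data_columns:]])
    return sample_ids, accessions, annotations, intensities


example = io.StringIO(
    "Accession,GeneSymbol,S1,S2\n"
    "P04637,TP53,20.1,21.3\n"
)
samples, accessions, annotations, values = read_data_file(example, non_data_columns=2)
print(samples, accessions, values)  # ['S1', 'S2'] ['P04637'] [[20.1, 21.3]]
```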

2. Experimental file format

The experimental file has two columns. The first column, with the header "SampleID", lists the unique column names of the samples in the data file, one per row. The second column, with the header "Experiment", describes each sample. In the example GroupsFile.csv file from the vignette's folder, the rows of the SampleID column correspond to the sample column names in the ProstateCancerDataUpload.csv file, and the corresponding tissue type is given in the rows of the Experiment column. For sample A_NAT_151, the Experiment column tells us that it is Aggressive Normal tissue.
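
A minimal sketch of reading the experimental file and checking it against the data file's sample columns. The validation rules (exact header names, every data column must appear under SampleID) are our reading of the format described above rather than MetaNetwork's documented checks, and the sample names and group labels in the example are hypothetical.

```python
import csv
import io


def read_groups_file(handle, data_sample_ids):
    """Read a SampleID/Experiment CSV and confirm it covers every data column."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or reader.fieldnames[:2] != ["SampleID", "Experiment"]:
        raise ValueError('expected the headers "SampleID" and "Experiment"')
    groups = {row["SampleID"]: row["Experiment"] for row in reader}
    missing = [s for s in data_sample_ids if s not in groups]
    if missing:
        raise ValueError(f"no Experiment entry for: {missing}")
    return groups


groups = read_groups_file(
    io.StringIO("SampleID,Experiment\nS1,Control\nS2,Treated\n"),
    data_sample_ids=["S1", "S2"],
)
print(groups)  # {'S1': 'Control', 'S2': 'Treated'}
```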

3. UniProt Accession, Gene Symbol, and Protein Names File

The UniProt data table file, H SAPIENS DATABASE.tab, is a tab-separated file downloaded from UniProt that contains three columns: Protein Name, Gene Name, and Entry (the UniProt accession).
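
Loading the accession lookup can be sketched with `csv.DictReader` and a tab delimiter. The default header names below follow this page's description; an actual UniProt export may label the columns differently (for example in the plural), so they are exposed as parameters. The function name is hypothetical.

```python
import csv
import io


def load_uniprot_table(handle, entry_col="Entry", protein_col="Protein Name", gene_col="Gene Name"):
    """Map each UniProt accession to a (gene name, protein name) pair."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[entry_col]: (row[gene_col], row[protein_col]) for row in reader}


table = io.StringIO(
    "Entry\tProtein Name\tGene Name\n"
    "P04637\tCellular tumor antigen p53\tTP53\n"
)
print(load_uniprot_table(table))  # {'P04637': ('TP53', 'Cellular tumor antigen p53')}
```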

4. WGCNA parameter selection

All parameters and the WGCNA workflow are explained in depth in the paper describing MetaNetwork. For each parameter, we provide a brief summary and describe how it influences the final WGCNA workflow.

WGCNA Parameters

Scale-free topology approximation threshold. This parameter sets the minimum R^2 of the scale-free topology fit that a power must reach for the network to be considered scale-free. Higher cutoff values require higher values of β to achieve scale-free topology, and increasing β reduces the overall connectivity between proteins. If a data set has no underlying scale-free topology, no value of β will reach the scale-free cutoff, which indicates that some effect, either experimental or batch, is driving the data's correlation structure.
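
To make the R^2 concrete: in a scale-free network the connectivity distribution p(k) falls off roughly as a power law, so log10 p(k) is approximately linear in log10 k. The sketch below bins each protein's weighted connectivity k, fits that line, and reports R^2. It is modeled loosely on WGCNA's scale-free fit index: it simply drops empty bins (WGCNA's own `scaleFreeFitIndex` handles them differently), the bin count of 10 is an assumption, and the sign flip (positive for a decreasing relationship) follows the signed R^2 commonly plotted in WGCNA tutorials. It is not MetaNetwork's code.

```python
import numpy as np


def scale_free_fit(adj: np.ndarray, n_bins: int = 10) -> float:
    """Signed R^2 of a linear fit of log10 p(k) on log10 k.

    adj: symmetric weighted adjacency matrix with a zero diagonal.
    """
    k = adj.sum(axis=1)                              # weighted connectivity per protein
    edges = np.histogram_bin_edges(k, bins=n_bins)
    idx = np.digitize(k, edges[1:-1])                # bin index 0..n_bins-1 for each protein
    counts = np.bincount(idx, minlength=n_bins)
    occupied = np.flatnonzero(counts)                # empty bins are dropped in this sketch
    mean_k = np.array([k[idx == b].mean() for b in occupied])
    p_k = counts[occupied] / len(k)                  # fraction of proteins in each bin
    x, y = np.log10(mean_k), np.log10(p_k)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    centered = y - y.mean()
    r2 = 1.0 - (residual @ residual) / (centered @ centered)
    # Positive when p(k) decreases with k, as expected for a scale-free network.
    return float(-np.sign(slope) * r2)


rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 20))                    # hypothetical proteins x samples data
adj = np.abs(np.corrcoef(expr)) ** 6
np.fill_diagonal(adj, 0.0)
print(round(scale_free_fit(adj), 3))
```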

Max power for scale-free network testing. This parameter is the maximum power, β, that MetaNetwork will test when determining whether the data meets the scale-free network topology criterion.

Check for automatic power selection. If selected, MetaNetwork uses the first power that achieves scale-free topology. If left unselected, MetaNetwork uses the power from User-entered power selection under Advanced options, which defaults to 12; this is sufficient for around 20 samples.
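
The power-selection behavior described above can be sketched as a loop over candidate powers. `fit_index` is assumed to return the signed scale-free fit R^2 for a given β (for example, computed from the adjacency at that power). The defaults for the threshold (0.85) and maximum power (20), and testing every integer power from 1 upward, are illustrative assumptions rather than MetaNetwork's documented settings; returning `None` when no power qualifies is our choice, since this page does not say what MetaNetwork does in that case.

```python
from typing import Callable, Optional


def choose_power(fit_index: Callable[[int], float], threshold: float = 0.85,
                 max_power: int = 20, automatic: bool = True,
                 user_power: int = 12) -> Optional[int]:
    """Return the soft-thresholding power beta to use.

    With automatic selection, the first beta in 1..max_power whose fit reaches
    the threshold is returned; None means no tested power was scale-free.
    Without it, the user-entered power is used as-is.
    """
    if not automatic:
        return user_power
    for beta in range(1, max_power + 1):
        if fit_index(beta) >= threshold:
            return beta
    return None


# Hypothetical fit values for powers 1..5.
fits = {1: 0.2, 2: 0.5, 3: 0.7, 4: 0.86, 5: 0.9}
print(choose_power(lambda b: fits[b], max_power=5))  # 4
```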

Module merging cut height. To reduce the number of modules, MetaNetwork merges modules that are closely correlated. To do so, it hierarchically clusters the module eigenproteins based on their correlation; the height at which two modules branch corresponds to their dissimilarity, 1 minus the correlation of their eigenproteins. Setting the Module Merge Cut Height to 0.25 therefore means that modules whose eigenproteins are correlated at 0.75 or higher are merged together. Decreasing the Module Merge Cut Height merges fewer modules, which increases the number of modules and decreases their average size.
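
The merge rule can be sketched as average-linkage clustering of the eigenproteins on the dissimilarity 1 minus correlation, stopping once the closest pair of clusters is farther apart than the cut height. Average linkage and the single merging pass are assumptions (WGCNA's own merging can repeat the step after recomputing eigenproteins), and `merge_close_modules` is a hypothetical name.

```python
import numpy as np


def merge_close_modules(eigenproteins: np.ndarray, cut_height: float = 0.25) -> list:
    """Group modules whose eigenproteins are closely correlated.

    eigenproteins: modules x samples matrix (one eigenprotein per row).
    Returns groups of module indices that would be merged together.
    """
    dissim = 1.0 - np.corrcoef(eigenproteins)       # dissimilarity = 1 - correlation
    clusters = [[i] for i in range(len(eigenproteins))]
    while len(clusters) > 1:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean dissimilarity over all cross-cluster pairs.
                d = dissim[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best:
                    best, pair = d, (a, b)
        if best > cut_height:                        # closest pair lies above the cut
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                              # b > a, so index a is unaffected
    return clusters


t = np.arange(6, dtype=float)
mes = np.vstack([t, t + np.array([0, 0.1, 0, 0.1, 0, 0.1]), t[::-1]])
print(merge_close_modules(mes, cut_height=0.25))  # [[0, 1], [2]]
```

Because average-linkage merge heights never decrease, stopping at the first pair above `cut_height` gives the same groups as cutting the dendrogram at that height. With a cut height of 0.25, two groups are joined only if their average eigenprotein correlation is at least 0.75.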

Minimum module size. Sets the minimum number of proteins a module must contain during clustering. Defaults to 20 proteins.
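
The effect of a minimum module size can be illustrated as a post-hoc filter on module labels: modules with fewer members than `min_size` are dissolved into an unassigned label, and the remaining modules are renumbered by decreasing size. This is a simplification, since in WGCNA the minimum size is applied during dynamic tree cutting itself; the label 0 for unassigned proteins and the function name are assumptions.

```python
from collections import Counter

UNASSIGNED = 0  # hypothetical label for proteins not placed in any module


def enforce_min_module_size(labels, min_size=20):
    """Dissolve modules smaller than min_size and renumber the rest 1, 2, ... by size."""
    sizes = Counter(label for label in labels if label != UNASSIGNED)
    kept = [module for module, n in sizes.most_common() if n >= min_size]
    new_id = {module: i + 1 for i, module in enumerate(kept)}
    return [new_id.get(label, UNASSIGNED) for label in labels]


labels = [1] * 25 + [2] * 5 + [3] * 30
relabeled = enforce_min_module_size(labels, min_size=20)
print(Counter(relabeled))  # Counter({1: 30, 2: 25, 0: 5})
```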

Advanced options

User-entered power selection. Allows users to manually enter the power used to create the adjacency matrix. Used only when "Check for automatic power selection" is unchecked.

Module detection sensitivity. Controls the sensitivity of the dynamic tree cutting function used to identify modules. Lower values are less sensitive and produce fewer modules; higher values produce more modules.

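Maximum block size. The largest number of proteins MetaNetwork analyzes in a single block, which should be chosen according to the available RAM. Entering more proteins than the maximum block size triggers MetaNetwork to run module detection blockwise: proteins are first k-means clustered, then WGCNA is performed on each cluster.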

Number of preclustering centers. Controls the initial number of k-means centers used when MetaNetwork runs blockwise.
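
Blockwise preclustering can be sketched as plain k-means (Lloyd's algorithm) on standardized protein profiles, followed by packing the resulting clusters into blocks no larger than the maximum block size. WGCNA's own preclustering uses a more elaborate projective k-means, so treat this as an illustration of the idea only; the function name, iteration limit, random initialization, and greedy packing are assumptions.

```python
import numpy as np


def precluster_blocks(expr, n_centers, max_block_size, n_iter=50, seed=0):
    """Split proteins into blocks of at most max_block_size using k-means.

    expr: proteins x samples matrix. Returns a list of arrays of protein indices.
    """
    rng = np.random.default_rng(seed)
    # Standardize rows so Euclidean distance tracks correlation between profiles.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    centers = z[rng.choice(len(z), size=n_centers, replace=False)]
    for _ in range(n_iter):
        dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)                 # nearest center for each protein
        new_centers = np.array([
            z[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(n_centers)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Greedy packing: largest clusters first, each into the first block with room.
    clusters = sorted((np.flatnonzero(assign == c) for c in range(n_centers)),
                      key=len, reverse=True)
    blocks = []
    for members in clusters:
        if len(members) == 0:
            continue
        for i, block in enumerate(blocks):
            if len(block) + len(members) <= max_block_size:
                blocks[i] = np.concatenate([block, members])
                break
        else:
            blocks.append(members)  # may exceed max_block_size if one cluster is too large
    return blocks


expr = np.random.default_rng(1).normal(size=(100, 8))  # hypothetical data
print([len(b) for b in precluster_blocks(expr, n_centers=10, max_block_size=40)])
```

Each block would then go through the adjacency, clustering, and eigenprotein steps separately, so the largest adjacency matrix held in memory is roughly `max_block_size` squared rather than the square of the total number of proteins.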

Submission

After uploading the data and adjusting the settings, click the submit button. The workflow begins only if all required fields are filled. Messages from MetaNetwork are displayed in the terminal or PowerShell window rather than in the MetaNetwork GUI. When the WGCNA workflow completes, a message is displayed underneath the data preview and the plots appear in their windows.