
Data structures

dean geckt edited this page Sep 12, 2024 · 3 revisions

The data structures are the core of the software, and they also shape the post-processing analysis.

They are defined either as Python Enum classes or as Pydantic models (Pydantic is a popular third-party Python package). A Pydantic model can be thought of as a class with fields but no methods, similar to a C struct.

They are located under /utils/types.py

Enum Classes

NetworkInputType

The supported network input formats (each has its own loader under /networks/loaders/):

  • simple_adj_txt: Simple adjacency matrix in text format. Each line contains (v1, v2, w), where v1 -> v2, and w is ignored.
  • worm_wiring_xlsx: C. elegans connectomes from wormwiring.
  • polarity_xlsx: C. elegans connectomes with polarity data in Excel format, from the paper.
  • durbin_txt: C. elegans connectomes from Durbin in a text format.
  • multilayer: C. elegans multilayer connectomes from the paper.
  • graph: General graph, given as a list of strings where each string encodes one edge (tuple), in the format: ["1 2" "2 3" ...].
  • binary_network: Binary network file format (saved by this software).
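The two plain-text formats above can be illustrated with a small sketch. It is not the project's loader code: the helper names parse_simple_adj_lines and parse_graph_edges, the whitespace delimiter, and the enum string values are all assumptions made for illustration. The actual loaders under /networks/loaders/ may differ.

```python
from enum import Enum


class NetworkInputType(Enum):
    # Member names come from the list above; the string values are assumed.
    simple_adj_txt = "simple_adj_txt"
    worm_wiring_xlsx = "worm_wiring_xlsx"
    polarity_xlsx = "polarity_xlsx"
    durbin_txt = "durbin_txt"
    multilayer = "multilayer"
    graph = "graph"
    binary_network = "binary_network"


def parse_simple_adj_lines(lines):
    """Parse 'v1 v2 w' lines into (v1, v2) edges; the weight w is ignored."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        edges.append((parts[0], parts[1]))
    return edges


def parse_graph_edges(edge_strings):
    """Parse edge strings such as "1 2" into (source, target) tuples."""
    return [tuple(s.split()) for s in edge_strings]


print(parse_simple_adj_lines(["1 2 5", "2 3 1"]))
print(parse_graph_edges(["1 2", "2 3"]))
```

Node identifiers are kept as strings here so that the same helpers work for numeric ids and for neuron names.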

SubGraphAlgoName

The algorithms implemented for subgraph enumeration:

  • specific: Specific subgraph detection algorithm.
  • mfinder_induced: MFinder - an induced version, from the paper.
  • mfinder_none_induced: MFinder - a non-induced version.
  • fanmod_esu: Fanmod ESU algorithm, from the paper.
  • triadic_census: Triadic census algorithm, a wrapper around the NetworkX implementation.

RandomGeneratorAlgoName

An enumeration of random network generation algorithms:

  • markov_chain_switching: Markov chain switching algorithm. R. Kannan, P. Tetali, S. Vempala, Random Struct. Algorithms 14, 293 (1999).
  • nerve_ring_markov_chain_switching: An anatomically constrained Markov chain that allows switches based on the C. elegans nerve-ring distances, following Zaslaver et al., 2022: "The synaptic organization in the Caenorhabditis elegans neural network suggests significant local compartmentalized computations".
  • erdos_renyi: Erdős-Rényi random graph model, with probability p chosen such that the average |E| over all random networks ~= |E| of the original network (Wikipedia).
  • barabasi: Barabási-Albert model, with m (# of edges to attach) = |E| / |N|. A random direction is then chosen for each edge (Wikipedia).
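To make the randomization ideas above more concrete, here is a minimal sketch of a degree-preserving edge switch and of the Erdős-Rényi probability. It is a generic textbook-style version, not the project's implementation: the function names are hypothetical, and it assumes a simple directed graph without self-loops or duplicate edges.

```python
import random
from collections import Counter


def er_edge_probability(n_nodes, n_edges):
    """p for which a directed G(n, p) without self-loops has n_edges edges on average."""
    return n_edges / (n_nodes * (n_nodes - 1))


def switch_edges(edges, n_switches, seed=None):
    """Repeatedly swap targets of two edges: (a->b, c->d) becomes (a->d, c->b)."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_switches):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new1, new2 = (a, d), (c, b)
        # Reject switches that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def degree_sequences(edges):
    """Return (out-degree, in-degree) counters for an edge list."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)


original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
shuffled = switch_edges(original, n_switches=100, seed=1)
# Every node keeps its in-degree and out-degree after switching.
print(degree_sequences(shuffled) == degree_sequences(original))
```

Because each accepted switch only exchanges the targets of two edges, the in-degree and out-degree of every node are unchanged, which is the property that makes this kind of null model useful for motif statistics.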

MotifType

Each subgraph is eventually classified as one of the following:

  • motif: Appears significantly more often than expected by chance.
  • anti_motif: Appears significantly less often than expected by chance.
  • none: Neither.

MotifName

Some subgraphs have meaningful names (in addition to their id representation):

  • self_loop: Self-loop motif.
  • mutual_regulation: Mutual regulation motif.
  • fan_out: Fan-out motif (also known as sim_2).
  • fan_in: Fan-in motif.
  • cascade: Cascade motif.
  • feed_forward: Feed-forward motif.
  • bi_fan: Bi-fan motif.
  • bi_parallel: Bi-parallel motif.
  • sim_3: Sim_3 motif.
  • na: Not available (n/a).

Data Models

NetworkLoaderArgs

A Pydantic model that defines the arguments for loading a network (check Arguments for more details):

  • synapse_threshold: Load an edge only if its number of synapses is greater than or equal to this threshold.
  • filter_polarity: Optional list of polarities to filter, default is ['+', '-'].
  • filter_prim_nt: Optional list of primary neurotransmitters or integers to filter, default is ['GABA', 'Glu', 'ACh', 0].
  • filter_syn_type: Optional type of synapse to filter, default is 'chem'.
  • filter_sex_type: Optional sex type to filter, default is 'herm'.
  • filter_nerve_ring_neurons: Optional boolean to filter nerve ring neurons, default is False.
  • filter_monoamines: Optional list of monoamines to filter, default is ['dopamine', 'octopamine', 'serotonin', 'tyramine'].
  • allow_self_loops: Optional boolean to allow self-loops, default is False.

MotifCriteriaArgs

A Pydantic model defining criteria for motif detection:

  • alpha: Significance level (float).
  • uniqueness_threshold: Threshold for uniqueness (int).
  • use_uniq_criteria: Boolean to indicate if uniqueness criteria should be used.
  • frequency_threshold: Frequency threshold for motif detection (float).

MotifCriteriaResults

A Pydantic model for storing the results of motif criteria evaluation:

  • n_real: Number of real appearances of the (candidate) motif (int).
  • is_statistically_significant: Optional boolean indicating statistical significance, default is False.
  • n_rand: Optional float representing the number of random appearances.
  • z_score: Optional float for the z-score.
  • std: Optional float for the standard deviation.
  • p_value: Optional float for the p-value.
  • uniq: Optional integer for uniqueness.
  • is_motif_frequent: Optional boolean indicating if the motif is frequent.
  • is_anti_motif_frequent: Optional boolean indicating if the anti-motif is frequent.
  • is_uniq: Optional boolean or string indicating if it is unique.
  • is_motif: Optional MotifType indicating the type of motif.
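One plausible way such statistics could be derived is sketched below. The exact criteria used by the software are not stated on this page, so this is an assumption: it uses the population standard deviation of the random counts, an empirical one-sided p-value, and only the alpha criterion (the uniqueness and frequency thresholds are left out). The function name evaluate_candidate is hypothetical.

```python
from statistics import mean, pstdev


def evaluate_candidate(n_real, random_counts, alpha=0.01):
    """Compare a real count against counts from random networks."""
    n_rand = mean(random_counts)
    std = pstdev(random_counts)
    # z-score is undefined when all random networks give the same count.
    z_score = (n_real - n_rand) / std if std > 0 else None
    n = len(random_counts)
    # Empirical one-sided p-values: how often random networks are at least
    # as extreme as the real network, in each direction.
    p_over = sum(c >= n_real for c in random_counts) / n
    p_under = sum(c <= n_real for c in random_counts) / n
    if p_over < alpha:
        motif_type = "motif"
    elif p_under < alpha:
        motif_type = "anti_motif"
    else:
        motif_type = "none"
    return {
        "n_real": n_real,
        "n_rand": n_rand,
        "std": std,
        "z_score": z_score,
        "p_value": min(p_over, p_under),
        "motif_type": motif_type,
    }


# Toy counts, invented for illustration only.
result = evaluate_candidate(10, [2, 3, 4, 3, 2, 4, 3, 3])
print(result["motif_type"], result["z_score"])
```

With a small number of random networks the empirical p-value is coarse, so in practice the number of random samples limits the smallest alpha that can be tested meaningfully.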

PolarityFrequencies

A Pydantic model for polarity frequencies:

  • frequency: Frequency count (int).
  • polarity: The actual polarity, represented as a list of strings, e.g., ['+', '-'].
  • sub_graphs: The list of subgraphs that are instances of this polarity subgraph.
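The relationship between the three fields can be sketched by grouping subgraphs by their polarity pattern. This assumes each subgraph is a tuple of (s, t, polarity) edges, as described for Motif.sub_graphs, and that edges are listed in a consistent order across subgraphs. The function name group_by_polarity and the plain-dict output are illustrative choices, not the project's API.

```python
from collections import defaultdict


def group_by_polarity(sub_graphs):
    """Group subgraphs that share the same per-edge polarity pattern."""
    groups = defaultdict(list)
    for sub_graph in sub_graphs:
        polarity = tuple(p for _, _, p in sub_graph)
        groups[polarity].append(sub_graph)
    return [
        {"frequency": len(members), "polarity": list(polarity), "sub_graphs": members}
        for polarity, members in groups.items()
    ]


# Toy subgraphs with made-up node names.
sub_graphs = [
    (("A", "B", "+"), ("B", "C", "-")),
    (("D", "E", "+"), ("E", "F", "-")),
    (("G", "H", "-"), ("H", "I", "-")),
]
for entry in group_by_polarity(sub_graphs):
    print(entry["polarity"], entry["frequency"])
```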

Motif

A Pydantic model representing a motif:

  • name: MotifName of the motif.
  • id: ID of the motif or subgraph (int or string).
  • adj_mat: Adjacency matrix (numpy array).
  • role_pattern: List of role patterns as tuples (list of tuples).
  • n_real: Optional number of real appearances (int, default is 0).
  • motif_criteria: Optional MotifCriteriaResults.
  • random_network_samples: Optional number of appearances of this motif in the random networks.
  • sub_graphs: Optional list of all isomorphic subgraph appearances, in a tuple-edge format: (s, t, polarity).
  • node_roles: Optional dictionary of node roles and frequencies (a dict of dicts: each key is a role, and each value is a dict that maps node names to their frequencies).
  • node_appearances: Optional dictionary of node appearances (a sorted dict of the nodes that appear in this motif, where each key is either a neuron name or a node id).
  • polarity_motifs: Optional list of polarity motifs (list of Motif).
  • polarity: Optional polarity of this motif, set when it is a polarity motif that belongs to another motif (recursively).
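As an illustration of how node_appearances might relate to sub_graphs, the sketch below counts in how many subgraph instances each node takes part. The counting rule (once per subgraph) and the sort order (descending frequency, then name) are assumptions, and count_node_appearances is a hypothetical helper.

```python
from collections import Counter


def count_node_appearances(sub_graphs):
    """Count, per node, the number of subgraph instances it appears in."""
    counts = Counter()
    for sub_graph in sub_graphs:
        nodes = set()
        for s, t, _polarity in sub_graph:
            nodes.update((s, t))
        counts.update(nodes)  # each node counted once per subgraph
    # Sort by descending frequency, breaking ties by node name.
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


# Two toy feed-forward-like instances with made-up node names.
sub_graphs = [
    (("A", "B", "+"), ("B", "C", "+"), ("A", "C", "-")),
    (("A", "D", "+"), ("D", "C", "-"), ("A", "C", "+")),
]
print(count_node_appearances(sub_graphs))
```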

SubGraphSearchResult

A Pydantic model for the result of the subgraph enumeration search:

  • fsl: Dictionary of frequent subgraph lists (dict where the key is the motif id, and the value is the frequency).
  • fsl_fully_mapped: Same as fsl, but each value is the list of subgraphs instead of their frequency.
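The two dictionaries carry the same keys, so the frequency view can be derived from the fully mapped one. A minimal sketch, using plain dicts and made-up motif ids, with frequencies_from_fully_mapped as a hypothetical helper name:

```python
def frequencies_from_fully_mapped(fsl_fully_mapped):
    """Derive {motif id: frequency} from {motif id: list of subgraphs}."""
    return {
        motif_id: len(sub_graphs)
        for motif_id, sub_graphs in fsl_fully_mapped.items()
    }


# Hypothetical ids and toy subgraphs, each a tuple of (source, target) edges.
fsl_fully_mapped = {
    38: [(("A", "B"), ("B", "C"), ("A", "C")), (("D", "E"), ("E", "F"), ("D", "F"))],
    12: [(("A", "B"), ("B", "C"))],
}
fsl = frequencies_from_fully_mapped(fsl_fully_mapped)
print(fsl)
```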

LargeSubGraphSearchResult

An extension of SubGraphSearchResult with additional attributes:

  • adj_mat: Dictionary of adjacency matrices (dict of numpy arrays).

SearchResultBinaryFile

A TypedDict representing the binary file for search results:

  • args: Namespace containing arguments.
  • motifs: Dictionary of Motif.

NetworkBinaryFile

A TypedDict representing a binary file to save a custom network:

  • graph: DiGraph representing the network.
  • participating_nodes: Set of participating nodes.
  • neuron_names: List of neuron names.