Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancy in Edge Counts for Directed Erdős–Rényi Graph Generation #144

Open
mdindoost opened this issue Dec 25, 2024 · 1 comment · Fixed by #141
Open

Discrepancy in Edge Counts for Directed Erdős–Rényi Graph Generation #144

mdindoost opened this issue Dec 25, 2024 · 1 comment · Fixed by #141
Assignees

Comments

@mdindoost
Copy link
Contributor

I suspect that the ar.gnp(num_nodes, p) function may have an issue.
I ran the code in this way: temp_prop_graph = ar.gnp(num_nodes, p, create_using=ar.PropGraph) so I supposed it will create Directed graph. if this assumption is not correct ignore the rest!

Let's analyze the parameters and the observed results:

Number of nodes (n): 10,000
Edge probability (p): 0.0005
Graph type: Directed (Erdős–Rényi model G(n,p)
Self-loops: Not allowed

1. My understanding of the Directed Erdős–Rényi: In a directed Erdős–Rényi G(n,p) graph without self-loops:
Each node can have a directed edge to every other node.

Total possible directed edges = n(n−1)

Total possible edges = 10,000 × 9,999 = 99,990,000
Total possible edges=10,000×9,999=99,990,000

2. Calculating the Expected Number of Edges (E)
E=n(n−1)×p=99,990,000×0.0005=49,995
So, the expected number of edges is 49,995.

3. Calculating the Variance (Var) and Standard Deviation (σ)
The variance in a binomial distribution is given by:

Var = n (n−1) × p × (1−p)
Substituting the values:
Var=99,990,000 × 0.0005 × 0.9995 ≈ 49,975.0125 almost 49,975
The standard deviation is the square root of the variance:

𝜎 = SQRT(Var) = ≈223.6

This means that while the expected number of edges is 49,995, the actual number of edges will typically lie within a range around this value, considering the standard deviation.

4. Comparing with Observed Results
Your graph generation resulted in:

Vertices: 9,942
Edges: 24,888

5. Analyzing the Discrepancy
The observed number of edges (24,888) is roughly half of the expected number for a directed graph (49,995). This suggests that the graph might be treated as undirected.

The observed edge count (24,888) closely aligns with the expected number for an undirected graph, indicating that the graph generation code might be inadvertently treating the graph as undirected despite the intention to create a directed graph.

@alvaradoo
Copy link
Member

Thank you for pointing this out. I am attaching to this issue a branch that will fix this problem when merged. The problem is that our old gnp method was treating directed graphs as undirected graphs which is incorrect. Our updated function is below where it creates all possible indices in an n x n matrix and then selects if the edge exists or not independent of u-->v or v-->u.

def gnp(n: int, p: float,
        create_using: Union[ar.Graph,ar.DiGraph,ar.PropGraph]
    ) -> Union[ar.Graph,ar.DiGraph,ar.PropGraph]:
    """
    Generate a random binomial graph. Also known as an Erdos-Renyi or completely random graph.
    Does not allow for the creation of isolates, self-loops, or duplicated edges.

    Parameters
    ----------
    n : int
        number of nodes
    p : float in [0, 1]
        probability of edge formation

    create_using : Union[ar.Graph,ar.DiGraph,ar.PropGraph]
        Arachne graph constructor
        constructors supported
        ar.Graph
        ar.DiGraph
        ar.PropGraph

    Returns
    -------
    graph: Union[ar.Graph,ar.DiGraph,ar.PropGraph]
    
    """
    U = ak.broadcast(ak.arange(0, n*n, n), ak.arange(n), n*n)
    V = ak.arange(U.size) % n

    not_self_loop = V != U
    filtered_U = U[not_self_loop]
    filtered_V = V[not_self_loop]

    probabilities = ak.randint(0, 1, filtered_U.size, dtype=ak.float64)

    kept_edges = probabilities < p
    kept_U = filtered_U[kept_edges]
    kept_V = filtered_V[kept_edges]

    graph = empty_graph(create_using)

    if V.size == 0 or U.size == 0:
        return graph

    graph.add_edges_from(kept_U,kept_V)

    return graph

@alvaradoo alvaradoo linked a pull request Jan 2, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants