[BUG]: normalization issue in betweenness_centrality #4941

Open
2 tasks done
zmahoor opened this issue Feb 13, 2025 · 3 comments
@zmahoor

zmahoor commented Feb 13, 2025

Version

24.10

Which installation method(s) does this occur on?

Conda

Describe the bug.

The normalized values returned by betweenness_centrality appear to be incorrect when k is passed as a cudf.Series. The values do not fall within [0, 1], and they differ from the values we get from NetworkX.

Minimum reproducible example

import cugraph
import cudf
df = cudf.DataFrame(
      columns=["dst", "src"],
      data=[('A','i'),('B','A'),('C','B'),('E','C'),('D','E'),('D','A'),('i','D'),('C','D')]
      )

G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(df, source='src', destination='dst')
bc = cugraph.betweenness_centrality(G, normalized=True, k=cudf.Series(["i"], dtype=object))
print(bc)

Relevant log output

betweenness_centrality vertex
0                     1.2      A
1                     0.3      D
2                     0.0      E
3                     0.3      B
4                     0.3      C
5                     0.0      i


The values we get from NetworkX are {'A': 1.0, 'B': 0.25, 'C': 0.25, 'E': 0, 'D': 0.25, 'i': 0}.

Environment details

Other/Misc.

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@zmahoor added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels Feb 13, 2025
@ChuckHastings
Collaborator

We will investigate. Should be an easy fix, since the scores look proportional but not properly normalized.
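
The proportionality mentioned above can be checked directly from the numbers in the report. The snippet below is only a sketch: the two dictionaries are copied by hand from the cuGraph log output and the NetworkX values quoted in the issue, and the variable names are made up for illustration. It needs neither cuGraph nor NetworkX.

```python
# Values copied from the report above (not recomputed here).
cugraph_bc = {"A": 1.2, "B": 0.3, "C": 0.3, "D": 0.3, "E": 0.0, "i": 0.0}
networkx_bc = {"A": 1.0, "B": 0.25, "C": 0.25, "D": 0.25, "E": 0.0, "i": 0.0}

# Ratio of the two scores for every vertex with a nonzero NetworkX score.
ratios = {
    v: cugraph_bc[v] / networkx_bc[v]
    for v in cugraph_bc
    if networkx_bc[v] != 0
}
print(ratios)

# Vertices that are zero in one result should be zero in the other as well.
zeros_agree = all(
    cugraph_bc[v] == 0 for v in networkx_bc if networkx_bc[v] == 0
)
print(zeros_agree)
```

Every nonzero vertex differs by the same factor of 1.2, which is consistent with a scaling problem rather than wrong path counts. That factor happens to equal 6/5 for this 6-vertex graph, but the thread does not confirm the cause.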

@ChuckHastings ChuckHastings self-assigned this Feb 21, 2025
@zmahoor
Author

zmahoor commented Feb 25, 2025

@ChuckHastings should the betweenness centrality values be integers or real numbers when normalized=False? I'm getting real numbers, but I assumed that with normalized=False we would get counts of shortest paths.

import cudf
import cugraph
# download data from https://github.com/rapidsai/cugraph/blob/main/datasets/karate.csv
df = cudf.read_csv(
    "./karate.csv", 
    delimiter=" ", 
    names=["src", "dst", "value"], 
    dtype=["int32", "int32", "float32"]
    )
sg = cugraph.Graph(directed=True)
sg.from_cudf_edgelist(df, source='src', destination='dst')
cdf_bc = cugraph.betweenness_centrality(sg, normalized=False)
cdf_bc.sort_values(by='betweenness_centrality', ascending=False).head(5)
vertex | betweenness_centrality
------ | ----------------------
0      | 462.142914
33     | 321.103180
32     | 153.380936
2      | 151.701569
31     | 146.019058

@ChuckHastings
Collaborator

Real numbers.

The centrality score is computed by taking, for each s,t pair in the graph, the ratio of the number of shortest paths through a vertex to the number of all shortest paths between s and t, and summing those ratios. So it's going to be a rational number represented in floating point (either float or double depending on what kind of weight you use).
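
To illustrate why the unnormalized score need not be an integer, here is a minimal pure-Python sketch of Brandes' algorithm for unweighted directed graphs. It is not cuGraph's implementation; the function name `betweenness_unnormalized` and the four-vertex diamond graph are assumptions chosen for illustration.

```python
from collections import deque


def betweenness_unnormalized(adj):
    """Unnormalized betweenness for an unweighted directed graph.

    adj maps every vertex to the list of its successors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating path fractions.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Diamond graph: two equally short paths from a to d (via b and via c).
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
scores = betweenness_unnormalized(diamond)
print(scores)
```

In this graph the pair (a, d) has two shortest paths, one through b and one through c, so each of those vertices carries half of that pair and ends up with a score of 0.5 even though nothing is normalized.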
