[BUG]: Node Selection from PropertyGraph #4940

Open
2 tasks done
zmahoor opened this issue Feb 13, 2025 · 1 comment
Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)


zmahoor commented Feb 13, 2025

Version

24.10

Which installation method(s) does this occur on?

Conda

Describe the bug.

property_graph.select_vertices() does not work with expressions that use "in". Example: expression = "(v_prop in [1, 4, 5])"
Not sure if this is an actual bug or if "in" is simply not supported by that function. The underlying issue is the use of Python's built-in eval in `selected_col = eval(expr, globals, locals)`, which does not work well with dataframes. You may need to use https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eval.html instead.

Minimum reproducible example

import cugraph
import cudf
from cugraph.experimental import PropertyGraph
df = cudf.DataFrame(columns=["src", "dst", "some_property"],
                    data=[(99, 22, "a"),
                          (98, 34, "b"),
                          (97, 56, "c"),
                          (96, 88, "d"),
                          ])
pG = PropertyGraph()
pG.add_edge_data(df, type_name="etype", vertex_col_names=("src", "dst"))
vert_df = cudf.DataFrame({"vert_id": [99, 22, 98, 34, 97, 56, 96, 88],
                          "v_prop": [1, 2, 3, 4, 5, 6, 7, 8]})
pG.add_vertex_data(vert_df, type_name="vtype", vertex_col_name="vert_id")

expression = "(v_prop in [1, 4, 5])"
selection = pG.select_vertices(expression)

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_5035/64479715.py in <cell line: 0>()
      4 print(expression)
      5 
----> 6 selection = pG.select_vertices(expression)
      7 # sub_G = pG.extract_subgraph(
      8 #     selection=selection,

/opt/conda/lib/python3.11/site-packages/cugraph/structure/property_graph.py in select_vertices(self, expr, from_previous_selection)
   1494 
   1495         globals = {}
-> 1496         selected_col = eval(expr, globals, locals)
   1497 
   1498         num_rows = len(self.__vertex_prop_dataframe)

<string> in <module>

/opt/conda/lib/python3.11/site-packages/cudf/core/frame.py in __bool__(self)
   1605 
   1606     def __bool__(self):
-> 1607         raise ValueError(
   1608             f"The truth value of a {type(self).__name__} is ambiguous. Use "
   1609             "a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Environment details

Other/Misc.

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@zmahoor zmahoor added ? - Needs Triage Need team to review and classify bug Something isn't working labels Feb 13, 2025
@rlratzel
Contributor

Thank you @zmahoor for this issue and especially the minimal reproducer.
Since, as you noted, the implementation uses DataFrames and v_prop in your expression is a Series, the workaround is to use isin:

>>> expression = "(v_prop.isin([1, 4, 5]))"
>>> selection = pG.select_vertices(expression)
>>> selection.vertex_selections
_VERTEX_
99     True
22    False
98    False
34     True
97     True
56    False
96    False
88    False
Name: v_prop, dtype: bool

This is obviously not as nice or intuitive as your example since it requires the user to understand key implementation details, but hopefully it should unblock you.
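
To illustrate why the plain `in` fails while `isin` works, here is a minimal CPU-only sketch. `FakeSeries` and `evaluate` are hypothetical stand-ins written for this example (they are not part of cudf or cuGraph) and mimic only the behaviors relevant here: an element-wise `==`, a `__bool__` that raises, and an `isin` method. Python evaluates `x in [1, 4, 5]` by comparing each list element against `x` with `==` and then taking the truth value of the result, which is where the ValueError in the traceback comes from.

```python
class FakeSeries:
    """Hypothetical stand-in for a cudf/pandas Series (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Element-wise comparison returns another series, not a bool.
        return FakeSeries(v == other for v in self.values)

    def __bool__(self):
        # A series has no single truth value, so this raises.
        raise ValueError("The truth value of a Series is ambiguous.")

    def isin(self, candidates):
        # Element-wise membership test, analogous to Series.isin.
        return FakeSeries(v in candidates for v in self.values)


def evaluate(expr, columns):
    # Mirrors the eval(expr, globals, locals) call shown in the traceback,
    # with the column names exposed as local variables.
    return eval(expr, {}, columns)


columns = {"v_prop": FakeSeries([1, 2, 3, 4, 5, 6, 7, 8])}

# The `in` operator ends up calling bool() on an element-wise result.
try:
    evaluate("(v_prop in [1, 4, 5])", columns)
    in_error = None
except ValueError as exc:
    in_error = str(exc)

# The method call never asks for a single truth value, so it succeeds.
mask = evaluate("(v_prop.isin([1, 4, 5]))", columns).values

print(in_error)
print(mask)
```

The `in` expression raises before any per-row result is produced, whereas the `isin` call returns one boolean per row, matching the selection shown above.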

Also, there are unfortunately some differences in DataFrame.eval support between pandas and cudf, which might cause other issues. For example, here's what we'd like to do (your example), which works well when pandas DataFrames are used:

>>> vert_df = pd.DataFrame({"vert_id": [99, 22, 98, 34, 97, 56, 96, 88],"v_prop": [1, 2, 3, 4, 5, 6, 7, 8]})
>>> vert_df.eval("v_prop in [1,4,5]")
0     True
1    False
2    False
3     True
4     True
5    False
6    False
7    False
Name: v_prop, dtype: bool

However, when we use a cudf.DataFrame:

>>> vert_df = cudf.DataFrame({"vert_id": [99, 22, 98, 34, 97, 56, 96, 88],"v_prop": [1, 2, 3, 4, 5, 6, 7, 8]})
>>> vert_df.eval("v_prop in [1,4,5]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.12/site-packages/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/cudf/core/dataframe.py", line 7993, in eval
    return Series._from_column(self._compute_column(statements[0]))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/cudf/core/dataframe.py", line 7862, in _compute_column
    plc.expressions.to_expression(expr, self._column_names),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "expressions.pyx", line 485, in pylibcudf.expressions.to_expression
  File "/opt/conda/lib/python3.12/ast.py", line 407, in visit
    return visitor(node)
           ^^^^^^^^^^^^^
  File "expressions.pyx", line 374, in pylibcudf.expressions.ExpressionTransformer.visit_Module
  File "/opt/conda/lib/python3.12/ast.py", line 407, in visit
    return visitor(node)
           ^^^^^^^^^^^^^
  File "expressions.pyx", line 377, in pylibcudf.expressions.ExpressionTransformer.visit_Expr
  File "/opt/conda/lib/python3.12/ast.py", line 407, in visit
    return visitor(node)
           ^^^^^^^^^^^^^
  File "expressions.pyx", line 431, in pylibcudf.expressions.ExpressionTransformer.visit_Compare
  File "expressions.pyx", line 435, in genexpr
KeyError: <class 'ast.In'>

I think we should still strive to support a more natural expression like (v_prop in [1, 4, 5]) though. I wish I had a better answer but since this is likely non-trivial, we'll have to understand how best to support that and prioritize it accordingly.
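
One possible direction, sketched here purely as an assumption and not as how cuGraph does or will implement it, is to rewrite the expression before it is evaluated so that `x in [...]` becomes `x.isin([...])`. The `InToIsin` class and `rewrite_membership` function below are hypothetical names for this example; they rely only on Python's standard `ast` module (`ast.unparse` requires Python 3.9 or newer).

```python
import ast


class InToIsin(ast.NodeTransformer):
    """Rewrite `a in b` to `a.isin(b)` and `a not in b` to `~a.isin(b)`."""

    def visit_Compare(self, node):
        # Rewrite nested comparisons first.
        self.generic_visit(node)
        # Only handle the simple single-operator form; leave the rest untouched.
        if len(node.ops) != 1 or not isinstance(node.ops[0], (ast.In, ast.NotIn)):
            return node
        call = ast.Call(
            func=ast.Attribute(value=node.left, attr="isin", ctx=ast.Load()),
            args=[node.comparators[0]],
            keywords=[],
        )
        if isinstance(node.ops[0], ast.NotIn):
            # Negate element-wise with `~` rather than the scalar `not`.
            return ast.UnaryOp(op=ast.Invert(), operand=call)
        return call


def rewrite_membership(expr):
    # Parse the expression, rewrite membership tests, and emit source text
    # that could then be passed to the existing eval-based code path.
    tree = ast.parse(expr, mode="eval")
    tree = ast.fix_missing_locations(InToIsin().visit(tree))
    return ast.unparse(tree)


rewritten = rewrite_membership("(v_prop in [1, 4, 5])")
print(rewritten)
```

This sketch rewrites every membership test, including ones whose left-hand side is not a column (for example a string literal), so a real implementation would need to restrict the rewrite to known property names. It also does nothing about `and`/`or`/`not`, which have the same truth-value problem with Series.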
