-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG]: Strange interaction between cuDF spilling and rmm.statistics #1792
Comments
Thanks for the issue! Here are some notes on what is going on. Enabling statistics wraps the current MR in an adaptor that tracks statistics (see implementation). Similarly, enabling spilling in cuDF will override the current MR with a spilling adaptor (see implementation). The spilling implementation uses a This API design is somewhat flawed in my opinion. We should encourage a pattern that makes it clear that the global memory resource is changed by these features (note that the spilling option documents this behavior, but the downstream effects are not clear, as evidenced by this issue). There may be a way to re-order these commands in the code snippet above to make it work better. RMM is happy to stack MR adaptors and can support this case in the abstract, but it will be difficult to make the existing APIs act in a more predictable way. For the longer term fix, I would propose that we deprecate the existing statistics APIs (they pretend to act "globally" but are susceptible to modifications of the underlying memory resource). Instead, we should encourage users to explicitly control the memory resource like: import rmm.mr
current_mr = rmm.mr.get_current_device_resource()
stats_mr = rmm.mr.StatisticsResourceAdaptor(current_mr)
rmm.mr.set_current_device_resource(stats_mr)
# do allocations
print(stats_mr.allocation_counts) Likewise for spilling, we should encourage direct control of the memory resource. We can make a "nicer" name than import cudf
import rmm.mr
current_mr = rmm.mr.get_current_device_resource()
spilling_mr = cudf.SpillingResourceAdaptor(current_mr) # SpillingResourceAdaptor does not exist yet
stats_mr = rmm.mr.StatisticsResourceAdaptor(spilling_mr)
rmm.mr.set_current_device_resource(stats_mr) That way it tracks stats outside of the spilling framework (I think that is the desired behavior). If we implemented a @TomAugspurger @madsbk @vyasr I would be eager to hear your thoughts. |
I think it is a good idea to make enabling statistics in RMM more explicit as you suggestion, and maybe add an enable-statistics argument to But I don't think it helps cudf. If we want |
I don't know how much people are currently using spilling right now. Most of the times that I've heard about people using it they are people internal to NVIDIA doing more advanced things and I suspect they would be OK with spilling not being quite so simple as an option. IOW I would be supportive of moving to a model where we explicitly create the mr for spilling as well. The problem is that cudf's spilling functionality is not as straightforward as just changing the mr. It also involves changing some of the core data structures that cudf uses (the buffers) to be "spilling-aware". So while enabling statistics can largely be done by simply changing the mr, we would need to come up with a different model for spilling that is more explicit than the current one with respect to the memory resource but still allows for the more extensive implicit internal changes that need to happen. |
Describe the bug
There's a strange interaction between
rmm.statistics
and cuDF'sspill=True
option, where the first time acudf.DataFrame
is initialized with this option set, the initial timermm.push_statistics(); rmm.pop_statistics()
is called, then the return value ofpop_statistics
isNone
.Steps/Code to reproduce bug
Here's a reproducer that uses cudf:
The output is
Expected behavior
Return
Statistics
both times.Environment details (please complete the following information):
docker pull
&docker run
commands usedrmm/print_env.sh
script to gather relevant environment detailsAdditional context
The ordering of the
cudf.DataFrame()
does matter. It needs to be created afterenable_statistics
and beforepop_statistics()
to observe the reported behavior.The text was updated successfully, but these errors were encountered: