🚀 Feature
Motivation
Currently, preprocess_ondisk_dataset consumes far more memory during preprocessing than the graph topology itself requires. Loading a graph with 2B nodes and 8B edges cannot finish on a machine with 380 GB of memory. Rough profiling shows that peak memory usage is reached when converting the DGL graph to a fused sampling graph.
dgl/python/dgl/graphbolt/impl/ondisk_dataset.py
Line 212 in 4ee0a8b
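For a sense of scale, here is a rough estimate of how much memory the bare topology of the graph described above would need. It is only a sketch: it assumes 8-byte (int64) node IDs and offsets, counts only the index arrays, and ignores edge IDs, features, and allocator overhead. The helper names `coo_bytes` and `csc_bytes` are made up for illustration and are not part of DGL.

```python
# Back-of-envelope topology size.
# Assumption: every ID and offset is stored as int64 (8 bytes).
BYTES_PER_ID = 8
GIB = 1024 ** 3


def coo_bytes(num_edges: int) -> int:
    """COO: two ID arrays (source and destination), one entry per edge."""
    return 2 * num_edges * BYTES_PER_ID


def csc_bytes(num_nodes: int, num_edges: int) -> int:
    """CSC: an offset array of num_nodes + 1 entries plus one index per edge."""
    return (num_nodes + 1 + num_edges) * BYTES_PER_ID


# Graph size taken from the report above: 2B nodes, 8B edges.
num_nodes, num_edges = 2_000_000_000, 8_000_000_000
coo = coo_bytes(num_edges)
csc = csc_bytes(num_nodes, num_edges)

print(f"COO only:  {coo / GIB:.1f} GiB")
print(f"CSC only:  {csc / GIB:.1f} GiB")
print(f"COO + CSC: {(coo + csc) / GIB:.1f} GiB")
```

Under these assumptions, one CSC copy is about 80 GB and one COO copy is about 128 GB, so a few intermediate copies alive at the same time could plausibly exceed 380 GB. Which copies actually coexist at the peak would need to be confirmed by profiling.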
There could be two factors contributing to the peak memory usage.
from_dglgraph creates a temporary homogeneous graph and also its CSC format.
Alternatives
Pitch
Additional context