[Graphbolt][Performance] Reduce the memory usage of preprocess_ondisk_dataset #7086

Open
czkkkkkk opened this issue Feb 5, 2024 · 2 comments

czkkkkkk (Collaborator) commented Feb 5, 2024

🚀 Feature

Motivation

Currently, preprocess_ondisk_dataset consumes far more memory during preprocessing than the graph topology itself requires. When loading a graph with 2B nodes and 8B edges, preprocessing cannot finish on a machine with 380 GB of memory. After rough profiling, I found that the peak memory usage is reached when converting the DGL graph to a fused sampling graph:

fused_csc_sampling_graph = from_dglgraph(
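
For reference, one way to confirm where the peak occurs is to sample the process RSS in a background thread while the conversion runs. This is only a minimal sketch, not the profiling actually used above; it assumes `psutil` is installed, that `from_dglgraph` is importable from `dgl.graphbolt`, and that `g` is the DGLGraph already built from the on-disk data.

```python
import os
import threading
import time

import psutil
import dgl.graphbolt as gb


def run_with_peak_rss(fn, *args, interval=0.1, **kwargs):
    """Run fn(*args, **kwargs) while sampling the process RSS.

    Returns (result, peak_rss_bytes). A coarse sampler: spikes shorter
    than `interval` seconds can be missed.
    """
    proc = psutil.Process(os.getpid())
    peak = proc.memory_info().rss
    stop = threading.Event()

    def sample():
        nonlocal peak
        while not stop.is_set():
            peak = max(peak, proc.memory_info().rss)
            time.sleep(interval)

    t = threading.Thread(target=sample, daemon=True)
    t.start()
    try:
        result = fn(*args, **kwargs)
    finally:
        stop.set()
        t.join()
    return result, peak


# `g` is the DGLGraph loaded during preprocessing (assumed to exist here).
fused, peak = run_with_peak_rss(gb.from_dglgraph, g)
print(f"peak RSS during from_dglgraph: {peak / 1024**3:.1f} GiB")
```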

There could be two factors contributing to the peak memory usage:

  1. The input DGL graph is passed to the function and by itself consumes about 160 GB of memory.
  2. from_dglgraph then creates a temporary homogeneous graph as well as its CSC representation, so multiple copies of the topology are alive at the same time (a sketch of one way to avoid this duplication follows the list).
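
As a rough illustration of the second point, and only a sketch under assumptions rather than a proposal that matches the current code: for a homogeneous graph, the CSC arrays could be built directly from the edge-list tensors with torch and handed to a CSC-based factory (here assumed to be `dgl.graphbolt.fused_csc_sampling_graph`), so neither the DGLGraph nor a temporary homogeneous copy needs to exist alongside the fused graph. `src`, `dst`, and `num_nodes` are placeholders for the data read from disk.

```python
import torch
import dgl.graphbolt as gb


def csc_from_coo(src, dst, num_nodes):
    """Build CSC (indptr, indices) from COO edge tensors.

    CSC compresses the destination axis, so edges are grouped by dst.
    """
    order = torch.argsort(dst)   # group edges by destination node
    indices = src[order]         # source of each edge, in CSC order
    in_degrees = torch.bincount(dst, minlength=num_nodes)
    indptr = torch.cat(
        [torch.zeros(1, dtype=torch.int64), torch.cumsum(in_degrees, dim=0)]
    )
    return indptr, indices


# src/dst are the edge-list tensors read from disk (placeholders here).
indptr, indices = csc_from_coo(src, dst, num_nodes)
del src, dst  # the COO edge list is no longer needed once CSC exists
# Assumes the fused_csc_sampling_graph factory exposed by graphbolt.
graph = gb.fused_csc_sampling_graph(indptr, indices)
```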

Alternatives

Pitch

Additional context

Rhett-Ying (Collaborator) commented

@Skeleton003 Could you look into this and try it with the new implementation in #6986?

github-actions bot commented Mar 7, 2024

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.
