Parallel Graph Loading
anilpacaci edited this page Jun 29, 2017 · 1 revision
- Adjacency List File Format generated and uploaded to HDFS
- Partition Mapping generated and available as a text file
- Cassandra cluster running
  - I prefer Cassandra instances to use local disk for storage, so I make sure that the data directories are set to a specific directory on the local file system
  - Name the Cassandra data directories after the dataset, the partitioning scheme and the number of partitions, such as `sf3_metis_4`
  - Cassandra instances should be configured to use `ByteOrderedPartitioner`, and `initial_token` should be set appropriately (the token range should be divided equally among the partitions)
- JanusGraph installation
  - Edgecut branch of anilpacaci's fork
- Although a Java HashMap might be sufficient for smaller datasets, it's better to stick with Memcached
  - Run Memcached with enough memory
    - ~512M per scale factor unit (sf3 --> ~1500M)
- PartitionLookup Importer Script can be used to populate Memcached
  - Run the Gremlin console from JanusGraph
  - Modify the snb.properties file to point to your running Memcached instance and to the partition lookup file
  - Using the `:load` command, load the PartitionLookupImporter.groovy script
  - Run `PartitionLookupImporter.load([path to snb.properties file])`
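The two numeric prerequisites above (an equally divided token range and roughly 512M of Memcached memory per scale factor unit) can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the original setup: the helper names `initial_tokens` and `memcached_memory_mb` are hypothetical, and the sketch assumes that `ByteOrderedPartitioner` tokens are written as fixed-width hex strings over an 8-byte key space. Check the key width against your own cluster before copying any value into `initial_token`.

```python
from typing import List


def initial_tokens(num_partitions: int, key_bytes: int = 8) -> List[str]:
    """Return evenly spaced hex tokens, one per partition.

    Assumption: tokens are hex strings covering a key space of
    `key_bytes` bytes (8 bytes here is a guess, not a documented value).
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    space = 1 << (8 * key_bytes)   # size of the whole token range
    width = 2 * key_bytes          # two hex digits per byte
    return [
        format(i * space // num_partitions, "0{}x".format(width))
        for i in range(num_partitions)
    ]


def memcached_memory_mb(scale_factor: int, mb_per_sf: int = 512) -> int:
    """Rule of thumb from the list above: ~512M per scale factor unit."""
    return scale_factor * mb_per_sf


if __name__ == "__main__":
    # One token per Cassandra instance, e.g. for a 4-partition setup.
    for node, token in enumerate(initial_tokens(4)):
        print("node {}: initial_token = {}".format(node, token))
    print("memcached memory for sf3: {}M".format(memcached_memory_mb(3)))
```

For four partitions this yields the tokens `0000000000000000`, `4000000000000000`, `8000000000000000` and `c000000000000000`, and 1536M of Memcached memory for sf3, in line with the ~1500M figure above.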
- Configure JanusGraph instance
  - We need to configure a JanusGraph instance to run on the Cassandra cluster; here is a sample configuration file I use: janusgraph-cassandra-es-server.properties
  - Make sure that `storage.hostname` points to the IP of the Cassandra instance (running on the same machine)
  - Make sure that `ids.placement-history-hostname` points to the running Memcached server
  - `ids.placement` is `PartitionAwarePlacementStrategy` for fennel, ldg and metis
  - Load the initialization [script](https://github.com/anilpacaci/graph-partitioning/blob/master/scripts/initJanus.groovy) using the `:load` command from the Gremlin console
  - Call `initializeJanus([path to janusgraph-cassandra-es-server.properties])`
  - Initialize the graph over the newly created graph instance: `graph = JanusGraphFactory.open([path to janusgraph-cassandra-es-server.properties])`
- Start multithreaded data loader
  - Load the SNB [script](https://github.com/anilpacaci/graph-partitioning/blob/master/scripts/SNBParser.groovy) using the `:load` command from the Gremlin console
  - Configure the snb.properties file (point it to the correct Memcached instance; `input.base` should point to the `social_network` directory of the loaded dataset)
  - Call `SNBParser.loadSNBGraph([graph instance from step 1], [path to snb.properties])`
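Since a load only works if both properties files carry the settings listed above, it can help to check them before opening the Gremlin console. The following is a minimal sketch, not part of the repository: `parse_properties` and `missing_keys` are hypothetical helpers, the parser handles only plain `key=value` lines and `#`/`!` comments rather than the full Java properties syntax, and the required-key lists contain only the keys named on this page.

```python
from typing import Dict, Iterable, List

# Keys this page asks you to set; anything else in the files is ignored.
JANUS_REQUIRED = (
    "storage.hostname",
    "ids.placement-history-hostname",
    "ids.placement",
)
SNB_REQUIRED = ("input.base",)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}  # type: Dict[str, str]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def missing_keys(props: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    # Placeholder content; in practice read the real file from disk.
    sample = (
        "storage.hostname=127.0.0.1\n"
        "# ids.placement-history-hostname is not set yet\n"
        "ids.placement=PartitionAwarePlacementStrategy\n"
    )
    print(missing_keys(parse_properties(sample), JANUS_REQUIRED))
```

With the placeholder content above, the check reports `ids.placement-history-hostname` as the only missing key. The values in the sample are illustrative and should be replaced by those in your own properties files.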