tyrex-team/mumonoid-programs


mu-monoid experiments

This repository contains the code and instructions needed to reproduce the experiments of the paper Efficient Iterative Programs with Distributed Data Collections. The experiments conducted with the DIQL system and those conducted with the Emma system are available in separate repositories.

Requirements

The code in this repository is written in Scala and targets a Spark 2.4 cluster. It must be compiled and executed in a Java 8 environment with Scala 2.11.12.

Datasets

All the datasets used for the experiments reported in the paper are packaged in the archive datasets.zip.

This archive contains the files of the graphs used in the experiments.

You can download and extract it with the following commands:

wget https://cloud.univ-grenoble-alpes.fr/s/4QPG6FMobrAqwnn/download/datasets.zip
unzip datasets.zip

Files to be used to run mu-monoid programs are in the folder mumonoiddata.

Compilation

  1. Compile the assembly JAR of this project. This produces a ready-to-use JAR file in the target/scala-2.11 folder, with a name matching *-assembly-*.jar. Move this file to the project root and rename it mumonoid-programs-assembly.jar (we use this name in the commands below):
    sbt assembly
    mv target/scala-2.11/mumonoid-programs-assembly-0.1.0-SNAPSHOT.jar mumonoid-programs-assembly.jar
  2. Copy the assembly JAR to a location accessible to spark-submit.

A note about IDEs

  • IntelliJ works well: load the build.sbt file to create a new project
  • If you use Eclipse:
    1. Call sbt eclipse to generate the project files
    2. Import the project into Eclipse
    3. Change the Scala Compiler properties of the Eclipse project to use Scala 2.11

Usage

Below are the commands to run the different programs. These commands require the following arguments:

  • $DATA_FILE: Absolute path to the input dataset
  • $MASTER_URL: Spark master URL. See the Spark documentation. Omit when running locally.
  • $NB_PARTITIONS: Number of partitions in Spark. We set it to be the number of available cores in the cluster
  • --cluster: Option to indicate that the program is run on a spark cluster. Omit when running locally.
  • $PROGRAM: The test class name (for instance TC, SP, ...). All test classes are in the directory todo
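For concreteness, these arguments can be bound as shell variables before invoking spark-submit. The values below are hypothetical examples (the data path, master URL, and partition count are not prescribed by this repository); the final echo only verifies that the fully qualified class name expands as expected:

```shell
# Hypothetical example values; substitute paths and settings for your own setup.
PROGRAM=TC                                      # test class name
DATA_FILE=/data/mumonoiddata/rnd_1000_0.1.txt   # absolute path to the input dataset
MASTER_URL=spark://master-host:7077             # Spark master URL (cluster runs only)
NB_PARTITIONS=16                                # number of available cores in the cluster

# Check the class-name substitution; prints fr.inria.tyrex.mumonoidPrograms.TC
echo "fr.inria.tyrex.mumonoidPrograms.$PROGRAM"
```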

TC and SP command

  • $PROGRAM: One of TC, TCNoPdist, SP, SPNoPA, SPNoPdist.
  • $DATA_FILE: Absolute path to the file containing the graph edges. Files used in the experiments are of the form rnd_n_p.txt for TC programs and rnd_n_p_W.txt for SP programs.
spark-submit \
    --class fr.inria.tyrex.mumonoidPrograms.$PROGRAM \
    --driver-memory 40g \
    --conf spark.driver.maxResultSize=0 \
    mumonoid-programs-assembly.jar \
    $DATA_FILE \
    --cluster \
    --master $MASTER_URL \
    --partitions $NB_PARTITIONS 
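As noted in the argument list above, --cluster and --master are omitted when running locally. As a sketch of that local variant (the program choice and data path below are hypothetical placeholders, and the echo prints the expanded command rather than launching Spark):

```shell
# Dry run: print the local (non-cluster) variant of the command instead of executing it.
PROGRAM=TC                                      # hypothetical choice among TC, TCNoPdist, SP, ...
DATA_FILE=/data/mumonoiddata/rnd_1000_0.1.txt   # hypothetical path; use your extracted file

echo spark-submit \
    --class "fr.inria.tyrex.mumonoidPrograms.$PROGRAM" \
    --driver-memory 40g \
    --conf spark.driver.maxResultSize=0 \
    mumonoid-programs-assembly.jar \
    "$DATA_FILE" \
    --partitions 4
```

Dropping the leading echo turns the preview into the actual invocation.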

TC Filter and SP Filter command

  • $PROGRAM: One of TCFilter, TCFilterNoPdist, SPFilter, SPFilterNoPA, SPFilterNoPdist.
  • $DATA_FILE: Absolute path to the file containing the graph edges. Files used in the experiments are of the form rnd_n_p.txt for TC programs and rnd_n_p_W.txt for SP programs.
  • $START_NODES_FILE: Absolute path to the file containing the start nodes. Files used in the experiments are of the form start_rnd_n_p.txt
spark-submit \
    --class fr.inria.tyrex.mumonoidPrograms.$PROGRAM \
    --driver-memory 40g \
    --conf spark.driver.maxResultSize=0 \
    mumonoid-programs-assembly.jar \
    $DATA_FILE \
    $START_NODES_FILE \
    --cluster \
    --master $MASTER_URL \
    --partitions $NB_PARTITIONS 

Path planning command

  • $PROGRAM: One of PathPlanning, PathPlanningNoPA, PathplanningNoPdist.
  • $DATA_FILE: Absolute path to the file containing routes between cities. Files used in the experiments are of the form cities_n_p.
  • $START_ROUTES_FILE: Absolute path to the file containing the starting routes. Files used in the experiments are of the form start_cities_n_p.txt
spark-submit \
    --class fr.inria.tyrex.mumonoidPrograms.$PROGRAM \
    --driver-memory 40g \
    --conf spark.driver.maxResultSize=0 \
    mumonoid-programs-assembly.jar \
    $DATA_FILE \
    $START_ROUTES_FILE \
    --cluster \
    --master $MASTER_URL \
    --partitions $NB_PARTITIONS 

Movie recommendation command

  • $PROGRAM: One of MovieRecommendations, MovieRecommendationsNoPdist.
  • $DATA_FILE: Absolute path to the file containing users. Files used in the experiments are of the form users_n
  • $START_MOVIES_FILE: Absolute path to the file containing start movies. Files used in the experiments are of the form start_movies_n.txt
spark-submit \
    --class fr.inria.tyrex.mumonoidPrograms.$PROGRAM \
    --driver-memory 40g \
    --conf spark.driver.maxResultSize=0 \
    mumonoid-programs-assembly.jar \
    $DATA_FILE \
    $START_MOVIES_FILE \
    --cluster \
    --master $MASTER_URL \
    --partitions $NB_PARTITIONS 

Flights command

  • $PROGRAM: One of Flights, FlightsNoPdist.
  • $DATA_FILE: Absolute path to the file containing flights. Files used in the experiments are of the form flights_n_p.txt.
spark-submit \
    --class fr.inria.tyrex.mumonoidPrograms.$PROGRAM \
    --driver-memory 40g \
    --conf spark.driver.maxResultSize=0 \
    mumonoid-programs-assembly.jar \
    $DATA_FILE \
    --cluster \
    --master $MASTER_URL \
    --partitions $NB_PARTITIONS 
