This project demonstrates how to use Spark to ingest into Pivotal GemFire via the REST API. The process is as follows:
- Read two TSV files from disk using Spark's `textFile`
- Iterate over the records with a Spark `foreachPartition` operation
- Instantiate `Order` and `OrderLineItem` objects from the records
- Store the objects into their respective GemFire regions
- GemFire asynchronously appends (trickle merges) `OrderLineItem`s into the corresponding `Order` objects, based on the `order_id` field
- A Spring Boot GemFire client receives realtime updates to `Order` objects and displays them in a UI

(Note: the GemFire Java API was also tested successfully, but was abandoned due to the added complexity.)
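The ingestion steps above can be sketched roughly as follows. This is a simplified illustration, not the actual `PushToGemFireApp` source: the `Order` fields, TSV column layout, region name, and base URL are all assumptions, and the `/gemfire-api/v1/{region}/{key}` path follows GemFire's developer REST API convention.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import java.net.{HttpURLConnection, URL}

// Hypothetical domain object; the real Order has more fields.
case class Order(orderId: String, amount: Double)

object PushToGemFireSketch {
  // Parse one TSV record into an Order (column layout is assumed).
  def parseOrder(line: String): Order = {
    val cols = line.split("\t")
    Order(cols(0), cols(1).toDouble)
  }

  // PUT one JSON document into a GemFire region via the REST API.
  def putJson(baseUrl: String, region: String, key: String, json: String): Int = {
    val conn = new URL(s"$baseUrl/gemfire-api/v1/$region/$key")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val os = conn.getOutputStream
    os.write(json.getBytes("UTF-8"))
    os.close()
    val status = conn.getResponseCode
    conn.disconnect()
    status
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PushToGemFireSketch"))
    // Spark decompresses *.gz inputs transparently.
    sc.textFile(args(0)).foreachPartition { records =>
      // Per-partition scope keeps any connection setup off the per-record path.
      records.foreach { line =>
        val o = parseOrder(line)
        putJson("http://localhost:7070", "Order", o.orderId,
          s"""{"order_id": "${o.orderId}", "amount": ${o.amount}}""")
      }
    }
    sc.stop()
  }
}
```

The same pattern applies to the `OrderLineItem` file; GemFire then merges line items into orders on its side.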
Please see this repo for the `retail_demo` sample data.
```shell
# Build the Spark ingestion job
$ (cd spark-ingestion && sbt assembly)

# Build the Spring Boot UI
$ mvn package

# Start the GemFire cluster
$ (cd gemfire-server/scripts && ./startCluster.gfsh)

# Start the Spring Boot UI
$ java -jar spring-websocket-reactive-app/target/spring-websocket-reactive-1.0-SNAPSHOT.jar

# Run the Spark ingestion job against the sample data
$ $SPARK_HOME/bin/spark-submit --class "io.pivotal.sample.PushToGemFireApp" --master local[*] \
    spark-ingestion/target/scala-2.11/spark-gemfire-ingestion-assembly-1.0.jar \
    file:///Users/kdunn/gdrive/SampleData/retail_demo/orders/10k_orders.tsv.gz \
    file:///Users/kdunn/gdrive/SampleData/retail_demo/order_lineitems/10k_order_lineitems.tsv.gz
```
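Once the job finishes, the same REST API can be used to spot-check the ingested data. This is a sketch assuming the REST API is enabled on GemFire's default HTTP service port (7070) and the region is named `Order`; adjust host, port, region, and key to match your cluster.

```shell
# List the regions exposed by the REST API
$ curl http://localhost:7070/gemfire-api/v1

# Fetch a single Order by key (key value is hypothetical)
$ curl http://localhost:7070/gemfire-api/v1/Order/12345
```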