Signed-off-by: James W. Kimani <jkimani2@gmail.com>
Showing 22 changed files with 709 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
RemoteSystemsTempFiles/ | ||
Servers/ | ||
target/ | ||
logs/ | ||
.metadata/ | ||
bin/ | ||
tmp/ | ||
*.tmp | ||
*.bak | ||
*.swp | ||
*~.nib | ||
local.properties | ||
.settings/ | ||
.loadpath | ||
.recommenders | ||
.idea/ | ||
.project | ||
classes/ | ||
.classpath | ||
.iml | ||
*_SUCCESS* | ||
*.crc | ||
|
||
# External tool builders | ||
.externalToolBuilders/ | ||
|
||
# Locally stored "Eclipse launch configurations" | ||
*.launch | ||
|
||
# PyDev specific (Python IDE for Eclipse) | ||
*.pydevproject | ||
|
||
# CDT-specific (C/C++ Development Tooling) | ||
.cproject | ||
|
||
# Java annotation processor (APT) | ||
.factorypath | ||
|
||
# PDT-specific (PHP Development Tools) | ||
.buildpath | ||
|
||
# sbteclipse plugin | ||
.target | ||
|
||
# Tern plugin | ||
.tern-project | ||
|
||
# TeXlipse plugin | ||
.texlipse | ||
|
||
# STS (Spring Tool Suite) | ||
.springBeans | ||
|
||
# Code Recommenders | ||
.recommenders/ | ||
|
||
# Scala IDE specific (Scala & Java development for Eclipse) | ||
.cache-main | ||
.scala_dependencies | ||
.worksheet |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,53 @@ | ||
# big-data-insights-scala | ||
personal solutions to big data problem scenarios using scala | ||
personal solutions to big data problem scenarios using scala | ||
|
||
## Project Structure | ||
Each package is based on a problem scenario. | ||
|
||
Each problem scenario will contain a main class in the *com.jwk.development.big_data_insights.scala.products.driver* package | ||
|
||
Each problem includes a detailed problem scenario and a result sheet. | ||
|
||
### 1. Product Data for a pen company | ||
|
||
Problem: Given CSV files with product information from a pen company, provide some insights using big data technologies. | ||
|
||
Package name: *com.jwk.development.big_data_insights.scala.products.problem_scenario_One* | ||
|
||
Driver/Main class: *com.jwk.development.big_data_insights.scala.products.driver.run_problem_scenario_one* | ||
|
||
Link to result sheet and detailed problem scenarios: | ||
|
||
[Part One]() | ||
[Part Two]() | ||
[Part Three]() | ||
|
||
### 2. Patient Data | ||
|
||
Problem: ** | ||
Package name: ** | ||
Driver/Main class: ** | ||
|
||
Link to result sheet and detailed problem scenarios: | ||
|
||
|
||
## Troubleshooting | ||
1. If the following error occurs when running an application: *A master URL must be set in your configuration* | ||
``` | ||
Exception in thread "main" java.lang.ExceptionInInitializerError | ||
at com.jwk.development.big_data_insights.scala.products.driver.problem_scenario_1.main(problem_scenario_1.scala) | ||
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration | ||
``` | ||
Solution: | ||
Add the following VM option to your run configurations | ||
``` | ||
-Dspark.master=local | ||
``` | ||
[link to setting spark master to local in intellij]() | ||
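
The `-Dspark.master=local` VM option works because a default `SparkConf` picks up `spark.*` Java system properties. As a hedged alternative, a driver could fall back to local mode in code only when no master has been supplied. The sketch below is an assumption about how that might look, not the code in this commit; the object name `LocalMasterSketch` and the helper `confWithLocalFallback` are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical driver, not part of this commit.
object LocalMasterSketch {

  // new SparkConf() loads spark.* system properties, so a master passed via
  // -Dspark.master (or set by spark-submit) is kept; local[*] is only a fallback.
  def confWithLocalFallback(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .setIfMissing("spark.master", "local[*]")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .config(confWithLocalFallback("big-data-insights-scala"))
      .getOrCreate()

    println(s"Running with master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```

Using `setIfMissing` rather than hard-coding `.master("local[*]")` keeps the same jar usable on a cluster, where the master is normally provided from outside the application.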
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
name := "big-data-insights-scala" | ||
|
||
version := "1.0" | ||
|
||
scalaVersion := "2.11.8" | ||
|
||
libraryDependencies ++= Seq( | ||
"org.apache.hadoop" % "hadoop-client" % "2.7.3", | ||
("org.apache.spark" % "spark-core_2.11" % "2.1.0"), | ||
("org.apache.spark" % "spark-sql_2.11" % "2.1.0"), | ||
"org.apache.spark" % "spark-hive_2.11" % "2.1.0", | ||
"com.databricks" % "spark-avro_2.11" % "3.2.0", | ||
"com.databricks" % "spark-csv_2.11" % "1.3.0", | ||
"org.scala-lang" % "scala-library" % "2.11.8", | ||
"org.scala-lang" % "scala-reflect" % "2.11.8", | ||
"com.typesafe" % "config" % "1.3.1", | ||
"org.apache.logging.log4j" %% "log4j-api-scala" % "2.8.1", | ||
"org.apache.logging.log4j" % "log4j-core" % "2.8.1", | ||
"org.apache.kafka" %% "kafka" % "0.9.0.2.3.4.51-1" | ||
|
||
) | ||
//use external repositories | ||
resolvers += "HortonWorksRepo" at "http://repo.hortonworks.com/content/repositories/releases/" | ||
|
||
parallelExecution in test := false | ||
|
||
|
||
initialCommands := "import org.test._" | ||
|
||
//clean operations | ||
cleanFiles += baseDirectory { base => base / "build" }.value | ||
cleanFiles += baseDirectory { base => base / "metastore_db" }.value | ||
|
||
//assembly-settings |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Set root logger level to ERROR and its only appender to A1. | ||
log4j.rootLogger=ERROR, A1 | ||
# If we get chained appenders, this stops the message being written multiple times | ||
log4j.additivity.org.apache=false | ||
log4j.additivity.xdasLogger=false | ||
# A1 is set to be a ConsoleAppender. | ||
log4j.appender.A1=org.apache.log4j.ConsoleAppender | ||
log4j.appender.A1.Target=System.out | ||
# A1 uses PatternLayout. | ||
log4j.appender.A1.layout=org.apache.log4j.PatternLayout | ||
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n | ||
|
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Set root logger level to ERROR and its only appender to A1. | ||
log4j.rootLogger=ERROR, A1 | ||
# If we get chained appenders, this stops the message being written multiple times | ||
log4j.additivity.org.apache=false | ||
log4j.additivity.xdasLogger=false | ||
# A1 is set to be a ConsoleAppender. | ||
log4j.appender.A1=org.apache.log4j.ConsoleAppender | ||
log4j.appender.A1.Target=System.out | ||
# A1 uses PatternLayout. | ||
log4j.appender.A1.layout=org.apache.log4j.PatternLayout | ||
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
patientID,name ,address ,dateOfBirth,lastVisitDate | ||
1001 ,Homer Simpson ,"123 Blue St.,Los Angeles, CA 12345" ,1989-12-31 ,2017-01-21 | ||
1002 ,Peter Griffin ,"234 Brown St., San Fransisco, CA 23456",1950-01-30 ,2015-04-18 | ||
1003 ,Hubert J. Fansworth,"546 Red Dr., Sacramento, CA 54678" ,1978-08-21 ,2017-02-14 | ||
1004 ,Marge Simpson ,"123 Blue St.,Los Angeles, CA 12345" ,1990-03-18 ,2016-02-15 | ||
1005 ,Bender Rodriguez ,"127 Brown St., Charlotte, NC 28223" ,1986-12-31 ,2013-12-14 | ||
1006 ,Turanga Leela ,"128 Brown St., Charlotte, NC 28223" ,1978-08-21 ,2012-09-15 |
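
The patient file pads every field with spaces and wraps the address in double quotes because it contains commas, so a plain `split(",")` would break rows apart. The following is a minimal sketch, using only the Scala standard library, of one way to split such a line; the `Patient` case class and the function names are hypothetical, and it does not handle escaped quotes inside a field.

```scala
import java.time.LocalDate

// Hypothetical helper, not part of this commit.
object PatientCsvSketch {

  // Field names mirror the CSV header.
  final case class Patient(patientID: Int, name: String, address: String,
                           dateOfBirth: LocalDate, lastVisitDate: LocalDate)

  // Splits on commas that sit outside double quotes, drops the quotes,
  // and trims the padding around each field.
  def splitCsvLine(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    line.foreach {
      case '"' => inQuotes = !inQuotes
      case ',' if !inQuotes =>
        fields += current.toString.trim
        current.clear()
      case c => current += c
    }
    fields += current.toString.trim
    fields.result()
  }

  // Expects exactly five fields; a malformed row throws a MatchError.
  def parsePatient(line: String): Patient = {
    val Vector(id, name, address, dob, lastVisit) = splitCsvLine(line)
    Patient(id.toInt, name, address, LocalDate.parse(dob), LocalDate.parse(lastVisit))
  }

  def main(args: Array[String]): Unit = {
    val line = """1001 ,Homer Simpson ,"123 Blue St.,Los Angeles, CA 12345" ,1989-12-31 ,2017-01-21"""
    println(parsePatient(line))
  }
}
```

In a Spark job the same concern applies to whichever CSV reader is used: quoting and whitespace trimming need to be configured so that the address stays in a single column.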
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
productID productCode name quantity price supplierid | ||
1001 PEN Pen Red 5000 1.23 501 | ||
1002 PEN Pen Blue 8001 1.25 501 | ||
1003 PEN Pen Black 2000 1.25 501 | ||
1004 PEC Pencil 2B 10000 0.48 502 | ||
1005 PEC Pencil 2H 8000 0.49 502 | ||
1006 PEC Pencil HB 0 9999.99 502 | ||
2001 PEC Pencil 3B 500 0.52 501 | ||
2002 PEC Pencil 4B 200 0.62 501 | ||
2003 PEC Pencil 5B 100 0.73 501 | ||
2004 PEC Pencil 6B 500 0.47 502 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
productID,supplierID | ||
2001 ,501 | ||
2002 ,501 | ||
2003 ,501 | ||
2004 ,502 | ||
2001 ,503 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
supplierID,name ,phone | ||
501 ,ABC Traders,88881111 | ||
502 ,XYZ Company,88882222 | ||
503 ,QQ Corp ,88883333 | ||
504 ,DEG LLC ,88884444 | ||
505 ,FGH Limited,88885555 |
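
The product-to-supplier mapping and the supplier list share the `supplierID` column, so the two can be joined to find out who supplies each product. The sketch below is an illustration under stated assumptions rather than the code in this commit: it builds the two tables inline from the rows shown above instead of reading the files, and the object name `SupplierJoinSketch` and the method `joined` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example, not part of this commit.
object SupplierJoinSketch {

  // Inner join on supplierID; suppliers 504 and 505 have no products
  // in the mapping, so they do not appear in the result.
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val productSuppliers = Seq(
      (2001, 501), (2002, 501), (2003, 501), (2004, 502), (2001, 503)
    ).toDF("productID", "supplierID")

    val suppliers = Seq(
      (501, "ABC Traders", "88881111"),
      (502, "XYZ Company", "88882222"),
      (503, "QQ Corp", "88883333"),
      (504, "DEG LLC", "88884444"),
      (505, "FGH Limited", "88885555")
    ).toDF("supplierID", "name", "phone")

    productSuppliers.join(suppliers, Seq("supplierID"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("supplier-join-sketch")
      .master("local[*]") // local mode is assumed here for experimentation
      .getOrCreate()

    joined(spark).orderBy("productID", "supplierID").show()
    spark.stop()
  }
}
```

Joining on `Seq("supplierID")` keeps a single `supplierID` column in the result, which avoids ambiguous column references in later selections.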