Commit
repo restructure
Signed-off-by: James W. Kimani <jkimani2@gmail.com>
jwkimani committed Feb 10, 2018
1 parent cac777e commit cc3678a
Showing 22 changed files with 709 additions and 1 deletion.
60 changes: 60 additions & 0 deletions .gitignore
@@ -0,0 +1,60 @@
RemoteSystemsTempFiles/
Servers/
target/
logs/
.metadata/
bin/
tmp/
*.tmp
*.bak
*.swp
*~.nib
local.properties
.settings/
.loadpath
.recommenders
.idea/
.project
classes/
.classpath
*.iml
*_SUCCESS*
*.crc

# External tool builders
.externalToolBuilders/

# Locally stored "Eclipse launch configurations"
*.launch

# PyDev specific (Python IDE for Eclipse)
*.pydevproject

# CDT-specific (C/C++ Development Tooling)
.cproject

# Java annotation processor (APT)
.factorypath

# PDT-specific (PHP Development Tools)
.buildpath

# sbteclipse plugin
.target

# Tern plugin
.tern-project

# TeXlipse plugin
.texlipse

# STS (Spring Tool Suite)
.springBeans

# Code Recommenders
.recommenders/

# Scala IDE specific (Scala & Java development for Eclipse)
.cache-main
.scala_dependencies
.worksheet
53 changes: 52 additions & 1 deletion README.md
@@ -1,2 +1,53 @@
# big-data-insights-scala
personal solutions to big data problem scenarios using scala

## Project Structure
Each package is based on a problem scenario.

Each problem scenario will contain a main class in the *com.jwk.development.big_data_insights.scala.products.driver* package

Each problem includes a detailed problem scenario and a result sheet.

### 1. Product Data for a pen company

Problem: Given CSV files with product information from a pen company, provide some insights using big data technologies.

Package name: *com.jwk.development.big_data_insights.scala.products.problem_scenario_One*

Driver/Main class: *com.jwk.development.big_data_insights.scala.products.driver.run_problem_scenario_one*

Links to the result sheet and detailed problem scenarios:

[Part One]()
[Part Two]()
[Part Three]()
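
As a starting point for this scenario, here is a minimal sketch (not the project's actual driver) of loading one of the CSV files under `insight_data/` with Spark's built-in CSV reader. The object name `CsvPreview`, the helper `readPadded`, and the `local[*]` master are assumptions made for illustration. The whitespace options are set because the sample files pad values with spaces.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the name is not part of this repository.
object CsvPreview {

  // Reads a comma-separated file with a header row, trimming the
  // padding around each value and letting Spark infer column types.
  def readPadded(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("ignoreLeadingWhiteSpace", "true")
      .option("ignoreTrailingWhiteSpace", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("csv-preview")
      .master("local[*]")
      .getOrCreate()
    try {
      val suppliers = readPadded(spark, "insight_data/supplier.csv")
      suppliers.printSchema()
      suppliers.show()
    } finally {
      spark.stop()
    }
  }
}
```

Running `main` from the repository root prints the inferred schema and the rows of `supplier.csv`; the same helper can be pointed at the other comma-separated files.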

### 2. Patient Data

Problem: **
Package name: **
Driver/Main class: **

Links to the result sheet and detailed problem scenarios:


## Troubleshooting
1. If an application fails at startup with the error *A master URL must be set in your configuration*:
```
Exception in thread "main" java.lang.ExceptionInInitializerError
at com.jwk.development.big_data_insights.scala.products.driver.problem_scenario_1.main(problem_scenario_1.scala)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
```
Solution:
Add the following VM option to your run configuration:
```
-Dspark.master=local
```
[Link to setting the Spark master to local in IntelliJ]()
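
An alternative to the VM option, sketched here under assumptions, is to give the driver a local fallback in code. `SparkConf.setIfMissing` only applies when `spark.master` has not already been supplied, so `-Dspark.master=...` or `spark-submit --master` still take precedence. The object name `LocalMasterFallback` and the `local[*]` default are hypothetical and not taken from this repository.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical driver skeleton; the name is for illustration only.
object LocalMasterFallback {

  // new SparkConf() loads any spark.* JVM system properties, so a
  // master passed with -Dspark.master is kept; otherwise local[*] is used.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("local-master-fallback")
      .setIfMissing("spark.master", "local[*]")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().config(buildConf()).getOrCreate()
    try {
      // Shows which master was actually picked up.
      println(s"master = ${spark.sparkContext.master}")
    } finally {
      spark.stop()
    }
  }
}
```

With this fallback the application starts from the IDE without extra run-configuration settings, while cluster submissions keep whatever master they specify.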
34 changes: 34 additions & 0 deletions build.sbt
@@ -0,0 +1,34 @@
name := "big-data-insights-scala"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-client" % "2.7.3",
("org.apache.spark" % "spark-core_2.11" % "2.1.0"),
("org.apache.spark" % "spark-sql_2.11" % "2.1.0"),
"org.apache.spark" % "spark-hive_2.11" % "2.1.0",
"com.databricks" % "spark-avro_2.11" % "3.2.0",
"com.databricks" % "spark-csv_2.11" % "1.3.0",
"org.scala-lang" % "scala-library" % "2.11.8",
"org.scala-lang" % "scala-reflect" % "2.11.8",
"com.typesafe" % "config" % "1.3.1",
"org.apache.logging.log4j" %% "log4j-api-scala" % "2.8.1",
"org.apache.logging.log4j" % "log4j-core" % "2.8.1",
"org.apache.kafka" %% "kafka" % "0.9.0.2.3.4.51-1"

)
//use external repositories
resolvers += "HortonWorksRepo" at "http://repo.hortonworks.com/content/repositories/releases/"

parallelExecution in test := false


initialCommands := "import org.test._"

//clean operations
cleanFiles += baseDirectory { base => base / "build" }.value
cleanFiles += baseDirectory { base => base / "metastore_db" }.value

//assembly-settings
Empty file.
12 changes: 12 additions & 0 deletions config/test_linux/log4j.properties
@@ -0,0 +1,12 @@
# Set root logger level to ERROR and its only appender to A1.
log4j.rootLogger=ERROR, A1
# If we get chained appenders, this stops the message being written multiple times
log4j.additivity.org.apache=false
log4j.additivity.xdasLogger=false
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.Target=System.out
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Empty file.
12 changes: 12 additions & 0 deletions config/test_windows/log4j.properties
@@ -0,0 +1,12 @@
# Set root logger level to ERROR and its only appender to A1.
log4j.rootLogger=ERROR, A1
# If we get chained appenders, this stops the message being written multiple times
log4j.additivity.org.apache=false
log4j.additivity.xdasLogger=false
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.Target=System.out
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

7 changes: 7 additions & 0 deletions insight_data/patients.csv
@@ -0,0 +1,7 @@
patientID,name ,address ,dateOfBirth,lastVisitDate
1001 ,Homer Simpson ,"123 Blue St.,Los Angeles, CA 12345" ,1989-12-31 ,2017-01-21
1002 ,Peter Griffin ,"234 Brown St., San Fransisco, CA 23456",1950-01-30 ,2015-04-18
1003 ,Hubert J. Fansworth,"546 Red Dr., Sacramento, CA 54678" ,1978-08-21 ,2017-02-14
1004 ,Marge Simpson ,"123 Blue St.,Los Angeles, CA 12345" ,1990-03-18 ,2016-02-15
1005 ,Bender Rodriguez ,"127 Brown St., Charlotte, NC 28223" ,1986-12-31 ,2013-12-14
1006 ,Turanga Leela ,"128 Brown St., Charlotte, NC 28223" ,1978-08-21 ,2012-09-15
11 changes: 11 additions & 0 deletions insight_data/products.csv
@@ -0,0 +1,11 @@
productID productCode name quantity price supplierid
1001 PEN Pen Red 5000 1.23 501
1002 PEN Pen Blue 8001 1.25 501
1003 PEN Pen Black 2000 1.25 501
1004 PEC Pencil 2B 10000 0.48 502
1005 PEC Pencil 2H 8000 0.49 502
1006 PEC Pencil HB 0 9999.99 502
2001 PEC Pencil 3B 500 0.52 501
2002 PEC Pencil 4B 200 0.62 501
2003 PEC Pencil 5B 100 0.73 501
2004 PEC Pencil 6B 500 0.47 502
6 changes: 6 additions & 0 deletions insight_data/products_suppliers.csv
@@ -0,0 +1,6 @@
productID,supplierID
2001 ,501
2002 ,501
2003 ,501
2004 ,502
2001 ,503
6 changes: 6 additions & 0 deletions insight_data/supplier.csv
@@ -0,0 +1,6 @@
supplierID,name ,phone
501 ,ABC Traders,88881111
502 ,XYZ Company,88882222
503 ,QQ Corp ,88883333
504 ,DEG LLC ,88884444
505 ,FGH Limited,88885555