CodeBook.md

Datasets

Downloaded and unzipped:

- data/Dataset.zip: downloaded from the Internet
- "UCI HAR Dataset": folder unzipped from Dataset.zip
- "UCI HAR Dataset"/activity_labels.txt: labels for each of the 6 activities (loaded by run_analysis.R)
- "UCI HAR Dataset"/features.txt: labels for each measurement found in the test and training datasets (loaded by run_analysis.R)
- "UCI HAR Dataset"/train/subject_train.txt: subject IDs corresponding to the training dataset (loaded by run_analysis.R)
- "UCI HAR Dataset"/train/X_train.txt: the training dataset, containing measurements only (loaded by run_analysis.R)
- "UCI HAR Dataset"/train/y_train.txt: activity codes for the training dataset (loaded by run_analysis.R)
- "UCI HAR Dataset"/test/subject_test.txt: subject IDs corresponding to the test dataset (loaded by run_analysis.R)
- "UCI HAR Dataset"/test/X_test.txt: the test dataset, containing measurements only (loaded by run_analysis.R)
- "UCI HAR Dataset"/test/y_test.txt: activity codes for the test dataset (loaded by run_analysis.R)

Generated by the script:

- data/tidy.txt: output of the project, generated by run_analysis.R; contains a header and 180 rows (30 subjects x 6 activities)

Variables Description

Result dataset: tidy.txt

- Subject: subject ID, taken from the subject_train.txt and subject_test.txt files
- Activity: activity as a factor, taken from the y_train.txt and y_test.txt files
- All mean measurements, named in the form ID-(mean|std)[-X|Y|Z], e.g. fBodyAcc-mean-X, fBodyBodyGyroMag-std. There are 66 measurements in the tidy.txt dataset. The original measurements come from X_train.txt and X_test.txt.

Transformations

The following steps are performed by run_analysis.R to generate tidy.txt:

- a. clean up
- b. fetch and unzip the data set
  - b.1 create the data sub-directory if necessary
  - b.2 download the original data if necessary (skipped if it already exists, as the download takes time)
  - b.3 unzip and create dataSetDir if necessary
- c. read the data sets
  - c.1 subject IDs
  - c.2 activity codes
  - c.3 measurements
- d. merge the datasets vertically, adding rows but keeping the same columns
  - d.1 subjects
  - d.2 activity codes
  - d.3 measurements
- e. read the feature and activity labels
  - e.1 read as-is
  - e.2 add column names and check
- f. rename the columns of the merged measurement dataset with the feature labels
- g. filter the merged dataset to keep the names with mean() or std() in them
  - g.1 select the columns to keep
  - g.2 subset by keeping the columns
  - g.3 remove the parentheses from the names
- h. add the Subject and Activity columns in front
- i. add the activity labels to the merged dataset (Activity becomes a factor)
- j. aggregate and calculate the mean by subject and activity
- k. save the tidy dataset in data (note: Activity is saved as a factor)

These steps are documented in run_analysis.R.
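
Steps d to k can be sketched on a toy dataset as follows. This is a minimal illustration, not the code of run_analysis.R: the object names and the toy values are hypothetical, the matching of mean() and std() as literal strings (which leaves out names such as meanFreq()) is an assumption, and the result is written to a temporary file rather than to data/tidy.txt.

```r
# Toy stand-ins for the UCI HAR files (hypothetical values, not real data).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(Code = 1:2, Label = c("WALKING", "SITTING"))

x_train <- data.frame(matrix(c(1, 2, 3, 4,
                               3, 4, 5, 6), nrow = 2, byrow = TRUE))
x_test  <- data.frame(matrix(c(5, 6, 7, 8,
                               7, 8, 9, 10), nrow = 2, byrow = TRUE))
subject_train <- c(1, 1)
subject_test  <- c(2, 2)
y_train <- c(1, 1)
y_test  <- c(2, 2)

# d. merge vertically: rows are added, columns stay the same
measurements <- rbind(x_train, x_test)
subjects     <- c(subject_train, subject_test)
activities   <- c(y_train, y_test)

# f. rename the measurement columns with the feature labels
names(measurements) <- features

# g. keep the columns whose name contains the literal "mean()" or "std()"
#    (assumption: literal matching, so "meanFreq()" is not kept)
keep <- grepl("mean()", features, fixed = TRUE) |
  grepl("std()", features, fixed = TRUE)
measurements <- measurements[, keep, drop = FALSE]
names(measurements) <- gsub("()", "", names(measurements), fixed = TRUE)

# h. and i. put Subject and Activity in front; Activity becomes a factor
merged <- cbind(
  Subject  = subjects,
  Activity = factor(activities,
                    levels = activity_labels$Code,
                    labels = activity_labels$Label),
  measurements
)

# j. mean of every measurement for each (Subject, Activity) pair
tidy <- aggregate(
  merged[, names(measurements), drop = FALSE],
  by = list(Subject = merged$Subject, Activity = merged$Activity),
  FUN = mean
)

# k. save the result (a temporary file is used here instead of data/tidy.txt)
out_file <- tempfile(fileext = ".txt")
write.table(tidy, file = out_file, row.names = FALSE)
```

On the real data the same aggregation yields one row per subject and activity, which is where the 180 rows (30 subjects x 6 activities) of tidy.txt come from.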