-
Notifications
You must be signed in to change notification settings - Fork 43
/
Copy pathWeek2Exercises_answers.Rmd
129 lines (94 loc) · 3.18 KB
/
Week2Exercises_answers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
---
title: "Week 2 Exercises"
author: "Your Name Here"
date: "May 14, 2022"
output: pdf_document
---
Please complete all exercises below. You may use stringr, lubridate, or the forcats library.
Place this at the top of your script:
library(stringr)
library(lubridate)
library(forcats)
# Exercise 1
Read the sales_pipe.txt file into an R data frame as sales.
```{r}
# Your code here
sales <- read.delim("Week_2/Data/sales_pipe.txt"
,stringsAsFactors=FALSE
,sep = "|"
,fileEncoding="WINDOWS-1252"
)
```
# Exercise 2
You can extract a vector of columns names from a data frame using the colnames() function. Notice the first column has some odd characters. Change the column name for the FIRST column in the sales date frame to Row.ID.
**Note: You will need to assign the first element of colnames to a single character.**
```{r}
# Your code here
colnames(sales)[1] <- "Row.ID"
```
# Exercise 3
Convert both Ship.Date and Order.Date to date vectors within the sales data frame. What is the number of days between the most recent order and the oldest order?
How many years is that?
How many weeks?
**Note: Use lubridate**
```{r}
# Your code here
library(lubridate)
sales$Ship.Date <- as.Date(sales$Ship.Date
,format ="%B %d %Y")
sales$Order.Date <- as.Date(sales$Order.Date
,format ="%m/%d/%Y"
)
max(sales$Order.Date) - min(sales$Order.Date)
```
# Exercise 4
What is the average number of days it takes to ship an order?
```{r}
# Your code here
sales$difference <- (sales$Ship.Date - sales$Order.Date)
mean(sales$difference)
```
# Exercise 5
How many customers have the first name Bill?
You will need to split the customer name into first and last name segments and then use a regular expression to match the first name bill. Use the length() function to determine the number of customers with the first name Bill in the sales data.
```{r}
# Your code here
names = unique(sales$Customer.Name)
length(grep("Bill", names))
```
# Exercise 6
How many mentions of the word 'table' are there in the Product.Name column?
**Note you can do this in one line of code**
```{r}
# Your code here
length(grep("(?i)table", sales$Product.Name))
```
# Exercise 7
Create a table of counts for each state in the sales data. The counts table should be ordered alphabetically from A to Z.
```{r}
# Your code here
table(sales$State)
```
# Exercise 8
Create an alphabetically ordered barplot for each sales Category in the State of Texas.
```{r}
# Your code here
texas_df = sales[sales$State=="Texas",]
texas_df$Category = factor(texas_df$Category)
barplot(table(texas_df$Category))
```
# Exercise 9
Find the average profit by region.
**Note: You will need to use the aggregate() function to do this. To understand how the function works type ?aggregate in the console.**
```{r}
# Your code here
aggregate(sales$Profit ~ sales$Region,sales,mean)
```
# Exercise 10
Find the average profit by order year.
**Note: You will need to use the aggregate() function to do this. To understand how the function works type ?aggregate in the console.**
```{r}
# Your code here
sales$Order.Year = format(sales$Order.Date,format="%Y")
aggregate(sales$Profit ~ sales$Order.Year,sales,mean)
```