-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathEDA.Rmd
775 lines (649 loc) · 52.7 KB
/
EDA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
---
title: "Kobe Bryant EDA"
output: html_document
---
# Purpose
The purpose of this notebook is to show the many different ways to visualize the data from this competition. Given that the dataset provides a lot of information about the shot location, shot distance, opponents, and type of shot, there is a lot that can be discovered. This notebook can be a guide as an example of how to visualize NBA shot data in unique ways besides the most convenient graphs of scatterplots, histograms, and bar plots.
A number of ideas for visualizations came from the [R Graph Gallery](https://www.r-graph-gallery.com/). There are many examples of different types of visualizations and is worth exploring if you want to try to improve or diversify the visualizations that you make.
This notebook does not have anything with fitting a model for this competition, but a good notebook that goes over feature engineering and fitting a model for this competition can be found [here](https://www.kaggle.com/matt4byu/kobe-bryant-shot-selection-analysis-with-xgboost). In the future, I may create a second notebook that goes over model fitting and how well my model can do in this competition. All of the code that was used for this notebook is included as an appendix at the end of this document.
# Introduction and Getting Started
The data provided contains location and circumstance information about every field goal attempt by Kobe Bryant over the course of his career with the Los Angeles Lakers. The goal of this competition is to determine whether or not Kobe made or missed a certain shot. The different visualizations help to better understand the data better and to potentially understand additional features to consider when doing feature engineering.
Here are all of the packages that are used in this analysis.
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(devtools)
library(ballr)
library(tidyverse)
library(RColorBrewer)
library(forcats)
library(ggthemes)
library(viridis)
library(maps)
library(plotly)
library(leaflet)
library(ggridges)
library(hrbrthemes)
library(patchwork)
library(lubridate)
library(ggalt)
library(gganimate)
```
The data is contained in a single file, instead of a separate train and test file, so we will read it into a single date frame. I will create a dataset that has only the shots where it is known whether or not the shot was made, which will be used for some of the visualizations below. I also remove any shots that were further than 30 feet away from the basket for some of the visualizations because most of his shots were within 30 feet unless it was at the end of the shot clock or end of the quarter.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
kobe <- read_csv("data.csv")
shot.data <- kobe %>%
filter(!is.na(shot_made_flag))
kobe.filter <- kobe %>%
filter(shot_distance <= 30) %>%
mutate(shot_zone_range = factor(shot_zone_range,
levels = c("Less Than 8 ft.", "8-16 ft.", "16-24 ft.", "24+ ft.")))
```
This code here creates the court that is used for a number of the visualizations, so I will run it at the beginning so it is available for any of the visualizations that follow.
```{r, warning=FALSE, message=FALSE, echo=FALSE}
library(ballr)
source("https://raw.githubusercontent.com/toddwschneider/ballr/master/plot_court.R")
source("https://raw.githubusercontent.com/toddwschneider/ballr/master/court_themes.R")
object <- plot_court()
court_points <- court_points %>%
mutate(desc = factor(desc),
x = x*10,
y = y*10)
test.three <- court_points %>% filter(desc == "three_point_line")
test.perimeter <- court_points %>% filter(desc == "perimeter")
test.outer_key <- court_points %>% filter(desc == "outer_key")
test.backboard <- court_points %>% filter(desc == "backboard")
test.neck <- court_points %>% filter(desc == "neck")
test.hoop <- court_points %>% filter(desc == "hoop")
test.foul_circle <- court_points %>% filter(desc == "foul_circle_top")
test.restricted <- court_points %>% filter(desc == "restricted")
```
# Shot Type Visualization
The first type of visualizations are related to the different types of shots and percentage for those different shots. There are a lot of different shot types on this first plot, but it is pretty easy to tell that Kobe made dunks and layups at a higher percentage than jump shots.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
shot.type <- shot.data %>%
group_by(action_type) %>%
summarize(FGPercent = round(mean(shot_made_flag) * 100, digits = 2))
ggplot(shot.type, aes(fct_reorder(action_type, FGPercent), FGPercent)) +
geom_bar(stat = "identity", fill = "#fdb927") +
geom_point(color = "#552583") +
xlab("Shot type") +
ylab("Field Goal Percentage") +
coord_flip()
```
This graphic simplifies the shot type into six different shot types and it isn't surprising that dunks and layups have a high percentage. Bank shots and hook shots are also shots that are likely closer to the basket, so it makes sense that they have a high field goal percentage as well. Jump shots are further away from the basket and so it makes sense that they have a lower field goal percentage as well.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
shot.type.small <- shot.data %>%
group_by(combined_shot_type) %>%
summarize(FGPercent = round(mean(shot_made_flag) * 100, digits = 2))
ggplot(shot.type.small, aes(fct_reorder(combined_shot_type, FGPercent), FGPercent)) +
geom_bar(stat = "identity", fill = "#fdb927") +
geom_point(color = "#552583", size = 3) +
xlab("Shot type") +
ylab("Field Goal Percentage") +
coord_flip() + theme_clean()
```
# Field Goal Percentage vs. Opponent
The data included the opponent for each game, which allowed me to calculate the field goal percentage for Kobe Bryant against each team. A circular barplot was produced to show the percentage against each team between the different conferences. Over his career, Kobe played more games against teams in the Western Conference than the Eastern Conference so those teams will have more games played against them. I used resources from [R Graph Gallery](https://www.r-graph-gallery.com/circular-barplot.html). to figure out how to create the circular bargraph.
As the plot below shows, Kobe shot the worst against the New Orleans Pelicans and the Brooklyn Nets. This isn't very surprising because the Pelicans were previously the New Orleans Hornets and the Nets were previously the New Jersey Nets. When the teams switched names or cities, it was later in Kobe's career when he dealt with more injuries and shot at a lower percentage than earlier in his career. On the other hand, Kobe shot the best against the Vancouver Grizzlies and the New York Knicks. The Vancouver Grizzlies were only located in Vancouver at the beginning of his career and then eventually moved to Memphis, so his field goal percentage is higher than most other teams. There doesn't seem to be much of a logical explanation why Kobe shot the best against the Knicks though. It seems that the attention that comes from playing in New York seemed to motivate Kobe to play better against the Knicks. The Knicks have also been a poor team most of Kobe's career, so it isn't extremely surprising that he would shoot extremely well against them.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
team_percentage <- shot.data %>%
group_by(opponent) %>%
summarize(FGPercent = mean(shot_made_flag == 1),
FG2Percent = sum(shot_made_flag == 1 & shot_type == "2PT Field Goal")/sum(shot_type == "2PT Field Goal"),
FG3Percent = sum(shot_made_flag == 1 & shot_type == "3PT Field Goal")/sum(shot_type == "3PT Field Goal"))
team_games <- shot.data %>%
group_by(opponent) %>%
summarize(Games_Against = length(unique(game_id)))
team_percentage <- team_percentage %>% inner_join(team_games, by = "opponent")
team_percentage$conference <- c("East", "East", "East", "East", "East", "East", "West", "West", "East", "West", "West", "East", "West", "West", "East", "East", "West", "East", "West", "West", "East", "West", "East", "East", "West", "West", "West", "West", "West", "East", "West", "West", "East")
team_percentage$conference <- factor(team_percentage$conference)
team_percentage <- team_percentage %>% arrange(desc(FGPercent))
empty_bar <- 10
to_add <- data.frame( matrix(NA, empty_bar*nlevels(team_percentage$conference), ncol(team_percentage)) )
colnames(to_add) <- colnames(team_percentage)
to_add$conference <- rep(levels(team_percentage$conference), each=empty_bar)
team_percentage <- rbind(team_percentage, to_add)
team_percentage <- team_percentage %>% arrange(conference)
team_percentage$id <- seq(1, nrow(team_percentage))
label_data <- team_percentage
number_of_bar <- nrow(label_data)
angle <- 90 - 360 * (label_data$id-0.5) /number_of_bar # I substract 0.5 because the letter must have the angle of the center of the bars. Not extreme right(1) or extreme left (0)
label_data$hjust <- ifelse( angle <= -90, 1, 0)
label_data$angle <- ifelse(angle <= -90, angle+180, angle)
base_data <- team_percentage %>%
group_by(conference) %>%
summarize(start=min(id), end=max(id) - empty_bar) %>%
rowwise() %>%
mutate(title=mean(c(start, end)))
grid_data <- base_data
grid_data$end <- grid_data$end[ c( nrow(grid_data), 1:nrow(grid_data)-1)] + 1
grid_data$start <- grid_data$start - 1
grid_data <- grid_data[-1,]
ggplot(team_percentage, aes(x=as.factor(id), y=FGPercent*100, fill=conference)) + # Note that id is a factor. If x is numeric, there is some space between the first bar
geom_bar(aes(x=as.factor(id), y=FGPercent*100, fill=conference), stat="identity", alpha=0.5) +
# Add a val=100/75/50/25 lines. I do it at the beginning to make sur barplots are OVER it.
geom_segment(data=grid_data, aes(x = end, y = 80, xend = start, yend = 80), colour = "grey", alpha=1, size=0.3 , inherit.aes = FALSE ) +
geom_segment(data=grid_data, aes(x = end, y = 60, xend = start, yend = 60), colour = "grey", alpha=1, size=0.3 , inherit.aes = FALSE ) +
geom_segment(data=grid_data, aes(x = end, y = 40, xend = start, yend = 40), colour = "grey", alpha=1, size=0.3 , inherit.aes = FALSE ) +
geom_segment(data=grid_data, aes(x = end, y = 20, xend = start, yend = 20), colour = "grey", alpha=1, size=0.3 , inherit.aes = FALSE ) +
# Add text showing the value of each 100/75/50/25 lines
annotate("text", x = rep(max(team_percentage$id),4), y = c(20, 40, 60, 80), label = c("20", "40", "60", "80") , color="grey", size=3 , angle=0, fontface="bold", hjust=1) +
geom_bar(aes(x=as.factor(id), y=FGPercent, fill=conference), stat="identity", alpha=0.5) +
ylim(-50,100) +
theme_minimal() +
theme(
legend.position = "none",
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
plot.margin = unit(rep(-1,4), "cm")
) +
coord_polar() +
geom_text(data=label_data, aes(x=id, y=(FGPercent*100)+10, label=paste(opponent, " ", round(FGPercent*100, digits = 2), "%", sep = ""), hjust=hjust), color="black", fontface="bold",alpha=0.6, size=2.5, angle= label_data$angle, inherit.aes = FALSE) +
# Add base line information
geom_segment(data=base_data, aes(x = start, y = -5, xend = end, yend = -5), colour = "black", alpha=0.8, size=0.6 , inherit.aes = FALSE ) +
geom_text(data=base_data, aes(x = title, y = -19, label=conference), hjust=c(1,0), colour = "black", alpha=0.8, size=4, fontface="bold", inherit.aes = FALSE) + labs(title ="Kobe Bryant Field Goal Percentage by Team") + scale_fill_manual(values = c("#0168ad", "#ed174b"))
```
I also made a few maps that show the field goal percentage against different opponents. I have a data set that includes the latitude and longitude for NBA arenas and that was used to create the map. The colors of the dots correspond with the field goal percentage against each team and then the size of the dots correspond with the number of games played against each team. The interactive maps include a hover tool that tell the name of the team, the arena name, the number of games played against the team, and the exact field goal percentage against the team. These visualizations don't necessarily provide additional information about how Kobe performed against certain teams, but it presents the information in an interesting way for someone to explore.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
arena <- read.csv("arenas.csv")
nba.arena <- arena %>% filter(leagueName == "National Basketball Association") %>%
select(teamName, venueName, lat, long)
extra_teams <- data.frame(teamName = c("Vancouver Grizzlies", "Seattle SuperSonics", "New Orleans Hornets", "New Jersey Nets"), venueName = c("Rogers Arena", "Key Arena", "Pete Maravich Assembly Center", "Continental Airlines Arena"), lat = c(49.277836, 47.6200, 30.4143, 40.8116), long = c(-123.108823, -122.3525, -91.1846, -74.0676))
nba.arena <- nba.arena %>% bind_rows(extra_teams) %>%
filter(teamName != "Los Angeles Lakers", teamName != "Indianapolis Olympians") %>%
slice(c(1:2, 5:nrow(.))) %>%
mutate(id = 1:33)
nba.arena$opponent <- c("ATL", "BOS", "BKN", "CHA", "CHI", "CLE", "DAL", "DEN", "DET", "GSW", "HOU", "IND", "LAC", "MEM", "MIA", "MIL", "MIN", "NOP", "NYK", "OKC", "ORL", "PHI", "PHX", "POR", "SAC", "SAS", "TOR", "UTA", "WAS", "VAN", "SEA", "NOH", "NJN")
teams <- team_percentage %>% inner_join(nba.arena, by = "opponent")
mybreaks <- c(.4, .415, .43, .445, .46, .475)
map <- map_data("world") %>% filter(region %in% c("Canada", "USA"))
teams <- teams %>%
mutate(mytext = paste(teamName, "\n", venueName, "\n", "Conference: ", conference , "\n", "Games Against: ", Games_Against, "\n", "Field Goal Percentage: ", round(FGPercent*100, digits = 2), "%", sep = ""))
p <- ggplot() +
geom_polygon(data = map, aes(x=long, y = lat, group = group), fill="grey", alpha=0.7) +
geom_point( data=teams, aes(x=long, y=lat, color=FGPercent, size = FGPercent, text = mytext)) +
scale_size_continuous(range=c(1,5), name = "Field Goal Percent", breaks = mybreaks) +
scale_color_viridis(option = "viridis", name = "Field Goal Percent", breaks = mybreaks) +
theme_void() + coord_fixed(xlim = c(-135, -60), ylim = c(20, 55), ratio = 1.2) + guides( colour = guide_legend()) + theme(legend.position = "bottom")
p.anim <- ggplotly(p, tooltip = "text")
p
p.anim
mytext_leaflet = paste(teams$teamName, "<br/>", teams$venueName, "<br/>", "Conference: ", teams$conference , "<br/>", "Games Against: ", teams$Games_Against, "<br/>", "Field Goal Percentage: ", round(teams$FGPercent*100, digits = 2), "%", sep = "") %>% lapply(htmltools::HTML)
mypalette <- colorBin(palette = "viridis", pretty = TRUE, domain = teams$FGPercent)
teams %>%
leaflet() %>%
addProviderTiles("Esri") %>%
addCircleMarkers(lng = ~long, lat = ~lat, label = mytext_leaflet,
color = ~mypalette(FGPercent), radius = 7,
fillOpacity = 1, stroke = FALSE) %>%
addLegend(pal = mypalette, values = ~FGPercent, opacity = .75, title = "Field Goal Percent", position = "bottomleft")
teams %>%
leaflet() %>%
addProviderTiles("Esri") %>%
addCircleMarkers(lng = ~long, lat = ~lat, label = mytext_leaflet,
color = ~mypalette(FGPercent),
fillOpacity = 1, stroke = FALSE, radius = ~(Games_Against/5)) %>%
addLegend(pal = mypalette, values = ~FGPercent, opacity = .75, title = "Field Goal Percent", position = "bottomleft")
```
# Shot Distributions
Next, I thought it would be useful to look at ridgeline plots for the distribution of shot distance against various variables.
The first plot shows the density of the shot distance for each season. At the beginning of his career, Kobe was often taking shots that are in the paint and then it seemed that mid-range jump shots were very common for him as well. He didn't attempt as many three-pointers at the beginning of his career. As his career (and the NBA) evolved, he began to shot more threes, while also still getting to the basket a lot.
The most drastic change in where he was taking his shots from occurs after he tears his Achilles, which isn't very surprising. After the injury, he didn't have the same quick first step and was unable to get to the basket as easily. As a result, he began to take more shots further away from the basket since those are the shots that he was able to get easier. By his final season, most of this shots were jump shots or three, which again is not surprising if you watched any of his games from his final season.
From this plot, I would say that it is likely that Kobe's field goal percentage is likely to be lower after his injury because he was taking more jump shots and got fewer layups and dunks, which were the shots that he (and all NBA players make) at a higher percentage than jump shots.
```{r,warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot(kobe.filter, aes(x = shot_distance, y = season, fill = ..x..)) +
geom_density_ridges_gradient(scale = 2, rel_min_height = 0.01) +
scale_fill_viridis(option = "C") +
labs(title = 'Kobe Bryant Shot Distance by Season', y = "", caption = "Shot distance (ft.)") +
theme_fivethirtyeight() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
)
```
Next I wanted to see if the distribution of shots changed based on the number of minutes remaining in the game. At the end of the game, you would expect teams to prevent getting to the basket for a layup, especially if the game is close. For the most part, it seems that the shot distribution is about the same regardless how much time is left. There is a spike of mid range jumpers in the first minute of a quarter and there also seems to be more 3-point shots in the last minute, which makes sense since a lot of the time teams need a 3 to tie or win the game or there is little time to get a shot off at the end of a quarter. Besides those two situations, it appears that the shot distribution for Kobe was very similar regardless of how much time was left.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot(kobe.filter, aes(x = shot_distance, y = as.factor(minutes_remaining), fill = ..x..)) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01) +
scale_fill_viridis(option = "inferno") +
labs(title = 'Kobe Bryant Shot Distance by Minutes Remaining', y = "", x = "Shot distance (ft.)", caption = "Shot distance (ft.)") +
theme_fivethirtyeight() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
)
```
# Field Goal Percentage Over Time
The next few plots look at Kobe's shooting percentage of the course of his career. The first few years of his career, Kobe had a lower field goal percentage, which isn't very surprising since rookies tend to have an adjustment period and struggle during the start of their career. The next seasons were relatively consistent, with some fluctuation from year to year, but there wasn't a sharp decline in field goal percentage until Kobe suffered his Achilles injury.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
season.fg <- shot.data %>%
group_by(season) %>%
summarize(FGPercent = round(mean(shot_made_flag), digits = 4),
Percent2 = sum(shot_made_flag == 1 & shot_type == "2PT Field Goal")/sum(shot_type == "2PT Field Goal"),
percent3 = sum(shot_made_flag == 1 & shot_type == "3PT Field Goal")/sum(shot_type == "3PT Field Goal"))
ggplot(season.fg, aes(x = season, y = FGPercent)) +
geom_line(group = 1, color = "gray45") +
geom_point(shape = 21, color = "black", fill = "darkmagenta", size = 4) +
theme_fivethirtyeight() + scale_y_percent(limits = c(.33, .48)) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + labs(title = "Kobe Bryant Field Goal Percentage Over Time")
```
It was also useful to compare field goal percentage between the regular season and the playoffs. The teams in the playoffs will play harder, make more of an effort on defense, and are in general better teams, so it wouldn't be surprising if the field goal percentage is lower during the playoffs. From the plot below, it appears that this trend holds up and in general, Kobe shot a lower percentage during the playoffs than he did during the regular season. There are a few seasons where he was better in the playoffs, but this is also likely due to the smaller sample size in the playoffs since the most games that Kobe ever played in the playoffs was 23 games.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
playoffs <- shot.data %>%
filter(playoffs == 1) %>%
group_by(season) %>%
summarize(FGPercent = round(mean(shot_made_flag), digits = 4),
Percent2 = sum(shot_made_flag == 1 & shot_type == "2PT Field Goal")/sum(shot_type == "2PT Field Goal"),
percent3 = sum(shot_made_flag == 1 & shot_type == "3PT Field Goal")/sum(shot_type == "3PT Field Goal"))
reg.season <- shot.data %>%
filter(playoffs == 0) %>%
group_by(season) %>%
summarize(FGPercent = round(mean(shot_made_flag), digits = 4),
Percent2 = sum(shot_made_flag == 1 & shot_type == "2PT Field Goal")/sum(shot_type == "2PT Field Goal"),
percent3 = sum(shot_made_flag == 1 & shot_type == "3PT Field Goal")/sum(shot_type == "3PT Field Goal"))
ggplot(season.fg, aes(x = season, group = 1)) +
geom_line(data = playoffs, aes(color = "Play", y = FGPercent)) +
geom_line(data = reg.season, aes(color = "Season", y = FGPercent)) +
geom_point(data = playoffs, aes(color = "Play", y = FGPercent), size = 3) +
geom_point(data = reg.season, aes(color = "Season", y = FGPercent), size = 3) +
scale_y_percent(limits = c(.33, .52)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom", panel.grid.major.x = element_blank(), panel.grid.major.y = element_line(linetype = "dashed")) + scale_color_manual(name ="", values = c("#195190FF", "#A2A2A1FF"), labels = c("Playoffs", "Regular Season")) + labs(x = "", y = "Percent", title = "Kobe Bryant Field Goal Percentage", subtitle = "Regular Season vs. Playoffs")
```
It is also important to look at the type of shots. Almost all NBA players shoot a higher percentage on 2 point attempts than 3-point attempts, which is also the case with Kobe. The 3-point shot hadn't become extremely popular in the NBA during most of Kobe's career, which is probably one reason that he shot so much lower from 3, but in general Kobe wasn't considered a great 3-point shooter as well. A lot of his shots were highly contested and off the dribble. Catch and shoot threes are shot at a lot higher percentage than other 3s, but Kobe very rarely played off the ball during his career, so it isn't surprising that he was a lot worse from 3.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot(season.fg, aes(x = season, group = 1)) +
geom_line(aes(color = "Two", y = Percent2)) +
geom_line(aes(color = "Three", y = percent3)) +
geom_point(aes(color = "Two", y = Percent2), size = 3) +
geom_point(aes(color = "Three", y = percent3), size = 3) +
scale_y_percent(limits = c(.15, .52)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom", panel.grid.major.x = element_blank(), panel.grid.major.y = element_line(linetype = "dashed")) + scale_color_manual(name ="", values = c("#195190FF", "#A2A2A1FF"), labels = c("3 PT", "2 PT")) + labs(x = "", y = "Percent", title = "Kobe Bryant Percentage", subtitle = "2 PT vs 3 PT")
```
I thought it would also be interesting to see the field goal percentage based on distance away from the basket. Shots that are right next to the basket have the highest percentage and then the percentage slowly decreases the further away from the basket until you get to deep three pointers (27+ feet away) where there is a drastic decline in field goal percentage, since Kobe generally didn't shot those shots and they were likely end of shot clock or end of quarter shots.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
fg.distance <- shot.data %>%
filter(shot_distance <= 32) %>%
mutate(shot_distance = factor(shot_distance)) %>%
group_by(shot_distance) %>%
summarize(FGPercent = round(mean(shot_made_flag), digits = 4))
ggplot(fg.distance, aes(x = shot_distance, y = FGPercent)) +
geom_line(group = 1, color = "gray45") +
geom_point(shape=21, color="black", fill="darkmagenta", size=4) +
theme_fivethirtyeight() + scale_y_percent(limits = c(0, .7)) + labs(title = "Kobe Bryant Field Goal Percentage by Distance", caption = "Distance (ft.)")
```
I thought it would be interesting to see if there was any difference in field goal percentage for Kobe based on whether or not it was a home game for him. In baseball for example, home and road splits are often very different because of the climate and park layout. With basketball, you wouldn't expect the splits to be as different because the courts are all the same size and they are played in a controlled indoor environment. The graph below seems to support the idea that there isn't much of a difference between home and road splits. There are some years where Kobe shot better at home and some years he was better on the road, so it doesn't seem like there is anything specific that causes the difference in home and road field goal percentage besides year to year variation. The same pattern is discovered if we look at 3 point percentage for home and away games, so it seems there is only year to year variation for his shooting percentage at home and on the road and there is no benefit to playing on his home court.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
shots.home <- shot.data %>%
mutate(home = ifelse(str_detect(matchup, "@") == FALSE, 1, 0)) %>%
filter(home == 1) %>%
group_by(season) %>%
summarize(FGPercent = round(mean(shot_made_flag), digits = 4),
Percent2 = sum(shot_made_flag == 1 & shot_type == "2PT Field Goal")/sum(shot_type == "2PT Field Goal"),
percent3 = sum(shot_made_flag == 1 & shot_type == "3PT Field Goal")/sum(shot_type == "3PT Field Goal"))
shots.away <- shot.data %>%
mutate(home = ifelse(str_detect(matchup, "@")== FALSE, 1, 0)) %>%
filter(home == 0) %>%
group_by(season) %>%
summarize(FGPercent = round(mean(shot_made_flag), digits = 4),
Percent2 = sum(shot_made_flag == 1 & shot_type == "2PT Field Goal")/sum(shot_type == "2PT Field Goal"),
percent3 = sum(shot_made_flag == 1 & shot_type == "3PT Field Goal")/sum(shot_type == "3PT Field Goal"))
ggplot(season.fg, aes(x = season, group = 1)) +
geom_line(data = shots.home, aes(color = "Home", y = FGPercent)) +
geom_line(data = shots.away, aes(color = "Away", y = FGPercent)) +
geom_point(data = shots.home, aes(color = "Home", y = FGPercent), size = 3) +
geom_point(data = shots.away, aes(color = "Away", y = FGPercent), size = 3) +
scale_y_percent(limits = c(.33, .52)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom", panel.grid.major.x = element_blank(), panel.grid.major.y = element_line(linetype = "dashed"), plot.title = element_text(hjust = .5, face = "bold"), plot.subtitle = element_text(hjust = .5)) + scale_color_manual(name ="", values = c("#195190FF", "#A2A2A1FF"), labels = c("Home", "Away")) + labs(x = "", y = "Percent", title = "Kobe Bryant Field Goal Percentage", subtitle = "Home vs. Away")
```
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot(season.fg, aes(x = season, group = 1)) +
geom_line(data = shots.home, aes(color = "Home", y = percent3)) +
geom_line(data = shots.away, aes(color = "Away", y = percent3)) +
geom_point(data = shots.home, aes(color = "Home", y = percent3), size = 3) +
geom_point(data = shots.away, aes(color = "Away", y = percent3), size = 3) +
scale_y_percent(limits = c(0, .45)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom", panel.grid.major.x = element_blank(), panel.grid.major.y = element_line(linetype = "dashed"), plot.title = element_text(hjust = .5, face = "bold"), plot.subtitle = element_text(hjust = .5)) + scale_color_manual(name ="", values = c("#195190FF", "#A2A2A1FF"), labels = c("Home", "Away")) + labs(x = "", y = "Percent", title = "Kobe Bryant 3PT Percentage", subtitle = "Home vs. Away")
```
# Summary Plots
The next group of plots are categorized as summary plots. I have tried to take the data that was provided and summarize it in a way that is useful to understand which shots Kobe is more or less likely to make.
The first chart shows the percentage that Kobe shot from each area of the court. As was shown earlier, Kobe shot the best from the restricted area and then from the non-restricted area in the paint. We also can see that Kobe shot a lot better from the left corner three than from the the right corner, so it seems that either Kobe preferred that shot or he simply got the ball for more open shots in that corner. Either way, it seems preferrable that he would shot from the left corner rather than the right corner.
```{r, message=FALSE, warning=FALSE, echo=FALSE}
shot.area <- shot.data %>%
filter(shot_distance <= 32) %>%
group_by(shot_zone_basic) %>%
summarize(shots_attempted = length(shot_made_flag),
shots_made = sum(shot_made_flag == 1),
mloc_x = mean(loc_x),
mloc_y = mean(loc_y),
shot_accuracy = shots_made/shots_attempted,
label = paste(round(shot_accuracy*100, digits = 1), "%", sep = ""))
shot.area$mloc_y[shot.area$shot_zone_basic == "Above the Break 3"] <- shot.area$mloc_y[shot.area$shot_zone_basic == "Above the Break 3"] + 30
shot.area$mloc_y[shot.area$shot_zone_basic == "Mid-Range"] <- shot.area$mloc_y[shot.area$shot_zone_basic == "Mid-Range"] + 35
ggplot() + geom_point(data = shot.area, aes(x = mloc_x, y = mloc_y + 69, color = shot_zone_basic), size = 4) + geom_path(data = test.outer_key, aes(x= x, y = y)) + geom_path(data = test.perimeter, aes(x= x, y = y)) + geom_path(data = test.three, aes(x= x, y = y)) + ylim(-10, 350) +
geom_path(data = test.backboard, aes(x= x, y = y)) +
geom_path(data = test.neck, aes(x= x, y = y)) +
geom_path(data = test.hoop, aes(x= x, y = y)) +
geom_path(data = test.foul_circle, aes(x= x, y = y)) +
geom_path(data = test.restricted, aes(x= x, y = y)) +
coord_fixed() + theme_void() + geom_text(data = shot.area, aes(x = mloc_x, y = mloc_y + 96, color =shot_zone_basic, label = label), size =5) + labs(title = "Kobe Bryant Shot Accuracy") + theme(plot.title = element_text(size = 17, face = "bold", hjust = .65), legend.title = element_blank())
```
After looking at the percentages that he shot from each area, I thought it would be useful to see how many shots he took from each area. Ideally, based off of the previous chart above, you would want him to get most of his shots from the restricted area, in the paint, or from the left corner three. As you can see below, Kobe took more shots from the mid range than any other area and it wasn't even close. The restricted area had the second most shots but that was at least 4000 shots less. He shot a lot from the above the break 3, even though he shot a lower percentage. It also appears that the higher percentage from the left corner in comparison to the right corner might be due to a small sample size since there are very few shots taken from those areas. Regardless, even though he shot a higher percentage from the left corner, Kobe very rarely was taking shots from the left corner 3.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
shot.area <- shot.area %>%
arrange(shots_attempted) %>%
mutate(shot_zone_basic = factor(shot_zone_basic, shot_zone_basic))
ggplot(data = shot.area, aes(x = fct_reorder(shot_zone_basic, shots_attempted), y = shots_attempted)) +
geom_segment( aes(x=shot_zone_basic, xend=shot_zone_basic, y=0, yend=shots_attempted), color="#E10600FF", alpha = .6, size = .75) +
geom_point( color="#00239CFF", size=6) + theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank()
) + labs(x = "", y = "Number of shots")
```
Since most of his shots were from the mid-range, you would expect most of his shots to be jump shots and the graph below shows that almost all of his shots were jump shots or layups, with some dunks mixed in. Hook, bank, and tip shots are all pretty rare in comparison to the other types of shots, so this plot doesn't show anything surprising.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
shot.types <- shot.data %>%
filter(shot_distance <= 32) %>%
group_by(combined_shot_type) %>%
summarize(shots_attempted = length(shot_made_flag)) %>%
arrange(shots_attempted) %>%
mutate(combined_shot_type = factor(combined_shot_type, combined_shot_type))
ggplot(data = shot.types, aes(x = combined_shot_type, y = shots_attempted)) +
geom_segment( aes(x=combined_shot_type, xend=combined_shot_type, y=0, yend=shots_attempted), color="#E10600FF", alpha = .6, size = .75) +
geom_point( color="#00239CFF", size=6) + theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank()
) + labs(x = "", y = "Number of shots")
```
I looked at home many shots Kobe took on average per quarter. I thought that he would take the most shots in the 4th quarter, but he actually took more shots in the 3rd on average, but that may have to do with substitution patterns of his coach, where they would play him most of the 3rd quarter and have him rest at the beginning of the 4th quarter. The overtime periods are the smallest, which isn't surprising because they are only 5 minutes long in comparison with the other quarters, which are 12 minutes long.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
periods <- shot.data %>%
filter(shot_distance <= 32) %>%
group_by(period, game_id) %>%
summarize(avg_shots_attempted = length(shot_made_flag)/length(unique(game_id))) %>%
ungroup() %>%
group_by(period) %>%
summarize(avg_shots_q = mean(avg_shots_attempted)) %>%
arrange(avg_shots_q) %>%
mutate(period = factor(period, period))
levels(periods$period) <- c("2OT", "OT", "3OT", "2nd", "4th", "1st", "3rd")
ggplot(data = periods, aes(x = period, y = avg_shots_q)) +
geom_segment( aes(x=period, xend=period, y=0, yend=avg_shots_q), color="#E10600FF", alpha = .6, size = .75) +
geom_point( color="#00239CFF", size=6) + theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
plot.title = element_text(size = 14, face = "bold", hjust = .5)
) + labs(x = "", y = "Average number of shots", title = "Kobe Bryant Average Shots Per Period")
```
After looking at the average number of shots per quarter, I thought it would be useful to see how many shots were taken for each minute remaining in the quarter. Since Kobe was generally the best player on his team (or one of the best), he often took the shots towards the end of the quarter, so I assumed that he would have more shots in the last minute than other minutes, which ended being very accurate. He had a lot more shots in the last minute of a quarter than at any other time of the quarter, which mean he generally had the ball when it mattered most and time was running low.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
minutes <- shot.data %>%
filter(shot_distance <= 32) %>%
group_by(minutes_remaining) %>%
summarize(shots_attempted = length(shot_made_flag),
FGPercentage = round(mean(shot_made_flag) * 100), digits = 2) %>%
arrange(shots_attempted) %>%
mutate(minutes_remaining = factor(minutes_remaining, minutes_remaining))
ggplot(data = minutes, aes(x = minutes_remaining, y = shots_attempted)) +
geom_segment( aes(x= minutes_remaining, xend= minutes_remaining, y = 0, yend=shots_attempted), color = "#E10600FF", alpha = .6, size = .75) +
geom_point(color="#00239CFF", size = 6) + theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
plot.title = element_text(size = 14, face = "bold", hjust = .5)
) + labs(x = "", y = "Number of shots", title = "Shots for Minutes Remaining in the Quarter")
```
Once I found that he took more shots with less than a minute left in the quarter, it got me interested in how well he actually shot in the last minute (and other times during the quarter). Not surprisingly, the only time he shot under 40% was in the last minute of the game, which makes sense since those shots tend to be more difficult and more heavily contested as well.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
minutes <- minutes %>%
arrange(FGPercentage) %>%
mutate(minutes_remaining = factor(minutes_remaining, minutes_remaining))
ggplot(data = minutes, aes(x = minutes_remaining, y = FGPercentage)) +
geom_segment( aes(x = minutes_remaining, xend = minutes_remaining, y = 0, yend = FGPercentage), color = "#E10600FF", alpha = .6, size = .75) +
geom_point( color = "#00239CFF", size = 6) + theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
plot.title = element_text(size = 14, face = "bold", hjust = .5)
) + labs(x = "", y = "Field Goal Percentage", title = "Percentage for Minutes Remaining in the Quarter")
```
# Shot Charts
I also made a few graphs that are similar to charts that Kirk Goldsberry has made. They aren't exactly the same, but some of them are similar and made using R. If you haven't seen example of the shot charts that Kirk Goldsberry has made, there are a few examples in this [article](https://fivethirtyeight.com/features/how-mapping-shots-in-the-nba-changed-it-forever/).
The first chart is a shot chart of all of Kobe's shots across the basketball court. It shows that Kobe shot from everywhere on the floor over the course of his career.
```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center'}
col_scheme <- c("firebrick4","darkorange2","gold1",
"chartreuse4","dodgerblue", "darkorchid4")
ggplot() +
geom_point(data = kobe.filter,
aes(x = loc_x, y = loc_y + 52, color = shot_distance),
size = .4) +
geom_path(data = test.outer_key,
aes(x = x, y = y)) +
geom_path(data = test.perimeter,
aes(x = x, y = y)) +
geom_path(data = test.three,
aes(x = x, y = y)) +
ylim(-10, 350) +
geom_path(data = test.backboard,
aes(x = x, y = y)) +
geom_path(data = test.neck,
aes(x = x, y = y)) +
geom_path(data = test.hoop,
aes(x = x, y = y)) +
geom_path(data = test.foul_circle,
aes(x = x, y = y)) +
geom_path(data = test.restricted,
aes(x = x, y = y)) +
coord_fixed() +
theme_void() +
scale_color_gradientn(colors = col_scheme,
limits = c(0,30),
breaks = c(0, 30),
name = 'Shot Distance (ft)') +
labs(title = "Kobe Bryant Shot Distance")
```
Next, I made the same shot chart, but instead of the color being the distance from the basket, it is the range from the basket, which is really the distance broken into four groups.
```{r, warning = FALSE, message = FALSE, echo=FALSE,fig.align='center'}
ggplot() +
geom_point(data = kobe.filter,
aes(x = loc_x, y = loc_y + 52, color = shot_zone_range),
size = .4) +
geom_path(data = test.outer_key,
aes(x = x, y = y)) +
geom_path(data = test.perimeter,
aes(x = x, y = y)) +
geom_path(data = test.three,
aes(x = x, y = y)) +
ylim(-10, 350) +
geom_path(data = test.backboard,
aes(x = x, y = y)) +
geom_path(data = test.neck,
aes(x = x, y = y)) +
geom_path(data = test.hoop,
aes(x = x, y = y)) +
geom_path(data = test.foul_circle,
aes(x = x, y = y)) +
geom_path(data = test.restricted,
aes(x = x, y = y)) +
coord_fixed() +
theme_void() +
scale_color_manual(values = c("#D92027", "#FF9234", "#FFCD3C", "#35D0BA"),
name = "Zone Range") +
guides(colour = guide_legend(reverse = TRUE, override.aes = list(size = 5))) +
labs(title = "Kobe Bryant Shot Zone Range")
```
Another similar chart to the previous ones, but the area of the court is highlighted rather than the distance from the basket.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot() +
geom_point(data = kobe.filter,
aes(x = loc_x, y = loc_y + 52, color = shot_zone_area),
size = .4) +
geom_path(data = test.outer_key,
aes(x = x, y = y)) +
geom_path(data = test.perimeter,
aes(x = x, y = y)) +
geom_path(data = test.three,
aes(x = x, y = y)) +
ylim(-10, 350) +
geom_path(data = test.backboard,
aes(x = x, y = y)) +
geom_path(data = test.neck,
aes(x = x, y = y)) +
geom_path(data = test.hoop,
aes(x = x, y = y)) +
geom_path(data = test.foul_circle,
aes(x = x, y = y)) +
geom_path(data = test.restricted,
aes(x = x, y = y)) +
coord_fixed() +
theme_void() +
scale_color_manual(values = col_scheme, name = "Zone Range") +
guides(color = guide_legend(reverse = TRUE, override.aes = list(size = 5))) +
labs(title = "Kobe Bryant Shot Zone Range")
```
This plot shows all of Kobe's makes and misses on the court, which surprisingly is not very informative. With the quantity of shots over the course of his entire career, Kobe practically made and missed a shot from almost every location on the court.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot() +
geom_point(data = shot.data,
aes(x = loc_x, y = loc_y + 52, color = as.factor(shot_made_flag)), size = .75) +
geom_path(data = test.outer_key,
aes(x = x, y = y)) +
geom_path(data = test.perimeter,
aes(x = x, y = y)) +
geom_path(data = test.three,
aes(x = x, y = y)) +
ylim(-10, 350) +
geom_path(data = test.backboard,
aes(x = x, y = y)) +
geom_path(data = test.neck,
aes(x = x, y = y)) +
geom_path(data = test.hoop,
aes(x = x, y = y)) +
geom_path(data = test.foul_circle,
aes(x = x, y = y)) +
geom_path(data = test.restricted,
aes(x = x, y = y)) +
coord_fixed() +
theme_void() +
scale_color_manual(values = c("#fdb927", "#552583"), labels = c("Miss", "Make"), name = "Zone Range") +
guides(colour = guide_legend(reverse = TRUE, override.aes = list(size = 5))) +
labs(title = "Kobe Bryant Shot Distance")
```
The next few charts aim at taking the previous information from the shot charts and creating graphics where some useful information can be extracted from them. First, this is a heat map of Kobe's shots outside of the restricted area. He shot so many shots in the restricted area that it was difficult to determine where he shot from the most outside of the restricted area, so I excluded shots that were in the restricted area. This heat map shows that most of his shots were from the non-restricted area in the paint or from the left side mid range from either the elbow or baseline. He clearly preferred to shoot from the left side rather than from the right side.
```{r, warnings = FALSE, messages = FALSE, echo=FALSE, fig.align='center'}
ggplot() +stat_density_2d(
data = shot.data %>% filter(shot_zone_basic != "Restricted Area"),
aes(x = loc_x, y = loc_y + 52, fill = stat(density / max(density))),
geom = "raster", na.rm = TRUE, contour = FALSE, interpolate = TRUE, n = 300
) + geom_path(data = test.outer_key, aes(x= x, y = y), color = "white") + geom_path(data = test.perimeter, aes(x= x, y = y), color = "white", na.rm = TRUE) + geom_path(data = test.three, aes(x= x, y = y), color = "white") + ylim(-10, 350) +
geom_path(data = test.backboard, aes(x= x, y = y),color = "white") +
geom_path(data = test.neck, aes(x= x, y = y),color = "white") +
geom_path(data = test.hoop, aes(x= x, y = y),color = "white") +
geom_path(data = test.foul_circle, aes(x= x, y = y),color = "white") +
geom_path(data = test.restricted, aes(x= x, y = y),color = "white") +
coord_fixed() + theme_void() + scale_fill_viridis_c(
"Shot Frequency ",
limits = c(0, 1),
breaks = c(0, 1),
labels = c("Lower", "Higher"),
option = "inferno"
) + theme(plot.background = element_rect(fill = "black"),
legend.text = element_text(color = "white"),
legend.title = element_text(color = "white", vjust = 1),
plot.title = element_text(color = "white", hjust = .5, face = "bold"),
legend.position = "bottom",
plot.margin=unit(c(0,0,0,.0), "null"),
panel.background = element_rect(fill = "black"),
strip.background = element_blank(),
plot.subtitle = element_text(color = "white", hjust = .5, face = "bold")) + labs(title = "Kobe Bryant Shot Frequency", subtitle = "Outside of the Restricted Area")
```
I split the data into the made and missed shots to see if there was a difference in the heat maps between makes and misses. There were a lot more makes that were in the paint, while the misses included a lot more shots from the mid-range and behind the 3 point line. It shows that if Kobe was in the paint, he had a higher chance of making the shot and it also shows he didn't shoot particularly well from outside the paint.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
made <- ggplot() + stat_density_2d(
data = shot.data %>% filter(shot_zone_basic != "Restricted Area") %>% filter(shot_made_flag == 1),
aes(x = loc_x, y = loc_y + 52, fill = stat(density / max(density))),
geom = "raster", na.rm = TRUE, contour = FALSE, interpolate = TRUE, n = 300
) + geom_path(data = test.outer_key, aes(x= x, y = y), color = "white") + geom_path(data = test.perimeter, aes(x= x, y = y), color = "white", na.rm = TRUE) + geom_path(data = test.three, aes(x= x, y = y), color = "white") + ylim(-10, 350) +
geom_path(data = test.backboard, aes(x= x, y = y),color = "white") +
geom_path(data = test.neck, aes(x= x, y = y),color = "white") +
geom_path(data = test.hoop, aes(x= x, y = y),color = "white") +
geom_path(data = test.foul_circle, aes(x= x, y = y),color = "white") +
geom_path(data = test.restricted, aes(x= x, y = y),color = "white") +
coord_fixed() + theme_void() + scale_fill_viridis_c(
"Shot Frequency ",
limits = c(0, 1),
breaks = c(0, 1),
labels = c("Lower", "Higher"),
option = "inferno"
) + theme(plot.background = element_rect(fill = "black"),
legend.text = element_text(color = "white"),
legend.title = element_text(color = "white", vjust = 1),
plot.title = element_text(color = "white", hjust = .5, face = "bold"),
legend.position = "bottom",
plot.margin=unit(c(0,0,0,.0), "null"),
panel.background = element_rect(fill = "black"),
strip.background = element_blank(),
plot.subtitle = element_text(color = "white", hjust = .5, face = "bold")) + labs(title = "Kobe Bryant Shot Frequency", subtitle = "Made Baskets")
miss <- ggplot() + stat_density_2d(
data = shot.data %>% filter(shot_zone_basic != "Restricted Area") %>% filter(shot_made_flag == 0),
aes(x = loc_x, y = loc_y + 52, fill = stat(density / max(density))),
geom = "raster", na.rm = TRUE, contour = FALSE, interpolate = TRUE, n = 300
) + geom_path(data = test.outer_key, aes(x = x, y = y), color = "white") + geom_path(data = test.perimeter, aes(x = x, y = y), color = "white", na.rm = TRUE) + geom_path(data = test.three, aes(x = x, y = y), color = "white") + ylim(-10, 350) +
geom_path(data = test.backboard, aes(x = x, y = y),color = "white") +
geom_path(data = test.neck, aes(x = x, y = y),color = "white") +
geom_path(data = test.hoop, aes(x = x, y = y),color = "white") +
geom_path(data = test.foul_circle, aes(x = x, y = y),color = "white") +
geom_path(data = test.restricted, aes(x = x, y = y),color = "white") +
coord_fixed() + theme_void() + scale_fill_viridis_c(
"Shot Frequency ",
limits = c(0, 1),
breaks = c(0, 1),
labels = c("Lower", "Higher"),
option = "inferno"
) + theme(plot.background = element_rect(fill = "black"),
legend.text = element_text(color = "white"),
legend.title = element_text(color = "white", vjust = 1),
plot.title = element_text(color = "white", hjust = .5, face = "bold"),
legend.position = "bottom",
plot.margin = unit(c(0,0,0,.0), "null"),
panel.background = element_rect(fill = "black"),
strip.background = element_blank(),
plot.subtitle = element_text(color = "white", hjust = .5, face = "bold")) + labs(title = "Kobe Bryant Shot Frequency", subtitle = "Missed Baskets")
made + miss
```
A hexbin chart was also made to get a little more detail about the places on the court that Kobe preferred and shot a high percentage. The chart below shows how many shots were made from each spot on the court. The most shots were made in the paint in the restricted area, and then there is one spot of the left baseline where more shots were made than the rest of the places on the court.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot() + coord_fixed() + stat_summary_hex(data = shot.data %>% filter(shot_distance <= 32), aes(x= loc_x, y = loc_y +54, z = shot_made_flag, fill = cut(..value.., c(-1, 50, 100, 200, 500, 3500), na.rm = TRUE)), fun = sum, bins = 10, na.rm = TRUE)+ geom_path(data = test.outer_key, aes(x= x, y = y), color = "black") + geom_path(data = test.perimeter, aes(x= x, y = y), color = "black", na.rm = TRUE) + geom_path(data = test.three, aes(x= x, y = y), color = "black") + ylim(-10, 350) +
geom_path(data = test.backboard, aes(x= x, y = y),color = "black") +
geom_path(data = test.neck, aes(x= x, y = y),color = "black") +
geom_path(data = test.hoop, aes(x= x, y = y),color = "black") +
geom_path(data = test.foul_circle, aes(x= x, y = y),color = "black") +
geom_path(data = test.restricted, aes(x= x, y = y),color = "black") + theme_void() +
scale_fill_brewer(palette = "OrRd", labels = c("0-50", "50-100", "100-200", "200-500", "500-3500"), name = "Number of \nMade Shots") + labs(title = "Kobe Bryant Made Shots") + theme(plot.title = element_text(hjust = .5, face = "bold"))
```
This final plot then shows the field goal percentage from each bin on the floor. Not surprisingly, the paint had the highest percentage. After that, there are a few select places on the court where Kobe shot a little bit better and then there seems to be no difference between other spots. Finally there is a dropoff of shooting percentage for longer 3-pointers as well.
```{r, warning = FALSE, message = FALSE, echo=FALSE, fig.align='center'}
ggplot(data = shot.data) + coord_fixed() + stat_summary_hex(data = shot.data %>% filter(shot_distance <= 32), aes(x= loc_x, y = loc_y +52, z = shot_made_flag, fill = cut(..value.., c(-Inf, .3, .4, .45, .5, .6, Inf), na.rm = TRUE)), fun = mean, bins = 10, na.rm = TRUE)+ geom_path(data = test.outer_key, aes(x= x, y = y), color = "black") + geom_path(data = test.perimeter, aes(x= x, y = y), color = "black", na.rm = TRUE) + geom_path(data = test.three, aes(x= x, y = y), color = "black") + ylim(-10, 350) +
geom_path(data = test.backboard, aes(x= x, y = y),color = "black") +
geom_path(data = test.neck, aes(x= x, y = y),color = "black") +
geom_path(data = test.hoop, aes(x= x, y = y),color = "black") +
geom_path(data = test.foul_circle, aes(x= x, y = y),color = "black") +
geom_path(data = test.restricted, aes(x= x, y = y),color = "black") + theme_void() +
scale_fill_brewer(palette = "OrRd", labels = c("0-30%", "30-40%", "40-45%", "45-50%", "50-60%", "60+%"), name = "Field Goal \nPercent") + labs(title ="Kobe Bryant Shooting Percentage by Location") +
theme(plot.title = element_text(hjust = .8, face = "bold"))
```
# Conclusion
This notebook provides a variety of different visualizations and shows how information from data can be presented in various forms, while getting across the same idea. Some of the plots are easier to understand than others, so it is important to remember the audience when creating visualizations with data. This provides a variety of visualizations to illustrate how data can be presented in many different ways and certain features or patterns about the data can be discovered through thorough exploratory data visualizations.
If there are any questions or concerns, feel free to email me and I'll answer any questions.
# Appendix
```{r ref.label = knitr::all_labels(), echo = TRUE, eval = FALSE}
```