-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcrossfit_competition.Rmd
58 lines (43 loc) · 2.16 KB
/
crossfit_competition.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---
title: "Crossfit Data"
output: github_document
---
```{r, echo=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE, comment="")
log_odds <- function(x, divide_by=100, scale=0.99){
p <- scale*x/divide_by
log(p/(1-p))
}
```
```{r}
library(kerasformula)
library(polyreg)
seed <- 12345
```
Below find publically-available data from the Crossfit annual open, an amateur athletics competition consisting of five workouts each year. "Rx", as in "perscription", denotes the heaviest weights and most complex movements. In this analysis, we restrict the predictors to height, weight, age, region, and performance in the first of the five rounds. We also restrict analysis who did all five rounds "Rx" and reported that other data. The analysis is repeated for men and women and 2017 and 2018. In each case, `kerasformula` and `xvalPoly` are compared in terms of mean absolute error. (Note `kms` will standardize the outcome by default and so the test stats are on the same scale).
# Men 2018 Competition
```{r, results="hide"}
MAE_results <- matrix(nrow=3, ncol=4,
dimnames=list(c("kms", "xvalPoly (best)", "xvalPoly (worst)"), competitions))
for(i in seq(0.2, 0.9, 0.1)){
Rx <- read.csv(paste0(competitions[i], ".csv"))
colnames(Rx) <- gsub("[[:punct:]]", "", colnames(Rx)) # forgetmenot
colnames(Rx) <- tolower(colnames(Rx))
colnames(Rx) <- gsub("x18", "open", colnames(Rx))
colnames(Rx) <- gsub("x17", "open", colnames(Rx))
Rx_tmp <- dplyr::select(Rx, heightm, weightkg, age, open1percentile, overallpercentile)
Rx_complete <- Rx_tmp[complete.cases(Rx_tmp), ]
P <- ncol(model.matrix(overallpercentile ~ ., Rx_complete))
Rx_complete_01 <- as.data.frame(lapply(Rx_complete, kerasformula:::zero_one))
xval.out <- xvalPoly(Rx_complete_01, maxDeg = 3, maxInteractDeg = 2)
Rx_kms_out$MAE_predictions
MAE_results[1,i] <- Rx_kms_out$MAE_predictions
MAE_results[2,i] <- min(xval.out)
MAE_results[3,i] <- max(xval.out)
}
```
# The results
Out-of-sample mean absolute error (MAE) for `kms` vs. `xvalPoly` (for the latter, the lowest MAE for the models corresponding to the three polynomial degrees is selected).
```{r}
MAE_results
```