generated from jhudsl/OTTR_Template
Showing 387 changed files with 32,258 additions and 194 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,80 @@ | ||
# Functions Exercises | ||
|
||
## Part 1: Writing your function | ||
|
||
Create a function called `num_na` that takes in any vector and returns a single numeric value: the number of `NA`s in the vector. Use cases: `num_na(c(NA, 2, 3, 4, NA, 5))` returns `2`, and `num_na(c(2, 3, 4, 5))` returns `0`. | ||
|
||
Hint 1: Use the `is.na()` function. Hint 2: Given a logical vector, you can count the number of `TRUE` values with `sum()`; for example, `sum(c(TRUE, TRUE, FALSE))` returns `2`. | ||
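
The two hints can be tried separately on a small vector before combining them. This is only a minimal sketch; `x` and `missing_flags` are made-up names for illustration and are not part of the exercise.

```r
# x is a hypothetical example vector, not part of the exercise data
x = c(1, NA, 3)

# is.na() returns one TRUE/FALSE value per element of x
missing_flags = is.na(x)
print(missing_flags)

# sum() treats TRUE as 1 and FALSE as 0, so it counts the TRUE values
print(sum(c(TRUE, TRUE, FALSE)))
```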
|
||
Create a function called `medicaid_eligible` that takes in one argument: a numeric vector called `age`. The function returns a numeric vector with the same length as `age`, in which an element is `0` where the corresponding element of `age` is less than 65, and `1` where it is 65 or higher. (Hint: This is a data recoding problem!) Use case: `medicaid_eligible(c(30, 70))` returns `c(0, 1)`. | ||
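
As a reminder of what data recoding can look like, here is a sketch on an unrelated toy problem. The names `scores` and `passed` and the cutoff of 50 are assumptions made up for this illustration; logical indexing is one possible approach, not necessarily the intended one.

```r
# scores and passed are hypothetical names used only for illustration
scores = c(42, 88, 50)

# start with a vector of 0s that has the same length as the input
passed = rep(0, length(scores))

# overwrite the positions that meet the condition with 1
passed[scores >= 50] = 1
print(passed)
```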
|
||
Let's improve this function a little. What happens if the user runs `medicaid_eligible(c("hello", "there"))`? It still runs, but some kind of weird coercion happens (try it yourself!). A better design would prevent the function from running its main logic when the input is not a numeric data type. | ||
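
To get a feel for coercion and for how a numeric input can be told apart from a character one, the following sketch may help. The variable `mixed` is a made-up example; `is.numeric()` is the base R check used later in this exercise.

```r
# Mixing a number and a string in c() coerces everything to character
mixed = c(30, "70")
print(class(mixed))

# is.numeric() distinguishes the two kinds of input
print(is.numeric(c(30, 70)))
print(is.numeric(c("hello", "there")))
```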
|
||
We need to add the following logical structure: "if `age` is a numeric data type, then run the rest of the function; else, return `NA`". We can do this via **conditionals**, which look like this: | ||
|
||
``` | ||
if (condition_is_TRUE) { | ||
  # do this | ||
} else { | ||
  # do something else | ||
} | ||
``` | ||
|
||
In the context of this problem: | ||
|
||
``` | ||
medicaid_eligible = function(age) { | ||
  if (is.numeric(age)) { | ||
    # do this | ||
  } else { | ||
    return(NA) | ||
  } | ||
} | ||
``` | ||
|
||
Modify `medicaid_eligible()` so that if you give it a numeric input, it returns the same result as before, but if you give it any other data type, it returns `NA`. Test it yourself to see that it works. | ||
|
||
```{r} | ||
``` | ||
|
||
## Part 2: Functions in State Cancer Profiles | ||
|
||
Let's look again at the analysis code we have written for State Cancer Profiles. | ||
|
||
```{r, message=FALSE, warning=FALSE} | ||
library(tidyverse) | ||
library(cancerprof) | ||
``` | ||
|
||
We load in cancer incidence rates and do some data cleaning, for females and males. | ||
|
||
```{r} | ||
#female | ||
female_melanoma_WA = incidence_cancer("WA", "county", "melanoma of the skin", "all races (includes hispanic)", "females", "all ages", "all stages", "latest 5 year average") | ||
female_melanoma_WA = select(female_melanoma_WA, County, `Age Adjusted Incidence Rate`) | ||
names(female_melanoma_WA) = c("County", "Age_adjusted_incidence_rate") | ||
female_melanoma_WA$Age_adjusted_incidence_rate[female_melanoma_WA$Age_adjusted_incidence_rate == "* "] = NA | ||
female_melanoma_WA$Age_adjusted_incidence_rate = as.numeric(female_melanoma_WA$Age_adjusted_incidence_rate) | ||
#male | ||
male_melanoma_WA = incidence_cancer("WA", "county", "melanoma of the skin", "all races (includes hispanic)", "males", "all ages", "all stages", "latest 5 year average") | ||
male_melanoma_WA = select(male_melanoma_WA, County, `Age Adjusted Incidence Rate`) | ||
names(male_melanoma_WA) = c("County", "Age_adjusted_incidence_rate") | ||
male_melanoma_WA$Age_adjusted_incidence_rate[male_melanoma_WA$Age_adjusted_incidence_rate == "* "] = NA | ||
male_melanoma_WA$Age_adjusted_incidence_rate = as.numeric(male_melanoma_WA$Age_adjusted_incidence_rate) | ||
``` | ||
|
||
The code for females and males is nearly identical; the only difference is that one of the input arguments for `incidence_cancer()` is "females" in one case and "males" in the other. There is a lot of redundancy as a result. | ||
|
||
Write a function, `process_incidence_cancer()`, that takes in one argument: a character `sex`. The function returns a dataframe with the columns `County` and `Age_adjusted_incidence_rate` for that particular sex. Use cases: `process_incidence_cancer("females")` and `process_incidence_cancer("males")`. | ||
|
||
```{r} | ||
``` | ||
|
||
Can you improve `process_incidence_cancer()` so that if `sex` is "males" or "females", it runs as it has been, but if `sex` is anything else, it returns `NA`? | ||
|
||
```{r} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,167 @@ | ||
# Iteration Exercises | ||
|
||
```{r, message=FALSE, warning=FALSE} | ||
library(cancerprof) | ||
library(tidyverse) | ||
library(palmerpenguins) | ||
``` | ||
|
||
## Part 1: Iteration warm-up | ||
|
||
Write a function called `num_unique()` that takes any vector input, and then returns a numeric that gives the number of unique elements in the vector. | ||
|
||
Hint: use the functions `unique()`, which gives you the unique elements of a vector, and `length()`. | ||
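
Each of the two hinted functions can be tried on its own first. In this sketch, `letters_seen` and `unique_letters` are made-up names for illustration.

```r
# letters_seen is a hypothetical example vector
letters_seen = c("a", "b", "a", "c", "b")

# unique() keeps each distinct value once, in order of first appearance
unique_letters = unique(letters_seen)
print(unique_letters)

# length() gives the number of elements in a vector
print(length(letters_seen))
```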
|
||
```{r} | ||
``` | ||
|
||
Test that the function works. | ||
|
||
```{r} | ||
``` | ||
|
||
Now, we will use this function to iterate over the `penguins` columns to get the number of unique elements for each column. | ||
|
||
Before using the functional, let's practice writing the first iteration down: | ||
|
||
```{r} | ||
``` | ||
|
||
Then, to do this functionally, we think about: | ||
|
||
- Variable we need to loop through: `penguins` | ||
|
||
- The repeated task as a function: `num_unique()` | ||
|
||
- The looping mechanism, and its output: ? | ||
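
The three ingredients above can be seen together on a toy example. This sketch assumes base R's `sapply()` as one possible looping mechanism; the exercise may intend a different functional, and `toy_df` and `count_missing()` are hypothetical names that are not part of the exercise.

```r
# toy_df and count_missing are hypothetical, for illustration only
toy_df = data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# the repeated task, written as a function of a single column
count_missing = function(column) {
  sum(is.na(column))
}

# sapply() calls count_missing() once per column and collects the results
missing_per_column = sapply(toy_df, count_missing)
print(missing_per_column)
```

A data frame is a list of columns, which is why looping over it visits one column at a time.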
|
||
```{r} | ||
``` | ||
|
||
## Part 2: Repetition in State Cancer Profiles | ||
|
||
In the previous week's exercise, we created a function to process and clean cancer incidence data for a given sex. In this version, a new column is added to designate the sex: | ||
|
||
```{r} | ||
process_incidence_cancer = function(sex) { | ||
  df = incidence_cancer("WA", "county", "melanoma of the skin", "all races (includes hispanic)", sex, "all ages", "all stages", "latest 5 year average") | ||
  df = select(df, County, `Age Adjusted Incidence Rate`) | ||
  names(df) = c("County", "Age_adjusted_incidence_rate") | ||
  df$Age_adjusted_incidence_rate[df$Age_adjusted_incidence_rate == "* "] = NA | ||
  df$Age_adjusted_incidence_rate = as.numeric(df$Age_adjusted_incidence_rate) | ||
  df$Sex = sex #new column! | ||
  return(df) | ||
} | ||
``` | ||
|
||
We are ready to use this function for females and males. Suppose we anticipate that our use of this function will scale up to many different inputs in the future: perhaps we modify the function to take in a state, and we want to analyze all states. As a proof of concept, can we iterate through "females" and "males" via a functional? | ||
|
||
Before using the functional, let's practice writing the first iteration down: | ||
|
||
```{r} | ||
parameters = c("females", "males") | ||
#first iteration: | ||
``` | ||
|
||
Then, to do this functionally, we think about: | ||
|
||
- Variable we need to loop through: `parameters` | ||
|
||
- The repeated task as a function: `process_incidence_cancer()` | ||
|
||
- The looping mechanism, and its output: ? | ||
|
||
Store your result as: `incidence_cancer_list`. It should be a list containing two elements, and each element is a dataframe. | ||
|
||
```{r} | ||
``` | ||
|
||
To continue the analysis, previously we had used `full_join()` and `pivot_longer()` to get the dataframe in the right format for plotting. Here's an alternative way: because this version of the function adds a `Sex` column, if we stack both of these dataframes on top of each other, we have the columns `County`, `Age_adjusted_incidence_rate`, and `Sex` ready for plotting. To do so, we use the function `rbind()` (row bind) to bind two dataframes together by rows: | ||
|
||
```{r} | ||
incidence_cancer_analysis = rbind(incidence_cancer_list[[1]], incidence_cancer_list[[2]]) | ||
``` | ||
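
To see what `rbind()` does on a small scale, here is a sketch with two made-up dataframes. The names `females_toy`, `males_toy`, and `stacked`, the `Rate` column, and all values are invented for illustration and are not real incidence data.

```r
# Two hypothetical dataframes with identical column names (values are invented)
females_toy = data.frame(County = c("A", "B"), Rate = c(10.5, 12.0), Sex = "females")
males_toy = data.frame(County = c("A", "B"), Rate = c(20.1, 18.3), Sex = "males")

# rbind() stacks the rows of the second dataframe under the first
stacked = rbind(females_toy, males_toy)
print(stacked)
```

Because both inputs share the same columns, the result keeps those columns and has as many rows as the two inputs combined.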
|
||
Plot: | ||
|
||
```{r} | ||
ggplot(incidence_cancer_analysis, aes(x = County, y = Age_adjusted_incidence_rate, fill = Sex)) + | ||
  geom_bar(position = "dodge", stat = "identity") + | ||
  theme(axis.text.x = element_text(angle = 90)) + | ||
  labs(title = "Melanoma", y = "Incidence Rate per 100k people") | ||
``` | ||
|
||
## Part 3: Repetition in State Cancer Profiles for states | ||
|
||
Now, let's change up our analysis so that we iterate through all states and get the melanoma cancer incidence rate. | ||
|
||
Write a new function, `process_incidence_cancer_by_state()` so that it takes in the input argument `state` as a character. When calling `incidence_cancer()` within your new function, the first argument, `area`, should use your `state` input argument, and the `sex` argument should have a fixed value of your choice. | ||
|
||
The function returns a dataframe with the following columns: `County`, `Age_adjusted_incidence_rate`, and `State`. You will have to create the `State` column to have the value of `state` near the end of the function. | ||
|
||
```{r} | ||
process_incidence_cancer_by_state = function(state) { | ||
  df = incidence_cancer(state, "county", "melanoma of the skin", "all races (includes hispanic)", "females", "all ages", "all stages", "latest 5 year average") | ||
  df = select(df, County, `Age Adjusted Incidence Rate`) | ||
  names(df) = c("County", "Age_adjusted_incidence_rate") | ||
  df$Age_adjusted_incidence_rate[df$Age_adjusted_incidence_rate == "* "] = NA | ||
  df$Age_adjusted_incidence_rate = as.numeric(df$Age_adjusted_incidence_rate) | ||
  df$State = state | ||
  return(df) | ||
} | ||
``` | ||
|
||
Test that it works on a few states of choice: | ||
|
||
```{r} | ||
``` | ||
|
||
Let's get all the state abbreviations (except Louisiana and Alaska, which have no data for some reason). Here's a dataset: | ||
|
||
```{r} | ||
states = read_delim("classroom_data/states.txt", delim = "\t", col_names = c("fullname", "shorthand")) | ||
head(states) | ||
``` | ||
|
||
Now, let's run `process_incidence_cancer_by_state()` for all states: use a functional to apply `process_incidence_cancer_by_state()` on each element of `states$shorthand`, and store it as `results`. | ||
|
||
```{r} | ||
``` | ||
|
||
Great! Now we have a large list, in which each element is a dataframe belonging to a state. To consolidate all of these dataframes, we could start with `rbind()`: `rbind(results[[1]], results[[2]])`. But we would then need to continue with `rbind(rbind(results[[1]], results[[2]]), results[[3]])`, which becomes unwieldy very quickly. We wish `rbind()` could combine a whole list of dataframes more elegantly... | ||
|
||
A solution to this is to use the `reduce()` function: it takes a list or vector of length *n* and produces a single value by calling a function with a pair of values at a time: | ||
|
||
|
||
`reduce(c(1, 2, 3, 4), f)` is equivalent to `f(f(f(1, 2), 3), 4)`. | ||
|
||
We can think of `reduce()` as a useful way to generalise a function that works with two inputs to work with any number of inputs. | ||
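
The nesting order can be made visible with a small sketch. To keep it runnable without extra packages, it uses base R's `Reduce()` as a stand-in; note that `Reduce()` takes the function first, whereas `reduce()` from purrr takes the list or vector first. The helper `show_call()` is hypothetical and only records how it was called.

```r
# show_call is a hypothetical two-input function that records its own call
show_call = function(a, b) {
  paste0("f(", a, ", ", b, ")")
}

# Base R's Reduce() is used here as a stand-in for reduce()
nesting = Reduce(show_call, c(1, 2, 3, 4))
print(nesting)

# The same mechanism turns two-input addition into a sum over the whole vector
total = Reduce(`+`, c(1, 2, 3, 4))
print(total)
```

The first result spells out the pairwise calls from left to right; the second shows an ordinary two-input function being applied across all four values.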
|
||
Try `reduce()` on `results` with `rbind`. | ||
|
||
```{r} | ||
``` | ||
|
||
Follow up with analysis of your choice! | ||
|
||
```{r} | ||
``` |