Commit

Apply Jerome's suggestions
Co-authored-by: fcjerome <129063142+fcjerome@users.noreply.github.com>
cforgaci and fcjerome authored Jan 22, 2024
1 parent f4155bc commit 1b9b03a
Showing 2 changed files with 20 additions and 18 deletions.
26 changes: 9 additions & 17 deletions episodes/03-explore-data.Rmd
@@ -90,7 +90,7 @@ There are multiple ways to explore a data set. Here are just a few examples:


```{r}
-head(gapminder) # see first 5 rows of the data set
+head(gapminder) # see first 6 rows of the data set
summary(gapminder) # gives basic statistical information about each column. Information format differs by data type.
@@ -150,7 +150,7 @@ First we define data set, then - with the use of pipe we pass it on to the `sele

## Filter

-We already know how to select only the needed columns. But now, we also want to filter the data set via certain conditions with `filter()` function. Instead of doing it in separate steps, we can do it all together.
+We already know how to select only the needed columns. But now, we also want to filter the rows of our data set via certain conditions with `filter()` function. Instead of doing it in separate steps, we can do it all together.

In the `gapminder` data set, we want to see the results from outside of Europe for the 21st century.
```{r}
@@ -160,15 +160,6 @@ year_country_gdp_euro <- gapminder %>%
head(year_country_gdp_euro)
```
-Let's now find all the observations from Eurasia:
-
-```{r}
-year_country_gdp_eurasia <- gapminder %>%
-filter(continent == "Europe" | continent == "Asia") %>% # I operator (OR) - one of the conditions must be met
-select(year, country, gdpPercap)
-head(year_country_gdp_eurasia)
-```
-
### Exercise 1

@@ -183,18 +174,17 @@ Write a single command (which can span multiple lines and includes pipes) that w


```{r ex5, class.source="bg-info"}
-year_country_lifeExp_Africa <- gapminder %>%
-filter(continent=="Africa" ) %>%
-select(year,country,lifeExp)
+year_country_gdp_eurasia <- gapminder %>%
+filter(continent == "Europe" | continent == "Asia") %>% # | operator (OR) - one of the conditions must be met
+select(year, country, gdpPercap)
nrow(year_country_lifeExp_Africa)
```



## Group and summarize
-So far, we have created a data frame for one of the continents represented in the `gapminder` data set. But often instead of doing that, we would like to know statistics about all of the continents, presented by group.
+So far, we have provided summary statistics on the whole dataset, selected columns, and filtered the observations. But often instead of doing that, we would like to know statistics about all of the continents, presented by group.

```{r dplyr-group}
gapminder %>% # select the dataset
@@ -273,7 +263,9 @@ head(gapminder_gdp)

::::::::::::::::::::::::::::::::::::: keypoints

--
+- We can use the `select()` and `filter()` functions to select certain columns in a data frame and to subset it based on specific conditions.
+- With `mutate()`, we can create new columns in a data frame with values based on existing columns.
+- By combining `group_by()` and `summarize()` in a pipe (`%>%`) chain, we can generate summary statistics for each group in a data frame.

::::::::::::::::::::::::::::::::::::::::::::::::
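
The key points added in this file can be sketched in a few lines. This is a minimal illustration, assuming the `dplyr` package is installed; the `toy` data frame and all of its values are invented stand-ins and are not taken from the `gapminder` data set.

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder (values invented for illustration)
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = c("Europe", "Asia", "Africa", "Europe"),
  year      = c(2007, 2007, 2007, 2007),
  gdpPercap = c(100, 200, 300, 400)
)

# filter() keeps rows where either condition holds (| is OR),
# then select() keeps only the named columns
eurasia <- toy %>%
  filter(continent == "Europe" | continent == "Asia") %>%
  select(year, country, gdpPercap)

# group_by() + summarize() produce one summary row per continent
avg_by_continent <- toy %>%
  group_by(continent) %>%
  summarize(avg_gdpPercap = mean(gdpPercap))
```

Here `eurasia` keeps the three European and Asian rows, and `avg_by_continent` has one row for each of the three continents in the toy data.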

12 changes: 11 additions & 1 deletion episodes/04-intro-to-visualisation.Rmd
@@ -145,7 +145,7 @@ gapminder %>%
```

Maybe we don't need that much information about the life expectancy. We
-only want to know if it's below or above average.
+only want to know if it's below or above average. We will make use of the `if_else()` function inside `mutate()` to create a new column `lifeExpCat` with the value `high` if life expectancy is above average and `low` otherwise. Note the usage of the `if_else()` function: `if_else(<condition>, <value if TRUE>, <value if FALSE>)`.

```{r ggplot-colors-discrete}
p <- # this time let's save the plot in an object
@@ -226,3 +226,13 @@ gapminder_amr_2007 <- gapminder %>%
write.csv(gapminder_amr_2007, here('data_output', 'gapminder_americas_2007.csv'), row.names=FALSE)
```

+::::::::::::::::::::::::::::::::::::: keypoints
+
+- With `ggplot2`, we use the `+` operator to combine plot layers and incrementally build a more complex plot.
+- In the aesthetics (`aes()`), we can assign variables to the x and y axes and use the `fill` argument for colouring surfaces.
+- With `scale_fill_viridis_c()` and `scale_fill_manual()` we can assign new colours to our plot.
+- To open the help documentation for a function, we run the name of the function preceded by the `?` sign.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
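
The `if_else()` pattern described in the change to this file can be sketched as follows. This is only an illustration, assuming the `dplyr` package is installed; the `toy` data frame and its values are invented and are not part of the lesson data.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  country = c("A", "B", "C"),
  lifeExp = c(50, 70, 90)
)

# if_else(<condition>, <value if TRUE>, <value if FALSE>) inside mutate():
# "high" when life expectancy is above the average, "low" otherwise
toy_cat <- toy %>%
  mutate(lifeExpCat = if_else(lifeExp > mean(lifeExp), "high", "low"))

toy_cat$lifeExpCat  # "low" "low" "high"
```

The average in the toy data is 70, so a value exactly equal to the average falls into the `low` category because the condition uses a strict `>`.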
