Commit

don't forget to render!
spressi committed Sep 29, 2024
1 parent 4876663 commit e9a1430
Showing 3 changed files with 40 additions and 70 deletions.
96 changes: 33 additions & 63 deletions docs/1.2_Data_Wrangling.html
@@ -8,9 +8,9 @@
<link href="site_libs/quarto-html/light-border.css" rel="stylesheet">
<link href="site_libs/quarto-html/quarto-html.min.css" rel="stylesheet" data-mode="light">
<link href="site_libs/quarto-html/quarto-syntax-highlighting.css" rel="stylesheet" id="quarto-text-highlighting-styles"><meta charset="utf-8">
<meta name="generator" content="quarto-1.5.57">
<meta name="generator" content="quarto-1.4.555">

<title>Precision Workshop 1.2 Data Wrangling</title>
<title>Precision Workshop - 1.2 Data Wrangling</title>
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
@@ -41,7 +41,7 @@
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
@@ -69,7 +69,7 @@
code span.at { color: #657422; } /* Attribute */
code span.bn { color: #ad0000; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #003b4f; font-weight: bold; } /* ControlFlow */
code span.cf { color: #003b4f; } /* ControlFlow */
code span.ch { color: #20794d; } /* Char */
code span.cn { color: #8f5902; } /* Constant */
code span.co { color: #5e5e5e; } /* Comment */
@@ -83,7 +83,7 @@
code span.fu { color: #4758ab; } /* Function */
code span.im { color: #00769e; } /* Import */
code span.in { color: #5e5e5e; } /* Information */
code span.kw { color: #003b4f; font-weight: bold; } /* Keyword */
code span.kw { color: #003b4f; } /* Keyword */
code span.op { color: #5e5e5e; } /* Operator */
code span.ot { color: #003b4f; } /* Other */
code span.pp { color: #ad0000; } /* Preprocessor */
@@ -433,7 +433,7 @@ <h2>Accessing Variables/Columns</h2>
<p>When wrangling your data in R, you often want to access/use different columns, e.g.&nbsp;to calculate new ones. There are a number of ways you can do that:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a></a><span class="co"># create a small data set for this example:</span></span>
<span id="cb1-2"><a></a>testdata <span class="ot">&lt;-</span> <span class="fu">data.frame</span>(<span class="at">a =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>), <span class="co"># c() creates a (column)vector!</span></span>
<span id="cb1-2"><a></a>testdata <span class="ot">&lt;-</span> <span class="fu">data.frame</span>(<span class="at">a =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>), <span class="co"># c() creates a vector!</span></span>
<span id="cb1-3"><a></a> <span class="at">b =</span> <span class="fu">c</span>(<span class="st">"a"</span>, <span class="st">"b"</span>, <span class="st">"c"</span>),</span>
<span id="cb1-4"><a></a> <span class="at">c =</span> <span class="fu">c</span>(<span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>),</span>
<span id="cb1-5"><a></a> <span class="at">d =</span> <span class="fu">c</span>(<span class="dv">7</span>, <span class="dv">8</span>, <span class="dv">9</span>),</span>
@@ -1049,16 +1049,16 @@ <h2>Create New Variables</h2>
<pre><code># A tibble: 10 × 2
year decade
&lt;dbl&gt; &lt;dbl&gt;
1 1889 1880
2 1894 1890
3 1909 1900
4 1910 1910
5 1929 1920
6 1932 1930
7 1945 1940
8 1952 1950
9 1966 1960
10 1977 1970</code></pre>
1 1885 1880
2 1922 1920
3 1925 1920
4 1929 1920
5 1944 1940
6 1945 1940
7 1983 1980
8 1985 1980
9 2000 2000
10 2011 2010</code></pre>
</div>
</div>
</div>
@@ -1162,8 +1162,6 @@ <h2>Grouping and Summarizing 3</h2>
It is good practice to always `ungroup()` your data once you finished the operations you needed the grouping for!
::: {.cell}
```{.r .cell-code code-line-numbers="8"}
@@ -1193,8 +1191,6 @@ <h2>Grouping and Summarizing 3</h2>
:::
:::
-->
</section>
<section id="grouping-and-summarizing-4" class="slide level2">
@@ -1356,7 +1352,8 @@ <h2>Tidy Data</h2>
</tr>
</tbody>
</table>
</div></div>
</div>
</div>
</div>
<div class="fragment">
<p>Wide format implements a sparser representation of the data but less tidy!<br>
@@ -1443,8 +1440,6 @@ <h2>Tidy Data 2</h2>
:::columns
::: {.column width="40%"}
::: {.cell}
::: {.cell-output .cell-output-stdout}
@@ -1470,13 +1465,9 @@ <h2>Tidy Data 2</h2>
:::
:::
:::
::: {.column width="60%"}
::: {.cell}
::: {.cell-output .cell-output-stdout}
@@ -1496,16 +1487,12 @@ <h2>Tidy Data 2</h2>
:::
:::
:::
:::
:::columns
::: {.column width="35%"}
::: {.cell}
::: {.cell-output .cell-output-stdout}
@@ -1525,13 +1512,9 @@ <h2>Tidy Data 2</h2>
:::
:::
:::
::: {.column width="65%"}
::: {.cell}
::: {.cell-output .cell-output-stdout}
@@ -1548,8 +1531,6 @@ <h2>Tidy Data 2</h2>
:::
:::
:::
:::
-->
@@ -1987,9 +1968,9 @@ <h1>Thanks!</h1>
// The "normal" size of the presentation, aspect ratio will be preserved
// when the presentation is scaled to fit different resolutions. Can be
// specified using percentage units.
width: 1050,
width: 1280,

height: 700,
height: 720,

// Factor of the display size that should remain empty around the content
margin: 0.1,
@@ -2129,7 +2110,18 @@ <h1>Thanks!</h1>
}
return false;
}
const onCopySuccess = function(e) {
const clipboard = new window.ClipboardJS('.code-copy-button', {
text: function(trigger) {
const codeEl = trigger.previousElementSibling.cloneNode(true);
for (const childEl of codeEl.children) {
if (isCodeAnnotation(childEl)) {
childEl.remove();
}
}
return codeEl.innerText;
}
});
clipboard.on('success', function(e) {
// button target
const button = e.trigger;
// don't keep focus
@@ -2161,37 +2153,15 @@ <h1>Thanks!</h1>
}, 1000);
// clear code selection
e.clearSelection();
}
const getTextToCopy = function(trigger) {
const codeEl = trigger.previousElementSibling.cloneNode(true);
for (const childEl of codeEl.children) {
if (isCodeAnnotation(childEl)) {
childEl.remove();
}
}
return codeEl.innerText;
}
const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', {
text: getTextToCopy
});
clipboard.on('success', onCopySuccess);
if (window.document.getElementById('quarto-embedded-source-code-modal')) {
// For code content inside modals, clipBoardJS needs to be initialized with a container option
// TODO: Check when it could be a function (https://github.com/zenorocha/clipboard.js/issues/860)
const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', {
text: getTextToCopy,
container: window.document.getElementById('quarto-embedded-source-code-modal')
});
clipboardModal.on('success', onCopySuccess);
}
var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
var mailtoRegex = new RegExp(/^mailto:/);
var filterRegex = new RegExp('/' + window.location.host + '/');
var isInternal = (href) => {
return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
}
// Inspect non-navigation links and adorn them if external
var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)');
var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool)');
for (var i=0; i<links.length; i++) {
const link = links[i];
if (!isInternal(link.href)) {
10 changes: 5 additions & 5 deletions docs/search.json
@@ -331,7 +331,7 @@
"href": "1.1_R_Intro.html#rstudio-panes",
"title": "1.1 Intro to R",
"section": "RStudio Panes",
"text": "RStudio Panes\n\n\n\nScript pane: view, edit, & save your code\nConsole: here the commands are run and rudimentary output may be provided\nEnvironment: which variables/data are available\nFiles, plots, help etc.\n\n\n\n\n\n\n\nRStudio Interface\n\n\n\n\nConsole vs. Script (Rmarkdown later)",
"text": "RStudio Panes\n\n\n\nScript pane: view, edit, & save your code\nConsole: here the commands are run and rudimentary output may be provided\nEnvironment: which variables/data are available\nFiles, plots, help etc.\n\n\n\n\n\n\n\nRStudio Interface\n\n\n\n\n\nConsole vs. Script (Rmarkdown later)",
"crumbs": [
"1.1 Intro to R"
]
@@ -351,7 +351,7 @@
"href": "1.1_R_Intro.html#saving-the-results-as-a-variableobject",
"title": "1.1 Intro to R",
"section": "Saving the Results as a Variable/Object",
"text": "Saving the Results as a Variable/Object\n\na &lt;- 100 + 1\n\nmulti &lt;- 2*3\n\nSqrtOfNine &lt;- sqrt(9)\n\nword &lt;- \"Hello\"\n\n\n\n\n&lt;- is used to assign values to variables (= is also possible, but discouraged in R)\na, multi etc. are the variable names (some naming rules, e.g., no whitespace, must no start with an number, many special characters not allowed)\n\nYou can find those now in your Environment! (top right panel)\nNo feedback in the console for saving variables (2*3 outputs 6, but multi &lt;- 2*3 doesn’t)\n\nvariables can contain basically anything (words, numbers, entire tables of data …)\nthe variables contain the calculated value (i.e. 101) and not the calculation/formula (100+1)\n\n\n\nType first command in console, what happens?\nWhy don’t we see anything in the console?\nWhat happens if we type in a in the console?\nIs there anything else that you find interesting?\nWhat is sqrt()?",
"text": "Saving the Results as a Variable/Object\n\na &lt;- 100 + 1\n\nmulti &lt;- 2*3\n\nSqrtOfNine &lt;- sqrt(9)\n\nword &lt;- \"Hello\"\n\n\n\n\n&lt;- is used to assign values to variables (= is also possible, but discouraged in R)\na, multi etc. are the variable names (some naming rules, e.g., no whitespace, must not start with an number, many special characters not allowed)\n\nYou can find those now in your Environment! (top right panel)\nNo feedback in the console for saving variables (2*3 outputs 6, but multi &lt;- 2*3 doesn’t)\n\nvariables can contain basically anything (words, numbers, entire tables of data …)\nthe variables contain the calculated value (i.e. 101) and not the calculation/formula (100+1)\n\n\n\nType first command in console, what happens?\nWhy don’t we see anything in the console?\nWhat happens if we type in a in the console?\nIs there anything else that you find interesting?\nWhat is sqrt()?",
"crumbs": [
"1.1 Intro to R"
]
@@ -651,7 +651,7 @@
"href": "1.2_Data_Wrangling.html#accessing-variablescolumns",
"title": "1.2 Data Wrangling",
"section": "Accessing Variables/Columns",
"text": "Accessing Variables/Columns\nWhen wrangling your data in R, you often want to access/use different columns, e.g. to calculate new ones. There are a number of ways you can do that:\n\n# create a small data set for this example:\ntestdata &lt;- data.frame(a = c(1, 2, 3), # c() creates a (column)vector!\n b = c(\"a\", \"b\", \"c\"),\n c = c(4, 5, 6),\n d = c(7, 8, 9),\n e = c(10, 11, 12))\n\nprint(testdata)\n\n a b c d e\n1 1 a 4 7 10\n2 2 b 5 8 11\n3 3 c 6 9 12\n\nstr(testdata)\n\n'data.frame': 3 obs. of 5 variables:\n $ a: num 1 2 3\n $ b: chr \"a\" \"b\" \"c\"\n $ c: num 4 5 6\n $ d: num 7 8 9\n $ e: num 10 11 12\n\n\n\ndata.frame() = function to create a data.frame, which is what holds a data set! (tibbles..)\nc() = function to make a vector. A vector is just like one single column of a data frame: It can hold several values, but all of the same type.",
"text": "Accessing Variables/Columns\nWhen wrangling your data in R, you often want to access/use different columns, e.g. to calculate new ones. There are a number of ways you can do that:\n\n# create a small data set for this example:\ntestdata &lt;- data.frame(a = c(1, 2, 3), # c() creates a vector!\n b = c(\"a\", \"b\", \"c\"),\n c = c(4, 5, 6),\n d = c(7, 8, 9),\n e = c(10, 11, 12))\n\nprint(testdata)\n\n a b c d e\n1 1 a 4 7 10\n2 2 b 5 8 11\n3 3 c 6 9 12\n\nstr(testdata)\n\n'data.frame': 3 obs. of 5 variables:\n $ a: num 1 2 3\n $ b: chr \"a\" \"b\" \"c\"\n $ c: num 4 5 6\n $ d: num 7 8 9\n $ e: num 10 11 12\n\n\n\ndata.frame() = function to create a data.frame, which is what holds a data set! (tibbles..)\nc() = function to make a vector. A vector is just like one single column of a data frame: It can hold several values, but all of the same type.",
"crumbs": [
"1.2 Data Wrangling"
]
@@ -821,7 +821,7 @@
"href": "1.2_Data_Wrangling.html#create-new-variables",
"title": "1.2 Data Wrangling",
"section": "Create New Variables",
"text": "Create New Variables\nIf we want to create variables that do not exist yet (i.e. by calculating values, combining other variables, etc.), we can use mutate()!\n\nAdd a variable called “country” that contains the value “USA” for all observations\n\n\n\nbaby_where &lt;- babynames %&gt;% mutate(country = \"USA\")\n\n\n\nBut mutate is much more powerful and can create variables that differ per observation, depending on other values in the tibble/data frame:\n\nCreate a variable that denotes the decade a baby was born:\n\n\n\n\n#we can only use floor to round down to full numbers =&gt; divide year by 10, floor it, and then multiply by 10 again\nbaby_decades &lt;- babynames %&gt;% mutate(decade = floor(year/10) *10) #round(year, -1) works but not floor(year, -1) :(\n\n\n\n# A tibble: 10 × 2\n year decade\n &lt;dbl&gt; &lt;dbl&gt;\n 1 1889 1880\n 2 1894 1890\n 3 1909 1900\n 4 1910 1910\n 5 1929 1920\n 6 1932 1930\n 7 1945 1940\n 8 1952 1950\n 9 1966 1960\n10 1977 1970",
"text": "Create New Variables\nIf we want to create variables that do not exist yet (i.e. by calculating values, combining other variables, etc.), we can use mutate()!\n\nAdd a variable called “country” that contains the value “USA” for all observations\n\n\n\nbaby_where &lt;- babynames %&gt;% mutate(country = \"USA\")\n\n\n\nBut mutate is much more powerful and can create variables that differ per observation, depending on other values in the tibble/data frame:\n\nCreate a variable that denotes the decade a baby was born:\n\n\n\n\n#we can only use floor to round down to full numbers =&gt; divide year by 10, floor it, and then multiply by 10 again\nbaby_decades &lt;- babynames %&gt;% mutate(decade = floor(year/10) *10) #round(year, -1) works but not floor(year, -1) :(\n\n\n\n# A tibble: 10 × 2\n year decade\n &lt;dbl&gt; &lt;dbl&gt;\n 1 1885 1880\n 2 1922 1920\n 3 1925 1920\n 4 1929 1920\n 5 1944 1940\n 6 1945 1940\n 7 1983 1980\n 8 1985 1980\n 9 2000 2000\n10 2011 2010",
"crumbs": [
"1.2 Data Wrangling"
]
@@ -901,7 +901,7 @@
"href": "1.2_Data_Wrangling.html#tidy-data",
"title": "1.2 Data Wrangling",
"section": "Tidy Data",
"text": "Tidy Data\nTidy data: Data that is easily processed by tidyverse functions (also for visualizations and statistical analyses).\nThree principles:\n\nEach variable has its own column.\nEach observation has its own row.\nEach value has its own cell.\n\n\nWide vs. long format data?\n\n\nWide format: Each participant/animal has one row;\nrepeated observations are in several columns\n\n\n\nID\nTime_1\nTime_2\n\n\n\n\na1\n230\n310\n\n\na2\n195\n220\n\n\na3\n245\n290\n\n\n\n\nLong format: Each observation has its own row;\nthere are (usually) several rows per participant\n\n\n\nID\nTime\nValue\n\n\n\n\na1\n1\n230\n\n\na1\n2\n310\n\n\na2\n1\n195\n\n\na3\n2\n220\n\n\na3\n1\n245\n\n\na3\n2\n290\n\n\n\n\n\n\nWide format implements a sparser representation of the data but less tidy!\nIf you want to convert Time from milliseconds into seconds, what do you have to do in both formats?\n\nData often does not come in this format but is rather messy! That’s why we wrangle.\nTidy data is in between wide and long (you can always go longer! :D)",
"text": "Tidy Data\nTidy data: Data that is easily processed by tidyverse functions (also for visualizations and statistical analyses).\nThree principles:\n\nEach variable has its own column.\nEach observation has its own row.\nEach value has its own cell.\n\n\nWide vs. long format data?\n\n\nWide format: Each participant/animal has one row;\nrepeated observations are in several columns\n\n\n\nID\nTime_1\nTime_2\n\n\n\n\na1\n230\n310\n\n\na2\n195\n220\n\n\na3\n245\n290\n\n\n\n\nLong format: Each observation has its own row;\nthere are (usually) several rows per participant\n\n\n\nID\nTime\nValue\n\n\n\n\na1\n1\n230\n\n\na1\n2\n310\n\n\na2\n1\n195\n\n\na3\n2\n220\n\n\na3\n1\n245\n\n\na3\n2\n290\n\n\n\n\n\n\n\nWide format implements a sparser representation of the data but less tidy!\nIf you want to convert Time from milliseconds into seconds, what do you have to do in both formats?\n\nData often does not come in this format but is rather messy! That’s why we wrangle.\nTidy data is in between wide and long (you can always go longer! :D)",
"crumbs": [
"1.2 Data Wrangling"
]
0 comments on commit e9a1430