tidying up a bit further #44

Open. Wants to merge 1 commit into master.
163 changes: 40 additions & 123 deletions content/post/drafts/2020-06-03-data-science-publishing/index.Rmd
@@ -1,6 +1,6 @@
---
title: Data Science as an Entryway to Open Publishing
author: Julia Lowndes, Nicholas Tierney
author: Julia Lowndes and Nicholas Tierney
date: '2020-06-03'
slug: data-science-publishing
draft: true
@@ -24,164 +24,81 @@ knitr::opts_chunk$set(
)
```

Last Week Julia Lowndes and I presented on a talk called:
*In May we presented a virtual fireside chat at the [Open Publishing Fest](https://openpublishingfest.org/) called "Data Science as an Entryway to Open Publishing". The premise was that the open source R programming language is a powerhouse for data analysis and statistics -- and it is also fueling open publishing through RMarkdown and a large, engaged, and innovative community. We briefly showed community-created examples of tutorials, blogs, websites, manuscripts, books, etc., and discussed how they are an entryway to open science, preprints, and open scientific publishing. This post shares some reflections from the experience and a summary of our [slides](https://zenodo.org/record/3873698#.XtbQo8Z7nOQ).*

"Data Science as an Entryway to Open Publishing", you can see the slides [here](https://zenodo.org/record/3873698#.XtbQo8Z7nOQ).
------------------------------------------------------------------------

Abstract:
One of the (many) things that get us excited about R is that the same workflow you use for data analysis -- one that is rooted in reproducibility -- empowers you to make your work available to the world... in ways you never imagined.

> The open source R programming language is a powerhouse for data analysis and statistics – and it also is fueling open publishing through RMarkdown and a large, engaged, and innovative community. We will show community-created examples of tutorials, blogs, websites, manuscripts, books, etc, and discuss how this is an entryway to open science, preprints, and open scientific publishing. We welcome other contributed examples to showcase how R can streamline open publishing, as well as examples showcasing other programming languages.
As Julia said in our presentation:

The same workflow you use for data analysis
– rooted in reproducibility – empowers you make your work available to the world
> I came to R for the data analysis, and was blown away by the publishing

...in ways you never imagined

> I came to R for the data analysis, and was blown away by the publishing
-- Julia Lowndes
We then introduced RMarkdown, framed for scientific publishing and so much more.

# Using RMarkdown for scientific publishing: Fueling reproducibility in data science

## Rmarkdown

RMarkdown powerfully combines executable R code with simple text formatting and for efficient, automatable, reproducible research

Simple text formatting

+

R Code
## RMarkdown

=
RMarkdown powerfully combines executable R code with simple text formatting for efficient, automatable, reproducible research. Because the code and the text live together, analyses and figures are in the same place as your reporting document. This saves time as you iterate and enables good practices for reproducibility and versioning.

Analyses and figures are in the same place as your reporting document:
saves time as you iterate!
## RMarkdown's familiar outputs for science: Word documents and PDFs

Enables good practices for reproducibility & versioning
RMarkdown renders to Word and PDF --- imagine never copy-pasting a graph into your report again!!! RMarkdown can also manage citations, cross-referencing figures and section headers.
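
As a rough sketch of what this can look like in practice, the YAML header below asks RMarkdown for both a Word and a PDF version of the same file and points it at a bibliography. The title and the file name `references.bib` are placeholders made up for illustration, and PDF output assumes a LaTeX installation is available.

```yaml
---
title: "My report"            # placeholder title
output:
  word_document: default      # renders a .docx file
  pdf_document: default       # renders a .pdf file (needs LaTeX)
bibliography: references.bib  # hypothetical BibTeX file; cite entries with [@key]
---
```

With more than one format listed, knitting renders the first one; calling `rmarkdown::render()` with `output_format = "all"` renders every format in the header. Cross-referencing figures typically goes through the bookdown output formats, such as `bookdown::pdf_document2`.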

## RMarkdown’s familiar outputs for science: Word documents and PDFs
But wait, you can also use RMarkdown beyond your wildest dreams.

Rmarkdown renders to:

- Word
- PDF

Imagine never copy-pasting a graph into your report again!!!!
RMarkdown can also manage citations, cross- referencing figures and section headers.

# Using RMarkdown beyond your wildest dreams: Reimagining sharing and publishing online
# Using RMarkdown beyond your wildest dreams: Reimagining sharing and publishing online

## RMarkdown: RMarkdown creates HTML files that can be shared openly on the web

Rendering rmarkdown to HTML:

> We can store and distribute html files on GitHub, which also offers display options for publishing. Let's look at some real-world examples from science...

> Suddenly you can share a URL rather than attaching a file!
And that same URL will update rather than re-attaching a new version of the file!


## Single-page html

RMarkdown html files for open publishing; URL will display most recent version

Examples from the [Ocean Health Index](https://ohi-science.org/)

Many display options; floating table of contents, show/hide code

Then we can think about organization & discoverability: How to organize multiple htmls? And how do we find them?

ohi-science.org/ohiprep_v2019/globalprep/prs_slr/v2019/slr_layer_prep_v2.html
Learn: rmarkdown.rstudio.com

## Simple Websites
We can render RMarkdown to HTML, and we can store and distribute HTML files on GitHub, which also offers display options for publishing. Suddenly you can share a URL rather than attaching a file, and that same URL updates in place, so there is no need to re-attach a new version of the file!

Combine RMarkdown files as a website with a navigation bar between pages, requires only GitHub
We shared some real-world examples from science, including examples from the [Ocean Health Index](https://ohi-science.org/), [Alison Hill's academic website](https://alison.rbind.io/post%202017-06-12-up-and-running-with-blogdown), [Ben Marwick's PhD thesis template formatted for the University of Washington](https://github.com/benmarwick/huskydown), and [Allison Horst's missing explorer lesson](https://allisonhorst.shinyapps.io/missingexplorer).

These examples started with **single-page HTMLs**, which can display a floating table of contents and let readers toggle between showing and hiding code. We also showed **simple websites** that combine RMarkdown files into a site with a navigation bar between pages and require only GitHub to display.
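
For a single-page HTML, a minimal sketch of a header that turns on these display options might look like the following. The title is a placeholder; `toc_float` and `code_folding` are options of `html_document`.

```yaml
---
title: "Layer prep"        # placeholder title
output:
  html_document:
    toc: true              # build a table of contents from the headings
    toc_float: true        # keep it floating beside the text while scrolling
    code_folding: hide     # hide code by default, with buttons to show it
---
```

Setting `code_folding: show` instead keeps the code visible by default while still offering the toggle.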

Useful for organizing, e.g. linking out to additional single-page htmls
We also discussed **blogdown**, which creates powerful websites with more complexity and blogging capabilities. This has been so important for creating blogs and tutorials to share code, discuss, and learn together. Why this is important is nicely captured in this quote from the 2020 RStudio conference:

You can also create templates and populate them automatically
> "If you want to learn to write, you read a lot, if you want to play music, you listen a lot. It's hard to do this with data analysis." - [Hilary Parker & Roger Peng, RStudio::conf(2020) keynote](<http://nssdeviations.com/100-live-from-rstudio-conf-2020>)

ohi-science.org/ohi-global
ohi-science.org/esw
Learn: jules32.github.io/rmarkdown-website-tutorial
In addition to websites, RMarkdown can create **bookdown books** that organize and navigate html files as e-books. This is really powerful for organizing reports and documents.

## Blogdown websites
We can also create **simple slides** from a single RMarkdown file -- imagine making a presentation and then being able to re-create presentations with updated data! Further, we can create **xaringan slides** that enable you to incorporate powerful styling options from within R (without requiring knowledge of JavaScript, CSS, etc).
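
As an illustration only, a xaringan slide deck can be styled from its YAML header by picking from the themes that ship with the package. The title below is a placeholder, and the theme choice is just one example.

```yaml
---
title: "Quarterly update"   # placeholder title
output:
  xaringan::moon_reader:
    css: [default, metropolis, metropolis-fonts]  # built-in xaringan themes
---
```

Slides in the body of the file are then separated by a line containing only `---`.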

Create powerful websites with more complexity and blogging capabilities; requires more setup & deployment from a server
**learnr** provides the power of interactive tutorials from a friendly website interface. It is really exciting to think about reimagining teaching, and how to blend lectures and hands-on coding for learners of all levels.

“If you want to learn to write, you read a lot, if you want to play music, you listen a lot. It’s hard to do this with data analysis.” - Hilary Parker & Roger Peng, RStudio::conf(2020) keynote

So we write blogs and tutorials to share code, discuss, and learn together.

Power to organize, tag, search, navigage, etc.

← Academic theme templates!

alison.rbind.io
Learn: alison.rbind.io/post 2017-06-12-up-and-running-with-blogdown

## Bookdown books

Organize and navigate html files as e-books

Really powerful for organizing reports and documents.

I wish I could have written my PhD thesis is Bookdown
Eg: github.com/ benmarwick/huskydown

r4ds.had.co.nz
Learn: bookdown.org/yihui/bookdown

## Simple slides: create slides in a single rmarkdown file

Imagine re-creating presentations with updated data.

Text-based slide creation can be a powerful flow to think and outline.

Share presentations – and with a human-readable url!

rstudio.com/slides/rstudio-pbc
Learn: rmarkdown.rstudio.com/lesson-11

## Xaringan Slides: Create slides in a single RMarkdown file
# Discussion time

Incorporate powerful styling options from within R (without requiring knowledge of JavaScript, CSS, etc)
After going through the slides, we discussed with attendees the benefits of an HTML-focussed workflow. One benefit is that, with no page breaks to work around, all your figures and tables can usually be placed right where they are mentioned. Although this might seem like a small detail, avoiding page breaks actually saves you a huge amount of time and hassle. Adam Sparks pointed out that another benefit of HTML files is how easy they are to share internally in a team: they do not have to be published online, and can be opened in any browser. The fact that they often look really snappy and polished, and can include interactive elements like maps, is also a huge selling point.

slides.yihui.org/xaringan
arm.rbind.io/slides/xaringan
Learn: above, and bookdown.org/yihui/rmarkdown/xaringan
We also discussed how `pagedown` provides a fresh approach to generating PDFs on the web, and were lucky to have Adam Hyde, the creator of [paged.js]() (which powers `pagedown`), at our chat.

## Learnr tutorials: Interactive tutorials from a friendly website interface
We also discussed some alternative formats for publishing, such as JATS, a journal publishing standard for XML. We hadn't heard of it (and apparently that was a good thing).

Reimagine teaching and how to blend lectures and hands-on coding for learners of all levels
Alison Hill made an excellent point:

allisonhorst.shinyapps.io/missingexplorer
> As a former scientist, I felt woefully ill-equipped when first working with HTML output. I know you all have thought a lot about this- how can we increase HTML comfort levels and fluency for new/early scientists?

Learn: education.rstudio.com/blog/2020/05/learnr-for-remote
In response to this we discussed some of the downsides of HTML, namely the fact that CSS is usually required to answer questions like:

- "How do I change the font size?"
- "How do I change the font colour?", or
- "How do I create two columns of text?"

# Discussion time
While CSS is ubiquitous on the web, which can make it easy to change the appearance of text, it is one more thing for learners to pick up, and something that can tip the balance and turn people away and back to systems they know.

What examples or questions do you have?
Some suggestions on addressing this were:

Other discussion topics
- Showing students existing RMarkdown HTML templates
- Providing simple CSS templates within an RMarkdown file for people to use
- Building better tools that guide people to create their own CSS
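
To make the second suggestion concrete, one possible sketch is a template header that already points at a small stylesheet, so learners only need to edit values in that file. The file name `styles.css` is hypothetical; `css` is an option of `html_document`.

```yaml
---
title: "Starter template"  # placeholder title
output:
  html_document:
    css: styles.css        # hypothetical stylesheet shipped with the template
---
```

The stylesheet itself could then hold a handful of commented rules for font size, font colour, and columns that learners adjust rather than write from scratch.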

How does RMarkdown relate to/streamline the academic publishing process?
Analog: rOpenSci software review process
Friendly entryways to open science & publishing : you’re already doing it w/ code
Process affects the outcome: Easier to share at the end because you’re already sharing with yourself throughout
Not just R! Examples from other languages (Jupyter [note]books)
Open publishing in the wild
Education: allisonhorst.github.io, datavizm20.classes.andrewheiss.com, tinystats.github.io/teacups-giraffes-and-statistics, ida.numbat.space
Programs: openscapes.org
Accompanying science pubs: ohi-science.org/betterscienceinlesstime
We had another interesting question from John Chodacki:

RMarkdown <> Word workflows: noamross.github.io/redoc
Nick’s experience writing his thesis in bookdown: how does it compare to latex?
Incorporating RMarkdown sub-documents (“knit child”): OHI suppl. methods
How to share documents using GitHub’s gh-pages or doc/: R for Excel Users
> What about readers/consumers? But it seems like all the cool features of auto-updating tables, etc. can bring confusion for the reader ... unable to rely on stable info. Do you agree? Are there innovative ways to mitigate?

This problem can arise when you generate a graphic or a model during data analysis, then iterate on it and improve it. Later on, you might want to compare your current graphic to your first one, or your first model to the current one. However, doing that is actually pretty hard and requires some strong version control skills. We discussed an approach that Miles McBain broached a few years ago called [journalr](https://ghcdn.rawgit.org/MilesMcBain/journalr/master/Journalling_tool_proposal.html) ([repo](https://github.com/MilesMcBain/journalr)).

Ultimately, this is a hard problem to solve, and mimics a real-life pen-and-paper notebook. Roger Peng discussed a manual approach to this in the NSSDeviations podcast (we believe in [Episode 74](http://nssdeviations.com/74-i-draw-the-line-at-fans)).