diff --git a/content/post/drafts/2020-06-03-data-science-publishing/index.Rmd b/content/post/drafts/2020-06-03-data-science-publishing/index.Rmd index 396f2be..ddc76de 100644 --- a/content/post/drafts/2020-06-03-data-science-publishing/index.Rmd +++ b/content/post/drafts/2020-06-03-data-science-publishing/index.Rmd @@ -1,6 +1,6 @@ --- title: Data Science as an Entryway to Open Publishing -author: Julia Lowndes, Nicholas Tierney +author: Julia Lowndes and Nicholas Tierney date: '2020-06-03' slug: data-science-publishing draft: true @@ -24,164 +24,81 @@ knitr::opts_chunk$set( ) ``` -Last Week Julia Lowndes and I presented on a talk called: +*In May we presented a virtual fireside chat at the [Open Publishing Fest](https://openpublishingfest.org/) called "Data Science as an Entryway to Open Publishing". The premise was that the open source R programming language is a powerhouse for data analysis and statistics -- and it also is fueling open publishing through RMarkdown and a large, engaged, and innovative community. We briefly showed community-created examples of tutorials, blogs, websites, manuscripts, books, etc, and discussed how they are an entryway to open science, preprints, and open scientific publishing. This post shares some reflections from the experience and a summary of our [slides](https://zenodo.org/record/3873698#.XtbQo8Z7nOQ).* -"Data Science as an Entryway to Open Publishing", you can see the slides [here](https://zenodo.org/record/3873698#.XtbQo8Z7nOQ). +------------------------------------------------------------------------ -Abstract: +One of the (many) things that gets us excited about R is that the same workflow you use for data analysis -- that is rooted in reproducibility -- empowers you to make your work available to the world...in ways you never imagined. -> The open source R programming language is a powerhouse for data analysis and statistics – and it also is fueling open publishing through RMarkdown and a large, engaged, and innovative community. 
We will show community-created examples of tutorials, blogs, websites, manuscripts, books, etc, and discuss how this is an entryway to open science, preprints, and open scientific publishing. We welcome other contributed examples to showcase how R can streamline open publishing, as well as examples showcasing other programming languages. +As Julia said in our presentation: -The same workflow you use for data analysis -– rooted in reproducibility – empowers you make your work available to the world +> I came to R for the data analysis, and was blown away by the publishing -...in ways you never imagined - -> I came to R for the data analysis, and was blown away by the publishing --- Julia Lowndes +We then introduced RMarkdown, framed for scientific publishing and so much more. # Using RMarkdown for scientific publishing: Fueling reproducibility in data science -## Rmarkdown - -RMarkdown powerfully combines executable R code with simple text formatting and for efficient, automatable, reproducible research - -Simple text formatting - -+ - -R Code +## RMarkdown -= +RMarkdown powerfully combines executable R code with simple text formatting for efficient, automatable, reproducible research. Because the text and code live together, analyses and figures are in the same place as your reporting document. This saves time as you iterate, and enables good practices for reproducibility & versioning. -Analyses and figures are in the same place as your reporting document: -saves time as you iterate! +## RMarkdown's familiar outputs for science: Word documents and PDFs -Enables good practices for reproducibility & versioning +RMarkdown renders to Word and PDF --- imagine never copy-pasting a graph into your report again!!! RMarkdown can also manage citations and cross-references to figures and section headers. -## RMarkdown’s familiar outputs for science: Word documents and PDFs +But wait, you can also use RMarkdown beyond your wildest dreams. 
-Rmarkdown renders to: - -- Word -- PDF - -Imagine never copy-pasting a graph into your report again!!!! -RMarkdown can also manage citations, cross- referencing figures and section headers. - -# Using RMarkdown beyond your wildest dreams: Reimagining sharing and publishing online +# Using RMarkdown beyond your wildest dreams: Reimagining sharing and publishing online ## RMarkdown: RMarkdown creates HTML files that can be shared openly on the web -Rendering rmarkdown to HTML: - -> We can store and distribute html files on GitHub, which also offers display options for publishing. Let's look at some real-world examples from science... - -> Suddenly you can share a URL rather than attaching a file! -And that same URL will update rather than re-attaching a new version of the file! - - -## Single-page html - -RMarkdown html files for open publishing; URL will display most recent version - -Examples from the [Ocean Health Index](https://ohi-science.org/) - -Many display options; floating table of contents, show/hide code - -Then we can think about organization & discoverability: How to organize multiple htmls? And how do we find them? - -ohi-science.org/ohiprep_v2019/globalprep/prs_slr/v2019/slr_layer_prep_v2.html -Learn: rmarkdown.rstudio.com - -## Simple Websites +We can render RMarkdown to HTML. And we can store and distribute HTML files on GitHub, which also offers display options for publishing. Suddenly you can share a URL rather than attaching a file --- and that same URL will update rather than re-attaching a new version of the file! 
-Combine RMarkdown files as a website with a navigation bar between pages, requires only GitHub +We shared some real-world examples from science, including examples from the [Ocean Health Index](https://ohi-science.org/), [Alison Hill's academic](https://alison.rbind.io/post%202017-06-12-up-and-running-with-blogdown), [Ben Marwick's PhD thesis template formatting for the University of Washington](https://github.com/benmarwick/huskydown), and [Allison Horst's missing explorer lesson](https://allisonhorst.shinyapps.io/missingexplorer). +These examples started with **single-page HTMLs**, with the ability to display a floating table of contents and toggle between showing and hiding code. We then moved on to **simple websites** that combine RMarkdown files with a navigation bar between pages, and require only GitHub to display. -Useful for organizing, e.g. linking out to additional single-page htmls +We also discussed **blogdown**, which creates powerful websites with more complexity and blogging capabilities. This has been so important for creating blogs and tutorials to share code, discuss, and learn together. Why this is important is nicely represented in this quote from the 2020 RStudio conference: -You can also create templates and populate them automatically +> "If you want to learn to write, you read a lot, if you want to play music, you listen a lot. It's hard to do this with data analysis." - [Hilary Parker & Roger Peng, RStudio::conf(2020) keynote]() -ohi-science.org/ohi-global -ohi-science.org/esw -Learn: jules32.github.io/rmarkdown-website-tutorial +In addition to websites, RMarkdown can create **bookdown books** that organize and navigate HTML files as e-books. This is really powerful for organizing reports and documents. -## Blogdown websites +We can also create **simple slides** from a single RMarkdown file -- imagine making a presentation and then being able to re-create it with updated data! 
Further, we can create **xaringan slides** that enable you to incorporate powerful styling options from within R (without requiring knowledge of JavaScript, CSS, etc). -Create powerful websites with more complexity and blogging capabilities; requires more setup & deployment from a server +**learnr** provides the power of interactive tutorials from a friendly website interface. This is really exciting to think about reimagining teaching and how to blend lectures and hands-on coding for learners of all levels. -“If you want to learn to write, you read a lot, if you want to play music, you listen a lot. It’s hard to do this with data analysis.” - Hilary Parker & Roger Peng, RStudio::conf(2020) keynote - -So we write blogs and tutorials to share code, discuss, and learn together. - -Power to organize, tag, search, navigage, etc. - -← Academic theme templates! - -alison.rbind.io -Learn: alison.rbind.io/post 2017-06-12-up-and-running-with-blogdown - -## Bookdown books - -Organize and navigate html files as e-books - -Really powerful for organizing reports and documents. - -I wish I could have written my PhD thesis is Bookdown -Eg: github.com/ benmarwick/huskydown - -r4ds.had.co.nz -Learn: bookdown.org/yihui/bookdown - -## Simple slides: create slides in a single rmarkdown file - -Imagine re-creating presentations with updated data. - -Text-based slide creation can be a powerful flow to think and outline. - -Share presentations – and with a human-readable url! - -rstudio.com/slides/rstudio-pbc -Learn: rmarkdown.rstudio.com/lesson-11 - -## Xaringan Slides: Create slides in a single RMarkdown file +# Discussion time -Incorporate powerful styling options from within R (without requiring knowledge of JavaScript, CSS, etc) +After going through the slides, we discussed with attendees the benefits of a HTML-focussed workflow. One of the benefits of this is that by avoiding page breaks, all your figures and tables can usually be placed right where they are mentioned. 
Although this might seem like a small detail, avoiding page breaks actually saves you a huge amount of time and hassle. Adam Sparks pointed out that one of the benefits of HTML files is how easy they are to share internally within a team - they do not have to be published online, and can be opened in any browser. The fact that they often look really snappy and polished, and can include interactive elements like maps, is also a huge selling point. -slides.yihui.org/xaringan -arm.rbind.io/slides/xaringan -Learn: above, and bookdown.org/yihui/rmarkdown/xaringan +We also discussed how `pagedown` provides a fresh approach to generating PDFs on the web, and were lucky to have the creator of [paged.js]() (which powers `pagedown`), Adam Hyde, at our chat. -## Learnr tutorials: Interactive tutorials from a friendly website interface +We also discussed some alternative formats for publishing, such as JATS, an XML standard for journal publishing. We hadn't heard of it (and apparently that was a good thing). -Reimagine teaching and how to blend lectures and hands-on coding for learners of all levels +Alison Hill made an excellent point: -allisonhorst.shinyapps.io/missingexplorer +> As a former scientist, I felt woefully ill-equipped when first working with HTML output. I know you all have thought a lot about this- how can we increase HTML comfort levels and fluency for new/early scientists? -Learn: education.rstudio.com/blog/2020/05/learnr-for-remote +In response to this we discussed some of the downsides of HTML, namely the fact that CSS is usually required to answer questions like: +- "How do I change the font size" +- "How do I change the font colour", or +- "How do I create two columns of text" -# Discussion time +CSS is ubiquitous on the web, so it can be easy to change the appearance of text. However, it is one more thing for learners to pick up, and something that can tip the balance and send people back to the systems they know. 
-What examples or questions do you have? +Some suggestions on addressing this were: -Other discussion topics +- Showing students existing RMarkdown HTML templates +- Providing simple CSS templates within an RMarkdown file for people to use +- Building better tools that guide people to create their own CSS -How does RMarkdown relate to/streamline the academic publishing process? -Analog: rOpenSci software review process -Friendly entryways to open science & publishing : you’re already doing it w/ code -Process affects the outcome: Easier to share at the end because you’re already sharing with yourself throughout -Not just R! Examples from other languages (Jupyter [note]books) -Open publishing in the wild -Education: allisonhorst.github.io, datavizm20.classes.andrewheiss.com, tinystats.github.io/teacups-giraffes-and-statistics, ida.numbat.space -Programs: openscapes.org -Accompanying science pubs: ohi-science.org/betterscienceinlesstime +We had another interesting question from John Chodacki: -RMarkdown <> Word workflows: noamross.github.io/redoc -Nick’s experience writing his thesis in bookdown: how does it compare to latex? -Incorporating RMarkdown sub-documents (“knit child”): OHI suppl. methods -How to share documents using GitHub’s gh-pages or doc/: R for Excel Users +> What about readers/consumers? But it seems like all the cool features of auto-updating tables, etc. can bring confusion for the reader ... unable to rely on stable info. Do you agree? Are there innovative ways to mitigate? +This problem can arise when you generate a graphic or a model during data analysis, then iterate on it and improve it. Later on, you might want to compare your current graphic to your first one, or your first model to the current one. However, doing that is actually pretty hard, and requires some strong version control skills. 
We discussed an approach that Miles McBain broached a few years ago called [journalr](https://ghcdn.rawgit.org/MilesMcBain/journalr/master/Journalling_tool_proposal.html) ([repo](https://github.com/MilesMcBain/journalr)). +Ultimately, this is a hard problem to solve, and mimics a real-life pen-and-paper notebook. Roger Peng discussed a manual approach to this in the NSSDeviations podcast (we believe in [Episode 74](http://nssdeviations.com/74-i-draw-the-line-at-fans)).