- 📆 11th May 2021 CEST
- ⌚ 9:30--15:00
- 🌍 Remote: https://bbb-greenlight.uni-rostock.de/b/fra-zr0-xmc-sa1
- 💜 Code of Conduct: https://github.com/reprohack/reprohack-hq/blob/master/CODE_OF_CONDUCT.md
- ▶️ Slides: https://anja-eggert.net/wp-content/uploads/intro-slides.html
- ✍️ Author feedback form: https://evasys.uni-rostock.de/evasys/online.php?p=ReproHack
9:30 - Opening and virtual get-together
- Introduction ORDS
- Icebreaker
- Introduction TKFDM
- Article including Code and Data
- Team formation
10:00 - 1st part of the workshop
- Hands-on analysis in teams
11:30 - Rejoin and tell
12:00 - Lunch break 🍕 🍲 🍓
13:00 - 2nd part of the workshop
- Continue working in your team
- Prepare feedback to the group and for the authors
14:30 - Evaluation and Goodbye
Please sign in (Affiliation / Twitter / GitHub)
- Frank Krüger (University of Rostock / @_frank_k_ / f-krueger )
- Manuela Reichelt (FBN Dummerstorf / @manuReichelt / ManuelaReichelt)
- Anja Eggert (FBN Dummerstorf / @AnjaEggert42 / AnjaEggert)
- Max Schröder (University of Rostock / @m6121 / m6121)
- Jessica Rex (Technical University of Ilmenau / [@ThatDataStuff](https://twitter.com/ThatDataStuff))
- Roman Gerlach (Friedrich-Schiller-University Jena / @FDMThueringen )
- Kevin Lang (Bauhaus-Universität Weimar / @kev_lan / GitHub)
- Anke Günther (Uni Rostock, Uni Greifswald)
- Fabian Dröge (Uni Jena / GitLab)
- Sheeba Samuel (Uni Jena/ @sheebasamuel/GitLab)
- Phillip Seeber (Uni Jena / GitLab)
- Kai Budde (Uni Rostock / GitHub)
- Stephanie Dahn (Uni Rostock) GitHub Twitter
- Markus Zehner (Uni Jena / GitHub)
- Henja Wehmann (Uni Rostock)
- Felix Cremer (DLR Jena / GitHub)
- Sebastian Seidenath (Friedrich-Schiller-University Jena)
- Oscar Beltran (Leibniz Institute for Baltic Sea Research)
- Taufia Hussain (Uni Rostock/ [GitHub])
- Inga Ulusoy (Scientific Software Center, Uni Heidelberg / GitHub)
Form teams and try to answer the following questions in breakout rooms.
- Who are you?
- Why are you here?
- What is your level of repro-experience?
- What is your favorite (new) hobby after a year of on/off Corona lockdown?
As a group: name your room!
- What do you have in common?
- all-over-Germany-group
- Thuringian-group
- technical-issues-group
- wannabe-musicians-group
- white-wall-group
In contrast to other ReproHacks, here we focus on one particular paper rather than an entire list of papers. We selected the following article:
Luis M. Vilches-Blázquez & Daniela Ballari (2020): Unveiling the diversity of spatial data infrastructures in Latin America: evidence from an exploratory inquiry, Cartography and Geographic Information Science, DOI: 10.1080/15230406.2020.1772113
(author copy available for Download)
Feel free to join one of the predefined teams (Beginners, Advanced, or Experts), create your own team, or work individually on the paper.
The paper is analysed with respect to its published resources, and the original analysis is re-run to see whether it produces the same results.
Participants are expected to have some basic knowledge of R
- Manuela Reichelt
- Anja Eggert
- Jessica Rex
- Franziska Koebsch
- Anke Günther
- Henja Wehmann (knows R, but programming skills are a mess)
- Stephanie Dahn (really basic R knowledge)
- Alexander Schwab (no significant knowledge unfortunately)
- Sebastian Seidenath (no significant knowledge unfortunately)
- Taufia Hussain (basic R knowledge)
- Dietmar Zechner (no knowledge)
- Oscar Beltran
- Figure 2b: "Yes" bar missing for Colombia 2017 in the article, caused by zooming in too much on the y-axis (bottom)
- Round while preserving sum: Link to r-bloggers
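
The "round while preserving sum" note can be illustrated with the largest-remainder method. This is a minimal sketch in Python, not the code from the linked post or from the paper; the function name `round_preserving_sum` and the example percentages are made up for illustration.

```python
import math

def round_preserving_sum(values, total=None):
    """Round values to integers so that they still add up to `total`.

    Largest-remainder method: floor every value, then hand the missing
    units to the entries with the largest fractional parts.
    """
    if total is None:
        total = round(sum(values))
    floors = [math.floor(v) for v in values]
    missing = total - sum(floors)
    if not 0 <= missing <= len(values):
        raise ValueError("total is not reachable by rounding each value up or down")
    # Indices ordered by descending fractional part (ties: earlier entry first)
    order = sorted(range(len(values)), key=lambda i: values[i] - floors[i], reverse=True)
    for i in order[:missing]:
        floors[i] += 1
    return floors

shares = [33.4, 33.3, 33.3]  # hypothetical percentages, not from the paper
print(round_preserving_sum(shares))  # [34, 33, 33]
```

Rounding each value on its own would give 33 + 33 + 33 = 99, which is the kind of off-by-one that shows up in percentage labels on stacked bar plots.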
The paper's analysis is re-implemented in a Python Jupyter notebook to see whether the same results can be generated in a different computational environment.
Participants are expected to have some basic knowledge in R and Python
- Frank Krüger
- Kevin Lang (Python with Jupyter)
- Inga Ulusoy
- Fabian Dröge --> Julia or Python, I don't mind :)
- Felix Cremer would like to use Julia
- Markus Zehner
- Sheeba Samuel
- recreated first two plots with
- some raw values adjusted (inserted magic numbers)
- Translation to Julia seems complicated
- R code is fairly structured and easy to follow, makes the task of mapping into a different language a lot easier
- for the maps, we use shape files from https://tapiquen-sig.jimdofree.com/english-version/free-downloads/americas/
- the shapefiles were added manually over the bar plots
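
When results from a re-implementation are compared with the original R output, small floating-point differences are to be expected. A minimal sketch of a tolerance-based comparison follows; the function name `compare_results`, the tolerances, and all values are assumptions for illustration and are not taken from the paper or the group's notebook.

```python
import math

def compare_results(original, reproduced, rel_tol=1e-9, abs_tol=1e-6):
    """Return the keys whose values differ beyond the given tolerance.

    Keys present in only one of the two result sets are reported as well.
    """
    differing = []
    for key in sorted(set(original) | set(reproduced)):
        if key not in original or key not in reproduced:
            differing.append(key)
        elif not math.isclose(original[key], reproduced[key],
                              rel_tol=rel_tol, abs_tol=abs_tol):
            differing.append(key)
    return differing

# Hypothetical values, not taken from the paper
r_output = {"yes": 62.5, "no": 30.0, "unknown": 7.5}
py_output = {"yes": 62.5000000001, "no": 30.0, "unknown": 7.4}
print(compare_results(r_output, py_output))  # ['unknown']
```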
A computational environment is created for both the original analysis and the re-implemented analysis.
Participants are expected to have some basic knowledge in R, Python and Docker
- Max Schröder
- Roman Gerlach
- Phillip Seeber (Haskell, Nix)
- Paper uses survey to gather data
- Data and source code published on figshare:
- Excel sheet with cleaned data (years: 2019, 2017, 2014) (cleaning process see paper)
- Rmarkdown document
- HTML with details and figures
- Use command to produce HTML:
Rscript -e "library(knitr); knitr::knit2html('Survey_Trend_SDI.Rmd')"
or better:
Rscript -e "library(knitr); rmarkdown::render('Survey_Trend_SDI.Rmd')"
- We aim at two different computing environments:
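# Install 'remotes' first; it provides install_version() for pinning package versions
install.packages(c('remotes'), repos='https://ftp.fau.de/cran/')
library(remotes)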
install_version("knitr", version='1.28', repos="https://ftp.fau.de/cran/")
install_version("rmdformats", version='0.3.6', repos="https://ftp.fau.de/cran/")
install_version("readxl", version='1.3.1', repos="https://ftp.fau.de/cran/")
install_version("ggplot2", version='3.2.1', repos="https://ftp.fau.de/cran/")
install_version("stringr", version='1.4.0', repos="https://ftp.fau.de/cran/")
install_version("rworldmap", version='1.3-6', repos="https://ftp.fau.de/cran/")
install_version("RColorBrewer", version='1.1-2', repos="https://ftp.fau.de/cran/")
install_version("DT", version='0.12', repos="https://ftp.fau.de/cran/")
install_version("plyr", version='1.8.6', repos="https://ftp.fau.de/cran/")
install_version("tidyr", version='1.0.2', repos="https://ftp.fau.de/cran/")
install_version("ggpubr", version='0.2.5', repos="https://ftp.fau.de/cran/")
FROM r-base:3.6.3
LABEL maintainer="max.schroeder@uni-rostock.de;roman.gerlach@uni-jena.de"
RUN apt update \
    && apt install -y \
        libcurl4-openssl-dev \
        pandoc \
    && apt clean \
    && rm -rf /var/lib/apt/lists/*
COPY install.R /tmp/install.R
RUN Rscript /tmp/install.R
docker build -t ords-reprohack:r-container .
docker run --rm -u $(id -u) -v /data/reprohack:/opt/data -w /opt/data ords-reprohack:r-container Rscript -e "library(knitr); rmarkdown::render('Survey_Trend_SDI.Rmd')"
Software dependencies:
- Jupyter Notebook with Python3 Kernel
- Python packages:
- pandas
- matplotlib
- geopandas
FROM jupyter/scipy-notebook:09fb66007615
LABEL maintainer="max.schroeder@uni-rostock.de;roman.gerlach@uni-jena.de"
RUN python3 -m pip install geopandas==0.9.0
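
To confirm that a running environment actually matches the pinned versions, the installed packages can be checked from Python. This is a sketch using only the standard library (`importlib.metadata`, Python 3.8+). The `PINS` dictionary and the `check_pins` helper are illustrative names; only the geopandas pin comes from the Dockerfile above.

```python
from importlib import metadata

# Pin taken from the Dockerfile above; extend with further packages as needed
PINS = {"geopandas": "0.9.0"}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches

if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the container, the script prints nothing when every pin is satisfied.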
The Nix version reproduces the original data hermetically, down to the exact hashes. A repository with the Nix expressions and a short description of how to build them is available on GitLab.
The build definition is given by
{ stdenvNoCC, lib, fetchurl, unzip, rPackages, rWrapper, pandoc }:
rPkgs = import ./pkgs.nix { inherit rPackages; };
rWithPkgs = rWrapper.override { packages = rPkgs; };
in stdenvNoCC.mkDerivation rec {
pname = "ReproHack-Original";
version = "1.0";
src = fetchurl {
url = "https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/22720802/SupplementaryMaterial.zip";
sha256 = "00lsp163q44dn8adlra8vaf4cgcyiifv2nh0qabypsfcgzj0c2sd";
nativeBuildInputs = [
phases = [
installTargets = [
unpackPhase = ''
unzip $src
buildPhase = ''
Rscript -e "library(knitr); rmarkdown::render('Survey_Trend_SDI.Rmd')"
installPhase = ''
mkdir -p $out
for i in ${toString installTargets}; do
cp -r $i $out/.
which is fully reproducible, as the dependencies are exactly pinned by git and sha256 hashes.
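
Outside of Nix, the same idea of pinning a download by its hash can be sketched in a few lines of Python. The helper names `sha256_of` and `verify` are assumptions for illustration. Note that the `sha256` string in the expression above appears to use Nix's base32 encoding rather than hex, so it cannot be compared directly with the hex digest computed here.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex-encoded SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not have to fit into memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True if the file's digest matches `expected_hex` (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

A typical use would be to record the hex digest once after downloading the supplementary material and to call `verify` before every later run of the analysis.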
A Jupyter Lab environment that can fully run the Jupyter notebook from the advanced group can be obtained from within the Python
directory by executing
nix-shell --command "jupyter lab"
and in jupyter lab select the ReproHackPython
- We attempt to reproduce the paper from available materials and documentation
- Make notes about your experiences, in particular with respect to how easy it is to:
- 🌍 navigate the materials
- 🔁 reproduce the analysis
- ♻️ reuse the materials
- Fill in the author feedback form, documenting your experiences reproducing the paper in your chosen group
- ✍️ Feedback form: https://evasys.uni-rostock.de/evasys/online.php?p=ReproHack
(put your comments here)
- to try something I have no idea about
- Thanks a lot to the organizers
- high level overview of completely foreign data and working with them
- trying to get into a completely new language
- coding together
- learning
- I enjoyed taking on the role of inspecting in detail the results of a paper and exploring the tools available to do so.
- working on just one paper +1
(put your comments here)
- the tooling was communicated more openly on the website than it turned out to be in reality (reduction to Python, Julia)
- more small breaks in between where technical issues can be addressed and where you have time to read up on new things (and for having snacks :D )
- more structured sections for working and listening - doing everything at once sometimes was very exhausting for me
- perhaps have the paper at least one hour beforehand to know what it is about.
- give an overview about the data (files and meaning)
- give a short introduction about the paper and what we are reproducing
- after coming back from lunch, had problems entering a breakout room
- technical issues in joining the main room
- it would be good to have a place (online) to share the intermediate results/code among the breakout group members. Maybe also a shared computational environment.
(put your comments here)
- working remotely :/
- doing only plots; it would also be interesting to have a paper with a more involved methodology, where the reproduction is also harder because of rounding errors and other subtle differences
(put your comments here)
- separation into groups with different technologies and aspects
- openness to beginners
- different levels of expertise
- open to different programming languages +1
(put your comments here)
- time: I think it would be nicer to have a little bit more time
- intro into reproducibility and what we are actually looking for while doing the analysis