deploy: 65027d5
bast committed Sep 12, 2024
0 parents commit cd93c1a
Showing 267 changed files with 41,328 additions and 0 deletions.
Empty file added .nojekyll
Empty file.
Binary file added _images/animation.mp4
Binary file not shown.
Binary file added _images/stars.png
8 changes: 8 additions & 0 deletions _sources/documentation.md.txt
@@ -0,0 +1,8 @@
(documentation)=

# Code documentation

- 15 min: What makes good documentation? Also https://diataxis.fr/
- 15 min: Writing good README files
- 30 min: Exercise: Set up Sphinx documentation and add API documentation
- 15 min: Building documentation with GitHub Actions
95 changes: 95 additions & 0 deletions _sources/example.md.txt
@@ -0,0 +1,95 @@
(example-project)=

# Example project: Simulating the motion of planets

The [example code](https://github.com/workshop-material/planets) that we will study
is a hopefully simple N-body simulation written in Python. It is not important
or expected that we understand the code in any detail.

:::{video} video/animation.mp4
:width: 600
:::

The **big picture** is that the code simulates the motion of a number of
planets:
- We can choose the number of planets.
- Each planet starts with a random position, velocity, and mass.
- At each time step, the code calculates the gravitational force between each
pair of planets.
- The forces accelerate each planet, the acceleration modifies the velocity,
the velocity modifies the position of each planet.
- We can choose the number of time steps.
- The units were chosen to make numbers easy to read.
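
The update described in the list above can be sketched in a few lines of plain
Python. This is only an illustration and not the code from the example
repository: the function name `step`, the time step `dt`, the gravitational
constant `g = 1.0`, and the choice of a semi-implicit Euler update are all
assumptions made for this sketch.

```python
import math


def step(positions, velocities, masses, dt=0.1, g=1.0):
    """Advance all planets by one time step (hypothetical helper).

    positions and velocities are lists of [x, y, z] lists, masses is a list
    of floats. Assumes that no two planets share the same position.
    """
    n = len(masses)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]

    # Gravitational force between each pair of planets.
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            # Force magnitude divided by r, so multiplying by d gives the vector.
            f = g * masses[i] * masses[j] / r**3
            for k in range(3):
                forces[i][k] += f * d[k]  # planet i is pulled towards j
                forces[j][k] -= f * d[k]  # planet j is pulled towards i

    new_positions, new_velocities = [], []
    for i in range(n):
        # force -> acceleration -> velocity -> position
        v = [velocities[i][k] + forces[i][k] / masses[i] * dt for k in range(3)]
        p = [positions[i][k] + v[k] * dt for k in range(3)]
        new_velocities.append(v)
        new_positions.append(p)
    return new_positions, new_velocities
```

Calling `step` repeatedly, once per time step, would give the final positions.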


## Example run

:::{instructor-note}
The instructor demonstrates running the code on their computer.
:::

The code is written to accept **command-line arguments** to specify the number
of planets and the number of time steps.

We first generate starting data:
```console
$ python generate-data.py --num-planets 10 --output-file initial.csv
```

The generated file (initial.csv) could look like this:
```
px,py,pz,vx,vy,vz,mass
-46.88,-42.51,88.33,-0.86,-0.18,0.55,6.70
-5.29,17.09,-96.13,0.66,0.45,-0.17,3.51
83.53,-92.83,-68.77,-0.26,-0.48,0.24,6.84
-36.31,25.48,64.16,0.85,0.75,-0.56,1.53
-68.38,-17.21,-97.07,0.60,0.26,0.69,6.63
-48.37,-48.74,3.92,-0.92,-0.33,-0.93,8.60
40.53,-75.50,44.18,-0.62,-0.31,-0.53,8.04
-27.21,10.78,-78.82,-0.09,-0.55,-0.03,5.35
88.42,-74.95,-45.85,0.81,0.68,0.56,5.36
39.09,53.12,-59.54,-0.54,0.56,0.07,8.98
```
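
A file in this format can be read with the Python standard library. The sketch
below is an assumption about how one might load it, not how the example code
does it; `read_planets` is a hypothetical helper, and the columns are
interpreted as position (`px`, `py`, `pz`), velocity (`vx`, `vy`, `vz`), and
`mass`.

```python
import csv
import io


def read_planets(file_obj):
    """Return one dict per planet with all columns converted to float."""
    reader = csv.DictReader(file_obj)
    return [{key: float(value) for key, value in row.items()} for row in reader]


# Two rows copied from the listing above; a real run would open initial.csv.
sample = io.StringIO(
    "px,py,pz,vx,vy,vz,mass\n"
    "-46.88,-42.51,88.33,-0.86,-0.18,0.55,6.70\n"
    "-5.29,17.09,-96.13,0.66,0.45,-0.17,3.51\n"
)
planets = read_planets(sample)
print(len(planets), planets[0]["mass"])
```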

Then we can simulate their motion (in this case for 20 steps):
```console
$ python simulate.py --num-steps 20 \
--input-file initial.csv \
--output-file final.csv
```

The `--output-file` (final.csv) is again a CSV file (comma-separated values)
and contains the final positions of all planets.
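
Command-line arguments like the ones used above could be defined with
`argparse` from the standard library. This is a minimal sketch under
assumptions: the real scripts may use a different library (for instance
`click`), and the function name `parse_args` and the help texts are invented
for illustration. Only the three flag names are taken from the commands above.

```python
import argparse


def parse_args(argv=None):
    """Parse the options shown in the simulate.py example (hypothetical)."""
    parser = argparse.ArgumentParser(description="Simulate the motion of planets.")
    parser.add_argument("--num-steps", type=int, required=True,
                        help="number of time steps")
    parser.add_argument("--input-file", required=True,
                        help="CSV file with the starting data")
    parser.add_argument("--output-file", required=True,
                        help="CSV file for the final positions")
    # argv=None makes argparse read sys.argv, as a real script would.
    return parser.parse_args(argv)


args = parse_args(["--num-steps", "20",
                   "--input-file", "initial.csv",
                   "--output-file", "final.csv"])
print(args.num_steps, args.input_file, args.output_file)
```

Passing the argument list explicitly, as done here, also makes the parser easy
to test without starting a new process.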

It is possible to run on **multiple cores** and to **animate** the result.
Here is an example with 100 planets:
```{code-block} console
---
emphasize-lines: 7,11
---
$ python generate-data.py --num-planets 100 --output-file initial.csv

$ python simulate.py --num-steps 50 \
--input-file initial.csv \
--output-file final.csv \
--trajectories-file trajectories.npz \
--num-cores 8

$ python animate.py --initial-file initial.csv \
--trajectories-file trajectories.npz \
--output-file animation.mp4
```

:::{admonition} Learning goals
- What are the most important steps to make this code **reusable by others**
and **our future selves**?
- Be able to apply these techniques to your own code/script.
:::

:::{admonition} We will not focus on ...
- ... how the code works internally in detail.
- ... whether this is the most efficient algorithm.
- ... whether the code is numerically stable.
- ... how the code scales with the number of cores.
- ... whether it is portable to other operating systems (we will discuss this later).
:::
156 changes: 156 additions & 0 deletions _sources/index.md.txt
@@ -0,0 +1,156 @@
# Reproducible research software development using Python


## Big-picture goal

This is a **hands-on course on research software engineering**. We assume that
most workshop participants use Python in their work or are leading a group
which uses Python. Therefore, some of the examples will use Python as the
example language.

We will work with an example project and go through all the steps of a typical
software project. Once we have seen the building blocks, we will try to apply
them to our own projects. Workshop participants will receive and also learn to
give constructive code feedback.


## Prerequisites

:::{prereq} Preparation
1. Get a **GitHub account** following [these instructions](https://coderefinery.github.io/installation/github/).
1. You will need a **text editor**. If you don't have a favorite one, we recommend
   [VS Code](https://coderefinery.github.io/installation/vscode/).
1. **If you prefer to work in the terminal** and not in VS Code, set up these two (skip this if you use VS Code):
   - [Git in the terminal](https://coderefinery.github.io/installation/git-in-terminal/)
   - [SSH or HTTPS connection to GitHub from terminal](https://coderefinery.github.io/installation/ssh/)
1. **One of these two software environments** (if you are not sure which one to
   choose or have no preference, choose Conda):
   - {ref}`conda`
   - {ref}`venv` (Snakemake is not available in this environment)
1. **Optional** and only on Linux: [Apptainer](https://apptainer.org/) following
   [these instructions](https://apptainer.org/docs/admin/1.3/installation.html#install-from-pre-built-packages).
:::


## Schedule

:::{note}
The schedule will very soon contain links to lesson material and exercises.
:::


### Day 1 (Sep 16)

- 13:00-13:30 (0.5h) - **Welcome and introduction**
  - Motivation (reproducibility, robustness, distribution, improvement, trust, etc.)
  - Practical information (tools, communication, breaks, etc.)
  - What will we learn and achieve from this course?
  - {ref}`example-project`

- 13:30-14:45 (1.25h) - **Introduction to version control with Git and GitHub (1/2)**
  - Creating a repository and porting your project to Git and GitHub
  - Basic commands

- 15:00-16:30 (1.5h) - **Introduction to version control with Git and GitHub (2/2)**
  - Branching and merging
  - Recovering from typical mistakes

- 16:45-18:00 (1.25h) - {ref}`documentation`
  - In-code documentation including docstrings
  - Writing good README files
  - Markdown
  - Sphinx
  - Building documentation with GitHub Actions
  - Jupyter Notebooks


### Day 2 (Sep 17)

- 09:00-10:30 (1.5h) - **Collaborative version control and code review (1/2)**
  - Practice code review using issues and pull requests
  - Forking workflow
  - Contributing changes to projects of others

- 10:45-12:15 (1.5h) - **Collaborative version control and code review (2/2)**
  - Organization strategies
  - Merge vs. rebase
  - Conflict resolution

- 16:45-18:00 (1.25h) - **Debriefing and Q&A**
  - Participants work on their projects
  - Together we study actual codes that participants wrote or work on
  - Constructively we discuss possible improvements
  - Give individual feedback on code projects


### Day 3 (Sep 18)

- 09:00-10:30 (1.5h) - {ref}`testing`
  - Unit tests
  - End-to-end tests
  - pytest
  - GitHub Actions

- 10:45-12:15 (1.5h) - {ref}`reusable`
  - Tracking dependencies with requirements.txt and environment.yml
  - Recording environments in containers

- 13:00-14:45 (1.75h) - {ref}`refactoring`
  - Naming (and other) conventions, project organization, modularity
  - Refactoring (explained through examples)
  - Design patterns: functional design vs. object-oriented design
  - How to design your code before writing it
  - Structuring larger software projects in a modular way
  - Command-line interfaces
  - Workflows with Snakemake

- 15:00-16:30 (1.5h) - {ref}`publishing`
  - Licenses
  - Publishing the code via Zenodo
  - Packaging the code
  - Sharing the code via PyPI

- 16:45-18:00 (1.25h) - **Debriefing and Q&A**
  - Participants work on their projects
  - Together we study actual codes that participants wrote or work on
  - Constructively we discuss possible improvements
  - Give individual feedback on code projects


### Extra material if we have time

- Profiling memory and CPU usage
- Strategies for parallelization


```{toctree}
:maxdepth: 1
:caption: Software environment
:hidden:

installation/conda
installation/virtual-environment
```

```{toctree}
:maxdepth: 1
:caption: Episodes
:hidden:

example
documentation
testing
reusable
refactoring
publishing
```

```{toctree}
:maxdepth: 1
:caption: Reference
:hidden:

All lessons <https://coderefinery.org/lessons/>
CodeRefinery <https://coderefinery.org/>
Reusing <https://coderefinery.org/lessons/reusing/>
```
87 changes: 87 additions & 0 deletions _sources/installation/conda.md.txt
@@ -0,0 +1,87 @@
(conda)=

# Conda environment

A Conda environment is an isolated software environment that is used to manage
the dependencies of a project. You decide where it is located.

You will need an `environment.yml` file that documents the dependencies:
```yaml
name: coderefinery
channels:
- conda-forge
- bioconda
dependencies:
- python >= 3.10
- black
- click
- flit
- ipywidgets
- isort
- jupyterlab
- jupyterlab_code_formatter
- jupyterlab-git
- matplotlib
- myst-parser
- nbdime
- numpy
- pandas
- pytest
- pytest-cov
- scalene
- seaborn
- snakemake-minimal
- sphinx
- sphinx-autoapi
- sphinx-autobuild
- sphinx_rtd_theme >= 2.0
- vulture
- scikit-image
```


## Before you create a virtual environment

1. Create a new directory for this course.
1. In this directory, create an `environment.yml` file and copy-paste the dependencies above into it.


## Choose the tool to manage the environment

If you are already using one of these tools, please continue using the tool that you like and know.
If you are new to this, **we recommend using Miniconda or Miniforge**.

- [Anaconda](https://docs.anaconda.com/anaconda/install/)
  - Advantages: easy to install, easy to use, good for beginners
  - Disadvantages: large download, installs more than we will need, license restrictions
- [Miniconda](https://docs.anaconda.com/miniconda/)
  - Advantages: small size, installs only what you need
  - Disadvantages: no graphical interface, license restrictions
- [Miniforge](https://github.com/conda-forge/miniforge)
  - Advantages: small size, no license restrictions
  - Disadvantages: no graphical interface
- [Micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html)
  - Advantages: fast, small size
  - Disadvantages: no graphical interface
- [Pixi](https://pixi.sh/latest/)
  - Advantages: fast and new
  - Disadvantages: new, less tested, and not documented here


## Creating the virtual environment

1. Open your terminal shell (e.g. Bash or Zsh).
2. Activate `conda` using `conda activate` or `source ~/miniconda3/bin/activate`.
3. Run the following command:
   ```console
   $ conda env create --file environment.yml
   ```
4. Make sure that you see "coderefinery" in the output when you ask for a list of all available environments:
   ```console
   $ conda env list
   ```


## How to verify that this worked

(this will be added)
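
Until the official check is added, one possible sanity check is to ask Python
which of the expected packages it can find. This is only a sketch, assuming
the `coderefinery` environment has been activated and the script is run with
that environment's Python. The helper `missing_packages` is hypothetical, and
the list below is a small, assumed selection of import names from
`environment.yml`.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the current Python interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for a few of the dependencies listed above.
expected = ["numpy", "pandas", "matplotlib", "pytest", "click", "sphinx"]
missing = missing_packages(expected)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the script reports missing packages, the environment was probably not
activated or not created from the `environment.yml` file above.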