adapt episode about the example project to classifying task
bast committed Jan 25, 2025
1 parent 672f387 commit 1495c78
Showing 6 changed files with 61 additions and 61 deletions.
1 change: 0 additions & 1 deletion content/conf.py
@@ -40,7 +40,6 @@
# remove once sphinx_rtd_theme updated for contrast and accessibility:
"sphinx_rtd_theme_ext_color_contrast",
"sphinx_coderefinery_branding",
"sphinxcontrib.video",
]

# MyST extensions
117 changes: 59 additions & 58 deletions content/example.md
@@ -1,25 +1,29 @@
(example-project)=
# Example project: 2D classification task using a nearest-neighbor predictor

# Example project: Simulating the motion of planets
The [example code](https://github.com/workshop-material/classification-task)
that we will study is a relatively simple nearest-neighbor predictor written in
Python. It is not important or expected that we understand the code in detail.

The [example code](https://github.com/workshop-material/planets) that we will study
is a hopefully simple N-body simulation written in Python. It is not important
or expected that we understand the code in any detail.
The code will produce something like this:

:::{video} video/animation.mp4
:width: 600
:::{figure} img/chart.svg
:alt: Results of the classification task
:width: 100%

The bottom row shows the training data (two labels) and the top row shows the
test data and whether the nearest-neighbor predictor classified their labels
correctly.
:::

The **big picture** is that the code simulates the motion of a number of
planets:
- We can choose the number of planets.
- Each planet starts with a random position, velocity, and mass.
- At each time step, the code calculates the gravitational force between each
pair of planets.
- The forces accelerate each planet, the acceleration modifies the velocity,
the velocity modifies the position of each planet.
- We can choose the number of time steps.
- The units were chosen to make numbers easy to read.
The **big picture** of the code is as follows:
- We can choose the number of samples (the example above has 50 samples).
- The code will generate samples with two labels (0 and 1) in a 2D space.
- One of the labels has a normal distribution and a circular distribution with
some minimum and maximum radius.
- The second label only has a circular distribution with a different radius.
- Then we try to predict whether the test samples belong to label 0 or 1 based
on the nearest neighbors in the training data. The number of neighbors can
be adjusted and the code will take the label of the majority of the neighbors.
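
The majority vote described in the last bullet can be sketched as follows. This is a minimal illustration and not the code from the example repository: the function name `predict_label`, the `(x, y, label)` tuple layout, and the use of Euclidean distance are assumptions made for this sketch.

```python
import math
from collections import Counter


def predict_label(training, point, num_neighbors):
    """Return the majority label among the nearest training samples.

    `training` is a list of (x, y, label) tuples (assumed layout);
    `point` is an (x, y) tuple.
    """
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(
        training,
        key=lambda sample: math.dist((sample[0], sample[1]), point),
    )
    # Keep only the labels of the closest `num_neighbors` samples.
    nearest_labels = [label for _, _, label in by_distance[:num_neighbors]]
    # Majority vote; with two labels an odd `num_neighbors` avoids ties.
    return Counter(nearest_labels).most_common(1)[0][0]


training = [
    (0.0, 0.0, 0),
    (0.1, 0.2, 0),
    (0.2, 0.1, 0),
    (5.0, 5.0, 1),
    (5.1, 4.9, 1),
]
print(predict_label(training, (0.3, 0.3), num_neighbors=3))  # prints 0
print(predict_label(training, (4.8, 5.2), num_neighbors=3))  # prints 1
```

Sorting all training samples for every query is simple but slow for large data sets; the sketch favors readability over efficiency.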


## Example run
@@ -29,57 +33,54 @@ The instructor demonstrates running the code on their computer.
:::

The code is written to accept **command-line arguments** to specify the number
of planets and the number of time steps.
of samples and file names. Later we will discuss advantages of this approach.
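
As a rough illustration of what accepting command-line arguments can look like, here is a minimal sketch using `argparse` from the Python standard library. The option names mirror those used in this episode, but the helper `parse_arguments` is hypothetical and the actual scripts may be built with a different library.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the options for a data-generation script (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Generate training and test samples for a classification task."
    )
    parser.add_argument(
        "--num-samples",
        type=int,
        required=True,
        help="Number of samples for each class.",
    )
    parser.add_argument(
        "--training-data",
        required=True,
        help="Training data is written to this file.",
    )
    parser.add_argument(
        "--test-data",
        required=True,
        help="Test data is written to this file.",
    )
    # Passing argv=None makes argparse read the real command line.
    return parser.parse_args(argv)


# Parse an explicit argument list so the sketch runs without a shell.
args = parse_arguments(
    ["--num-samples", "50", "--training-data", "train.csv", "--test-data", "test.csv"]
)
print(args.num_samples, args.training_data, args.test_data)
```

Because the inputs are explicit options rather than values edited in the source, the same script can be rerun with different settings without changing the code.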

We first generate starting data:
Let us try to get the help text:
```console
$ python generate-data.py --num-planets 10 --output-file initial.csv
```
$ python generate-data.py --help

Usage: generate-data.py [OPTIONS]

Program that generates a set of training and test samples for a non-linear
classification task.

The generated file (initial.csv) could look like this:
Options:
--num-samples INTEGER Number of samples for each class. [required]
--training-data TEXT Training data is written to this file. [required]
--test-data TEXT Test data is written to this file. [required]
--help Show this message and exit.
```
px,py,pz,vx,vy,vz,mass
-46.88,-42.51,88.33,-0.86,-0.18,0.55,6.70
-5.29,17.09,-96.13,0.66,0.45,-0.17,3.51
83.53,-92.83,-68.77,-0.26,-0.48,0.24,6.84
-36.31,25.48,64.16,0.85,0.75,-0.56,1.53
-68.38,-17.21,-97.07,0.60,0.26,0.69,6.63
-48.37,-48.74,3.92,-0.92,-0.33,-0.93,8.60
40.53,-75.50,44.18,-0.62,-0.31,-0.53,8.04
-27.21,10.78,-78.82,-0.09,-0.55,-0.03,5.35
88.42,-74.95,-45.85,0.81,0.68,0.56,5.36
39.09,53.12,-59.54,-0.54,0.56,0.07,8.98

We first generate the training and test data:
```console
$ python generate-data.py --num-samples 50 --training-data train.csv --test-data test.csv

Generated 50 training samples (train.csv) and test samples (test.csv).
```

Then we can simulate their motion (in this case for 20 steps):
In a second step we generate predictions for the test data:
```console
$ python simulate.py --num-steps 20 \
--input-file initial.csv \
--output-file final.csv
$ python generate-predictions.py --num-neighbors 7 --training-data train.csv --test-data test.csv --predictions predictions.csv

Predictions saved to predictions.csv
```

The `--output-file` (final.csv) is again a CSV file (comma-separated values)
and contains the final positions of all planets.

It is possible to run on **multiple cores** and to **animate** the result.
Here is an example with 100 planets:
```{code-block} console
---
emphasize-lines: 7,11
---
$ python generate-data.py --num-planets 100 --output-file initial.csv
$ python simulate.py --num-steps 50 \
--input-file initial.csv \
--output-file final.csv \
--trajectories-file trajectories.npz \
--num-cores 8
$ python animate.py --initial-file initial.csv \
--trajectories-file trajectories.npz \
--output-file animation.mp4
Finally, we can plot the results:
```console
$ python plot-results.py --training-data train.csv --predictions predictions.csv --output-chart chart.svg

Accuracy: 0.94
Saved chart to chart.svg
```
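
The accuracy printed by the plotting step can be read as the fraction of test samples whose predicted label matches the true label. A minimal sketch, assuming the labels are available as two equally long lists; the function name `accuracy` is hypothetical, and how the actual script computes the value is not shown here:

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    # Count positions where the prediction agrees with the true label.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # prints 0.75
```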


## Discussion and goals

:::{discussion}
- Together we look at the generated files (train.csv, test.csv, predictions.csv, chart.svg).
- We browse and discuss the [example code behind these scripts](https://github.com/workshop-material/classification-task).
:::

:::{admonition} Learning goals
- What are the most important steps to make this code **reusable by others**
and **our future selves**?
@@ -90,6 +91,6 @@ $ python animate.py --initial-file initial.csv \
- ... how the code works internally in detail.
- ... whether this is the most efficient algorithm.
- ... whether the code is numerically stable.
- ... how to code scales with the number of cores.
- ... how the code scales with system size.
- ... whether it is portable to other operating systems (we will discuss this later).
:::
1 change: 1 addition & 0 deletions content/img/chart.svg
2 changes: 1 addition & 1 deletion content/index.md
@@ -30,7 +30,7 @@ them to own projects**.
- 13:00-13:30 - **Welcome and introduction**
- Practical information (tools, communication, breaks, etc.)
- Motivation (reproducibility, robustness, distribution, improvement, trust, etc.)
- {ref}`example-project`
- {doc}`example`

- 13:30-14:45 - {ref}`version-control` (1/2)
- {ref}`version-control-motivation` (15 min)
Binary file removed content/video/animation.mp4
Binary file not shown.
1 change: 0 additions & 1 deletion requirements.txt
@@ -4,4 +4,3 @@ sphinx_rtd_theme_ext_color_contrast
myst_nb
sphinx-lesson
https://github.com/coderefinery/sphinx-coderefinery-branding/archive/master.zip
sphinxcontrib-video
