adapt episode about the example project to classifying task

coderefinery · Jan 25, 2025 · 1495c78
1 parent 672f387
commit 1495c78

Showing 6 changed files with 61 additions and 61 deletions.
diff --git a/content/conf.py b/content/conf.py
@@ -40,7 +40,6 @@
     # remove once sphinx_rtd_theme updated for contrast and accessibility:
     "sphinx_rtd_theme_ext_color_contrast",
     "sphinx_coderefinery_branding",
-    "sphinxcontrib.video",
 ]
 
 # MyST extensions

diff --git a/content/example.md b/content/example.md
@@ -1,25 +1,29 @@
-(example-project)=
+# Example project: 2D classification task using a nearest-neighbor predictor
 
-# Example project: Simulating the motion of planets
+The [example code](https://github.com/workshop-material/classification-task)
+that we will study is a relatively simple nearest-neighbor predictor written in
+Python. It is not important or expected that we understand the code in detail.
 
-The [example code](https://github.com/workshop-material/planets) that we will study
-is a hopefully simple N-body simulation written in Python. It is not important
-or expected that we understand the code in any detail.
+The code will produce something like this:
 
-:::{video} video/animation.mp4
-:width: 600
+:::{figure} img/chart.svg
+:alt: Results of the classification task
+:width: 100%
+
+The bottom row shows the training data (two labels) and the top row shows the
+test data and whether the nearest-neighbor predictor classified their labels
+correctly.
 :::
 
-The **big picture** is that the code simulates the motion of a number of
-planets:
-- We can choose the number of planets.
-- Each planet starts with a random position, velocity, and mass.
-- At each time step, the code calculates the gravitational force between each
-  pair of planets.
-- The forces accelerate each planet, the acceleration modifies the velocity,
-  the velocity modifies the position of each planet.
-- We can choose the number of time steps.
-- The units were chosen to make numbers easy to read.
+The **big picture** of the code is as follows:
+- We can choose the number of samples (the example above has 50 samples).
+- The code will generate samples with two labels (0 and 1) in a 2D space.
+- One of the labels has a normal distribution and a circular distribution with
+  some minimum and maximum radius.
+- The second label only has a circular distribution with a different radius.
+- Then we try to predict whether the test samples belong to label 0 or 1 based
+  on the nearest neighbors in the training data. The number of neighbors can
+  be adjusted and the code will take the label of the majority of the neighbors.
 
 
 ## Example run
@@ -29,57 +33,54 @@ The instructor demonstrates running the code on their computer.
 :::
 
 The code is written to accept **command-line arguments** to specify the number
-of planets and the number of time steps.
+of samples and file names. Later we will discuss advantages of this approach.
 
-We first generate starting data:
+Let us try to get the help text:
 ```console
-$ python generate-data.py --num-planets 10 --output-file initial.csv
-```
+$ python generate-data.py --help
+
+Usage: generate-data.py [OPTIONS]
+
+  Program that generates a set of training and test samples for a non-linear
+  classification task.
 
-The generated file (initial.csv) could look like this:
+Options:
+  --num-samples INTEGER  Number of samples for each class.  [required]
+  --training-data TEXT   Training data is written to this file.  [required]
+  --test-data TEXT       Test data is written to this file.  [required]
+  --help                 Show this message and exit.
 ```
-px,py,pz,vx,vy,vz,mass
--46.88,-42.51,88.33,-0.86,-0.18,0.55,6.70
--5.29,17.09,-96.13,0.66,0.45,-0.17,3.51
-83.53,-92.83,-68.77,-0.26,-0.48,0.24,6.84
--36.31,25.48,64.16,0.85,0.75,-0.56,1.53
--68.38,-17.21,-97.07,0.60,0.26,0.69,6.63
--48.37,-48.74,3.92,-0.92,-0.33,-0.93,8.60
-40.53,-75.50,44.18,-0.62,-0.31,-0.53,8.04
--27.21,10.78,-78.82,-0.09,-0.55,-0.03,5.35
-88.42,-74.95,-45.85,0.81,0.68,0.56,5.36
-39.09,53.12,-59.54,-0.54,0.56,0.07,8.98
+
+We first generate the training and test data:
+```console
+$ python generate-data.py --num-samples 50 --training-data train.csv --test-data test.csv
+
+Generated 50 training samples (train.csv) and test samples (test.csv).
 ```
 
-Then we can simulate their motion (in this case for 20 steps):
+In a second step we generate predictions for the test data:
 ```console
-$ python simulate.py --num-steps 20 \
-                     --input-file initial.csv \
-                     --output-file final.csv
+$ python generate-predictions.py --num-neighbors 7 --training-data train.csv --test-data test.csv --predictions predictions.csv
+
+Predictions saved to predictions.csv
 ```
 
-The `--output-file` (final.csv) is again a CSV file (comma-separated values)
-and contains the final positions of all planets.
-
-It is possible to run on **multiple cores** and to **animate** the result.
-Here is an example with 100 planets:
-```{code-block} console
----
-emphasize-lines: 7,11
----
-$ python generate-data.py --num-planets 100 --output-file initial.csv
-
-$ python simulate.py --num-steps 50 \
-                     --input-file initial.csv \
-                     --output-file final.csv \
-                     --trajectories-file trajectories.npz \
-                     --num-cores 8
-
-$ python animate.py --initial-file initial.csv \
-                    --trajectories-file trajectories.npz \
-                    --output-file animation.mp4
+Finally, we can plot the results:
+```console
+$ python plot-results.py --training-data train.csv --predictions predictions.csv --output-chart chart.svg
+
+Accuracy: 0.94
+Saved chart to chart.svg
 ```
 
+
+## Discussion and goals
+
+:::{discussion}
+- Together we look at the generated files (train.csv, test.csv, predictions.csv, chart.svg).
+- We browse and discuss the [example code behind these scripts](https://github.com/workshop-material/classification-task).
+:::
+
 :::{admonition} Learning goals
 - What are the most important steps to make this code **reusable by others**
   and **our future selves**?
@@ -90,6 +91,6 @@ $ python animate.py --initial-file initial.csv \
 - ... how the code works internally in detail.
 - ... whether this is the most efficient algorithm.
 - ... whether the code is numerically stable.
-- ... how to code scales with the number of cores.
+- ... how the code scales with system size.
 - ... whether it is portable to other operating systems (we will discuss this later).
 :::
diff --git a/content/img/chart.svg b/content/img/chart.svg
diff --git a/content/index.md b/content/index.md
@@ -30,7 +30,7 @@ them to own projects**.
 - 13:00-13:30 - **Welcome and introduction**
   - Practical information (tools, communication, breaks, etc.)
   - Motivation (reproducibility, robustness, distribution, improvement, trust, etc.)
-  - {ref}`example-project`
+  - {doc}`example`
 
 - 13:30-14:45 - {ref}`version-control` (1/2)
   - {ref}`version-control-motivation` (15 min)

diff --git a/content/video/animation.mp4 b/content/video/animation.mp4
diff --git a/requirements.txt b/requirements.txt
@@ -4,4 +4,3 @@ sphinx_rtd_theme_ext_color_contrast
 myst_nb
 sphinx-lesson
 https://github.com/coderefinery/sphinx-coderefinery-branding/archive/master.zip
-sphinxcontrib-video
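
Note on the new `content/example.md` text: the "big picture" list describes predicting a test sample's label from the majority label of its nearest neighbors in the training data. A minimal sketch of that idea follows. It is not taken from the linked classification-task repository; the function name `predict_label` and the `(x, y, label)` tuple layout are assumptions made for illustration only.

```python
from collections import Counter
import math


def predict_label(training_data, point, num_neighbors):
    """Return the majority label among the nearest training samples.

    training_data: list of (x, y, label) tuples (hypothetical layout,
    not necessarily the format used by the real scripts).
    point: (x, y) tuple of the sample to classify.
    """
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(
        training_data,
        key=lambda sample: math.dist((sample[0], sample[1]), point),
    )
    # Keep only the labels of the closest num_neighbors samples.
    nearest_labels = [label for _, _, label in by_distance[:num_neighbors]]
    # most_common(1) returns the label with the most votes.
    return Counter(nearest_labels).most_common(1)[0][0]


# Tiny made-up data set: label 0 near the origin, label 1 near (3, 3).
training = [
    (0.0, 0.0, 0), (0.1, 0.2, 0), (-0.2, 0.1, 0),
    (3.0, 3.0, 1), (3.1, 2.9, 1), (2.8, 3.2, 1),
]
print(predict_label(training, (0.05, 0.05), num_neighbors=3))
print(predict_label(training, (2.9, 3.0), num_neighbors=3))
```

An odd `num_neighbors` (such as the 7 used in the example run) avoids tied votes when there are only two labels.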