The goal of territoria is to cluster observations from different
breeding bird surveys into territories.
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("inbo/territoria")
We start by simulating some observations. Every observation needs x and
y coordinates in a projected coordinate system. The survey column is an
integer id for every survey. A survey is a unique combination of an area
and a date. The status column is an integer indicating the breeding
status. A higher value implies more certainty about breeding. Set this
to a constant value if you don’t distinguish between different levels of
certainty. In this example we use three classes: 1, 2 and 3.
library(territoria)
set.seed(20210806)
obs <- simulate_observations()
names(obs)
#> [1] "observations" "centroids"
summary(obs$centroids)
#> x y
#> Min. : 45.51 Min. : 36.32
#> 1st Qu.: 544.72 1st Qu.: 946.10
#> Median : 927.92 Median :1246.24
#> Mean : 955.15 Mean :1224.73
#> 3rd Qu.:1349.57 3rd Qu.:1669.61
#> Max. :1897.31 Max. :1963.82
summary(obs$observations)
#> x y survey status
#> Min. : -19.55 Min. : -14.75 Min. :1.00 Min. :1.000
#> 1st Qu.: 550.47 1st Qu.: 910.56 1st Qu.:1.75 1st Qu.:2.000
#> Median : 899.91 Median :1272.53 Median :2.50 Median :2.000
#> Mean : 946.40 Mean :1238.04 Mean :2.50 Mean :2.096
#> 3rd Qu.:1362.48 3rd Qu.:1764.88 3rd Qu.:3.25 3rd Qu.:3.000
#> Max. :2012.23 Max. :2082.18 Max. :4.00 Max. :3.000
#> observed id
#> Mode :logical Min. : 1.00
#> FALSE:44 1st Qu.: 26.75
#> TRUE :60 Median : 52.50
#> Mean : 52.50
#> 3rd Qu.: 78.25
#> Max. :104.00
obs <- obs$observations[obs$observations$observed, ]
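For real survey data, a table with the same columns can be built by hand. The block below is a minimal sketch in base R: the name my_obs and all values are made up for illustration, and including an id column is an assumption based on the simulated data above, not a documented requirement.

```r
# Hypothetical field observations; every value here is invented.
my_obs <- data.frame(
  id = 1:5,                      # assumed unique id per observation
  x = c(152340.2, 152410.8, 152395.1, 153020.4, 153055.9),
  y = c(201120.5, 201180.3, 201150.0, 200890.7, 200910.2),
  survey = c(1L, 1L, 2L, 2L, 3L), # integer id per area/date combination
  status = c(1L, 3L, 2L, 2L, 3L)  # higher value = more certain breeding
)

# Basic sanity checks before handing the data to the package.
stopifnot(
  is.integer(my_obs$survey),
  is.integer(my_obs$status),
  !anyDuplicated(my_obs$id),
  !any(is.na(my_obs))
)
```

If you do not distinguish between certainty levels, setting status to a constant such as 1L for every row follows the advice above.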
Once we have a data.frame with the observations, we connect to a SQLite database and import the observations. This assigns every observation to its own cluster.
conn <- connect_db()
import_observations(observations = obs, conn = conn, max_dist = 336)
result <- get_cluster(conn = conn)
nrow(result$observations) == nrow(result$cluster)
#> [1] TRUE
Next, we need to calculate the distance matrix. This is not the full distance matrix. We omit all irrelevant distances, e.g. between observations from the same survey or with a distance larger than twice the maximum cluster distance.
distance_matrix(conn = conn, max_dist = 336)
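To make the pruning rule concrete, the following base R sketch applies it to a toy data set. It only illustrates the two conditions described above (different surveys, distance at most twice the maximum cluster distance). The data, the object names and the way the pairs are built are assumptions, not the package's internal implementation.

```r
# Toy observations on a line; values are hypothetical.
toy <- data.frame(
  id = 1:4,
  x = c(0, 100, 500, 1000),
  y = c(0, 0, 0, 0),
  survey = c(1L, 2L, 2L, 1L)
)
max_dist <- 336

# Enumerate all unordered pairs of observations.
pairs <- t(combn(toy$id, 2))
pairs <- data.frame(id_1 = pairs[, 1], id_2 = pairs[, 2])
a <- toy[match(pairs$id_1, toy$id), ]
b <- toy[match(pairs$id_2, toy$id), ]

# Euclidean distance, valid because the coordinates are projected.
pairs$distance <- sqrt((a$x - b$x)^2 + (a$y - b$y)^2)

# Drop pairs from the same survey and pairs further apart than
# twice the maximum cluster distance.
keep <- a$survey != b$survey & pairs$distance <= 2 * max_dist
relevant <- pairs[keep, ]
relevant
```

Of the six possible pairs in this toy example, two are dropped because both observations come from the same survey and one because its distance of 900 exceeds 2 * 336 = 672.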
Now we can start the clustering. The clustering takes into account all
observations with a status greater than or equal to the set status.
cluster_observation(conn = conn, status = 3, max_dist = 336)
#> ......
result3 <- get_cluster(conn = conn)
nrow(result3$observations) > nrow(result3$cluster)
#> [1] TRUE
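The status threshold can be pictured as a simple filter on the observations. The sketch below uses base R and a hypothetical helper, eligible(), which is not part of territoria; it only shows which rows satisfy "status greater than or equal to the set status".

```r
# Hypothetical observations with three certainty classes.
toy <- data.frame(
  id = 1:6,
  status = c(1L, 2L, 3L, 3L, 2L, 1L)
)

# eligible() is an illustrative helper, not a package function.
eligible <- function(data, status) {
  data[data$status >= status, , drop = FALSE]
}

strict <- eligible(toy, status = 3) # only the most certain observations
loose <- eligible(toy, status = 1)  # every observation
```

With status = 3 only the two most certain observations take part; with status = 1 all six do.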
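Repeat the clustering for every status level. Skipping a level combines its observations with those of the next lower level. Here we skip status 2, so the run with status = 1 handles the observations with status 1 and 2 together.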
cluster_observation(conn = conn, status = 1, max_dist = 336)
#> ......................
result1 <- get_cluster(conn = conn)
nrow(result1$observations) > nrow(result1$cluster)
#> [1] TRUE
nrow(result3$cluster) > nrow(result1$cluster)
#> [1] TRUE
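The effect of skipping a level can be checked with a small base R sketch. The helper new_levels() is hypothetical and only does bookkeeping: for a sequence of thresholds, it lists which status classes enter the clustering for the first time in each pass.

```r
# new_levels() is an illustrative helper, not a package function.
# passes: status thresholds in the order they are clustered (high to low).
new_levels <- function(passes, levels = 1:3) {
  previous <- c(Inf, head(passes, -1))
  lapply(seq_along(passes), function(i) {
    # classes at or above this threshold that no earlier pass covered
    levels[levels >= passes[i] & levels < previous[i]]
  })
}

skipping <- new_levels(c(3, 1))    # as in this example: level 2 is skipped
stepwise <- new_levels(c(3, 2, 1)) # one pass per level
```

In the skipping case the second pass brings in status 1 and 2 at once, whereas the stepwise case adds one class per pass.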
summary(result1$observations)
#> id x y survey
#> Min. : 2.00 Min. : -19.55 Min. : 40.03 Min. :1.000
#> 1st Qu.: 22.75 1st Qu.: 550.47 1st Qu.: 976.80 1st Qu.:1.000
#> Median : 53.50 Median : 921.68 Median :1282.60 Median :3.000
#> Mean : 50.32 Mean : 931.40 Mean :1283.44 Mean :2.417
#> 3rd Qu.: 72.25 3rd Qu.:1309.80 3rd Qu.:1798.25 3rd Qu.:3.000
#> Max. :103.00 Max. :1884.38 Max. :2082.18 Max. :4.000
#> status cluster
#> Min. :1.000 Min. : 2.00
#> 1st Qu.:2.000 1st Qu.:10.00
#> Median :2.000 Median :17.00
#> Mean :2.033 Mean :21.33
#> 3rd Qu.:2.250 3rd Qu.:23.00
#> Max. :3.000 Max. :77.00
summary(result1$cluster)
#> cluster n_obs max_status centroid_x
#> Min. : 2.00 Min. :1.000 Min. :1.000 Min. : 19.02
#> 1st Qu.:10.50 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 490.94
#> Median :19.00 Median :3.000 Median :2.000 Median : 920.00
#> Mean :24.61 Mean :2.609 Mean :2.391 Mean : 915.14
#> 3rd Qu.:26.50 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1316.24
#> Max. :77.00 Max. :4.000 Max. :3.000 Max. :1800.93
#> centroid_y
#> Min. : 74.42
#> 1st Qu.: 992.81
#> Median :1280.92
#> Mean :1292.35
#> 3rd Qu.:1829.07
#> Max. :2036.60
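The cluster table has one row per cluster, with the columns n_obs, max_status, centroid_x and centroid_y. The sketch below shows how a similar summary could be derived from an observations table in base R. It assumes the centroid is the plain mean of the coordinates, which the package may or may not do, and the data are invented.

```r
# Hypothetical clustered observations.
toy <- data.frame(
  x = c(0, 10, 100, 110, 120),
  y = c(0, 20, 50, 50, 80),
  status = c(1L, 3L, 2L, 2L, 1L),
  cluster = c(1L, 1L, 2L, 2L, 2L)
)

# One summary row per cluster.
# Assumption: centroid = mean of the member coordinates.
cluster_summary <- do.call(
  rbind,
  lapply(split(toy, toy$cluster), function(z) {
    data.frame(
      cluster = z$cluster[1],
      n_obs = nrow(z),
      max_status = max(z$status),
      centroid_x = mean(z$x),
      centroid_y = mean(z$y)
    )
  })
)
cluster_summary
```

A summary like this makes it easy to filter territories afterwards, for example keeping only clusters whose max_status reaches a given certainty.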