From e3a4640055f32aab49fe1d8ce08abc34ded3d3a3 Mon Sep 17 00:00:00 2001
From: acerioni

1. Tileset generation

In inference-only scenarios, a single COCO tileset labeled as "other" is generated (oth is the abbreviation we use).
In training + inference scenarios, the full collection of tilesets is generated: trn, val, tst, oth.
The 1st step provides a collection of tiled images, sharing the same size and resolution, plus the corresponding COCO files (trn + val + tst and/or oth, depending on the scenario).
The 2nd step performs the actual training of a predictive model, iterating over the training dataset. As already mentioned, we delegate this crucial part of the process to the Detectron2 library; support for other libraries may be implemented in the future, if suitable. Detectron2 comes with a large collection of pre-trained models tailored for various tasks. In particular, as far as instance segmentation is concerned, pre-trained models can be selected from this list.
-In our workflow, we setup Detectron2 in such a way that inference is made on the validation dataset every N training iterations, being N a user-defined parameter. By doing this, we can monitor both the training and validation losses all along the iterative learning and decide when to stop. Typically, learning is stopped when the validation loss reaches a minimum (see e.g. this article for further information on early stopping). As training and validation loss curves are somewhat noisy, these curves can be smoothed on the fly in order to reveal steady trends. Other metrics may be tracked and used to decide when to stop. For now, within our framework (early) stopping can be done manually and is left to the user; it will be made automatic in the future, following some suitable criterion.
+In our workflow, we set up Detectron2 in such a way that inference is performed on the validation dataset every N training iterations, N being a user-defined parameter. By doing this, we can monitor both the training and validation losses throughout the iterative learning and decide when to stop. Typically, learning is stopped when the validation loss reaches a minimum (see e.g. this article for further information on early stopping). As training and validation loss curves are somewhat noisy, these curves can be smoothed on the fly in order to reveal steady trends. Other metrics may be tracked and used to decide when to stop. For now, within our framework (early) stopping is done manually and is left to the user; it will be made automatic in the future, following some suitable criterion.
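As a rough illustration of this setup, the Detectron2 configuration could look as follows; the dataset names and the value of N are placeholders, not the project's actual settings:

```python
# Hedged sketch: make Detectron2 run inference on the validation set
# every N training iterations (dataset names and N are placeholders).
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("trn",)  # COCO tilesets registered beforehand
cfg.DATASETS.TEST = ("val",)   # dataset used for the periodic evaluation
cfg.TEST.EVAL_PERIOD = 200     # N: evaluate every 200 iterations
```

The smoothed loss curves mentioned above can then be inspected, e.g. in TensorBoard, to pick the checkpoint minimising the validation loss.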
diff --git a/search/search_index.json b/search/search_index.json
index fc9d32c..8faea86 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Swiss Territorial Data Lab - STDL","text":"
The STDL aims to promote collective innovation around the Swiss territory and its digital copy. It mainly explores the possibilities provided by data science to improve official land registration.
A multidisciplinary team composed of cantonal, federal and academic partners is reinforced by engineers specialized in geographical data science to tackle the challenges around the management of territorial datasets.
The STDL platform code and documentation are published under open licenses, allowing partners and Swiss territory management actors to leverage the developed technologies.
"},{"location":"#exploratory-projects","title":"Exploratory Projects","text":"Exploratory projects in the field of the Swiss territorial data are conducted at the demand of institutions or actors of the Swiss territory. The exploratory projects are conducted with the supervision of the principal in order to closely analyze the answers to the specifications along the project. The goal of exploratory project aims to provide proof-of-concept and expertise in the application of technologies to Swiss territorial data.
Detection of occupied and free surfaces on rooftops May 2024Cl\u00e9mence Herny (Exolabs) - Gwena\u00eblle Salamin (Exolabs) - Alessandro Cerioni (\u00c9tat de Gen\u00e8ve) - Roxane Pott (swisstopo) Proposed by the Canton of Geneva - PROJ-ROOFTOPS Free roof surfaces offer great potential for the installation of new infrastructure, such as solar panels and vegetated rooftops. In this project, in collaboration with the Canton of Geneva, we developed and tested three methods to automatically identify occupied and free surfaces on roofs: (1) classification of roof plane occupancy based on a random forest, (2) segmentation of objects in LiDAR point clouds based on clustering and (3) segmentation of objects in aerial imagery based on deep learning. The results are vector layers containing information about surface occupancy. The methods, developed on a subset of 122 buildings, achieved satisfactory performance. About 85% of the roof planes were correctly classified. The segmentation methods were able to detect most of the objects, with f1-scores of 0.78 and 0.75 for the LiDAR-based segmentation and the image-based segmentation respectively. The global shape of the occupied surface was more difficult to reproduce, with a median intersection over union of 0.35 and 0.37 respectively. The results of all three methods were considered satisfactory by the experts, with 70% to 95% of the results considered acceptable. Considering the quality of the results and the computational time, only the classification method was selected for application at the cantonal level.
Full article
Automatic Soil Segmentation April 2024Nicolas Beglinger (swisstopo) - Clotilde Marmy (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Canton of Fribourg - PROJ-SOILS This project focuses on developing an automated methodology to distinguish areas covered by pedological soil from areas comprised of non-soil. The goal is to generate high-resolution maps (10cm) to aid in the location and assessment of polluted soils. Towards this end, we utilize deep learning models to classify land cover types using raw, raster-based aerial imagery and digital elevation models (DEMs). Specifically, we assess models developed by the Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN), the Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD), and the Office F\u00e9d\u00e9ral de la Statistique (OFS). The performance of the models is evaluated with the Matthews correlation coefficient (MCC) and the Intersection over Union (IoU), as well as with qualitative assessments conducted by the beneficiaries of the project. In addition to testing pre-existing models, we fine-tuned the model developed by the HEIG-VD on a dataset specifically created for this project. The fine-tuning aimed to optimize the model performance on the specific use-case and to adapt it to the characteristics of the dataset: higher resolution imagery, different vegetation appearances due to seasonal differences, and a unique classification scheme. Fine-tuning with a mixed-resolution dataset improved the model's performance on lower-resolution imagery, which is proposed as a solution to the square artefacts that are common in inferences of attention-based models. Reaching an MCC score of 0.983, the findings demonstrate promising performance. The derived model produces satisfactory results, which have to be evaluated in a broader context before being published by the beneficiaries. Lastly, this report sheds light on potential improvements and highlights considerations for future work.
Full article
Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control April 2024Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR The acquisition of LiDAR data has become standard practice at national and cantonal levels in recent years in Switzerland. In 2024, swisstopo will complete a comprehensive six-year campaign covering the whole Swiss territory. The produced point clouds are classified post-acquisition, i.e. each point is attributed to a certain category, such as \"building\" or \"vegetation\". Despite the global control performed by providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, extensive time is invested by swisstopo in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted into voxels, summarizing the distribution of classes and comparable one-to-one. This method highlights zones of change by clustering the concerned voxels. Experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method.
Full article
Automatic detection and observation of mineral extraction sites in Switzerland January 2024Cl\u00e9mence Herny (ExoLabs) - Shanci Li (Uzufly) - Alessandro Cerioni (Etat de Gen\u00e8ve) - Roxane Pott (Swisstopo) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY The study of the evolution of mineral extraction sites (MES) is essential for the management of mineral resources and the assessment of their environmental impact. In this context, swisstopo has solicited the STDL to automate the vectorisation of MES over the years. This tedious task was previously carried out manually and was not regularly updated. Automatic object detection using a deep learning method was applied to SWISSIMAGE RGB orthophotos with a spatial resolution of 1.6 m px-1. The trained model proved its ability to accurately detect MES, achieving an f1-score of 82%. Detection by inference was performed on images from 1999 to 2021, enabling us to track the evolution of potential MES over several years. Although the results are satisfactory, a careful examination of the detections must be carried out by experts to validate them as true MES. Despite the manual work still involved, the process is faster than a full manual vectorisation and can be used in the future to keep MES information up-to-date.
Full article
Dieback of beech trees: methodology for determining the health state of beech trees from airborne images and LiDAR point clouds August 2023Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Republic and Canton of Jura - PROJ-HETRES Beech trees are sensitive to drought, and repeated episodes can cause dieback. This issue affects the Jura forests, requiring the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that both the model trained with all the descriptors and the model trained only with descriptors derived from airborne imagery reach an overall accuracy of up to 0.79. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors.
Full article
Using spatio-temporal neighbor data information to detect changes in land use and land cover April 2023Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo) Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS From 2020 on, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To help with and lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point. The methodology focused on change detection, aiming to find as many unchanged tiles as possible while missing as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to change detection. While deactivating time and using fewer neighbors each cause a 0.2% decrease in balanced accuracy, deactivating space causes a 1% decrease. Furthermore, the performance of XGBoost, random forest (RF), fully convolutional network (FCN) and recurrent convolutional neural network (RCNN) models was compared by means of a custom metric, established with the help of the interpretation team. For the spatio-temporal module, FCN outperforms all the models with a value of 0.259 for the custom metric, whereas logistic regression yields a custom metric of 0.249. Then, FCN and RF were tested to combine the best performing model with the model trained by OFS on image data only. When using spatio-temporal neighbors and image data as inputs, the final integration module achieves 0.438 on the custom metric, against 0.374 when only the image data is used. It was concluded that spatio-temporal neighbors could lighten the process of tile interpretation.
Full article
Classification of road surfaces March 2023Gwena\u00eblle Salamin (swisstopo) - Cl\u00e9mence Herny (Exolabs) - Roxane Pott (swisstopo) - Alessandro Cerioni (Canton of Geneva) Proposed by the Federal Office of Topography swisstopo - PROJ-ROADSURF The Swiss road network extends over 83\u2019274 km. Information about the type of road surface is useful not only for the Swiss Federal Roads Office and engineering companies, but also for cyclists and hikers. Currently, data creation and updating are done entirely manually at the Swiss Federal Office of Topography. This is a time-consuming and methodical task, potentially suitable for automation by data science methods. The goal of this project is to classify Swiss roads according to their surface type, natural or artificial. We first searched for statistical differences between these two classes, in order to then perform supervised classification based on machine-learning methods. As we could not find any discriminant feature, we used deep learning methods.
Full article
Tree Detection from Point Clouds for the Canton of Geneva March 2022Alessandro Cerioni (Canton of Geneva) - Flann Chambers (University of Geneva) - Gilles Gay des Combes (CJBG - City of Geneva and University of Geneva) - Adrian Meyer (FHNW) - Roxane Pott (swisstopo) Proposed by the Canton of Geneva - PROJ-TREEDET Trees are essential assets, in urban contexts among others. For several years, the Canton of Geneva has maintained a digital inventory of isolated (or \"urban\") trees. This project aimed at designing a methodology to automatically update Geneva's tree inventory, using high-density LiDAR data and off-the-shelf software. Eventually, only the sub-task of detecting and geolocating trees was explored. Comparisons against ground truth data show that the task can be more or less tricky depending on how sparse or dense the trees are. In mixed contexts, we managed to reach an accuracy of around 60%, which unfortunately is not high enough to foresee a fully unsupervised process. Still, as discussed in the concluding section, there may be room for improvement.
Full article
Detection of thermal panels on canton territory to follow renewable energy deployment February 2022Nils Hamel (UNIGE) - Huriel Reichel (FHNW) Project in collaboration with Geneva and Neuch\u00e2tel States - TASK-TPNL The deployment of renewable energy has become a major stake in facing the challenges of our societies. It requires authorities and domain experts to promote and to document the deployment of such energy solutions. In the case of thermal panels, politicians ask the domain expert to certify, year after year, the amount of surface deployed. Facing this challenge, this project aims to determine to what extent data science can ease the survey of thermal panel installations and how the work of the domain expert can be eased.
Full article
Automatic detection of quarries and the lithology below them in Switzerland January 2022Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY Mining is an important economic activity in Switzerland and is therefore monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although carried out with very high quality, this process unfortunately cannot keep up with the constantly changing and updating pattern of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries across the whole country. The training was done using SWISSIMAGE with 10 cm spatial resolution and the Deep Learning Framework of the STDL. Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. Interaction with the domain expert was very relevant for the final results and, in addition to his appreciation, an f1-score of 85% was obtained in the end, which, given the peculiar characteristics of quarries, can be considered an optimal result.
Full article
Updating the \u00abCultivable Area\u00bb Layer of the Agricultural Office, Canton of Thurgau June 2021Adrian Meyer (FHNW) - Pascal Salath\u00e9 (FHNW) Proposed by the Canton of Thurgau - PROJ-TGLN The Cultivable agricultural area layer (\"LN, Landwirtschaftliche Nutzfl\u00e4che\") is a GIS vector product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of direct subsidy contributions to farms. The canton of Thurgau requested a spatial vector layer indicating the locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area, since areas used for silage bale storage are not eligible for subsidies. Having detections of such objects readily available greatly reduces the workload of the responsible official by directing the monitoring process to the relevant hotspots. Ultimately, public economic damage resulting from the payout of unjustified subsidy contributions can be prevented.
Full article
Swimming Pool Detection for the Canton of Thurgau April 2021Adrian Meyer (FHNW) - Alessandro Cerioni (Canton of Geneva) Proposed by the Canton of Thurgau - PROJ-TGPOOL The Canton of Thurgau entrusted the STDL with the task of producing swimming pool detections over the cantonal area. Of specific interest was leveraging the ground truth annotation data from the Canton of Geneva to generate a predictive model in Thurgau, while using the publicly available SWISSIMAGE aerial imagery datasets provided by swisstopo. The STDL object detection framework produced highly accurate predictions of swimming pools in Thurgau and thereby proved transferability from one canton to another without having to manually redigitize annotations. These promising detections showcase the potential of this approach by greatly reducing the need for repetitive manual labour.
Full article
Completion of the federal register of buildings and dwellings February 2021Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Proposed by the Federal Statistical Office - TASK-REGBL The Swiss Federal Statistical Office is in charge of the national Register of Buildings and Dwellings (RBD), which keeps track of every existing building in Switzerland. Currently, the register is being completed with buildings in addition to regular dwellings, to offer a reliable and official source of information. The completion of the register raised issues due to missing information that is difficult to collect. The construction year of buildings is one piece of information missing from a large number of register entries. The Statistical Office mandated the STDL to investigate the possibility of using the Swiss National Maps to extract this missing information through an automated process. Research was conducted in this direction, with the development of a proof of concept and a reliable methodology to assess the obtained results.
Full article
Swimming Pool Detection from Aerial Images over the Canton of Geneva January 2021Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW) Proposed by the Canton of Geneva - PROJ-GEPOOL Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to detect potentially unregistered swimming pools over the Canton of Geneva. Despite the success of our approach, we will argue that domain expertise still remains key to post-process detections in order to tell objects which are subject to registration from those which aren't. Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date.
Full article
Difference models applied to the land register November 2020Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Project scheduled in the STDL research roadmap - TASK-DTRK Being able to track modifications in the evolution of geographical datasets is an important aspect of territory management, as a large amount of information can be extracted from difference models. Difference detection can also be used as a tool to assess the evolution of a geographical model through time. In this research project, we apply difference detection to INTERLIS models of the official Swiss land registers in order to emphasize and follow their evolution, and to demonstrate that changes in reference frames can be detected and assessed.
Full article
"},{"location":"#research-developments","title":"Research Developments","text":"Research developments are conducted aside of the research projects to provide a framework of tools and expertise around the Swiss territorial data and related technologies. The research developments are conducted according to the research plan established by the data scientists and validated by the steering committee.
OBJECT DETECTION FRAMEWORK November 2021Alessandro Cerioni (Canton of Geneva) - Cl\u00e9mence Herny (Exolabs) - Adrian Meyer (FHNW) - Gwena\u00eblle Salamin (Exolabs) Project scheduled in the STDL research roadmap - TASK-IDET This strategic component of the STDL consists of the automated analysis of geospatial images using deep learning while providing practical applications for specific use cases. The overall goal is the extraction of vectorized semantic information from remote sensing data. The involved case studies revolve around concrete object detection use cases deploying modern machine learning methods and utilizing a multitude of available datasets. The goal is to arrive at a prototypical platform for object detection which is highly useful not only for cadastre specialists and authorities but also for stakeholders at various contact points in society.
Full article
AUTOMATIC DETECTION OF CHANGES IN THE ENVIRONMENT November 2020Nils Hamel (UNIGE) Project scheduled in the STDL research roadmap - TASK-DIFF Developed at EPFL with the collaboration of Cadastre Suisse to handle large-scale geographical models of different natures, the STDL 4D platform offers a robust and efficient indexation methodology allowing the management of storage and access to large-scale models. In addition to spatial indexation, the platform also includes time as part of the indexation, allowing any area to be described by models in both spatial and temporal dimensions. In this development project, the notion of model temporal derivative is explored and proofs of concept are implemented in the platform. The goal is to demonstrate that, in addition to their formal content, models coming with different temporal versions can be derived along the time dimension to compute difference models. Such proofs of concept are developed for both point cloud and vectorial models, demonstrating that the indexation formalism of the platform considerably eases the computation of difference models. This research project demonstrates that the time dimension can be fully exploited in order to access the data it holds.
Full article
"},{"location":"#steering-committee","title":"Steering Committee","text":"The steering committee of the Swiss Territorial Data Lab is composed of Swiss public administrations bringing their expertise and competences to guide the conducted projects and developments.
Members of the STDL steering committee"},{"location":"#submitting-a-project","title":"Submitting a project","text":"To submit a project to the STDL, simply fill this form. To contact the STDL, please write an email to info@stdl.ch. We will reply as soon as possible!
"},{"location":"PROJ-DQRY/","title":"Automatic Detection of Quarries and the Lithology below them in Switzerland","text":"Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Supervision : Nils Hamel (UNIGE) - Raphael Rollier (swisstopo)
Proposed by swisstopo - PROJ-DQRY June 2021 to January 2022 - Published on January 30th, 2022
Abstract: Mining is an important economic activity in Switzerland and is therefore monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although carried out with very high quality, this process unfortunately cannot keep up with the constantly changing and updating pattern of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries across the whole country. The training was done using SWISSIMAGE with 10 cm spatial resolution and the Deep Learning Framework of the STDL. Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. Interaction with the domain expert was very relevant for the final results and, in addition to his appreciation, an F1 Score of 85% was obtained in the end, which, given the peculiar characteristics of quarries, can be considered an optimal result.
"},{"location":"PROJ-DQRY/#1-introduction","title":"1 - Introduction","text":"Mining is an important economic activity worldwide and this is also the case in Switzerland. The Confederation topographic office (swisstopo) is responsible for monitoring the presence of quarries and also the materials being explored. This is extremely relevant for planning the demand and shortage of explored materials and also their transportation through the country. As this of federal importance the mapping of these features is already done. Although this work is very detailed and accurate, quarries have a very characteristical updating pattern. Quarries can appear and disappear in a matter of a few months, in especial when they are relatively small, as in Switzerland. Therefore it is of interest of swisstopo to make an automatic detection of quarries in a way that it is also reproducible in time.
A strategy often offered by the Swiss Territorial Data Lab is the automatic detection of objects in aerial imagery through deep learning, following our Object Detection Framework. In this case it is fully applicable, as quarries in Switzerland are relatively small, so high resolution imagery is required, which is something our neural network has proven to handle well in past projects. This high resolution imagery is available through SWISSIMAGE, aerial images from swisstopo that cover almost the whole country with a 10 cm pixel size (GSD).
Nevertheless, in order to train our neural network, and as is usually the case in deep learning, a large number of labelled images is required. These data serve as ground truth, so that the neural network \"learns\" what is the object to be detected and what is not. For this purpose, the work of the topographic landscape model (TLM) team of swisstopo has been of extreme importance. Among other surface features, quarries have been mapped all over Switzerland at a highly detailed scale.
Despite the high quality and precision of the labels from the TLM, quarries are constantly changing, appearing and disappearing, and therefore the labels are not always synchronized with the images from SWISSIMAGE. This lack of synchronization between these two datasets can be seen in Figure 1, where the left shows the year of TLM mapping and the right the year of the SWISSIMAGE flights.
Figure 1 : Comparison of TLM (left) and SWISSIMAGE (right) temporality.For this purpose, a two-stage interaction with the domain expert was necessary. In order to have a ground truth fully synchronized with SWISSIMAGE, we required two stages of training: one making use of the TLM data and a second one using a manual correction of the labels predicted in the first iteration. It is of crucial importance to state that this correction needed to be made by the domain expert, so that he could carefully check each detection in pre-defined tiles. With that in hand, we could go further with a more trustworthy training.
As stated, it is also in the interest of swisstopo to identify the material exploited by each quarry. For that purpose, the use of the GeoCover dataset from swisstopo was recommended. This dataset is a vector layer of the geological cover of the whole of Switzerland, which challenged us to cross the detector predictions with such vector information.
In summary, the challenge for the STDL was to investigate to what extent it is possible to automatically detect quarries in aerial imagery using deep learning, considering their high update rate.
"},{"location":"PROJ-DQRY/#2-methodology","title":"2 - Methodology","text":"First of all the \"area of interest\" must be identified. This is where the detection and training took place. In this case, a polygon of the whole Switzerland was used. After that, the area of interest is divided in several tiles of fixed size. This is then defining the slicing of SWISSIMAGE (given as WMS). For this study, tiles of different sizes were tested, being 500x500m tiles defined for final usage. Following the resolution of the images must be defined, which, again, after several tests, was defined as 512x512 pixels.
For validation purposes, the data is then split into training, validation and testing datasets. The training dataset is used by the network for learning; the validation dataset is kept completely apart from training and used only to check results; and the testing dataset is used for cross-validation. 70% of the data was used for training, 15% for validation and 15% for testing.
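A minimal sketch of such a split is shown below; the tile identifiers and the fixed seed are made up for the example:

```python
# Illustrative 70/15/15 random split of tile identifiers
# (tile_ids and the seed are placeholders, not project data).
import random

random.seed(42)
tile_ids = [f"tile_{i}" for i in range(1000)]
random.shuffle(tile_ids)

n = len(tile_ids)
trn = tile_ids[:int(0.70 * n)]               # 70% for training
val = tile_ids[int(0.70 * n):int(0.85 * n)]  # 15% for validation
tst = tile_ids[int(0.85 * n):]               # 15% for testing
```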
As far as the labels are concerned, the ones from the TLM were manually checked, so that a group of approximately 250 labels fully synchronized with SWISSIMAGE was found and recorded. The first round of training then passes through the same framework as former STDL projects: we make use of a predictive region-based convolutional neural network (R-CNN) with a ResNet-50 backbone, provided by Detectron2. A deeper explanation of the network's functionality can be found here and here.
Even with different parameter settings, it was observed that the predictions included too many false positives, mainly consisting of snow. Most probably the reflectance of snow is similar to that of quarries, and this required special treatment. For this purpose, a filtering of the results was used: first, the features were filtered based on their score values (threshold of 0.9) and then by elevation, using the SRTM digital elevation model. As snow is usually not present below around 1155 m, this was used as the elevation threshold. Finally, an area threshold is also applied (discarding the smallest predictions) and the remaining predictions are merged. A more detailed description of how to operate this first filter can be seen here.
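A hedged sketch of this kind of post-filtering is given below; the input file, column names and area threshold are assumptions, only the score (0.9) and elevation (1155 m) thresholds come from the text:

```python
# Hypothetical post-filtering of detections by score, elevation and area.
# Column names, file paths and the area threshold are assumptions.
import geopandas as gpd

dets = gpd.read_file("predictions.geojson")

dets = dets[dets["score"] >= 0.9]         # keep confident detections only
dets = dets[dets["elevation"] < 1155.0]   # drop likely snow (SRTM-derived)
dets = dets[dets.geometry.area > 5000.0]  # drop the smallest predictions [m2]

# Merge overlapping/touching predictions into single features.
merged = dets.dissolve().explode(index_parts=False).reset_index(drop=True)
merged.to_file("predictions_filtered.geojson", driver="GeoJSON")
```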
Once several tests had been performed, the new predictions were sent back to the domain experts for detailed revision following a strict protocol, which mainly included the removal of false positives and the addition of false negatives. This was performed by 4 different experts from swisstopo in 4 regions, each with the same number of tiles to analyze. It is important to state again the importance of domain expertise in this step, as a very careful and manual evaluation of what is and what is not a quarry must be made.
Once the predictions were corrected, a new training session was performed using different parameters. Once again, the same resolution and tile size were used as in the first iteration (500x500 m tiles with a resolution of 512x512 pixels), although this time a new filtering was developed: very similar to the first one, but applied in a different order, allowing more aesthetically pleasing predictions in the end, something the domain expert also cared about.
This procedure is summarized in figure 2.
Figure 2 : Methodology applied for the detection of quarries and new training sessions.In the end, in order to also include the geological information of the detected quarries, a third layer resulting from the intersection of the predictions and the GeoCover labels is created. This was done in such a way that the final user can click to obtain both the information on the quarry (when not a pure prediction) and the information on the geology/lithology of this part of the quarry. As a result, each resulting intersection polygon contains information from both the quarry and GeoCover.
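A minimal sketch of such an intersection layer, assuming both layers are at hand as files, could rely on geopandas' overlay:

```python
# Hypothetical sketch of the predictions/GeoCover intersection layer;
# file names are placeholders.
import geopandas as gpd

quarries = gpd.read_file("quarry_predictions.shp")
geocover = gpd.read_file("geocover.shp")

# Each output polygon carries the attributes of both the detected quarry
# and the underlying GeoCover (lithology) unit.
intersection = gpd.overlay(
    quarries, geocover.to_crs(quarries.crs), how="intersection")
intersection.to_file("quarry_lithology.gpkg", driver="GPKG")
```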
In order to evaluate the obtained results, the F1 Score was computed and the final predictions were also compared to the labels corrected by the domain experts. This was done visually, by extracting the centroid of each detected quarry and by computing a heat-map, allowing one to observe the spatial pattern of the detections. The heat-map was computed using a 10'000 m radius and a 100 m pixel size.
"},{"location":"PROJ-DQRY/#3-results-discussion","title":"3 - Results & Discussion","text":"In the first iteration, when the neural was trained with some labels of the TLM vector data, an optimal F1 score of approximately 0.78 was obtained. The figure 3 shows the behavior of the precision, recall and F1 score for the final model selected.
Figure 3 : Precision, Recall and F1 score of the first iteration (using TLM data).Given the predictions resulting from the correction by the domain experts, there was an outstanding improvement in the F1 score, which reached approximately 0.85 at its optimum, as seen in figure 4. A total of 1265 detections were found in Switzerland after filtering.
Figure 4 : Precision, Recall and F1 score of the second iteration (using data corrected by the domain expert).Figure 5 shows some examples of detected quarries, from which one can get a notion of the quality of the shape of the detections and how well they match the real-world quarries. Examples of false positives and false negatives, unfortunately still present in the detections, are also shown. This is also an interesting demonstration of how some objects can look very similar to quarries from the point of view of non-experts, and how they may influence the results. These examples of errors are also an interesting indication of the importance of domain expertise in evaluating machine-made results.
Figure 5 : Examples of detected quarries, with true positive, false negative and false positive.To check the validity of the new predictions, their centroids were plotted alongside the centroids of the corrected labels, so one could compare their spatial patterns and evaluate whether they follow the same behavior. Figure 6 shows this plot.
Figure 6 : Disposition of the centroids of assessed predictions and final predictions.One can see that, despite some slight differences, the overall spatial pattern of the predictions is very similar. A very similar result can be seen with the computed heat-map of these points, shown in figure 7.
Figure 7 : Heatmap of assessed predictions and final predictions.There is a small area in the west of the country with fewer detections than desired, and in general there were more predictions than before. The objective of the heat-map is to give a general view of the results rather than an exact comparison, as a point is created for every feature, and the new filter tended to smooth the results and join many features into a single one.
In the end, the results were also intersected with GeoCover, which provides a detailed lithology of the Swiss soil; an example of the results can be seen below, using the QGIS software.
Figure 8 : Intersection of predictions with GeoCover seen in QGIS.Finally and most importantly, the domain expert was highly satisfied with this work, due to the support it can give to swisstopo and the TLM team in mapping future quarries. The domain expert also expressed interest in pursuing the work by investigating the temporal pattern of quarries and estimating the volume of material in each quarry.
"},{"location":"PROJ-DQRY/#4-conclusion","title":"4 - Conclusion","text":"Through this collaboration with swisstopo, we managed to demonstrate that data science is able to provide relevant and efficient tool to ease complex and time-consuming task. With the produced inventory of the quarries on the whole Swiss territory, we were able to provide a quasi-exhaustive view of the situation to the domain expert, leading him to have a better view of the exploitation sites.
This is important and a major step forward compared to the previous situation. Indeed, before this project, the only solution available to the domain expert was to gather all the federal and cantonal data, through a non-standardized and time-consuming process, in the hope of obtaining the beginnings of an inventory, with temporality issues. With the developed prototype, the entire SWISSIMAGE dataset can be processed within hours and turned into a full-scale inventory, guiding the domain expert directly toward his interests.
The resulting geographical layer can then be seen as the output of this demonstrator, able to turn the aerial images into a simple polygonal layer representing the quarries, with few false positives and false negatives, providing the view the domain expert requires to understand the Swiss situation. With such a result, it is possible to combine it with all the other existing data, with GeoCover in the first place. This lithology model of the Swiss soil can be intersected with the produced quarries layer in order to create a secondary geographical layer merging quarry locations and quarry soil types, leading to a powerful analysis tool for the domain expert.
The produced demonstrator shows that it is possible, within hours, to deduce a simple and reliable geographical layer from a simple set of orthomosaics. The STDL was then able to prove the possibility of repeating the process along the time dimension, for future and past images, opening the way to building and rebuilding the history and evolution of the quarries. With such a process, it will be possible to compute statistical quantities over the long term to capture the evolution of the resources, leading to a more reliable strategic understanding of Swiss resources and sovereignty.
"},{"location":"PROJ-DQRY-TM/","title":"Automatic detection and observation of mineral extraction sites in Switzerland","text":"Cl\u00e9mence Herny (Exolabs), Shanci Li (Uzufly), Alessandro Cerioni (\u00c9tat de Gen\u00e8ve), Roxane Pott (swisstopo)
Proposed by swisstopo - PROJ-DQRY-TM October 2022 to February 2023 - Published in January 2024
Abstract: Studying the evolution of mineral extraction sites (MES) is of primary importance for assessing the availability of mineral resources, managing MES and evaluating the impact of mining activity on the environment. In Switzerland, MES are inventoried at local level by the cantons and at federal level by swisstopo. The latter performs manual vectorisation of MES boundaries. Unfortunately, although the data is of high quality, it is not regularly updated. To automate this tedious task and to better observe the evolution of MES, swisstopo has solicited the STDL to carry out an automatic detection of MES in Switzerland over the years. We performed instance segmentation using a deep learning method to automatically detect MES in RGB aerial images with a spatial resolution of 1.6 m px-1. The detection model was trained with 266 labels and orthophotos from the SWISSIMAGE RGB mosaic published in 2020. The selected trained model achieved an f1-score of 82% on the validation dataset. The model was used to perform inference detection of potential MES in SWISSIMAGE RGB orthophotos from 1999 to 2021. The model shows good ability to detect potential MES, with about 82% of labels detected for the 2020 SWISSIMAGE mosaic. The detections obtained with SWISSIMAGE orthophotos acquired over different years can be tracked to observe their temporal evolution. The framework developed can perform detection in an area of interest (about a third of Switzerland at the most) in just a few hours, which is a major advantage over manual mapping. We acknowledge that there are some missed and false detections in the final product, and the results need to be reviewed and validated by domain experts before being analysed and interpreted. The results can be used to perform statistics over time and to update the MES evolution with future image acquisitions.
"},{"location":"PROJ-DQRY-TM/#1-introduction","title":"1. Introduction","text":""},{"location":"PROJ-DQRY-TM/#11-context","title":"1.1 Context","text":"Mineral extraction constitutes a strategic activity worldwide, including in Switzerland. Demand for mineral resources has been growing significantly in recent decades1, mainly due to the rapid increase in the production of batteries and electronic chips, or buildings construction, for example. As a result, the exploitation of some resources, such as rare earth elements, lithium, or sand, is putting pressure on their availability. Being able to observe the development of mineral extraction sites (MES) is of primary importance to adapting mining strategy and anticipating demand and shortage. Mining has also strong environmental and societal impact23. It implies the extraction of rocks and minerals from water ponds, cliffs, and quarries. The surface affected, initially natural areas, can reach up to thousands of square kilometres1. The extraction of some minerals could lead to soil and water pollution and involves polluting truck transport. Economic and political interests of some resources might overwhelm land protection, and conflicts are gradually intensifying2.
MES are dynamic features that can evolve according to singular patterns, especially if they are small, as is the case in Switzerland. A site can expand horizontally and vertically, or be filled in to restore the site4235. Changes can happen quickly, in a couple of months. As a result, keeping the MES inventory up to date can be challenging. There is a significant demand for effective observation of MES development worldwide. The majority of MES mapping is performed manually, by visual inspection of images1. Alternatively, recent improvements in the availability of high spatial and temporal resolution space/airborne imagery and in computational methods have encouraged the development of automated image processing. Supervised classification of spectral images is an effective method but requires a complex workflow642. More recently, a few studies have implemented deep learning algorithms to train models to detect extraction sites in images and have shown high levels of accuracy3.
In Switzerland, MES management is historically regulated at the cantonal level using GIS data, including information about the MES location, extent, and extracted materials, among others. At the federal level, swisstopo and the Federal Office of Statistics (FSO) observe the development of MES. swisstopo has carried out a detailed manual delineation of MES over Switzerland, based on the SWISSIMAGE dataset.
In order to speed up and improve the process of MES mapping in Switzerland, we developed a method for automating MES detection over the years. Ultimately, the goal is to keep the database up to date as new images are acquired. The results can be statistically processed to better assess the evolution of MES over time in Switzerland.
"},{"location":"PROJ-DQRY-TM/#12-approach","title":"1.2. Approach","text":"The STDL has developed a framework named object-detector to automatically detect objects in a georeferenced imagery dataset based on deep learning method. The framework can be adapted to detect MES (also referred as quarry in the project) in Switzerland.
A project to automatically detect MES in Switzerland7 was carried out by the STDL in 2021 (detector-interface framework). Detections of potential MES obtained by automatic detection on the 2020 SWISSIMAGE mosaic have already been delivered to swisstopo (layer 2021_10_STDL_QC1). The method has proven its efficiency in detecting MES: the numerical model trained with the object detector achieved an f1-score of 82% and detected about 1200 potential MES over Switzerland.
In this project, we aim to continue this work and extend it to a second objective, that of observing MES evolution over time. The main challenge is to prove the algorithm's reliability in detecting objects in multi-year image datasets acquired with different sensors.
The project workflow is synthesised in Figure 1. First, a deep learning algorithm is trained using a manually mapped MES dataset that serves as ground truth (GT). After evaluating the performance of the trained models, the selected one was used to perform inference detection for a given year's dataset and area of interest (AoI). The results were filtered to discard irrelevant detections. The operation was repeated over several years. Finally, each potential MES detected was tracked over the years to observe its evolution.
Figure 1: Workflow diagram for automatic MES detection.In this report, we first describe the data used, including the image description and the definition of the AoI. Then we explain the model training, evaluation and object detection procedure. Next, we present the results of potential MES detection and the MES tracking strategy. Finally, we provide conclusions and perspectives.
"},{"location":"PROJ-DQRY-TM/#2-data","title":"2. Data","text":""},{"location":"PROJ-DQRY-TM/#21-images-and-area-of-interest","title":"2.1 Images and area of interest","text":"Automatic detection of potential MES over the years in Switzerland was performed with aerial orthophotos from the swisstopo product SWISSIMAGE Journey. Images are georeferenced RGB TIF tiles with a size of 256 x 256 pixels (1 km2).
| Product | Year | Coordinate system | Spatial resolution |
|---|---|---|---|
| SWISSIMAGE 10 cm | 2017 - current | CH1903+/MN95 (EPSG:2056) | 0.10 m (σ ± 0.15 m) - 0.25 m |
| SWISSIMAGE 25 cm | 2005 - 2016 | MN03 (2005 - 2007) and MN95 (since 2008) | 0.25 m (σ ± 0.25 m) - 0.50 m (σ ± 3.00 - 5.00 m) |
| SWISSIMAGE 50 cm | 1998 - 2004 | MN03 | 0.50 m (σ ± 0.50 m) |

Table 1: SWISSIMAGE products characteristics.
Several SWISSIMAGE products exist, produced with different instrumentation (Table 1). SWISSIMAGE mosaics are built and published yearly. The year of the mosaic corresponds to the last year of dataset publication, and the most recent orthophoto datasets available are used to complete the mosaic. For example, the 2020 SWISSIMAGE mosaic is a combination of 2020, 2019 and 2018 image acquisitions. The 1998 mosaic release corresponds to a year of transition from black and white images (SWISSIMAGE HIST) to RGB images. For this study, only RGB data from 1999 to 2021 were considered.
Figure 2: Acquisition footprint of SWISSIMAGE aerial orthophotos for the years 2016 to 2021. The SWISSIMAGE Journey mosaic in the background is the 2020 release.Acquisition footprints of the yearly acquired orthophotos were used as AoI to perform MES detection through time. Over the years, the footprints may spatially overlap (Fig. 2). Since 2017, the geometry of the acquisition footprints has been quasi-constant, dividing Switzerland into three more or less equal areas and ensuring that the orthophotos are updated every three years. For the years before 2017, the acquisition footprints were not systematic and do not guarantee a periodic update of the orthophotos. The acquisition footprints may also not be spatially contiguous.
Figure 3: Illustration of the combination of SWISSIMAGE images and FSO images for the 2007 SWISSIMAGE mosaic. (a) Overview of the 2007 SWISSIMAGE mosaic. The red polygon corresponds to the provided SWISSIMAGE acquisition footprint for 2007. The orange polygon corresponds to the surface covered by the new SWISSIMAGE for 2007. The remaining area of the red polygon corresponds to the FSO image dataset acquired in 2007. The black box indicates the panel (b) location, and the white box indicates the panel (c) location. (b) Side-by-side comparison of image composition in the 2006 and 2007 SWISSIMAGE mosaics. (c) Examples of detection polygons (white polygons) obtained by inference on the 2007 SWISSIMAGE dataset (red box) and FSO images 2007 (outlined by black box).The SWISSIMAGE Journey mosaics of 2005, 2006, and 2007 present a particularity, as they are composed not only of 25 cm resolution SWISSIMAGE but also of orthophotos acquired for the FSO. These are TIFF RGB orthophotos with a spatial resolution of 50 cm px-1 (coordinate system: CH1903/LV03 (EPSG:21781)) that have been integrated into the SWISSIMAGE Journey products. However, these images were discarded from our dataset (modifying the footprint shape) because they were causing issues in the automatic MES detection, producing oddly segmented detection shapes (Fig. 3). This is probably due to the different stretching of pixel colour between the datasets.
It also has to be noted that some images (about 88 tiles at zoom level 16) are currently missing from the 2020 SWISSIMAGE dataset.
"},{"location":"PROJ-DQRY-TM/#22-image-fetching","title":"2.2 Image fetching","text":"Pre-rendered SWISSIMAGE tiles (256 x 256 px, 1 km2) are downloaded using the Web Map Tile Service (WMTS) wmts.geo.admin.ch via an XYZ connector. Tiles are served on a cartesian coordinates grid using a Web Mercator Quad projection and a coordinate reference system EPGS 3857. Position of a tile on the grid is defined by x and y coordinates and the pixel resolution of the image is defined by z, its zoom level. Changing the zoom level affects the resolution by a factor of 2 (Fig. 4). For instance a zoom level of 17 corresponds to a resolution of 0.8 m px-1 and a zoom level of 16 to a resolution of 1.6 m px-1.
Figure 4: Examples of tile geometry at zoom level 16 (z16, black polygons) and at zoom level 17 (z17, blue polygons). The number of tiles for each zoom level is indicated in square brackets. The tiles are selected for model training, i.e. only tiles intersecting swissTLM3D labels (tlm-hr-trn-topo, yellow polygons).Note that in the previous project carried out by Reichel and Hamel (2021)7, the tiling method adopted was slightly different from the one adopted for this project: tiles of custom size and resolution were built. A sensitivity analysis of these two parameters was conducted and led to the choice of tiles with a size of about 500 m and a pixel resolution of about 1 m (beyond this, the performance was not significantly improved).
"},{"location":"PROJ-DQRY-TM/#23-ground-truth","title":"2.3 Ground truth","text":"The MES labels originate from the swiss Topographic Landscape Model 3D (swissTLM3D) produced by swisstopo. swissTLM3D is a large-scale topographic landscape model of Switzerland, including manually drawn and georeferenced vectors of objects of interest at a high resolution, including MES features. Domain experts from swisstopo have carried out extensive work to review the labeled MES and to synchronise them with the 2020 SWISSIMAGE mosaic to improve the quality of the labeled dataset. A total of 266 labels are available. The mapped MES reveal the diversity of MES characteristics, such as the presence or absence of buildings/infrastructures, trucks, water pounds, and vegetation (Fig. 5).
Figure 5: Examples of MES mapped in swissTLM3D and synchronised to the 2020 SWISSIMAGE mosaic.These labels are used as the ground truth (GT), i.e. the reference dataset indicating the presence of a MES in an image. The GT is used both as input to train the model to detect MES and to evaluate the model performance.
"},{"location":"PROJ-DQRY-TM/#3-automatic-detection-methodology","title":"3. Automatic detection methodology","text":""},{"location":"PROJ-DQRY-TM/#31-deep-learning-algorithm-for-object-detection","title":"3.1 Deep learning algorithm for object detection","text":"Training and inference detection of potential MES in SWISSIMAGE were performed with the object detector framework. This project is based on the open source detectron2 framework8 implemented with PyTorch by the Facebook Artificial Intelligence Research group (FAIR). Instance segmentation (delineation of object) was performed with a Mask R-CNN deep learning algorithm9. It is based on a Recursive-Convolutional Neural Network (CNN) with a backbone pre-trained model ResNet-50 (50 layers deep residual network).
Images were annotated with custom COCO objects based on the labels (class 'Quarry'). The model is trained with this dataset to later perform inference detection on images. If an object is detected by the algorithm, a pixel mask is produced, with a confidence score (0 to 1) attributed to the detection (Fig. 6).
Figure 6: Example of detection mask. The pink rectangle corresponds to the bounding box of the object; the object is segmented by the pink polygons associated with the detection class ('Quarry') and a confidence score.The object detector framework permits converting detection masks to georeferenced polygons that can be used in GIS software. The implementation of the Ramer-Douglas-Peucker (RDP) algorithm allows simplifying the derived polygons by discarding non-essential points, based on a smoothing parameter. This considerably reduces the amount of data to be stored and prevents potential memory saturation when deriving detection polygons over large areas, as is the case in this study.
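A minimal sketch of the mask-to-polygon conversion and RDP simplification, assuming a binary mask and its affine georeference are available (shapely's simplify() implements Douglas-Peucker):

```python
# Sketch: vectorise a binary detection mask and simplify the polygons.
# The mask, georeference and tolerance below are made-up examples.
import numpy as np
from rasterio import features
from rasterio.transform import from_origin
from shapely.geometry import shape

mask = np.zeros((256, 256), dtype=np.uint8)
mask[60:180, 40:200] = 1                             # dummy detection mask
transform = from_origin(2600000, 1200000, 1.6, 1.6)  # assumed georeference

polygons = [
    shape(geom)
    for geom, value in features.shapes(mask, transform=transform)
    if value == 1
]
# The tolerance plays the role of the smoothing parameter mentioned above.
simplified = [p.simplify(tolerance=5.0) for p in polygons]
```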
"},{"location":"PROJ-DQRY-TM/#32-model-training","title":"3.2 Model training","text":"Orthophotos from the 2020 SWISSIMAGE mosaic, for which the GT has been defined, were chosen to proceed the model training. Tiles intersecting labels were selected and split randomly into three datasets: the training dataset (70%), the validation dataset (15%), and the test dataset (15%). Addition of empty tiles (no annotation) to confront the model to landscapes not containing the target object has been tested (Appendix A.1) but did not provide significant improvement in the model performance to be adopted.
Figure 7: Training curves obtained at zoom level 16 on the 2020 SWISSIMAGE mosaic. The curves were obtained for the trained model 'replicate 3'. (a) Learning rate as a function of iteration. The step was defined every 500 iterations. The initial learning rate was 5.0 x 10-3 with a weight and bias decay of 1.0 x 10-4. (b) Total loss as a function of iteration. Raw measurement (light red) and smoothed curve (0.6 factor, solid red) are superposed. (c) Validation loss as a function of iteration. Raw measurement (light red) and smoothed curve (0.6 factor, solid red) are superposed. The vertical dashed black lines indicate the iteration minimising the validation loss curve, i.e. 3000.Models were trained with two images per batch (Appendix A.2), a learning rate of 5 x 10-3, and a learning rate decay of 1 x 10-4 every 500 steps (Fig. 7 (a)). For the given model and parameters, at zoom level 16 (Section 3.3.3), the training is performed over 7000 iterations and lasts about 1 hour on a 16 GiB GPU (NVIDIA Tesla T4) machine compatible with CUDA. The training and validation loss curves decrease until reaching a quasi-steady state around 6000 iterations (Fig. 7 (b)). The optimal detection model corresponds to the one minimising the validation loss curve. This minimum is reached between 2000 and 3000 iterations (Fig. 7 (c)).
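Expressed as a detectron2 solver configuration, these hyper-parameters could look roughly as follows; the decay factor applied at each step is an assumption, and the exact keys used by the object-detector framework may differ:

```python
# Hedged sketch of the training hyper-parameters described above.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 2    # two images per batch
cfg.SOLVER.BASE_LR = 5e-3       # initial learning rate
cfg.SOLVER.WEIGHT_DECAY = 1e-4  # weight and bias decay
cfg.SOLVER.STEPS = tuple(range(500, 7000, 500))  # LR step every 500 iterations
cfg.SOLVER.GAMMA = 0.1          # assumed decay factor at each step
cfg.SOLVER.MAX_ITER = 7000      # total number of training iterations
```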
"},{"location":"PROJ-DQRY-TM/#33-metrics","title":"3.3 Metrics","text":"The model performance and detection reliability were assessed by comparing the results to the GT. The detection performed by the model can be either (1) a True Positive (TP), i.e. the detection is real (spatially intersecting the GT) ; (2) a False Positive i.e. the detection is not real (not spatially intersecting the GT) or (3) a False Negative (FN) i.e. the labeled object is not detected by the algorithm (Fig. 8). Tagging the detection (Fig. 9(a)) allows to calculate several metrics (Fig. 9(b)) such as:
Figure 8: Examples of different detection cases. The label is represented by a yellow polygon and the detection by a red polygon. (a) True Positive (TP): detection intersecting the GT; (b) potential True Positive (TP?): detection with no corresponding GT; (c) False Negative (FN): no detection although GT exists; (d) False Positive (FP): detection of an object that is not a MES.
the recall, reflecting the proportion of labeled objects detected by the model:
\\[recall = \\frac{TP}{(TP + FN)}\\]the precision, translating the number of well-predicted TP among all the detections:
\\[precision = \\frac{TP}{(TP + FP)}\\]the f1-score, the harmonic average of the precision and the recall:
\\[f1 = 2 \\times \\frac{recall \\times precision}{recall + precision}\\]Trained models reached f1-scores of about 80% with a standard deviation of 2% (Table 2). The performances are similar to the model trained by Reichel and Hamel (2021)7.
model | precision | recall | f1
replicate 1 | 0.84 | 0.79 | 0.82
replicate 2 | 0.77 | 0.76 | 0.76
replicate 3 | 0.83 | 0.81 | 0.82
replicate 4 | 0.89 | 0.77 | 0.82
replicate 5 | 0.78 | 0.82 | 0.80
Table 2: Metrics computed on the validation dataset for the model replicates trained on the 2020 SWISSIMAGE mosaic at zoom level 16.
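A minimal sketch of the metrics computation from tagged detection counts (the counts passed in the example are illustrative, not values from the study):

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Compute precision, recall and f1-score from TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(detection_metrics(tp=81, fp=17, fn=19))  # illustrative counts
```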
Some variability is expected, as the deep learning algorithm displays random behavior, but it is assumed to be negligible. However, the observed variability is large enough to affect the final results, which may change slightly when using different trained models with the same input parameters (Fig. 10).
Figure 10: Detection polygons obtained for the different trained model replicates (Table 2), highlighting the variability of the results. The labels correspond to orange polygons. The number in square brackets corresponds to the number of polygons. The inference detections have been performed on a subset of 2000 tiles of the 2020 SWISSIMAGE mosaic at zoom level 16. Detections have been filtered according to the parameters defined in Section 5.1.
To reduce the variability of the trained models, the random seeds of both detectron2 and Python were fixed. Neither of these attempts was successful, and the variability remains. The nondeterministic behavior of detectron2 has been recognised (issue 1, issue 2), but no suitable solution has been provided yet. Further investigation of the model performance and consistency should be performed in the future.
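For reference, a sketch of the kind of seeding that was attempted (assuming detectron2's configuration and utility API; even with fixed seeds, GPU kernels may remain nondeterministic):

```python
import random
import numpy as np
from detectron2.config import get_cfg
from detectron2.utils.env import seed_all_rng

SEED = 42  # arbitrary value

random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy RNG
seed_all_rng(SEED)    # detectron2 helper seeding python, numpy and torch

cfg = get_cfg()
cfg.SEED = SEED       # seed picked up by detectron2's default trainer
# cuDNN and other GPU kernels may still behave nondeterministically.
```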
To mitigate the variability of the results between model replicates, we could consider combining the results of several replicates in order to remove FPs while preserving the TPs and potential TPs. The choice and number of models used should be evaluated. This method is tedious, as it requires inference detection with several models, which can be time-consuming and computationally intensive.
"},{"location":"PROJ-DQRY-TM/#42-sensitivity-to-the-zoom-level","title":"4.2 Sensitivity to the zoom level","text":"Image resolution is dependent on the zoom level (Section 2.2). To select the most suitable zoom level for MES detection, we performed a sensitivity analysis on trained model performance. Increasing the zoom level increases the value of the metrics following a global linear trend (Fig. 11).
Figure 11: Metrics values (precision, recall and f1) as a function of the zoom level for the validation dataset. The results of the replicates performed at each zoom level are included (Table A1).
Models trained at a higher zoom level performed better. However, a higher zoom level implies smaller tiles and thus a larger number of tiles to cover the AoI. For a typical AoI, i.e. up to a third of Switzerland, this can lead to a large number of tiles to be stored and processed, and to potential RAM and/or disk space saturation. For the 2019 AoI, 89'290 tiles are required at zoom level 16, while 354'867 tiles are required at zoom level 17, taking respectively 3 hours and 11 hours to process on a machine with 30 GiB of RAM and a 16 GiB GPU.
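The growth of the tile count with the zoom level can be estimated with the mercantile library; this is a minimal sketch with an arbitrary bounding box, not the actual AoI:

```python
import mercantile

# Arbitrary bounding box over western Switzerland (WGS84), for illustration only.
west, south, east, north = 6.0, 46.0, 8.0, 47.5

for zoom in (15, 16, 17):
    n_tiles = sum(1 for _ in mercantile.tiles(west, south, east, north, zoom))
    print(f"zoom {zoom}: {n_tiles} tiles")
# Each additional zoom level roughly quadruples the number of tiles.
```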
Visual comparison of the inference detections revealed no significant improvement in detection quality from zoom level 16 to zoom level 17. Both zoom levels present a similar proportion of detections intersecting labels (82% and 79% for zoom levels 16 and 17, respectively). On the other hand, the quality of object detection at zoom level 15 was degraded: detection scores were lower, with only a few tens of detections scoring above 0.95 (against about 400 at zoom level 16), and about 64% of the detections intersected labels.
"},{"location":"PROJ-DQRY-TM/#43-model-choice","title":"4.3 Model choice","text":"Based on tests performed, we selected the 'replicate 3' model, obtained (Tables 2 and A1) at zoom level 16, to perform inference detection.
Models trained at zoom level 16 (1.6 m px-1 resolution) have shown satisfying results, accurately detecting MES contours and limiting the number of FPs with high detection scores (Fig. 11). This zoom level represents a good trade-off between reliability (f1-score between 76% and 82% on the validation dataset) and computational resources. Among all the replicates performed at zoom level 16, we selected the trained model 'replicate 3' (Table 2) because it combines the highest metrics values (on the validation dataset, but also on the training and test datasets), close precision and recall values, and a rather low number of low-score detections.
"},{"location":"PROJ-DQRY-TM/#5-automatic-detection-of-mes","title":"5. Automatic detection of MES","text":""},{"location":"PROJ-DQRY-TM/#51-detection-post-processing","title":"5.1 Detection post-processing","text":"Detection by inference was performed over AoIs with a threshold detection score of 0.3 (Fig. 12). The low score filtering results in a large amount of detections. Several detections may overlap, potentially segmenting a single object. In addition a detection might be split into multiple tiles. To improve the pertinence and the aesthetics of the raw detection polygons, a post-processing procedure was applied.
First, a large proportion of FPs occurred in mountainous areas (rock outcrops and snow, Fig. 12(a)). We assumed MES are not present (or at least sparse) above a given altitude. An elevation filter was applied, using a digital elevation model of Switzerland (about 25 m px-1) derived from the SRTM instrument (USGS - SRTM). The maximum elevation of the labeled MES is about 1100 m.
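A minimal sketch of such an elevation filter (the file names and the threshold are illustrative; the DEM is sampled at each polygon centroid):

```python
import geopandas as gpd
import rasterio

MAX_ELEVATION = 1200.0  # illustrative threshold, in meters

detections = gpd.read_file("detections.geojson")  # hypothetical input file

with rasterio.open("srtm_dem.tif") as dem:  # hypothetical DEM path
    # Sample the DEM at each detection centroid (coordinates in the DEM's CRS).
    centroids = [(point.x, point.y) for point in detections.geometry.centroid]
    detections["elevation"] = [value[0] for value in dem.sample(centroids)]

filtered = detections[detections.elevation <= MAX_ELEVATION]
```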
Second, detection aggregation was applied (a sketch follows the list below):
- Polygons were clustered (K-means) according to their centroid position. The method requires setting a predefined number k of clusters; manual tests performed by Reichel and Hamel (2021)7 led to setting k equal to the number of detections divided by three. The highest detection score is then assigned to the clustered detections. This preserves the integrity of the final detection polygons by retaining detections that have a low confidence score but belong to a cluster with a higher confidence score, improving the final segmentation of the detected object. The score threshold must be kept relatively low (i.e. 0.3) when performing the detection, to avoid removing too many polygons that could be part of the detected object. We acknowledge that determining the optimal number of clusters with clustering validation indices, rather than by manual adjustment, would be more robust. Exploring other clustering methods based on local density, such as DBSCAN, can also be considered in the future.
- Score filtering was applied.
- Spatially close polygons were assumed to belong to the same MES and were merged according to a distance threshold. The average score of the merged detection polygons was then computed.
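A minimal sketch of the centroid clustering step (the input file and the 'score' column are assumptions; k follows the rule of thumb above):

```python
import geopandas as gpd
import numpy as np
from sklearn.cluster import KMeans

detections = gpd.read_file("detections.geojson")  # hypothetical input file

# One (x, y) row per detection centroid.
coords = np.column_stack([detections.geometry.centroid.x,
                          detections.geometry.centroid.y])

k = max(1, len(detections) // 3)  # number of detections divided by three
detections["cluster"] = KMeans(n_clusters=k, n_init=10).fit_predict(coords)

# Propagate the highest score of each cluster to all of its members.
detections["score"] = detections.groupby("cluster")["score"].transform("max")
```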
Finally, we assumed that a MES covers a minimal area: detections with an area smaller than a given threshold were filtered out. The minimum MES area in the GT is 2270 m2.
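The merge-by-distance and minimal-area steps can be sketched with a buffer-and-dissolve approach (an assumed implementation, not the project's code; the thresholds are those of filter combination 3 in Table 3):

```python
import geopandas as gpd

DISTANCE, MIN_AREA = 10, 5000  # distance (m) and area (m2) thresholds

detections = gpd.read_file("scored_detections.geojson")  # hypothetical input

# Grow polygons by half the distance threshold so that nearby detections touch,
# dissolve the touching buffers, then shrink back to the original extent.
dissolved = detections.geometry.buffer(DISTANCE / 2).unary_union.buffer(-DISTANCE / 2)

parts = gpd.GeoDataFrame(
    geometry=list(getattr(dissolved, "geoms", [dissolved])), crs=detections.crs)

# Keep only polygons covering the assumed minimal MES area (metric CRS).
parts = parts[parts.area >= MIN_AREA]
```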
Figure 12: MES detection filtering. (a) Overview of the automatic detection of MES obtained with 2020 SWISSIMAGE at zoom level 16. Transparent red polygons (with the associated confidence score in white) correspond to the raw object detection output, and red line polygons (with the associated confidence score in red) correspond to the final filtered detections. The black box outlines the location of the zooms of panels (b) and (c). Note the large number of detections in the mountains (right side of the image). (b) Zoom on several raw detection polygons of a single object, with their respective confidence scores. (c) Zoom on the filtered detection polygon of the same object, with the resulting score.
The sensitivity of the detections to these filters was investigated (Table 3). Quantitatively evaluating the relevance of a filter combination is tricky, as the detections are potential MES obtained by inference, while the GT provided by swissTLM3D constitutes only an incomplete portion of the MES in Switzerland (2020). As an indication, we computed the number of spatial intersections between the ground truth and the detections obtained with the 2020 SWISSIMAGE mosaic. Filter combination number 3 was adopted, allowing the detection of about 82% of the GT with a relatively limited number of FP detections compared to filter combinations 1 and 2 (from visual inspection).
filter combination | score threshold | area threshold (m2) | elevation threshold (m) | distance threshold (m) | number of detections | label detection (%)
1 | 0.95 | 2000 | 1100 | 10 | 1745 | 85.1
2 | 0.95 | 2000 | 1200 | 10 | 1862 | 86.6
3 | 0.95 | 5000 | 1200 | 10 | 1347 | 82.1
4 | 0.96 | 2000 | 1100 | 10 | 1331 | 81.3
5 | 0.96 | 2000 | 1200 | 8 | 1445 | 78.7
6 | 0.96 | 5000 | 1200 | 10 | 1004 | 74.3
Table 3: Threshold values of the filtering parameters, with the resulting number of detections and the proportion of swissTLM3D labels intersected. The detections were obtained on the 2020 SWISSIMAGE mosaic.
We acknowledge that, for the selected filter combination, the area threshold is higher than the smallest area of the GT polygons: thirteen labels display an area below 5000 m2. However, reducing the area threshold significantly increases the number of FPs.
"},{"location":"PROJ-DQRY-TM/#52-inference-detections","title":"5.2 Inference detections","text":"The trained model was used to perform inference detection on SWISSIMAGE orthophotos from 1999 to 2021. The automatic detection model shows good capabilities to detect MES in different years orthophotos (Fig. 13), despite being trained on the 2020 SWISSIMAGE mosaic. The model also demonstrates capabilities to detect potential MES that have not been mapped yet but are strong candidates. However, the model misses some labeled MES or potential MES (FN, Fig. 8). However, when the model process FSO images, with different colour stretching, it failed to correctly detect potential MES (Fig. 3). It reveals that images must have characteristics close to the training dataset for optimal results with a deep learning model.
Figure 13: Examples of objects segmented by detection polygons in orthophotos from different years. The yellow polygon in the year 2020 panel of object ID 3761 corresponds to the label; the other coloured polygons correspond to the algorithm's detections.
We also acknowledge that a significant number of FP detections can still be observed in our filtered detection dataset (Figs. 8 and 14). The main sources of FPs are large rock outcrops, mountainous areas without vegetation, snow, river sand beds, brownish-coloured fields and construction areas. MES present a large variety of features (buildings, water ponds, trucks, vegetation) (Fig. 5), which can be a source of confusion for the algorithm, and sometimes even for the human eye. The robustness of the GT is therefore crucial for reliable detection, and the algorithm's results should be interpreted with care.
Figure 14: Examples of FP detections. (a) Snow patches (2019); (b) river sand beds and gullies (2019); (c) brownish field (2020); (d) vineyards (2005); (e) airport tarmac (2020); (f) construction site (2008).
The detections produced by the algorithm are potential MES; the final results must be reviewed by experts in the field to discard the remaining FP detections and correct the FNs before any processing or interpretation.
"},{"location":"PROJ-DQRY-TM/#6-observation-of-mes-evolution","title":"6. Observation of MES evolution","text":""},{"location":"PROJ-DQRY-TM/#61-object-tracking-strategy","title":"6.1 Object tracking strategy","text":"Switzerland is covered by RGB SWISSIMAGE product over more than 20 years (1999 to actual), allowing changes to be detected (Fig. 13).
Figure 15: Strategy for MES tracking over time, based on ID assignment to detections. Spatially intersecting polygons share the same ID, allowing a MES to be tracked in a multi-year dataset.
We assumed that detection polygons that overlap from one year to another describe a single object (Fig. 15). Overlapping detections, as well as unique detections (which do not overlap with polygons from other years), were assigned a unique object identifier (ID) in the multi-year dataset; a sketch of this assignment is given below. A new object ID in the timeline indicates either:
- the first occurrence of the object in the dataset of the first year available for the area (which does not mean that the object was not present before), or
- the creation of a potential new MES.
The disappearance of an object ID indicates its potential refilling. The chronology of a MES (creation, evolution and refilling) can therefore be constrained.
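A minimal sketch of the ID assignment by spatial overlap across yearly layers, using graph connected components (the file names and years are illustrative, and only consecutive years are compared here):

```python
import geopandas as gpd
import networkx as nx

# Hypothetical yearly detection layers, all in the same metric CRS.
layers = {year: gpd.read_file(f"detections_{year}.geojson")
          for year in (2018, 2019, 2020)}

graph = nx.Graph()
for year, gdf in layers.items():
    graph.add_nodes_from((year, idx) for idx in gdf.index)

# Connect detections of consecutive years whose polygons intersect.
years = sorted(layers)
for y1, y2 in zip(years, years[1:]):
    joined = gpd.sjoin(layers[y1], layers[y2], predicate="intersects")
    for i, j in zip(joined.index, joined["index_right"]):
        graph.add_edge((y1, i), (y2, j))

# Each connected component is one tracked object: one ID per component.
for obj_id, component in enumerate(nx.connected_components(graph)):
    for year, idx in component:
        layers[year].loc[idx, "object_id"] = obj_id
```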
"},{"location":"PROJ-DQRY-TM/#62-evolution-of-mes-over-years","title":"6.2 Evolution of MES over years","text":"Figures 13 and 16 illustrate the ability of the trained model to detect and track a single object in a multi-year dataset. The detection over the years appears reliable and consistent, although object detection may be absent from a year dataset (e.g. due to shadows or colour changes in the surroundings). Remember that the image coverage of a given area is not renewed every year. Characteristics of the potential MES, such as surface evolution (extension or retreat), can be quantified. For example, the surfaces of object IDs 239 and 3861 have more than doubled in about 20 years. Tracking object ID along with image visualisation allows observation of the opening and the closing of potential MES, as object IDs 31, 44, and 229.
Figure 16: Detection area (m2) as a function of years for several object IDs. Figure 13 provides the visualisation of the selected object IDs. Each point corresponds to an occurrence of an object ID in the corresponding year's dataset.
The presence of an object in the datasets of several years strengthens the likelihood that the detected object is an actual MES. Conversely, an object detected only once is more likely an FP.
"},{"location":"PROJ-DQRY-TM/#7-conclusion-and-perspectives","title":"7. Conclusion and perspectives","text":"The project demonstrated the ability to automatically, quickly (a matter of hours for one AoI), and reliably detect potential MES in orthophotos of Switzerland with an automatic detection algorithm (deep learning). The selected trained model achieved a f1-score of 82% on the validation dataset. The final detection polygons accurately delineate the potential MES. We can track single MES through multiple years, emphasising the robustness of the method to detect objects in multi-year datasets despite the detection model being trained on a single dataset (2020 SWISSIMAGE mosaic). However, image colour stretching different from that used to train the model can significantly affect the model's ability to provide reliable detection, as was the case with the FSO images.
Although the performance of the trained model is satisfactory, FPs and FNs are present in the datasets. They are mainly due to the algorithm confusing MES with rock outcrops, river sand beds or construction sites. Manual verification of the relevance of the detections by experts in the field is necessary before processing and interpreting the data. Reviewing all the detections from 1999 to 2021 is a time-consuming effort, but it is necessary to guarantee detection reliability. Despite the required manual checks, the provided framework and detection results constitute a valuable contribution that can greatly assist the inventory and the observation of MES evolution in Switzerland. It provides state-wide detection in a matter of hours, a considerable time saving compared with manual mapping, and it enables MES detection with a standardised method, independent of the information or methods adopted by the cantons.
Further model improvements could be considered, such as increasing the metrics by improving the GT quality, improving the model learning strategy, mitigating the learning variability, or testing supervised clustering methods to find relevant detections.
This work can be used to compute statistics on the long-term evolution of MES in Switzerland and to support better management of resources and land use in the future. MES detections can be combined with other data, such as geological layers, to identify the minerals and rocks exploited, or with the high-resolution DEM (swissALTI3D) to infer elevation changes and observe the excavation or filling of MES5. So far, only the RGB SWISSIMAGE orthophotos from 1999 to 2021 have been processed. Black and white orthophotos exist prior to 1999, but the model trained on RGB images cannot be applied trustfully to black and white images. Image colourisation tests (with the help of a deep learning algorithm, Farella et al. 2022) were performed and provided encouraging detection results. This avenue needs to be explored.
Finally, automatic detection of MES is rare13, and most studies perform manual mapping. The framework could therefore be extended to other datasets and/or other countries to provide a valuable asset to the community. A global mapping of MES has been completed with over 21'000 polygons1 and could be used as a GT database to train an automatic detection model.
"},{"location":"PROJ-DQRY-TM/#code-availability","title":"Code availability","text":"The codes are stored and available on the STDL's github repository:
This project was made possible thanks to a tight collaboration between the STDL team and swisstopo. In particular, the STDL team acknowledges the key contribution of Thomas Galfetti (swisstopo). This project has been funded by the "Stratégie Suisse pour la Géoinformation".
"},{"location":"PROJ-DQRY-TM/#appendix","title":"Appendix","text":""},{"location":"PROJ-DQRY-TM/#a1-influence-of-empty-tiles-addition-to-model-performance","title":"A.1 Influence of empty tiles addition to model performance","text":"By selecting tiles intersecting only labels, the detection model is mainly confronted with the presence of the targeted object to be detected. Addition of non-label-intersecting tiles, i.e. empty tiles, provides landscape diversity that might help to improve the object detection performance.
To evaluate the influence of adding empty tiles on the model performance, empty tiles (not intersecting labels) were chosen randomly within the Swiss boundaries and added to the tile dataset used for the model training (Fig. A1). Empty tiles were added (1) to the whole dataset, split as for the initial dataset (training: 70%, test: 15%, and validation: 15%), and (2) to the training dataset only. A visual inspection must be performed to ensure that no unlabeled MES is present in the selected images, which would disturb the algorithm's learning.
Figure A1: View of tiles intersecting labels (tiles in black, labels in yellow) and randomly selected empty tiles (red) in Switzerland. This case corresponds to the addition of 35% empty tiles.
Figure A2 reveals that adding empty tiles to the dataset does not significantly influence the metrics values: the numbers of TPs, FPs and FNs do not show significant variation. However, when performing an inference detection test on a subset of 2000 tiles of an AoI, the number of raw (unfiltered) detections is reduced as the number of empty tiles increases. Still, visual inspection of the final detections, after applying the filters, does not show significant improvement compared to a model trained without empty tiles.
Figure A2: Influence of the addition of empty tiles (relative to the number of tiles intersecting labels) on the trained model performance for zoom levels 16 and 17, with (a) the f1-score as a function of the percentage of added empty tiles and (b) the number of detections, normalised by the number of sampled tiles (2000), as a function of the percentage of added empty tiles. Empty tiles were added only to the training dataset for the 5% and 30% cases, and to all datasets for the 9%, 35%, 70% and 140% cases.
A possible improvement would be to specifically select tiles on which FPs occurred and include them in the training dataset as empty tiles. This way, the model could be trained with relevant confounding features, such as snow patches, river sand beds or gullies, that are not labeled as GT.
"},{"location":"PROJ-DQRY-TM/#a2-sensitivity-of-the-model-to-the-number-of-images-per-batch","title":"A.2 Sensitivity of the model to the number of images per batch","text":"During the model learning phase, the trained model is updated after each batch of samples was processed. Adding more samples, i.e. in our case images, to the batch can influence the model learning capacity. We investigated the role of adding more images per batch for a dataset with and without adding a portion of empty tiles to the learning dataset. Adding more images per batch speeds up the model learning (Table A1), and the minimum of the loss curve is reached for a smaller number of iterations.
Figure A3: Metrics (precision, recall and f1-score) as a function of the number of images per batch during model training. Results were obtained on a dataset without empty tiles (red) and with the addition of 23% of empty tiles to the training dataset.
Figure A3 reveals that the metrics values remain roughly constant when extra images are added to the batch, in all cases (with or without empty tiles). A potential effect of adding more images to the batch is a reduction of the metrics variability between replicates of trained models, as the range of metrics values is smaller for 8 images per batch than for 2. However, this observation must be taken with caution, as fewer replicates were performed with 8 images per batch than with 2 or 4. Further investigation would provide stronger insights into this effect.
"},{"location":"PROJ-DQRY-TM/#a3-evaluation-of-trained-models","title":"A.3 Evaluation of trained models","text":"Table A1 sumup metrics value obtained for all the configuration tested for the project.
zoom level | model | empty tiles (%) | images per batch | optimum iteration | precision | recall | f1
15 | replicate 1 | 0 | 2 | 1000 | 0.727 | 0.810 | 0.766
16 | replicate 1 | 0 | 2 | 2000 | 0.842 | 0.793 | 0.817
16 | replicate 2 | 0 | 2 | 2000 | 0.767 | 0.760 | 0.763
16 | replicate 3 | 0 | 2 | 3000 | 0.831 | 0.810 | 0.820
16 | replicate 4 | 0 | 2 | 2000 | 0.886 | 0.769 | 0.826
16 | replicate 5 | 0 | 2 | 2000 | 0.780 | 0.818 | 0.798
16 | replicate 6 | 0 | 2 | 3000 | 0.781 | 0.826 | 0.803
16 | replicate 7 | 0 | 4 | 1000 | 0.748 | 0.860 | 0.800
16 | replicate 8 | 0 | 4 | 1000 | 0.779 | 0.785 | 0.782
16 | replicate 9 | 0 | 8 | 1500 | 0.800 | 0.793 | 0.797
16 | replicate 10 | 0 | 4 | 1000 | 0.796 | 0.744 | 0.769
16 | replicate 11 | 0 | 8 | 1000 | 0.802 | 0.769 | 0.785
16 | ET-250_allDS_1 | 34.2 | 2 | 2000 | 0.723 | 0.770 | 0.746
16 | ET-250_allDS_2 | 34.2 | 2 | 3000 | 0.748 | 0.803 | 0.775
16 | ET-1000_allDS_1 | 73.8 | 2 | 6000 | 0.782 | 0.815 | 0.798
16 | ET-1000_allDS_2 | 69.8 | 2 | 6000 | 0.786 | 0.767 | 0.776
16 | ET-1000_allDS_3 | 70.9 | 2 | 6000 | 0.777 | 0.810 | 0.793
16 | ET-1000_allDS_4 | 73.8 | 2 | 6000 | 0.768 | 0.807 | 0.787
16 | ET-2000_allDS_1 | 143.2 | 2 | 6000 | 0.761 | 0.748 | 0.754
16 | ET-80_trnDS_1 | 5.4 | 2 | 2000 | 0.814 | 0.793 | 0.803
16 | ET-80_trnDS_2 | 5.4 | 2 | 2000 | 0.835 | 0.752 | 0.791
16 | ET-80_trnDS_3 | 5.4 | 2 | 2000 | 0.764 | 0.802 | 0.782
16 | ET-400_trnDS_1 | 29.5 | 2 | 6000 | 0.817 | 0.777 | 0.797
16 | ET-400_trnDS_2 | 29.5 | 2 | 5000 | 0.848 | 0.785 | 0.815
16 | ET-400_trnDS_3 | 29.5 | 2 | 4000 | 0.758 | 0.802 | 0.779
16 | ET-400_trnDS_4 | 29.5 | 4 | 2000 | 0.798 | 0.818 | 0.808
16 | ET-400_trnDS_5 | 29.5 | 4 | 1000 | 0.825 | 0.777 | 0.800
16 | ET-1000_trnDS_1 | 0 | 2 | 4000 | 0.758 | 0.802 | 0.779
17 | replicate 1 | 0 | 2 | 5000 | 0.819 | 0.853 | 0.835
17 | replicate 1 | 0 | 2 | 5000 | 0.803 | 0.891 | 0.845
17 | replicate 1 | 0 | 2 | 5000 | 0.872 | 0.813 | 0.841
17 | ET-250_allDS_1 | 16.8 | 2 | 3000 | 0.801 | 0.794 | 0.797
17 | ET-1000_allDS_1 | 72.2 | 2 | 7000 | 0.743 | 0.765 | 0.754
18 | replicate 1 | 0 | 2 | 10000 | 0.864 | 0.855 | 0.859
Table A1: Metrics values computed on the validation dataset for all the models trained with the 2020 SWISSIMAGE Journey mosaic.
Victor Maus, Stefan Giljum, Jakob Gutschlhofer, Dieison M. Da Silva, Michael Probst, Sidnei L. B. Gass, Sebastian Luckeneder, Mirko Lieber, and Ian McCallum. A global-scale data set of mining areas. Scientific Data, 7(1):289, September 2020. doi:10.1038/s41597-020-00624-w.
Vicenç Carabassa, Pau Montero, Marc Crespo, Joan-Cristian Padró, Xavier Pons, Jaume Balagué, Lluís Brotons, and Josep Maria Alcañiz. Unmanned aerial system protocol for quarry restoration and mineral extraction monitoring. Journal of Environmental Management, 270:110717, September 2020. doi:10.1016/j.jenvman.2020.110717.
Chunsheng Wang, Lili Chang, Lingran Zhao, and Ruiqing Niu. Automatic Identification and Dynamic Monitoring of Open-Pit Mines Based on Improved Mask R-CNN and Transfer Learning. Remote Sensing, 12(21):3474, January 2020. doi:10.3390/rs12213474.
Haoteng Zhao, Yong Ma, Fu Chen, Jianbo Liu, Liyuan Jiang, Wutao Yao, and Jin Yang. Monitoring Quarry Area with Landsat Long Time-Series for Socioeconomic Study. Remote Sensing, 10(4):517, April 2018. doi:10.3390/rs10040517.
Valentin Tertius Bickel and Andrea Manconi. Decadal Surface Changes and Displacements in Switzerland. Journal of Geovisualization and Spatial Analysis, 6(2):24, December 2022. doi:10.1007/s41651-022-00119-9.
George P. Petropoulos, Panagiotis Partsinevelos, and Zinovia Mitraka. Change detection of surface mining activity and reclamation based on a machine learning approach of multi-temporal Landsat TM imagery. Geocarto International, 28(4):323-342, July 2013. doi:10.1080/10106049.2012.706648.
Huriel Reichel and Nils Hamel. Automatic Detection of Quarries and the Lithology below them in Switzerland. Swiss Territorial Data Lab, 2022.
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. 2019. URL: https://github.com/facebookresearch/detectron2.
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. January 2018. arXiv:1703.06870 [cs]. doi:10.48550/arXiv.1703.06870.
Nils Hamel (UNIGE) - Huriel Reichel (swisstopo)
Project scheduled in the STDL research roadmap - PROJ-DTRK September 2020 to November 2020 - Published on April 23, 2021
Abstract : Being able to track modifications in the evolution of geographical datasets is an important aspect of territory management, as a large amount of information can be extracted from difference models. Difference detection can also be used as a tool to assess the evolution of a geographical model through time. In this research project, we apply difference detection to INTERLIS models of the official Swiss land registers in order to emphasize and follow their evolution, and to demonstrate that changes in reference frames can be detected and assessed.
"},{"location":"PROJ-DTRK/#introduction","title":"Introduction","text":"Land register models are probably to most living of the geographical models as they are constantly updated to offer a rigorous and up-to-date view of the territory.
The applied corrections are always the result of a complex process involving different territory actors, until the decision is made to integrate them into the land register. In addition, land register models come with an additional constraint linked to political decisions. Indeed, the land register models are the result of a political mission conducted under federal laws, making these models highly important and requiring constant care. We show in this research project how the difference detection tool [1] of the STDL 4D framework can be used to emphasize and analyze these corrections along the time dimension.
In addition to the constant updates of the models, changes in the reference frame can also lead to large-scale corrections of the land register models. These global corrections are then made even more complex by the federal laws that impose a high degree of correctness and accuracy.
In the context of the introduction of the new reference frame DM.flex [2] for the Swiss land register, being able to assess the changes applied to the geographical model appears as an important aspect. Indeed, changing the reference frame of the land register models is a long and complex technical process that can be error-prone. We also show in this research project how the difference detection algorithm can help assess and verify the performed corrections.
"},{"location":"PROJ-DTRK/#research-project-specifications","title":"Research Project Specifications","text":"In this research project, the difference detection algorithm implemented in the STDL 4D framework is applied on INTERLIS data containing the official land register models of different Swiss Canton. As introduced, two main directions are considered for the difference detection algorithm :
Demonstrating the ability to extract information in between land register models
Demonstrating the ability of difference models to be used as an assessment tool
Through the first direction, the difference detection algorithm is presented. Considering the difference models it allows computing, it is shown how such models are able to extract information in between the models, in order to emphasize the ability to represent, and then to verify, the evolution of the land register models.
The second direction focuses on demonstrating that difference models are a helpful representation of the large-scale corrections that can be applied to the land register during reference frame modification, and how they can be used as a tool to assess the modifications and to help fulfil the complex task of verifying the corrected models.
"},{"location":"PROJ-DTRK/#research-project-data","title":"Research Project Data","text":"For the first research direction, the land register models of the Thurgau Kanton are considered. They are selected in order to have a small temporal distance allowing to focus on a small amount of well-defined differences :
Thurgau Kanton, 2020-10-13, INTERLIS
Thurgau Kanton, 2020-10-17, INTERLIS
For the second direction, which focuses on more complex differences, the models of the Canton of Geneva land register are considered, with a much larger temporal gap between them :
Canton of Geneva, 2009-10, INTERLIS
Canton of Geneva, 2013-04, INTERLIS
Canton of Geneva, 2017-04, INTERLIS
Canton of Geneva, 2019-04, INTERLIS
This first section focuses on short-term differences to show how difference models work and how they are able to represent the modifications extracted from the two compared models. The following images illustrate the considered dataset, namely the land register models of the Thurgau Kanton :
Illustration of Thurgau Kanton INTERLIS models - Data : Kanton Thurgau
The models are made of vector lines, well geo-referenced in the Swiss coordinate frame EPSG:2056. The models are also made of different layers that are colored differently, with the following correspondences :
INTERLIS selected topics and tables colors - Official French and German designations
These legends are used all along this research project.
Considering two temporal versions of this geographical model, separated by a few days, one is able to extract difference models using the 4D framework algorithm. As an example, one can consider this very specific view of the land register, focusing on a few houses :
Close view of the Thurgau INTERLIS model in 2020-10-13 (left) and 2020-10-17 (right) - Data : Kanton Thurgau
It is clear that most of the close view is identical in the two models, except for a couple of houses that were added to the land register between these two temporal versions. By applying the difference detection algorithm, one is able to obtain a difference model comparing the two previous models. The following image illustrates the obtained difference model, considering the most recent temporal version as the reference :
Difference model obtained comparing the two temporal versions - Data : Kanton Thurgau
One can see how the difference algorithm is able to emphasize the differences and to represent them in a human-readable third model. The algorithm also displays the identical parts in dark gray, to offer the context of the differences to the operator.
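The STDL 4D framework performs this comparison natively on INTERLIS-derived models; purely as an illustration of the principle, here is a minimal sketch (not the framework's actual algorithm) that splits two sets of line elements into unchanged, removed and added parts:

```python
from shapely.geometry import LineString

# Two hypothetical temporal versions of a model, as sets of line elements.
model_old = [LineString([(0, 0), (1, 0)]), LineString([(1, 0), (1, 1)])]
model_new = [LineString([(0, 0), (1, 0)]), LineString([(1, 0), (2, 1)])]

wkt_old = {geom.wkt for geom in model_old}
wkt_new = {geom.wkt for geom in model_new}

unchanged = wkt_old & wkt_new  # rendered in dark gray, as context
removed = wkt_old - wkt_new    # elements only in the older version
added = wkt_new - wkt_old      # elements only in the newer version
print(len(unchanged), len(removed), len(added))  # 1 1 1
```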
Of course, in such a close view, difference detection can appear irrelevant, as one is clearly able to see that something changed in the selected example without any help. But difference models can be computed at any scale. Take, for example, the city of Amriswil :
View of the Amriswil model in 2020-10-13 (left) and 2020-10-17 (right) - Data : Kanton Thurgau
At this scale, it becomes more complicated to track down the differences that can appear between the two temporal versions. By computing their difference model, one obtains a third model that eases the analysis of the evolution at the scale of the city itself, as illustrated in the following image :
Difference model computed for the city of Amriswil - Data : Kanton Thurgau
One can see how difference models can be used to track down modifications brought to the land register in a simple manner, while keeping the information of the unchanged elements between the two compared models. This demonstrates that the information that exists between models can be extracted and represented for users or automated processes. In addition, such difference models can be computed at any scale, from small areas up to whole countries.
"},{"location":"PROJ-DTRK/#difference-models-an-assessment-tool","title":"Difference Models : An Assessment Tool","text":"On the previous section, the difference models are computed using two models only separated of a few days, containing only a small amount of clear and simple modifications. This section focuses on detecting differences on larger models, separated by several years. In this case, the land register of the Canton of Geneva is considered :
Illustration of the Geneva land register in 2017-04 (left) and 2019-04 (right) - Data : Canton of Geneva
One can see that at such a scale, taking into account that the Canton of Geneva is one of the smallest in Switzerland, getting a clear vision and understanding of the modifications made between these two models is difficult when considering them separately.
This is precisely where difference models can be useful to understand and analyze the evolution of the land register, along both the space and time dimensions.
"},{"location":"PROJ-DTRK/#large-scale-analysis","title":"Large-Scale Analysis","text":"A first large-scale evaluation can be made on the overall models. A difference model can be computed considering the land register of Geneva in 2019 and 2017 as illustrated on the following image :
Difference model of the Geneva land register between 2019-04 and 2017-04 - Data : Canton of Geneva
Two observations can already be made by looking at the difference model. In the first place, one can see that the amount of modifications brought to the land register in only two years is large. A large portion of the land register was subject to modifications or corrections, the unchanged parts being mostly limited to areas outside the populated zones.
In the second place, one can observe large portions where differences seem to be accumulating over this period of time. Looking at them more closely leads to the conclusion that these zones were actually completely modified, as all elements are highlighted by the difference detection algorithm. The following image gives a closer view of such an area of differences accumulation :
Focus on the Carouge area of the 2019-04 and 2017-04 difference model - Data : Canton of Geneva
Although the amount of modifications outside this specific zone is also high, it is clear that the pointed zone contains more of them. Looking at it more closely leads to the conclusion that everything changed.
In order to understand these areas of difference accumulation, the land register experts of the Canton of Geneva (SITG) were consulted, and they provided an explanation for these specific areas. Between 2017 and 2019, these areas were subjected to a global correction in order to release the tension between the old reference frame LV03 [3] and the current one, LV95 [4]. These corrections were made using the FINELTRA algorithm, modifying the elements of the land register by a few centimeters.
The land register of Geneva provided the following illustration summarizing these reference frame corrections made between 2017 and 2019 on the Geneva territory :
Reference frame corrections performed between 2017 and 2019 - Data : SITG
Comparing this map from the land register with the computed model shows how difference detection can emphasize this type of correction efficiently, as the corrected zones in the previous image correspond to the difference accumulation areas in the computed difference model.
"},{"location":"PROJ-DTRK/#small-scale-analysis","title":"Small-Scale Analysis","text":"One can also dive deep into the details of the difference models. As we saw on the large scale analysis, two types of areas can be seen on the 2019-04-2017-04 difference model of Geneva : regular evolution with an accumulation of corrections and areas on which global corrections were applied. The following images propose a close view of these two types of situation :
Illustration of the two observed types of evolution of the land register - Data : Canton of Geneva
In the left image above, one can observe the regular evolution of the land register, where modifications are brought to the model in order to follow the evolution of the territory. In the right image above, one can see a close view of an area subjected to a global correction (reference frame), leading to a difference model highlighting all the elements.
Analyzing the right image above more closely leads the observer to conclude that not all the elements are actually highlighted by the difference detection algorithm. Indeed, some elements are rendered in gray on the difference model, indicating that they were not modified between the two compared times. The following image emphasizes the unchanged elements that can be observed :
Unchanged elements in the land register after reference frame correction - Data : SITG
These unchanged elements can be surprising, as they are found in an area that was subject to a global reference frame correction. This shows how difference models can be helpful to track down this type of event, in order to check whether such unchanged elements are expected or are the result of a discrepancy in the land register evolution.
Other examples can be found in this very same area of the city of Geneva. The following images illustrate two other close views where unchanged elements can be seen despite the reference frame correction :
Unchanged elements in the land register after reference frame correction - Data : SITG
In the left image above, one can observe that the unchanged elements are the railway tracks within the commune of Carouge. This is an interesting observation, as railway tracks can be considered specific elements subject to different legislation regarding the land register. But it is clear that the railway tracks were not considered in the reference frame correction.
In the right image above, one can see another example of unchanged elements that is more complicated to explain, as they lie in the middle of other, modified elements. This clearly demonstrates how difference models can be helpful for analyzing and assessing the evolution of the land register models. Such models are able to guide users or automated processes, leading them to focus on the relevant aspects and to ask the right questions when analyzing the evolution of the land register.
"},{"location":"PROJ-DTRK/#conclusion","title":"Conclusion","text":"The presented difference models computed based on two temporal versions of the land register and using the 4D framework algorithm showed how differences can be emphasized for users and automated processes [1]. Difference models can be helpful to determine the amount and nature of changes that appear in the land register. Applying such an algorithm on land register is especially relevant as it is a highly living model, that evolves jointly with the territory it describes.
Two main applications can be considered for difference models applied to the land register. In the first place, difference models can be used to assess and analyze the regular evolution of the territory. Indeed, updating the land register is not a simple task: modifications involve a whole chain of decisions and verifications, from surveyors to the highest land register authority, before being integrated into the model. Being able to assess and analyze the modifications of the land register through difference models could be an interesting strengthening of the overall process.
The second application of difference models could be as an assessment tool for global corrections applied to the land register or to parts of it. These modifications are often linked to the reference frame and its evolution. Being able to assess the corrections through difference models could provide a helpful tool for verifying that the elements of the land register were correctly processed. In this direction, difference models could be used during the introduction of the DM.flex reference frame, both for analyzing its introduction and for demonstrating that difference models offer an interesting point of view.
"},{"location":"PROJ-DTRK/#reproduction-resources","title":"Reproduction Resources","text":"To reproduce the presented experiments, the STDL 4D framework has to be used and can be found here :
You can follow the instructions on the README to both compile and use the framework.
Unfortunately, the data used are not currently public. In both cases, the land register INTERLIS datasets were provided to the STDL directly. You can contact both the Thurgau Kanton and the SITG :
INTERLIS land register, Thurgau Kanton
INTERLIS land register, SITG (Geneva)
to query the data.
In order to extract and convert the data from the INTERLIS models, the following code is used :
where the README gives all the information needed.
For the 3D geographical coordinates conversion and heights restoration, we used two STDL internal tools. You can contact the STDL to obtain the tools and support in this direction :
ptolemee-suite : 3D coordinate conversion tool (EPSG:2056 to WGS84)
height-from-geotiff : Restoring geographical heights using topographic GeoTIFF (SRTM)
You can contact STDL for any question regarding the reproduction of the presented results.
"},{"location":"PROJ-DTRK/#references","title":"References","text":"[1] Automatic Detection of Changes in the Environment, N. Hamel, STDL 2020
[2] DM.flex reference frame
[3] LV03 Reference frame
[4] LV95 Reference frame
"},{"location":"PROJ-GEPOOL/","title":"Swimming Pool Detection from Aerial Images over the Canton of Geneva","text":"Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW)
Proposed by the Canton of Geneva - PROJ-GEPOOL September 2020 to January 2021 - Published on May 18, 2021
Abstract: Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to potentially detect unregistered swimming pools over the Canton of Geneva. Despite the success of our approach, we will argue that domain expertise still remains key to post-process detections in order to tell objects which are subject to registration from those which aren't. Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date.
"},{"location":"PROJ-GEPOOL/#introduction","title":"Introduction","text":"The Canton of Geneva manages a register of swimming pools, counting - in principle - all and only those swimming pools that are in-ground or, at least, permanently fixed to the ground. The swimming pool register is part of a far more general cadastre, including several other classes of objects (cf. this page).
Typically, the swimming pool register is updated either by taking building/demolition permits into account, or by manually checking its multiple records (4000+ to date) against aerial images, which is quite a long and tedious task. Exploring the opportunity of leveraging Machine Learning to help domain experts in such otherwise tedious tasks was one of the main motivations behind this study. As such, no prior requirements/expectations were set by the recipients.
The study was autonomously conducted by the STDL team, using Open Source software and Open Data published by the Canton of Geneva. Domain experts were asked for feedback only at a later stage. In the following, details are provided regarding the various steps we followed. We refer the reader to this page for a thorough description of the generic STDL Object Detection Framework.
"},{"location":"PROJ-GEPOOL/#method","title":"Method","text":"Several steps are required to set the stage for object detection and eventually reach the goal of obtaining - ideally - even more than decent results. Despite the linear presentation that the reader will find here-below, multiple back-and-forths are actually required, especially through steps 2-4.
"},{"location":"PROJ-GEPOOL/#1-data-preparation","title":"1. Data preparation","text":"As a very first step, one has to define the geographical region over which the study has to be conducted, the so-called \"Area of Interest\" (AoI). In the case of this specific application, the AoI was chosen and obtained as the geometric subtraction between the following two polygons:
The so-defined AoI covers both the known \"ground-truth\" labels and regions over which hypothetical unknown objects are expected to be detected.
The second step consists in downloading aerial images from a remote server, following an established tiling strategy. We adopted the so-called "Slippy Map" tiling scheme. Aerial images were fetched from a raster web service hosted by the SITG and powered by ESRI ArcGIS Server. More precisely, the following dataset was used: ORTHOPHOTOS AGGLO 2018. According to our configuration, this second step produces a folder including one GeoTIFF image per tile, each image having a size of 256x256 pixels. In terms of resolution - or better, in terms of "Ground Sampling Distance" (GSD) - the combination of the chosen zoom level (18) and the 256x256 pixel tile size yields approximately a GSD of ~ 60 cm/pixel. The tests we performed at twice the resolution showed little gain in terms of predictive power, surely not enough to justify engaging 4x more resources (storage, CPU/GPU, ...).
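A minimal sketch of the "Slippy Map" tile arithmetic (standard web-mercator formulas, not the project's code):

```python
import math

def deg2tile(lat_deg: float, lon_deg: float, zoom: int):
    """Convert WGS84 coordinates to Slippy Map tile indices (x, y) at a zoom level."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y

print(deg2tile(46.2044, 6.1432, 18))  # a tile over Geneva
# Equatorial GSD of a 256-pixel tile: Earth circumference / (2^zoom * 256).
print(40075016.686 / (2 ** 18 * 256))  # ~0.6 m/pixel at zoom 18
```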
The third step amounts to splitting the tiles covering the AoI (let's label them \"AoI tiles\") twice:
first, tiles are partitioned into two subsets, according to whether they include (GT tiles) or not (oth tiles) ground-truth labels:
\\(\\mbox{AoI tiles} = (\\mbox{GT tiles}) \\cup (\\mbox{oth tiles}),\\; \\mbox{with}\\; (\\mbox{GT tiles}) \\cap (\\mbox{oth tiles}) = \\emptyset\\)
Then, ground-truth tiles are partitioned into three other subsets, namely the training (trn), validation (val) and test (tst) datasets:
\\(\\mbox{GT tiles} = (\\mbox{trn tiles}) \\cup (\\mbox{val tiles}) \\cup (\\mbox{tst tiles})\\)
with \\(A \\neq B \\Rightarrow A \\cap B = \\emptyset, \\quad \\forall A, B \\in \\{\\mbox{trn tiles}, \\mbox{val tiles}, \\mbox{tst tiles}, \\mbox{oth tiles}\\}\\)
We opted for the 70%-15%-15% dataset splitting strategy.
Slippy Map Tiles at zoom level 18 covering the Area of Interest, partitioned into several subsets: ground-truth (GT = trn + val + tst), other (oth).
Zoom over a portion of the previous image.
Concerning ground-truth labels, the final results of this study rely on a curated subset of the public dataset including polygons corresponding to the Canton of Geneva's registered swimming pools, cf. PISCINES. Indeed, some "warming-up" iterations of this whole process allowed us to semi-automatically identify tiles where the swimming pool register was inconsistent with aerial images, and vice versa. By manually inspecting the tiles displaying inconsistencies, we discarded those for which the swimming pool register seemed to be wrong (at least through the eyes of a data scientist; in a further iteration, this data curation step should be performed together with domain experts). While not having the ambition to return a "100% ground-truth" training dataset, this data curation step yielded a substantial gain in terms of \(F_1\) score (from ~82% to ~90%, to be more precise).
"},{"location":"PROJ-GEPOOL/#2-model-training","title":"2. Model training","text":"A predictive model was trained, stemming from one of the pre-trained models provided by Detectron2. In particular, the \"R50-FPN\" baseline was used (cf. this page), which implements a Mask R-CNN architecture leveraging a ResNet-50 backbone along with a Feature Pyramid Network (FPN). We refer the reader e.g. to this blog article for further information about this kind of Deep Learning methods.
Training a (Deep) Neural Network model means running an algorithm which iteratively adjusts the various parameters of a Neural Network (40+ million parameters in our case), in order to minimize the value of some "loss function". In addition to the model parameters (otherwise called "weights"), multiple "hyper-parameters" exist, affecting the model and the way the optimization is performed. In theory, one should automate the hyper-parameter tuning, in order to eventually single out the best setting among all the possible ones. In practice, the hyper-parameter space is never fully explored; at the very least, a systematic search should be performed in order to find a "sweet spot" among a finite, discrete collection of settings. In our case, no systematic hyper-parameter tuning was actually performed. Instead, a few man-hours were spent manually tuning the hyper-parameters, until a setting was found which the STDL team judged to be reasonably good (~90% \(F_1\) score on the test dataset, see details here-below). The optimal number of iterations was chosen so as to approximately minimize the loss on the validation dataset.
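As an indication of how such a pre-trained baseline is loaded, here is a sketch using detectron2's model zoo API (the config path refers to the R50-FPN Mask R-CNN baseline mentioned above; the hyper-parameter values are placeholders, not the study's tuned setting):

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
# Start from the pre-trained COCO weights and fine-tune on the custom dataset.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # a single 'swimming pool' class
cfg.SOLVER.BASE_LR = 0.0025          # placeholder learning rate
cfg.SOLVER.MAX_ITER = 3000           # placeholder iteration budget
```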
"},{"location":"PROJ-GEPOOL/#3-prediction","title":"3. Prediction","text":"Each image resulting from the tiling of the AoI constitutes - let's say - the \"basic unit of computation\" of this analysis. Thus, the model optimized at the previous step was used to make predictions over:
1. oth images, meaning images covering no already known swimming pools;
2. trn, val and tst images, meaning images covering already known swimming pools.
The combination of predictions 1 and 2 covers the entire AoI and allows us to discover potential new objects as well as to check whether some of the known objects are outdated, respectively.
Image by image, the model produces one segmentation mask per detected object, accompanied by a score ranging from a custom minimum value (5% in our setting) to 100%. The higher the score, the more confident the model is about a given prediction.
Sample detections of swimming pools, accompanied by scores. Note that multiple detections can concern the same object, if the latter extends over multiple tiles.
Let us note that not only swimming pools exhibiting only \"obvious\" features (bluish color, rectangular shape, ...) were detected, but also:
As a matter of fact, the training dataset was rich enough to also include samples of such somewhat tricky cases.
"},{"location":"PROJ-GEPOOL/#4-prediction-assessment","title":"4. Prediction assessment","text":"As described here in more detail, in order to assess the reliability of the predictive model predictions have to be post-processed so as to switch from the image coordinates - ranging from (0, 0) to (255, 255) in our case, where 256x256 pixel images were used - to geographical coordinates. This amounts to applying an affine transformation to the various predictions, yielding a vector layer which we can compare with ground-truth (GT
) data by means of spatial joins:
GT
data are referred to as \"true positives\" (TPs);GT
data are referred to as \"false positives\" (FPs);GT
objects which are not detected are referred to as \"false negatives\" (FNs).Example of a true positive (TP), a false positive (FP) and a false negative (FN). Note that both the TP and the FP object are detected twice, as they extend over multiple tiles.
The counting of TPs, FPs and FNs allows us to compute some standard metrics such as precision, recall and \(F_1\) score (cf. this Wikipedia page for further information). Actually, one count (hence one set of metrics) can be produced per choice of the minimum score that one is willing to accept. Choosing a threshold value (= thr) means keeping all the predictions having a score >= thr and discarding the rest. Intuitively, the higher the threshold, the fewer FPs and the more FNs are expected, hence a higher precision and a lower recall (and vice versa).
Such intuitions can be confirmed by the following diagram, which we obtained by sampling the values of thr by steps of 0.05 (= 5%), from 0.05 to 0.95.
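A minimal sketch of such a threshold sweep (the tagging helper is hypothetical; the step and range follow the text):

```python
import numpy as np

def sweep_thresholds(detections, ground_truth, tag_fn):
    """Compute precision, recall and F1 for thresholds 0.05, 0.10, ..., 0.95.

    tag_fn(dets, gt) is a hypothetical helper returning (TP, FP, FN) counts,
    e.g. based on the spatial joins described above.
    """
    results = []
    for thr in np.arange(0.05, 1.0, 0.05):
        kept = [d for d in detections if d["score"] >= thr]
        tp, fp, fn = tag_fn(kept, ground_truth)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((round(float(thr), 2), precision, recall, f1))
    return results
```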
True positives (TPs), false negatives (FNs), and false positives (FPs) counted over the test dataset, as a function of the threshold on the score: for a given threshold, all and only the predictions exhibiting a bigger score are kept.
Performance metrics computed over the test dataset as a function of the threshold on the score: for a given threshold, all and only the predictions exhibiting a bigger score are kept.
The latter figure was obtained by evaluating the predictions of our best model over the test dataset. Inferior models exhibited a similar behavior, with a downward offset in terms of \\(F_1\\) score. In practice, upon iterating over multiple realizations (with different hyper-parameters, training data and so on) we aimed at maximizing the value of the \\(F_1\\) score on the validation dataset, and stopped when the \\(F_1\\) score went over the value of 90%.
As the ground-truth data we used turned out not to be 100% accurate, the responsibility for mismatching predictions has to be shared between ground-truth data and the predictive model, at least in some cases. In a more ideal setting, ground-truth data would be 100% accurate and differences between a given metric (precision, recall, \\(F_1\\) score) and 100% should be imputed to the model.
"},{"location":"PROJ-GEPOOL/#domain-experts-feedback","title":"Domain experts feedback","text":"All the predictions having a score \\(\\geq\\) 5% obtained by our best model were exported to Shapefile and shared with the experts in charge of the cadastre of the Canton of Geneva, who carried out a thorough evaluation. By checking predictions against the swimming pool register as well as aerial images, it was empirically found that the threshold on the minimum score (= thr
) should be set as high as 97%, in order not to have too many false positives to deal with. In spite of such a high threshold, 562 potentially new objects were detected (over 4652 objects which were known when this study started), of which:
These figures show that:
Examples of \"actual false positives\": a fountain (left) and a tunnel (right).
Examples of detected swimming pools which are not subject to registration: placed on top of a building (left), inflatable hence temporary (right).
"},{"location":"PROJ-GEPOOL/#conclusion","title":"Conclusion","text":"The analysis reported in this document confirms the opportunity of using state-of-the-art Deep Learning approaches to assist experts in some of their tasks, in this case that of keeping the cadastre up to date. Not only the opportunity was explored and actually confirmed, but valuable results were also produced, leading to the detection of previously unknown objects. At the same time, our study also shows how essential domain expertise still remains, despite the usage of such advanced methods.
As a concluding remark, let us note that our predictive model may be further improved. In particular, it may be rendered less prone to false positives, for instance by:
Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo)
Proposed by the Canton of Jura - PROJ-HETRES October 2022 to August 2023 - Published on November 13, 2023
All scripts are available on GitHub.
Abstract: Beech trees are sensitive to drought and repeated episodes can cause dieback. This issue affects the Jura forests, requiring the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors reaches an overall accuracy of up to 0.79, as does the model trained only with descriptors derived from airborne imagery. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors.
"},{"location":"PROJ-HETRES/#1-introduction","title":"1. Introduction","text":"Since the drought episode of 2018, the canton of Jura and other cantons have noticed dieback of the beech trees in their forests 1. In the canton of Jura, this problem mainly concerns the Ajoie region, where 1000 hectares of deciduous trees are affected 2. This is of concern for the productivity and management of the forest, as well as for the security of walkers. In this context, the R\u00e9publique et Canton du Jura has contacted the Swiss Territorial Data Lab to develop a new monitoring solution based on data science, airborne images and LiDAR point clouds. The dieback symptoms are observable in the mortality of branches, the transparency of the tree crown and the leaf mass partition 3.
The vegetation health state influences the reflectance in images (airborne and satellite), which is often used as a monitoring tool, in particular in the form of vegetation indices:
For instance, Brun et al. studied early-wilting in Central European forests with time series of the Normalized Difference Vegetation Index (NDVI) and estimated the area affected by early leaf-shedding 4.
Another technology used to monitor forests is light detection and ranging (LiDAR) as it penetrates the canopy and gives 3D information on trees and forest structures. Several forest and tree descriptors such as the canopy cover 5 or the standard deviation of crown return intensity 6 can be derived from the LiDAR point cloud to monitor vegetation health state.
In 5, the study was conducted at the tree level, whereas in 6 the stand level was studied. To work at the tree level, it is necessary to segment individual trees in the LiDAR point cloud. In complex forests, e.g. with a dense understory near the tree stems, it is challenging to get correct segments without manual corrections.
The aim of this project is to provide foresters with a map to help plan the felling of beech trees in the Ajoie's forests. To do so, we developed a combined method using LiDAR point clouds and airborne and satellite multispectral images to determine the health state of beech trees.
"},{"location":"PROJ-HETRES/#2-study-area","title":"2. Study area","text":"The study was conducted in two areas of interest in the Ajoie region (Fig. 1.A); one near Mi\u00e9court (Fig. 1.B), the other one near Beurnev\u00e9sin (Fig. 1.C). Altogether they cover 5 km2, 1.4 % of the Canton of Jura's forests 7.
The Mi\u00e9court sub-area is south-west and south oriented, whereas the Beurnev\u00e9sin sub-area is rather south-east and south oriented. They are in the same altitude range (600-700 m) and are 2 km away from each other, thus near the same weather station.
Figure 1: The study area is composed of two areas of interest."},{"location":"PROJ-HETRES/#3-data","title":"3. Data","text":"The project makes use of different data types: LiDAR point cloud, airborne and satellite imagery, and ground truth data. Table 1 gives an overview of the data and their characteristics. Data have been acquired in late summer 2022 to have an actual and temporally correlated information on the health state of beech trees.
Table 1: Overview of the data used in the project.
| Data | Resolution | Acquisition time | Owner |
| --- | --- | --- | --- |
| LiDAR | 50-100 pts/m2 | 08.2022 | R\u00e9publique et Canton du Jura |
| Airborne images | 0.03 m | 08.2022 | R\u00e9publique et Canton du Jura |
| Yearly variation of NDVI | 10 m | 06.2015-08.2022 | Bern University of Applied Science (HAFL) and the Federal Office for Environment (BAFU) |
| Weekly vegetation health index | 10 m | 06.2015-08.2022 | ExoLabs |
| Ground truth | - (point data) | 08.-10.2022 | R\u00e9publique et Canton du Jura |
"},{"location":"PROJ-HETRES/#31-lidar-point-cloud","title":"3.1 LiDAR point cloud","text":"The LiDAR dataset was acquired on the 16th of August 2022 and its point density is 50-100 pts/m\u00b2. It is classified into the following classes: ground, low vegetation (2-10 m), middle vegetation (10-20 m) and high vegetation (20 m and above). It was delivered in the LAS format, with reflectance values 8 stored in the intensity field.
"},{"location":"PROJ-HETRES/#32-airborne-images","title":"3.2 Airborne images","text":"The airborne images have a ground resolution of 3 cm and were acquired simultaneously to the LiDAR dataset. The camera captured the RGB bands, as well as the near infrared (NIR) one. The acquisition of images with a lot of overlap and oblique views allowed the production of a true orthoimage for a perfect match with the LiDAR point cloud and the data of the ground truth.
"},{"location":"PROJ-HETRES/#33-satellite-images","title":"3.3 Satellite images","text":"The Sentinel-2 mission from the European Space Agency is passing every 6 days over Switzerland and allows free temporal monitoring at a 10 m resolution. The archives are available back to the beginning of beech tree dieback in 2018.
"},{"location":"PROJ-HETRES/#331-yearly-variation-of-ndvi","title":"3.3.1 Yearly variation of NDVI","text":"The Bern University of Applied Science (HAFL) and the Federal Office for Environment (BAFU) have developed Web Services for vegetation monitoring derived from Sentinel-2 images. For this project, the yearly variation of NDVI 9 between two successive years is used. It measures the decrease in vegetation activity between August of one year (e.g. 2018) and June of the following year (e.g. 2019). The decrease is derived from rasters made of maximum values of the NDVI in June, July or August. The data are downloaded from the WCS service which delivers \"row\" indices: the NDVI values are not cut for a minimal threshold.
"},{"location":"PROJ-HETRES/#332-vhi","title":"3.3.2 VHI","text":"The Vegetation Health Index (VHI) was generated by ETHZ, WSL and ExoLab within the SILVA project 10 which proposes several indices for forest monitoring. VHI from 2016 to 2022 is used. It is computed mainly out of Sentinel-2 images, but also out of images from other satellite missions, in order to have data to obtain a weekly index with no time gap.
"},{"location":"PROJ-HETRES/#34-ground-truth","title":"3.4 Ground truth","text":"The ground truth was collected between August and October 2022 by foresters. They assessed the health of the beech trees based on four criteria 3:
In addition, each tree was associated with its coordinates and pictures, as illustrated in Figures 1 and 2 respectively. The foresters surveyed 75 healthy, 77 unhealthy and 56 dead trees.
Tree locations were first identified in the field with a GPS-enabled tablet on which the 2022 SWISSIMAGE mosaic was displayed. Afterwards, the tree locations were precisely adjusted to the trunk locations by visually locating the corresponding stems in the LiDAR point cloud with the help of the pictures taken in the field. The location and health status of a further 18 beech trees were added in July 2023. These 226 beeches - among which 76 healthy, 77 affected and 73 dead trees - surveyed at the two dates constitute the ground truth of this project.
Figure 2: Examples of the three health states: left, a healthy tree with a dense green tree crown; center, an unhealthy tree with dead twigs and scarce foliage; right, a dead tree, completely dry."},{"location":"PROJ-HETRES/#4-method","title":"4. Method","text":"The method developed is based on the processing of LiDAR point clouds and of airborne images. Ready-made vegetation indices derived from satellite imagery were also used. First, a segmentation of the trees in the LiDAR point cloud was carried out using the Digital-Forestry-Toolbox (DFT) 11. Then, descriptors for the health state of the beech trees were derived from each dataset. Boxplots and the corresponding t-tests were computed to evaluate the ability of the descriptors to differentiate the three health states; a t-test value below 0.01 indicates a significant difference between the means of two classes, as sketched below. Finally, the descriptors were used jointly with the ground truth to train a random forest (RF) algorithm, before inferring over the study area.
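As an illustration of this significance test, a couple of lines suffice; this is a generic sketch with made-up values, not the project's code (the project works in R, the same test exists there as `t.test`):

```python
from scipy.stats import ttest_ind

# Hypothetical descriptor values for two health classes
healthy = [0.61, 0.58, 0.64, 0.55, 0.60]
dead = [0.32, 0.35, 0.30, 0.38, 0.33]

# Welch's t-test (does not assume equal variances)
t_stat, p_value = ttest_ind(healthy, dead, equal_var=False)
if p_value < 0.01:
    print(f"significant difference between class means (p = {p_value:.4f})")
```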
Figure 3: Overview of the methodology, which processes the data into health descriptors for beech trees, before training and evaluating a random forest."},{"location":"PROJ-HETRES/#41-lidar-processing","title":"4.1 LiDAR processing","text":"At the beginning of the LiDAR processing, the exploration of the data motivated the segmentation and the computation of descriptors.
"},{"location":"PROJ-HETRES/#412-data-exploration","title":"4.1.2 Data exploration","text":"In order to get an understanding of the available information at the tree level, we manually segmented three healthy, five unhealthy and three dead trees. More unhealthy trees have been segmented to better represent dieback symptoms. Vertical slices of each tree were rotary extracted, providing visual information on the health state.
"},{"location":"PROJ-HETRES/#413-segmentation","title":"4.1.3 Segmentation","text":"To be able to describe the health state of each tree, segmentation of the forest was performed using the DFT. Parameters have been tuned to find an appropriate segmentation. Two strategies for peak isolation were tested on the canopy height model (CHM):
Each peak isolation method was tested on a range of parameters and on different cell resolutions for the CHM computation. The detailed plan of the simulation is given in Appendix 1. The minimum tree height was set to 10 m. For computation time reasons, only 3 LiDAR tiles with 55 ground truth (GT) trees located on them were processed.
To find the best segmentation, the locations of the GT trees were compared to the locations of the segment peaks. GT trees with a segment peak less than 4 m away were considered as true positives (TPs). The best segmentation was the one with the most TPs.
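A possible way to code this matching criterion, sketched here under the assumption that GT trees and segment peaks are available as planar coordinate arrays (the names are ours, not the project's):

```python
import numpy as np
from scipy.spatial import cKDTree

def count_true_positives(gt_xy, peak_xy, max_dist=4.0):
    """gt_xy, peak_xy: (N, 2) arrays of x/y coordinates in meters."""
    dist, _ = cKDTree(peak_xy).query(gt_xy, k=1)  # nearest peak per GT tree
    return int(np.sum(dist < max_dist))           # GT trees counted as TPs
```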
"},{"location":"PROJ-HETRES/#414-structural-descriptors","title":"4.1.4 Structural descriptors","text":"An alternative to the segmentation is to change of paradigm and perform the analyses at the stand level. Meng et al. 6 derived structural descriptors for acacia dieback at the stand level based on LiDAR point cloud. By adapting their method to the present case, the following descriptors were derived from the LiDAR point cloud using the LidR library from R 12:
Descriptors 1 to 6 are taken directly from Meng et al. All the descriptors were first computed for three grid resolutions: 10 m, 5 m and 2.5 m. In a second step, the DFT segments were considered as an adaptive grid around the trees, with the assumption that it fits the trees more naturally than a regular grid. Then, the structural descriptors for the vertical point distribution (descriptors 1 to 4) were computed on each segment, whereas the descriptors for the horizontal point distribution (descriptors 5 to 7) were computed on the 2.5 m grid. The values of the latter descriptors were weighted according to the area of the grid cells included in the footprint of the segments.
Furthermore, LiDAR reflectance mean and standard deviation (sd) were computed for the segment crowns to differentiate them by their reflectance.
"},{"location":"PROJ-HETRES/#42-image-processing","title":"4.2 Image processing","text":"For the image processing, an initial step was to compute the normalized difference vegetation index (NDVI) for each raster image. The normalized difference vegetation index (NDVI) is an index commonly used for the estimation of the health state of vegetation 51314.
\\[\\begin{align} \\ NDVI = {NIR-R \\over NIR+R} \\ \\end{align}\\]where NIR and R are the value of the pixel in the near-infrared and red band respectively.
To uncover potential distinctive features between the classes, boxplots and principal component analysis were used on the images' four bands (RGB-NIR) and the NDVI.
Firstly, we tested whether the raw pixel values allowed distinguishing between the classes at the pixel level. This method avoids the pitfall of segmenting the forest into trees. Secondly, we tested the same method, but with a low-pass filter to reduce the noise in the data. Thirdly, we tried to find distinct statistical features at the tree level. This approach decreases the noise that can be present in high-resolution information. However, it necessitates a reasonably good segmentation of the trees. Finally, color filtering and edge detection were tested in order to highlight and extract the linear structure of the branches.
Each treatment can be applied with or without a mask on the tree height. As only trees between 20 m and 40 m tall are affected by dieback, a mask based on the CHM raster derived from the LiDAR point cloud was tested.
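A sketch of such a mask, assuming the CHM is available as a NumPy array aligned with the image raster (an assumption on our side):

```python
import numpy as np

def height_mask(chm, low=20.0, high=40.0):
    """Boolean mask keeping pixels whose canopy height lies in the dieback range."""
    return (chm >= low) & (chm <= high)
```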
Figure 4: Overview of the different possible data treatments for the statistical analysis."},{"location":"PROJ-HETRES/#421-statistical-tests-on-the-original-and-filtered-pixels","title":"4.2.1 Statistical tests on the original and filtered pixels","text":"The statistical tests were performed on the original and filtered pixels.
Two low-pass filters were tested:
In both the original and the filtered cases, the pixels of each GT tree were extracted from the images and sorted by class. Then, the corresponding NDVI was computed. Each pixel has 5 attributes, corresponding to its value on the four bands (R, G, B, NIR) and its NDVI. First, per-class boxplots of the attributes were produced to see whether the distinction between classes was possible on one or several bands or on the NDVI. Then, a principal component analysis (PCA) was computed on the same values to see whether a linear combination of them allowed distinguishing the classes.
"},{"location":"PROJ-HETRES/#422-statistical-tests-at-the-tree-level","title":"4.2.2. Statistical tests at the tree level","text":"For the tests at the tree level, the GT trees were segmented by hand. For each tree, the statistics of the pixels were calculated over its polygon, on each band and for the NDVI. Then, the results were sorted by class. Each tree has five attributes per band or index corresponding to the statistics of its pixels: minimum (min), maximum (max), mean, median and standard deviation (std).
As with the pixels, per-class boxplots of the attributes were produced to see whether the distinction between classes was possible. Then, the PCA was computed.
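These two steps can be sketched as follows; `pixels_per_tree` is an assumed mapping from each manually segmented GT tree to the 1D array of its pixel values on one band (a stand-in is generated here so the sketch runs):

```python
import numpy as np
from sklearn.decomposition import PCA

def tree_statistics(pixels):
    """min, max, mean, median and std of the pixels of one tree on one band."""
    return [pixels.min(), pixels.max(), pixels.mean(),
            np.median(pixels), pixels.std()]

# Stand-in for the pixels extracted per GT tree (226 trees in this project)
pixels_per_tree = [np.random.rand(50) * 255 for _ in range(226)]

X = np.array([tree_statistics(p) for p in pixels_per_tree])  # (n_trees, 5)
components = PCA(n_components=2).fit_transform(X)            # 2D view for the plots
```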
"},{"location":"PROJ-HETRES/#423-extraction-of-branches","title":"4.2.3 Extraction of branches","text":"One of the beneficiaries noted that the branches are clearly visible on the RGB images. Therefore, it may be possible to isolate them with color filtering based on the RGB bands. We calibrated an RGB filter through trial and error to produce a binary mask indicating the location of the branches. A sieve filter was used to reduce the noise due to the lighter parts of the foliage. Then, a binary dilation was performed on the mask to highlight the results. Otherwise, they would be too thin to be visible at a 1:5'000 scale. A mask based on the CHM is integrated to the results to limit the influence of the ground.
The branches have a characteristic linear structure. In addition, the branches of dead trees tend to appear as very light lines on the dark forest ground and understory. Therefore, we thought that we might detect the dead branches with edge detection. We used the Canny edge detector and tested the Python functions of the OpenCV and skimage libraries.
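With scikit-image, the test boils down to a single call; a minimal sketch on a stand-in array:

```python
import numpy as np
from skimage.feature import canny

blue = np.random.rand(256, 256)  # stand-in for the blue band of the orthoimage
edges = canny(blue, sigma=2.0)   # boolean edge map; a higher sigma reduces noise
```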
"},{"location":"PROJ-HETRES/#43-satellite-based-indices","title":"4.3 Satellite-based indices","text":"The yearly variation of NDVI and the VHI were used to take account of historical variations of NDVI from 2015 to 2022. For the VHI, the mean for each year is computed over the months considered for the yearly variation of NDVI.
The pertinence of using these indices was explored: the values for each ground truth tree were extracted and plotted as boxplots per 2022 health class, for each year pair over the 2015-2022 time span.
"},{"location":"PROJ-HETRES/#44-random-forest","title":"4.4 Random Forest","text":"In R 12, the caret and randomForest packages were used to train the random forest and make predictions. First, the ground truth was split into the training and the test datasets, with each class being split 70 % into the training set and 30 % into the test set. Health classes with not enough samples were completed with copies. Optimization of the RF was performed on the number of trees to develop and on the number of randomly sampled descriptors to test at each split. In addition, 5-fold cross-validation was used to ensure the use of different parts of the dataset. The search parameter space was from 100 to 1000 decision trees and from 4 to 10 descriptors as the default value is the square root of all descriptors, i.e. 7. RF was assessed using a custom metric, which is an adaptation of the false positive rate for the healthy class. It minimizes the amount of false healthy detections and of dead trees predicted as unhealthy (false unhealthy). It is called custom false positive rate (cFPR) in the text. It was preferred to have a model with more unhealthy predictions to control on the field, than missing unhealthy or dead trees. The cFPR goes from 0 (best) to 1 (worse).
Table 2: Confusion matrix for the three health classes.
| Prediction \ Ground truth | Healthy | Unhealthy | Dead |
| --- | --- | --- | --- |
| Healthy | A | B | C |
| Unhealthy | D | E | F |
| Dead | G | H | I |

According to the confusion matrix in Table 2, the cFPR is computed as follows:
\\[\\begin{align} \\ cFPR = {(\ud835\udc35+\ud835\udc36+\ud835\udc39)\\over(\ud835\udc35+\ud835\udc36+\ud835\udc38+\ud835\udc39+\ud835\udc3b+\ud835\udc3c)}. \\ \\end{align}\\]In addition, the overall accuracy (OA), i.e. the ratio of correct predictions over all the predictions, and the sensitivity, which is, per class, the number of correct predictions divided by the number of samples from that class, are used.
An ablation study was performed on the descriptors to assess the contribution of the different data sources to the final performance. An \u201cimportant\u201d descriptor is one that strongly increases the prediction errors when its values are randomly reallocated in the training set.
After the optimization, predictions for each DFT segment were computed using the best model according to the cFPR. The inferences were delivered as a thematic map, with colors indicating the health state and hues indicating the fraction of decision trees in the RF having voted for the class (vote fraction). The purpose is to provide confidence information, a high vote fraction indicating robust predictions.
Furthermore, the ground truth was evaluated for quantity and quality by two means:
Finally, after having developed the descriptors and the routine on high-quality data, we downgraded the data to resolutions similar to those of the swisstopo products (LiDAR: 20 pt/m2, orthoimage: 10 cm) and performed the optimization and prediction steps again. Indeed, the data acquisition was especially commissioned for this project and only covers the study area. If the method were to be extended in the future, one would like to know whether a lower resolution, such as that of the standard nation-wide SWISSIMAGE product, would be sufficient.
"},{"location":"PROJ-HETRES/#5-results-and-discussion","title":"5 Results and discussion","text":"In this section, the results obtained during the processing of each data source into descriptors are presented and discussed, followed by a section on the random forest results.
"},{"location":"PROJ-HETRES/#51-lidar-processing","title":"5.1 LiDAR processing","text":"For the LiDAR data, the reader will first discover the aspect of beech trees in the LiDAR point cloud according to their health state as studied in the data exploration. Then, the segmentation results and the obtained LiDAR-based descriptors will be presented.
"},{"location":"PROJ-HETRES/#512-data-exploration-for-11-beech-trees","title":"5.1.2 Data exploration for 11 beech trees","text":"The vertical slices of 11 beech trees provided visual information on health state: branch shape, clearer horizontal and vertical point distribution. In Figure 5, one can appreciate the information shown by these vertical slices. The linear structure of the dead branches, the denser foliage of the healthy tree and the already smaller tree crown of the dead tree are well recognizable.
Figure 5: Slices for three trees with different health states. Vertical slices of each tree were extracted at rotating angles, providing visual information on the health state. Dead twigs and the density of the foliage are particularly distinctive.
A deep learning image classifier could treat LiDAR point cloud slices as artificial images and learn from them before classifying any arbitrary slice of the LiDAR point cloud. However, the subject is not suited to transfer learning, because 200 samples are not enough to train a model to classify three new classes, especially via images bearing no resemblance to the datasets used to pre-train deep learning models.
"},{"location":"PROJ-HETRES/#513-segmentation","title":"5.1.3 Segmentation","text":"Since the tree health classes were visually recognizable for the 11 trees, it was very interesting to individuate each tree in the LiDAR point cloud.
After searching for optimal parameters in the DFT, the best realization of each peak isolation method either slightly oversegmented or slightly undersegmented the forest. The forest has a complex structure with dominant and co-dominant trees, and with understory. A simple yet frequent example is the situation of a small pine growing in the shadow of a beech tree: it is difficult for an algorithm to differentiate between the points belonging to the pine and those belonging to the beech. Complex tree crowns (not spherical, with two maxima) especially lead to oversegmentation.
The best segmentation was obtained by smoothing the maxima on a 0.5 m resolution CHM. Out of 55 GT trees, 52 were within a 4 m distance of the centroid of a segment. The total number of segments is 7347, which corresponds to 272 trees/ha. The report of a forest inventory carried out in the Jura forest between 2003 and 2005 indicated a density of 286 trees/ha in high forest 7. Since the ground truth is only made of point coordinates, it is difficult to quantitatively assess the correctness of the segments, i.e. the attribution of each point to the right segment. Therefore, the work at the tree level is only approximate.
"},{"location":"PROJ-HETRES/#514-structural-descriptors","title":"5.1.4 Structural descriptors","text":"Nevertheless, the structural descriptors for each tree were computed from the segmented LiDAR point cloud. The t-test between health classes for each descriptor at each resolution (10 m, 5 m, 2.5 m and per-tree grid) are given in Appendices 2, 3, 4 and 5. The number of significant descriptors per resolution is indicated to understand better the effect on the RF:
The simulations at 5 m and at 2.5 m seemed a priori the most promising. In both configurations, the t-tests indicated significantly different distributions for:
The maximal height and the sdCHM appear to be the descriptors best suited to separate the three health states. The other descriptors differentiate healthy trees from the others or dead trees from the others. Of the 11 LiDAR-based descriptors, 8 are significant for at least one comparison between two classes.
"},{"location":"PROJ-HETRES/#52-image-processing","title":"5.2 Image processing","text":"Boxplots and PCA are given to illustrate the results of the image processing exploration. As the masking of pixels below and above the affected height made no difference in the interpretation of the results, they are presented here with the height mask.
"},{"location":"PROJ-HETRES/#521-boxplots-and-pca-over-the-pixel-values-of-the-original-images","title":"5.2.1 Boxplots and PCA over the pixel values of the original images","text":"When the pixel values of the original images per health class are compared in boxplots (ex. Fig. 6), the sole brute value of the pixel is not enough to clearly distinguish between classes.
Figure 6: Boxplots of the unfiltered pixel values on the different bands and the NDVI index by health class.
The PCA in Figure 7 shows that it is not possible to distinguish the groups based on a linear combination of the raw pixel values of the bands and the NDVI.
Figure 7: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI. "},{"location":"PROJ-HETRES/#522-boxplots-and-pca-over-the-pixel-values-of-the-filtered-images","title":"5.2.2 Boxplots and PCA over the pixel values of the filtered images","text":"A better separation of the different classes is noticeable after the application of a Gaussian filter. The most promising band for a separation of the healthy and dead classes is the NIR one. On the NDVI, the distinction between those two classes should also be possible, as illustrated in Figure 8. In all cases, no distinction between the healthy and unhealthy classes is possible. The separation between the healthy and dead trees on the NIR band lies around 130, and the slight overlap on the NDVI is between approx. 0.04 and approx. 0.07.
Figure 8: Boxplots of the pixel values on the different bands and the NDVI by health class after a Gaussian filter with sigma=5.
As for the raw pixels, the overlap between the different classes is still very present in the PCA (Fig. 9).
Figure 9: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI after a Gaussian filter with sigma=5.
The boxplots produced on the resampled images (Figure 10) give results similar to the ones with the Gaussian filter. The healthy and dead classes are separated on the NIR band around 130. The unhealthy class stays similar to the healthy one.
Figure 10: Boxplots of the pixel values on the different bands and the NDVI by health class after a downsampling filter with a factor 1/3.
According to the PCA in Figure 11, it seems indeed not possible to distinguish between the classes only with the information presented in this section.
Figure 11: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI after a downsampling filter with a factor 1/3.
When the factor for the resampling is decreased, i.e. when the resulting resolution gets coarser, the separation on the NIR band becomes stronger. With a factor of 1/17, the healthy and dead classes on the NDVI are almost entirely separated around the value of 0.04.
"},{"location":"PROJ-HETRES/#523-boxplots-and-pca-over-the-tree-statistics","title":"5.2.3 Boxplots and PCA over the tree statistics","text":"As an example for the per-tree statistics, the boxplots and PCA for the blue band are presented in Figures 12 to 14. On the mean and on the standard deviation, healthy and dead classes are well differentiated on the blue band as visible on Figure 12. The same is observed on the mean, median, and minimum of the NDVI, as well as on the maximum, mean, and median of the NIR band. However, there is no possible differentiation on the red and green bands.
Figure 12: Boxplots of the statistical values for each tree on the blue band by health class.
In the PCA in Figure 13, the groups of the healthy and dead trees are quite well separated, mostly along the first component.
Figure 13: Distribution of the trees in the space of the principal components based on their statistical values on the blue band.
In Figure 14, the first principal component is principally influenced by the standard deviation of the blue band. The mean, the median and the max have an influence too. This is in accordance with the boxplots, where the standard deviation values presented the largest gap between classes.
Figure 14: Influence of the statistics for the blue band on the first and second principal components.
The clusters of the dead and healthy classes are also well separated in the PCA of the NIR band and of the NDVI. No separation is visible in the PCA of the green and red bands.
"},{"location":"PROJ-HETRES/#524-extraction-of-branches","title":"5.2.4 Extraction of branches","text":"Finally, the extraction of dead branches was performed.
"},{"location":"PROJ-HETRES/#use-of-an-rgb-filter","title":"Use of an RGB filter","text":"The result of the RGB filter is displayed in Figure 15. It is important to include the binary CHM in the visualization. Otherwise, the ground can have a significant influence on certain zones and distract from the dead trees. Some interferences can still be seen among the coniferous trees that have a similar light color as dead trees.
Figure 15: Results produced by the RGB filter for the detection and highlighting of dead branches over a zone with coniferous, healthy deciduous and dead deciduous trees. The parts in grey are the zones masked by the filter on the height."},{"location":"PROJ-HETRES/#use-of-the-canny-edge-detector","title":"Use of the canny edge detector","text":"Figure 16 presents the result for the blue band, which was the most promising one. The dead branches are well captured. However, there is a lot of noise around them due to the high contrast in some parts of the foliage. The result is not usable as is. Using a stricter filter decreased the noise, but it also decreased the number of captured branch pixels. In addition, using a sieve filter or combining the results with those of the RGB filter did not improve the situation.
Figure 16: Test of the Canny edge detector from skimage over a dead tree, using only the blue band. The parts in grey are the zones masked by the CHM filter on the height.
The results for the other bands, the RGB images or the NDVI were not usable either.
"},{"location":"PROJ-HETRES/#525-discussion","title":"5.2.5 Discussion","text":"The results at the tree level are the most promising ones. They are integrated into the random forest. Choosing to work at the tree-level means that all the trees must be segmented with the DFT. This adds uncertainties to the results. As explained in the dedicated section, the DFT has a tendency of over/under-segmenting the results. The procedures at the pixel level, whether on filtered or unfiltered images, are abandoned.
For the branch detection, the results were compared with field observations by a forest expert, who assessed the result as incorrect in several parts of the forest. Therefore, the dead branch detection was not integrated into the random forest. In addition, edge detection was maybe not the right choice for dead branches: an approach more focused on the detection of straight lines or graphs might have been better suited. The chances of success of such methods are difficult to predict, as the form of the dead branches can vary a lot.
"},{"location":"PROJ-HETRES/#53-vegetation-indices-from-satellite-imagery","title":"5.3 Vegetation indices from satellite imagery","text":"The t-test used to evaluate the ability of satellite indices to differentiate between health states are given in Appendices 6 and 7. In the following two subsections, solely the significant tested groups are mentioned for understanding the RF performance.
"},{"location":"PROJ-HETRES/#531-yearly-variation-of-ndvi","title":"5.3.1 Yearly variation of NDVI","text":"t-test on the yearly variation of NDVI indicated significance between:
The t-tests on the VHI indicated significant differences between:
Explanations similar to those for the NDVI may partly explain the significance obtained. In any case, it is encouraging that the VHI helps to differentiate the health classes thanks to their different evolution through the years.
"},{"location":"PROJ-HETRES/#54-random-forest","title":"5.4 Random Forest","text":"The results of the RF that are presented and discussed are: (1) the optimization and ablation study, (2) the ground truth analysis, (3) the predictions for the AOI and (4) the performance with downgraded data.
"},{"location":"PROJ-HETRES/#541-optimization-and-ablation-study","title":"5.4.1 Optimization and ablation study","text":"In Table 3, performance for VHI and yearly variation of NDVI (yvNDVI) descriptors using their value at the location of the GT trees are compared. VHI (cFPR = 0.24, OA = 0.63) performed better than the yearly variation of NDVI (cFPR = 0.39, OA = 0.5). Both groups of descriptors are mostly derived from satellite data with the same resolution (10 m). A conceptual difference is that the VHI is a deviation to a long-term reference value; whereas the yearly variation of NDVI reflects the change between two years. For the latter, values can be high or low independently of the actual health state. Example, a succession of two bad years will indicate few to no differences in NDVI.
Table 3: RF performance with satellite-based descriptors.
| Descriptors | cFPR | OA |
| --- | --- | --- |
| VHI | 0.24 | 0.63 |
| yvNDVI | 0.39 | 0.5 |

Nonetheless, only the yearly variation of NDVI is used hereafter, as it is available free of charge.
Regarding the LiDAR descriptors, the tested resolutions indicated that the 5 m resolution (cFPR = 0.2, OA = 0.65) performed best for the cFPR, but that the per-tree descriptors had the highest OA (cFPR = 0.33, OA = 0.67). At the 5 m resolution, fewer affected trees are missed, but there are more errors in the classification, so more control in the field would have to be done. The question of which grid resolution to use on the forest is a complex one, as the forest consists of trees of different sizes. Further, even if dieback affects some areas more severely than others, it is not a continuous phenomenon, and it is important to be able to clearly delimit each tree. However, a fine grid, such as the 2.5 m one, can also fail to capture the entirety of some trees, and the performance may decrease (LiDAR, 2.5 m, OA = 0.63).
Table 4: RF performance with LiDAR-based descriptors at different resolutions.
| Descriptors | cFPR | OA |
| --- | --- | --- |
| LiDAR, 10 m | 0.3 | 0.6 |
| LiDAR, 5 m | 0.2 | 0.65 |
| LiDAR, 2.5 m | 0.28 | 0.63 |
| LiDAR, per tree | 0.33 | 0.67 |

The 5 m resolution descriptors are therefore kept for the rest of the analysis, following the decision to reduce the number of missed dying trees.
The ablation study performed on the descriptor sources is summarized in Table 5.A and Table 5.B. The two tables reflect the performance for two different partitions of the samples into training and test sets. Since the performance varies by several percent, it is impacted by the repartition of the samples. Following those values, the best setups for the two partitions are the full model (cFPR = 0.13, OA = 0.76) and the airborne-based model (cFPR = 0.11, OA = 0.79) respectively.
One notices that the health classes are not all predicted with the same accuracy. The airborne-based model, as described in Section 5.2.3, is less sensitive to the healthy class, whereas the satellite-based and LiDAR-based models are more polarized towards the healthy and dead classes, with a low sensitivity for the unhealthy class.
Table 5.A: Ablation study results, partition A of the dataset.
| Descriptor sources | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
| --- | --- | --- | --- | --- | --- |
| LiDAR | 0.2 | 0.65 | 0.65 | 0.61 | 0.71 |
| Airborne images | 0.18 | 0.63 | 0.43 | 0.61 | 0.94 |
| yvNDVI | 0.4 | 0.49 | 0.78 | 0.26 | 0.41 |
| LiDAR and yvNDVI | 0.23 | 0.7 | 0.74 | 0.61 | 0.76 |
| Airborne images and yvNDVI | 0.15 | 0.73 | 0.65 | 0.7 | 0.88 |
| LiDAR, airborne images and yvNDVI | 0.13 | 0.76 | 0.65 | 0.74 | 0.94 |

Table 5.B: Ablation study results, partition B of the dataset.
| Descriptor sources | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
| --- | --- | --- | --- | --- | --- |
| LiDAR | 0.19 | 0.71 | 0.76 | 0.5 | 0.88 |
| Airborne images | 0.11 | 0.79 | 0.62 | 0.8 | 1 |
| yvNDVI | 0.38 | 0.62 | 0.81 | 0.4 | 0.65 |
| LiDAR and yvNDVI | 0.27 | 0.74 | 0.86 | 0.5 | 0.88 |
| Airborne images and yvNDVI | 0.14 | 0.78 | 0.62 | 0.8 | 0.94 |
| LiDAR, airborne images and yvNDVI | 0.14 | 0.79 | 0.71 | 0.7 | 1 |

Even if the performance varies according to the dataset partition, the important descriptors remain quite similar between the two partitions, as displayed in Figure 17.A and Figure 17.B. The yearly difference of NDVI between 2018 and 2019 (NDVI_diff_1918) is the most important descriptor; the standard deviation of the blue band (b_std) and the means of the NIR band and of the NDVI (nir_mean and ndvi_mean) stand out in both cases; from the LiDAR, the standard deviations of the canopy cover (sdcc) and of the LiDAR reflectance (i_sd_seg) are the most important descriptors. The magnitude of the importances explains the better performance of the airborne-based model on partition B: for instance, b_std has a magnitude of 7.6 with partition B instead of 4.6.
Figure 17.A: Important descriptors for the full model, dataset partition A. Figure 17.B: Important descriptors for the full model, dataset partition B.
The most important descriptor of the full model turned out to be the yearly variation of NDVI between 2018 and 2019. 2018 was a year with a dry and hot summer, which stressed the beech trees and probably contributed to the forest damage 1. This corroborates the ability of our RF method to monitor the response of trees to extreme drought events. However, the 10 m resolution of the index and the varying adaptability of individual beech trees to drought may weaken the relationship between the current health status and the index. This can explain why the presence of this descriptor in the full model doesn't offer better performance than the airborne-based model in predicting the health state.
Both the mean of the NIR band and the standard deviation of the blue band play an important role. The statistical study in Section 5.2.3 indicated that the models might confuse the healthy and unhealthy classes. On the one hand, airborne imagery only sees the top of the crown and may miss useful information on the hidden parts. On the other hand, airborne imagery has a good ability to detect dead trees thanks to their distinctive reflectance values in the NIR and blue bands.
One argument that could explain the lower performance of the model based on LiDAR descriptors is the difficulty of finding the right scale for the analysis, as beech trees can show a wide range of crown diameters.
"},{"location":"PROJ-HETRES/#542-ground-truth-analysis","title":"5.4.2 Ground truth analysis","text":"With progressive removal of sample individuals from the training set, impact of individual beech trees on the performance is further analyzed. The performance variation is shown in Figure 18. The performance is rather stable in the sense that the sensitivities stay in a range of values similar to the initial one up to 40 samples removed, but with each removal, a slight instability in the metrics is visible. The size of the peaks indicates variations of 1 prediction for the dead class, but up to 6 predictions for the unhealthy class and up to 7 for the healthy class. During the sample removal, some samples were always predicted correctly, whereas others were often misclassified leading to the peaks in Figure 18. With the large number of descriptors in the full model, there is no straightforward profile of outliers to identify.
Figure 18: Evolution of the per-class sensitivity with the removal of samples.
In addition, the subsampling of the training set in Table 6 shows that the OA varies only by max. 3% according to the subset used. This again indicates that the amount of ground truth allows reaching a stable OA range, but that the characteristics of the samples do not allow a stable OA value. The sensitivity for the dead class is stable, whereas the sensitivities for the healthy and unhealthy classes vary.
Table 6: Performance according to different random seeds for the creation of the training subset.

| Training set subpartition | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
| --- | --- | --- | --- | --- | --- |
| Random seed = 2 | 0.13 | 0.76 | 0.61 | 0.83 | 0.88 |
| Random seed = 22 | 0.15 | 0.78 | 0.70 | 0.78 | 0.88 |
| Random seed = 222 | 0.18 | 0.75 | 0.65 | 0.74 | 0.88 |
| Random seed = 2222 | 0.13 | 0.76 | 0.65 | 0.78 | 0.88 |
| Random seed = 22222 | 0.10 | 0.78 | 0.65 | 0.83 | 0.88 |
"},{"location":"PROJ-HETRES/#543-predictions","title":"5.4.3 Predictions","text":"The full model and the airborne-based model were used to infer the health state of the trees in the study area (Fig. 19). As indicated in Table 7, with the full model, 35.1 % of the segments were predicted as healthy, 53 % as unhealthy and 11.9 % as dead. With the airborne-based model, 42.6 % of the segments were predicted as healthy, 46.2 % as unhealthy and 11.2 % as dead. The two models agree on 74.3 % of the predictions. Within the 25.6 % of disagreements, about 77.1 % concern healthy versus unhealthy predictions. Finally, 1.5 % are critical disagreements (between the healthy and dead classes).
Table 7: Percentage of predicted health states in the AOI.

| Model | Healthy [%] | Unhealthy [%] | Dead [%] |
| --- | --- | --- | --- |
| Full | 35.1 | 53 | 11.9 |
| Airborne-based | 42.6 | 46.2 | 11.2 |

Control by forestry experts reported that the predictions mostly correspond to the field situation and that a weak vote fraction often corresponds to false predictions. They confirmed that the map delivers useful information to help plan beech tree felling. The final model, retained after a field excursion, is the full model.
Figure 19: Extract of the predicted thematic health map. Green is for healthy, yellow for unhealthy, and red for dead trees. Hues indicate the RF fraction of votes. The predictions can be compared with the true orthophoto in the background. The polygons approximating the tree crowns correspond to the delimitation of the segmented trees."},{"location":"PROJ-HETRES/#544-downgraded-data","title":"5.4.4 Downgraded data","text":"Finally, random forest models were trained and tested on the downgraded data with partition A of the ground truth, for all descriptors and by descriptor source. With this partition, the RFs have a better cFPR for the full model (0.08 instead of 0.13), the airborne-based model (0.08 instead of 0.21) and the LiDAR-based model (0.28 instead of 0.31). The OA is also better (full model: 0.84 instead of 0.76, airborne-based model: 0.77 instead of 0.63), except in the case of the LiDAR-based model (0.63 instead of 0.66). This indicates that a 10 cm resolution for the aerial imagery does not weaken the model and can even improve it. For the LiDAR point cloud, a reduction of the density by a factor of 5 did not change the performance much.
Table 7.A: Performance for RF trained and tested with partition A of the dataset for the downgraded data.

| Simulation | cFPR | OA |
| --- | --- | --- |
| Full | 0.08 | 0.84 |
| Airborne-based | 0.08 | 0.77 |
| LiDAR-based | 0.28 | 0.63 |

Table 7.B: Performance for RF trained and tested with partition A of the dataset for the original data.

| Simulation | cFPR | OA |
| --- | --- | --- |
| Full | 0.13 | 0.76 |
| Airborne-based | 0.21 | 0.63 |
| LiDAR-based | 0.31 | 0.66 |

When the important descriptors are compared between the original and the downgraded models, one notices that the airborne descriptors gained in importance in the full model when the data are downgraded. The downgraded model showed sufficient accuracy for the objective of the project.
"},{"location":"PROJ-HETRES/#6-conclusion-and-outlook","title":"6 Conclusion and outlook","text":"The study has demonstrated the ability of a random forest algorithm to learn from structural descriptors derived from LiDAR point clouds and from vegetation reflectance in airborne and satellite images to predict the health state of beech trees. Depending on the used datasets for training and test, the optimized full model including all descriptors reached an OA of 0.76 or of 0.79, with corresponding cFPR values of 0.13 and 0.14 respectively. These metrics are sufficient for the purpose of prioritizing beech tree felling. The produced map, with the predicted health state and the corresponding votes for the segments, delivers useful information for forest management. The cantonal foresters validated the outcomes of this proof-of-concept and explained how the location of affected beech trees as individuals or as groups are used to target high-priority areas. The full model highlighted the importance of the yearly variation of NDVI between a drought year (2018) and a normal year (2019). The airborne imagery showed good ability to predict dead trees, whereas confusion remained between healthy and unhealthy trees. The quality of the LiDAR point cloud segmentation may explain the limited performance of the LiDAR-based model. Finally, the model trained and tested on downgraded data gave an OA of 0.84 and a cFPR of 0.08. In this model, the airborne-based descriptors gained in importance. It was concluded that a 10 cm resolution may help the model by reducing the noise in the image.
Outlooks for improving results include improving\u00a0the ground truth representativeness of symptoms in the field\u00a0and continuing research into descriptors for differentiating between healthy and unhealthy trees:
Setting aside possible further developments, the challenge is now the extension of the methodology to a larger area. The simultaneity of the data is necessary for an accurate analysis. It has been shown that the representativeness of the ground truth has to be improved to obtain better and more stable results. Thus, for an extension to further areas, we recommend collecting additional ground truth measurements. The health state of the trees showed some autocorrelation, which could have boosted our results and made them less representative of the whole forest. The sampled trees should be more scattered in the forest.
Furthermore, the required data are a true orthophoto and a LiDAR point cloud for the per-tree analysis. It should be possible to use an older LiDAR acquisition to produce a CHM and to forgo the LiDAR-based descriptors without degrading the performance of the model too much.
"},{"location":"PROJ-HETRES/#7-appendixes","title":"7 Appendixes","text":""},{"location":"PROJ-HETRES/#71-simulation-plan-for-dft-parameter-tuning","title":"7.1 Simulation plan for DFT parameter tuning","text":"
Table 8: Parameter tuning for the DFT.

| CHM cell size [m] | Maxima smoothing | Local maxima within search radius |
| --- | --- | --- |
| 0.50 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 0.50 | 0.3 | (1.7425 * h^0.5566)/2 |
| 0.50 | 0.5 | (1.2 + 0.16 * h)/2 |
| 1.00 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 1.00 | 0.3 | (1.7425 * h^0.5566)/2 |
| 1.00 | 0.5 | (1.2 + 0.16 * h)/2 |
| 1.50 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 1.50 | 0.3 | (1.7425 * h^0.5566)/2 |
| 1.50 | 0.5 | (1.2 + 0.16 * h)/2 |
| 2.00 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 2.00 | 0.3 | (1.7425 * h^0.5566)/2 |
| 2.00 | 0.5 | (1.2 + 0.16 * h)/2 |
"},{"location":"PROJ-HETRES/#72-t-tests","title":"7.2 t-tests","text":"t-tests were computed to evaluate the ability of the descriptors to differentiate the three health states. A t-test value below 0.01 indicates that there is a significant difference between the means of two classes.
"},{"location":"PROJ-HETRES/#721-t-tests-on-lidar-based-descriptors-at-10-m","title":"7.2.1 t-tests on LiDAR-based descriptors at 10 m","text":"
Table 9: t-test on LiDAR-based descriptors at 10 m.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 0.002 | 1.12E-11 | 3.23E-04 |
| scale parameter | 0.005 | 0.014 | 0.964 |
| shape parameter | 0.037 | 0.002 | 0.269 |
| cvLAD | 0.001 | 2.22E-04 | 0.353 |
| VCI | 0.426 | 0.094 | 0.358 |
| mean reflectance | 4.13E-05 | 0.002 | 0.164 |
| sd of reflectance | 0.612 | 3.33E-06 | 9.21E-05 |
| canopy cover | 0.009 | 0.069 | 0.340 |
| sdCC | 0.002 | 0.056 | 0.324 |
| sdCHM | 0.316 | 0.262 | 0.892 |
| AGH | 0.569 | 0.055 | 0.120 |
"},{"location":"PROJ-HETRES/#722-t-test-on-lidar-based-descriptors-at-5-m","title":"7.2.2 t-test on LiDAR-based descriptors at 5 m","text":"
Table 10: t-test on LiDAR-based descriptors at 5 m.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 0.001 | 4.67E-12 | 1.73E-04 |
| scale parameter | 0.072 | 0.831 | 0.204 |
| shape parameter | 0.142 | 0.654 | 0.361 |
| cvLAD | 9.14E-06 | 3.22E-05 | 0.667 |
| VCI | 0.006 | 0.104 | 0.485 |
| mean reflectance | 6.60E-05 | 2.10E-06 | 0.249 |
| sd of reflectance | 0.862 | 2.26E-08 | 9.24E-08 |
| canopy cover | 0.288 | 0.001 | 0.003 |
| sdCC | 1.42E-05 | 1.94E-11 | 0.001 |
| sdCHM | 0.004 | 1.94E-08 | 0.002 |
| AGH | 0.783 | 0.071 | 0.095 |
"},{"location":"PROJ-HETRES/#723-t-test-on-lidar-based-descriptors-at-25-m","title":"7.2.3 t-test on LiDAR-based descriptors at 2.5 m","text":"
Table 11: t-test on LiDAR-based descriptors at 2.5 m.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 3.76E-04 | 7.28E-11 | 4.80E-04 |
| scale parameter | 0.449 | 0.283 | 5.60E-01 |
| shape parameter | 0.229 | 0.087 | 0.462 |
| cvLAD | 3.59E-04 | 1.06E-07 | 0.012 |
| VCI | 0.004 | 1.99E-05 | 0.072 |
| mean reflectance | 3.15E-04 | 5.27E-07 | 0.068 |
| sd of reflectance | 0.498 | 1.10E-10 | 4.66E-11 |
| canopy cover | 0.431 | 0.004 | 0.019 |
| sdCC | 0.014 | 1.94E-13 | 6.94E-09 |
| sdCHM | 0.003 | 5.56E-07 | 0.006 |
| AGH | 0.910 | 0.132 | 0.132 |
"},{"location":"PROJ-HETRES/#724-t-test-on-lidar-based-descriptors-per-tree","title":"7.2.4 t-test on LiDAR-based descriptors per tree","text":"
Table 12: t-test on LiDAR-based descriptors per tree.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 0.001 | 1.98E-11 | 2.61E-04 |
| scale parameter | 0.726 | 0.618 | 0.413 |
| shape parameter | 0.739 | 0.795 | 0.564 |
| cvLAD | 0.001 | 4.23E-04 | 0.526 |
| VCI | 0.145 | 0.312 | 0.763 |
| mean reflectance | 1.19E-04 | 0.001 | 0.949 |
| sd of reflectance | 0.674 | 3.70E-07 | 4.79E-07 |
| canopy cover | 0.431 | 0.005 | 0.023 |
| sdCC | 0.014 | 4.43E-13 | 1.10E-08 |
| sdCHM | 0.003 | 2.71E-07 | 0.004 |
| AGH | 0.910 | 0.090 | 0.087 |
"},{"location":"PROJ-HETRES/#725-t-tests-on-yearly-variation-of-ndvi","title":"7.2.5 t-tests on yearly variation of NDVI","text":"
Table 13: t-test on yearly variation of NDVI.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| 2016 | 0.177 | 0.441 | 0.037 |
| 2017 | 0.079 | 2.20E-06 | 0.004 |
| 2018 | 0.093 | 1.57E-04 | 0.132 |
| 2019 | 0.003 | 0.001 | 0.816 |
| 2020 | 0.536 | 0.041 | 0.005 |
| 2021 | 0.002 | 0.894 | 0.003 |
| 2022 | 0.131 | 0.103 | 0.002 |
"},{"location":"PROJ-HETRES/#726-t-test-on-vhi","title":"7.2.6 t-test on VHI","text":"
Table 14: t-test on VHI.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| 2015-2016 | 0.402 | 0.572 | 0.767 |
| 2016-2017 | 0.005 | 0.002 | 0.885 |
| 2017-2018 | 0.769 | 0.329 | 0.505 |
| 2018-2019 | 2.64E-05 | 3.98E-14 | 0.001 |
| 2019-2020 | 7.86E-06 | 9.55E-05 | 0.427 |
| 2020-2021 | 0.028 | 0.790 | 0.018 |
| 2021-2022 | 0.218 | 0.001 | 0.080 |
"},{"location":"PROJ-HETRES/#8-sources-and-references","title":"8 Sources and references","text":"Indications on software and hardware requirements, as well as the code used to perform the project, are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-hetres/tree/main.
Other sources of information mentioned in this documentation are listed here:
OFEV et al. (\u00e9d.). La canicule et la s\u00e9cheresse de l\u2019\u00e9t\u00e9 2018. Impacts sur l\u2019homme et l\u2019environnement. Technical Report 1909, Office f\u00e9d\u00e9ral de l\u2019environnement, Berne, 2019.\u00a0\u21a9\u21a9
Beno\u00eet Grandclement and Daniel Bachmann. 19h30 - En Suisse, la s\u00e9cheresse qui s\u00e9vit depuis plusieurs semaines frappe durement les arbres - Play RTS. February 2023. URL: https://www.rts.ch/play/tv/19h30/video/en-suisse-la-secheresse-qui-sevit-depuis-plusieurs-semaines-frappe-durement-les-arbres?urn=urn:rts:video:13829524 (visited on 2023-03-28).\u00a0\u21a9
Xavier Gauquelin, editor. Guide de gestion des for\u00eats en crise sanitaire. Office National des For\u00eats, Institut pour le D\u00e9veloppement Forestier, Paris, 2010. ISBN 978-2-84207-344-2.\u00a0\u21a9\u21a9
Philipp Brun, Achilleas Psomas, Christian Ginzler, Wilfried Thuiller, Massimiliano Zappa, and Niklaus E. Zimmermann. Large-scale early-wilting response of Central European forests to the 2018 extreme drought. Global Change Biology, 26(12):7021\u20137035, 2020. _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/gcb.15360. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.15360 (visited on 2022-10-13), doi:10.1111/gcb.15360.\u00a0\u21a9
Run Yu, Youqing Luo, Quan Zhou, Xudong Zhang, Dewei Wu, and Lili Ren. A machine learning algorithm to detect pine wilt disease using UAV-based hyperspectral imagery and LiDAR data at the tree level. International Journal of Applied Earth Observation and Geoinformation, 101:102363, September 2021. URL: https://www.sciencedirect.com/science/article/pii/S0303243421000702 (visited on 2022-10-13), doi:10.1016/j.jag.2021.102363.\u00a0\u21a9\u21a9\u21a9
Pengyu Meng, Hong Wang, Shuhong Qin, Xiuneng Li, Zhenglin Song, Yicong Wang, Yi Yang, and Jay Gao. Health assessment of plantations based on LiDAR canopy spatial structure parameters. International Journal of Digital Earth, 15(1):712\u2013729, December 2022. URL: https://www.tandfonline.com/doi/full/10.1080/17538947.2022.2059114 (visited on 2022-12-07), doi:10.1080/17538947.2022.2059114.\u00a0\u21a9\u21a9\u21a9
Patrice Eschmann, Pascal Kohler, Vincent Brahier, and Jo\u00ebl Theubet. La for\u00eat jurassienne en chiffres, R\u00e9sultats et interpr\u00e9tation de l'inventaire forestier cantonal 2003 - 2005. Technical Report, R\u00e9publique et Canton du Jura, St-Ursanne, 2006. URL: https://www.jura.ch/Htdocs/Files/Departements/DEE/ENV/FOR/Documents/pdf/rapportinventfor0305.pdf?download=1.\u00a0\u21a9\u21a9
Agnieszka Ptak. Amplitude vs Reflectance | LinkedIn. June 2020. URL: https://www.linkedin.com/pulse/amplitude-vs-reflectance-agnieszka-ptak/ (visited on 2023-08-11).\u00a0\u21a9
BFH-HAFL and BAFU. Waldmonitoring.ch : wcs_ndvi_diff_2016_2015, wcs_ndvi_diff_2017_2016, wcs_ndvi_diff_2018_2017, wcs_ndvi_diff_2019_2018, wcs_ndvi_diff_2020_2019, wcs_ndvi_diff_2021_2020, wcs_ndvi_diff_2022_2021. URL: https://geoserver.karten-werk.ch/wfs?request=GetCapabilities.\u00a0\u21a9
Reik Leiterer, Gillian Milani, Jan Dirk Wegner, and Christian Ginzler. ExoSilva - ein Multi\u00ad-Sensor\u00ad-Ansatz f\u00fcr ein r\u00e4umlich und zeitlich hochaufgel\u00f6stes Monitoring des Waldzustandes. In Neue Fernerkundungs\u00adtechnologien f\u00fcr die Umweltforschung und Praxis, 17\u201322. Swiss Federal Institute for Forest, Snow and Landscape Research, WSL, April 2023. URL: https://www.dora.lib4ri.ch/wsl/islandora/object/wsl%3A33057 (visited on 2023-11-13), doi:10.55419/wsl:33057.\u00a0\u21a9
Matthew Parkan. Mparkan/Digital-Forestry-Toolbox: Initial release. April 2018. URL: https://zenodo.org/record/1213013 (visited on 2023-08-11), doi:10.5281/ZENODO.1213013.\u00a0\u21a9
R Core Team. R: A Language and Environment for Statistical Computing. 2023. URL: https://www.R-project.org/.\u00a0\u21a9\u21a9
Olga Brovkina, Emil Cienciala, Peter Surov\u00fd, and P\u0159emysl Janata. Unmanned aerial vehicles (UAV) for assessment of qualitative classification of Norway spruce in temperate forest stands. Geo-spatial Information Science, 21(1):12\u201320, January 2018. URL: https://www.tandfonline.com/doi/full/10.1080/10095020.2017.1416994 (visited on 2022-07-15), doi:10.1080/10095020.2017.1416994.\u00a0\u21a9
N.K. Gogoi, Bipul Deka, and L.C. Bora. Remote sensing and its use in detection and monitoring plant diseases: A review. Agricultural Reviews, December 2018. doi:10.18805/ag.R-1835.\u00a0\u21a9
Samuli Junttila, Roope N\u00e4si, Niko Koivum\u00e4ki, Mohammad Imangholiloo, Ninni Saarinen, Juha Raisio, Markus Holopainen, Hannu Hyypp\u00e4, Juha Hyypp\u00e4, P\u00e4ivi Lyytik\u00e4inen-Saarenmaa, Mikko Vastaranta, and Eija Honkavaara. Multispectral Imagery Provides Benefits for Mapping Spruce Tree Decline Due to Bark Beetle Infestation When Acquired Late in the Season. Remote Sensing, 14(4):909, February 2022. URL: https://www.mdpi.com/2072-4292/14/4/909 (visited on 2023-10-27), doi:10.3390/rs14040909.\u00a0\u21a9
Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo)
Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS September 2022 to March 2023 - Published on April 2023
All scripts are available on GitHub.
Abstract: In 2020, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point.
The methodology focused on change detection: finding as many unchanged tiles as possible while missing as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. While time deactivation and using fewer neighbors cause a 0.2% decrease in balanced accuracy, space deactivation causes a 1% decrease. Furthermore, the performances of XGBoost, random forest (RF), fully connected network (FCN) and convolutional recurrent neural network (ConvRNN) are compared by means of a custom metric, established with the help of the interpretation team. For the spatial-temporal module, FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression reaches 0.249.
Then, FCN and RF are tested to combine the best performing model with the model trained by the FSO on image data only. When using temporal-spatial neighbors and image data as inputs, the final integration module achieves 0.438 in the custom metric, against 0.374 when only the image data is used.
It was concluded that temporal-spatial neighbors could lighten the process of tile interpretation.
"},{"location":"PROJ-LANDSTATS/#1-introduction","title":"1. Introduction","text":"The introduction presents the background and the objectives of the projects, but also introduces the input data and its specific features.
"},{"location":"PROJ-LANDSTATS/#11-background","title":"1.1 Background","text":"Since 1979, the Swiss Federal Statistical Office (FSO) provides detailed and accurate information on the state and evolution of the land use and the land cover in Switzerland. It is a crucial tool for long-term spatial observation. With these statistics, it is possible to determine whether and to what extent changes in land cover and land use are consistent with the goals of Swiss spatial development policies (FSO).
Figure 1: Visualization of the land cover and land use classification.
Every few years, the FSO carries out a survey on aerial or satellite images covering all of Switzerland. A grid with sample points spaced 100 meters apart overlays the images, providing 4.1 million sample points on which the statistics are based. The classification of each hectare tile is assigned based on its center point, as shown in Figure 1. Currently, a time series of four surveys is accessible, based on aerial images captured in the following years:
The first two surveys of the land statistics, in 1979 and 1992, were made by visual interpretation of analogue aerial photos using stereoscopes. For the 2004 survey, the methodology was deeply renewed, in particular through the use of digital aerial photographs, which are observed stereoscopically on workstations using specific photogrammetry software.
A new nomenclature (NOAS04) was also introduced in 2004, which systematically distinguishes 46 land use categories and 27 land cover categories. A numerical label from this catalogue is assigned to each point by a team of trained interpreters. The 1979 and 1992 surveys have been revised according to the NOAS04 nomenclature, so that all readings (1979, 1992, 2004, 2013) are comparable. The geodata of the land use statistics at the hectare level since 1979, as well as documentation on the data and the methodology used to produce them, are available from the FSO. Detailed information on basic categories and principal domains can be found in Appendix 1.
"},{"location":"PROJ-LANDSTATS/#12-objectives","title":"1.2 Objectives","text":"It is known that manual interpretation work is time-consuming and expensive. However, in a feasibility study, the machine learning technique showed great potential capacity to help speed up the interpretation, especially with deep learning algorithms. According to the study, 50% of the estimated interpretation workload could be saved.
Therefore, FSO is currently carrying out a project to assess the relevance of learning and mastering the use of artificial intelligence (AI) technologies to automate (even partially) the interpretation of aerial images for change detection and classification. The project is called Area Statistics Deep Learning (ADELE).
FSO had already developed tools for change detection and multi-class classification using the image data. However, the current workflow does not exploit the spatial and temporal dependencies between different points in the surveys.
The aim of this project is therefore to evaluate the potential of spatial-temporal neighbors in predicting whether or not points in the land statistics will change class. The methodology focuses on change detection: finding as many unchanged tiles as possible (automatized capacity) while missing as few changed tiles as possible. The detailed objectives of this project are to:
The raw data delivered by the domain experts is a table with 4'163'496 records containing the interpretation results of both land cover and land use from survey 1 to survey 4. An example record is shown in Table 1 and gives the following information:
Table 1: Example record of raw data delivered by the domain experts.
| RELI | EAST | NORTH | LU4* | LC4 | LU3 | LC3 | LU2 | LC2 | LU1 | LC1 | training |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 74222228 | 2742200 | 1222800 | 242 | 21 | 242 | 21 | 242 | 21 | 242 | 21 | 0 |
| 75392541 | 2753900 | 1254100 | 301 | 41 | 301 | 41 | 301 | 41 | 301 | 41 | 0 |
| 73712628 | 2737100 | 1262800 | 223 | 46 | 223 | 46 | 223 | 46 | 223 | 46 | 0 |

*The shortened LC1/LU1 to LC4/LU4 will be used to simplify the notation of Land Cover/Use of survey 1 to survey 4 in the following documentation.
For machine learning, the quality of the training data has a strong influence on model performance. With the training label, domain experts from the FSO selected data points that are more reliable and representative. These 348'474 tiles and their neighbors compose the training and testing dataset for the machine learning methodology.
"},{"location":"PROJ-LANDSTATS/#2-exploratory-data-analysis","title":"2. Exploratory data analysis","text":"As suggested by domain experts, exploratory data analysis (EDA) is of significance to understand the data statistics and find the potential internal patterns of class transformation. The EDA is implemented from three different perspectives: distribution, quantity and probability. With the combination of the three, we can find that there do exist certain trends in the transformation of both land cover and land use classes.
For the land cover, the main findings are:
quantity: there are some clear patterns in quantitative changes
For the land use, the main findings are:
Readers particularly interested in the change detection methods can go directly to Section 3; otherwise, readers are welcome to read the illustrated and detailed EDA given hereafter.
"},{"location":"PROJ-LANDSTATS/#21-distribution-statistics","title":"2.1 Distribution statistics","text":"Figure 2: Land cover distribution plot.
Figure 3: Land use distribution plot.
First, the overall distributions of land cover and land use are shown in Figures 2 and 3. The X-axis is the label of each class while the Y-axis is the number of tiles on a log scale. The records of the four surveys are plotted chronologically in different colors. By observation, some trends can be found across the four surveys.
Artificial areas take up only a small portion of the land cover (labels 10 to 20), while most of the surface of Switzerland is covered by vegetation or forest (20 - 50). Bare land (50 - 60) and water areas (60 - 70) take up a considerable portion as well. For land use, the agricultural (200 - 250) and forest (300 - 310) areas are clearly the main components, while the unused area (421) also stands out.
Most classes kept the same tendency during the past 40 years: 11 out of 27 land cover classes and 32 out of 46 land use classes are continuously increasing or decreasing over the whole period. For land use especially, 10 classes rise with time while 22 classes drop, which indicates transformation patterns causing a leakage from some classes towards those 10 rising classes. We will dive into these patterns in the following sections.
"},{"location":"PROJ-LANDSTATS/#22-quantity-statistics","title":"2.2 Quantity statistics","text":"The data are explored in a quantitative way by three means:
Figure 4: Land cover transformation from 1985 to 2018.
The analysis of the transformation patterns from a quantitative perspective is implemented in the interactive visualization in Figure 4. Nodes of the same color belong to a common superclass (principal domain). The size of a node represents the number of tiles of the class and the width of a link reflects the number of transformations on a log scale. When hovering over these elements, detailed information such as the class label code and the number of transformations is shown. Clicking the legend enables selecting the superclasses in which the transformations should be analyzed.
The transformation data were pre-processed. To simplify the graph and highlight the major transformations, links carrying less than 0.1% of the total number of transformations were removed. This filter avoids drawing the many trivial links (580) connecting nearly all the nodes, leaving only the significant links (112). The process filtered out 6.5% of the transformations in land cover and 11.5% in land use, which is acceptable considering that this is a quantitative analysis focusing on the major transformations.
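As an illustration of this filtering step, here is a minimal pandas sketch; the column names (`from_class`, `to_class`, `count`) are hypothetical and not those of the actual scripts.

```python
import pandas as pd

# Hypothetical table of transformations: one row per (from_class, to_class) pair.
links = pd.DataFrame({
    "from_class": [41, 21, 31],
    "to_class":   [21, 146, 41],
    "count":      [52000, 1200, 150],
})

# Drop links carrying less than 0.1% of all transformations.
threshold = 0.001 * links["count"].sum()
significant = links[links["count"] >= threshold]
```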
"},{"location":"PROJ-LANDSTATS/#222-sequential-transformation-visualization","title":"2.2.2 Sequential transformation visualization","text":"Figure 5: Land cover sequential transformation.
In addition to the transformations between two surveys, the sequential transformation over time has also been visualized. A similar filter is implemented to simplify the result, and only tiles that changed during the 4 surveys are visualized. In Figure 5, the box of a class in column 1985 (survey 1) is composed of different colors while the box of a class in column 2018 (survey 4) has only one color. This is because the color of a link encodes a sequential transformation: the different colors of a class in the first column show the end status (classification) of the tiles in survey 4.
Some clear patterns can be found in the graph. For example, the red lines point out four diamond patterns. A diamond pattern with edges of the same color illustrates the continuous trend of one class of tiles being transferred to another class. In this figure, it is obvious that Tree Clusters are degraded to Grass and Herb, while Grass and Herb are converted to Consolidated Surfaces, showing the expansion of urban areas and the destruction of the natural environment.
"},{"location":"PROJ-LANDSTATS/#223-quantity-statistics-analysis","title":"2.2.3 Quantity statistics analysis","text":"Comparing the visualization of different periods, a constant pattern has been spotted in both land cover and land use. For example in land cover, the most transformation happened between the superclass of Tree Vegetation and Brush Vegetation. Also, a visible bi-direction transformation between Grass and Herb Vegetation and Clusters of Trees is witnessed. Greenhouses, wetlands and reedy marshes hardly have edges linked to them all over time, which illustrates that either they have a limited area or they hardly change.
A similar property can also be observed in the land use classes. Most transformations happened inside the superclasses of Arable and Grassland and Forest not Agricultural. Also, a visible transformation from Unused to Forest stands out.
Combining the findings above, it is clear that transformations related to Forest and Vegetation are the main part of the story. The forest shrinks or expands over time, changing to shrubs and growing back later. The Arable and Grassland keeps changing based on the needs of agriculture or animal husbandry during the survey year. Different kinds of forests interconvert with each other, which is a natural phenomenon.
"},{"location":"PROJ-LANDSTATS/#23-probability-matrix","title":"2.3 Probability matrix","text":"The above analysis demonstrates the occurrence of transformation with quantitative statistics. However, the number of tiles for different classes is not a uniform distribution as shown in the distribution analysis. The largest class is thousands of times more than the smallest one. Sometimes, the quantity of a transformation is trivial compared with the majority, but it is caused by the small amount of tiles for the class. Even if the negligible class would not have a significant impact on the performance of change detection, it is of great importance to reveal the internal transformation pattern of the land statistics and support the multi-class classification task. Therefore, the probability analysis is designed as below:
The probability analysis for land cover/use contains 3 parts:
The probability is calculated from the status change between the beginning survey and the end survey stated in the figure title. For example, Figure 6 is calculated from the transformations between survey 1 and survey 4, without taking into account possible intermediate changes in surveys 2 and 3.
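For reference, such a transition probability matrix can be derived directly from the raw table with pandas. The following is a minimal sketch under the column names of Table 1, with toy values; like Figure 6, it ignores the intermediate surveys.

```python
import pandas as pd

# Toy records with the LC1 and LC4 columns of Table 1.
df = pd.DataFrame({"LC1": [21, 21, 41, 44, 41], "LC4": [21, 41, 41, 41, 41]})

# Row-normalised cross-tabulation: P(LC4 = j | LC1 = i),
# ignoring possible intermediate changes in surveys 2 and 3.
prob_matrix = pd.crosstab(df["LC1"], df["LC4"], normalize="index")
print(prob_matrix)
```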
"},{"location":"PROJ-LANDSTATS/#231-land-cover-analysis","title":"2.3.1 Land cover analysis","text":"Figure 6: Land cover probability matrix from LC1 to LC4.
The first information the matrix provides is the blank blocks with zero probability of conversion, disclosing that transformations between some classes never happened during the past four decades. Besides, all the diagonal blocks have a distinct color depth, illustrating that all land cover classes are most likely to keep their status rather than change.
Another evident feature of this matrix is the columns of the destination classes Grass and Herb Vegetation (21) and Closed Forest (41). A few classes, such as Shrubs (31), Fruit Tree (33), Garden Plants (35) and Open Forest (44), show a noticeable trend to convert into these two classes, which is partially consistent with the quantity analysis while revealing some new findings.
Figure 7: Land cover transformation probability without change.
Regarding the refined visualization of the diagonal blocks, it is clear that half of the classes have more than an 80% probability of not transforming, while the minimum is only about 35%. This is caused by the accumulation over the 4 surveys, which together span 40 years. For a single decade, as shown in the first 3 sub-graphs of Figure 23 in Appendix A2.1, the majority of classes are above a 90% probability and the minimum rises to 55%.
Figure 8: Maximum transformation probability to a certain class when land cover changes.
For the transformed tiles, the maximum probability of converting into another class is shown in Figure 8. Together with the matrix in Figure 6, this graph points out the internal transformation pattern. The top 5 most probable transformations between the first survey and the fourth survey are:
1. 38% Open Forest (44) --> Closed Forest (41)\n 2. 36% Brush Meadows (32) --> Shrubs (31)\n 3. 34% Garden Plants (35) --> Grass and Herb Vegetation (21)\n 4. 29% Shrubs (31) --> Closed Forest (41)\n 5. 26% Clusters of Trees (47) --> Grass and Herb Vegetation (21)\n
The accumulation takes effect here as well. For a single decade, the maximum probability decreases to 25%, but the general distribution of the probability is consistent across the four surveys, according to Figure 24 in Appendix A2.1.
"},{"location":"PROJ-LANDSTATS/#232-land-use-analysis","title":"2.3.2 Land use analysis","text":"Figure 9: Land use probability matrix from LU1 to LU4.
The land use probability matrix has different features compared with the land cover one. Although most diagonal blocks have the deepest color depth, two areas highlighted by the red lines present different statistics. The upper area is related to Construction sites (146) and Unexploited Urban areas (147). These two classes tend to change to other classes rather than remain unchanged, which is reasonable since the construction time of buildings or infrastructure hardly exceeds 10 years. This is confirmed by the left side of the red-edged rectangular block, which has a deeper color depth, illustrating that construction and unexploited areas end up in the Settlement and Urban Areas (superclass of 100 - 170).
The lower red area accounts for the pattern concerning the Forest Areas (301 - 304). Afforestation (302), Lumbering areas (303) and Damaged Forest (304) thrive and recover between the surveys, finally becoming Forest (301) again.
Figure 10: Land use transformation probability without change.
Figure 10 further validates these assumptions. While most classes have a high probability of not changing, there are two deep valleys for classes 144 to 147 and 302 to 304, which correspond exactly to the patterns described above.
Figure 11: Maximum transformation probability to a certain class when land use changes.
Figure 11 shows the difference in the diversity of transformation destinations. The construction and unexploited areas turn into all kinds of urban areas: more than 95% of them change, yet the maximum probability towards any single class is less than 35%. In contrast, Afforestation, Lumbering areas and Damaged Forest return to Forest with a probability of more than 90%; the transformation pattern within these four classes is fairly fixed.
The distribution statistics, the quantity statistics and the probability matrices validate and complement each other throughout the exploratory analysis of the data.
"},{"location":"PROJ-LANDSTATS/#3-methods","title":"3. Methods","text":"The developed method should be integrated in the OFS framework for change detection and classification of land use and land cover illustrated in Figure 12. The interesting parts for this project are highlighted in orange and will be presented in the following.
Figure 12: Planned structure in FSO framework for final prediction.
Figure 12 shows on the left the input data types in the FSO framework. The current project works on the LC/LU neighbors introduced in Section 1.3. The main objective of the project, detecting change by means of these neighbors, corresponds to the temporal-spatial module in Figure 12.
As proposed by the feasibility study, the FSO had implemented studies on change detection and multi-class classification on the swisstopo aerial image time series to improve the efficiency of the interpretation work. The predicted LC and LU probabilities and the information obtained by deep learning are defined as the image-level module.
In a second stage of the project, the best model for combining the outputs of the temporal-spatial and image-level modules is explored to evaluate the gain in performance after integration of the spatial-temporal module in the FSO framework. This is the so-called integration module. The rest of the input data is not part of the performance evaluation.
"},{"location":"PROJ-LANDSTATS/#31-temporal-spatial-module","title":"3.1 Temporal-spatial module","text":"Figure 13: Time and space structure of a tile and its neighbors.
The input data to the temporal-spatial module are the historical interpretation results of the tile to predict and of its 8 neighbors. The first three surveys are used as inputs to train the models, while the fourth survey serves as the ground truth of the prediction. This utilizes both the time and space information in the dataset, as depicted in Figure 13.
During preprocessing, the tiles with missing neighbors were discarded from the dataset to keep the data format consistent; the loss is insignificant (about 400 out of 348'868). The determination of change is influenced by both land cover and land use: when there is a disparity between the classifications of the fourth and the third survey for a specific tile, it is identified as changed (positive) in change detection. The joint prediction of land cover and land use is based on the assumption that a correlation may exist between them: if the land cover of a tile changes, it is probable that its land use will also change.
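This labelling rule can be written down compactly. Below is a minimal sketch assuming the column names of Table 1, with toy values:

```python
import pandas as pd

df = pd.DataFrame({
    "LC3": [21, 41, 21], "LU3": [242, 301, 242],
    "LC4": [21, 41, 14], "LU4": [242, 303, 146],
})

# A tile is positive (changed) if its land cover OR its land use
# differs between the third and the fourth survey.
df["changed"] = (df["LC4"] != df["LC3"]) | (df["LU4"] != df["LU3"])
```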
Moreover, the tiles carry numerical labels. Nevertheless, the model should not assume a numerical relationship between classes, even when they belong to the same superclass and are closely related. To address this, we employ one-hot encoding, which transforms a single land cover column into 26 columns, with all values set to '0' except for one column marked as '1' to indicate the class. Despite increasing the model's complexity to almost two thousand input columns, this is a necessary trade-off to eliminate the risk of numerical misinterpretation.
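With pandas, the encoding can be sketched as follows; declaring the label column as categorical over the full class catalogue (shortened here to five labels) keeps the column layout identical for the tile and each of its neighbors:

```python
import pandas as pd

lc_classes = [11, 14, 21, 31, 41]        # shortened land cover catalogue
df = pd.DataFrame({"LC3": [21, 41, 21]})

# One column per class; a single '1' marks the observed label.
df["LC3"] = pd.Categorical(df["LC3"], categories=lc_classes)
one_hot = pd.get_dummies(df["LC3"], prefix="LC3")
```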
"},{"location":"PROJ-LANDSTATS/#32-change-detection","title":"3.2 Change detection","text":"Usually, spatial change detection is a remote sensing application performed on aerial or satellite images for multiclass change detection. However, in this project, a table of point records is used for binary classification into changed and not changed classes. Different traditional and new deep learning approach have been explored to perform this task. The motivations to use them are given hereinafter. An extended version of this section with detailed introduction to the machine learning models is available in Appendix A3.
Three traditional classification models are tested: logistic regression (LR), XGBoost and random forest (RF). They represent the most popular approaches in the field: linear, boosting and bagging models. In this project, logistic regression is well adapted because it can explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. XGBoost has the advantage that weaker classifiers are introduced sequentially to focus on the areas where the current model is struggling, while misclassified observations receive extra weight during training. Finally, with random forest, higher accuracy may be obtained while still avoiding overfitting, thanks to the large number of trees and the sampling process.
Beyond these traditional popular approaches, two deep learning algorithms are explored as well: the fully connected network and the convolutional recurrent neural network. Unlike traditional machine learning algorithms, deep learning does not require manual feature extraction or engineering: deep neural networks capture the desired features through the back-propagation optimization process. Besides, these networks can have special designs for temporal or spatial inputs; the assumption is that when the internal pattern of the dataset matches the network structure, the model performs better.
"},{"location":"PROJ-LANDSTATS/#321-focal-loss","title":"3.2.1 Focal loss","text":"Deep neural networks need differentiable loss function for optimization training. For this project with imbalanced classification task, the local loss was chosen rather than the traditional (binary) cross entropy loss.
\[\begin{align} FL(p_t) = -\alpha \, (1-p_t)^{\gamma} \log(p_t) \end{align}\]where \(p_t\) is the probability of predicting the correct class, \(\alpha\) is a balance factor between positive and negative classes, and \(\gamma\) is a modulation factor that controls how much weight is given to examples that are hard to classify.
Focal loss is a type of loss function that aims to solve the problem of class imbalance in tasks like classification. Focal loss modifies the cross entropy loss by adding a factor that reduces the loss for easy examples and increases the loss for examples hard to classify. This way, focal loss focuses more on learning from misclassified examples. Compared with other loss functions such as cross entropy, binary cross entropy and dice loss, some advantages of focal loss are:
\\(\\alpha\\) should be chosen based on the class frequency. A common choice is to set \\(\\alpha_t\\) is 1 minus the frequency of class t. This way, rare classes get more weight than frequent classes. \\(\\gamma\\) should be chosen based on how much you want to focus on hard samples. A larger gamma means more focus on hard samples, while a smaller gamma means less focus. The original paper suggested that a gamma equal to 2 is an effective value for most cases.
"},{"location":"PROJ-LANDSTATS/#322-fully-connected-network-fcn","title":"3.2.2 Fully connected network (FCN)","text":"Fully connected network (FCN) in deep learning is a type of neural network that consists of a series of fully connected layers. The major advantage of fully connected networks for this project is that they are structure agnostic. That is, no special assumptions need to be made about the input (for example, that the input consists of images or videos).
A disadvantage of FCN is that it can be very computationally expensive and prone to overfitting due to the large number of parameters involved. Another disadvantage is that it does not exploit any spatial or temporal structure in the input data, which can lead to poor performance for some tasks.
For the implementation, the FCN employs 4 hidden layers (2048, 2048, 1024 and 512 neurons respectively) besides the input and output layers. ReLU activation functions are chosen before the output layer, while a sigmoid function is applied at the end to scale the result to a probability.
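The described layer stack can be sketched in PyTorch as below; the input width `n_inputs` depends on the one-hot encoding of the tile and its neighbors and is therefore left as a parameter.

```python
import torch.nn as nn

def build_fcn(n_inputs: int) -> nn.Sequential:
    """4 hidden layers (2048, 2048, 1024, 512), ReLU activations, sigmoid output."""
    return nn.Sequential(
        nn.Linear(n_inputs, 2048), nn.ReLU(),
        nn.Linear(2048, 2048), nn.ReLU(),
        nn.Linear(2048, 1024), nn.ReLU(),
        nn.Linear(1024, 512), nn.ReLU(),
        nn.Linear(512, 1), nn.Sigmoid(),   # probability of change
    )
```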
"},{"location":"PROJ-LANDSTATS/#323-convolutional-recurrent-neural-network-convrnn","title":"3.2.3 Convolutional recurrent neural network (ConvRNN)","text":"Convolutional recurrent neural network (ConvRNN) is a type of neural network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are good at extracting spatial features from images, while RNNs are good at capturing temporal features from sequences. ConvRNNs can be used for tasks that require both spatial and temporal features as it is meant to be achieved in this project. Furthermore, the historical data of the land cover and land use can be translated to some synthetic images. The synthetic images use channels to represent sequence of surveys and the pixel value represents ground truth label. Thus, the spatial relationship of the neighbour tiles could be extracted from the data structure with the CNN.
Figure 14: Convolutional Recurrent Neural Network Pipeline.
In this project, we explored a ConvRNN with the structure shown in Figure 14. The sequence of surveys is treated as the sequence of inputs \(x^t\). With the recurrent structure and the hidden states \(h^t\) transmitting information, the temporal information can be extracted. Unlike in a traditional RNN, the function \(f\) in the hidden layers of the recurrent structure uses convolutional operations instead of matrix computations, and an additional CNN module is applied to the sequence output to detect the spatial information.
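As an illustration of this idea only, a minimal convolutional recurrent cell can be written as below; the 3x3 neighborhood layout matches Figure 13, while the channel sizes are assumptions rather than the project's configuration.

```python
import torch
import torch.nn as nn

class ConvRNNCell(nn.Module):
    """Recurrent cell whose update f uses a convolution instead of a matrix product."""
    def __init__(self, in_ch: int, hidden_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hidden_ch, hidden_ch, kernel_size=3, padding=1)

    def forward(self, x, h):
        # x: (batch, in_ch, 3, 3) synthetic image of a tile and its 8 neighbors
        # h: (batch, hidden_ch, 3, 3) hidden state carrying the temporal information
        return torch.tanh(self.conv(torch.cat([x, h], dim=1)))

cell = ConvRNNCell(in_ch=1, hidden_ch=16)
h = torch.zeros(1, 16, 3, 3)
for x_t in torch.rand(3, 1, 1, 3, 3):   # one 3x3 synthetic image per survey 1-3
    h = cell(x_t, h)                    # h accumulates the sequence information
```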
"},{"location":"PROJ-LANDSTATS/#33-performance-metric","title":"3.3 Performance metric","text":"Once the different machine learning models are trained for the respective module, comparison has to be made on the test set to evaluate their performance. This will be performed with the help of metrics.
"},{"location":"PROJ-LANDSTATS/#331-traditional-metrics","title":"3.3.1 Traditional metrics","text":"As discovered in the distribution analysis, the dataset is strongly unbalanced. Some class is thousands of others. This is of importance to change detection. Moreover, among 348'474 tiles in the dataset, only 58'737 (16.86%) tiles have changed. If the overall accuracy is chosen as the performance metric, the biased distribution would make the model tend to predict everything unchanged. In that case, the accuracy of the model can achieve 83.1%, which is a quite high value achieved without any effort. Therefore, avoiding the problem during the model training and selecting the suitable metric that can represent the desired performance are the initial steps.
The constant model is defined as a model which predicts the third survey interpretation values as the prediction for the fourth survey. In simple words, the constant model predicts that nothing changes. With this definition, we can calculate all kinds of metrics for the other change detection models and compare them to the constant model metrics to identify models with better performance.
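As a quick illustration with scikit-learn, using "unchanged everywhere" as the prediction (the values below are toy numbers, not the project data):

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Toy ground truth: 1 = the tile changed between surveys 3 and 4.
y_true = [0, 0, 0, 0, 1, 0, 0, 1]
# The constant model predicts "unchanged" everywhere.
y_pred = [0] * len(y_true)

print(accuracy_score(y_true, y_pred))           # inflated by the class imbalance
print(balanced_accuracy_score(y_true, y_pred))  # 0.5, as in Table 2
```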
For change detection with the constant model, the performance is as below:
Figure 15: Confusion matrix of constant distribution as prediction: TP=True Positive, TN=True Negative, FP=False Positive, FN=False Negative.
Table 2: Metrics evaluation for the constant model.
| Models | Accuracy | Balanced Accuracy | Precision (PPV/NPV) | Recall (TPR/TNR) | F1-score (Pos/Neg) |
|---|---|---|---|---|---|
| Constant | 0.831 | 0.500 | 0.000 / 0.831 | 0.000 / 1.000 | 0.000 / 0.907 |
Definition of abbreviations:
For positive case:\n\nPrecision = TP / (TP + FP) Recall = TP / (TP + FN)\n(Positive predictive value, PPV) (True positive rate, TPR)\n\nFor negative case:\n\nPrecision = TN / (TN + FN) Recall = TN / (TN + FP) \n(Negative predictive value, NPV) (True negative rate, TNR)\n
The aim of the change detection is to predict with high confidence the tiles that do not change, so that the interpretation from the last survey can be reused directly. However, because of the imbalanced nature of the problem, the negative-case-related metrics above and the accuracy are not suitable for the present task: due to the large number of unchanged tiles, they indicate a high performance for the constant model, which we know does not depict reality. After this test, the balanced accuracy, which is the mean of the true positive rate and the true negative rate, is considered a suitable metric for change detection.
"},{"location":"PROJ-LANDSTATS/#332-specific-weighted-metric-for-change-detection","title":"3.3.2 Specific weighted metric for change detection","text":"In theory, true negative rate is equivalent to 1 minus false positive rate. Optimizing balanced accuracy typically results in minimizing the false positive rate. However, our primary objective is to reduce false negative instances (i.e., changed cases labeled as unchanged), while maximizing the true positive rate and true negative rate. False positives are of lesser concern, as they will be manually identified in subsequent steps. Consequently, balanced accuracy does not adequately reflect the project's primary objective. With the help of FSO interpretation team, an additionnal, specific metric targeting on the objective has been designed to measure the model performance. Reminding the Exploratory Data Analysis, some transformation patterns have been found and applied in this metric as well.
Figure 16: Workflow with multiple input to define a weighted metric.
As depicted in Figure 16, the FSO interpretation team designed two filters to derive a custom metric. The first filter combines the inputs from all the possible modules (in this case, the image-level and temporal-spatial modules). The input modules give the probability of change detection or the multi-class classification prediction with its confidence. As the predictions from the modules might differ, the first filter sets the final prediction of a tile as positive if any input module gives a positive prediction. Here, the threshold defining a positive is a significant hyperparameter to fine-tune.
The Weights Matrix defined by the human experts is the core of the entire metric. Based on professional experience and on observation of the EDA, the experts assigned different weights to all possible transformations. These weights reflect the importance of each transformation to the overall statistics. Besides, part of the labels are defined as Small Classes, meaning that these classes are negligible or not considered in this study. The second filter removes all the transformations related to the small classes and applies the weights matrix to all the remaining tiles. Finally, the weighted metric is calculated as below:
\[\begin{align} \text{Automatized Tiles} &= \#\text{Predicted Negatives} \\ \text{Automatized Capacity} &= \frac{\#\text{Automatized Tiles}}{\#\text{Negatives (ground truth)}} \\ \text{Missed Weighted Changed Ratio} &= \frac{\sum \left( \text{Missed Change} \times \text{Weight} \right)}{\sum \left( \text{All Change} \times \text{Weight} \right)} \\ \text{Weighted Metric} &= \text{Automatized Capacity} \times \frac{0.1 - \text{Missed Weighted Changed Ratio}}{0.1} \end{align}\]From now on, we will still calculate metrics like balanced accuracy and recall for reference and analysis; however, the Weighted Metric is the decisive metric for model selection.
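Read literally, the formulas translate into a few lines of Python. The sketch below assumes the small-class filtering has already been applied and that `weights` holds the expert weight of each tile's transformation:

```python
def weighted_metric(pred_changed, true_changed, weights):
    """pred_changed, true_changed: booleans per tile; weights: expert weight per tile."""
    automatized_tiles = sum(not p for p in pred_changed)    # predicted negatives
    negatives = sum(not t for t in true_changed)            # ground-truth negatives
    automatized_capacity = automatized_tiles / negatives

    missed = sum(w for p, t, w in zip(pred_changed, true_changed, weights) if t and not p)
    all_changes = sum(w for t, w in zip(true_changed, weights) if t)
    missed_weighted_ratio = missed / all_changes

    return automatized_capacity * (0.1 - missed_weighted_ratio) / 0.1
```

Note that a model predicting everything as unchanged misses all changes, so its missed weighted ratio is 1 and the metric collapses to -9 times its automatized capacity, which is consistent with the strongly negative constant-model values reported in Section 4.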
"},{"location":"PROJ-LANDSTATS/#34-training-and-testing-plan","title":"3.4 Training and testing plan","text":"Introduced in Section 1.3, the 348'474 tiles with temporal-spatial information are selected for training. The 80%-20% split is applied to the selected tiles to create the train set and the test set respectively. Adam optimizer and multi-step learning rate scheduler are deployed for better convergence.
For the temporal-spatial module, metrics for an ablation study on the descriptors and the descriptor importance are computed first. The descriptor importance is taken from the XGBoost simulations. The ablation study is performed with the logistic regression and consists of training the model with:
Then, the baseline configuration is used to train the traditional algorithms and the deep learning ones. Metrics are compared and the best performing models are kept for the integration module.
Finally, the performance of several configurations is compared for the integration module.
The extra information gained from the temporal-spatial module is studied by comparison with the image-level performance alone. The image-level data contain the multi-class classification prediction and its confidence, from which the change probability can be calculated. Therefore, the weighted metric can also be applied at the image level only. Then, RF and FCN are tested for the integration module, which combines the various types of information sources.
"},{"location":"PROJ-LANDSTATS/#4-experiments","title":"4. Experiments","text":"The Experiments section covers the results obtained when performing the planned simulations for the temporal-spatial module and the integration module.
"},{"location":"PROJ-LANDSTATS/#41-temporal-spatial-module","title":"4.1 Temporal-spatial module","text":""},{"location":"PROJ-LANDSTATS/#411-feature-engineering-time-and-space-deactivation","title":"4.1.1 Feature engineering (time and space deactivation)","text":"In the temporal-spatial module, the studied models take advantages of both the space (the neighbors) and the time (different surveys) information as introduced in Section 3.1. Ablation study is performed here to acknowledge the feature importance and which information really matters in the model.
Table 3: Model metrics for the ablation plan.
| Logistic Regression | Best threshold | Accuracy | Balanced Accuracy | Precision (PPV/NPV) | Recall (TPR/TNR) | F1-score (Pos/Neg) |
|---|---|---|---|---|---|---|
| Time deactivated | 0.515 | 0.704 | 0.718 | 0.330 / 0.930 | 0.740 / 0.696 | 0.457 / 0.796 |
| Space deactivated | 0.505 | 0.684 | 0.711 | 0.316 / 0.930 | 0.752 / 0.670 | 0.445 / 0.779 |
| 4 neighbors | 0.525 | 0.707 | 0.718 | 0.332 / 0.929 | 0.734 / 0.701 | 0.458 / 0.799 |
| Baseline* | 0.525 | 0.711 | 0.720 | 0.337 / 0.928 | 0.734 / 0.706 | 0.462 / 0.802 |

*Baseline: 8 neighbors with time and space activated
Table 3 reveals the performance change when the time or space information is totally or partially (4 neighbors instead of 8) deactivated. While time deactivation and fewer neighbors hardly influence the balanced accuracy (only a 0.2% decrease), space deactivation decreases it by about 1%. The result demonstrates that the space information is more vital to the algorithm than the time information, even though both have a minor impact.
Figure 17: Feature importance analysis comparison of 4 (left) and 8 (right) neighbors.
Figure 18: Feature importance analysis comparison of time (left) and space (right) deactivation.
Figures 17 and 18 give the feature importance analysis from the XGBoost model. The sums of the feature importance of the variables related to the tile itself and to its neighbors are plotted in the charts. The 4-neighbor and 8-neighbor configurations have similar capacities, but the importance of the neighbors is much higher for the latter than for the former. This is caused by the number of variables: with more neighbors, the number of variables related to the neighbors increases and the sum of their feature importance grows as well.
The feature importance illustrates the weight assigned to the input variables. From Figure 17, it is obvious that the variables related to the tile itself in past surveys are the most critical; furthermore, the more recent, the more important. The neighbors to the east and west (neighbors 3 and 4) are more significant than the others, even more than the land use of the tile in the first survey.
In conclusion, the feature importance is not evenly distributed. However, the ablation study shows that the model with all the features as input achieved the best performance.
"},{"location":"PROJ-LANDSTATS/#412-baseline-models-with-probability-or-tree-models","title":"4.1.2 Baseline models with probability or tree models","text":"Utilizing the time and space information from the neighbors, three baseline methods with probability or tree model are fine-tuned. The logistic regression outperforms the other two, achieving 72.0% balanced accuracy. As result, more than 41'000 tiles are correctly predicted as unchanged while only about 3'000 changed tiles are missed as they are the false negatives. Detailed metrics of each method are listed in Table 4.
Table 4: Performance metrics for the traditional machine learning simulations of the spatial-temporal model.
| Models | Accuracy | Balanced Accuracy | Precision (PPV/NPV) | Recall (TPR/TNR) | F1-score (Pos/Neg) |
|---|---|---|---|---|---|
| Logistic Regression | 0.711 | 0.720 | 0.337 / 0.928 | 0.734 / 0.706 | 0.462 / 0.802 |
| Random Forest | 0.847 | 0.715 | 0.775 / 0.849 | 0.134 / 0.992 | 0.229 / 0.915 |
| XGBoost | 0.837 | 0.715 | 0.533 / 0.869 | 0.297 / 0.947 | 0.381 / 0.906 |
| Constant | 0.830 | 0.500 | 0.000 / 0.830 | 0.000 / 1.000 | 0.000 / 0.907 |

Figure 19: Metric changes with different thresholds for logistic regression.
Besides its optimal balanced accuracy, the logistic regression allows adjusting the decision threshold manually, as its output is a probability of change instead of a hard prediction. For example, we can trade off between the true positive rate and the negative predictive value. As shown in Figure 19, if we decrease the threshold probability, the precision of the negative case (NPV) increases while the true negative rate goes down. This means more tiles need manual checks; however, fewer changed tiles are missed. Considering both the performance and these characteristics, logistic regression is selected as the baseline model.
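This trade-off can be explored with a simple threshold sweep; a sketch, with `probs` standing in for the change probabilities output by the logistic regression:

```python
import numpy as np

probs = np.array([0.1, 0.4, 0.6, 0.9])   # hypothetical predicted change probabilities
y_true = np.array([0, 1, 0, 1])          # hypothetical ground truth (1 = changed)

for threshold in (0.3, 0.5, 0.7):
    pred = probs >= threshold
    tn = np.sum(~pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    npv = tn / (tn + fn) if (tn + fn) else float("nan")  # precision of the negative case
    tnr = tn / np.sum(y_true == 0)                       # true negative rate
    print(f"threshold={threshold}: NPV={npv:.2f}, TNR={tnr:.2f}")
```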
"},{"location":"PROJ-LANDSTATS/#413-neural-networks-fcn-and-convrnn","title":"4.1.3 Neural networks: FCN and ConvRNN","text":"FCN and ConvRNN work differently: FCN does not have special structure designed for temporal-spatial data while ConvRNN has specific designation for time and space information respectively. To study these two extreme situations, we explored their performance and compared with the logistic regression which is the best of the baseline models.
Table 5: Performance metrics for the deep learning simulations of the spatial-temporal model.
| Models | Weighted Metric | Raw Metric | Balanced Accuracy | Recall | Missed Changes | Missed Changes Ratio | Missed Weighted Changes | Missed Weighted Changes Ratio | Automatized Points | Automatized Capacity |
|---|---|---|---|---|---|---|---|---|---|---|
| LR (Macro)* | 0.237 | 0.197 | 0.655 | 0.954 | 349 | 0.046 | 18995 | 0.035 | 14516 | 0.364 |
| LR (BA)* | 0.249 | 0.207 | 0.656 | 0.957 | 326 | 0.043 | 17028 | 0.031 | 14478 | 0.363 |
| FCN | 0.259 | 0.21 | 0.656 | 0.958 | 322 | 0.042 | 15563 | 0.029 | 14490 | 0.363 |
| ConvRNN | 0.176 | 0.133 | 0.606 | 0.949 | 388 | 0.051 | 19026 | 0.035 | 10838 | 0.272 |
| Constant | -10.717 | -10.72 | 0.500 | 0.000 | 7607 | 1.000 | 542455 | 1.00 | 47491 | 1.191 |

*Macro: the model is trained with the Macro F1-score; BA: the model is trained with Balanced Accuracy.
As a result of its implementation (see Section 3.2.2), FCN outperforms all the models with a value of 0.259 for the weighted metric, slightly above the logistic regression with 0.249. ConvRNN does not perform well, even after increasing the size of the hidden states to 1024. Following deliberation, we posit that the absence of one-hot encoding during the generation of the synthetic images may be the cause, given that an increased number of channels could substantially increase the computational expense. Since the ground truth labels are directly applied to pixel values, the model may attempt to discern numerical relationships among distinct pixel values that, in reality, do not exist. This warrants further investigation in subsequent phases of our research.
"},{"location":"PROJ-LANDSTATS/#42-integration-module","title":"4.2 Integration module","text":"Table 5 compares the performance of FCN or image-level only to several configurations for the integration module.
Table 6: Performance metrics for the integration model in combination with a spatial-temporal model.
| Model | Weighted Metric | Raw Metric | Balanced Accuracy | Recall | Missed Changes | Missed Changes Ratio | Missed Weighted Changes | Missed Weighted Changes Ratio | Automatized Points | Automatized Capacity |
|---|---|---|---|---|---|---|---|---|---|---|
| FCN | 0.259 | 0.210 | 0.656 | 0.958 | 322 | 0.042 | 15563 | 0.029 | 14490 | 0.363 |
| image-level | 0.374 | 0.305 | 0.737 | 0.958 | 323 | 0.042 | 15735 | 0.029 | 20895 | 0.524 |
| LR + RF | 0.434 | 0.372 | 0.752 | 0.969 | 241 | 0.031 | 10810 | 0.020 | 21567 | 0.541 |
| FCN + RF | 0.438 | 0.373 | 0.757 | 0.968 | 250 | 0.032 | 11277 | 0.021 | 22010 | 0.552 |
| FCN + FCN | 0.438 | 0.376 | 0.750 | 0.970 | 229 | 0.030 | 9902 | 0.018 | 21312 | 0.534 |
| LR + FCN | 0.423 | 0.354 | 0.745 | 0.967 | 255 | 0.033 | 10993 | 0.020 | 21074 | 0.528 |

The study demonstrates that the image level contains more information related to change detection than the temporal-spatial neighbors (FCN row in Table 6). However, performance improves when the temporal-spatial module is combined with the image-level data, reaching 0.438 in the weighted metric (FCN + RF and FCN + FCN).
Regarding the composition of the different models for the two modules, FCN proved to be the best one for the temporal-spatial module, while RF and FCN have similar performance in the integration module. The choice of the integration module could be influenced by the data format of other potential modules. This will be further studied by the FSO team.
"},{"location":"PROJ-LANDSTATS/#5-conclusion-and-outlook","title":"5. Conclusion and outlook","text":"This project studied the potential of historical and spatial neighbor data in change detection task for the fifth interpretation process of the areal statistic of FSO. For the evaluation of this specific project, a weighted metric was defined by the FSO team. The temporal-spatial information was proved not to be as powerful as image-level information which directly detects change within visual data. However, an efficient prototype was built with 6% performance improvement in weighted metric combining the temporal-spatial module and the image-level module. It is validated that integration of modules with different source information can help to enhance the final capacity of the entire workflow.
The next research step of the project would be to modify the current implementation of ConvRNN. If the numerical relationship is removed from the synthetic image data, ConvRNN should theoretically reach a performance similar to FCN. Also, a plain CNN is worth trying, to validate whether the temporal pattern matters in this dataset. Besides, by changing the size of the synthetic images, we can figure out how the number of neighbouring tiles impacts the model performance.
"},{"location":"PROJ-LANDSTATS/#appendix","title":"Appendix","text":""},{"location":"PROJ-LANDSTATS/#a1-classes-of-land-cover-and-land-use","title":"A1. Classes of land cover and land use","text":"Figure 20: Land Cover classification labels. Figure 21: Land Use classification labels."},{"location":"PROJ-LANDSTATS/#a2-probability-analysis-of-different-periods","title":"A2. Probability analysis of different periods","text":""},{"location":"PROJ-LANDSTATS/#a21-land-cover","title":"A2.1 Land cover","text":"Figure 22: Land cover probability matrix. Figure 23: Land cover transformation probability without change. Figure 24: Maximum transformation probability to a certain class when land cover changes."},{"location":"PROJ-LANDSTATS/#a22-land-use","title":"A2.2 Land use","text":"Figure 25: Land use probability matrix. Figure 26: Land use transformation probability without change. Figure 27: Maximum transformation probability to a certain class when land use changes."},{"location":"PROJ-LANDSTATS/#a3-alternative-version-of-section-32","title":"A3 Alternative version of Section 3.2","text":""},{"location":"PROJ-LANDSTATS/#a31-logistic-regression","title":"A3.1 Logistic regression","text":"Logistic regression is a kind of Generalized Linear Model. It is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis in this project. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
"},{"location":"PROJ-LANDSTATS/#a32-xgboost-random-forest","title":"A3.2 XGBoost & random forest","text":"Figure 28: Comparison of boosting and bagging models.
XGBoost and Random Forest both originate from the tree model, while one is the sequential variant and the other is the parallel variant.
Extreme Gradient Boosting (XGBoost) is a distributed, scalable gradient-boosted decision tree (GBDT) machine learning algorithm. Gradient boosting is a flexible method used for regression, multi-class classification and other tasks, since it is compatible with all kinds of loss functions. It recasts boosting as a numerical optimization problem with the goal of reducing the loss function of the model by adding weak classifiers while employing gradient descent. Gradient descent, a first-order iterative approach, is then used to find a local optimum of the differentiable loss function. Weaker classifiers are introduced sequentially to focus on the areas where the current model is struggling, while misclassified observations receive extra weight during training.
Random Forest is a bagging technique that contains a number of decision trees generated from the dataset. Instead of relying solely on one decision tree, it averages over a number of trees to improve the predictive accuracy. For each tree, the input features are a different sampled subset of all the features, making the model more robust and avoiding overfitting. These trees are then each trained with a bootstrap-sampled subset of the dataset. Finally, the random forest takes the prediction from each tree and makes the final decision based on the majority vote. Higher accuracy is obtained and overfitting is avoided through the larger number of trees and the sampling process.
"},{"location":"PROJ-LANDSTATS/#a33-focal-loss","title":"A3.3 Focal loss","text":"The next two methods are Deep Neural Networks which need differentiable loss function for optimization training. Here we first tell the difference between the loss function and evaluation metric.
The choice of loss function and evaluation metric depends on the task and data. The loss function should be chosen based on whether it is suitable for the model architecture and output type, while the evaluation metric should be relevant for the problem domain and application objectives.
The loss function and the evaluation metric are two different concepts in deep learning. The loss function is used to optimize the model parameters during training, while the evaluation metric is used to measure the performance of the model on a test set. The two may differ: for example, here we use the focal loss to train a classification model, but use the balanced accuracy or a specifically defined metric to evaluate its performance. The reason is that some evaluation metrics may not be differentiable or easy to optimize, or they may not match the objective of the model.
For this project, with its imbalanced classification task, we consider the focal loss a better choice than the traditional (binary) cross entropy loss.
\[\begin{align} FL(p_t) = -\alpha \, (1-p_t)^{\gamma} \log(p_t) \end{align}\]where \(p_t\) is the probability of predicting the correct class, \(\alpha\) is a balance factor between positive and negative classes, and \(\gamma\) is a modulation factor that controls how much weight is given to examples that are hard to classify.
Focal loss is a type of loss function that aims to solve the problem of class imbalance in tasks like classification. Focal loss modifies the cross entropy loss by adding a factor that reduces the loss for easy examples and increases the loss for examples hard to classify. This way, focal loss focuses more on learning from misclassified examples. Compared with other loss functions such as cross entropy, binary cross entropy and dice loss, some advantages of focal loss are:
\\(\\alpha\\) should be chosen based on the class frequency. A common choice is to set \\(\\alpha_t\\) = 1 - frequency of class t. This way, rare classes get more weight than frequent classes. \\(\\gamma\\) should be chosen based on how much you want to focus on hard samples. A larger gamma means more focus on hard samples, while a smaller gamma means less focus. The original paper suggested gamma = 2 as an effective value for most cases.
"},{"location":"PROJ-LANDSTATS/#a34-fully-connected-network-fcn","title":"A3.4 Fully connected network (FCN)","text":"Figure 29: Network structure of FCN.
The fully connected network (FCN) in deep learning is a type of neural network that consists of a series of fully connected layers. A fully connected layer is a function from \(\mathbb{R}^m\) to \(\mathbb{R}^n\) that maps each input dimension to each output dimension. The FCN can learn complex patterns and features from data using the backpropagation algorithm.
The major advantage of fully connected networks is that they are \u201cstructure agnostic.\u201d That is, no special assumptions need to be made about the input (for example, that the input consists of images or videos). Fully connected networks are used for thousands of applications, such as image recognition, natural language processing, and recommender systems.
A disadvantage of FCN is that it can be very computationally expensive and prone to overfitting due to the large number of parameters involved. Another disadvantage is that it does not exploit any spatial or temporal structure in the input data, which can lead to poor performance for some tasks. A possible alternative to fully connected network is convolutional neural network (CNN), which uses convolutional layers that apply filters to local regions of the input data, reducing the number of parameters and capturing spatial features.
"},{"location":"PROJ-LANDSTATS/#a35-convolutional-neural-network-cnn","title":"A3.5 Convolutional neural network (CNN)","text":"CNN stands for convolutional neural network, which is a type of deep learning neural network designed for processing structured arrays of data such as images. CNNs are very good at detecting patterns in the input data, such as lines, shapes, colors, or even faces and objects. CNNs use a special technique called convolution, which is a mathematical operation that applies a filter (also called a kernel) to each part of the input data and produces an output called a feature map. Convolution helps to extract features from the input data and reduce its dimensionality.
CNNs usually have multiple layers of convolution, followed by other types of layers such as pooling (which reduces the size of the feature maps), activation (which adds non-linearity to the network), dropout (which prevents overfitting), and fully connected (which performs classification or regression tasks). CNNs can be trained using backpropagation and gradient descent algorithms.
CNNs are widely used in computer vision and have become the state of the art for many visual applications such as image classification, object detection, face recognition, semantic segmentation, etc. They have also been applied to other domains such as natural language processing for text analysis.
Figure 30: Workflow of Convolutional Neural Network.
In this project, the historical land cover and land use data can be translated into synthetic images. The synthetic images use channels to represent the sequence of surveys, and the pixel values represent the ground truth labels. Thus, the spatial relationship of the neighbouring tiles can be extracted from the data structure with the CNN.
"},{"location":"PROJ-LANDSTATS/#a36-convolutional-recurrent-neural-network-convrnn","title":"A3.6 Convolutional recurrent neural network (ConvRNN)","text":"A convolutional recurrent neural network (ConvRNN) is a type of neural network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are good at extracting spatial features from images, while RNNs are good at capturing temporal features from sequences. ConvRNNs can be used for tasks that require both spatial and temporal features, such as image captioning and speech recognition.
A ConvRNN consists of two main parts: a CNN part and an RNN part. The CNN part takes an input image or signal and applies convolutional filters to extract features. The RNN part takes these features as a sequence and processes them with recurrent units that have memory. The output of the RNN part can be a single vector or a sequence of vectors, depending on the task. A ConvRNN can learn both spatial and temporal patterns from data that have both dimensions, such as audio signals or video frames. For example, a ConvRNN can detect multiple sound events from an audio signal by extracting frequency features with CNNs and capturing temporal dependencies with RNNs.
Figure 31: Convolutional Recurrent Neural Network Pipeline.
In this project, we explored a ConvRNN with the structure shown in Figure 31. The sequence of surveys is treated as the sequence of inputs \(x^t\). With the recurrent structure and the hidden states \(h^t\) transmitting information, the temporal information can be extracted. Unlike in a traditional recurrent neural network, the function \(f\) in the hidden layers of the recurrent structure uses convolutional operations instead of matrix computations, and an additional CNN module is applied to the sequence output to detect the spatial information.
"},{"location":"PROJ-QALIDAR/","title":"Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control","text":"Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo)
Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR September 2023 to February 2024 - Published in March 2024
All scripts are available on GitHub.
Abstract: The acquisition of LiDAR data has become standard practice at the national and cantonal levels in Switzerland over recent years. In 2024, the Federal Office of Topography (swisstopo) will complete a comprehensive 6-year campaign covering the whole Swiss territory. The point clouds produced are classified post-acquisition, i.e. each point is attributed to a certain category, such as "building" or "vegetation". Despite the global control performed by the providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, swisstopo invests extensive time in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted into voxels, summarizing the distribution of classes in a form that can be compared one-to-one. This method highlights zones of change by clustering the concerned voxels. The experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method.
"},{"location":"PROJ-QALIDAR/#1-introduction","title":"1. Introduction","text":"The usage of light detection and ranging (LiDAR) technology has seen a large increase in the field of geo-surveying over the recent years 1. Data obtained from airborne acquisition provides rich 3D information about land cover in the form of a point cloud. These point clouds are typically processed after acquisition in order to assign a class to each point, as displayed in Figure 1.
Figure 1: View of the Rhine Falls in the classified point cloud of the product swissSURFACE3D. To conduct its LiDAR surveys, the Federal Office of Topography (swisstopo) mandates external companies, which are in charge of the airborne acquisition and of the classification in post-processing. The process of verifying the quality of the supplied data is tedious, with an estimated duration of 42 working hours for the verification of an area of 216 km2. A significant portion of this verification process is dedicated to ensuring the precision of the point classification. With the first generation of the LiDAR product 2 nearing completion, swisstopo is keen to leverage the considerable time and effort invested to facilitate the quality assessment of the next generation. In this context, swisstopo's LiDAR development team contacted the STDL to develop a change detection method.
As reviewed by Stilla & Xu (2023), change detection in point clouds has already been explored in numerous ways3. The majority of the research focus, however, on changes of geometry. Deep learning solutions are being extensively researched to apply the advancements in this field to change detection in point clouds4. However, to the best of our knowledge, no solution currently address the problem of change detection in the classification of two point clouds. Most challenges of change detection in point clouds come from the unstructured nature of LiDAR data, making it impossible to reproduce the same result across acquisition. Therefore, the production of ground truth and application of deep learning to point clouds of different generations can be challenging. To overcome this, data discretization by voxelization has already been studied in several works on change detection in point clouds, with promising results56.
The goal of this project is to create a mapping of the changes observed between two generations of point clouds for a common scene, with an emphasis on classification changes. The proposed method creates a voxel map for the reference point cloud and the new point cloud for which classification was not controlled as thoroughly. By using the same voxel grid for both generations, direct comparisons can be performed on the occupancy of voxels by the previous and the new classes. Based on the domain expert's criteria, an urgency level is assigned to all voxels: non-problematic, grey zone or problematic. Problematic voxels are then clustered into high priority areas. The summarized process is displayed in Figure 2.
Figure 2: Overview of the workflow for change detection and assignment of a criticality level to the detected changes."},{"location":"PROJ-QALIDAR/#2-data","title":"2. Data","text":""},{"location":"PROJ-QALIDAR/#21-lidar-point-clouds","title":"2.1 LiDAR point clouds","text":"The algorithm required two temporally distinct acquisitions for the same area. Throughout the document, we refer to the first point cloud as v.1. It served as reference data and is assumed to have a properly controlled classification. The subsequent point cloud, representing a new generation, is referred to as v.2.
"},{"location":"PROJ-QALIDAR/#211-choice-of-the-lidar-products","title":"2.1.1 Choice of the LiDAR products","text":"The swissSURFACE3D product was extensively controlled by swisstopo's LiDAR team before its publication. Therefore, its classification has the quality expected by the domain expert. It acted as the v.1, i.e as the generation of reference.
We thus needed to find a newer acquisition fulfilling the following conditions:
For our v.2, we used the point cloud produced by the State of Neuch\u00e2tel, which covers the area within its cantonal borders. The characteristics of each point cloud are summarized in Table 1.
Table 1: Characteristics of swissSURFACE3D, used as v.1, and the LiDAR product of the State of Neuch\u00e2tel, used as v.2.

| | swissSURFACE3D | Neuch\u00e2tel |
| --- | --- | --- |
| Acquisition period | 2018-19 | 2022 |
| Planimetric precision | 20 cm | 10 cm |
| Altimetric precision | 10 cm | 5 cm |
| Spatial density | ~15-20 pts/m2 | ~100 pts/m2 |
| Number of classes | 7 | 21 |
| Dimension of one tile | 1000 x 1000 m | 500 x 500 m |
| Provided file format | LAZ | LAZ |
"},{"location":"PROJ-QALIDAR/#212-area-of-interest","title":"2.1.2 Area of interest","text":"The delimitation of the LiDAR tiles used in this project is shown in Figure 3. We chose to work with tiles of the dimensions of the Neuch\u00e2tel data, i.e. 500 x 500 m. The tiles are designated by a letter that we refer to in the continuation of this document.
The tiles are located in the region of Le Locle. The zone covers an urban area, where quality control is the most time-consuming. It also possesses a variety of land covers, such as a large band of dense forest or agricultural fields.
Figure 3: Tiles used for the development of our method: A for a result control for the hyperparameter tuning, B for the choice of the voxel size and C for a control of the results by the domain expert."},{"location":"PROJ-QALIDAR/#22-annotations-by-the-domain-expert","title":"2.2 Annotations by the domain expert","text":"To understand the expected result, the domain expert controlled the v.2 point cloud in the region of Le Locle as if it were a new acquisition. A perimeter of around 1.2 km2 was controlled.
The problematic zones were each defined by a polygon with a textual description, as well as the current and the correct class given as numbers. A sample of annotations is shown in Figure 4.
Figure 4: Controlled area (left) and examples of control annotations within the detail zone, with the reported error as color and with the original and the corrected class as labels (right). This provided us with annotations of areas where the point cloud data were incorrect. The annotations were used to calibrate the change detection.
It must be noted that this control was achieved without referring to the v.1 point cloud. In this case, we assume that the v.1 contains no classification error, and that the annotated areas therefore represent classification changes between the two generations.
"},{"location":"PROJ-QALIDAR/#3-method","title":"3. Method","text":""},{"location":"PROJ-QALIDAR/#31-correspondence-between-classes","title":"3.1 Correspondence between classes","text":"To compare the classes between generations, we needed to establish their correspondence. We selected the classes from the swisstopo point cloud, i.e the reference generation, as the common ground. Any added classes in the new generation must come from a subdivision of an existing class, as explained in the requirements for the v.2 point cloud. This is the case with Neuch\u00e2tel data. Each class from Neuch\u00e2tel data was mapped to an overarching class from the reference generation, in accordance with the inputs from the domain expert. The details of this mapping are given in table 2. Notice that the class Ground level noise received the label -1. It means that this class was not treated in our algorithm and every such point is removed from the point cloud. This was agreed with the domain expert as this class is very different from the class Low Point (Noise) and doesn't provide any useful information.
Table 2: Mapping between the v.2 and the v.1 point cloud. The field \"original ID\" provides the class number for the v.2, the class name corresponds to the class description from the metadata, and the corresponding ID shows the class number of the v.1 to which it is assigned.

| Original ID | Class name | Corresponding ID |
| --- | --- | --- |
| 1 | Unclassified | 1 |
| 2 | Ground | 2 |
| 3 | Low vegetation | 3 |
| 4 | Medium vegetation | 3 |
| 5 | High vegetation | 3 |
| 6 | Building roofs | 6 |
| 7 | Low Point (Noise) | 7 |
| 9 | Water | 9 |
| 11 | Piles, heaps (natural materials) | 1 |
| 14 | Cables | 1 |
| 15 | Masts, antennas | 1 |
| 17 | Bridges | 17 |
| 18 | Ground level noise | -1 |
| 19 | Street lights | 1 |
| 21 | Cars | 1 |
| 22 | Building facades | 6 |
| 25 | Cranes, trains, temporary objects | 1 |
| 26 | Roof structures | 6 |
| 29 | Walls | 1 |
| 31 | Additional ground points | 2 |
| 41 | Water (synthetic points) | 9 |
Figure 5: Reallocation of points from the v.2 classes (left) to the v.1 classes (right) for tile B, with the class numbers from the second generation indicated in parentheses.
As visible in Figure 5, seven classes were reassigned to class 1 Unclassified. However, they represented a small part of the point cloud. The most important classes were ground, with points coming in equal parts from the ground and the additional ground points classes; vegetation, with mainly points in high vegetation; and building, with mainly points on building roofs.
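As a minimal illustration of this remapping (a sketch with our own function names; the mapping values are those of Table 2), the correspondence can be applied as a lookup on the classification array:

```python
import numpy as np

# v.2 class -> overarching v.1 class, as in Table 2; -1 marks points to drop
CLASS_MAP = {1: 1, 2: 2, 3: 3, 4: 3, 5: 3, 6: 6, 7: 7, 9: 9, 11: 1, 14: 1,
             15: 1, 17: 17, 18: -1, 19: 1, 21: 1, 22: 6, 25: 1, 26: 6,
             29: 1, 31: 2, 41: 9}

def remap_classes(classification):
    """Map v.2 classes to v.1 classes and drop Ground level noise (-1)."""
    mapped = np.vectorize(CLASS_MAP.get)(classification)
    keep = mapped != -1
    return mapped[keep], keep
```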
"},{"location":"PROJ-QALIDAR/#32-voxelization-of-the-point-clouds","title":"3.2 Voxelization of the point clouds","text":"The method relies on the voxelization of both point clouds. As defined in Xu et al. (2021)7, voxels are a geometry in 3D space, defined on a regular 3D grid. They can be seen as the 3D equivalent to pixels in 2D. Figure 68 shows how a voxel grid is defined over a point cloud.
Figure 6: Representation of a point cloud (a) and its voxel grid (b), courtesy of Shi et al. (2018)."},{"location":"PROJ-QALIDAR/#321-preprocessing-of-lidar-tiles","title":"3.2.1 Preprocessing of LiDAR tiles","text":"It must be noted that the approach operated under the assumption that both point clouds were already projected in the same reference frame, and that the 3D positions of the points were accurate. We did not perform any point-set registration as part of the workflow, as the method focuses on finding errors of classification in the point cloud.
Before creating the voxels, the tiles were cropped to the size of the generation with the smallest tiling grid. Here, the v.1 tiles were cropped from 1000 x 1000 m to the dimensions of v.2, i.e. 500 x 500 m. A v.2 tile corresponds exactly to one quarter of a v.1 tile, so no additional operations were needed.
"},{"location":"PROJ-QALIDAR/#323-voxelization-process","title":"3.2.3 Voxelization process","text":"In the interest of keeping our solution free of charge for users, and to have greater flexibility in the voxelization process, we chose to develop our own solution, rather than use pre-existing tools.
We used the Python libraries laspy and pandas. Given a point cloud provided as a LAS or LAZ file, our script returns a table with one row per voxel. The voxels are identified by their center coordinates. In addition, the columns provide the number of points of each class contained within the voxel, for each generation. Figure 7 shows a visual representation of the voxelization process for one voxel element.
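A minimal sketch of such a voxelization step is given below, assuming the laspy and pandas libraries mentioned above (the function and column names are ours, not the project's actual code, and reading LAZ files requires a laspy backend such as lazrs):

```python
import laspy
import numpy as np
import pandas as pd

def voxelize(las_path, voxel_size=1.5):
    las = laspy.read(las_path)
    df = pd.DataFrame({
        # integer voxel indices on a regular grid of edge voxel_size
        "ix": np.floor(np.asarray(las.x) / voxel_size).astype(int),
        "iy": np.floor(np.asarray(las.y) / voxel_size).astype(int),
        "iz": np.floor(np.asarray(las.z) / voxel_size).astype(int),
        "cls": np.asarray(las.classification),
    })
    # one row per voxel, one column per class, values = point counts
    counts = df.groupby(["ix", "iy", "iz", "cls"]).size().unstack("cls", fill_value=0)
    # identify each voxel by its center coordinates
    counts.index = counts.index.map(lambda i: tuple((np.array(i) + 0.5) * voxel_size))
    return counts
```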
Figure 7: Summarized process for the creation of one voxel in the v.1 (left) and the v.2 (right) generation, from the point cloud to the class distribution as a vector. The class distribution is saved for both generations in a table."},{"location":"PROJ-QALIDAR/#33-determination-of-the-voxel-size","title":"3.3 Determination of the voxel size","text":"The voxels must be sized to efficiently locate areas of change without being sensitive to negligible local variations in point location and density.
We assumed that although a point cloud changes between two generations, the vast majority of its features would remain consistent on a tile of 500 x 500 m. Following this hypothesis, we evaluated how the voxel size influenced the proportion of voxels not filled with the same classes in two separate generations. We called this situation a \"categorical change\". A visual example is given in Figure 8.
Figure 8: Example of a situation with no categorical change (left) and a second situation with a categorical change (right). When the proportion of voxels presenting a categorical change was calculated for different voxel sizes, it rose drastically around a size of 1.5 m, as visible in Figure 9. We postulated that this is the minimum voxel size which allows observing changes without interference from the noisy nature of point clouds.
Figure 9: Proportion of categorical changes for different voxel sizes in tile B. The horizontal axis is the voxel size. The vertical axis represents the percentage of voxels experiencing a categorical change between the two generations. For the rest of the development process, cubic voxels with an edge of 1.5 m are used. However, the voxel width and height can be modified in the scripts if desired.
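This study can be reproduced with a small sweep over candidate voxel sizes, comparing the set of classes occupying each voxel in the two generations. The sketch below reuses the voxelize() function introduced in section 3.2.3; the file names are hypothetical:

```python
def occupied_classes(counts):
    # frozenset of the classes with at least one point, per voxel
    return counts.apply(lambda row: frozenset(row.index[row > 0]), axis=1)

def categorical_change_rate(counts_v1, counts_v2):
    c1 = occupied_classes(counts_v1).to_dict()
    c2 = occupied_classes(counts_v2).to_dict()
    all_voxels = set(c1) | set(c2)
    changed = sum(c1.get(v, frozenset()) != c2.get(v, frozenset()) for v in all_voxels)
    return changed / len(all_voxels)  # proportion of voxels with a categorical change

for size in (0.5, 1.0, 1.5, 2.0, 3.0):
    rate = categorical_change_rate(voxelize("tile_B_v1.laz", size),
                                   voxelize("tile_B_v2.laz", size))
    print(f"{size} m: {rate:.1%}")
```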
"},{"location":"PROJ-QALIDAR/#34-criticality-tree","title":"3.4 Criticality tree","text":"The algorithm must not only detect changes, but also assign them a criticality level. We translated the domain expert's criteria into a decision tree, which sorts the voxels into different criticality levels for control. The decision tree went through several iterations, in a dialogue with the domain expert. Figure 10 provides the final architecture of the tree.
Figure 10: Decision tree used to classify the voxels based on the different types of changes and their criticality. The decision tree classifies the voxels into three buckets of criticality level: \"non-problematic\", \"grey zone\" and \"problematic\".
Let us note that although only three final buckets are output, we preserved an individual number for each outgoing branch of the criticality tree, as they provide more detailed information. Those numbers are referred to as \"criticality numbers\".
The decisions of the criticality tree are divided into two major categories. Some are based on qualitative criteria, which are by definition true or false. Others, however, depend on thresholds which had to be defined.
"},{"location":"PROJ-QALIDAR/#341-qualitative-decisions","title":"3.4.1 Qualitative decisions","text":"Decision A: Is there only one class in both generations and is it the same? Every voxel that contains a single, common class in both generations is automatically identified as non-problematic.
Decision B: Is noise absent from the new generation? Any noise presence is possibly an object wrongly classified and necessitates a control. Any voxel containing noise in the new generation is directed to the \"problematic\" bucket.
Decision G: Is the change a case of complete appearance or disappearance of a voxel? If the voxel is only present in one generation, it means that it is part of a new or disappearing geometry, which might or might not be problematic, depending on decisions H and J. If the voxel is present in both generations, we are facing a change in the class distribution due to new classes in it. Decision I will compare the voxel with its neighbors to determine if it is problematic.
Decision J: Is it the specific case of building facade or vegetation? Due to the higher point density in the v.2 point cloud, point proportions may change in voxels compared to the v.1 point cloud, even though the geometry already existed. We particularly noticed this on building facades and under dense trees, as shown in the example given in Figure 11. To avoid classifying these detections as problematic, a voxel with an appearance of points in the class building or vegetation is not problematic if it is located under a non-problematic voxel containing points of the same class.
Figure 11: Example of non-problematic appearance of points in the v.2 point cloud due to the difference of density between the two generations."},{"location":"PROJ-QALIDAR/#342-threshold-based-decisions","title":"3.4.2 Threshold based decisions","text":"The various thresholds were set iteratively by visualizing the results on tile A and comparing them visually with the expert's annotations described in section 2.2. Once the global result seemed satisfactory, we assessed the criticality label for a subset of voxels. Eight voxels were selected randomly for each criticality number. Given that there are 13 possible outcomes, 104 voxels were evaluated. A first evaluation was performed on tile A without the input of the domain expert. It allowed for the hyperparameter tuning. A second evaluation was conducted by the domain expert on tile C, and he declared that no further adjustment of the thresholds was necessary.
Cosine similarity
Decisions C, D and E require evaluating the similarity between the distributions of the previous and the new classes occupying a voxel. We thus sought a metric adapted to comparing the two distributions. Many ways exist to measure the similarity between two distributions9. We settled for the well-known cosine similarity. Given two vectors X and Y, it is defined as: \(\text{Cosine Similarity}(\mathbf{X}, \mathbf{Y}) = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\| \|\mathbf{Y}\|}\)
This metric measures the angle between two vectors. The magnitude of the vectors holds no influence on the results. Therefore, this measure is unaffected by the density of the point clouds. The more the two vectors point in the same direction, the closer the metric is to one. Vectors having null cosine similarity correspond to voxels where none of the classes present in the previous generation match those from the new one.
One limitation of the cosine similarity is its requirement for both vectors to be non-zero. For cases where a voxel is only occupied in a single generation, an arbitrary cosine similarity of -1 is set.
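A short sketch of this metric with the above convention, reproducing the example vectors of Graphs 1 and 2 below (variable names are ours):

```python
import numpy as np

def cosine_similarity(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0 or ny == 0:
        return -1.0  # voxel occupied in a single generation: arbitrary value
    return float(x @ y / (nx * ny))

prev = np.array([0, 0, 4, 2, 0, 0, 7])
new = np.array([25, 0, 20, 0, 0, 5, 40])
print(round(cosine_similarity(prev, new), 2))             # decision C: 0.84
mask = prev > 0                                           # decision D: only the
print(round(cosine_similarity(prev[mask], new[mask]), 2)) # previous classes: 0.97
```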
Decision C: Do the class proportions stay similar and do the classes stay the same? We assessed whether the class proportions stay similar between generations. A threshold of 0.8 is set on the cosine similarity.
flowchart LR\n A[Prev. gen.<br> 0 | 0 | 4 | 2 | 0 | 0 | 7] --> E(Cosine similarity)\n B[New gen.<br> 25 | 0 | 20 | 0 | 0 | 5 | 40] --> E\n\n E-->F[0.84]
Graph 1: Example of vectors and their resulting cosine similarity when considering all the classes. Decision D: Do the previous classes keep the same proportions? We computed the cosine similarity based only on the vector elements which are non-empty in the previous generation. A threshold of 0.8 is set as the limit.
Let us note that voxels present only in one of the two generations are here artificially considered to retain the same class proportion. They are treated further down the decision tree by the decision G.
flowchart LR\n A[Prev. gen.<br> 0 | 0 | 4 | 2 | 0 | 0 | 7] --> C[4 | 2 | 7]\n B[New gen.<br> 25 | 0 | 20 | 0 | 0 | 5 | 40] --> D[20 | 0 | 40]\n C-->E(Cosine similarity)\n D-->E\n E-->F[0.97]
Graph 2: Example of vectors and their resulting cosine similarity when considering only the classes present in the reference generation v.1. Decision E: Is the change due to the class 1? We assessed whether the change is due to the influence of the unclassified points (class 1). To do so we computed the cosine similarity with all vector elements except the first one, corresponding to unclassified points. If the cosine similarity was low when considering all vector elements (decision C), but high when discarding the quantity of unclassified points, this indicates that the change is due to class 1. A threshold of 0.8 is set as the limit.
flowchart LR\n A[Prev. gen.<br> 0 | 0 | 4 | 2 | 0 | 0 | 7] --> C[0 | 4 | 2 | 0 | 0 | 7]\n B[New gen.<br> 25 | 0 | 20 | 0 | 0 | 5 | 40] --> D[0 | 20 | 0 | 0 | 5 | 40]\n C-->E(Cosine similarity)\n D-->E\n E-->F[0.96]
Graph 3: Example of vectors and their resulting cosine similarity when excluding the first class. Decision F: Is class 1 presence low in the new generation? In the case where the change is due to the unclassified points, we wished to evaluate whether such points are in large quantity in the new voxel occupancy. Because the number of points is dependent on the density of the new point cloud, we cannot simply set a threshold on the number of points. To solve this issue, we normalize the number of unclassified points in the voxel, \\(n_{unclassified}\\). Let \\(N_{reference}\\) and \\(N_{new}\\) be the total number of points in the v.1 and v.2 point cloud respectively. The normalized number of unclassified points \\(\\tilde{n}_{unclassified}\\) is defined as:
\[\tilde{n}_{unclassified}= n_{unclassified}\cdot \frac{N_{reference}}{N_{new}} \]An arbitrary threshold of 1 is set as the limit on \(\tilde{n}_{unclassified}\). Under this threshold, the presence of the class 1 is considered low in the new generation.
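This normalization can be written directly from the formula (a sketch; the function and argument names are ours):

```python
def class1_presence_is_low(n_unclassified, n_reference, n_new, threshold=1.0):
    """Decision F: normalize the unclassified count by the density ratio."""
    n_tilde = n_unclassified * n_reference / n_new
    return n_tilde < threshold

# e.g. 4 unclassified points in a v.2 roughly five times denser than the v.1:
print(class1_presence_is_low(4, n_reference=20_000_000, n_new=100_000_000))  # True (0.8 < 1)
```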
Decision H & I: Do the neighbor voxels share the same characteristics? For both decisions we searched the neighbors of a given voxel to evaluate if they share the same characteristics. To make the search of these neighbors efficient, we built a KD-Tree from the location of the voxels. For each voxel, it then assessed whether the neighbors shared the same classes or not. Each class of the evaluated voxel must be present in at least one neighbor. The radius of search influences the number of voxels used for comparison. Let \\(x\\) be the voxel edge length, using search radii of \\(x\\), \\(\\sqrt{2}x\\) or \\(\\sqrt{3}x\\) leads to considering 6, 18 or 26 neighbors respectively, as displayed in Figure\u00a01210. Note that the radius is not limited to these options and search among further adjacent voxels is possible.
Figure 12: Possible connectivity types to define the neighborhood of a voxel, from the website brainvisa.info. In the case where the voxel is only present in one generation, i.e. for decision H, the neighbors considered are the following:
In the case where the class distribution changes due to new classes present in the voxel compared to the v.1, i.e. for decision I, the class distribution of the voxel in the v.2 is compared to that of its neighbors in the v.2. Therefore, if the entire area shares the same classes, the voxel is classified in the grey zone; but if the change is isolated, it goes into the \"problematic\" bucket.
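The neighborhood sizes of Figure 12 can be checked with a KD-tree query; the sketch below uses scipy (an assumed dependency, not necessarily the project's exact implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

x = 1.5  # voxel edge length
# synthetic 3x3x3 block of voxel centers for illustration
grid = np.array([(i, j, k) for i in range(3)
                 for j in range(3) for k in range(3)], float) * x
tree = cKDTree(grid)

center = grid[13]  # the middle voxel
for radius, expected in [(x, 6), (np.sqrt(2) * x, 18), (np.sqrt(3) * x, 26)]:
    # small tolerance so boundary neighbors survive floating-point rounding
    indices = tree.query_ball_point(center, radius * 1.001)
    neighbors = [i for i in indices if i != 13]
    print(len(neighbors), "neighbors, expected", expected)
```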
"},{"location":"PROJ-QALIDAR/#343-description-of-the-grey-zone-and-problematic-buckets","title":"3.4.3 Description of the \"grey zone\" and \"problematic\" buckets","text":"We provide a brief description of the output for each branch of the decision tree ending in the grey zone and problematic buckets. They are identified by their criticality number. Let us note that the criticality numbers are not a ranking of the voxel priority level for a control, but identifiers for the different types of change.
Grey zone:
Problematic:
Voxels ending in the \"grey zone\" and \"problematic\" buckets were often isolated. This created a noisy map, making its usage for quality control challenging. To provide a less granular change map, we chose to cluster the change detections, highlighting only areas with numerous problematic detections in close proximity. In practice, we leveraged the DBSCAN algorithm. Then, the smallest clusters were filtered out and their cluster number was set to one; they are not treated as clusters in the rest of the processing. The hyperparameters for the clustering process are shown in Table 3. They were determined by the expert through the visualization of the results. The epsilon parameter was chosen to correspond to a neighborhood of 18 voxels, as illustrated in Figure 12.
Table 3: Hyperparameters used for the DBSCAN clustering and the filtering of the clusters.

| Hyperparameter | Description | Value |
| --- | --- | --- |
| Epsilon | radius of the neighborhood for a given voxel, in meters | 2.13 |
| Minimum number of samples | minimum number of problematic voxels in the epsilon neighborhood for a voxel to be a core point of the cluster | 5 |
| Minimum cluster size | minimum number of voxels needed inside a cluster for it to be preserved | 10 |

The clusters should be controlled in priority; they form the primary control. Voxels outside a cluster go into the secondary control, as illustrated in the schema of the workflow in Figure 13. The cluster number of those voxels is set to zero.
Figure 13: Schema of the additional step of clustering for the problematic voxels and assignment of the voxels falling inside a cluster to the primary control. All problematic voxels went through this DBSCAN algorithm at once, without distinction based on the criticality number. That way, detections related to the same geometry were grouped together, even if their voxels are not all labeled with the same criticality number. In the end, the most frequent label inside a cluster is attributed to it.
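A minimal sketch of this clustering step with scikit-learn's DBSCAN and the hyperparameters of Table 3 (an assumed implementation; for simplicity, discarded voxels keep scikit-learn's noise label -1 instead of the cluster numbers 0 and 1 used above):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_problematic(centers, criticality, eps=2.13, min_samples=5, min_size=10):
    """centers: (n, 3) voxel centers; criticality: (n,) criticality numbers."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centers)
    criticality = np.asarray(criticality)
    for lab in set(labels) - {-1}:
        members = labels == lab
        if members.sum() < min_size:
            labels[members] = -1  # too small: dropped from the primary control
    # each surviving cluster receives its most frequent criticality number
    majority = {lab: int(np.bincount(criticality[labels == lab]).argmax())
                for lab in set(labels) - {-1}}
    return labels, majority
```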
"},{"location":"PROJ-QALIDAR/#36-visualization-of-detections","title":"3.6 Visualization of detections","text":"Several possibilities were considered for the visualization of the results, as shown in Figure 14.
Figure 14: Comparison of a voxel mesh in green (a), a LAS point cloud (b), and a shapefile with the most represented criticality number of the cluster (c) for the visualization of the detections. The clusters in the point cloud and shapefile are colored in orange and blue depending on their criticality number. The v.2 point cloud is visible in the background. Table 4 shows the advantages and drawbacks of the different methods. In the end, the domain expert required that we provide the results as a shapefile.
Table 4: Comparison of the visualization methods.

| | Voxel mesh | LAS point cloud | Shapefile |
| --- | --- | --- | --- |
| 2D representation of the space occupied by voxels | Yes | No | Yes |
| 3D representation of the space occupied by voxels | Yes | No | No |
| Visualization of the voxel height | Yes | Yes | No |
| Numerical attributes | No | Yes | Yes |
| Textual attributes | No | No | Yes |
"},{"location":"PROJ-QALIDAR/#4-results","title":"4. Results","text":""},{"location":"PROJ-QALIDAR/#41-granularity-of-results","title":"4.1 Granularity of results","text":"Figure 15 shows the voxels produced by the algorithm for the different priority levels. From the base with all the created voxels, each level reduces the number of considered voxels. At the last level, the clustering effectively reduces the dispersion of voxels, keeping only clearly defined groups.
Figure 15: Voxels by their center coordinates in a point cloud for the different levels of priority for tile C, going from the clustered detections at the top to all the voxels at the bottom. Table 5 gives the number and percentage of voxels retained at each level. For tile C, the voxels falling into the \"grey zone\" and the \"problematic\" buckets represent 14.83% of all voxels. If only the problematic ones are retained, this percentage is reduced to 4.77%. Finally, after removing the voxels which do not belong to a cluster, only 2.30% remain.
Meanwhile, the covered part of the tile decreases from 35.80% with all the problematic voxels and the ones of the grey zone, to 11.89% with only the problematic voxels, and to 4.53% with only the clustered voxels. In the end, an expert controlling the classification would have to check in priority 5% of the total tile area.
Table 5: Number of voxels preserved in each urgency level on tile C.

| Urgency level | Number of voxels | Percentage of all voxels | Covered tile area |
| --- | --- | --- | --- |
| Clustered detections | 8'756 | 2.30 % | 4.53 % |
| Problematic detections | 18'146 | 4.77 % | 11.89 % |
| Problematic + grey zone detections | 56'363 | 14.83 % | 35.80 % |
| All voxels | 380'165 | 100 % | 100 % |
The percentage of voxels and the covered tile area decrease considerably between granularity levels. The higher the granularity, the larger the difference in voxel number and covered area between two levels. The covered tile area decreases more slowly than the percentage of voxels retained.
"},{"location":"PROJ-QALIDAR/#42-distribution-of-the-decision-tree-outcomes","title":"4.2 Distribution of the decision tree outcomes","text":""},{"location":"PROJ-QALIDAR/#421-distribution-of-the-points-in-the-criticality-numbers-and-buckets","title":"4.2.1 Distribution of the points in the criticality numbers and buckets","text":"Figure 16 shows the percentage of points from the new point cloud coming out of each branch of the decision tree. The distribution between criticality buckets is given at the top of the figure. The vast majority of points belongs to non-problematic voxels, with around 80% of them being from the first tree branch. This corresponds to the case where only one class is present in both generations. We notice that 10% of the points end up in voxels assigned to the grey zone. It is mostly due to the output of the 8th tree branch. For this specific tile, 1.81% of points from the new point cloud end up in problematic voxels. Let us note that no point ends up in voxels with the 4th and 9th criticality number. This is because those correspond to case of geometry disappearances.
Figure 16: Relative distribution of the points from the new point cloud depending on the criticality number and bucket of their voxel. Results for tile A. Figure 17 shows the same plot, but for tile C. In that case, the percentage of non-problematic points is smaller than for tile A, with more points falling in the \"grey zone\" and \"problematic\" buckets, but the overall trend stays similar. The only changes over 1% concern the criticality numbers 1 (-6.99 points), 8 (+5.08 points) and 12 (+1.44 points). Fewer voxels present one single, identical class across generations, marked with the criticality number 1. More voxels present a change in the distribution. This change can be non-problematic if it is due to the presence of extra classes in the voxel and reflected by the neighboring voxels (criticality number 8). It is problematic if there is a drastic change in the distribution of all classes in the voxel (criticality number 12).
Figure 17: Relative distribution of the points from the new point cloud depending on the criticality number and bucket of their voxel. Results for tile C."},{"location":"PROJ-QALIDAR/#422-distribution-of-the-criticality-numbers-in-the-clusters","title":"4.2.2 Distribution of the criticality numbers in the clusters","text":"Figure\u00a018 displays a sample of clusters as an example. These are shown as a shapefile, which is the visualization format required by the domain expert. One cluster (#1) indicates the disappearance of a tree. Another cluster (#2) designates an appearance. Upon closer examination, the voxels contributing to the cluster comprise different types: \"appearance\" and \"class change\". The most present label is assigned to the cluster. Finally, two zones with differences in classification are highlighted: one (#3) for a building structure going from class unclassified to building, and the other (#4) for a shed going from unclassified to vegetation.
Figure 18: Example of resulting clusters with the corresponding point cloud for the reference generation (v.1) and the uncontrolled generation (v.2). The 8'756 problematic voxels for the primary control are grouped in 263 individual clusters. The repartition of clusters and voxels among the criticality numbers is given in Table 6. Among the clusters, 67% contain mostly voxels with the criticality number 12, meaning that there is a major change in the class distribution for the delineated area. Then, 13% and 12% of the clusters are dominated by a geometry appearance and disappearance respectively. Only 7% of the clusters are dominated by an occurrence of the noise class. It is normal that no cluster is tagged with the criticality number 11, because it is assigned by definition to isolated class changes.
The criticality number 12 is the most present among the clustered voxels. However, its percentage decreases by 18 points at the voxel scale compared to the cluster scale. On the other hand, the presence of the criticality number 9 increases by 15 points at the voxel scale compared to the cluster scale. The other percentages remain stable.
Table 6: Number of clusters and number of voxels in the primary control for each criticality number on tile C.

| Criticality number and its description | Distribution in the clusters | Distribution in the voxels in the primary control |
| --- | --- | --- |
| 9. Appearance of a geometry | 13.31 % | 27.91 % |
| 10. Disappearance of a geometry | 12.17 % | 16.47 % |
| 11. Isolated minor class change | 0 % | 0.13 % |
| 12. Major change in the class distribution | 67.30 % | 49.63 % |
| 13. Noise | 7.22 % | 5.86 % |
"},{"location":"PROJ-QALIDAR/#423-distribution-of-the-lidar-classes-in-the-criticality-buckets","title":"4.2.3 Distribution of the LiDAR classes in the criticality buckets","text":"Figure 19 shows the distribution of the LiDAR classes in the criticality buckets. We see that for the three main classes of this tile, ground, vegetation and building, the vast majority of points fall in non-problematic voxels, with the ground class having a higher proportion of points falling in \"grey zone\" voxels than the others. Unclassified points fall predominantly in the grey zone voxels. The grey zone gets a lot of voxels due to the decision C of the criticality tree, which requires that the voxels share the same classes in both generations. It is difficult for voxels to end up in the \"non-problematic\" bucket, if they did not pass the decision C. All points classified as noise end up in the problematic bucket, as required by the domain expert. Finally, points from the bridge class fall in \"problematic\" and \"grey zone\" voxels. This class is, however, in very low quantity in the new point cloud (only 0.014% of all points) and is thus not statistically significant.
Figure 19: Distribution of the points among criticality buckets relative to their LiDAR class, as well as the percentage represented by the class in the point cloud. Let us note that the results are for the v.2 point cloud on tile C and that no point was classified as water for this tile.
"},{"location":"PROJ-QALIDAR/#43-assessment-of-a-subset-of-detections","title":"4.3 Assessment of a subset of detections","text":"As mentioned in section 3.4.2, 104 voxels were evaluated on tile C, i.e 13 per criticality number. Per the expert review, all the non-problematic and \"grey zone\" voxels were deemed rightfully attributed. However out of the 40 selected problematic voxels, nine detections did not justify their status. Three of those were for cases of appearance and disappearance of geometry. Out of those, two were due to an isolated change of density in the area of the voxel, a situation which can occur in vegetated areas. The other six came from the tree branch 11, which detects small changes that are not present in the neighboring voxels. After discussion with the domain expert, it was agreed that such changes still needed to be classified as problematic, but due to their isolated nature, would not be checked as a priority. After the implementation of the clustering via the DBSCAN algorithm, these voxels of criticality number 11 and isolated changes in vegetation are filtered out.
"},{"location":"PROJ-QALIDAR/#5-discussion","title":"5. Discussion","text":""},{"location":"PROJ-QALIDAR/#51-interpretation-of-the-results","title":"5.1 Interpretation of the results","text":"In Section 4.1, the voxel count for the different granularity levels highlights the number of detections that would have to be controlled at each level. For the clustered detections, which would be the principal mapping to use, only 2.30% of all evaluated voxels are to be controlled. It represents 4.53% of the tile area. The domain expert confirmed that the final amount of voxel to control is reasonable and would allow saving resources compared to the actual situation.
For each granularity level, the percentage of the tile area covered is 2 to 3 times higher than the percentage of voxels considered. It means that, between two granularity levels, part of the eliminated voxels does not impact the covered tile area. The reason must be that the area is a 2D measurement, while the voxels are positioned in 3D space and can cover the same area by belonging to the same grid column. The voxels of a same column must frequently be assigned to different criticality buckets. Therefore, the covered tile area decreases more slowly than the percentage of voxels considered.
From the results obtained in Section 4.2.1, we see that the vast majority of points from the new point cloud end up in non-problematic voxels. The number of points falling in problematic voxels is limited, which is desired, as a high quantity of problematic detections would not help in making the quality assessment faster. We notice, however, a relatively large number of points falling in voxels classified as \"grey zone\", due to the 8th tree branch. These voxels typically exhibit a high similarity in their distribution between the v.1 and the v.2, but do not retain precisely the same classes. Decision C therefore excludes them from a quick assignment to the non-problematic bucket. Such a situation occurs, for example, if a few points of vegetation appear in a zone previously filled only with ground points. This situation generally is not a classification error and reflects the reality of the terrain. However, if it were a widespread classification problem, it would need to be raised to the controller. This is why we preserve those rules, which lead to a lot of \"grey zone\" detections, instead of redirecting them to \"non-problematic\".
In Section 4.2.1, results are presented for tiles A and C in Figures 16 and 17 respectively. Tile A has fewer voxels in the \"problematic\" and \"grey zone\" buckets than tile C. This is in accordance with our expectation that urban zones would have more detected changes, as they evolve faster than other areas and have complex landscapes to classify.
The numbers of Section 4.2.2 show that the majority of the clusters and the majority of the voxels in clusters have the criticality number 12, indicating a major change in the class distribution. This is satisfying, as the variations of the classification across generations were the main focus of this work. Let us note, however, that this number dominates 67% of the clusters, but that only 50% of the voxels in clusters are assigned to it. On the other hand, the criticality number 9, standing for the appearance of a geometry, represents 28% of the voxels present in clusters, while it accounts for only 13% of the clusters. Two possibilities can explain this: the clusters with a geometry appearance are larger than the ones with a major change in the class distribution, or this type of voxel is more present in clusters that were assigned to another criticality number.
Results of Section 4.2.3 show that the points of the three main LiDAR classes are assigned predominantly to the \"non-problematic\" bucket, which makes the map usable. The majority of the unclassified points end up in the \"grey zone\". Because these points include, among other things, mobile and temporary objects, it is not desirable that every such appearance or disappearance ends up in the primary control. However, geometries which transform from a given class to unclassified, or the opposite, are problematic. That situation happens quite often, as indicated by the 17.43% of points ending in this level. For the bridge class, none of the points fall in \"non-problematic\" because, in this specific tile, a small zone was classified as bridge in the v.2 while no point of that class is present in the v.1.
Finally, from the evaluation by the domain expert described in Section 4.3, we understand that the voxels are correctly classified into their criticality level, except for some minor cases. Some of the problematic voxels were not rightfully attributed. Even so, six out of nine of those voxels had the criticality number 11, whose detections are removed when applying the clustering. This sample evaluation instills confidence that the level of urgency attributed to the voxels corresponds well to the situation contained within, making it relevant for usage in a control of the classification.
"},{"location":"PROJ-QALIDAR/#52-discussion-of-the-results","title":"5.2 Discussion of the results","text":"As seen in the previous section, the proposed method generates a somewhat reasonable amount of problematic detections, accompanied by a considerable volume of instances falling within the \"grey zone\". The map for this intermediate level may not be suitable for initial quality control but can offer a more detailed delineation for precise assessment. The map of non-problematic voxels could also be used to highlight the areas requiring no quality assessment given the absence of changes in the distribution.
The proportion of points identified as problematic is very low (1-4%). However, their visual representation can be overwhelming for the controller, given the high number of scattered detections. To address this challenge, we introduced the clustering and filtering of detections. Though this allows for visually more understandable areas, it naturally sacrifices the exhaustiveness of the detections. For example, low walls and hedges were frequently classified differently between the v.1 and the v.2. Due to the clustering favoring areas with grouped elements, such elongated and thin objects can be cut out of the mapping. Possible future works could study other filtering methods to attenuate this issue.
Currently, no full assessment of the detections on a tile has been performed. Therefore, it is hard to estimate the quantity of detections that would be missing from the clusters or from the \"problematic\" bucket.
Regarding the precision of the results, the evaluation of the small subset of detections by the domain expert indicates that they are relevant and possibly useful as a tool for quality assessment.
While the developed method allows finding changes between two point clouds, it has some limitations. First, it only works if the classes of the v.2 can be mapped to overarching classes of the v.1. This is not always the case, due to the lack of consensus between LiDAR providers. Another limitation comes from the voxel size: by employing fixed volumes of 3.375 m3 to detect the changes, points not contributing to the actual change are also included in the highlighted areas. A possible improvement would be to refine the detection area after the clustering. Another thing to consider is that the method works on a single tile at a time, without consideration of the surrounding tiles. This can potentially affect the clustering step, as voxels on the border have fewer neighbors. To ensure that this does not affect the results, a buffer could be taken around the tile. This buffer could also ensure that the total tile size is a multiple of the voxel size. The method is currently limited to the use of a single reference generation. However, with the frequent renewal of LiDAR acquisitions, it should soon be possible to compare several generations with a new acquisition. The decision tree could then be adapted to take into account the stability of the classification in the voxels and prioritize changes in stable areas over areas with high variation, such as forests.
"},{"location":"PROJ-QALIDAR/#6-conclusion","title":"6. Conclusion","text":"Quality assessment of LiDAR classification is a demanding task, requiring a considerable amount of work by an operator. With the proposed method, controllers can leverage a previous acquisition to highlight changes in the new one. The detections are divided in different levels of urgency, allowing for control of various granularity levels.
The limited number of voxels preserved in the map of primary changes supports the prospect of its usefulness in a quality assessment process. The positive review of a sample of voxels by the domain expert further confirms the quality of the method.
A possible step to make the detections better suited to the experts' specific needs would be to review a broader sample of voxels by criticality bucket in order to optimize the thresholds of the decision tree.
In the near future, the clusters produced by the algorithm will be tested on tiles in another region and with other LiDAR data. If the results are deemed satisfying, the method will be tested in swisstopo's workflow when the production of the next generation of swissSURFACE3D begins. The test in the workflow should enable a control of the detection precision and exhaustiveness, as well as an estimation of the time spared by an operator working with the developed algorithm.
According to the domain expert's evaluation, out of all the operations applied during a quality assessment, the developed method touches on operations which make up 52.7% of the control time. These operations could be made faster by having zones of interest already precomputed.
"},{"location":"PROJ-QALIDAR/#7-acknowledgements","title":"7. Acknowledgements","text":"This project was made possible thanks to the swisstopo's LiDAR team that submitted this task to the STDL and provided regular feedback. Special thanks are extended to Florian Gandor for his expertise and his meticulous review of the method and results. In addition, we are very appreciative of the active participation of Matthew Parkan and Mayeul Gaillet to our meetings.
"},{"location":"PROJ-QALIDAR/#8-bibliography","title":"8. Bibliography","text":"Xin Wang, HuaZhi Pan, Kai Guo, Xinli Yang, and Sheng Luo. The evolution of LiDAR and its application in high precision measurement. IOP Conference Series: Earth and Environmental Science, 502(1):012008, May 2020. URL: https://iopscience.iop.org/article/10.1088/1755-1315/502/1/012008 (visited on 2024-02-20), doi:10.1088/1755-1315/502/1/012008.\u00a0\u21a9
swissSURFACE3D. URL: https://www.swisstopo.admin.ch/fr/modele-altimetrique-swisssurface3d#technische_details (visited on 2024-01-16).\u00a0\u21a9
Uwe Stilla and Yusheng Xu. Change detection of urban objects using 3D point clouds: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 197:228\u2013255, March 2023. URL: https://linkinghub.elsevier.com/retrieve/pii/S0924271623000163 (visited on 2023-10-05), doi:10.1016/j.isprsjprs.2023.01.010.\u00a0\u21a9
Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep Learning for 3D Point Clouds: A Survey. June 2020. arXiv:1912.12033 [cs, eess]. URL: http://arxiv.org/abs/1912.12033 (visited on 2024-01-18).\u00a0\u21a9
Harith Aljumaily, Debra F. Laefer, Dolores Cuadra, and Manuel Velasco. Voxel Change: Big Data\u2013Based Change Detection for Aerial Urban LiDAR of Unequal Densities. Journal of Surveying Engineering, 147(4):04021023, November 2021. Publisher: American Society of Civil Engineers. URL: https://ascelibrary.org/doi/10.1061/%28ASCE%29SU.1943-5428.0000356 (visited on 2023-11-20), doi:10.1061/(ASCE)SU.1943-5428.0000356.\u00a0\u21a9
J. Gehrung, M. Hebel, M. Arens, and U. Stilla. A voxel-based metadata structure for change detection in point clouds of large-scale urban areas. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2:97\u2013104, May 2018. Conference Name: ISPRS TC II Mid-term Symposium \u201cTowards Photogrammetry 2020\u201d (Volume IV-2), 4–7 June 2018, Riva del Garda, Italy. Publisher: Copernicus GmbH. URL: https://isprs-annals.copernicus.org/articles/IV-2/97/2018/isprs-annals-IV-2-97-2018.html (visited on 2024-01-18), doi:10.5194/isprs-annals-IV-2-97-2018.\u00a0\u21a9
Yusheng Xu, Xiaohua Tong, and Uwe Stilla. Voxel-based representation of 3D point clouds: Methods, applications, and its potential use in the construction industry. Automation in Construction, 126:103675, June 2021. URL: https://www.sciencedirect.com/science/article/pii/S0926580521001266 (visited on 2024-01-18), doi:10.1016/j.autcon.2021.103675.\u00a0\u21a9
Zhenwei Shi, Zhizhong Kang, Yi Lin, Yu Liu, and Wei Chen. Automatic Recognition of Pole-Like Objects from Mobile Laser Scanning Point Clouds. Remote Sensing, 10(12):1891, 2018. Number: 12, Publisher: Multidisciplinary Digital Publishing Institute. URL: https://www.mdpi.com/2072-4292/10/12/1891 (visited on 2024-01-19), doi:10.3390/rs10121891.\u00a0\u21a9
Sung-Hyuk Cha. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences, 4(1):300\u2013307, 2007. URL: https://pdodds.w3.uvm.edu/research/papers/others/everything/cha2007a.pdf (visited on 2024-03-14).\u00a0\u21a9
[Volume of Labels] Compute Clique Statistics. URL: https://brainvisa.info/axon/fr/processes/AtlasComputeCliqueFromLabels.html (visited on 2024-02-23).\u00a0\u21a9
Nils Hamel (UNIGE) - Huriel Reichel (swisstopo)
Proposed by the Federal Statistical Office - TASK-REGBL December 2020 to February 2021 - Published on March 2, 2021
Abstract: The Swiss Federal Statistical Office is in charge of the national Register of Buildings and Dwellings (RBD), which keeps track of every existing building in Switzerland. Currently, the register is being completed with buildings in addition to regular dwellings to offer a reliable and official source of information. The completion of the register raised issues due to missing information that is difficult to collect. The construction year of the buildings is missing for a large number of register entries. The Statistical Office mandated the STDL to investigate the possibility of using the Swiss national maps to extract this missing information through an automated process. Research was conducted in this direction with the development of a proof-of-concept and of a reliable methodology to assess the obtained results.
"},{"location":"PROJ-REGBL/#introduction","title":"Introduction","text":"The Swiss Federal Statistical Office [1] is responsible of maintaining the Federal Register of Buildings and Dwellings (RBD) in which a collection of information about buildings and homes are stored. Currently, a completion operation of the register is being conducted to include to it any type of construction on the Swiss territory.
Such a completion operation comes with many challenges, including the gathering of the information related to the constructions currently being integrated into the register. Among this information are the construction years of the buildings. This information is important to efficiently characterise each Swiss building and to allow the Statistical Office to provide a reliable register to all actors relying on it.
The construction year of buildings turns out to be complicated to gather, as adding new buildings to the register already imposes a significant workload, even for simple information. In addition, in many cases, the construction year of a building is missing or cannot be easily collected to update the register.
The Statistical Office mandated the STDL to research the possibility of automatically gathering the construction year by analysing the swisstopo [3] National Maps [4]. Indeed, the Swiss national maps are known for their excellence, their availability for any geographical area and their temporal coverage. The national maps have been made with a rigorous and well-controlled methodology since the 1950s and can therefore be used as a reliable source of information to determine the buildings' construction years.
The STDL was then responsible for performing the research and developing a proof-of-concept, providing all the information needed for the Statistical Office to take the right decision on considering the national maps as a reliable way of assigning a construction year to the buildings lacking this information.
"},{"location":"PROJ-REGBL/#research-project-specifications","title":"Research Project Specifications","text":"Extracting the construction date out of the national maps is a real challenge, as the national maps are a heavy dataset, they are not easy to be considered as a whole. In addition, the Statistical Office needs the demonstration that it can be done in a reliable way and within a reasonable amount of time to limit the cost of such process. They are also subjected to strict tolerances on the efficiency of the construction years extraction through an automated process. The goal of at least 80% of overall success was then provided as a constraint to the STDL.
As a result, the research specifications for the STDL were:
Gathering and understanding the data related to the problem
Developing a proof-of-concept demonstrating the possibility to extract the construction years from the national maps
Assessing the results with a reliable metric to allow demonstrating the quality and reliability of the obtained construction years
In this research project, two datasets were considered: the building register itself and the national maps. As both datasets are heavy and complex, considering them entirely for such a research project would have been too complicated and unnecessary. It was then decided to focus on four areas, selected for their representativeness of the Swiss landscape:
Basel (BS): Urban area
Bern (BE): Urban and peri-urban area
Biasca (TI): Rural and mountainous
Caslano (TI): Peri-urban and rural
The following images give a geographical illustration of the selected areas through their most recent map:
Illustration of the selected areas: Basel (2015), Bern (2010), Biasca (2012) and Caslano (2009) - Data: swisstopo. Basel was selected as an example of an area for which the building register was already well filled in terms of construction years. The four regions are 6 km by 6 km square areas, which allows up to twenty thousand buildings to be considered in a single one.
"},{"location":"PROJ-REGBL/#federal-register-of-buildings-and-dwellings","title":"Federal Register of Buildings and Dwellings","text":"The register of buildings is a formal database composed with entries, each of them representing a specific building. Each entry comes with a set of information related to the building they describe. In this project, a sub-set of these informations was considered:
Federal identifier of the building (EGID)
The position of the building, expressed in EPSG:2056 (GKODE, GKODN)
The building construction year, when available (GBAUJ)
The surface of the building, when available, expressed in square metres (GAREA)
In addition, tests were conducted considering the positions of the entrances of each building. It turned out rapidly that they were not useful in this research project, as they were missing for a large fraction of the register and only provided information redundant with the position of the buildings.
The following table gives a summary of the availability of the construction year in the register according to the selected areas:
| Area | Buildings | Available years | Missing fraction |
| --- | --- | --- | --- |
| Basel | 17\u2019088 | 16\u2019584 | 3% |
| Bern | 21\u2019251 | 4\u2019499 | 79% |
| Biasca | 3\u2019774 | 1\u2019346 | 64% |
| Caslano | 5\u2019252 | 2\u2019452 | 53% |

One can see that the amount of missing construction years can be large depending on the considered area.
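For illustration, such a missing fraction can be computed directly from an RBD extract with pandas (the file name is hypothetical; GBAUJ is the register's construction-year field mentioned above):

```python
import pandas as pd

rbd = pd.read_csv("rbd_basel_extract.csv")  # one entry per building (EGID)
missing_fraction = rbd["GBAUJ"].isna().mean()
print(f"Missing construction years: {missing_fraction:.0%}")
```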
"},{"location":"PROJ-REGBL/#national-maps","title":"National Maps","text":"On the side of the national maps, the dataset is more complex. In addition to the large number of available maps, variations of them can also be considered. Indeed, maps are made for different purposes and come with variations in their symbology to emphasise elements on which they focus. Moreover, for modern years, sets of vector data can also be considered in parallel to maps. Vector data are interesting as they allow to directly access the desired information, that is the footprint of the building without any processing required. The drawback of the vector data is their temporal coverage which is limited to the last ten to twenty years.
The following images give an illustration of the aspect of the available maps and vector datasets considering the example of the Bern area. Starting with the traditional maps:
Available map variations: KOMB, KGRS and KREL - Data: swisstopo. And the more specific and vector ones:
Available map variations: SITU, GEB and DKM25-GEB (vector) - Data: swisstopo. In addition to the number of available variations and data types, they all come with their specific temporal coverage. In the case of this research project, we tried to go back in time as far as possible, which simplified the choice for the older maps. The question remains open for more recent times.
As we are mostly interested in buildings, the availability of already extracted building layers, either raster or vector, is highly interesting. But the problem of data selection is complex in our case: no matter the choice, for the older times, the only available maps have to be considered. In addition to the access to building footprints, the question of the continuity of the data has to be considered with care. More than the building footprints themselves, we are interested in the continuity of these footprints over time, in order to be able to safely assess the life cycle of the tracked buildings.
This consideration led us to discover variations in methodology depending on the considered dataset. Indeed, buildings are not shaped in the same way on traditional maps as they are in layers focusing on them. It follows that variations in symbology, and thus in the shapes of the buildings, appear between traditional maps and building layers (raster and vector). These variations can lead to abrupt changes from a map to the one preceding it in time. This can break the continuity of the building footprints along time, making them much more difficult to track safely.
This is the reason why we chose to focus on the KOMB variation of the maps. These maps are very stable and cover the largest temporal range. The methodology was kept very similar along the years, making this dataset much more reliable to work with when the time dimension is considered. Considering only the KOMB variation of the maps also ensures that all source data are treated the same in the processing pipeline, easing the assessment of the results.
In addition, the KOMB maps are dense in information and come with a coloured symbology. This makes it easier to extract the information we need in this project, that is the building footprints. One exception was made concerning the KOMB maps: in their very latest version, the methodology changed, causing the symbology to differ from the older KOMB maps. In this latest version, texts are much more numerous and tend to cover a large number of the buildings, making them invisible. For this reason, the latest version was dropped, slightly reducing the temporal coverage over the 2015-2020 period.
Selecting the KOMB variation allowed us to obtain the following temporal coverage for the four selected areas:
| Area | Oldest map | Latest map | Mean separation |
| --- | --- | --- | --- |
| Basel | 1955 | 2015 | 5.5 years |
| Bern | 1954 | 2010 | 5.6 years |
| Biasca | 1970 | 2012 | 6.0 years |
| Caslano | 1953 | 2009 | 6.2 years |

One can see that a large portion of the 20th century can be covered using the maps, with a very good resolution of around five to six years between successive maps.
"},{"location":"PROJ-REGBL/#research-approaches","title":"Research Approaches","text":"In this research project, the main focus was put on the national maps to extract the construction year of buildings as the maps are sources on which we can rely and assess the results. The only drawback of the maps is their limited temporal coverage, as they only start to be available in the 1950s.
This is the reason why another, experimental approach was added to address the case of buildings built before the 1950s. This secondary approach relies on a statistical methodology to verify to what extent it is possible to assign a construction date even when no maps are available.
National Maps: This main approach focuses on the national maps, from which the construction year of a building is deduced through a temporal analysis. Each building is tracked back in time until it disappears or changes its shape on a given map, allowing to deduce that the building was constructed in the gap separating that map from its successor.
Statistical Analysis: This method is based on the principle of spatial dependence and, furthermore, on concentric zones of urban development. It is technically an interpolator which deduces construction years based first on different searching radii for different variances, second on splitting the data into quantiles and, finally, on a Gaussian mixture model, an unsupervised learning technique, to gather the final predictions.
The statistical analysis then makes it possible to consider buildings that are detected on all maps, meaning their construction predates the oldest available map, and to assign them an estimation of their construction year, knowing that it has to be older than the oldest map.
"},{"location":"PROJ-REGBL/#research-approach-national-maps","title":"Research Approach: National Maps","text":"In order to detect construction year of buildings, we need to be able to track them down on the maps across the temporal coverage. The RBD is providing the reference list of the building, each coming with a federal identifier (EGID) and a position. This position can then be used to track down the building on maps for its appearance or morphological change.
With the maps and the research areas already selected, this research approach can be summarised in the following way:
Translating the maps into binary images containing only buildings
Extracting the RBD buildings related to the analysed area
Detection of the buildings on the maps
Detection of the morphological variation of the buildings
Assessment of the obtained results
The first four points are related to the development of the proof-of-concept. The last one concerns a very sensitive and complicated question relative to the considered problem: how to analyse and assess the obtained results. This was the most difficult question in this research, and finding a clear and reliable answer to it is mandatory before developing anything. For this reason, it is considered first.
"},{"location":"PROJ-REGBL/#reliability-of-the-data","title":"Reliability of the Data","text":"Assessing the results is essentially having a strong reference allowing to compare both in order to obtain a reliable characterisation of the success rate in the deduction of the construction years. This question leads to the discovery that this problem is much more complex that and can appear in the first place.
Indeed, we were warned by the Statistical Office that the construction years already given in the RBD can be unreliable in some of its portions. This can be explained by the fact that collecting such information is a long and complicated administrative process. As an example, the following image gives an illustration of a building tracked on each of the available selected maps:
Temporal track of a selected building. On this illustration, one can see two things: the RBD announces a construction year of 1985, while the maps clearly indicate something different, locating the construction between 1963 and 1969. The two datasets thus contradict each other. In order to resolve the contradiction, we manually searched for historical aerial images. The following images illustrate what was found:
Aerial view of the building situation: 1963, 1967 and 1987 - Data: swisstopo. One can clearly see that the maps give the correct answer concerning the construction date of this specific building, the RBD being contradicted by two other sources. This illustrates the fact that the RBD cannot be directly considered as a reliable reference to assess the results.
The same question applies to the maps. Even if they are believed to be highly reliable, one has to be careful with such a consideration. Indeed, looking at the following example:
Temporal track of a selected building. In this case, the RBD gives 1986 as the construction date of the pointed building, while the maps give a construction year between 1994 and 2000. Again, the two datasets contradict each other. The same procedure was conducted to resolve the contradiction:
Aerial view of the building situation: 1970, 1986 and 1988 - Data: swisstopo. Looking at the aerial images, it seems that the tracked building was already there in 1988. One can see that the 1994 map continues to represent the four old buildings instead of the new one; it is only in 2000 that the maps correctly represent the new building. This shows that although maps are a reliable source of geo-information, they can also be subject to delays in their symbology.
The maps also come with the problem of the consistency of the building footprint symbology. Looking at the following example:
Temporal track of a selected building. One can see that the maps seem to indicate a strange evolution of the situation: a first building appears in 1987, then it is destroyed and replaced by a larger one in 1993. This new large building in turn seems to have been destroyed right after its construction, to be replaced by a new one in 1998. Considering aerial images of the building situation:
Aerial image view of the building situation: 1981, 1987 and 1993 - Data: swisstopo. One can clearly see that a first building was constructed and then completed by an extension between 1987 and 1993. This illustrates how the symbology of the building footprints can be subject to variations that are de-synchronised with respect to the true situation.
"},{"location":"PROJ-REGBL/#metric","title":"Metric","text":"In such context, neither the RBD or the national maps can be formally considered as a reference. It follows that we are left without a solution to assess our results, and more problematically, without any metric able to guide the developments of the proof-of-concept in the right direction.
To solve the situation, one hypothesis is made in this research project. Taking into account both the RBD and the national maps, one can observe that they are built using very different methodologies. On the one hand, the RBD results from a complex administrative process, gathering the required information step by step, going from communes to cantons and finally to the Statistical Office. On the other hand, the national maps are built using regular aerial image campaigns conducted over the whole of Switzerland. The process of establishing maps is quite old and can thus be considered as well controlled and stable.
Both datasets are then made with methodologies that can be considered as fully independent from each other. This led us to the formulation of our hypothesis: when the RBD and the national maps agree on the construction period of a building, this construction period can be assumed to be correct.
One should remain careful with this hypothesis, even though it sounds reasonable. It would be very difficult to verify, as it would require gathering complex confirmation data that would have to be independent of the RBD, the national maps and the aerial images (as the maps are based on them). This assumption is the only one made in this research project.
Accepting this assumption makes it possible to establish a formal reference that can be used as a metric to assess the results and to guide the development of the proof-of-concept. But such a reference has to be built with care, as the problem remains complex. To illustrate this complexity, the following figure gives a set representation of our problem:
Set representation of the RBD completion problem. The two rectangles represent the set of buildings for a considered area. On the left, one can see the building set from the RBD point of view. The grey area shows the buildings without the information of their construction year. Its complementary set is split into two sub-sets: the buildings whose construction year is absolutely correct and those whose construction year is absolutely incorrect (the limit between the two is subject to a bit of interpretation, as the construction year is not a strong concept). If a reference can be extracted, it should lie in the green sub-set. The problem is that we have no way of knowing which buildings are in which sub-set. The national maps were therefore considered to define another sub-set: the synchronous sub-set, where the RBD and the national maps agree.
To build the metric, the RBD sub-set of buildings coming with the information of the construction year is randomly sub-sampled to extract a representative sub-set: the potentials. This sub-set of potentials is then manually analysed to keep the buildings on which both datasets agree and to reject the others. At the end of the process, the metric sub-set is obtained and should remain representative.
On the right of the set representation is the view of the building set through the national maps. One can see that the same sub-sets appear, but with the construction years replaced by the representation of the buildings on the maps. The grey part then represents the buildings that are not represented on the maps, because of their size or because they are hidden by the symbology, for example. The difference is that the maps do not give access to the construction years directly: they are read from the maps through our developed detector. The detector having a success rate, it cuts each of the sub-sets in two, which is exactly what we need for our metric. If the metric sub-set remains representative, the success rate of the detector evaluated on it should generalise to all the represented buildings.
This set representation demonstrates that the problem is very complex and has to be handled with care. Considering only the six most important sub-sets on each side, and considering that the construction years are extracted from the maps through the detector, up to 72 specific cases can apply to each randomly selected building.
To perform the manual selection, a random selection of potential buildings was made on the RBD set of buildings coming with a construction year. The following table summarises the selection and manual validation:
| Area | Potentials | Metric |
| --- | --- | --- |
| Basel | 450 EGIDs | 209 EGIDs |
| Bern | 450 EGIDs | 180 EGIDs |
| Biasca | 336 EGIDs | 209 EGIDs |
| Caslano | 450 EGIDs | 272 EGIDs |

The previous table gives the result of the second manual validation. Indeed, two manual validation sessions were held, several weeks apart, to check the validation process and how it evolved as our understanding of the problem grew.
Three main criticisms can be addressed to the metric. The first one is that establishing validation criteria is not simple, as the number of cases in which buildings can fall is very high. Understanding the problem takes time and requires seeing a lot of these cases; it follows that the second validation session was more stable and rigorous than the first one.
The second criticism that can be made of our metric is selection bias. As the process is performed by a human, it is affected by their way of applying the criteria and, more specifically, by their severity in applying them. Considering the whole potentials sub-set, one can conclude that a few buildings could be rejected or validated depending on the person doing the selection.
The last criticism concerns specific cases for which the asynchronicity criterion used to reject them is weak. Indeed, for some buildings the situation is very unclear, as the RBD and the maps give information that cannot be reconciled. This is the case, for example, when the building is not represented on the map: either the position in the RBD or the lack of information on the maps can lead to such an unclear situation. These cases are rejected, but without being fully sure of the asynchronicity between the maps and the RBD.
"},{"location":"PROJ-REGBL/#methodology","title":"Methodology","text":"With a reliable metric, results can be assessed and the development of the proof-of-concept can be properly guided. As mentioned above, the proof-of-concept can be split in four major steps that are the processing of the maps, the extraction of the RBD buildings, detection of the building on the maps and detection in morphological changes.
"},{"location":"PROJ-REGBL/#national-maps-processing","title":"National Maps Processing","text":"In order to perform the detection of building on the maps, a reliable methodology is required. Indeed, one could perform the detection directly on the source maps but this would lead to a complicated process. Indeed, maps are mostly the result of the digitisation of paper maps creating a large number of artefacts on the digital images. This would lead to an unreliable way of detecting building as a complicated decision process would have to be implemented each time a RBD position is checked on each map.
A map processing step was therefore introduced first, to translate the digitised colour images into reliable binary images on which building detection can be performed safely and easily. The goal of this process is to create a binary version of each map, with black pixels indicating building presence. A method for extracting buildings from the maps was designed accordingly.
Considering the following example of a map cropped according to a defined geographical area (Basel):
Example of a considered map: Basel in 2005 and closer view - Data: swisstopo
The first step of the map processing methodology is to correct and standardise the exposure of the digitised maps. Indeed, as the maps mostly result from a digitisation process, they are subject to exposure variations. A simple standardisation is then applied.
The next step consists in black pixel extraction. Each pixel of the input map is tested to determine whether or not it can be considered as black, using specific thresholds. As the buildings are drawn in black, extracting black pixels is a first way of separating the buildings from the rest of the symbology. The following result is obtained:
Result of the black extraction process
As one can see on the result of the black extraction process, the buildings are still highly connected to other symbological elements, and to each other in some cases. Having the building footprints well separated and well defined is an important point for the subsequent processes responsible for the deduction of construction years. To achieve this, two steps are added. The first one uses a variation of Conway's game of life [5] to implement a morphological operator able to disconnect pixel groups. The following image gives the result of this separation process along with the previous black extraction result on which it is based:
Result of the morphological operator (right) compared to the previous black extraction (left)
While the morphological operator provides the desired result, it also shrinks the footprint of the elements. It eliminates a lot of structures that are not buildings, but it also reduces the footprint of the buildings themselves, which can increase the amount of work the subsequent processes have to perform to properly detect a building. To solve this issue and obtain building footprints that are as close as possible to the original map, a controlled re-growing step is added. It uses a region threshold and the black extraction result to re-grow the buildings without going beyond their original definition. The following images give a view of the final result along with the original map:
Final result of the building footprints extraction (right) compared to the original map
As the Conway morphological operator is not able to get rid of all the non-building elements, such as large and bold texts, the final re-growing step also thickens them along with the building footprints. Nevertheless, the obtained binary image keeps most of the building footprints intact while eliminating most of the other elements of the map, as illustrated in the following image:
Extracted building footprints, in pink, superimposed on the Bern map. The obtained binary images are then used for both the detection of buildings and the detection of morphological changes, as the buildings are easy to access and analyse in such a representation.
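To make the chain above more concrete, here is a minimal sketch of the black extraction and separation steps, assuming a per-channel darkness threshold and a simple game-of-life-style neighbour rule; the thresholds and the exact rules of the C++ proof-of-concept are not reproduced here.

```python
import numpy as np

def extract_black(img, threshold=80):
    """Binary mask of near-black pixels of an RGB map image.
    img: (H, W, 3) uint8 array; the threshold value is an assumption."""
    return (img < threshold).all(axis=2)

def separate(mask, min_neighbours=5, iterations=2):
    """Game-of-life-style erosion: a pixel survives only if enough of its
    8 neighbours are set, which disconnects thin links between buildings
    and the surrounding symbology."""
    m = mask.astype(np.uint8)
    h, w = m.shape
    for _ in range(iterations):
        p = np.pad(m, 1)
        # count the 8 neighbours of each pixel using shifted views
        neigh = sum(
            p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        )
        m = ((m == 1) & (neigh >= min_neighbours)).astype(np.uint8)
    return m.astype(bool)
```

The controlled re-growing step would then dilate this result while clipping it against the black extraction mask, so that footprints never exceed their original definition.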
"},{"location":"PROJ-REGBL/#building-extraction-from-rbd","title":"Building Extraction from RBD","text":"In the case of limited geographical areas as in this research project, extracting the relevant buildings from the RBD was straightforward. Indeed, the RBD is a simple DSV database that is very easy to understand and to process. The four areas were packed into a single DSV file and the relevant building were selected through a very simple geographical filtering. Each area being defined by a simple geographical square, selecting the buildings was only a question of checking if their position was in the square or not.
"},{"location":"PROJ-REGBL/#building-detection-process","title":"Building Detection Process","text":"Based on the computed binary images, each area can be temporally covered with maps on which building can be detected. Thanks to the processed maps, this detection is made easily, as it was reduced to detect black pixels in a small area around the position of the building provided in the RBD. For each building in the RBD, its detection on each temporal version of the map is made to create a presence table of the building. Such table is simply a Boolean value indicating whether a building was there or not according to the position provided in the RBD.
The following images give an illustration of the building detection process on a given temporal version of a selected map:
Detection overlay superimposed on its original map (left) and on its binary counterpart (right)
One can see that, for each building and for each temporal version of the map, the decision about building presence can be made. At the end of this process, each building is associated with a presence value for each year corresponding to an available map.
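A minimal sketch of the presence-table construction, assuming the binary maps are boolean numpy arrays indexed by year and that the detection window is a fixed number of pixels (both illustrative choices):

```python
import numpy as np

def presence_table(binary_maps, row, col, half_window=3):
    """binary_maps: dict year -> (H, W) boolean array of building pixels.
    Returns {year: True/False}: True when at least one building pixel
    lies in the small window around the RBD position (row, col)."""
    presence = {}
    for year, bmap in sorted(binary_maps.items()):
        window = bmap[max(row - half_window, 0):row + half_window + 1,
                      max(col - half_window, 0):col + half_window + 1]
        presence[year] = bool(window.any())
    return presence
```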
"},{"location":"PROJ-REGBL/#morphological-change-detection","title":"Morphological Change Detection","text":"Detecting the presence of a building on each temporal version of the map is a first step but is not enough to determine whether or not it is the desired building. Indeed, a building can be replaced by another along the time dimension without creating a discontinuity in the presence timeline. This would lead to misinterpret the presence of building with another one, leading the construction year to be deduced too far in time. This can be illustrated by the following example:
Example of a building being replaced by another one without introducing a gap in the presence table. As detecting the presence of the building is not enough to correctly deduce a construction year, a morphological criterion is added. Many different methodologies were tried in this project, ranging from signatures to various quantities deduced from the footprint of the building. The simplest and most reliable way was to focus on the pixel count of the building footprint, which corresponds to its surface in geographical terms.
A morphological change is considered to occur when the surface of the building footprint changes beyond a given threshold along the building presence timeline. In such a case, the presence timeline is broken at the position of the morphological change, which is interpreted in the same way as the formal appearance of a building.
Introducing this criterion significantly improved our results, especially in the case of urban centres. Indeed, in modern cities, a large number of new buildings are built just after a previous building has been destroyed, due to the lack of space left for new constructions.
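The following sketch combines the presence timeline with the surface criterion; the relative-change threshold is an assumed placeholder, not the value used in the proof-of-concept:

```python
def deduce_gap(years, present, surface, rel_threshold=0.3):
    """years: sorted list of map years; present: dict year -> bool;
    surface: dict year -> footprint pixel count (0 when absent).
    Returns the (older_map, newer_map) gap in which the construction of
    the building currently standing is deduced, or a one-sided result."""
    gap = None
    for older, newer in zip(years, years[1:]):
        if not present[older] and present[newer]:
            gap = (older, newer)  # formal appearance of a building
        elif present[older] and present[newer]:
            # morphological break: the footprint surface jumps,
            # interpreted like a new appearance
            change = abs(surface[newer] - surface[older]) / max(surface[older], 1)
            if change > rel_threshold:
                gap = (older, newer)
    if gap is not None:
        return gap
    # present on all maps: the construction predates the oldest map
    return (None, years[0]) if present[years[0]] else (years[-1], None)
```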
"},{"location":"PROJ-REGBL/#results","title":"Results","text":"The developed proof-of-concept is applied on the four selected areas to deduce construction year for each building appearing in the RBD. With the defined metric, it is possible to assess the result in a reliable manner. Nevertheless, assessing the results with clear representations is not straightforward. In this research project, two representations were chosen:
Histogram of the success rate: For this representation, the buildings of the metric are assigned to temporal bins of ten years, and the success rate of the construction year deduction is computed for each bin.
Distance and pseudo-distance distribution: As the previous representation only gives access to a binary view of the results, a distance representation is added to understand to what extent mistakes are made in the deduction of a construction year. For buildings detected between two maps, the temporal middle of the interval is taken as the guessed construction year, allowing to compute a formal distance to the reference. In case a building is detected before or beyond the map range, a pseudo-distance of zero is assigned if the result is correct according to the reference. Otherwise, the deduced year (which is necessarily between two maps) is compared to the extremal map date to obtain an error pseudo-distance. A sketch of this computation is given after this list.
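As an illustration, a sketch of the distance computation; the out-of-range cases follow one plausible reading of the description above and should be treated as an assumption:

```python
def year_distance(gap, reference_year):
    """gap: (older_map_year, newer_map_year) for a building detected
    between two maps; the temporal middle is the guessed year."""
    return (gap[0] + gap[1]) / 2 - reference_year

def pseudo_distance(side, reference_year, oldest_map, latest_map):
    """side: 'before' or 'beyond' the map range. Zero when the reference
    agrees with the detection; otherwise the distance between the
    reference year and the extremal map date (assumed reading)."""
    if side == "before":
        return 0.0 if reference_year < oldest_map else reference_year - oldest_map
    return 0.0 if reference_year > latest_map else reference_year - latest_map
```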
In addition to the manually defined metric, the full RBD metric is also considered. As the construction years provided in the RBD have to be considered with care, part of them being incorrect, comparing the results obtained with the full RBD metric and with the manually defined one opens the important question of the synchronisation between the maps and the RBD, viewed from the construction-year perspective.
"},{"location":"PROJ-REGBL/#results-basel-area","title":"Results: Basel Area","text":"The following figures give the Basel area result using the histogram representation. The left plot uses the full RBD metric while the right one uses the manually validated one:
Histogram of the success rate - Ten-year bins. One obvious element is that the results provided by the full RBD metric (left) and the manually validated metric (right) are different. This is a clear sign that the RBD and the maps are de-synchronised on a large fraction of the Basel building set. The other element that can be seen on the right plot is that the deduction of the construction year is more challenging where maps are available. Indeed, on the temporal range covered by the maps (vertical white lines), the results drop from the overall level to 50-60% in some of the histogram bins.
The following figures show the distance and pseudo-distance distribution of the error made on the deduced construction year according to the chosen metric:
Distance (red and blue) and pseudo-distance (red) of the error on the construction years. The same differences as previously observed between the two metrics can also be seen here. Another important observation is that the distribution seems mostly symmetrical. This indicates that no clear deduction bias can be observed in the results provided by the proof-of-concept.
"},{"location":"PROJ-REGBL/#results-bern-area","title":"Results: Bern Area","text":"The following figures give the histogram view of the results obtained on the Bern area:
Histogram of the success rate - Ten-year bins. One can observe that the results are similar to those of Basel whilst being a bit better. In addition, one can clearly see that the difference between the full RBD metric and the manually validated metric is huge here. This is probably a sign that the RBD is largely incorrect in the case of Bern.
The following figures show the distance distributions for the case of Bern:
Distance (red and blue) and pseudo-distance (red) of the error on the construction years. Again, the distribution of the error on the deduced construction year is symmetrical in the case of Bern.
"},{"location":"PROJ-REGBL/#results-biasca-area","title":"Results: Biasca Area","text":"The following figures give the histogram view of the success rate for the case of Biasca:
Histogram of the success rate - Ten-year bins. In this case, the results are much better according to the manually validated metric. This can be explained by the fact that Biasca is a rural/mountainous area in which the growth of the urban zones is much simpler: buildings, once built, tend to remain unchanged, limiting the difficulty of deducing a reliable construction year.
The following figures show the distance distribution for Biasca:
Distance (red and blue) and pseudo-distance (red) of the error on the construction years. This confirms the results seen on the histogram figure and shows that the results are very good in such areas.
"},{"location":"PROJ-REGBL/#results-caslano-area","title":"Results: Caslano Area","text":"Finally, the following figures show the histogram view of the success rate of the proof-of-concept on the case of Caslano:
Histogram of the success rate - Ten-year bins. The same considerations as for the Biasca case apply. The results are very good, as part of the Caslano area can be considered rural, or at least peri-urban. They are slightly worse than in the Biasca case, confirming the picture that urban centres are more difficult to infer than rural areas.
The following figures show the error distribution for Caslano:
Distance (red and blue) and pseudo-distance (red) of the error on the construction years"},{"location":"PROJ-REGBL/#results-synthesis","title":"Results: Synthesis","text":"In order to synthesise the previous results, which were a bit dense due to the consideration of two representations and two metrics, the following summary is given:
Basel: 78.0% success rate and 80.4% of buildings correctly placed within \u00b15.5 years
Bern: 84.4% success rate and 85.0% of buildings correctly placed within \u00b15.6 years
Biasca: 93.5% success rate and 93.9% of buildings correctly placed within \u00b16.0 years
Caslano: 90.8% success rate and 91.2% of buildings correctly placed within \u00b16.2 years
These results only consider the manually validated metric for all four areas. By weighting each area by its number of buildings, one can deduce the following numbers:
These last numbers can be considered as a reasonable extrapolation of the proof-of-concept performance over the whole of Switzerland.
"},{"location":"PROJ-REGBL/#conclusion","title":"Conclusion","text":"As a main conclusion to the national maps approach, one can consider the results as good. It was possible to develop a proof-of-concept and to apply it on selected and representative areas of Switzerland.
In this approach, it turns out that developing the proof-of-concept was the easy part; finding a metric and demonstrating its representativeness and reliability was much more complicated. Indeed, as the two datasets cannot be considered as fully reliable in the first place, a strategy had to be defined in order to demonstrate that the chosen metric was able to assess our results in the way expected by the Statistical Office.
In addition, the metric only required one additional hypothesis on top of the two datasets. This hypothesis, consisting in assuming that the synchronous sub-set is a quasi-sub-set of the absolutely correct construction years, can be considered reasonable. Nevertheless, it is important to emphasise that it was necessary to make it, leading us to remain critical and careful whilst reading the results given by our metric.
The proof-of-concept was developed in C++, leading to efficient code able to be used for processing the whole of Switzerland without requiring deep modifications.
"},{"location":"PROJ-REGBL/#research-approach-statistical","title":"Research Approach: Statistical","text":"As the availability of the topographic/national maps does not reach the integrity of all building's year of construction in the registry, an add-on was developed to infer this information, whenever there was this need for extrapolation. Usually, the maps availability reaches the 1950s, whilst in some cities the minimum year of construction can be in the order of the 12th century, e.g. The core of this statistical model is based on the Concentric Zones Model (Park and Burgess, 1925)[6] extended to the idea of the growth of the city from the a centre (Central Business District - CBD) to all inner areas. The concept behind this statistical approach can be seen below using the example of a crop of Basel city:
Illustration of the Burgess concentric zone model. The limits of this model are well known and thoroughly discussed in other famous urban models such as those of Hoyt (1939)[7] and Harris and Ullman (1945)[8]. In general, these criticisms refer to the simplicity of the model, which is acknowledged and compensated for in this application, especially by the fact that the main prediction targets are older buildings, which are assumed to follow the concentric zones pattern better than newer ones (Duncan et al., 1962)[9]. This is commonly the pattern seen in many cities: older buildings were built in these circular patterns up to some point in time, after which reconstructions and renovations are almost randomly placed in spatial and temporal terms. Moreover, processes like gentrification are shown to be dispersed and quite recent (R\u00e9rat et al., 2010)[10].
In summary, a first predictor is built on the basis that the data present a spatial dependence, as in many geostatistical models (Kanevski and Maignan, 2004[11]; Diggle and Ribeiro, 2007[12]; Montero and Mateu, 2015[13]). This way, we assume that closer buildings are more related than distant ones (Tobler, 1970[14]) in terms of year of construction; ergo, the time dimension is interpolated based on the principles of spatial models. We are here also demonstrating how these two dimensions interact. After that, concentric zones are embedded through the use of quantiles, whose values are used in a probabilistic unsupervised learning technique. Finally, the predicted years are computed from the generated clusters.
"},{"location":"PROJ-REGBL/#metric_1","title":"Metric","text":"Similar to the detection situation, generating a validation dataset was an especially challenging task. First of all, the dates in the RBD database could not be trusted in their integrity and the topographic maps used did not reach this time frame. In order to ascertain the construction year in the database, aerial images from swisstopo (Swiss Federal Office of Topography) were consulted and this way buildings were manually selected to compound a validation dataset.
References extraction from aerial images through manual analysis. One of the problems related to this approach is the gap existing between the surveys producing the images, which makes it impossible to state the construction date with precision. These gaps between surveys were approximately in the range of 5 years, although in some areas of Basel they reached 20 years. An example of this methodology to create a trustworthy validation set can be seen above: on the left-hand side, one can see the year of the first image survey (top) and the year registered in the RBD (bottom); on the right-hand side, one can see the year of the next image survey with the same layout.
"},{"location":"PROJ-REGBL/#methodology_1","title":"Methodology","text":"First of all, a prior searching radius is defined as half of the largest distance (between random variables). For every prediction location, the variance between all points in the prior searching radius will be used to create a posterior searching radius. This way, the higher the variance, the smaller the searching radius, as we tend to trust data less. This is mainly based on the principle of spatial dependence used in many geostatistical interpolators. The exception to this rule is for variances that are higher than 2 x the mean distance between points. In this case, the searching radius increases again in order to avoid clusters of very old houses that during tests caused underestimation. The figure below demonstrates the logic being the creation of searching radii.
Searching radii computation process. Here, d is the distance between points, \u03bc the mean and s\u00b2 the variance of the random variable values within the prior searching radius.
It is important to mention that, in the case of a very large amount of missing data, if the searching radius does not find enough information, the posterior mean will be the same as the prior mean, possibly causing over- or underestimation in those areas.
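The exact functional form of the radii is not given here, so the following sketch only mirrors the stated logic (the radius shrinks as the variance grows, and grows again when the variance exceeds twice the mean distance); the scaling is an assumption:

```python
import numpy as np

def posterior_radius(prior_radius, values, mean_distance):
    """values: construction years of the points found within the prior
    searching radius around the prediction location."""
    if len(values) == 0:
        return prior_radius  # not enough information: keep the prior
    s2 = np.var(values)
    if s2 > 2 * mean_distance:
        # exception stated above: re-grow the radius to dilute
        # clusters of very old houses
        return prior_radius
    # assumed scaling: the higher the variance, the smaller the radius
    return prior_radius / (1.0 + s2 / max(mean_distance, 1e-9))
```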
This first procedure is used to fill the gaps in the input database so that clustering can be computed. The next step is then splitting the data into 10 quantiles, which conveys the idea of concentric growth zones, inspired by the Burgess Model (1925)[6]. Every point in the database then assumes the value of its quantile. It is also possible to skip this step and pass directly to clustering, which can be useful in two situations: if a more general purpose is intended, or if the concentric zones pattern is not observed in the study area. By default, this step is used and is followed by an unsupervised learning technique: a Gaussian mixture model, which not only segments the data into clusters, but also indicates the probability of each point belonging to every cluster. The number of components is computed as a linear function of the total number of points being used, including the ones that previously had gaps. The function to find the number of components is the following:
where np is the number of components/clusters and nc the total number of points used. The number of clusters would usually be very large compared to a standard clustering exercise; to avoid this, the value is divided by ten, but the number of clusters is never smaller than five. An example of the clustering performed by the embedded Gaussian mixture model can be seen below:
Example of the clustering process on the Basel area. Hence, the matrix of probabilities of every point belonging to each cluster (\u03bb, which can be considered a matrix of weights) is multiplied by the mean of each cluster (the 1 x nc matrix mc), forming the A matrix:
or, written element-wise: \\[\\begin{align} \\ A_{ij} = \\lambda_{ij} \\, m_{c,j} \\ \\end{align}\\]Finally, the predictions are made by taking the sum of each row of the A matrix: \\[\\begin{align} \\ \\hat{y}_i = \\sum_{j} A_{ij} \\ \\end{align}\\]
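A minimal sketch of the clustering and prediction steps with scikit-learn; the number-of-components rule (nc divided by ten, never below five) is reconstructed from the prose and is an assumption, as is the use of the positions and years together as clustering features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def predict_years(coords, years):
    """coords: (n, 2) positions; years: (n,) gap-filled construction
    years (or their quantile values). Returns one predicted year per
    point as the row sums of the A matrix, i.e. the probability-
    weighted mean of the cluster means."""
    n_components = max(5, len(years) // 10)  # assumed reconstruction
    features = np.column_stack([coords, years])
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(features)
    lam = gmm.predict_proba(features)  # lambda: (n, n_components)
    m_c = gmm.means_[:, -1]            # mean year of each cluster
    return (lam * m_c).sum(axis=1)     # row sums of A = lambda * m_c
```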
It is important to state that the same crops (study areas) were used for this test, although Caslano was not used in this case, as it possesses too few houses with a construction date preceding the oldest available map. Using the metric explained above, a hold-out cross-validation was performed: a group of points was used only for validation and not for training. After that, the RMSE (Root Mean Squared Error) was calculated using the difference between the date in the RBD database and the predicted one. This RMSE was also extrapolated to the whole of Switzerland, to give a notion of what the overall error could be, using the following equation for the expected error:
\\[\\begin{align} \\ E = {\\sum_{r} n_r E_r \\over \\sum_{r} n_r} \\ \\end{align}\\]where E is the error and n the number of buildings in each region.
In addition to the RMSE, the 95th percentile was computed for every study area, as well as for all of them combined. This allows discussing the spread and predictability of the errors.
"},{"location":"PROJ-REGBL/#results_1","title":"Results","text":"The first case analysed was Basel, where the final RMSE was 9.78 years. The density plot below demonstrates the distribution of errors in Basel, considering the difference between the year of construction in the RBD database and the predicted one.
Distribution of the error on the construction year extrapolation. Among the evaluated cases, Basel presented a strongly visible spatial dependence, and it was also the case with the largest estimated proportion of houses with construction years older than the oldest map (1955): 11336, or approximately 66% of the buildings. Based on the validation dataset only, there was an overall trend of underestimation, and the 95th percentile reached 20 years, showing an error distribution that is neither very spread out nor flat.
Bern, the second case evaluated, proved to be atypical. This starts from the fact that a big portion of the dates seemed incongruent with reality, based on the aerial images observed and as seen in the previous detection approach. Moreover, almost 80% of the buildings in Bern had missing data regarding the year of construction. This is especially complicated, as the statistical method presented here is in essence an interpolator (intYEARpolator). Basically, as in any inference problem, known data are used to fill unknown data; therefore, a reasonable split between known and unknown inputs is expected, as well as considerable confidence in the data. On the other hand, an estimated 1079 buildings (approximately 27% of the buildings) were probably older than the oldest map available for the Bern crop (1954). Therefore, reliability was lower in this case, but the number of prediction points was smaller too. The following figure displays the density of errors in Bern, where an RMSE of 20.64 years was computed.
Distribution of the error on the construction year extrapolation. There was an overall trend of overestimation, though the spread of the errors remained limited, especially considering the 95th percentile of 42 years.
Finally, the Biasca crop was evaluated. The computed RMSE was 13.13 years, which is closer to the Basel case, and the 95th percentile was 17 years, making it the least spread error distribution. In Biasca, an estimated 1007 buildings (32%) were older than the oldest map, which is not much more than the proportion in Bern; but the oldest topographic map used for Biasca was from 1970, making it an especially interesting case. The density plot below demonstrates the concentrated error distribution of Biasca:
Distribution of the error on the construction year extrapolation. Once the RMSE was computed for the three regions, it was extrapolated to the whole of Switzerland by taking into consideration the size of each dataset:
Extrapolation of the error distribution over the whole of Switzerland. The expected extrapolated error was 15.6 years and the 95th percentile was 31 years.
"},{"location":"PROJ-REGBL/#conclusion_1","title":"Conclusion","text":"This add-on allows extrapolating the predictions to beyond the range of the topographical maps. Its predictions are limited, but the accuracy reached can be considered reasonable, once there is a considerable lack of information in this prediction range. Nor the dates in the RBD, nor the topographic maps can be fully trusted, ergo 15.6 years of error for the older buildings is acceptable, especially by considering the relative lack of spread in errors distribution. If a suggestion for improvement were to be given, a method for smoothing the intYEARpolator predictions could be interesting. This would possibly shift the distribution of the error into closer to a gaussian with mean zero. The dangerous found when searching for such an approach is that the year of construction of buildings does not seem to present a smooth surface, despite the spatial dependence. Hence, if this were to be considered, a balance between smoothing and variability would need to found.
We also demonstrated a completely different perspective on how the spatial and temporal dimensions can be joined, as the random variable predicted through a spatial methodology was actually time. Therefore, a strong demonstration of the importance of time in spatially related models and approaches was also given. The code for the intYEARpolator was developed in Python and runs smoothly even with quite large amounts of data. The only case where it can be quite time-demanding is when the proportion of prediction points (missing values) is high. It should also be applicable to the whole of Switzerland with no need for modification. An optional element is the use of concentric zones, which can be excluded in case of a totally different urban growth pattern.
"},{"location":"PROJ-REGBL/#reproduction-resources","title":"Reproduction Resources","text":"The source code of the proof-of-concept for national maps can be found here :
The README provides all the information needed to compile and use the proof-of-concept. The presented results and plots can be computed using the following tool suite:
with again the README giving the instructions.
The proof-of-concept source code for the statistical approach can be found here:
with its README giving the procedure to follow.
The data needed to reproduce the national maps approach are not publicly available. For the national maps, a temporal series of the 1:25'000 maps of the same location is needed. They can be requested from swisstopo:
With the maps, you can follow the instructions for cutting and preparing them in the proof-of-concept README.
The RBD data, used for both approaches, are not publicly available either. You can query them using the request form on the website of the Federal Statistical Office:
Both proof-of-concept READMEs provide the required information to use these data.
"},{"location":"PROJ-REGBL/#references","title":"References","text":"[1] Federal Statistical Office
[2] Federal Register of Buildings and Dwellings
[3] Federal Office of Topography
[4] National Maps (1:25'000)
[5] Conway, J. (1970), The game of life. Scientific American, vol. 223, no 4, p. 4.
[6] Park, R. E.; Burgess, E. W. (1925). \"The Growth of the City: An Introduction to a Research Project\". The City (PDF). University of Chicago Press. pp. 47\u201362. ISBN 9780226148199.
[7] Hoyt, H. (1939), The structure and growth of residential neighborhoods in American cities (Washington, DC).
[8] Harris, C. D., and Ullman, E. L. (1945), \u2018The Nature of Cities\u2019, Annals of the American Academy of Political and Social Science, 242/Nov.: 7\u201317.
[9] Duncan, B., Sabagh, G., & Van Arsdol, M. D. (1962). Patterns of City Growth. American Journal of Sociology, 67(4), 418\u2013429. doi:10.1086/223165
[10] R\u00e9rat, P., S\u00f6derstr\u00f6m, O., Piguet, E., & Besson, R. (2010). From urban wastelands to new\u2010build gentrification: The case of Swiss cities. Population, Space and Place, 16(5), 429-442.
[11] Kanevski, M., & Maignan, M. (2004).\u00a0Analysis and modelling of spatial environmental data\u00a0(Vol. 6501). EPFL press.
[12] Diggle, P. J., & Ribeiro Jr., P. J. (2007). Model-based Geostatistics. Springer Series in Statistics.
[13] Montero, J. M., & Mateu, J.\u00a0(2015). Spatial and spatio-temporal geostatistical modeling and kriging\u00a0(Vol. 998). John Wiley & Sons.
[14] Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic geography, 46(sup1), 234-240.
"},{"location":"PROJ-ROADSURF/","title":"Classification of road surfaces","text":"Gwena\u00eblle Salamin (swisstopo), Cl\u00e9mence Herny (Exolabs), Roxane Pott (swisstopo), Alessandro Cerioni (\u00c9tat de Gen\u00e8ve)
Proposed by the Federal Office of Topography swisstopo - PROJ-ROADSURF August 2022 to March 2023 - Published on August 28, 2023 All scripts are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-roadsurf
Abstract: The Swiss road network extends over 83\u2019274 km. Information about the type of road surface is useful not only for the Swiss Federal Roads Office and engineering companies, but also for cyclists and hikers. Currently, data creation and updating are entirely done manually at the Swiss Federal Office of Topography. This is a time-consuming and methodical task, potentially suitable for automation by data science methods. The goal of this project is to classify Swiss roads according to their surface type, natural or artificial. We first searched for statistical differences between these two classes, in order to then perform supervised classification based on machine-learning methods. As we could not find any discriminant feature, we used deep learning methods. In terms of balanced F1 score, we obtained a global score of 0.74 over the training, validation and test areas, and 0.56 over the inference-only area.
"},{"location":"PROJ-ROADSURF/#1-introduction","title":"1. Introduction","text":"The Swiss road network extends over 83'274 km 1. Not only cyclists and hikers can be interested in knowing whether a given road section is covered by a natural or an artificial surface, but also the Swiss Federal Roads Office, which is in charge of road maintenance, and engineering companies. This information is found within the swissTLM3D 2 dataset, the large-scale topographic model of Switzerland produced by the Federal Office of Topography (swisstopo). Keeping the swissTLM3D dataset up to date is a time-consuming work that has to be done methodically. Operators draw and georeference new elements and fill in their attributes based on the visual interpretation of stereoscopic aerial images. The update of existing elements also follows this manual approach. Data science can help by autmatizing this time-consuming and systematic tasks.
So far, the majority of data science studies on the identification of the road surface type, in particular those based on artificial intelligence, have been conducted in the context of improving the driving and security of autonomous vehicles 3456. These works rely on images shot by cameras mounted at the front of the moving vehicle itself. To our knowledge, only one study, carried out by Mansourmoghaddam et al. (2022) 7, proposed a method based on object-based classification from aerial imagery, which could successfully tell artificial roads from natural ones. Another possible approach is to use spectral indices, as done by Zhao & Zhu (2022) 8 working on the distinction between artificial surfaces and bare land. However, their method is not specifically designed for road surfaces.
The goal of this project was to determine whether the road cover is artificial or natural with the development of data science tools. For this first test, only the roads of the class \"3m Strasse\" are considered.
Figure 1: Overview of the workflow for this project. As the location of the roads was known, we faced a problem of supervised classification. Two approaches were tested to address it: machine learning (ML) and deep learning (DL). Both approaches used the same input data: aerial images and vector road locations.
"},{"location":"PROJ-ROADSURF/#2-data","title":"2. Data","text":"As input data, this project used two datasets produced by the Federal Office of Topography: swissTLM3D and SWISSIMAGE RS. We worked with data for the year 2018, for which the images and ground truth, i.e. the manually vectorized and classified roads, are available for the area of interest (AOI). Coordinates are expressed in the EPSG:2056 reference system.
"},{"location":"PROJ-ROADSURF/#21-area-of-interest","title":"2.1. Area of interest","text":"Figure 2: Delimitation of the area of interest with the tile numbers of the 1:25'000 Swiss national map.The area of interest (AOI) defined for this study was represented by the tiles 1168, 1188, 1208 and 1228 of the Swiss national map at a scale of 1:25'000. This zone covers an area of 840 km2 and was chosen because of its representativeness of the Swiss territory.
"},{"location":"PROJ-ROADSURF/#22-swisstlm3d","title":"2.2. swissTLM3D","text":"The swissTLM3D 2 dataset is a large-scale topographic model of Switzerland. It contains geospatial data necessary o the national map, such as roads, buildings and land cover. Periodical updates rely on the manual work of specialized operators. They interpret stereoscopic images and fill in attributes with the help of some additional information, like cadastral surveys and terrestrial images. The specification of aerial imagery is similar to the SWISSIMAGE RS product. The road layer contains lines with the identifier, the structure (none, bridge, tunnel, etc.), the object type (highways, 8m roads, 1 m paths, etc.) and the surface type as attributes. The two possible classes of the surface type are defined in the metadata: artificial (German: Hart) and natural (Natur). The artificial class contains surfaces of hard artificial materials like asphalt, concrete or slabs. The natural class contains roads with a surface of natural materials like gravel or dirt, and untreated surfaces.
In this project, it was decided to test the classification for the type \"3m Strasse\" (3 m roads). This class encompasses roads that are between 2.81 m and 4.20 m wide. Within this subset, 6486 roads have an artificial surface and 289 a natural one. The dataset is heavily unbalanced toward the artificial roads.
In addition, the swissTLM3D dataset was used to identify forests. Indeed, they prevent us from observing roads on aerial images; hence, the roads they cover cannot be used in our study. As no layer in the swissTLM3D is specifically devoted to forested areas, these were deduced from the land cover classes. A filter was applied to keep only forests (\"Wald\") and open forests (\"Wald offen\"), as sketched below.
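A minimal sketch of this filter with GeoPandas; the file path and the attribute name holding the land cover class are assumptions:

```python
import geopandas as gpd

# land cover layer of the swissTLM3D; path and attribute are assumptions
land_cover = gpd.read_file("swisstlm3d_land_cover.gpkg")
forests = land_cover[land_cover["OBJEKTART"].isin(["Wald", "Wald offen"])]
```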
Over the AOI, all the roads in quarries have a natural surface. We used our own layer from the project on the detection of mineral extraction sites to know their location. However, it is possible to use the information on the area of use from the swissTLM3D dataset, which has a class for gravel quarries and one for stone quarries.
"},{"location":"PROJ-ROADSURF/#23-swissimage-rs","title":"2.3. SWISSIMAGE RS","text":"The product SWISSIMAGE RS 9 contains aerial images of Switzerland composed by four bands: near-infrared (NIR), red (R), green (G) and blue (B). The ground resolution equals 0.10 m over the area of interest, except in some high altitude regions or regions with complex topography, where a resolution of 0.25 m is deemed sufficient. The standard deviation is +/- 0.15 m (1 sigma) for a ground resolution of 0.10 m and +/- 0.25 m (1 sigma) for a ground resolution of 0.25 m, +/- 3-5 m (1 sigma). The dataset is composed of a collection of 16-bit encoded GeoTIFF orthorectified images. The overlap between images varies, but stays always present.
"},{"location":"PROJ-ROADSURF/#3-preprocessing","title":"3. Preprocessing","text":"Both the swissTLM3D and SWISSIMAGE RS dataset were processed to be suitable for the algorithms we wanted to develop. This was achieved with two procedures: the generation of the road domain and the creation of a raster mosaic.
"},{"location":"PROJ-ROADSURF/#31-generation-of-the-road-domain","title":"3.1. Generation of the Road Domain","text":"The swissTLM3D contains a vector layer representing every road section as a 3D line with some attached attributes. As a first test, the beneficiaries requested us to perform the analysis only on roads of the type \"3m Strasse\", i.e the roads wider than 2.81 m and thinner than 4.20 m. The engineered structures were excluded based on the attribute \"KUNSTBAUTE\". Only bridges and road sections without structures were kept. Data preparation differs slightly between the two performed analyses, machine and deep learning. Results for both approaches are shown here below.
Figure 3: Resulting labels (left) from the initial TLM lines (right) in the case of machine learning. Figure 4: Resulting labels (left) from the initial TLM lines (right) in the case of deep learning. For the machine learning analysis, only the 3m roads were kept (figure 3). For the deep learning analysis, we judged it safer to keep all the visible roads (figure 4); therefore, the neighboring roads were also considered. We made the hypothesis that we would obtain better results by training the model on all the visible roads rather than on the 3m ones only. Still, the focus on the \"3m Strasse\" class was enforced through the selection of raster tiles: only the tiles containing this specific class were used as input data. Road geometries, originally linear, were transformed into polygons by adding a buffer with a flat cap style. This procedure generated unwanted overlapping areas in the neighborhood of the intersection points between contiguous road sections. Such artifacts were handled differently depending on the road types:
Once the polygons were generated, sections hidden by the forest canopy were excluded. A buffer of 2 m was also added around forests, as the canopy was often seen to extend beyond the forest delimitation recorded in the swissTLM3D dataset. A sketch of the polygon generation is given below.
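A minimal sketch of the polygon generation with Shapely and GeoPandas (Shapely >= 2.0 for the string cap style); the 2 m forest buffer follows the text, while the road half-width and the file name are assumptions:

```python
import geopandas as gpd
from shapely.geometry import LineString

# hypothetical road section; 1.5 m is half the nominal "3m Strasse" width
line = LineString([(2600000, 1200000), (2600050, 1200030)])
road_polygon = line.buffer(1.5, cap_style="flat")  # flat cap: no round ends

# exclude the parts hidden by the canopy, enlarging forests by 2 m
forests = gpd.read_file("forests.gpkg")  # assumed file
canopy = forests.geometry.buffer(2).unary_union
visible_part = road_polygon.difference(canopy)
```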
We considered adding information about the altitude and the length of the roads to the labels. Natural and artificial roads share pretty much the same distribution in terms of altitude. Regarding length, the longest roads all had an artificial surface; however, the experts could not tell us whether this holds for the whole of Switzerland or is a coincidence in our AOI. For the deep learning analysis, we tried to improve the overlap between labels and images by taking cadastral data into account. A larger buffer was used on the TLM lines; then, only the parts of the buffer intersecting the road surfaces from cadastral surveying were kept. As described in the deep learning analysis section, we tested both the labels straight out of the TLM and the ones augmented by the cadastral surveying. We also tried to merge the labels by width type or by surface type.
After the pre-processing step described here above,
Let us recall that many more roads were labeled in the second case, as we considered all the visible roads. Especially for natural roads, the vast majority did not belong to the class of interest, but rather to the \"1m Weg\" and \"2m Weg\" classes.
"},{"location":"PROJ-ROADSURF/#32-raster-mosaic-generation","title":"3.2. Raster Mosaic Generation","text":"As said in the description of SWISSIMAGE RS, a large overlap between images is present in the dataset. To remove this overlap, a mosaic was created. Instead of merging all the images into one, we decided to set up a XYZ raster tile service, allowing us to work at different resolutions. The first step consists in reprojecting images in the EPSG:3857 projection, compliant with standard tile map services. Then, to save memory and disk space, images were converted from 16 to 8 bits. Besides, normalization was performed to optimize the usage of the available dynamic range. Finally, images were exported to the Cloud-Optimized GeoTIFF (COG) format. COG files can then be loaded by the TiTiler application, an Open Source dynamic tile server application 10. The MosaicJSON specification was used to store image metadata 11. Zoom levels were bound between 17 and 20, corresponding to resolutions between 1.20 m and 0.15 m.
"},{"location":"PROJ-ROADSURF/#4-machine-learning-analysis","title":"4. Machine Learning Analysis","text":""},{"location":"PROJ-ROADSURF/#41-methodology","title":"4.1. Methodology","text":"Before delving into machine learning, we performed some exploratory data analysis, aiming at checking whether already existing features were discriminant enough to tell natural roads from artificial ones. Additional predictive features were also generated, based on
The machine learning analysis was performed only on the two middle tiles of the AOI.
The most promising spectral index we found in the literature is the Artificial Surface Index (ASI) defined by Zhao & Zhu (2022) 12. Unfortunately, the computation of the ASI requires the shortwave infrared (SWIR) band, which is not available in the SWISSIMAGE RS data. The SWIR band is available in satellite imagery (e.g. Landsat 8, Sentinel 2), yet its spatial resolution (20-30 m/px) is not sufficient for the problem at hand.
Instead, the VgNIR-BI index 13 could be computed in our case, since it combines the green and NIR bands:
\\[\\begin{align} \\ \\mbox{VgNIR-BI} = {\\rho_{green} - \\rho_{NIR} \\over \\rho_{green} + \\rho_{NIR}} \\ \\end{align}\\]where \u03c1 stands for the atmospherically corrected surface reflectance values of the band. In our case, no atmospheric correction was applied, because we dealt with aerial imagery instead of satellite imagery.
Boxplots were generated to visualize the distribution of the aforementioned predictive features. A principal component analysis (PCA) was performed, too. The groups of values passed to the PCA were the following:
- pixel values: each pixel displays 11 attributes, corresponding to (1) its values on the different bands (R, G, B, NIR), (2) the ratios between bands (G/R, B/R, NIR/R, G/B, G/NIR, B/NIR), and (3) the VgNIR-BI spectral index 13.
- summary statistics: each road section has 5 attributes per band: the mean, the median, the minimum (min), the maximum (max), and the standard deviation (std).
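A hedged scikit-learn sketch of the PCA step, assuming the pixel attributes are gathered in an (n_pixels, 11) array:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pixel_values = np.random.rand(1000, 11)  # placeholder for the real attribute array
X = StandardScaler().fit_transform(pixel_values)  # center and scale each attribute
pca = PCA(n_components=2)
components = pca.fit_transform(X)  # 2D coordinates such as those shown in figures 8-9
print(pca.explained_variance_ratio_)
```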
Let us note that:
In order not to make the presentation too cumbersome, here we only show results produced at zoom level 18, on the entire dataset, and considering road sections corresponding to the following criteria:
We can see in figure 5 that both the median and the upper quartile are systematically higher for natural roads than for artificial ones across all the bands, meaning that natural roads have brighter parts. Unfortunately, pixel value statistics do not allow a sharp distinction between the two classes, as the lower quartiles are very close.
Figure 6: Boxplots of the pixel distribution on the VgNIR-BI index and the ratios between bands. Each graph represents a ratio or the index and each boxplot a surface type. Figure 6bis: Boxplots of the pixel distribution on the ratios between bands. Each graph represents a ratio and each boxplot a surface type. The ratios between bands and the VgNIR-BI present similar values for artificial and natural roads, allowing no distinction between the classes.
Figure 7: Boxplots of the distribution of the road summary statistics on the blue band. Each graph represents a statistic and each boxplot a type of road surface. Boxplots produced with the summary statistics computed per band and per road section lead to similar conclusions. Natural roads tend to be lighter than artificial ones. However, the difference is not strong enough to affect the lower quartiles and allow a sharp distinction between classes.
Figure 8: PCA of the pixels based on their values on each band. Figure 9: PCA of the roads based on their statistics on the blue band. Figures 8 and 9 present the results of the PCA on the pixel values and on the statistics over road sections, respectively. Once more, we have to acknowledge that, unfortunately, artificial and natural roads cannot be separated.
"},{"location":"PROJ-ROADSURF/#43-discussion","title":"4.3. Discussion","text":"Although boxplots reveal that some natural roads can be brighter than artificial roads, statistical indicators overlap in such a way that no sharp distinction between the two classes can be drawn. The PCA confirms such an unfortunate finding.
Those results are not surprising. As a matter of fact, the natural roads found in the \"3m Strasse\" type are mainly made of gravel or similar materials which, color-wise, make them very similar to artificial roads.
"},{"location":"PROJ-ROADSURF/#5-deep-learning-analysis","title":"5. Deep Learning Analysis","text":""},{"location":"PROJ-ROADSURF/#51-methodology","title":"5.1. Methodology","text":"To perform the detection and classification of roads, the object detector (OD) framework developed by the STDL 14 was used. It is described in details in the dedicated page.
The two central parts of the AOI constitute the training zone, i.e. the zone providing the training, validation and test datasets. The two exterior parts constitute the inference-only zone, i.e. the \"other\" dataset, used to test the trained model on an entirely new zone.
To assess the predictions, a dedicated script, final_metrics.py
, was written instead of using the one provided by the STDL's OD. We took advantage of two facts: 1. Predictions are not exclusive between classes: every road section was detected several times, with predictions of different classes overlapping. 2. The delimitation of the roads is already known.
Therefore, rather than choosing one correct prediction, we aggregated the predictions into an artificial index and a natural index over each label. Those indices were defined as follows:
\\[\\begin{align} \\ \\mbox{index}_{class} = \\frac{\\sum_{i=1}^{n} (A_{\\%,i} \\cdot \\mbox{score}_{class,i})}{\\sum_{i=1}^{n} A_{\\%,i}} \\ \\end{align}\\]where n is the number of predictions belonging to the class, \\(A_{\\%, i}\\) is the percentage of overlapping area between the label and the prediction, \\(\\mbox{score}_{class,i}\\) is its confidence score.
\\[\\begin{align} \\ \\text{final class} = \\begin{cases} \\mbox{artificial} \\quad \\mbox{ if } \\quad \\mbox{index}_{artificial} \\gt \\mbox{index}_{natural}\\\\ \\mbox{natural} \\quad \\mbox{ if } \\quad \\mbox{index}_{artificial} \\lt \\mbox{index}_{natural} \\\\ \\mbox{undetected} \\quad \\text{ if } \\quad \\mbox{index}_{artificial} = 0 \\; \\text{ and } \\; \\mbox{index}_{natural} = 0 \\\\ \\mbox{undetermined} \\quad \\text{ if } \\quad \\mbox{index}_{artificial} = \\mbox{index}_{natural} \\; \\text{ and }\\; \\mbox{index}_{artificial} \\neq 0\\\\ \\end{cases} \\ \\end{align}\\]The largest index indicates the right class as better predictions are supporting it. Once every road has an attributed class, the result was evaluated in terms of recall, precision and balanced F1 score.
\\[\\begin{align} \\ P_{class} = \\frac{TP_{class}}{TP_{class}+FP_{class}} \\text{ and } P = \\frac{P_{natural} + P_{artificial}}{2} \\ \\end{align}\\] \\[\\begin{align} \\ R_{class} = \\frac{TP_{class}}{TP_{class}+FN_{class}} \\text{ and } R = \\frac{R_{natural} + R_{artificial}}{2} \\ \\end{align}\\] \\[\\begin{align} \\ F1\\text{ }score = \\frac{2PR}{P + R} \\ \\end{align}\\]where
Not all predictions are taken into account: they are filtered based on their confidence score. Several score thresholds were tested, evaluating each against the balanced F1 score on the validation dataset.
The current dataset exhibits a very strong class imbalance. Therefore, we decided to use balanced metrics, giving the same weight to both classes. The balanced F1 score was chosen as the criterion to discriminate between the different tested models. As it gives equal weight to both classes, the quality of the classification of natural roads was well taken into consideration. However, we have to keep in mind that this gives the natural class a great importance relative to its small number of individuals.
Because of the imbalance between classes, there is a great risk that the model would be biased toward artificial roads. Therefore, we defined a baseline model (BLM) in which all the roads in the training zone are classified as artificial. Its metrics are the following:
|           | Artificial | Natural | Global |
|-----------|------------|---------|--------|
| Precision | 0.97       | 0       | 0.49   |
| Recall    | 1          | 0       | 0.5    |
| F1 score  | 0.98       | 0       | 0.49   |

Table 1: Metrics for the BLM with all the roads classified as artificial.
A trained model should exceed the global F1 score of 0.49 of the BLM to be considered an improvement.
Finally, we wanted to know whether the artificial and natural indices could constitute confidence scores for their respective classes. Reliability diagrams were plotted to visualize the accuracy of the classification at different levels of these indices.
Figure 10: Listing of the various tests carried out. To achieve the best possible results, several input parameters and files for the model training were tested:
1. We tried to improve the quality of the labels by integrating data from cadastral surveying and by merging the roads based on their cover, on their type, or not at all.
2. We trained the model with images at different zoom levels, from 17 to 20.
3. The influence of different band combinations on the model performance was investigated: true colors (RGB) and false colors (NirRG).
For each test, the best configuration was chosen based on the global balanced F1 score. This method assumes that the best choice for one parameter does not depend on the others.
"},{"location":"PROJ-ROADSURF/#52-results","title":"5.2. Results","text":"When testing different procedures to create the labels, using only the TLM and excluding the data from the cadastral survey gave the best metrics. Besides, cutting the label corresponding to the road sections and not merging them by road type or surface gave better metrics. Increasing the zoom level improved the balanced F1 score. Using the bands RGB and RG with NIR gave very similar results and an equal F1 score. Therefore, the best model is based on labels deduced from the TLM and using the RGB bands at a zoom level 20.
|                          | Artificial | Natural | Global |
|--------------------------|------------|---------|--------|
| Precision                | 0.99       | 0.74    | 0.87   |
| Recall                   | 0.97       | 0.74    | 0.86   |
| F1 score (1)             | 0.98       | 0.74    | 0.86   |
| F1 score for the BLM (2) | 0.98       | 0       | 0.49   |
| Improvement: (1)-(2)     | 0          | 0.74    | 0.32   |

Table 2: Metrics for the best model over the training, validation and test area.
The F1 score for the natural roads and the global one outperformed those of the BLM. The per-class F1 scores were judged satisfactory by the beneficiaries.
|                          | Artificial | Natural | Global |
|--------------------------|------------|---------|--------|
| Precision                | 0.98       | 0.22    | 0.60   |
| Recall                   | 0.95       | 0.26    | 0.61   |
| F1 score (1)             | 0.96       | 0.24    | 0.60   |
| F1 score for the BLM (2) | 0.98       | 0       | 0.49   |
| Improvement: (1)-(2)     | -0.02      | 0.24    | 0.11   |

Table 3: Metrics for the best model over the inference-only area.
Those metrics are worse than the ones obtained over the training area. The global F1 score is still higher than for the BLM. However, the natural F1 score is not high enough.
Figure 11: Absolute and relative repartition of the roads in the inference-only zone. In this zone, 93.2% of the roads are correctly classified, 4.2% are in the wrong class and 2.6% are undetected or undetermined. Nearly half of the natural roads are either undetected or in the wrong class but, as they represent a tiny proportion of the dataset, they have little impact on the accuracy. In the training zone, only 2% of the roads are in the wrong class and 1.7% are undetected or undetermined.
Figure 12: Reliability curves for the training and the inference-only zone. The artificial index can be used as a confidence score for the artificial roads, and the natural index as a confidence score for the natural ones. Indeed, the accuracy of the results for each class increases with the value of the corresponding index.
"},{"location":"PROJ-ROADSURF/#53-discussion","title":"5.3. Discussion","text":"The F1 score obtained is 0.86 over the area to train and validate the model and 0.60 over the rest of the AOI. The difference is essentially due to the decrease in the F1 score of the natural roads, passing from 0.74 to 0.24. The first intuition is that we were facing a case of overfitting. However, the validation loss was controlled in order to stop the training on time and avoid this problem. Another possibility would be that the two zones differ significantly and that a model trained on one cannot apply on the other. Hence, we also split the tiles randomly between the training and the inference-only zone. The gap between the balanced F1 score of the training and inference-only zone passed from 0.25 to 0.19 with the same hyper-parameters.
The high recall for artificial roads indicates that the model detects them properly. Moreover, given the class imbalance, a high artificial recall necessarily entails a high artificial precision. As the roads have a known location, the false positives not due to a class confusion are eliminated from our assessment; only the roads classified in the wrong class can affect precision. As there are few natural roads, even if they were all wrongly classified as artificial, as in the BLM, the precision would still remain at 0.97. In the current case, the precision of the trained model is 0.01 higher than that of the BLM. The drop in the natural F1 score is due to the roads predicted in the wrong class: as there are only a few natural roads, errors of the model affect them more heavily. The proportion of misclassified roads increased by 44% between the training and the inference-only zone, while the proportion of undetermined roads only increased by 1%.
The F1 score could maybe be further improved by focusing more strictly on the 3m roads. Indeed, we considered it safer to teach the algorithm to differentiate only between surfaces, and not between road types, which are defined by width. Therefore, the tiles were selected because they intersected 3m roads, but all the roads on those tiles were then transformed into labels. Because of the rarity of 3m natural roads, most of the natural roads seen by the algorithm are 1m and 2m paths, which often have a grassy surface, whereas the 3m natural roads are made only of gravel or dirt. Over the training zone, 110 natural roads are 3m ones, against 1183 that are 1m and 2m paths. Labeling only the 3m roads might therefore give better results than labeling all the visible roads. We did not tune the hyperparameters used by the deep learning model once we found a satisfying enough combination. In addition, as the algorithm is based on detectron2, not everything can easily be tuned. Using an entirely new framework and tuning the loss weights would allow a better handling of the class imbalance. A new framework could also allow integrating an attention mask to take advantage of the known road locations, as recommended by Eppel (2018)15, and would make it possible to use four-band images integrating the NIR. However, we decided to first try the tools we already had in our team.
There is a bias in the model encouraging it to predict artificial roads. However, it still performs better than the BLM; therefore, this model is suited to its purpose.
"},{"location":"PROJ-ROADSURF/#531-elements-specific-to-the-application-on-the-swisstlm3d-product","title":"5.3.1. Elements specific to the application on the SwissTLM3D product","text":"All these findings seem negative, which is why it is appropriate to recall the significant imbalance between the classes. If we look at the percentages, 93.2% of the dataset is correctly classified over the inference-only zone. This could represent a significant gain of time compared to an operator who would do the classification manually. Indeed, once the model trained, the procedure documented here only needs 20 minutes to classify the roads of the AOI. Besides, the artificial and natural indices allow us to find most of the misclassified roads and limit the time needed for a visual verification. In addition, the information of the road surface type is already available for the whole Switzerland. When using the algorithm to update the swissTLM3D dataset, it would be possible to perform change detection between the previous and new surface type. Then, those changes could be visually verified.
"},{"location":"PROJ-ROADSURF/#6-conclusion","title":"6. Conclusion","text":"Keeping the swissTLM3D dataset up to date is a time consuming and methodical task. This project aimed at finding a method to automatize the determination of the road surface type (artificial vs. natural). We focused on roads belonging to the \"3m Strasse\" class and discovered that statistics stemming from pixel values are not enough discriminating to tell artificial roads from natural ones. Therefore, we decided not to attempt any supervised classification based on machine learning. Instead, deep learning methods are performed. With 93% of the roads classified correctly, this method gave better results in regard to the global F1 score than a baseline model classifying all the roads as artificial. However, the model classifies 4.2% of the roads in the wrong class and has difficulties performing new zones. To ensure the quality of the swissTLM3D product, we advise to first perform a classification with the algorithm, then to check roads with a low class index or a change in surface type compared to the previous version years. It could represent a huge time saver for the operators who currently classify and check a second time all the roads.
Despite our investigations, we could not find the cause of the gap between the metrics for the training and the inference-only zone; further investigation is needed. The next step for this project would be to extend the algorithm to paths of 1 to 2 m wide. The natural roads of 3 m are mostly made of gravel, which strongly resembles asphalt, while natural paths are mostly made of dirt and can grow grass. Therefore, when mixing the two road width classes in one model, the natural roads of 3 m could be too difficult to distinguish from artificial roads and end up neglected.
"},{"location":"PROJ-ROADSURF/#7-references","title":"7. References","text":"Office f\u00e9d\u00e9ral de la statistique. Longueur des routes en 2020 | Office f\u00e9d\u00e9ral de la statistique. https://www.bfs.admin.ch/news/fr/2020-0273, November 2020.\u00a0\u21a9
swisstopo. swissTLM3D. https://www.swisstopo.admin.ch/de/geodata/landscape/tlm3d.html.\u00a0\u21a9\u21a9
Lushan Cheng, Xu Zhang, and Jie Shen. Road surface condition classification using deep learning. Journal of Visual Communication and Image Representation, 64:102638, October 2019. doi:10.1016/j.jvcir.2019.102638.\u00a0\u21a9
Susi Marianingsih, Fitri Utaminingrum, and Fitra Abdurrachman Bachtiar. Road Surface Types Classification Using Combination of K-Nearest Neighbor and Na\u00efve Bayes Based on GLCM. International Journal of Advanced Software Computer Application, 11(2):15\u201327, 2019.\u00a0\u21a9
Marcus Nolte, Nikita Kister, and Markus Maurer. Assessment of Deep Convolutional Neural Networks for Road Surface Classification. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 381\u2013386. Maui, HI, November 2018. IEEE. doi:10.1109/ITSC.2018.8569396.\u00a0\u21a9
Viktor Slavkovikj, Steven Verstockt, Wesley De Neve, Sofie Van Hoecke, and Rik Van De Walle. Image-Based Road Type Classification. In 2014 22nd International Conference on Pattern Recognition, 2359\u20132364. Stockholm, August 2014. IEEE. doi:10.1109/ICPR.2014.409.\u00a0\u21a9
Mohammad Mansourmoghaddam, Hamid Reza Ghafarian Malamiri, Fahime Arabi Aliabad, Mehdi Fallah Tafti, Mohamadreza Haghani, and Saeed Shojaei. The Separation of the Unpaved Roads and Prioritization of Paving These Roads Using UAV Images. Air, Soil and Water Research, 15:117862212210862, January 2022. doi:10.1177/11786221221086285.\u00a0\u21a9
Hailing Zhou, Hui Kong, Lei Wei, Douglas Creighton, and Saeid Nahavandi. On Detecting Road Regions in a Single UAV Image. IEEE Transactions on Intelligent Transportation Systems, 18(7):1713\u20131722, July 2017. doi:10.1109/TITS.2016.2622280.\u00a0\u21a9
swisstopo. SWISSIMAGE RS. https://www.swisstopo.admin.ch/fr/geodata/images/ortho/swissimage-rs.html.\u00a0\u21a9
TiTiler. https://developmentseed.org/titiler/.\u00a0\u21a9
Vincent Sarago, Sean Harkins, and Drew Bollinger. Developmentseed / mosaicjson-spec. https://github.com/developmentseed/mosaicjson-spec, 2021.\u00a0\u21a9
Yongquan Zhao and Zhe Zhu. ASI: An artificial surface Index for Landsat 8 imagery. International Journal of Applied Earth Observation and Geoinformation, 107:102703, March 2022. doi:10.1016/j.jag.2022.102703.\u00a0\u21a9
Ronald C. Estoque and Yuji Murayama. Classification and change detection of built-up lands from Landsat-7 ETM+ and Landsat-8 OLI/TIRS imageries: A comparative assessment of various spectral indices. Ecological Indicators, 56:205\u2013217, September 2015. doi:10.1016/j.ecolind.2015.03.037.\u00a0\u21a9\u21a9
Swiss Territorial Data Lab. Object detector. February 2023. URL: https://github.com/swiss-territorial-data-lab/object-detector.\u00a0\u21a9
Sagi Eppel. Classifying a specific image region using convolutional nets with an ROI mask as input. December 2018. arXiv:1812.00291.\u00a0\u21a9
Clémence Herny (Exolabs) - Gwenaëlle Salamin (Exolabs) - Alessandro Cerioni (État de Genève) - Roxane Pott (swisstopo)
Proposed by the Canton of Geneva - PROJ-ROOFTOPS March 2023 to January 2024 - Published in May 2024
Abstract: Free roof surfaces offer great potential for the installation of new infrastructure such as solar panels and vegetated rooftops, which are essential for adapting cities to climate change. The arrangement of objects on rooftops can be complex and dynamic. Inventories of existing roof objects are often scarce, incomplete and difficult to update, making it difficult to assess their potential. In this project, in collaboration with the Canton of Geneva, we developed and tested three methods to automatically identify occupied and free surfaces on roofs: (1) classification of roof plane occupancy based on a random forest, (2) segmentation of objects in LiDAR point clouds based on clustering and (3) segmentation of objects in aerial imagery based on deep learning. The results are vector layers containing information about surface occupancy. True orthophotos and LiDAR data acquired over the canton of Geneva in 2019 were used. The methods were developed using a subset of 122 buildings selected to be representative of a diversity of objects and roofs, on which the ground truth objects were manually vectorized. The developed methods achieved satisfactory performance. About 85% of the roof planes were correctly classified. The segmentation methods were able to detect most of the objects, with f1 scores of 0.78 and 0.75 for the LiDAR-based segmentation and the image-based segmentation respectively. The global shape of the occupied surface was more difficult to reproduce, with a median intersection over union of 0.35 and 0.37 respectively. The results of all three methods were considered satisfactory by the experts, with 70% to 95% of the results considered acceptable. Considering the quality of the results and the computational time, only the classification method was selected for an application at the cantonal level.
"},{"location":"PROJ-ROOFTOPS/#1-introduction","title":"1. Introduction","text":"To address the challenges of the climate crisis and the ecological transition, local authorities need to adapt their land use policies. One possible measure is to use the surface available on rooftops to install new infrastructure while minimizing the impact on land use. For instance, solar panels can be installed on rooftops to produce local energy with a minimal impact on the landscape1. Rooftops can also accommodate vegetated areas, promoting biodiversity in cities and mitigating the heat island effect2. Accurate knowledge of available rooftop surface and an inventory of the existing infrastructure, such as solar panels and vegetated rooftops, are required to plan and prioritize future investments. Ignoring rooftop objects could firstly lead to overestimating the potential for new infrastructure, such as the solar potential1, and secondly, slowing down the process of new installations. Unfortunately, information on this topic is often scarce and difficult to keep up to date, especially in big cities, limiting our understanding of the current situation. This can be explained by the number and diversity of roofs and roof objects. In addition, the rooftop landscape is dynamic and requires regular monitoring.
With increasing urbanization and the need for sustainable cities, there is a growing interest in knowing the potential of rooftops. The availability of high-resolution satellite and aerial imagery, as well as LiDAR data, along with the development of advanced numerical methods, has led to a multiplication of studies. The crowdsourcing approach3 makes it possible to vectorize objects on a large scale, but requires a large workforce and can suffer from a lack of homogeneity. Computer vision-based solutions show promising results for segmenting objects of interest. A deterministic approach based on pixel analysis and a 3D building model, developed by Narjabadifam et al. (2022)4, was able to detect suitable areas for installing solar panels, taking into account large roof objects (e.g. ventilation). The watershed method is commonly used for image segmentation. It can detect small objects (e.g. roof windows) in high-resolution images but involves a complex workflow to achieve satisfactory results5. Deep learning (DL) methods are used to train detection models for objects of interest such as solar panels6, vegetated roofs5, superstructures on roofs7 or available roof area89, with variable performance depending on the study and the objects targeted. The main difficulty in training DL models is the availability of a qualitative dataset of labels7, as the production of such a dataset is a time-consuming task. LiDAR data is often used to assess the solar potential of rooftops by segmenting their main planes1011. Continuous improvements in point density make it possible today to retrieve the detailed morphology of the roof, including superstructures (e.g. dormers) and smaller objects, such as chimneys. Therefore, the segmentation of objects protruding from flat roof planes provides valuable information about the area available on rooftops.
In this context, the State of Geneva, through the Cantonal Office for Energy (OCEN) and the Cantonal Office for Agriculture and Nature (OCAN), contacted the STDL to explore possibilities of improving knowledge of rooftops. Both offices have developed methods for producing vector layers for solar panels and vegetated rooftops, respectively, but neither provided a satisfactory level of automation, accuracy, or completeness. Besides, information on other objects present on the rooftops, like air conditioners, pipes or windows, is incomplete. Therefore, both offices expressed the need to further automate the detection of available roof surfaces to assess the potential, define realistic objectives and strategies to achieve them, and prioritize investments. The objective for the STDL was to produce a binary vector layer of the available and occupied surfaces on roofs in the canton of Geneva. In this report, we first describe the data used, including high-resolution aerial imagery, 3D LiDAR point clouds and available vector layers of rooftops. We then present the methods and results of three approaches developed to evaluate available rooftop surface, namely, (1) LiDAR-based classification of roof occupancy, (2) LiDAR-based object segmentation, and (3) image-based object segmentation. Next, we discuss the possibility of combining the results of the different methods to improve the results. Finally, we provide conclusions on the ability of the developed methods to address the problem and on the most appropriate solution.
"},{"location":"PROJ-ROOFTOPS/#2-input-data","title":"2. Input data","text":""},{"location":"PROJ-ROOFTOPS/#21-lidar-point-cloud","title":"2.1 LiDAR point cloud","text":"The LiDAR point cloud was acquired in March 2019 by the State of Geneva. It has a density of 25 pts/m2, an altimetric accuracy of +/- 10 cm and a planimetric accuracy of 20 cm. It is distributed in georeferenced tiles of 500 m each. The point cloud is classified into 11 classes, including a \"building\" class. This class includes the whole building without distinction for the facades, rooftop or roof superstructures. Within the framework of the classification of the roof plane occupancy, the presence of the class \"building\" was evaluated, as explained in Section 4.1.2. To avoid the influence of classification errors, points from all classes were considered in the LiDAR segmentation.
"},{"location":"PROJ-ROOFTOPS/#22-true-orthophotos","title":"2.2 True orthophotos","text":"The RGB aerial imagery was acquired in May 2019 by the State of Geneva with a ground sampling distance of 5 cm. A true orthophoto was derived based on a photomesh. It has a ground sampling distance of 6.8 cm. The product, available on request, is served as RGB GeoTIFF images with a size of 500 m. True orthophotos are more complicated to obtain than orthophotos, and thus rarer. Their use was motivated by the fact that orthorectification aligns the roofs and bases of buildings. As a result, the objects detected on true orthophotos have the true position, allowing us to compare our results with those obtained with LiDAR data.
"},{"location":"PROJ-ROOFTOPS/#23-delimitation-of-the-roofs","title":"2.3 Delimitation of the roofs","text":"Information on building roofs is provided by the roof vector layer produced by the State of Geneva. It includes the main roof planes and some superstructure elements, defined by their area between 1 m2 and 9 m2. Each roof has been assigned the following attributes:
The vector layer is regularly updated to reflect the destruction and construction of buildings. The version used for this project was downloaded in March 2023.
"},{"location":"PROJ-ROOFTOPS/#24-ground-truth","title":"2.4 Ground truth","text":"In the Canton of Geneva, several vector layers exist for roof objects (see the SITG catalog) but are incomplete for the purposes of our project. Consequently, it was decided to produce a precise ground truth (GT) dedicated to the project instead of using existing layers. It consists of a vector layer segmenting all the visible objects on the roofs, the objects partially covering the roofs, such as trees, as well as the delimitation of free surfaces. This work was performed manually on the 2019 true orthophotos. A single GT was produced for both the LiDAR and the true orthophoto datasets as they are aligned and synchronized in time. All vectorized objects were assigned to a class listed in Figure 1.
Figure 1: Number of objects per class of the ground truth for the training and test datasets. The GT is a list of 122 buildings chosen to be representative of the diversity of the building stock (villas, industrial buildings, old town...). Of these, 105 were used to develop and optimize the workflows, i.e. as a training dataset, and 17 were used to check the stability of the metrics, i.e. as a test dataset. The labeled objects of the GT, occupying surfaces on the selected roofs, represent about 50% of the total surface in both the training and test datasets (Table 1).
| Dataset         | Number of buildings | Occupied area (m2) | Free area (m2) |
|-----------------|---------------------|--------------------|----------------|
| Training subset | 25                  | 3,087              | 14,147         |
| Training        | 105                 | 57,303             | 60,526         |
| Test            | 17                  | 6,214              | 7,415          |

Table 1: Occupied and free surface areas for the different ground truth datasets. The training subset is specific to the image segmentation workflow (see Section 6.1.4).
Buildings were classified by occupation, hereinafter referred to as the building type, and roof typology, hereinafter referred to as the roof type, to evaluate the impact of these parameters on the results. The following building types were selected:
The following roof types were selected:
Note that all administrative and industrial roofs have a flat roof and all the pitched roofs are residential. The GT was used to optimize and assess the different workflows. No custom training was done for this project.
"},{"location":"PROJ-ROOFTOPS/#3-evaluation-of-the-results","title":"3. Evaluation of the results","text":""},{"location":"PROJ-ROOFTOPS/#31-metrics","title":"3.1 Metrics","text":"The performances of the developed methods were evaluated by computing the number of GT labels detected, namely, the precision P and the ability of the algorithm to be exhaustive with its detections, namely the recall R. The two were combined to obtain the f1 score. The respective formulas are presented below:
with:
The main challenge in calculating these metrics is related to the count of TP. Indeed, roof objects can have complex shapes, such as pipes, aeration outlets and solar panels, which can be vectorized in many ways, all of them possibly correct (Fig. 2). It can be difficult to reproduce the labels with detections, especially from one algorithm to another. Several detections may well cover one label, just as one detection may well cover several labels and be equally correct for all.
Figure 2: Illustration of different approaches to object vectorization and possible segmentation results. Solar panels can be vectorized as a group (left), as lines (middle) or as individual panels (right). To account for this aspect, a connected-component method was adopted: graphs of overlapping detections and labels were generated, as illustrated in Figure 3. A detection was considered to overlap a label when more than 10% of the detection surface was covered. All the elements of a connected graph were tagged as TP: the detections within the group were merged, and the assigned TP value equals the number of labels within the connected graph. The labels and detections that were not part of any connected graph were tagged as FN and FP respectively.
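A possible sketch of this connected-component matching, assuming shapely polygons as inputs and using networkx; the project's actual implementation may differ.

```python
import networkx as nx
# labels, detections: lists of shapely polygons (hypothetical inputs)

def tag_detections(labels, detections, min_cover=0.1):
    """Count TP, FP and FN with the connected-component rule described above."""
    g = nx.Graph()
    g.add_nodes_from(("lbl", i) for i in range(len(labels)))
    g.add_nodes_from(("det", j) for j in range(len(detections)))
    for j, det in enumerate(detections):
        for i, lbl in enumerate(labels):
            # a detection overlaps a label when >10% of its surface is covered
            if det.intersection(lbl).area > min_cover * det.area:
                g.add_edge(("lbl", i), ("det", j))
    tp = fp = fn = 0
    for comp in nx.connected_components(g):
        n_lbl = sum(1 for kind, _ in comp if kind == "lbl")
        n_det = sum(1 for kind, _ in comp if kind == "det")
        if n_lbl and n_det:
            tp += n_lbl   # merged detections count as many TP as covered labels
        elif n_lbl:
            fn += n_lbl   # labels without any overlapping detection
        else:
            fp += n_det   # detections without any overlapping label
    return tp, fp, fn
```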
Figure 3: Labels (a) and detected obstacles (b) for the EGID 1005001, the corresponding graphs for the numbered elements on the balconies (c) and the resulting merged tagged detections (d). In addition to object detection, the ability to reproduce the shape of the occupied surface was evaluated. The main objective of the project is to recover the delimitation of free and occupied surfaces. Because of the difficulty of pairing detections and labels, and since it is not necessary to know the delimitation of objects inside an occupied surface, we calculated the intersection over union (IoU) of the detections and the labels at the roof scale:
\\[\\begin{align} \\ \\mbox{IoU} = {A_{detections \\cap labels} \\over A_{detections \\cup labels}} \\ \\end{align}\\]with:
The median IoU (mIoU) over all the roofs provides the evaluation metric for the dataset considered.
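A minimal shapely sketch of the roof-scale IoU, assuming lists of detection and label polygons per building:

```python
from shapely.ops import unary_union
# detections, labels: lists of shapely polygons for one roof (EGID)

def roof_iou(detections, labels) -> float:
    """IoU of the merged detections against the merged labels of one roof."""
    det = unary_union(detections)
    lbl = unary_union(labels)
    union = det.union(lbl).area
    return det.intersection(lbl).area / union if union else 0.0
```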
The optimal value for the selected metrics, i.e. f1 score and mIoU, is 1.
"},{"location":"PROJ-ROOFTOPS/#32-hyperparameter-optimization","title":"3.2 Hyperparameter optimization","text":"The algorithms used and developed in this project involve numerous hyperparameters. We adopted the Optuna framework to automate the search for the value of each hyperparameter giving the best results. The optimization was performed for the LiDAR segmentation and the image segmentation workflows. Although the values to be optimized are different, the strategy is similar.
We sought to maximize the f1 score and the mIoU. The search for the best hyperparameter values was performed using the Tree-structured Parzen Estimator12 (TPE) algorithm. At each iteration, the workflow was executed from segmentation to assessment. At the end of the process, the hyperparameter combinations best optimizing the metrics were provided. In addition, the relative importance of precision compared to recall can be tuned by adding one of these metrics to the list of values to be optimized.
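A minimal Optuna sketch of this multi-objective TPE search; the hyperparameter names and the run_workflow() helper are placeholders for the actual segmentation-to-assessment pipeline.

```python
import optuna

def objective(trial: optuna.Trial):
    # hyperparameter names and ranges are illustrative placeholders
    params = {
        "ransac_distance": trial.suggest_float("ransac_distance", 0.01, 0.5),
        "dbscan_eps": trial.suggest_float("dbscan_eps", 0.1, 5.0),
    }
    f1, miou = run_workflow(params)  # assumed helper: segmentation + assessment
    return f1, miou

study = optuna.create_study(
    directions=["maximize", "maximize"],          # f1 score and mIoU
    sampler=optuna.samplers.TPESampler(seed=42),  # Tree-structured Parzen Estimator
)
study.optimize(objective, n_trials=100)
print(study.best_trials)  # Pareto-optimal hyperparameter combinations
```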
The hyperparameters obtained for the whole training dataset are referred to as \"global\". A specific optimization can also be performed for a given building type or roof type to take specific features into account; in this case, the obtained hyperparameters are referred to as \"specialized\".
"},{"location":"PROJ-ROOFTOPS/#33-evaluating-the-relevance-of-the-detections","title":"3.3 Evaluating the relevance of the detections","text":"In addition to the selected metrics, the results were analyzed in terms of object characteristics relevant to the project objective, i.e. providing indications of potential surface available for the installation of new facilities such as solar panels and vegetated rooftops. The experts expect to get an estimate of the free surface available to estimate the potential. Therefore, the occupied and free areas obtained with the different methods were computed and compared to the GT to evaluate the accuracy. In addition, the continuity of the roof surface is an important parameter to consider when installing facilities. It depends on the size of the objects and their position on the roof. A large object or an object located in the middle of a roof can constitute an obstacle. To evaluate the models' ability to detect such objects, the surface area of the object and the position of its centroid relative to the roof edge were computed, and the metrics were analyzed accordingly.
"},{"location":"PROJ-ROOFTOPS/#4-classification-of-roof-plane-occupancy","title":"4. Classification of roof plane occupancy","text":"A first method was developed to identify potentially free and occupied surfaces on rooftops. It consists of using statistics derived from LiDAR data as an indicator of occupancy. We assumed that some LiDAR properties can capture the presence of objects on the target roofs. For instance, changes in intensity could be caused by the LiDAR hitting different objects. In addition, a surface covered with objects is likely to be rougher than a flat, free surface. Zonal statistics on these two parameters, intensity and roughness, were used in addition to the LiDAR classification and roof plane area to classify roof planes into three classes:
First, the intensity values of the LiDAR points classified as building were interpolated with inverse distance weighting and converted to a raster. Second, a DEM was computed from the LiDAR point cloud, from which the roughness was derived and saved as a raster. The Python library WhiteboxTools was used for this processing. The roughness was calculated at a scale of 1 m, the smallest possible scale. The produced intensity and roughness rasters have a resolution of 0.3 m/px.
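The raster generation could be sketched with the WhiteboxTools Python API as follows; the file paths are illustrative, and the ruggedness index is only a stand-in, as the exact roughness tool is not named here.

```python
import whitebox

wbt = whitebox.WhiteboxTools()
# IDW interpolation of the intensity of the LiDAR points
wbt.lidar_idw_interpolation(
    i="tile.las", output="intensity.tif",
    parameter="intensity", resolution=0.3,  # 0.3 m/px, as in the report
)
# DEM from the point cloud, then a roughness measure derived from it;
# the ruggedness index is an assumed stand-in for the roughness metric used
wbt.lidar_tin_gridding(i="tile.las", output="dem.tif", resolution=0.3)
wbt.ruggedness_index(dem="dem.tif", output="roughness.tif")
```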
Zonal statistics of intensity and roughness were computed for each roof plane and used to classify them, first with manual thresholds and then with a random forest (RF), as described in the next two sections. If a roof plane extended over several tiles, the result was kept for the tile with the largest overlap.
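One possible way to compute these zonal statistics, assuming the rasterstats package; the library choice and file names are assumptions, as this step is not attributed to a specific tool.

```python
from rasterstats import zonal_stats

# one dictionary of statistics per roof plane polygon
stats = zonal_stats(
    "roof_planes.geojson",
    "intensity.tif",
    stats=["mean", "median", "std", "min", "max"],
)
```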
The initial processing was performed for all the roofs of the 45 LiDAR tiles containing GT and for eight test tiles selected in the city center, representing a total of 95,699 roof planes. Creating the rasters and computing the zonal statistics for the LiDAR tiles took around 30 minutes, while the classification took less than a minute with 32 GB of RAM and an i7-1260P CPU.
"},{"location":"PROJ-ROOFTOPS/#412-classification-with-manual-thresholds","title":"4.1.2 Classification with manual thresholds","text":"Roof planes smaller than 2 m2 were classified as \"occupied\", because they are too small for solar or vegetated installations. In addition, roof planes for which the LiDAR point cloud was classified as \"building\" for less than 25% of the area were classified as \"undefined\". To classify the remaining roof planes, thresholds were set on the statistical values presented in Table 2. They were selected to reflect the variations in intensity and roughness induced by the presence of objects on the roof, as well as the presence of non-building classes in the LiDAR point cloud.
| Variable                                                        | Threshold |
|-----------------------------------------------------------------|-----------|
| Margin of error of intensity                                    | 400       |
| Standard deviation of intensity                                 | 5500      |
| Median roughness (m)                                            | 7.5       |
| Overlap with interpolated pixels not classified as building (%) | 25        |

Table 2: Variables considered to classify the roof planes and the thresholds at which they are classified as occupied.
A roof plane was classified as occupied if it exceeded the threshold for at least one statistical value. The thresholds were set through trial and error until we reached a satisfying result. The resulting classification was reviewed by the experts for 650 roof planes and a satisfaction rate was calculated (Section 4.2.2). Further tests were performed to improve the classification by adjusting the thresholds, but no better combination could be found.
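A sketch of this decision rule, with the thresholds of Table 2; the dictionary keys and the function signature are illustrative assumptions.

```python
THRESHOLDS = {
    "moe_intensity": 400,       # margin of error of intensity
    "std_intensity": 5500,      # standard deviation of intensity
    "median_roughness": 7.5,    # median roughness (m)
    "pct_non_building": 25,     # overlap with pixels not classified as building (%)
}

def classify_plane(stats: dict, area_m2: float) -> str:
    """Apply the manual-threshold rule to one roof plane (keys mirror THRESHOLDS)."""
    if area_m2 < 2:
        return "occupied"    # too small for solar panels or vegetation
    if stats["pct_non_building"] > 75:
        return "undefined"   # less than 25% of the plane covered by building points
    # "occupied" as soon as one statistic exceeds its threshold
    if any(stats[key] > limit for key, limit in THRESHOLDS.items()):
        return "occupied"
    return "potentially free"
```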
"},{"location":"PROJ-ROOFTOPS/#413-classification-with-random-forest","title":"4.1.3 Classification with random forest","text":"To avoid classification based on arbitrary thresholds, RF was used with zonal statistics (Tables A1 and A2). The manual threshold classification, reviewed by the experts, was used as GT to train two RFs, one for each office. The roof planes of the class \"undefined\" and the ones smaller than 2 m2 were ignored. The number of roof planes used in the training of each RF is presented in Table 3.
| Office | Potentially free | Occupied |
|--------|------------------|----------|
| OCAN   | 258              | 324      |
| OCEN   | 301              | 297      |

Table 3: Correct classifications of the roof planes reviewed by the experts and used as the ground truth for the random forest.
The GT was split with 80% of the roof planes for training and 20% for testing. Satisfaction rates were calculated on the test dataset to evaluate the performance.
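A hedged scikit-learn sketch of this training step, assuming X holds the zonal statistics (14 variables per roof plane) and y the expert-reviewed classes.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X, y: assumed to be prepared beforehand, e.g. as numpy arrays
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))   # accuracy on the 20% test split
print(rf.feature_importances_)    # relative variable importance (cf. Section 4.2.3)
```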
Only roof planes that could be used for potential solar and vegetated installations were classified by the RF: we excluded the roof planes smaller than 2 m2, automatically classified as \"occupied\", and the roof planes classified as \"undefined\".
"},{"location":"PROJ-ROOFTOPS/#42-results","title":"4.2 Results","text":""},{"location":"PROJ-ROOFTOPS/#421-classification","title":"4.2.1 Classification","text":"Examples of roof plane classification obtained with the manual thresholds and the RF models are shown in Figure 4. The results for the OCAN are closer to the results obtained with the manual thresholds than the ones for the OCEN. In addition, let us note that the RF for the OCAN is classifying more roof planes as \"occupied\" than the RF for the OCEN.
Figure 4: Results of the manual thresholds and of the random forests for the classification of occupancy for the OCAN and the OCEN. The visualization of the results shows that not only roofs with obstacles are classified as \"occupied\", but also some small or narrow empty roof planes, because they display a high median and/or minimum roughness. The roof planes classified as \"undefined\" can often be considered as occupied due to the presence of vegetation or walkways; the corresponding areas of the LiDAR point cloud are mostly classified as ground and vegetation.
"},{"location":"PROJ-ROOFTOPS/#422-expert-assessment","title":"4.2.2 Expert assessment","text":"OCEN and OCAN experts are generally satisfied with the classification based on the manual thresholds (Table 4), with global satisfaction rates ranging from 83% to 89%.
| Office | Global | Occupied | Potentially free | Undefined |
|--------|--------|----------|------------------|-----------|
| OCAN   | 89%    | 86%      | 93%              | 66%       |
| OCEN   | 83%    | 84%      | 81%              | -         |

Table 4: Satisfaction rates of the OCAN and OCEN experts with the classification of 650 roof planes using manual thresholds. Global satisfaction rates were computed only for planes classified as \"occupied\" and \"potentially free\". The review of planes classified as \"undefined\" is not available for OCEN.
Satisfaction rates for the \"occupied\" roof planes are similar for both offices, while the rate for \"potentially free\" roof planes is 12 points higher for OCAN, reaching an excellent score of 93%. The OCEN expert considers small roof planes as occupied more readily than the OCAN expert. The OCAN expert approved the \"undefined\" class in 66% of the cases, while this class was not reviewed by the OCEN expert.
Manual threshold classification

| Office | Global | Occupied | Potentially free |
|--------|--------|----------|------------------|
| OCAN   | 79%    | 70%      | 91%              |
| OCEN   | 77%    | 72%      | 82%              |

RF classification

| Office | Global | Occupied | Potentially free |
|--------|--------|----------|------------------|
| OCAN   | 86%    | 78%      | 96%              |
| OCEN   | 83%    | 74%      | 91%              |

Table 5: Satisfaction rates of the OCAN and OCEN experts on the test dataset for the two classification methods.
The satisfaction rates obtained with the manual thresholds and the RF on the test dataset are presented in Table 5. The classification with RF outperforms the manual thresholds, with global satisfaction rates increasing by 7 and 6 points for OCAN and OCEN respectively. Note that the per-class satisfaction rates improve by 2 to 9 points for the \"occupied\" and \"potentially free\" classes.
"},{"location":"PROJ-ROOFTOPS/#423-variable-importance","title":"4.2.3 Variable importance","text":"The influence of the variables considered in the two RF models can be identified by their relative importance (Tables A1 and A2 in Appendix A) provided by the algorithm.
The models are consistent: four variables, namely the margin of error (MOE) of intensity, the median roughness, the mean roughness and the minimum roughness, are common to the top 5 most influential variables of both models, with an importance higher than 7%. Note, however, that the rankings differ. In particular, the median roughness shows the greatest divergence, with a difference of 11 points between the two models: it plays the most important role in the OCAN's RF (19.3%), while its role is limited in the OCEN's model (8.3%). The difference in importance for the other variables does not exceed 3.2 points. The roof plane area plays a non-negligible role in the OCEN's RF (13.6%), while this is less the case in the OCAN's RF (5.8%). The standard deviation of intensity has a greater influence in the OCAN's RF (7.8%) than in the OCEN's RF (4.6%). The percentage of overlap with non-building pixels, at less than 2%, is the least important parameter for both RFs.
"},{"location":"PROJ-ROOFTOPS/#43-discussion","title":"4.3 Discussion","text":""},{"location":"PROJ-ROOFTOPS/#431-manual-thresholds-vs-rf","title":"4.3.1 Manual thresholds vs RF","text":"Although both the manual threshold and the RF methods give satisfactory results (Tables 3 and 4), classification with RF is better. This result was expected, as RF is a machine learning algorithm based on 14 variables, whereas the threshold method involves manual adjustments of only 4 variables. The choice of a small number of variables for the manual thresholds was made for simplicity sake. Our choices of selecting the MOE of intensity and the median roughness were pertinent as these variables are among the most influential (Tables A1 and A2). The standard deviation of intensity plays a stronger role for the OCEN's model but its significance remains limited (7.8%). Selecting the percentage of overlap with non-building data appears to not be relevant as this variables comes last in the list of relative importance (< 2%) for both RF models. On the other hand, we missed important variables in the manual thresholds such as the minimum roughness and the mean roughness of roof planes playing a significant role (> 10%) in the RF models. The mean roughness was absent of the manual thresholds as it was considered redundant with the median roughness.
Both methods have their advantages. The manual threshold method is easy to set up and does not require GT, while the RF method is automated, i.e. it does not require an operator to perform manual testing, which can be tedious.
"},{"location":"PROJ-ROOFTOPS/#432-classification-of-small-roof-planes","title":"4.3.2 Classification of small roof planes","text":"When using the manual thresholds, small or narrow roof planes are often classified as \"occupied\", because of their median roughness above the threshold. As the roughness was calculated at a scale of 1 m (Section 4.1.1), objects located up to 1 m away from the pixel will affect its value. As a result, the roughness of small or narrow roof planes is more influenced by their surroundings than the larger ones. This is the case, for example, with empty roof planes receding or protruding from other planes. This interpretation is supported by the fact that the minimum roughness of roof planes is a critical parameter in the RF (Tables A1 and A2). For the considered roughness scale of 1 m, the minimum value strongly depends on the dimensions of the roof plane. A large roof plane can have a low minimum value, because the obstacles on it and its surroundings do not affect the roughness values over the whole plane as can be the case for a small one.
Small unobstructed roof planes could be used for the installation of solar panels or vegetation. However, the more receding or protruding they are, the more difficult it is to install facilities on them. In addition, due to the limited benefits they would represent in comparison to the effort necessary to develop them, they are not a priority in the planning strategy of the Canton of Geneva. Therefore, the fact that the algorithm often classifies small roof planes as occupied suited the experts.
"},{"location":"PROJ-ROOFTOPS/#433-differences-between-random-forests","title":"4.3.3 Differences between random forests","text":"The differences in the results obtained for OCAN and OCEN can be explained by their different requirements (Tables 3, A1 and A2).
From the OCAN's point of view, which aims to develop vegetated rooftops, some surfaces already covered with low vegetation can be considered as \"occupied\". Conversely, the presence of some obstacles on a roof plane may not prevent the installation of a vegetated rooftop, so the plane can be considered as \"potentially free\". This tolerance of sparse objects on roof planes could be captured by the median roughness driving the OCAN's RF classification. From the OCEN's point of view, which aims to install solar panels, a large continuous area is required for a roof plane to be considered as \"potentially free\". This is consistent with the fact that the surface area of roof planes and the minimum roughness are critical parameters in the OCEN's RF.
"},{"location":"PROJ-ROOFTOPS/#434-relevance-of-the-methods","title":"4.3.4 Relevance of the methods","text":"The primary goal of classifying roof planes is to provide a product that assists experts in identifying available surfaces for the installation of future equipment. The surfaces classified as \"potentially free\" need to be examined to assess their actual potential. The surfaces classified as \"occupied\" are assumed to be unusable and should not be taken into account when estimating potential. It is therefore important to obtain robust results for this class. The experts did not specify a minimum satisfaction rate, but were satisfied with the provided results. Thus, it is planned to apply the developed method on a larger scale for use by the experts. The generated vector layers can be used alone or combined with the results of other methods, such as the those presented in Sections 5 and 6.
It should be recognized that the classification only evaluates the occupancy of a roof plane; other factors, such as roof slope or roof material, were ignored. In addition, although the LiDAR intensity was normalized, its value can vary from one acquisition campaign to another, potentially affecting the classification results.
"},{"location":"PROJ-ROOFTOPS/#5-lidar-segmentation","title":"5. LiDAR segmentation","text":"The goal of this second method based on LiDAR point cloud is to detect objects on rooftops. It is assumed that each roof plane can be approximated by a flat plane and that obstacles protrude from it. The processing resulted in the production of a vector layer of occupied and free surfaces per building.
"},{"location":"PROJ-ROOFTOPS/#51-method","title":"5.1 Method","text":"The roof plane vectors were merged by EGID to obtain the roof delimitation for each building. Next, the point cloud was clipped according to the roof shape using WhiteboxTools. If the building extended over several LiDAR tiles, the clipped point clouds were merged. Finally, the point clouds were filtered with the minimum altitude of the roof to retain only the roof points.
Roof segmentation was performed per building using Open3D. Each plane in the 3D point cloud was segmented using the RANSAC algorithm. The DBSCAN algorithm was then applied to the points of the candidate plane to mitigate noise; the cluster with the largest number of points was retained and considered a roof plane. This process was repeated on the rest of the point cloud for the expected number of roof planes, given by the roof vector layer, as long as enough points remained. Finally, the remaining points were clustered using DBSCAN and considered as obstacles. Despite our endeavors to fix the seed and make the process deterministic, slight variations remained in the output of the RANSAC algorithm; however, the observed impact was only of a few hundredths on the final metrics.
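An Open3D sketch of this iterative segmentation; the distance and clustering thresholds are illustrative, since the actual values were optimized (Section 3.2).

```python
import numpy as np
import open3d as o3d

def segment_roof(pcd: o3d.geometry.PointCloud, n_planes: int):
    """Iterative RANSAC plane fitting with DBSCAN noise removal, as described above."""
    planes, rest = [], pcd
    for _ in range(n_planes):
        if len(rest.points) < 100:  # stop when too few points remain (assumed bound)
            break
        _, inliers = rest.segment_plane(
            distance_threshold=0.1, ransac_n=3, num_iterations=1000
        )
        candidate = rest.select_by_index(inliers)
        labels = np.asarray(candidate.cluster_dbscan(eps=0.5, min_points=10))
        if labels.max() < 0:  # only noise found in the candidate plane
            break
        # keep the largest DBSCAN cluster as the roof plane
        largest = np.bincount(labels[labels >= 0]).argmax()
        keep = [inliers[k] for k in np.where(labels == largest)[0]]
        planes.append(rest.select_by_index(keep))
        rest = rest.select_by_index(keep, invert=True)
    # the remaining points are clustered into obstacles
    obstacle_labels = np.asarray(rest.cluster_dbscan(eps=0.5, min_points=10))
    return planes, rest, obstacle_labels
```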
The planes and obstacles were transformed from point clusters into concave polygons using the alpha shape algorithm. A minimum threshold was set on the projected area of the planes, and a maximum threshold on that of the obstacles; if a polygon did not meet the threshold of its category, its category was changed.
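As an illustration, the alphashape package offers one implementation of the alpha shape step; the alpha value and the input variables are assumptions.

```python
import alphashape

# cluster_points: assumed (n, 3) array of one point cluster; drop z for a 2D footprint
points_2d = [(x, y) for x, y, _ in cluster_points]
polygon = alphashape.alphashape(points_2d, alpha=1.0)  # alpha value assumed
print(polygon.area)  # compared against the min/max area thresholds described above
```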
The results were evaluated using the metrics described in Section 3.1. Note that the GT was adapted for the optimization of this method. Indeed, LiDAR segmentation is unable to detect low objects such as lawns, extensive vegetation and empty terraces and balconies. However, these objects can occupy entire flat roofs, creating a bias in the optimization of the process that would tend to segment entire roof planes as objects. Therefore, the aforementioned objects were excluded from the ground truth when running the optimization. As explained in Section 3.2, the hyperparameters of the RANSAC and the DBSCAN algorithms, as well as the thresholds on area, were optimized for the training dataset and for subsets based on the building type and the roof type. Combinations were tested between results obtained with different sets of hyperparameters, depending on the types.
The resulting detection shapes were unsatisfactory: they were jagged and sometimes overlapped considerably because of the 3D component of the LiDAR data, as visible in Figure 5 (left). To improve the rendering, the polygons were smoothed by buffering and cropping and by applying the Visvalingam-Wyatt algorithm. Although the aspect of the polygons was improved by these simplifications, they still look jagged (Fig. 5, right). Then, the overlapping detection polygons were merged for each EGID and compared to the roof extent to create a partition of the occupied and free surfaces on the roofs. This post-processing was performed after the optimization.
Figure 5: Original (left) and simplified (right) polygons obtained with LiDAR segmentation.Finally, the results were submitted to the OCAN and OCEN's experts for assessment.
"},{"location":"PROJ-ROOFTOPS/#52-results","title":"5.2 Results","text":"Figure 6: Example of LiDAR segmentation results.Figure 6 shows the results for seven buildings of the GT.
"},{"location":"PROJ-ROOFTOPS/#521-effect-of-the-optimized-hyperparameters","title":"5.2.1 Effect of the optimized hyperparameters","text":"A f1 score of 0.70 and a mIoU of 0.34 were obtained on the adapted GT with the global hyperparameters. The specialized hyperparameters have different influence according to the building and roof type considered (Table 6). Administrative buildings and pitched roofs show lower f1 scores than those of other categories with values around 0.70. The mIoU is lower than 0.5 for all subsets.
Global hyperparameters

| Metric   | Administrative | Industrial | Residential | Flat | Mixed | Pitched |
|----------|----------------|------------|-------------|------|-------|---------|
| f1 score | 0.41           | 0.70       | 0.72        | 0.73 | 0.71  | 0.59    |
| mIoU     | 0.14           | 0.50       | 0.30        | 0.42 | 0.45  | 0.13    |

Specialized hyperparameters

| Metric   | Administrative | Industrial | Residential | Flat | Mixed | Pitched |
|----------|----------------|------------|-------------|------|-------|---------|
| f1 score | 0.63           | 0.72       | 0.72        | 0.67 | 0.68  | 0.49    |
| mIoU     | 0.11           | 0.49       | 0.38        | 0.44 | 0.23  | 0.31    |

Table 6: Metrics obtained with the global and specialized hyperparameters on the subsets for each type of building and roof with the adapted GT.
The f1 score obtained for administrative buildings using specialized hyperparameters is improved by about 50%, while the mIoU is reduced by about 20%. The impact of specialized hyperparameters on the segmentation of industrial and residential buildings is rather limited, with variations of less than 2.5%, except for the mIoU of residential buildings, which increases by about 25%. Flat roofs do not benefit from a specific optimization, with variations in the f1 score and mIoU of less than 8%. On the contrary, pitched roofs are strongly affected by the use of specialized hyperparameters, with a decrease in the f1 score of about 27% and an increase in the mIoU of 150%. In this specific case, the global hyperparameters favor the segmentation of the entire roof as an obstacle, while the specialized hyperparameters improve the distinction between roof planes and obstacles (Fig. 7). The metrics of mixed roofs are a combination of the two previous types, with an f1 score little affected by the use of specialized hyperparameters (< 5%) and a mIoU 50% lower.
Figure 7: Comparison of results obtained with global (left) and specialized (right) hyperparameters on buildings with pitched roofs. To take advantage of the best segmentation results, combinations were produced (Table 7).
| Metric   | Global | Combined with administrative buildings | Combined with pitched roofs |
|----------|--------|----------------------------------------|-----------------------------|
| f1 score | 0.72   | 0.73                                   | 0.70                        |
| mIoU     | 0.37   | 0.37                                   | 0.41                        |
Table 7: Metrics obtained for the global results and for their combination with specialized results on the training dataset with the adapted GT.
The influence of combining the global results with those obtained using specialized hyperparameters is limited: the metrics vary by less than 4%, except for the mIoU, which improves by 11% when combined with the results optimized for pitched roofs. This improvement in object segmentation is sufficient to justify the use of specialized hyperparameters for pitched roofs. In addition, this is necessary to ensure sufficiently discriminating results, as visible in Figure 7.
After applying the post-processing procedure to the combined results, the final metrics are a f1 score of 0.77 and a mIoU of 0.42. The metrics are improved by the better coverage of the detected objects, thanks to the smoothing of the polygons and the merging of the detections, as visible in Figure 5.
"},{"location":"PROJ-ROOFTOPS/#522-global-results","title":"5.2.2 Global results","text":"Ground truth Precision Recall f1 score mIoU Relative error (%) adapted GT, training set 0.77 0.77 0.77 0.42 11 whole GT, training set 0.78 0.77 0.78 0.35 38 whole GT, test set 0.75 0.80 0.77 0.38 26
Table 8: Metrics and relative error on the occupied area for the training dataset when using the GT adapted for the LiDAR optimization or the whole GT, as well as for the test set on the whole GT.
The f1 score remains stable when using the whole GT (Table 8), meaning that the extensive vegetation, lawns and terraces that were removed from the adapted GT are still detected. However, they are not correctly delineated, resulting in a drop of about 17% in the mIoU and an increase in the relative error on the occupied area by a factor of 3.5. The precision and recall values are always close to each other. The results obtained with the test dataset are consistent with those obtained with the training dataset (Table 8). The observations made in the following sections about the characteristics of the detections in the training dataset are also valid for the test dataset.
"},{"location":"PROJ-ROOFTOPS/#523-detection-characteristics","title":"5.2.3 Detection characteristics","text":"Figure 8: Number of TP, FP and FN as a function of object area.Figure 8 shows that labeled objects with an area greater than 1 m2 are well detected, with f1 scores between 0.82 and 0.92. On the other hand, detection of objects with an area lower than 1 m2 is less trustworthy with a majority of FP detections and almost half of the labels tagged as FN.
Figure 9: Number of TP, FP and FN as a function of the distance of the centroid of the object from the roof edge.
Figure 9 shows that labeled objects whose centroid is more than 1 m from the roof edge are well detected, with a f1 score between 0.80 and 0.85. On the other hand, among the detections whose centroid is less than 1 m from the roof edge, 65% are FP, making them less trustworthy.
Visualizing the detections, we note that FP detections covering no obstacle at all, although they do exist, are rare. Most of the FP detections form groups of small detections delimiting a roof edge, sometimes detecting barriers that have not been vectorized as obstacles in the GT. Therefore, the FP detections with an area smaller than 1 m2 are often the same as those with a centroid closer than 1 m to the roof edge.
Object class Recall
Antenna 0.24
Pipe 0.59
Lawn 0.70
Other obstacle 0.70
Extensive vegetation 0.72
Window 0.76
Chimney 0.79
Aero 0.83
Solar thermal 0.83
Intensive vegetation 0.88
Solar unknown 0.89
Balcony / terrace 0.90
Solar photovoltaic 0.92
Table 9: Recall for each object class of the ground truth.
The developed method shows good performance in detecting most of the object classes (Table 9). Aeration outlets, balconies and terraces, intensive vegetation, and solar facilities all have recalls greater than 0.80. Antennas are the most difficult class to detect, with a recall of 0.24. Next come pipes, lawn, other obstacles and extensive vegetation, with recall values between 0.59 and 0.72. Windows and chimneys, which are low and thin objects respectively, are detected satisfactorily, with recalls of 0.76 and 0.79 respectively.
Although the detection of objects is globally satisfactory, the reproduction of their shape was not assessed. For example, only the upper parts of solar panels are generally detected as shown in Figure 10.
Figure 10: Example of results for the segmentation of solar panels using the segmentation of LiDAR data.
The same goes for lawn and extensive vegetation, which are only ever partially detected (Fig. 11).
Figure 11: Roof with a terrace and an area of extensive vegetation (left). Both are detected as TP, but are not covered by the area detected as occupied (right)."},{"location":"PROJ-ROOFTOPS/#524-estimated-area","title":"5.2.4 Estimated area","text":"Administrative Industrial Residential Flat Mixed Pitched
Area labeled as occupied 4,986 32,720 19,399 54,875 1,386 844
Area detected as occupied 1,195 20,953 12,986 30,980 3,052 1,102
Total area 6,692 78,011 33,278 108,415 5,018 4,584
Table 10: Occupied area for the labels and the detections, as well as the total roof area in m2.
In total, 35,134 m2 of roofs were detected as occupied, while 57,105 m2 were labeled as such (Table 10). This represents an error of 38%. Administrative buildings have the largest error in the estimation of the occupied area, with an error of 76% compared to less than 37% for the other building types. The occupied area is underestimated for flat roofs, while it is overestimated for pitched and mixed roofs. The mixed roofs have an error of 120%, i.e. the estimated occupied area is about twice the actual value. Flat and pitched roofs have errors of 44% and 30% respectively.
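As a sanity check, the global figure follows directly from the totals in Table 10:

\[\frac{57{,}105 - 35{,}134}{57{,}105} \approx 0.38\]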
"},{"location":"PROJ-ROOFTOPS/#525-expert-assessment","title":"5.2.5 Expert assessment","text":"The experts were at least partially satisfied by more than 69% of the segmented roofs (Table 11).
Evaluation OCAN OCEN
Not satisfied 22% 31%
Partially satisfied 54% 33%
Satisfied 24% 36%
Table 11: Experts' satisfaction with the results produced using the segmentation of LiDAR data. OCAN's expert assessed 122 buildings, while OCEN's expert assessed 39 buildings.
The most satisfactory types were the administrative buildings and the flat roofs, while the least satisfactory were the industrial buildings and the mixed roofs. This contradicts the metrics, according to which the administrative buildings have the lowest mIoU.
"},{"location":"PROJ-ROOFTOPS/#53-discussion","title":"5.3 Discussion","text":""},{"location":"PROJ-ROOFTOPS/#531-global-capability-of-the-method","title":"5.3.1 Global capability of the method","text":"The method proved its ability to detect objects with a f1 score of 0.78 on the whole GT. The primary goal, which was to detect in priority large and roof-centered objects, was satisfactorily achieved. Indeed, objects larger than 1 m2 have a f1 score higher than 0.81 and objects with their centroid at more than 1 m from the roof edge have a f1 score higher than 0.79.
However, the mIoU remains lower than 0.50, indicating that the shapes of the detections are poorly reproduced. It should be noted that this metric is sensitive to small variations in shape, making it a very strict metric. In addition, it should be remembered that the mIoU evaluates the delimitation of the occupied surface at the roof scale, including TP, FP and FN detections.
In addition to the low mIoU, the global estimation of the occupied area is only moderately accurate, with an error of 38%. However, this value is reduced to 11% when the lawns, the empty balconies and most of the extensive vegetation are removed from the ground truth. This highlights that the method performs generally well, except for the segmentation of low objects. Indeed, although the aforementioned classes have a recall of 0.70 or higher, their total area is largely underestimated, as visible in Figures 10 and 11. Once those classes are removed from the GT, most of the false detections and missed objects have a small area (Fig. 8).
"},{"location":"PROJ-ROOFTOPS/#532-problematic-detections-and-labels","title":"5.3.2 Problematic detections and labels","text":""},{"location":"PROJ-ROOFTOPS/#5321-false-positive-detections","title":"5.3.2.1 False positive detections","text":"The majority of FP detections cover small areas (Fig. 8) and are located near the roof edges (Fig. 9). In many cases, FP detections near to the edge actually detect barriers that are not labeled in the GT. It may be considered that the protruding roof edges should be included in the annotation to improve the precision and better reflect the actual performance of the method.
"},{"location":"PROJ-ROOFTOPS/#5322-by-object-type","title":"5.3.2.2 By object type","text":"Antennas are often missed (Table 9). We think that LiDAR points due to the presence of an antenna were considered as noise during point clustering, because the object is represented by only a few points due to its morphology and the LiDAR density. Improving antenna detection in the LiDAR point cloud may require specific developments or the use of a denser point cloud. Other thin objects, such as small chimneys, are also missed.
As shown in Section 5.2.3, low objects such as windows, extensive vegetation, lawns, and pipes are more difficult to detect, as they do not protrude above the roof planes. Their recall, between 0.59 and 0.76, is acceptable, but the shape of these objects is only ever partially captured.
Finally, the method also has trouble detecting objects of the \"other obstacle\" class (Table 9). However, the lack of a precise definition for this category makes it difficult to label, and it encompasses several types of objects. In particular, it would be necessary to define with measurable values when a roof part is labeled as a free surface and when as \"other obstacle\".
"},{"location":"PROJ-ROOFTOPS/#5323-by-building-type-and-roof-type","title":"5.3.2.3 By building type and roof type","text":"We notice that administrative buildings tend to have low or small objects, such as windows, extensive vegetation, or small chimneys. In addition, there are many FP detections because of the roof edges. The use of specialized hyperparameters does not improve the metrics on the adapted GT used for the optimization (Tables 6 and 7), supporting the difficulty to detect these objects. At the cantonal level, the error on the estimated surface should have a limited impact, since the administrative buildings represent only a small fraction of all the buildings. However, as they belong to the state, they could be prioritized for the installation of facilities on their roof.
The pitched roofs are often segmented into a single obstacle when using the global hyperparameters (Fig. 7). This could be explained by the fact that they have a different typology than the other roofs. In addition, the training dataset is dominated by flat roofs, with 72 buildings against 29 buildings with pitched roofs. The hyperparameters resulting from the optimization are therefore better suited to the typology of flat roofs, motivating our choice to use specialized hyperparameters for pitched roofs. Pitched roofs can be automatically identified in the Canton of Geneva using the slope available in the roof and building vector layers. If this information is not available, for instance in another city or canton, areas of interest can be defined, as pitched roofs are generally located in residential areas and old towns.
Most of the 21 roofs assigned to the \"mixed\" type have the entirety of their flat or pitched planes segmented as obstacles. Some of them would have benefited from being segmented with the parameters for pitched roofs. The definition of a pitched roof has to be studied further to determine more precisely when to use the specialized hyperparameters.
"},{"location":"PROJ-ROOFTOPS/#533-limitation-and-further-developments","title":"5.3.3 Limitation and further developments","text":"The process relies on a roof vector layer. Methods exist to produce this information automatically1013. Their application should be tested in order to extend the project to areas where a roof vector layer does not yet exist. The one for the Canton of Geneva is produced manually to guarantee its quality.
Variations in detection quality were observed from one building to another. In addition, the detection shapes are not intuitive, making them difficult to interpret and less pleasing to the eye. Therefore, despite the method's respectable results, the experts were not interested in taking the algorithm to the production stage.
The visual aspect of the results could be improved by modifying the vectorization function to smooth the polygons directly during their production. Alternatively, more advanced processing could take advantage of the fact that obstacles have simple geometries, like a cylinder for straight pipes or a parallelepiped for aeration outlets. By trying to match these shapes to the clustered point cloud, more precise and visually pleasing detections could be produced.
"},{"location":"PROJ-ROOFTOPS/#6-image-segmentation","title":"6. Image segmentation","text":"The third method consists of segmenting all the potential objects present in a given image. The processing resulted in the production of a vector layer of occupied surfaces per building.
"},{"location":"PROJ-ROOFTOPS/#61-method","title":"6.1 Method","text":"Figure 12: Illustration of the different steps in the image segmentation workflow for EGID 1005027. Black polygons correspond to the roof delimitation. (a) Bounding box (blue polygon) used to clip the true orthophotos for a given roof with a 1 m positive buffer. (b) Segmentation masks (colored pixels) obtained by processing the tile with SAM. (c) Vector masks (red polygons) of the detected objects after post-processing. (d) Detection tags assigned to the vectors."},{"location":"PROJ-ROOFTOPS/#611-image-preparation","title":"6.1.1 Image preparation","text":"Similar to the LiDAR segmentation workflow, we adopted a per-roof processing strategy to process the true orthophotos. For each selected building, the roof delimitation was used to derive a bounding box from which the true orthophoto was clipped (Fig. 12(a)). In case the roof was spread over several true orthophotos, the images were first merged. One tile was obtained for each roof considered. The tiles have the same pixel resolution as the true orthophotos, but different sizes according to the roof size.
"},{"location":"PROJ-ROOFTOPS/#612-object-segmentation-and-vectorization","title":"6.1.2 Object segmentation and vectorization","text":"First, potential objects visible in images were segmented using Segment Anything Model14 (SAM) implemented with PyTorch. It aims to be an open-source foundation model for object segmentation in images with strong zero-shot generalization capabilities. Instance segmentation is performed using a vision transformer (ViT-H) model and a mask is produced for each detected object (Fig. 12(b)). For the project, the default pre-trained model (checkpoints: sam_vit_h_4b8939
) was used without any fine-tuning specific to roof objects. Although the object classes are available in the GT dataset, no classification was performed. Second, SAM does not handle georeferenced datasets. To simplify the process of leveraging SAM for geospatial data analysis, we used the Python library segment-geospatial15 (samgeo). Georeferenced tiles are used as input to the algorithm. The coordinate reference system of the image is assigned to the SAM masks and to their corresponding polygon vectors (Fig. 12(c)).
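In practice, the segmentation and vectorization steps amount to a few samgeo calls, as in the hedged sketch below; the hyperparameter values shown are SAM's defaults used as placeholders, the optimized values being kept in the workflow's configuration file (see Section 6.1.4).

```python
# Sketch of the segmentation of one georeferenced tile with samgeo.
from samgeo import SamGeo

sam = SamGeo(
    model_type="vit_h",
    checkpoint="sam_vit_h_4b8939.pth",
    sam_kwargs={
        "points_per_side": 64,           # trade-off discussed in Table 12
        "pred_iou_thresh": 0.88,         # prediction threshold
        "stability_score_thresh": 0.95,  # threshold on the stability score
        "stability_score_offset": 1.0,   # stability score offset
        "box_nms_thresh": 0.7,           # box IoU cutoff of the NMS
    },
)
# The CRS of the input tile is propagated to the output masks...
sam.generate("tiles/1005027.tif", output="masks/1005027.tif")
# ...and to the polygon vectors derived from them.
sam.tiff_to_vector("masks/1005027.tif", "vectors/1005027.gpkg")
```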
Some large buildings, up to 300 m in length, may be encountered. In this case, the number of pixels in the tile can saturate the RAM during image segmentation. To handle this issue, large tiles are split into smaller sub-tiles of 512 px. The downside of this method is boundary effects. Sub-tiles are processed individually by SAM. The output masks are then merged to recover the original tile extent, but the joints between the sub-tile masks may not match, causing artifacts in the vector layer (Fig. 13).
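The sub-tiling can be sketched with windowed reads as below; the sub-tile size is the 512 px mentioned above, the path is hypothetical, and the writing of the sub-tiles and merging of the masks are only hinted at.

```python
# Hedged sketch of the splitting of a large tile into 512 px sub-tiles.
import rasterio
from rasterio.windows import Window

SIZE = 512  # sub-tile edge in pixels

with rasterio.open("tiles/large_roof.tif") as src:  # hypothetical path
    for row in range(0, src.height, SIZE):
        for col in range(0, src.width, SIZE):
            window = Window(col, row,
                            min(SIZE, src.width - col),
                            min(SIZE, src.height - row))
            data = src.read(window=window)
            transform = src.window_transform(window)
            # Each sub-tile is written out and processed by SAM separately;
            # the masks are then merged back to the original extent, which
            # can create the seam artifacts of Figure 13.
```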
Figure 13: The squared orange polygon is an artifact due to the tiling performed to process large tiles (EGID 1011376). The tagged detections are superimposed on (left) the segmented masks (white: detection, black: background) and (right) the true orthophotos. Grey and black polygons correspond to the building delineation."},{"location":"PROJ-ROOFTOPS/#613-result-filtering","title":"6.1.3 Result filtering","text":"To improve the quality of the results, post-processing tasks were performed. Polygons were discarded based on geometric considerations:
The vector layer of detected objects was clipped with the roof delimitation polygon to ensure that objects did not overlap several roofs (Fig. 12(c)).
Each building was processed independently. The vector layers were finally merged into a single layer.
"},{"location":"PROJ-ROOFTOPS/#614-assessment-and-hyperparameter-optimization","title":"6.1.4 Assessment and hyperparameter optimization","text":"The detections were compared to the GT labels (Fig. 12(d)) and metrics were calculated (Section 3.1) to evaluate the performance of the algorithm and the choice of post-processing parameters.
SAM exposes numerous hyperparameters, whose values were assigned after running the optimization workflow presented in Section 3.2. Depending on some hyperparameter values and on the image size, processing a single image can take several minutes. Since the optimization requires tens of iterations, it was unreasonable to run the process on the entire training dataset. Therefore, we chose to sub-sample the training dataset down to 25 roofs (Table 1), selected to be representative of the entire dataset. Running the optimization process on this subset for 50 iterations took between 1 and 2 days on a 16 GiB GPU machine. Based on several replications of the optimization process, including one performed with 100 trials, four of the most influential hyperparameters14 were identified: (1) the threshold on the stability score of the predicted mask, (2) the stability score offset, (3) the box IoU cutoff used by non-maximal suppression to filter duplicated masks and (4) the prediction threshold. The other SAM hyperparameters have a limited impact over the value ranges explored. However, we noticed that the number of points sampled per side strongly influences the processing duration (Table 12). Using 64 points per side is a good trade-off between performance and computation time, which guided our final choice to set this value.
Points per side f1 score mIoU Duration (min)
128 0.75 0.40 43
96 0.75 0.44 25
64 0.74 0.41 12
32 0.66 0.40 4
Table 12: Influence of the number of points sampled per image side on the f1 score, mIoU and duration of segmentation using SAM on a 16 GiB GPU machine. Results obtained on the training sub-sampled dataset with the optimized hyperparameter values set in the configuration file and only varying the points per side.
The selected hyperparameter values can be found in the configuration file of the image segmentation workflow.
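For reference, the optimization loop could be reproduced with a TPE sampler along the lines of the sketch below; Optuna is used here purely as an illustration, the objective wrapper and the search ranges are assumptions, and only the four influential hyperparameters are tuned.

```python
# Illustrative sketch of the hyperparameter optimization (Section 3.2).
import optuna

def objective(trial):
    params = {
        "pred_iou_thresh": trial.suggest_float("pred_iou_thresh", 0.5, 0.95),
        "stability_score_thresh":
            trial.suggest_float("stability_score_thresh", 0.5, 0.95),
        "stability_score_offset":
            trial.suggest_float("stability_score_offset", 0.5, 2.0),
        "box_nms_thresh": trial.suggest_float("box_nms_thresh", 0.3, 0.9),
    }
    # Hypothetical helper: segments the 25-roof subset and returns metrics.
    f1, miou = run_segmentation_and_assessment(params)
    return f1 + miou  # assumed combined score to maximize

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=50)
print(study.best_params)
```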
"},{"location":"PROJ-ROOFTOPS/#62-results","title":"6.2 Results","text":""},{"location":"PROJ-ROOFTOPS/#621-global","title":"6.2.1 Global","text":"Figure 14: Example of a result obtained with the image segmentation workflow. Free surfaces were obtained by subtracting detected objects from the roof boundary (black polygons).The image segmentation method produced vectors of detected objects for each roof considered (Fig. 14). The metrics obtained for the different datasets are presented in Table 13. They were obtained for a set of hyperparameters that balanced the precision and the recall as much as possible. Alternative result, obtained with a set of hyperparameters promoting the recall over the precision was also produced and evaluated but was not preferred by the experts, in particular due to the presence of large FP detections segmenting whole roofs. Overall, similar metric values are obtained for the different datasets, demonstrating the consistency of the method.
Dataset Precision Recall f1 score mIoU Relative error (%)
Training subset 0.73 0.78 0.75 0.41 7
Training 0.75 0.82 0.78 0.37 42
Test 0.75 0.71 0.73 0.37 23
Table 13: Metrics and relative errors on the occupied area for the training and the test datasets.
Satisfactory f1 scores, between 0.73 and 0.78, are achieved. We note a slight imbalance, from 4 to 7 points, between the precision and the recall depending on the dataset. The values of the mIoU are modest, ranging between 0.37 and 0.41, with a standard deviation of about 0.20 (noted +/- 0.20 hereafter). High mIoU values (>= 0.70) are associated with high f1 scores (0.87 +/- 0.11 on average), but the opposite is not true (Fig. 15). When an object is detected, the method usually shows a good ability to segment it accurately. However, small discrepancies with the GT shapes significantly lower the IoU value.
Figure 15: Examples of detections with high f1 scores but variable IoU. (left) Roof segmentation (EGID 295060134) with both a high f1 score and a high IoU and (right) roof segmentation (EGID 1023590) with a high f1 score and an average IoU.
Finally, the surface area occupied by the detected objects in the training and test datasets, which represents about 30% of the total area, is significantly underestimated compared with the 50% of the GT (Table 1). This results in relative errors of 23% and 42% for the test and training datasets respectively. Note the significant variation in relative errors depending on the dataset considered, in particular the variability between the training subset and the training dataset. This highlights that errors concerning large objects can drastically increase the relative error on area estimations.
"},{"location":"PROJ-ROOFTOPS/#622-roof-characteristics","title":"6.2.2 Roof characteristics","text":"Metric Administrative Industrial Residential Flat Mixed Pitched f1 score 0.81 0.80 0.77 0.79 0.75 0.74 mIoU 0.23 0.33 0.38 0.30 0.39 0.51
Table 14: Metrics calculated by building and roof type for the training dataset.
Table 14 shows that the model detects objects similarly for the different building and roof types, with f1 scores within 5 points of each other. The mIoU depends on the roof characteristics. Industrial and residential buildings have a mIoU of about 0.35, while the value for the administrative buildings is about 35% lower. The mIoU increases from flat, to mixed, to pitched roofs over a range of about 20 points.
Figure 16: Comparison of the occupied and free surface areas of the GT labels and the detections according to (top) the building types and (bottom) the roof types for the training dataset.
The detected occupied areas are underestimated regardless of the building type (Fig. 16, top). Industrial and residential buildings have a relative error of about 40%, while the administrative buildings have a higher error of 67%. The occupied areas of pitched and mixed roofs are accurately estimated, with a relative error of about 10%, while the performance for flat roofs is worse, with an error of 43% (Fig. 16, bottom). Remember that all administrative and industrial buildings have a flat roof.
Considering the similar f1 scores obtained for all the roof properties and the significant amount of time required to run the optimization workflow, no type-specific optimization has been carried out to date.
"},{"location":"PROJ-ROOFTOPS/#623-object-characteristics","title":"6.2.3 Object characteristics","text":""},{"location":"PROJ-ROOFTOPS/#6231-class","title":"6.2.3.1 Class","text":"The image segmentation method detects objects of different classes (Fig. 17) with an average recall of 0.80 +/- 0.11, with the exception of pipes, which performs significantly worse with a recall of 0.27.
Figure 17: Recall for each object class. The results are obtained for the training dataset.
Lawns, PV panels and windows are particularly well detected, with recall values above 0.93.
"},{"location":"PROJ-ROOFTOPS/#6232-surface-area","title":"6.2.3.2 Surface area","text":"Figure 18 shows that objects with a surface area between 0.5 m2 and 100 m2 are detected with equal performances by the algorithm with a recall of 0.84 +/- 0.02. Smaller and larger objects are more difficult to detect, with 65% and 76% of GT objects detected, respectively.
Figure 18: Number of TP and FN labels, as well as FP detections, depending on the object area (m2). The results are obtained on the training dataset.
The proportion of FP detections increases for surface areas of less than 1 m2, leading to an average precision of 0.60 +/- 0.06, while the average precision for larger objects is 0.83 +/- 0.05.
"},{"location":"PROJ-ROOFTOPS/#6233-position-on-the-roof","title":"6.2.3.3 Position on the roof","text":"Objects are well detected, with an average recall of 0.83 +/- 0.04, as long as their centroid is more than 1 m from the roof edge (Fig. 19). For objects closer to the roof edge, the recall is only 0.56.
Figure 19: Number of TP and FN labels, as well as FP detections, depending on the distance of the object centroid to the roof edge (m). The results are obtained for the training dataset.
The precision also decreases significantly for objects located near the roof edge, from an average of 0.77 +/- 0.03 for object centroids more than 1 m away to 0.51 below, due to an increase of FP detections.
"},{"location":"PROJ-ROOFTOPS/#624-expert-assessment","title":"6.2.4 Expert assessment","text":"The experts are at least partially satisfied by over 86% with the image segmentation method (Table 15).
Evaluation OCAN OCEN
Not satisfied 6% 14%
Partially satisfied 40% 49%
Satisfied 54% 37%
Table 15: Experts' satisfaction with the results produced using image segmentation. OCAN's expert assessed 122 buildings, while OCEN's expert assessed 39.
Satisfaction is independent of the building type and the roof type, which is consistent with the f1 score (Table 13). Slightly lower satisfaction is attributed to the administrative buildings and the flat roofs, which is consistent with the fact that these types have the lowest mIoU.
The experts are generally satisfied with the shapes of the detection polygons and the consistency of the results from one building to another.
"},{"location":"PROJ-ROOFTOPS/#63-discussion","title":"6.3 Discussion","text":""},{"location":"PROJ-ROOFTOPS/#631-limits-to-object-segmentation","title":"6.3.1 Limits to object segmentation","text":"Although the workflow provides overall satisfactory results, there are inherent limitations when using the SAM algorithm to detect roof objects:
SAM is a pre-trained model showing good zero-shot generalization performance, but it is not dedicated to detecting objects on roofs. Fine-tuning the model can be considered to improve the performance. It would require additional training with the dataset at our disposal (true orthophotos plus GT annotations).
"},{"location":"PROJ-ROOFTOPS/#632-reproduction-of-object-shape","title":"6.3.2 Reproduction of object shape","text":"We acknowledge that the mIoU has low values on the training and test datasets (Table 13), but note that this metric is strict. It is sensitive to the detection or not of an object as it was computed on all the polygons present on the roof, including TP, FP and FN. Thus, if a large object is not detected or if there is a large FP detection, the mIoU value will be strongly affected. In addition, the IoU metric is also sensitive to discrepancies in polygon shapes with the GT (Fig. 16(b)). While the object delineation may appear satisfactory from visual inspection, the metric may display low value. This aspect is difficult to improve as it dependents on the segmentation model and the GT delineation strategy. Overall, the shape of the object is usually satisfactorily reproduced when detected (Figs. 16 and 21).
The method tends to underestimate the occupied surface area (Fig. 16). Thus, the estimated free surface constitutes an upper limit for the assessment of the potential. The small relative error of 10% obtained on the occupied area for the mixed and pitched roofs can be explained by the fact that they generally correspond to villas with limited roof surfaces and a \"simple\" arrangement of small objects. In comparison, industrial roofs can be large, with complex arrangements of objects such as pipes, solar panels or ventilation systems. Therefore, detection errors on villas usually have less impact on the area estimation than detection errors on large industrial buildings.
"},{"location":"PROJ-ROOFTOPS/#633-relevance-of-the-method","title":"6.3.3 Relevance of the method","text":"The results provide strong arguments in favor of the ability of the image segmentation method to correctly detect and segment objects. The fact that the metrics are consistent between the different datasets (Table 13) is encouraging for its applicability to a wider area with a variety of buildings.
The performance of the method is lower for small objects (Fig. 18) and objects close to the roof edge (Fig. 19). However, the accurate detection of these objects is less critical, as they interfere less with the continuity of the roof for the potential installation of solar panels and vegetated roofs.
The experts were satisfied with the results and interested in putting the method into production. However, the current processing time, about 12 min for 25 buildings, is a hindrance to extending the method to the whole canton of Geneva, which comprises about 80,000 buildings. Parallelizing the algorithm should be considered in order to apply the method to an area of interest.
Finally, true orthophotos were used in this case. They provide the actual position of an object on a roof. However, such a product is rare, because it is more expensive to produce. Thus, the product may not be available or regularly updated. Nevertheless, we are confident that this segmentation method can be applied to standard orthophotos, which are acquired more regularly. In this case, methods for reprojecting the position of roofs and/or vectors will need to be explored.
"},{"location":"PROJ-ROOFTOPS/#7-combination-of-results","title":"7. Combination of results","text":"The developed methods display different strengths and weaknesses. For instance, LiDAR segmentation has difficulty detecting low and thin objects, which image segmentation does not. Conversely, image segmentation has difficulty with color change segmentation and pipe detection, which LiDAR segmentation does not. Therefore, combining the two results could yield interesting outcomes.
"},{"location":"PROJ-ROOFTOPS/#71-method","title":"7.1 Method","text":"Two combinations of results were tested:
The resulting combined vector layers were then assessed with metrics but not by the experts.
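A minimal GeoPandas sketch of the two combinations, with hypothetical layer names:

```python
# Sketch of the two tested combinations of the segmentation results.
import pandas as pd
import geopandas as gpd

lidar = gpd.read_file("lidar_detections.gpkg")   # hypothetical paths
image = gpd.read_file("image_detections.gpkg")

# 1) Concatenation: keep every detection from both methods (favors recall).
concatenated = pd.concat([lidar, image], ignore_index=True)

# 2) Spatial join: keep only the detections confirmed by the other method
#    (favors precision).
joined = gpd.sjoin(lidar, image, how="inner", predicate="intersects")
```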
"},{"location":"PROJ-ROOFTOPS/#72-results","title":"7.2 Results","text":"Combination method Precision Recall f1 score mIoU Relative error (%) Concatenation 0.68 0.94 0.79 0.45 8 Spatial join 0.81 0.69 0.75 0.33 48
Table 16: Metrics obtained for the training dataset.
Comparing Tables 8 and 13 with Table 16, we note that the combinations result in similar f1 scores, around 0.77. However, the precision and recall values are affected differently. The recall increases by more than 10 points with the concatenation method, reaching the excellent value of 0.94. This means that most of the GT objects are detected, including the pipes, whose recall reaches the satisfactory value of 0.68. On the other hand, the proportion of FP detections increases, diminishing the precision by about 8 points. The spatial join discards all the FP detections made by a single method, improving the precision by 3 to 6 points. Non-overlapping TP detections are discarded as well, reducing the recall by more than 8 points.
The concatenation has a positive impact on the mIoU, which is more than 10 points higher than with the spatial join and also higher than with the individual segmentation methods. Finally, the relative error on the occupied surface is significantly reduced, to less than 10%, by concatenating the results, while it increases to about 50% with the spatial join method.
"},{"location":"PROJ-ROOFTOPS/#73-discussion","title":"7.3 Discussion","text":"Combining the results does not improve the f1 score, but allows for modulation of the results, i.e. whether favor precision or recall, depending on the needs (Table 16).
The high recall value obtained with the concatenation proves the complementarity of the two methods for detecting different objects. Note that in this case, the final vector layer contains polygons with different aspects. A higher recall tends to favor the mIoU, since more GT objects are detected, despite the addition of FP detections. The coverage of the detected objects is thus improved, but the added FP detections also contribute to the reduction of the relative error on the occupied surface, which must therefore be analyzed carefully.
Note that the results of the object segmentation can also be combined with the occupancy classification to refine the information on the \"potentially free\" roof planes. Finally, although incomplete, the roof and roof superstructure vector layers produced by the State of Geneva contain vectors of some roof objects that could additionally be used to improve the accuracy of the results.
"},{"location":"PROJ-ROOFTOPS/#8-conclusion","title":"8. Conclusion","text":"Detecting objects on rooftops is a key aspect of assessing the potential for installing facilities in cities, such as solar panels and vegetated rooftops. The STDL explored three methods to achieve this objective, based on machine learning and deep learning algorithms and on LiDAR, aerial imagery and vector data. All methods provided satisfactory results. Occupancy classification enabled roof planes to be classified with 85% accuracy. The two segmentation methods reached similar results, with a f1-score of about 0.77, a mIoU of about 0.36 and a relative error on the detected occupied area of 40%. In particular, segmentation methods have made it possible to accurately detect large objects and objects centered on the roof, which are most likely to constitute obstacles to the installation of facilities.
Overall, the beneficiaries were satisfied with all the methods, with at least 70% of the buildings having satisfactory detections. Despite a performance similar to image segmentation, LiDAR segmentation was considered the least satisfactory due to the appearance of the detection shapes and the varying results between buildings and object classes. Image segmentation gives satisfactory results overall, but at the current stage, the processing time makes it unrealistic to scale the method up to the cantonal level. Further developments are required to reduce the computational cost. Finally, the classification method reconciles accurate results with a fast processing time. Therefore, it was selected for an application at the cantonal level. A vector layer indicating the presumed occupancy of roof planes will be produced, helping the beneficiaries to find and assess areas potentially available for new installations.
Combining the results is an asset to enhance the strengths of the different methods. Combining the segmentation results increases either the precision or the recall, depending on the chosen method, without changing the f1 score. A better recall translates into an enhanced delineation of the occupied area on a roof. Cross-referencing information sources, such as the occupancy classification and published vector layers, can improve the accuracy of the results and help identify areas of interest.
It should be noted that the results are in line with the STDL's objective to automatically detect occupied and free surfaces on roofs. These results from numerical models are indications that need to be verified by an expert as part of an installation project. Our results do not indicate whether a facility can actually be installed. Additional parameters such as roof material, slope, solar potential, protected buildings, etc., which affect the possibility and prioritization of an installation, are not taken into account and are the responsibility of the beneficiaries.
"},{"location":"PROJ-ROOFTOPS/#code-availability","title":"Code availability","text":"The codes are available on the STDL's GitHub repository: proj-rooftops
"},{"location":"PROJ-ROOFTOPS/#acknowledgements","title":"Acknowledgements","text":"This project was made possible thanks to a tight collaboration between the STDL team and beneficiaries from the offices of the Etat de Gen\u00e8ve. In particular, the STDL team acknowledges the key contributions from Basile Grandjean (OCEN), Benjamin Guinaudeau (OCAN), Alisa Freyre (PanData), Mayeul Gaillet (DIT) and Geraldine Chollet (OCAN). We thank PanData for the production of the ground truth. This project has been funded by Strategie Suisse pour la G\u00e9oinformation.
"},{"location":"PROJ-ROOFTOPS/#appendix","title":"Appendix","text":""},{"location":"PROJ-ROOFTOPS/#a-variable-importance-in-the-random-forests","title":"A. Variable importance in the random forests","text":"Variable Importance for OCAN median roughness 19.3 margin of error of intensity 16.8 mean roughness 15.7 minimum roughness 10.6 standard deviation of intensity 7.8 area 5.8 mean intensity 4.5 median intensity 3.9 minimum altitude 3.5 standard deviation of roughness 3.3 maximum intensity 3.0 maximum roughness 2.5 minimum intensity 2.4 % of overlap with non-building data 1.1Table A1: List of the variables considered in the random forest and their importance in the classification for the OCAN.
Variable Importance for OCEN
margin of error of intensity 17.6
minimum roughness 17.4
area 13.5
mean roughness 10.7
median roughness 8.3
standard deviation of roughness 5.5
standard deviation of intensity 4.6
maximum roughness 4.2
maximum intensity 4.1
minimum altitude 3.6
mean intensity 3.1
minimum intensity 3.0
median intensity 2.4
% of overlap with non-building data 2

Table A2: List of the variables considered in the random forest and their importance in the classification for the OCEN.
"},{"location":"PROJ-ROOFTOPS/#references","title":"References","text":"Qing Zhong, Jake R. Nelson, Daoqin Tong, and Tony H. Grubesic. A spatial optimization approach to increase the accuracy of rooftop solar energy assessments. Applied Energy, 316:119128, June 2022. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261922005062 (visited on 2022-05-27), doi:10.1016/j.apenergy.2022.119128.\u00a0\u21a9\u21a9
Junjing Yang, Devi Llamathy Mohan Kumar, Andri Pyrgou, Adrian Chong, Mat Santamouris, Denia Kolokotsa, and Siew Eang Lee. Green and cool roofs\u2019 urban heat island mitigation potential in tropical climate. Solar Energy, 173:597\u2013609, October 2018. URL: https://linkinghub.elsevier.com/retrieve/pii/S0038092X18307667 (visited on 2024-03-21), doi:10.1016/j.solener.2018.08.006.\u00a0\u21a9
Dan Stowell, Jack Kelly, Damien Tanner, Jamie Taylor, Ethan Jones, James Geddes, and Ed Chalstrey. A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK. Scientific Data, 7(1):394, November 2020. URL: https://www.nature.com/articles/s41597-020-00739-0 (visited on 2024-03-21), doi:10.1038/s41597-020-00739-0.\u00a0\u21a9
Nima Narjabadifam, Mohammed Al-Saffar, Yongquan Zhang, Joseph Nofech, Asdrubal Cheng Cen, Hadia Awad, Michael Versteege, and Mustafa G\u00fcl. Framework for Mapping and Optimizing the Solar Rooftop Potential of Buildings in Urban Systems. Energies, 15(5):1738, February 2022. URL: https://www.mdpi.com/1996-1073/15/5/1738 (visited on 2024-03-21), doi:10.3390/en15051738.\u00a0\u21a9
Youssef El Merabet, Cyril Meurie, Yassine Ruichek, Abderrahmane Sbihi, and Raja Touahni. Building Roof Segmentation from Aerial Images Using a Line and Region-Based Watershed Segmentation Technique. Sensors, 15(2):3172\u20133203, February 2015. URL: http://www.mdpi.com/1424-8220/15/2/3172 (visited on 2023-03-28), doi:10.3390/s150203172.\u00a0\u21a9\u21a9
Jordan M. Malof, Kyle Bradbury, Leslie M. Collins, and Richard G. Newell. Automatic detection of solar photovoltaic arrays in high resolution aerial imagery. Applied Energy, 183:229\u2013240, December 2016. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261916313009 (visited on 2024-03-21), doi:10.1016/j.apenergy.2016.08.191.\u00a0\u21a9
Sebastian Krapf, Lukas Bogenrieder, Fabian Netzler, Georg Balke, and Markus Lienkamp. RID\u2014Roof Information Dataset for Computer Vision-Based Photovoltaic Potential Assessment. Remote Sensing, 14(10):2299, May 2022. URL: https://www.mdpi.com/2072-4292/14/10/2299 (visited on 2022-05-27), doi:10.3390/rs14102299.\u00a0\u21a9\u21a9
Roberto Castello, Simon Roquette, Martin Esguerra, Adrian Guerra, and Jean-Louis Scartezzini. Deep learning in the built environment: automatic detection of rooftop solar panels using Convolutional Neural Networks. Journal of Physics: Conference Series, 1343(1):012034, November 2019. URL: https://iopscience.iop.org/article/10.1088/1742-6596/1343/1/012034 (visited on 2024-03-21), doi:10.1088/1742-6596/1343/1/012034.\u00a0\u21a9
Alexander Apostolov, August Baum, Ghali Chraibi, and Roberto Castello. Automatic detection of available area for rooftop solar panel installation. Technical Report, EPFL, December 2020. URL: https://www.epfl.ch/labs/mlo/wp-content/uploads/2021/05/crpmlcourse-paper859.pdf.\u00a0\u21a9
Fayez Tarsha Kurdi, Mohammad Awrangjeb, and Alan Wee-Chung Liew. Automated Building Footprint and 3D Building Model Generation from Lidar Point Cloud Data. In 2019 Digital Image Computing: Techniques and Applications (DICTA), 1\u20138. Perth, Australia, December 2019. IEEE. URL: https://ieeexplore.ieee.org/document/8946008/ (visited on 2024-03-21), doi:10.1109/DICTA47822.2019.8946008.\u00a0\u21a9\u21a9
Mohammad Aslani and Stefan Seipel. Automatic identification of utilizable rooftop areas in digital surface models for photovoltaics potential assessment. Applied Energy, 306:118033, January 2022. URL: https://www.sciencedirect.com/science/article/pii/S0306261921013283 (visited on 2023-03-24), doi:10.1016/j.apenergy.2021.118033.\u00a0\u21a9
Shuhei Watanabe. Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance. Technical Report, University of Freiburg, May 2023. arXiv:2304.11127 [cs]. URL: http://arxiv.org/abs/2304.11127 (visited on 2024-04-29), doi:10.48550/arXiv.2304.11127.\u00a0\u21a9
Zhen Qian, Min Chen, Teng Zhong, Fan Zhang, Rui Zhu, Zhixin Zhang, Kai Zhang, Zhuo Sun, and Guonian L\u00fc. Deep Roof Refiner: A detail-oriented deep learning network for refined delineation of roof structure lines using satellite imagery. International Journal of Applied Earth Observation and Geoinformation, 107:102680, March 2022. URL: https://linkinghub.elsevier.com/retrieve/pii/S030324342200006X (visited on 2022-05-27), doi:10.1016/j.jag.2022.102680.\u00a0\u21a9
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Doll\u00e1r, and Ross Girshick. Segment Anything. April 2023. arXiv:2304.02643 [cs]. URL: http://arxiv.org/abs/2304.02643 (visited on 2024-04-09).\u00a0\u21a9\u21a9
Qiusheng Wu and Lucas Prado Osco. Samgeo: A Python package for segmenting geospatial data with the Segment Anything Model (SAM). Journal of Open Source Software, 8(89):5663, September 2023. URL: https://joss.theoj.org/papers/10.21105/joss.05663 (visited on 2024-03-22), doi:10.21105/joss.05663.\u00a0\u21a9
Xiaoxia Liu, Fengbao Yang, Hong Wei, and Min Gao. Shadow Removal from UAV Images Based on Color and Texture Equalization Compensation of Local Homogeneous Regions. Remote Sensing, 14(11):2616, May 2022. URL: https://www.mdpi.com/2072-4292/14/11/2616 (visited on 2024-03-22), doi:10.3390/rs14112616.\u00a0\u21a9
Nicolas Beglinger (swisstopo) - Clotilde Marmy (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) - Thilo D\u00fcrr-Auster (Canton of Fribourg) - Daniel K\u00e4ser (Canton of Fribourg)
Proposed by the Service de l'environnement (SEn) of the Canton of Fribourg - PROJ-SOILS May 2023 to April 2024 - Published in April 2024
All code is available on GitHub.
Abstract: This project focuses on developing an automated methodology to distinguish areas covered by pedological soil from areas comprised of non-soil. The goal is to generate high-resolution maps (10cm) to aid in the location and assessment of polluted soils. Towards this end, we utilize deep learning models to classify land cover types using raw, raster-based aerial imagery and digital elevation models (DEMs). Specifically, we assess models developed by the Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN), the Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD), and the Office F\u00e9d\u00e9ral de la Statistique (OFS). The performance of the models is evaluated with the Matthew's correlation coefficient (MCC) and the Intersection over Union (IoU), as well as with qualitative assessments conducted by the beneficiaries of the project. In addition to testing pre-existing models, we fine-tuned the model developed by the HEIG-VD on a dataset specifically created for this project. The fine-tuning aimed to optimize the model performance on the specific use case and to adapt it to the characteristics of the dataset: higher resolution imagery, different vegetation appearances due to seasonal differences, and a unique classification scheme. Fine-tuning with a mixed-resolution dataset improved the model's performance when applied to lower-resolution imagery, which is proposed as a solution to the square artefacts that are common in inferences of attention-based models. Reaching an MCC score of 0.983, the findings demonstrate promising performance. The derived model produces satisfactory results, which have to be evaluated in a broader context before being published by the beneficiaries. Lastly, this report sheds light on potential improvements and highlights considerations for future work.
"},{"location":"PROJ-SOILS/#1-introduction","title":"1. Introduction","text":"Polluted soils present diverse health risks. In particular, contamination with lead, mercury, and polycyclic aromatic hydrocarbons (PAHs) currently mobilizes the Federal Office for the Environment 1. Therefore, it is necessary to know about the location of contaminated soils, like for prevention and management of soil displacement during construction works.
Current maps indicating the land cover or land use are often only accurate to the parcel level and therefore imprecise near houses (a property often includes a house and a garden), although those areas are especially prone to contamination 2. The Fribourgese Service de l'environnement wants to improve the knowledge about the location of contaminated soils. In this process, two phases can be distinguished:
The aim of this project is to explore methodologies for the first step only, creating a high-resolution map that distinguishes soil from non-soil areas. The problem of this project can be stated as follows:
Identify or develop a model that is able to distinguish areas covered by pedological soil from areas covered by non-soil land cover, given a raster-based input in the form of aerial imagery and digital elevation models (DEMs).
"},{"location":"PROJ-SOILS/#2-acceptance-criteria-and-concerned-metrics","title":"2. Acceptance criteria and concerned metrics","text":"The acceptance criteria describe the conditions that must be met by the outcome of the project, by which the proof-of-concept is considered a success.
These conditions can be of a qualitative or quantitative nature. In the present case, the former rely on visual interpretation; the latter consist of metrics which measure the performance of the evaluated methodologies and are easily standardized.
The chosen evaluation strategies are described below.
"},{"location":"PROJ-SOILS/#21-metrics","title":"2.1 Metrics","text":"As metrics, the Mathew's correlation coefficient and the intersection over union have been used.
Matthew's Correlation Coefficient (MCC) The Matthew's correlation coefficient (MCC) offers a balanced evaluation of model performance by incorporating and combining all four components of the confusion matrix: true positives, false positives, true negatives, and false negatives. This makes the metric effective even in cases of class imbalance, which can be a challenge when working with aerial imagery.
\\[MCC = \\frac{TP \\times TN - FP \\times FN}{\\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\\]Where:
The MCC is the only binary classification rate that generates a high score only if the binary predictor was able to correctly predict the majority of positive data instances and the majority of negative data instances. It ranges from -1 to 1, where 1 indicates a perfect prediction, 0 indicates a random prediction, and -1 indicates a perfectly wrong prediction3.
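As a minimal sketch, the formula above translates directly into code, with the confusion matrix counts as inputs:

```python
# Direct transcription of the MCC formula above.
from math import sqrt

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, return 0 (random prediction) if a margin is empty.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0
```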
Intersection over Union (IoU) The IoU, also known as the Jaccard index, measures the overlap between two datasets. In the context of image segmentation, it calculates the ratio of the intersection (the area correctly identified as a certain class) to the union (the total area predicted and actual, combined) of these two areas. This makes the IoU a valuable metric for evaluating the performance of segmentation models. However, it's important to note that the IoU does not take true negatives into account, which can make interpretation challenging in certain cases.
\\[IoU = \\frac{TP}{TP + FP + FN}\\]Where:
The IoU ranges from 0 to 1, where 1 indicates a perfect prediction and 0 indicates no overlap between the ground truth and the prediction. The mIoU is the mean of the IoU values of all classes and is a common metric for semantic segmentation tasks.
In a binary scenario, the IoU does not render the same score for the two classes. This means that either the mIoU is considered to be the final metric, or one of the two classes, soil or non-soil, is considered to be the positive class. The decision was made to use the mIoU, i.e. the mean of the IoU for soil and the IoU for non-soil, in the binary case.
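A small sketch of the binary mIoU as defined here; computing the IoU once with soil and once with non-soil as the positive class simply swaps the roles of FP and FN:

```python
# Binary mIoU: mean of the IoU for soil and the IoU for non-soil.
def iou(tp: int, fp: int, fn: int) -> float:
    return tp / (tp + fp + fn)

def binary_miou(tp: int, tn: int, fp: int, fn: int) -> float:
    iou_soil = iou(tp, fp, fn)        # soil as the positive class
    iou_non_soil = iou(tn, fn, fp)    # non-soil as the positive class
    return (iou_soil + iou_non_soil) / 2
```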
"},{"location":"PROJ-SOILS/#22-qualitative-assessment","title":"2.2 Qualitative Assessment","text":"To incorporate a holistic perspective of the results and to make sure that the evaluation and ranking based on the above metrics correspond to the actually perceived quality of the models, a qualitative assessment is also conducted. For this reason, the beneficiaries were asked to rank predictions of the models qualitatively. If the qualitatively assessed ranking corresponds to the ranking based on the above metrics, we can be confident that the chosen metrics are a good proxy for the actual perceived quality and usability of the models.
"},{"location":"PROJ-SOILS/#3-data","title":"3. Data","text":"The models evaluated in the project make use of different data: after inference on images with or without DEM, the obtained predictions were compared to ground truth data. All these data are described in this section.
"},{"location":"PROJ-SOILS/#31-input-data","title":"3.1 Input Data","text":"As stated in the introduction, the explored methodology should work with raw, raster-based data. The following data is provided by swisstopo and well-adapted to our problem:
The imagery and the data for the DEMs computation were not acquired at the same time, which means that the depicted land cover can differ between the two datasets. An important factor in this respect is the season (leaf-on or leaf-off). Data for swisstopo's DEMs are always acquired in the leaf-off period, which means that the used imagery should also have been acquired in the leaf-off period. To get the best fit regarding temporal and seasonal similarity, imagery from 2020 and DEMs from 2019 were used.
"},{"location":"PROJ-SOILS/#32-ground-truth","title":"3.2 Ground Truth","text":"The ground truth data for this project is used to compare the predictions of the models to the actual land cover types and to fine-tune an existing model for the project's specific needs. It was digitized by the beneficiaries of the project and is based on the SWISSIMAGE RS acquisition from 2020. As vector data allows for a more precise delineation of the land cover types, the ground truth data was digitized in a vector format. All contiguous areas comprised of the same land cover type were digitized as polygons.
"},{"location":"PROJ-SOILS/#classification-scheme","title":"Classification Scheme","text":"Although the goal of this project is to distinguish soil from non-soil areas, the ground truth data is classified into more detailed classes. This is due to the fact that it is easier to identify possible shortcomings of the models when the classes are more detailed. With a classification, techniques like confusion matrices can be used to identify which classes are often confused with each other, leading also to a better understanding about what areas should be covered in additional ground truth digitizations.
During the development of the classification scheme, the focus lay on the distinction between soil and non-soil, which means that every class can be attributed to either soil or non-soil, thereby respecting the legal definitions of soil according to the Federal Ordinance on Soil Pollutions4. The final classification scheme of the ground truth data is the product of an iterative process and has been subject to compromises between an optimal fit to the legal definitions and practical limitations, such as the possibility of a purely optical identification of the classes. Essentially, the scheme consists of 17 classes. However, during fine-tuning, it was found that some classes are too heavily underrepresented to be learnt by the model. As a result, the classes were merged into a new scheme consisting of 12 classes. Another feature of the classification scheme to keep in mind is that it is optimized for the Fribourgese territory, which means that some classes may not be directly applicable to other regions. The classification scheme is depicted in Figure 1.
Figure 1: Classification scheme of the ground truth data. Soil classes are depicted in green."},{"location":"PROJ-SOILS/#extent","title":"Extent","text":"The ground truth has been digitized on the Fribourgese territory, over about 9.6 km\u00b2, including diverse land cover types. The area of interest is depicted in Figure 2.
Figure 2: Ground truth of the area of interest."},{"location":"PROJ-SOILS/#4-existing-models","title":"4. Existing Models","text":"There are no existing models that directly fit the project's problem, i.e. models directly outputting georeferenced, binary raster images distinguishing soil from non-soil. However, there are models that are able to classify land cover types on aerial imagery. Three institutions have developed such models, which are assessed in the evaluation section. All of them are deep learning neural networks. In the following subchapters, the models are discussed briefly.
"},{"location":"PROJ-SOILS/#41-institut-national-de-linformation-geographique-et-forestiere-ign","title":"4.1 Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN)","text":"The D\u00e9partement d'Appui \u00e0 l'Innovation (DAI) at IGN has implemented three AI models for land cover segmentation: odeon-unet-vgg16, smp-unet-resnet34-imagenet, and smp-fpn-resnet34-imagenet, each trained with two input modalities, named RVBI and RVBIE, resulting in six configurations.
The model architectures are:
The input modalities are:
IGN's own assessment of these 6 configurations suggests that resnet34 encoder models, with their larger receptive fields, generally outperform vgg16 models, benefiting from the spatial context in prediction. The pre-training with ImageNet further enhances model performance. Current evaluations of IGN are focusing on models from the FLAIR-1 challenge, which may replace existing models in production.
The FLAIR-1 Challenge was designed to enhance artificial intelligence (AI) methods for land cover mapping. Launched on November 21, 2022, the challenge focused on the FLAIR-1 (French Land cover from Aerospace ImageRy) dataset, one of the largest datasets for training AI models in land cover mapping. The dataset included data from over 50 departments, encompassing more than 20 billion annotated pixels, representing the diversity of the French metropolitan territory. The total area of the ground truth data is calculated as:
\\[A = \\frac{(512px*0.2\\frac{m}{px})^2 * 77412\\ tiles}{10^6} = 811.7 km^2\\]All of the used model architectures of the IGN are in the family of the convolutional neural networks (CNNs), which are a type of deep learning algorithm. Inspired by biological processes, CNNs implement patterns of connectivity between artificial neurons similar to the organization in the biological visual system. CNNs are particularly effective in image recognition tasks, as they can automatically learn features from the input data6.
According to IGN, the main challenges in model performance lie in adapting to varying radiometric calibrations and vegetation appearances in different datasets, such as the lack of orthophotos taken during winter (\u201cleaf-off\u201d) in the French training data. Ongoing efforts are aimed at improving model generalization across different types of radiometry and training with winter images to account for leafless vegetation appearances.
"},{"location":"PROJ-SOILS/#42-haute-ecole-dingenierie-et-de-gestion-du-canton-de-vaud-heig-vd","title":"4.2 Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD)","text":"The Institute of Territorial Engineering (INSIT) at the HEIG-VD has participated in the FLAIR-1 challenge.
INSIT uses a Mask2Former7 architecture, which is an attention-based model. Attention-based models in computer vision are neural networks that selectively focus on certain areas of an image during processing. They, too, mimic the biological visual system by concentrating on specific parts of an image while ignoring others8.
The researchers at HEIG-VD could not demonstrate a significant performance increase from including the near-infrared (NIR) channel and/or a DEM. As a result, their model works with RGB imagery only.
"},{"location":"PROJ-SOILS/#43-office-federal-de-la-statistique-ofs","title":"4.3 Office F\u00e9d\u00e9ral de la Statistique (OFS)","text":"OFS has also created a deep learning model prototype to automatically segment land cover types. However, different than the models of IGN and HEIG-VD, it works with two steps:
The Methodology section describes the infrastructure used to run the models and to reproduce the project. Furthermore, it describes the evaluation and fine-tuning approaches in detail.
"},{"location":"PROJ-SOILS/#51-infrastructure","title":"5.1 Infrastructure","text":"The term \u201cinfrastrucutre\u201d refers here to both hardware and software resources.
"},{"location":"PROJ-SOILS/#hardware","title":"Hardware","text":"Most of the development of this project was conducted on a MacBook Pro (2021) with an M1 Pro chip. To accelerate the inference and fine-tuning of the models, virtual machines (VMs) were used. The VMs were provided by Infomaniak and were equipped with 16 CPUs, 32 GB of RAM, and an NVIDIA Tesla T4 GPU.
"},{"location":"PROJ-SOILS/#reproducibility","title":"Reproducibility","text":"The code is versioned using Git and hosted on the Swiss Territorial Data Lab GitHub repository. To ensure reproducibility across different environments, the environment is containerized using Docker12.
"},{"location":"PROJ-SOILS/#deep-learning-framework","title":"Deep Learning Framework","text":"We received the source code and the model weights of the HEIG-VD model and of the OFS model. Both models are implemented using the deep learning framework PyTorch13. The HEIG-VD model uses an additional library called mmsegmentation14, which is built on top of PyTorch and provides a high-level interface for training and evaluating semantic segmentation models.
"},{"location":"PROJ-SOILS/#52-evaluation","title":"5.2 Evaluation","text":"To realize the evaluation of the afore-mentionned models, reclassification of land cover classes into soil classes were necessary, as well as the definition of a common extent to the availabe inferences. Furthermore, the metrics were implemented in the workflow and a rigorous qualitative assessment was defined.
"},{"location":"PROJ-SOILS/#inference","title":"Inference","text":"In the beginning of the evaluation phase, the inferences of the models were generated directly by the aforementioned institutions. Later, after receiving the model weights and the source codes, we could infere from the models of the HEIG-VD and OFS directly.
"},{"location":"PROJ-SOILS/#reclassification","title":"Reclassification","text":"As already touched upon, the above-stated models do not directly output binary (soil/non-soil) raster images, but output segmented rasters with multiple classes. The classification scheme depends on the data that was used for training. The classes of the models of IGN and HEIG-VD are almost identical, since they have both been trained on French imagery and ground truth. They differ only in the numbering of the classes. The model of OFS, however outputs completely different classes. To harmonize the results of all three institution\u2019s models and to make them fit for out problem at hand, all outputs have been reclassified to the same classification scheme named \u201cPackage ID\u201d. The reason for this name is that there is an N:M relationship between the IGN-originated classes and the Fribourg ground truth classes. One \u201cpackage\u201d thus consists of all the classes that are connected via N:M relationships. The mapping of the classes is depicted in Figures 3 and 4.
Figure 3: Mapping between the package ID and the classification schemes of IGN and HEIG-VD. Figure 4: Mapping between the package ID and the classification scheme of OFS."},{"location":"PROJ-SOILS/#extents","title":"Extents","text":"From the extent originally covered by the ground truth, smaller extents had to be defined during the evaluation, both for reasons of inference availability and to better understand the performance of the different models.
Extent 1 Because we did not have inferences of all models for the whole area of the ground truth, it was not possible to evaluate all the models on the whole extent. To allow for a fair comparison between the models, the evaluation was therefore conducted on the largest possible extent, namely the intersection between all the received inferences and the ground truth. This extent is called \u201cExtent 1\u201d and covers a total area of about 0.42 km\u00b2. The area can be seen in Figure 5.
Figure 5: Ground truth of Extent 1.Masked Extent 1: Areas around buildings In Extent 1, a large share of the area consists of vegetated soil. To check the performance in urban areas only, the evaluation is also conducted on the subset of pixels that lie within 20 m of buildings. Although this modification covers the same extent as Extent 1, it is treated as a separate extent, called \u201cextent1-masked\u201d, which is depicted in Figure 6.
Figure 6: Ground truth of Extent 1, masked to focus on the areas around buildings.Extent 2 The output of the HEIG-VD model is affected by square-shaped artefacts, which can be seen in Figure 7. The squares coincide with the size of the model's receptive field, which is 512x512 pixels. With an image resolution of 10 cm, the artefacts thus measure 51.2x51.2 m; with an image resolution of 20 cm, 102.4x102.4 m.
There are two observations regarding the occurrence of the artefacts:
The artefacts are thus probably caused by a combination of these two factors.
Figure 7: Representative map showing square artefacts in areas without high-frequency context. Lines: GT, Fills: predictions.The artefacts produce large areas of false predictions that presumably have a strong influence on any computed evaluation metric. To obtain a clearer understanding of the influence of these artefacts on the metrics, a second extent, Extent 2 as shown in Figure 8, was created, which excludes all tiles where the HEIG-VD model produces such artefacts.
Figure 8: Ground truth of Extent 2."},{"location":"PROJ-SOILS/#metrics","title":"Metrics","text":"Both the MCC and the IoU values are computed in a raster-based fashion: each pixel of a prediction is compared with the spatially corresponding pixel of the GT. For each class, each pixel is thereby classified as a true positive (TP), a false positive (FP), a false negative (FN), or a true negative (TN).
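Under the standard definitions of these metrics, the pixel counts combine as: \[MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\] \[IoU = \frac{TP}{TP + FP + FN}\]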
As described in the acceptance criteria, the MCC and the IoU are then calculated from these pixel counts. As the MCC is only suited for binary classification, it is computed only for the binary outputs of the models, whereas the IoU is also computed for the multiclass outputs. The general workflow of the evaluation pipeline is the following:
More details about the technical implementation of the evaluation pipeline can be found in the GitHub repository of the project.
"},{"location":"PROJ-SOILS/#qualitative-assessment","title":"Qualitative assessment","text":"As stated in the Qualitative Assessment section, this visual assessment serves to ensure that the chosen metrics (MCC, IoU) correspond to the qualitative evaluation of the beneficiaries. Three models where chosen, such that (regarding the metrics) high- and low-performing models were included. As the problem with the artefacts in the HEIG-VD model\u2019s predictions is very evident, for this assessment, only a subset from Extent 2 has been taken into account.
To conduct the qualitative assessment, the beneficiaries were given the predictions of the three chosen models on 4 representative tiles. The tiles were chosen to represent different land cover types (as far as possible within this area). The beneficiaries were then asked to rank the predictions of the models from best to worst.
As the inferences for the OFS model were not available at the relevant point in time, the qualitative assessment was conducted only for the IGN and the HEIG-VD models for the tiles displayed in Figure 9.
Figure 9: Tiles that were used for the qualitative assessment. Ground truth depicted as outlines, predictions as fills. IGN1: smp-unet-resnet34-imagenet_RVBI, IGN2: odeon-unet-vgg16_RVBIE."},{"location":"PROJ-SOILS/#53-fine-tuning","title":"5.3 Fine-Tuning","text":"After the evaluation, considerations were made about which model to use for the further course of the project, and the HEIG-VD model was identified as the most promising in terms of performance and availability (more in the Discussion section). The model has been trained on the FLAIR-1 dataset (see Existing Models), which differs from the present dataset in several respects. Fine-tuning retrains a model so that it can adapt to the specifics of a new dataset. In this case, fine-tuning aims to adjust the model to the following specifics:
The Swiss imagery is of a higher resolution than the French imagery (10 cm vs 20 cm), which means that the model has to be able to work with more detailed information.
The Swiss imagery is of a different season than the French imagery, which means that the model has to be able to work with different vegetation appearances.
The classification scheme of the Swiss ground truth is different from the French ground truth, which means that the model has to be able to work with different classes.
For fine-tuning, the dataset is split into a training dataset and a validation dataset. The training dataset consists of 80% of the input imagery and ground truth, while the validation dataset consists of the remaining 20%. The dataset is split in a stratified manner, meaning that the class distribution in the training dataset is kept as close as possible to that in the validation dataset. This is important to ensure that the model is trained on a representative sample of the data. The split is conducted in a semi-random and tile-based manner, as sketched below:
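A minimal sketch of such a semi-random, tile-based, stratified split is given below; the tiles list and the dominant_class helper are placeholders standing in for how the pipeline actually derives per-tile class information from the ground truth.

```python
import random
from collections import defaultdict

def stratified_split(tiles, dominant_class, val_fraction=0.2, seed=42):
    """Assign whole tiles to train/val, keeping class proportions similar.

    `tiles` is any list of tile identifiers; `dominant_class(tile)` is a
    placeholder helper returning the tile's dominant ground truth class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for tile in tiles:
        by_class[dominant_class(tile)].append(tile)

    train, val = [], []
    for group in by_class.values():
        rng.shuffle(group)  # semi-random: shuffle within each stratum
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```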
As stated in the Ground Truth section, the fine-tuning is conducted using the classification scheme consisting of 12 classes.
Figure 10: Class frequency distribution of the training and validation dataset. Note that the y-axis is logarithmic.To mitigate the effect of the artefacts mentioned in the Extent section, we propose to decrease the spatial resolution of the input (and thus also of the output) of the model in order to increase its spatial receptive field. With input tiles covering a larger area, the chance increases that high-frequency features giving context to the image are present. A visualization of this proposal is shown in Figure 11: while the 10 cm input tile has only low-frequency agricultural context, the 40 cm input tile has high-frequency context in the form of a road. This context, so the proposal goes, could help the model make a more informed decision.
Figure 11: Visualization of the changing receptive field of the model with different input resolutionsAs a model adjusts for a certain resolution during training, we test the effect of training on different resolutions. Thus, the model is fine-tuned on two different datasets, one with a spatial resolution of 10 cm and one with mixed resolutions of 10 cm, 20 cm, and 40 cm. The input shape of all the image tiles, regardless of the ground sampling distance, is 512x512 pixels. The spatially largest tiles (40 cm) were assigned to either the training or the validation set, and all the smaller tiles that are contained within the larger tiles were assigned to the same set. This way, the model can be trained and evaluated, respectively, on the same area at different resolutions. The resulting nested grid is depicted in Figure 12.
Figure 12: Example of the used grid. The shape of the tiles is always 512 by 512 pixels; only the ground sampling distance changes. Borders are drawn with an offset to increase legibility; in reality, they overlap perfectly.The obtained datasets and their sizes are summarized here:
Training Dataset
Validation Dataset
Both the model trained on the single-resolution dataset and the one trained on the mixed-resolution dataset were trained for a total of 160'000 iterations using the mmsegmentation library14. One iteration in this context means that one batch of data has been processed. Because of memory limitations, the models were trained with a batch size of 1, which means that one iteration corresponds to one tile being processed. Thus, one epoch (one pass through the whole dataset) consists of 2'640 iterations for the single-resolution dataset and 3'460 iterations for the mixed-resolution dataset. During training, the models were evaluated on the validation set after every epoch by computing the mIoU metric. If the mIoU increased, a model checkpoint was saved and the old one deleted. After training for the predefined number of iterations, the model with the highest mIoU on the validation set was chosen as the final model.
"},{"location":"PROJ-SOILS/#6-results","title":"6. Results","text":"The metrics values for the evaluation and the fine-tuning parts of the project are first presented. Afterwards, a close view of the final product is shown.
"},{"location":"PROJ-SOILS/#61-evaluation","title":"6.1 Evaluation","text":"The multiclass evaluation is briefly presented before showing in details, from a quantitative and qualitative perpectives, the evaluation of the models for the binary classification in soil and non-soil classes.
"},{"location":"PROJ-SOILS/#multiclass-evaluation","title":"Multiclass Evaluation","text":"As the focus of this project lies in the binary distinction between soil and non-soil areas, the multiclass classification results are not discussed in further detail. However, plots displaying the class-IoU values of the different models are depicted in Figures 13 and 14. Confusion matrices can be found in the Appendices.
Figure 13: IoU values for different models and classes on Extent 1. Soil-classes are depicted in the green rectangles. Figure 14: IoU values for different models and classes on Extent 2. Soil-classes are depicted in the green rectangles."},{"location":"PROJ-SOILS/#quantitative-evaluation","title":"Quantitative Evaluation","text":"Figures 15 and 16 show the MCC values and mIoU values, respectively, computed for the binary classification of different models. As the distribution of the metrics across the models is very similar, only the MCC values are discussed in the following; they are precisely given in Table 1.
Figure 15: MCC values of the binary predictions of the models on the two extents. Figure 16: mIoU values of the binary predictions of the models on the two extents.

| Model | MCC (Extent 1) | MCC (Masked Extent 1) | MCC (Extent 2) |
| --- | --- | --- | --- |
| IGN_smp-unet-resnet34-imagenet_RVBI | 0.825 | 0.808 | 0.813 |
| OFS_ADELE2(+SAM) | 0.818 | 0.802 | 0.794 |
| IGN_smp-unet-resnet34-imagenet_RVBIE | 0.810 | 0.798 | 0.824 |
| HEIG-VD | 0.789 | 0.839 | 0.859 |
| IGN_smp-fpn-resnet34-imagenet_RVBIE | 0.714 | 0.749 | 0.795 |
| IGN_odeon-unet-vgg16_RVBI | 0.710 | 0.706 | 0.794 |
| IGN_smp-fpn-resnet34-imagenet_RVBI | 0.710 | 0.745 | 0.792 |
| IGN_odeon-unet-vgg16_RVBIE | 0.640 | 0.613 | 0.713 |

Table 1: MCC of the binary predictions of the models on the three extents.Extent 1 The best-performing model in Extent 1 is the IGN_smp-unet-resnet34-imagenet_RVBI model, with an MCC of 0.825. The inclusion of the elevation channel does not seem to have a significant impact on the model's performance, as the IGN_smp-unet-resnet34-imagenet_RVBIE model only achieves an MCC of 0.810. The OFS_ADELE2(+SAM) model follows closely with an MCC of 0.818. The HEIG-VD model, on the other hand, performs significantly worse, with an MCC of 0.789. The models IGN_smp-fpn-resnet34-imagenet_RVBIE, IGN_odeon-unet-vgg16_RVBI, and IGN_smp-fpn-resnet34-imagenet_RVBI all achieve an MCC of around 0.710. The IGN_odeon-unet-vgg16_RVBIE model performs the worst, with an MCC of 0.640.
masked Extent 1 & Extent 2 The greatest difference to Extent 1 is that in masked Extent 1 and in Extent 2, the HEIG-VD model performs significantly better than in masked Extent 1, with an MCC of 0.839. Generally, the models perform similarly in masked Extent 1 and in Extent 1. The models are generally performing better in Extent 2.
"},{"location":"PROJ-SOILS/#qualitative-evaluation","title":"Qualitative Evaluation","text":"The results of the qualitative assessment are depicted in Figure 17. The qualitative assessment rendered the following ranking:
This ranking is consistent with the ranking based on the metrics.
Figure 17: Qualitative assessment by the beneficiaries."},{"location":"PROJ-SOILS/#62-fine-tuning","title":"6.2 Fine-Tuning","text":"For the second part of the project - the fine-tuning of the HEIG-VD model - the quantitative binary performance is first presented. Afterwards, the multiclass outputs are presented quantitatively, qualitatively and visually. This helps to understand what underlies the binary outputs qualitatively discussed in the final subsection.
"},{"location":"PROJ-SOILS/#binary-results","title":"Binary Results","text":"Figure 18 shows the progress of the models during fine-tuning, with a datapoint after every epoch. The curves are quite similar to each other, but both are rather noisy. The best checkpoint of the 10 cm model is at epoch 71 with an mIoU of 0.939. The best checkpoint of the mixed model is at epoch 145 with an mIoU of 0.930. The names of the two models are thus HEIG-VD-10cm-71k and HEIG-VD-mixed-145k.
The training of the models for 160'000 iterations with the above-stated hardware took about 7 days. Performing inference on one single tile of 512x512 pixels takes about 1 second. This means that with 10 cm input tiles, the model takes about 380 seconds, or 6 minutes and 20 seconds, to process 1 km\u00b2. As the canton of Fribourg has an area of about 1'670 km\u00b2, the model would take about one week to process the whole canton, i.e. it is able to process the whole canton in a reasonable amount of time.
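As a rough check of this estimate, using the 51.2 m footprint of a 512-pixel tile at 10 cm resolution: \[t \approx 1670\ \text{km}^2 \times \left(\frac{1000\ \text{m}}{51.2\ \text{m}}\right)^2 \tfrac{\text{tiles}}{\text{km}^2} \times 1\ \tfrac{\text{s}}{\text{tile}} \approx 6.4 \times 10^5\ \text{s} \approx 7.4\ \text{days}\]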
Figure 18: Training progress of the models.Figure 19 and Table 2 show the MCC values for the original HEIG-VD model, as well as for the two fine-tuned models, on the evaluation extent. When comparing the MCC value of the original HEIG-VD model (MCC = 0.553) with those of the fine-tuned models (MCC after 10 cm training: 0.939; MCC after mixed training: 0.938), the fine-tuned models perform significantly better. However, one should note that the original HEIG-VD model was trained on a different dataset and with a different classification scheme. It was evaluated on the same extent but using the package ID, which is introduced in the Reclassification section.
Regarding the performance of the two models on inference with different input resolutions, they perform quite similarly on the 10 cm resolution input. Both models perform worse as the ground sampling distance increases:
However, the performance of the HEIG-VD-mixed-145k model does not decrease as much as that of the HEIG-VD-10cm-71k model on the 20 cm and 40 cm resolution inputs.
Figure 19: MCC values of the binary predictions of the model fine-tuned on different resolutions.

| Model | MCC (10 cm input) | MCC (20 cm input) | MCC (40 cm input) |
| --- | --- | --- | --- |
| HEIG-VD-original | 0.553 | - | - |
| HEIG-VD-10cm-71k | 0.939 | 0.884 | 0.795 |
| HEIG-VD-mixed-145k | 0.938 | 0.930 | 0.893 |

Table 2: MCC values of the binary predictions of the model fine-tuned on different resolutions."},{"location":"PROJ-SOILS/#multi-class-results","title":"Multi-Class Results","text":"As in the Evaluation results section, the results are not discussed in further detail. Confusion matrices can be found in the Appendices. Figures 20 and 21 show the IoU values of the mixed-resolution model (Figure 20) and the 10 cm model (Figure 21) on different resolutions.
Figure 20: IoU values of the fine-tuned models on the 10 cm dataset. Figure 21: IoU values of the fine-tuned models on the mixed-resolution dataset."},{"location":"PROJ-SOILS/#qualitative-analysis-of-the-outputs","title":"Qualitative Analysis of the Outputs","text":"Figures 22, 23, and 24 show the outputs of the two models for 10, 20, and 40 cm input resolution, respectively, on three different areas. The areas were chosen to represent different land cover types. Since there is no ground truth in these areas, the inferences can only be analyzed qualitatively. The inferences make apparent that the models still have trouble in regions with little high-frequency context and remain prone to square artefacts. In the urban and countryside areas (Figures 22 and 23), the combination of decreased resolution (and thus increased spatial receptive field) and fine-tuning on the mixed dataset seems to have a positive effect on the occurrence of the square artefacts. In the mountainous area (Figure 24), however, the artefacts are even more pronounced in the outputs of the mixed-resolution model than in those of the 10cm-only model.
An effect of the decreased resolution is that, generally, the predictions seem to be less impacted by the artefacts. However, where artefacts do occur, their spatial extent, which equals the spatial receptive field of the model, is larger.
Figure 22: Comparison of the predictions of the model fine-tuned on different resolutions in urban areas. Figure 23: Comparison of the predictions of the model fine-tuned on different resolutions in countryside areas. Figure 24: Comparison of the predictions of the model fine-tuned on different resolutions in mountainous areas."},{"location":"PROJ-SOILS/#qualitative-analysis-of-the-binary-outputs","title":"Qualitative Analysis of the Binary Outputs","text":"Figures 25, 26, and 27 show the binary output versions of the three Figures above (22, 23, and 24). Looking at the inferences on the same areas, one can see that the artefacts are much less of an issue in the binary outputs. They are still present to some extent; however, since many of the artefacts and their surroundings are in fact soil (or non-soil, respectively), most artefacts dissolve in the binary outputs. The artefacts remain visible in the mountainous area, where the mixed model predicts large areas of water, which is a non-soil class.
Figure 25: Comparison of the binary predictions of the model fine-tuned on different resolutions in urban areas. Figure 26: Comparison of the binary predictions of the model fine-tuned on different resolutions in countryside areas. Figure 27: Comparison of the binary predictions of the model fine-tuned on different resolutions in mountainous areas."},{"location":"PROJ-SOILS/#63-examplary-inference","title":"6.3 Exemplary Inference","text":"Finally, to show an example of the model output, an inference of the HEIG-VD-mixed-145k model on a 10 cm input resolution tile is given in Figure 28. The inference is a zoomed-in part of the north-east of the extent shown in Figures 22 and 25. It illustrates that the model is capable of distinguishing between different land cover classes in great detail.
Figure 28: Representative inference of the HEIG-VD-mixed-145k model on a 10 cm input resolution tile."},{"location":"PROJ-SOILS/#7-discussion","title":"7. Discussion","text":"After the presentation of the results, the evaluation and fine-tuning outcomes are discussed in turn.
"},{"location":"PROJ-SOILS/#71-evaluation","title":"7.1 Evaluation","text":"All institutions and models have their strengths and weaknesses:
IGN Regarding Extent 1, the model IGN_smp-unet-resnet34-imagenet_RVBI produced the best metrics. Furthermore, the CNN models of IGN are computationally less expensive than the other models and the inferences are not prone to the square artefacts that the HEIG-VD model produces.
HEIG-VD The HEIG-VD model, although it is outperformed by the other two institutions' models on Extent 1, performs significantly better in masked Extent 1 and in Extent 2. The model also performed best in the qualitative assessment. The assessment of the performance in Extent 2 shows that the square artefacts are responsible for a great share of false predictions.
OFS The OFS model OFS_ADELE2(+SAM) performs similarly to the best-performing IGN model, its outputs are not prone to square artefacts, and the inferences are very clean due to its usage of the SAM model. The downside of the OFS model is that it is specifically adapted for the Statistique suisse de la superficie10 and thus cannot be retrained on a different dataset.
The goal of the evaluation phase was to identify the most promising model for the further steps of the project. Based on the results of the evaluation, the HEIG-VD model was chosen: it performed best in masked Extent 1 and in Extent 2, and it performed best in the qualitative assessment. Additionally, the model needs only aerial imagery with the three RGB channels, which allows for easier reproducibility. The model weights and source code of the HEIG-VD model were kindly shared with us, which enabled us to fine-tune the model to adapt it to the specifics of this project. However, the premise of choosing the HEIG-VD model was that we would be able to mitigate the square artefacts to an acceptable degree.
"},{"location":"PROJ-SOILS/#72-fine-tuning","title":"7.2 Fine-Tuning","text":"The following keypoints can be extracted from the fine-tuning results:
"},{"location":"PROJ-SOILS/#performance-increase","title":"Performance Increase","text":"The fine-tuning procedure could improve the model performance substantially, even though a small dataset was used. For comparison: The FLAIR-1 dataset comprises more than 800 km\u00b2, which is more than 80 times the size of our used dataset. The improvement is especially impressing, since the chosen model is an attention-based model, which is known to be dependent on large amounts of data 15. A possible explanation for the success of the fine-tuning is that most of the features that the model has to learn are already present in the pre-trained model. The adjustments of the weights needed to adapt to the specifics of the dataset may, in comparison to the vast amount of information needed to train a model from scratch, be quite small.
"},{"location":"PROJ-SOILS/#adaptability","title":"Adaptability","text":"Fine-Tuning allows to adjust for different specifics of new datasets. In this case, the model was able to adjust for different resolutions, a different acquisition season, and a new classification scheme. However, also the model that has been trained on the mixed-resolution dataset performed worse on 20 cm and 40 cm resolution input than on 10 cm resolution input. This could be due to the fact that the model has been trained on 4 times as many 10 cm resolution tiles as on 20 cm resolution tiles and 16 times as many 10 cm resolution tiles as on 40 cm resolution tiles. As a result, the model could be biased towards the 10 cm resolution. Another explanation imaginable could be that the defined classes are more easily identifiable in high-resolution input in general. While e.g., the IoU values for the class \"sol_vegetalise\" does not fluctuate much between the different resolutions, the IoU values for e.g., the class \"roche_dure_meuble\" seems to depend considerably on the resolution.
"},{"location":"PROJ-SOILS/#square-artefacts","title":"Square Artefacts","text":"While a decreased resolution and fine-tuning could not remove the square artefacts completely, their occurence could be drastically reduced. Even more so in the binary case, where the depicted confusion between water and vegetated soil in Figure 27 seems to contribute the most to the square artefacts, which could be reduced by a post-processing step, using known waterbodies as a mask. These water square artefacts that appear by decreasing the resolution, show that the model depends on both resolution and context. Indeed, the lower resolution seems to have removed the specific texture of mountainous meadow and rendered it similar to waterbody. Another factor contributing to this confusion could be a possible bias in the ground truth caused by overrepresented lake sediments that resemble soil.
The resolution decrease also affects the size of the smallest segmentable object. Fortunately, urban areas profit the most from high resolution and are not prone to square artefacts, which means that a trade-off could be circumvented by a spatial separation of high- and low-resolution inferences (e.g., urban: 10 cm, countryside: 40 cm).
"},{"location":"PROJ-SOILS/#73-remarks-from-beneficiaries","title":"7.3 Remarks from Beneficiaries","text":"The beneficiaries provided a feedback of the final state of the model. They were especially content with the performance in heterogeneous areas (i.e., urban areas) and stressed the quality of the inference regarding ambiguous features: the model is able to distinguish between soil and non-soil even in areas where the ground is covered by large objects (e.g., truck trailers), or where the soil is covered by canopies. The model is also not affected by shadows, which was a great concern at the beginning of the project, and shows a good separability of gravel and concrete, which could be used for mapping impervious surfaces. However, the square artefacts are still leading to soil/non-soil confusion, typically appearing as 51.2x51.2m squares in homogeneous areas (with 10 cm resolution). The beneficiaries concluded that around buildings, the soil map produced by the model appears more reliable than other available products and offers the opportunity to cross-reference the binary result (soil \u2013 non-soil) with existing indicative maps and to improve the quantitative assessment of the soil concerned by pollutions.
"},{"location":"PROJ-SOILS/#8-conclusion","title":"8. Conclusion","text":"One of the main findings of this project is that modern deep learning models are feasible tools to segment various land cover classes on aerial imagery. Furthermore, even complicated models can be fine-tuned for derived specifics and enhanced performance, even in the case of small datasets.
As the mixed-resolution model produces overall better results than the 10cm-only model, it can be considered as the main output of this project. It performs quite well, with an MCC value of 0.938 on the 10 cm validation set. It was able to adapt to the specifics of the Swiss dataset, which incorporates different resolutions, a different acquisition season, and a new classification scheme. The model performs especially well in urban and other high-frequency context areas, where the issue with square artefacts is less pronounced.
The project provides a methodology for comparing different segmentation models in the geographic domain, and gives insights into how a best-suited model can be chosen and how it can be fine-tuned to adapt to a specific dataset.
"},{"location":"PROJ-SOILS/#81-limitations","title":"8.1 Limitations","text":"The main limitation of this project is the extent of the ground truth data. The ground truth data is only available for a small area in the canton of Fribourg. With a larger dataset, the model may have been able to perform even better and the generalization to other areas may have been better, because each class could have been presented to the model in a more nuanced way.
Another limitation of this project is that the seasonal diversity in the imagery used for this project is very limited. We showed that the model is able to adapt to different vegetation appearances, but the produced model has only been fine-tuned for the vegetation period of the imagery used for training. The model might perform worse in other vegetation periods.
Last but not least, the square artefacts, which were a main concern in the project, still occur in the inferences. Fine-tuning the model on a mixed-resolution dataset was able to mitigate the effect of the artefacts, but not to remove them completely.
"},{"location":"PROJ-SOILS/#82-outlook","title":"8.2 Outlook","text":"Some ideas that emerged during the project but could not be implemented due to time constraints are:
As the square artefacts are not much of a problem in urban, high-frequency areas, and a lower resolution can help to mitigate their effect in low-frequency countryside areas, a possible approach could be to run inference at 10 cm in the urban areas and at 40 cm in the countryside areas. Another way to combine low- and high-resolution inferences could be to make use of an ensemble technique, which combines the predictions of different models to get \u201cthe best of both worlds\u201d.
The confusion between water and vegetated soil is a main cause of error in the binary predictions. A post-processing step to remove these square artefacts could be conducted by using known waterbodies as a mask.
We would like to express our gratitude to the people working at HEIG-VD, IGN, and OFS, who contributed significantly to this project by sharing not only their code and models, but also their thoughts and experiences with us. It was a pleasure to collaborate with them.
"},{"location":"PROJ-SOILS/#9-appendices","title":"9. Appendices","text":"Figure 29: Confusion matrix of the HEIG-VD-10cm-71k model on the 10 cm validation set. Figure 30: Confusion matrix of the HEIG-VD-mixed-145k model on the 10 cm validation set. Figure 31: Confusion matrix of the HEIG-VD-10cm-71k model on the 20 cm validation set. Figure 32: Confusion matrix of the HEIG-VD-mixed-145k model on the 20 cm validation set. Figure 33: Confusion matrix of the HEIG-VD-10cm-71k model on the 40 cm validation set. Figure 34: Confusion matrix of the HEIG-VD-mixed-145k model on the 40 cm validation set. Figure 35: Confusion matrix of the original HEIG-VD model on Extent 1. Figure 36: Confusion matrix of the original HEIG-VD model on Extent 2. Figure 37: Confusion matrix of the IGN model smp-unet-resnet34-imagenet_RVBI on Extent 1. Figure 38: Confusion matrix of the IGN model smp-unet-resnet34-imagenet_RVBI on Extent 2. Figure 39: Confusion matrix of the OFS model OFS_ADELE2(+SAM) on Extent 1."},{"location":"PROJ-SOILS/#10-bibliography","title":"10. Bibliography","text":"Pieter Poldervaart and Bundesamt f\u00fcr Umwelt BAFU \\textbar Office f\u00e9d\u00e9ral de l'environnement OFEV \\textbar Ufficio federale dell'ambiente UFAM. Bleibelastung: Schweres Erbe in G\u00e4rten und auf Spielpl\u00e4tzen. September 2020. URL: https://www.bafu.admin.ch/bafu/de/home/themen/thema-altlasten/altlasten--dossiers/bleibelastung-schweres-erbe-in-gaerten-und-auf-spielplaetzen.html (visited on 2024-01-04).\u00a0\u21a9
Christian Niederer. Schwermetallbelastungen in Hausg\u00e4rten in Freiburgs Altstadt (Kurzfassung), Studie im Auftrag des Amtes f\u00fcr Umwelt des Kantons Freiburg. Technical Report, BMG Engineering AG, 2015.\u00a0\u21a9
Davide Chicco and Giuseppe Jurman. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1):6, January 2020. URL: https://doi.org/10.1186/s12864-019-6413-7 (visited on 2024-02-05), doi:10.1186/s12864-019-6413-7.\u00a0\u21a9
Conseil f\u00e9d\u00e9ral suisse. Ordonnance sur les atteintes port\u00e9es aux sols. 1998. URL: https://www.fedlex.admin.ch/eli/cc/1998/1854_1854_1854/fr.\u00a0\u21a9
Pavel Iakubovskii. Segmentation Models Pytorch. 2019. Publication Title: GitHub repository. URL: https://github.com/qubvel/segmentation_models.pytorch.\u00a0\u21a9
Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. Convolutional neural networks: an overview and application in radiology. Insights into Imaging, 9(4):611\u2013629, August 2018. Number: 4 Publisher: SpringerOpen. URL: https://insightsimaging.springeropen.com/articles/10.1007/s13244-018-0639-9 (visited on 2024-04-08), doi:10.1007/s13244-018-0639-9.\u00a0\u21a9
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention Mask Transformer for Universal Image Segmentation. June 2022. arXiv:2112.01527 [cs]. URL: http://arxiv.org/abs/2112.01527 (visited on 2024-03-21), doi:10.48550/arXiv.2112.01527.\u00a0\u21a9
Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, and Shi-Min Hu. Attention Mechanisms in Computer Vision: A Survey. Computational Visual Media, 8(3):331\u2013368, September 2022. arXiv:2111.07624 [cs]. URL: http://arxiv.org/abs/2111.07624 (visited on 2024-04-08), doi:10.1007/s41095-022-0271-y.\u00a0\u21a9
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Doll\u00e1r, and Ross Girshick. Segment Anything. April 2023. arXiv:2304.02643 [cs]. URL: http://arxiv.org/abs/2304.02643 (visited on 2024-04-09).\u00a0\u21a9
Unknown. Arealstatistik Schweiz. Erhebung der Bodennutzung und der Bodenbedeckung. (Ausgabe 2019 / 2020). Number 9406112. Bundesamt f\u00fcr Statistik (BFS), Neuch\u00e2tel, September 2019. Backup Publisher: Bundesamt f\u00fcr Statistik (BFS). URL: https://dam-api.bfs.admin.ch/hub/api/dam/assets/9406112/master.\u00a0\u21a9\u21a9
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. March 2022. arXiv:2201.03545 [cs]. URL: http://arxiv.org/abs/2201.03545 (visited on 2024-03-21), doi:10.48550/arXiv.2201.03545.\u00a0\u21a9
Dirk Merkel. Docker: lightweight linux containers for consistent development and deployment. Linux journal, 2014(239):2, 2014.\u00a0\u21a9
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. December 2019. arXiv:1912.01703 [cs, stat]. URL: http://arxiv.org/abs/1912.01703 (visited on 2024-04-02), doi:10.48550/arXiv.1912.01703.\u00a0\u21a9
MMSegmentation Contributors. MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. 2020. URL: https://github.com/open-mmlab/mmsegmentation.\u00a0\u21a9\u21a9
Abdul Mueed Hafiz, Shabir Ahmad Parah, and Rouf Ul Alam Bhat. Attention mechanisms and deep learning for machine vision: A survey of the state of the art. June 2021. arXiv:2106.07550 [cs]. URL: http://arxiv.org/abs/2106.07550 (visited on 2024-04-09).\u00a0\u21a9
Adrian Meyer (FHNW) Contributions to Background & Agricultural Law: Pascal Salath\u00e9 (FHNW)
Proposed by the Canton of Thurgau - PROJ-TGOBJ March 2021 to June 2021 - Published on July 7, 2021
Abstract: The Cultivable agricultural area layer (\u00abLN\u00bb, Landwirtschaftliche Nutzfl\u00e4che) is a GIS vector product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of direct subsidy contributions by farms. The canton of Thurgau requested a spatial vector layer indicating the locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area, since areas used for silage bale storage are not eligible for subsidies. Having detections of such objects readily available greatly reduces the workload of the responsible official by directing the monitoring process to the relevant hotspots. Ultimately, economic damage to the public, which would result from the payout of unjustified subsidy contributions, can be prevented.
"},{"location":"PROJ-TGLN/#1-introduction","title":"1 Introduction","text":""},{"location":"PROJ-TGLN/#11-background","title":"1.1 Background","text":"Switzerland's direct payment system is the basis for sustainable, market-oriented agriculture. The federal government supports local farms in the form of various types of contributions and enables farming families to claim an adequate income. (cf. Art. 104 BV)
In the years 2014-2017 a new agricultural policy system was introduced in Switzerland. With specialized direct payment subsidies named \u00abLandscape Quality Contributions\u00bb (\u00abLQ\u00bb, Landschaftsqualit\u00e4tsbeitr\u00e4ge in German, Contributions \u00e0 la qualit\u00e9 du paysage in French) farms and agricultural businesses can be awarded for complying with measures that aim at increasing biodiversity and maintaining extensively cultivated open grasslands.
Subsidies are calculated by area and the agricultural offices of the respective cantonal administration have to constantly monitor the landscape status as well as the compliance of the business operations in order to approve the requested amounts. Only certain land usage profiles are eligible for subsidies payment.
According to Art. 104 \u00a71 BV, the agricultural sector, for its part, has to make a substantial contribution to:
In order to be able to claim direct payments, farms are subject to various conditions. The Cultivable agricultural area layer (\u00abLN\u00bb, from German Landwirtschaftliche Nutzfl\u00e4che) is a GIS product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of contributions. (cf. Art. 35 DZV).
The registration and adjustment of the LN is part of the periodic update (\u00abPNF\u00bb, Periodische Nachf\u00fchrung) within the framework of the official cadastral survey (\u00abAV\u00bb, Amtliche Vermessung) and is usually carried out every 6 years (Gamma 2021). Its correct determination is of immense importance, because if the LN area derived from the cadastral survey data deviates from the actual conditions on site, incorrect contribution amounts may be paid out (swisstopo/BLW/BUWAL 2000).
Farm areas that are not eligible for contributions, in particular areas that are not usable for effective agriculture such as farmyards or storage areas (e.g. for silage hay bales), are constantly changing due to the high degree of mechanization in agriculture and often fall within the perimeter of the LN. The tracking of these areas with conventional surveying, such as repeated field visits or the visual interpretation of current aerial imagery, proves to be very time-consuming and costly. This use case project explores possible alternative approaches.
Artificial neural networks based on Deep Learning (DL) have been used for the automated detection and classification of image features for quite some time. Reliable detection from aerial imagery using DL applications would enable the cost-effective detection of ineligible areas and provide added value to agricultural offices in all cantons.
The Swiss Territorial Data Lab (STDL) is a project of co-creation and a space of experimentation which aims to solve concrete problems of public administrations by using data science applied to geodata. These characteristics make it the perfect environment to conduct this project. Research in the agricultural domain was already led by the project's partners at Fachhochschule Nordwestschweiz (FHNW) using machine learning. Furthermore, students are regularly involved in these projects, for example to automatically define agricultural cultivation boundaries in collaboration with the Canton of Thurgau.
"},{"location":"PROJ-TGLN/#12-silage-bales","title":"1.2 Silage Bales","text":"Photo of wrapped and stacked silage hay bales (Source Wikimedia).
One of several features of interest specifically excluded from the subsidized cultivable LN area are silage hay bales. These bales are processed and compacted fermenting grass cuttings wrapped in plastic foil. They often measure roughly 1 - 2 cubic meters in volume and weigh around 900 kg. They are mainly used as animal food during winter when no fresh hay is available. Farmers are encouraged to compactly (\u00abdiscretely\u00bb) stack them in regular piles at few locations rather than keeping them in scattered collections consuming large areas.
The agricultural office can assess the silage bale stack locations and sizes in order to approve the application for subsidies, since areas where silage bales are stored do not count towards the cultivable LN area. Farmers can specify the areas for which they must not receive contributions in a specialized webGIS system by digitizing them manually with the attribute \u00abcode 898\u00bb. For validation purposes, specialists manually evaluate aerial imagery and conduct field visits. The process of aerial imagery evaluation is arduous and monotonous and could therefore greatly profit from automatization.
The agricultural office of the Canton of Thurgau (LWA) requested a spatial vector layer indicating the locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area. The delivered dataset should be compatible with their webGIS workflow and should be made available with new acquisitions of aerial imaging campaigns. Having such detections readily available would reduce the workload of the responsible official by directing the monitoring to the relevant hotspots. Ultimately, economic damage to the public, which would result from the payout of unjustified subsidy contributions, can be prevented. This project therefore aims at the development of an efficient silage bale detection algorithm which offers highly accurate performance and can be quickly deployed over imaged areas as large as the complete canton of Thurgau (approx. 992 km\u00b2).
"},{"location":"PROJ-TGLN/#2-method","title":"2 Method","text":""},{"location":"PROJ-TGLN/#21-overview","title":"2.1 Overview","text":"Sileage bale stacks are clearly visible on the newest 2019 layer of the 10cm Swissimage orthophoto provided by Swisstopo. A few hundred of these stacks were manually digitized as vector polygons with QGIS in a semi-automatic approach.
Following the structure of the STDL Object Detection Framework, an Area of Interest (AoI) was defined (most of the cantonal area of Thurgau) and tiled into smaller quadratic images (tiles). Those tiles containing an intersecting overlap with an annotation were subsequently fed to a neural object detection network for training in a process known as Transfer Learning. A random portion of the dataset was kept aside from the training process in order to allow an unbiased evaluation of the detector performance.
Multiple iterations were performed in order to find near-optimal input parameters such as tile size, zoom level, or the network- and training-specific variables termed \u00abhyperparameters\u00bb. All detector models were evaluated for their prediction performance on the reserved test dataset. The best model was chosen by means of its optimal overall performance.
This model was used in turn to perform a prediction operation (\u00abInference\u00bb) on all tiles comprising the AoI \u2013 thereby detecting silage hay bale stacks over the whole canton of Thurgau.
Postprocessing included filtering the resulting polygons by a high confidence score threshold, provided by the detector for each detection, in order to reduce the risk of false positive results (misidentification of an object as a silage bale stack). Subsequently, adjacent polygons on separate tiles were merged by standard vector operations. A spatial intersection with the known LN layer was performed to identify the specific areas occupied by silage stacks which should not receive contributions but potentially did in last year's rolling payout. Only stacks covering more than 50m\u00b2 of LN area are considered \u00abrelevant\u00bb for the final delivery, which translates to the equivalent of max. 10 CHF subsidy payment difference. For completeness, all LN-intersecting polygons of detections covering at least 20m\u00b2 are included in the final delivery. Filtering can easily be undertaken on the end user side by sorting the features along a precalculated area column.
"},{"location":"PROJ-TGLN/#22-aerial-imagery","title":"2.2 Aerial Imagery","text":"The prototypical implementation uses the publically available Swissimage dataset. It was last flown for Thurgau in spring 2019 and offers a maximum spatial resolution of 10cm GSD (Ground Sampling Distance) at 3 year intervals. As the direct subsidies are paid out yearly the periodicity of Swissimage in theory is insufficient for annual use. In this case the high quality imagery on the one hand can serve as a proof of concept though. On the other hand the cantons have the option to order own flight campaigns to increase the periodicity of available aerial imagery if sufficient need can shown from several relevant administrative stakeholders. For our approach aerial images need to be downloaded as small quadratic subsamples of the orthomosaic called \u00abtiles\u00bb to be used in the Deep Learning process. The used tiling grid system follows the slippy map standard with an edge length of 256 pixels and a zoom level system which is derived from a quadaratic division on a mercator-projected world map (whole world equals zoom level = 0). A zoom level = 18 in this system would roughly equal to a ground sampling distance (GSD) of 60 cm.
"},{"location":"PROJ-TGLN/#23-labels-annotations","title":"2.3 Labels / Annotations","text":"As no conducive vector dataset for silage bale locations exists in Thurgau or other sources known at this point, the annotations for this use case had to be created manually by the data scientists at STDL. A specific labeling strategy to obtain such a dataset was therefore implemented.
Using Swissimage 10cm as a WMS-bound basemap in QGIS, a few rural areas throughout the canton of Thurgau were selected and initially approximately 200 stacks of silage bales were manually digitized as polygons. Clearly disjunct stacks were digitized as two separate polygons. For partially visible stacks, only the visible parts were included. Loose collections of bales were connected into one common polygon if the distances between the single bales did not exceed the diameter of a single bale. Ground imprints where silage bales were previously stored were not included. Shadows on the ground were also not made part of the polygons. Plastic membrane rests were not included unless they seemed to cover additional bales. Most bales were of circular shape with an approximate diameter of 1.2 \u2013 1.5 m, but smaller rectangular ones were also common. Colours ranged from the most common white or green tints, over dark green or grey, to more exotic variants such as pink, light blue and yellow (the latter three are related to a specific cancer awareness program).
Image: Example of the annotation rules.
With these initial 200 annotations, a preliminary detector was trained at a relatively high zoom level (18, 60cm GSD, tiling grid at about 150m) and predictions were generated over the whole cantonal area (see section \u00abTraining\u00bb for details). Subsequently, the 300 highest-scoring new predictions (all above 99.5%) were checked visually in QGIS, precisely corrected and then transferred into the training dataset.
Image: Example of label annotations manually drawn (left and top), as well as semiautomatically generated (right) \u2013 the pixel structure of the detector is visible in the label.
All tiles containing labels were checked visually again at full zoom and missing labels were created manually. The resulting annotation dataset consists of approximately 700 silage bale stacks.
Image: Positions of the Silage Bale Labels (red) within the borders of Thurgau.
"},{"location":"PROJ-TGLN/#24-training","title":"2.4 Training","text":"Training of the model was performed with the STDL Object Detection Framework. The technology is based on a Mask RCNN architecture implemented with the High-Level API Detectron2 and the Deep Learning framework Pytorch. Parallelisation is achieved with CUDA-enabled GPUs on the High-Performance Computing cluster at the FHNW server facility in Muttenz. The Mask RCNN Backbone is formed by a ResNet-50 implementation and is accompanied by a Feature Pyramid Network (FPN). This combination of code elements results in a neural network leveraging more than 40 Mio. parameters. The dataset consists of RGB images and feature regions represented by pixel masks superimposing the imagery in the shape of the silage bale stack vectors.
Training is performed iteratively by presenting subsets of the tiled dataset to modify the \u00abedge weights\u00bb of the network graph. Progress is measured step by step by statistically minimizing the loss functions. Only tiles containing masks (labels) can be used for training. Two smaller subsets of all label-containing tiles are reserved from the training set (TRN), so that a total of 70% of the trainable tiles is presented to the network for loss minimization. The validation set (VAL, 15%) and the test set (TST, 15%) also contain labels but are statistically independent from the TRN set. The VAL set is used to perform recurrent evaluation during training. Training can be stopped once the loss function on the validation set has reached a minimum, since further training beyond that point would push the model into an overfitting scenario. The TST set serves as an unbiased reserve to evaluate the detector performance on previously \u00abunseen\u00bb but labelled data. Tiles not containing any label were classified into a separate class called \u00abother\u00bb (OTH); this dataset was only used for generating predictions.
Image: Dataset Split \u2013 Grey tiles are only used in prediction (OTH); they do not contain any labels during training. The colourful tiles contain labels, but are scattered relatively sparsely. Green tiles are used for training the model weights (TRN); orange tiles validate the learning progress during training to avoid overfitting (VAL) and blue tiles are reserved for unbiased post-training evaluation (TST).
Multiple training runs were performed not only to optimize the network-specific variables called \u00abhyper-parameters\u00bb (such as batch size, learning rate or momentum), but also to test which zoom level (spatial resolution) would yield the best results.
"},{"location":"PROJ-TGLN/#25-prediction-and-assessment","title":"2.5 Prediction and Assessment","text":"For the TRN, VAL and TST subset, confusion matrix counts and classification metrics calculations can be performed since they offer a comparison with the digitized \u00abground truth\u00bb. For all subsets (including the rest of the canton as OTH), predictions are generated as vectors covering those areas of a tile that the detector algorithm identifies as target objects and therefore attributes a confidence score.
In the case of the label-containing tiles, the overlap between the predictions and the labels can be checked. If an overlap is found between a label and a prediction, the detection is considered a \u00abTrue Positive\u00bb (TP). If the detector missed a label entirely, this label is considered a \u00abFalse Negative\u00bb (FN). If the detector predicted a silage bale stack that was not present in the labelled data, it is considered a \u00abFalse Positive\u00bb (FP). On the unlabelled OTH tiles, all detections are therefore by definition considered FP.
The counting of TPs, FPs and FNs on the TST subset allows the calculation of standard metrics such as precision (user accuracy), recall (producer accuracy) and the F1 score (a common overall performance metric calculated as the harmonic mean of precision and recall). The counts, as well as the metrics, can be plotted as a function of the minimum confidence score threshold (THR), which can be set to an acceptable percentage for a certain detection task. A low threshold should generally yield fewer FN errors, while a high threshold should yield fewer FP detections.
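A minimal sketch of this threshold sweep is given below; the per-detection confidence scores (scores_tp, scores_fp) and the label count are placeholder inputs, not the project's actual matching logic.

```python
import numpy as np

def metrics_vs_threshold(scores_tp, scores_fp, n_labels, thresholds):
    """Compute precision, recall and F1 for each confidence threshold."""
    rows = []
    for thr in thresholds:
        tp = int(np.sum(scores_tp >= thr))   # TPs surviving the threshold
        fp = int(np.sum(scores_fp >= thr))   # FPs surviving the threshold
        fn = n_labels - tp                   # labels left undetected
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((thr, precision, recall, f1))
    return rows

# Toy usage with made-up scores: 3 matched detections, 1 false positive,
# 4 ground truth labels.
for thr, p, r, f1 in metrics_vs_threshold(
        np.array([0.99, 0.97, 0.85]), np.array([0.60]), 4, [0.5, 0.9, 0.96]):
    print(f"THR={thr:.2f}  precision={p:.2f}  recall={r:.2f}  F1={f1:.2f}")
```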
The best performing model by means of maximum F1 score was used to perform a prediction run over the entire cantonal surface area.
"},{"location":"PROJ-TGLN/#26-post-processing","title":"2.6 Post-Processing","text":"In order to obtain a consistent result dataset, detections need to be postprocessed. Firstly, the confidence score threshold operation is applied. Here, a comparatively high threshold can be used for this operation. \u00abMissing\u00bb the detection of a silage bale stack (FN) is not as costly for the analysis of the resulting dataset at the agricultural office as analyzing large numbers of FP detections would be. Also missing single individual silage bales is much less problematic than missing whole large stacks. These larger stacks are typically attributed with high confidence scores though and are therefore less likely to be missed.
In some cases, silage bale stacks cross the tiling grid and are therefore detected on multiple images. This results in edge artifacts along the tile boundaries intersecting detections that should be unified. For this reason, adjacent detection polygons need to be merged into a single polygon. This is achieved by first buffering all detections with a 1.5m radius (about the diameter of a single bale). Then all touching polygons are dissolved into a single feature. Afterwards, negative buffering with a -1.5m radius is applied to restore the original boundary. This process also leads to an edge smoothing by planing the pixel-step-derived vector boundary into curves.
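This buffer-dissolve-negative-buffer merge can be sketched with geopandas and shapely as follows; the input file name is a placeholder.

```python
import geopandas as gpd
from shapely.ops import unary_union

detections = gpd.read_file("detections.gpkg")  # placeholder path

# 1) Grow every detection by ~one bale diameter so adjacent polygons touch.
grown = detections.geometry.buffer(1.5)

# 2) Dissolve all touching polygons into single features.
merged = unary_union(list(grown))

# 3) Shrink back to restore the original outline (this also smooths the
#    pixel-step boundary into curves).
result = gpd.GeoDataFrame(
    geometry=gpd.GeoSeries([merged], crs=detections.crs).buffer(-1.5)
).explode(index_parts=False)
```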
Image: Example of adjacent detection polygons that need to be unified (buffer dissolved).
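A minimal sketch of this buffer-dissolve-negative-buffer step, assuming the detections are available as a GeoDataFrame in a metric CRS (variable names are illustrative):

```python
# Sketch of the merge step described above: positive buffer, dissolve
# touching polygons, then negative buffer to restore the footprint.
import geopandas as gpd

BUFFER_M = 1.5  # roughly the diameter of a single silage bale

grown = detections.copy()
grown["geometry"] = grown.geometry.buffer(BUFFER_M)

# unary_union dissolves all touching/overlapping polygons into one geometry,
# which is then exploded back into single-part features.
merged = gpd.GeoDataFrame(geometry=[grown.unary_union], crs=detections.crs)
merged = merged.explode(index_parts=False).reset_index(drop=True)

# The negative buffer restores (approximately) the original outline and,
# as a side effect, smooths the pixel-step boundary into curves.
merged["geometry"] = merged.geometry.buffer(-BUFFER_M)
```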
Curve polygons contain a high number of vertices, which is why a simplification operation can be performed afterwards. The intersection with the LN layer required a preparation of that dataset. First, the perimeters of all LN polygons in Thurgau, stemming from the cadastre, were intersected with the layer "LN difference". Areas carrying the attribute "No LN" in the difference layer were removed; areas with the attribute "LN" or "To be checked" were kept or, if not yet available, added to the LN dataset. Areas excluded from the subsidy by the farmers themselves (so-called "layer code 898") were removed from the LN polygons. The silage bale detections were then intersected (clipped) with all remaining LN areas, such that only those portions of the detections remained that lie within the LN perimeter. For each of these remaining detection polygons, the area is calculated and added as an attribute. With a threshold operation, all silage bale stacks with an area below 20 m2 are filtered out of the dataset in order to deliver only economically relevant detections.
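The clipping and relevance filter could look as follows, assuming `merged` holds the post-processed detections and `ln` the remaining LN polygons in the same metric CRS; both variable names are illustrative:

```python
# Sketch of the final clipping and relevance filter.
import geopandas as gpd

# Keep only those detection portions lying inside the LN perimeter.
clipped = gpd.overlay(merged, ln, how="intersection")

# Attach the consumed area and drop economically irrelevant pieces (< 20 m2).
clipped["area_m2"] = clipped.geometry.area
relevant = clipped[clipped["area_m2"] >= 20.0]
```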
"},{"location":"PROJ-TGLN/#3-results","title":"3 Results","text":""},{"location":"PROJ-TGLN/#31-metrics-and-evaluation","title":"3.1 Metrics and Evaluation","text":"Figure: Performance of serveral detectors depending on zoom level (ground sampling distance) as measured by their maximum F1-Score.
The model trained with tiles at zoom level 19 (approx. 30cm GSD per pixel) showed the highest performance, with a maximum F1 score of 92.3%. Increasing the resolution even further to 15 cm/px GSD did not yield a gain in overall detection performance, while drastically increasing storage needs and computational load.
Figure: Confusion matrix counts on the TST dataset as a function of the minimum confidence score threshold.
The detector model performs very well on the independent TST dataset, detecting the largest portion of silage bale stacks at any given confidence threshold. The number of FPs drops to very low counts towards the higher end of the threshold range.
Figure: Performance metrics on the TST dataset as a function of the minimum confidence score threshold.
Precision, recall and F1 score all remain at very high values throughout the threshold range. The F1 score plateaus above 90% for thresholds between 5% and 93%, essentially allowing any threshold in this range to be chosen to adapt the model to the end user's needs.
For delivery of the dataset, the detector was subsequently used at a threshold of 96%. At this value, 809 silage bale stacks were rediscovered in the TRN, TST and VAL subsets, with just 10 FP detections; 97 silage bale stacks were not rediscovered (FN). Hence, the model precision (user accuracy) amounts to 809/(809+10) \u2248 99% and the recall (hit rate, producer accuracy) to 809/(809+97) \u2248 89%.
The applied model detected a total of 2\u2019473 additional silage bale stacks over the rest of the canton of Thurgau (FP on OTH).
"},{"location":"PROJ-TGLN/#32-examples","title":"3.2 Examples","text":"Image: Raw detections (yellow) of silage bale stacks displaying very high confidence scores.
Image: Raw detections (yellow) and post-processed detections (red) \u2013 the area occupied by these silage bale stacks does not intersect with the cultivable land (LN, green hatched). Direct subsidies are correctly paid out in this case.
"},{"location":"PROJ-TGLN/#33-relevant-features-for-delivery","title":"3.3 Relevant Features for Delivery","text":"In total, 288 silage bale stack sections are placed within the subsidized LN area and exhibit an area consumption larger than 20m\u00b2. 87 silage bale stacks consume more than 50m\u00b2, 24 stacks consume more than 100m\u00b2. One has to keep in mind that many stacks only partially intersect with the LN layer. The overlap between all detected silage bale stacks over 20m\u00b2 and the LN layer amounts to 14\u2019200m\u00b2 or an estimated damage between CHF 1'420.- and CHF 2'840.- (assuming the subsidy payout ranges between CHF 10.- and CHF 20.- per 100m\u00b2). Considering only the overlap of the 87 largest stacks with the LN layer the area consumption amounts to 7\u2019900m\u00b2 or a damage between CHF 790.- and CHF 1'580.-.
Image: Undeclared silage bale stack (red and yellow) that intersects with the cultivable land layer \u00abLN\u00bb (green).
Image: The silage bale stack on the left (red) is only touching the LN area (green). The silage bale stack at the bottom centre is completely undeclared within the LN area.
Image: Approximately half of the center silage bale stack (red) is undeclared and situated within the LN area.
Image: This farm self-declared almost all areas needed (blue) for silage bales (red) to be excluded from direct subsidy areas (green). Pink areas are already pre-excluded by the agricultural office.
Image: The intersection between the silage bale stack (red) and the LN area (green) is so minute that it should not appear in the dataset delivered to the agricultural office.
Image: Small silage bale stacks at the far left and far right of the image (yellow) are undeclared, but each detection falls below the relevance threshold.
"},{"location":"PROJ-TGLN/#4-discussion","title":"4 Discussion","text":""},{"location":"PROJ-TGLN/#41-feedback-by-the-agricultural-office","title":"4.1 Feedback by the Agricultural Office","text":"The contact person at the agricultural office, Mr. T. Froehlich describes the detections as very accurate with a very low percentage of wrong detections. As a GIS product the detections layer can be used in the standard workflow in order to cross-check base datasets or to perform updates and corrections.
On an economic scale, the damage from misplaced silage bale stacks in the LN areas is not negligible, but not extremely relevant either. Federal annual direct agricultural subsidies of approx. 110 Mio. CHF stand in stark contrast to the estimated economic damage of roughly CHF 2'000.- that misplaced silage bales might have caused for the Canton of Thurgau in 2019.
Most farmers adhere to the policies, and false declaration of areas followed by sanctions is extremely rare. Silage bales are therefore not the first priority when monitoring updates to the LN layer. Nevertheless, these new detections allow the end users at the agricultural office to direct their attention more quickly to relevant hotspots and spare them part of the long and tedious manual search performed in the past.
"},{"location":"PROJ-TGLN/#42-outlook","title":"4.2 Outlook","text":"Silage bales are by far not the only object limiting the extent of the cultivable subsidized land. A much larger area is consumed by farm yards \u2013 heterogenous spaces around the central farm buildings. Monitoring the growth of these spaces into the LN layer would greatly diminuish the manual workload at the agricultural office. As these spaces might also be detectable by a similar approach, this project will continue to investigate the potential of the STDL Object Detection Framework now into this direction.
"},{"location":"PROJ-TGLN/#references","title":"References","text":"Federal Office of Topography swisstopo (2020). SWISSIMAGE 10 cm - The Digital Color Orthophotomosaic of Switzerland. https://www.swisstopo.admin.ch/en/geodata/images/ortho/swissimage10.html
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440-1448). https://openaccess.thecvf.com/content_iccv_2015/html/Girshick_Fast_R-CNN_ICCV_2015_paper.html
He, K., Gkioxari, G., Doll\u00e1r, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969). https://arxiv.org/abs/1703.06870
OpenStreetMap Foundation (2021). Slippy Map. https://wiki.openstreetmap.org/wiki/Slippy_Map
QGIS.org (2021). QGIS Geographic Information System. QGIS Association. https://qgis.org/en/site/
Wu, Y., Kirillov, A., Massa, F., Lo, W. Y., & Girshick, R. (2019). Detectron2. https://github.com/facebookresearch/detectron2
Adrian Meyer (FHNW) - Alessandro Cerioni (Canton of Geneva)
Proposed by the Canton of Thurgau - PROJ-TGPOOL January 2021 to April 2021 - Published on April 21, 2021
Abstract: The Canton of Thurgau entrusted the STDL with the task of producing swimming pool detections over the cantonal area. Of particular interest was leveraging the ground truth annotation data from the Canton of Geneva to generate a predictive model for Thurgau, using the publicly available SWISSIMAGE aerial imagery provided by swisstopo. The STDL object detection framework produced highly accurate predictions of swimming pools in Thurgau and thereby proved transferability from one canton to another without having to manually re-digitize annotations. These promising detections showcase the potential of this approach to greatly reduce repetitive manual labour.
"},{"location":"PROJ-TGPOOL/#introduction","title":"Introduction","text":"Until February 2021 the Swiss Territorial Data Lab developed an approach based on Mask RCNN Deep Learning algorithms for the detection of objects on aerial images, with swimming pools serving as a demonstration object. The official cadastres of the Canton of Thurgau include \u2013 among many other objects \u2013 the registration of larger private swimming pools that are permanently anchored in the ground.
The challenge is to keep the cadastre up to date on a regular basis, which is usually done manually by surveying or by verification against aerial imagery. Because the Canton of Thurgau (unlike the Canton of Geneva) does not maintain its own specific register of swimming pools, this study primarily serves as a technology demonstration.
A secondary goal encompasses detailed knowledge transfer from the data scientist team at the STDL to the cantonal authorities such as providing insight and interpretation guidance into the performance metrics and raising awareness for the prerequisites of the detector framework.
"},{"location":"PROJ-TGPOOL/#methodology","title":"Methodology","text":""},{"location":"PROJ-TGPOOL/#process-overview","title":"Process Overview","text":"Generating a Model from Cadastral Vectors and Aerial Images to Predict Objects in the Same or a New Area of Interest (AoI).
The STDL object detection framework is based on a bipartite approach of training and inference. This means that a predictive model is statistically adapted to known and verified data (\"training\") in order to then generate classification predictions on new, unknown data (\"inference\"). To achieve this we resample large high-resolution orthophoto mosaics by decomposing them into small square image tiles on which vectorized annotations of swimming pools are drawn.
Verified vector annotation data (\"ground truth\") for the training process was available for the cantonal area of Geneva, as well as for a smaller part of the cantonal area of Neuch\u00e2tel covering a total of almost 5'000 swimming pools present in 2019.
The predictive model used is a convolutional neural network developed for computer vision (Mask RCNN). It was trained on a high performance computing cluster at the University of Applied Sciences Northwestern Switzerland FHNW using the open source Detectron2 object detection library.
During inference, pixel-precise vector contours (\u201csegments\u201d) are produced over the tiled imagery of the canton of Thurgau. Each segment is attributed a confidence score which indicates the certainty of the detections when applied to new data. Using this score as a threshold level, performance metrics are computed in post-classification assessment.
"},{"location":"PROJ-TGPOOL/#ground-truth-dataset","title":"Ground Truth Dataset","text":"Label annotations are derived from cadastral data and manually curated
Vector ground truth annotations demarcating private swimming pools were available at two locations: a near-complete coverage of the cantonal area of Geneva containing 4\u2019652 known objects, and a smaller subsection of the cantonal area of Neuch\u00e2tel containing 227 known objects. Label annotations in both cases are derived from cadastral surface vector datasets and then manually curated and verified. In the case of the Geneva dataset, the manual verification was performed by STDL data scientists in a previous study; in the case of the Neuch\u00e2tel dataset, it was performed by the local cadastre experts.
"},{"location":"PROJ-TGPOOL/#reference-data-and-area-of-interest","title":"Reference Data and Area of Interest","text":"Approximately 5000 cross checked swimming pool annotations are available as vectorized shapes in the Cantons of Geneva and partially in Neuch\u00e2tel. They are compatible with orthophotos from 2018/19 such as the latest SWISSIMAGE 10cm layer.
The Area of Interest (AoI) for all tests conducted in this study is divided into two main sections:
Those areas in Geneva and Neuchatel containing vectorized ground truth labels are used as \u201cTraining AoI\u201d.
The cantonal area of Thurgau is used as \u201cPrediction AoI\u201d.
Only those parts of the cantonal surface of Thurgau are used as Prediction AoI which are designated as relevant settlement areas. For this purpose the STDL has received two additional reference datasets from the canton of Thurgau:
Vector layer: List of all water basins from the official survey; 3'131 objects.
Vector layer: Settlement areas / construction zones to delimit the study area.
2\u2019895 objects from the water basin layer are located wholly or partially within the \u201cPrediction AoI\u201d; only these objects were used for analysis (see Figure 4, light green objects). For each grid square, an image file of 256x256 pixels edge length and 60cm GSD was generated via WMS; metadata and georeferencing were stored in an associated JSON file. A quick qualitative review of the Thurgau datasets in QGIS revealed two limitations.
About 7.5% of the water basins are not located in the selected settlement area (e.g. on remote farmsteads or in mixed industrial/commercial zones), so no detection attempt was initially undertaken for areas encompassing these objects. It is important to note that the water basin layer contains some objects that are not comparable to private swimming pools in shape or size, such as large public swimming pools, but also sewage treatment plants, silos, tanks, reservoirs and retention dams. By limiting the Prediction AoI to residential areas and adjacent land, the largest portion of these objects could be excluded.
Example of a water treatment plant that appears in the \u201cwater basin layer\u201d and had to be excluded by limiting the \u201cPrediction AoI\u201d to residential and adjacent areas.
To additionally calculate metrics on the quality of this reference dataset vs. the quality of the detections a small area over the city of Frauenfeld (Thurgau) containing approximately 100 swimming pools was manually curated and verified by the STDL data scientists.
"},{"location":"PROJ-TGPOOL/#orthocorrected-imagery","title":"Orthocorrected Imagery","text":"Orthoimagery tiles of 150m/256px edge length containing labelled annotations
Both AoIs are split by a regular checkerboard segmentation grid into squares (\u201ctiles\u201d), making use of the \u201cSlippy Map Tiles\u201d quadtree-style system. The image data used here was tested with different zoom level resampling resolutions (Ground Sampling Distance, GSD) between 30 cm and 480 cm edge length per pixel, while maintaining a consistent extent of 256x256 pixels. The imagery was queried from public web map services using common protocols such as WMS or the MIL standard.
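The Slippy Map arithmetic behind this grid is standard (see the OpenStreetMap reference at the end of this section): at zoom level z, the Web Mercator world is divided into 2^z x 2^z tiles, so each zoom step quadruples the tile count. A minimal sketch converting a tile index into its WGS84 bounds:

```python
# Standard Slippy Map tile arithmetic: tile index (x, y, zoom) -> WGS84 box.
import math

def tile_to_lonlat(x, y, z):
    n = 2 ** z
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lon, lat

def tile_bounds(x, y, z):
    west, north = tile_to_lonlat(x, y, z)          # top-left corner
    east, south = tile_to_lonlat(x + 1, y + 1, z)  # bottom-right corner
    return west, south, east, north

print(tile_bounds(x=34430, y=23049, z=16))  # an arbitrary illustrative tile
```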
Three separate imagery sources were used over the course of the study. The 10cm GSD RGB orthophotomosaic layer SWISSIMAGE of Swisstopo was the primary target of investigation as it was used as the basis of prediction generation over the cantonal area of Thurgau. SWISSIMAGE was also used as the imagery basis for most of the training test runs over the ground truth areas of Geneva and Neuchatel. Additionally, a model was trained leveraging combined cantonal orthophoto imagery from Geneva (SITG) and Neuchatel (SITN) to comparatively test the prediction performance of such a model on the unrelated SWISSIMAGE inference dataset in Thurgau.
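Tile queries of this kind can be sketched with the owslib WMS client; the endpoint URL and layer name below are placeholders rather than the actual services used in the study, and the bounding box illustrates the 60cm GSD / 256 px setup mentioned earlier:

```python
# Hedged sketch of one 256x256 px WMS request; endpoint and layer name are
# placeholders, not the actual services used in the project.
from owslib.wms import WebMapService

xmin, ymin = 2709000.0, 1267000.0   # illustrative LV95 coordinates
xmax, ymax = xmin + 153.6, ymin + 153.6  # 153.6 m / 256 px = 60 cm GSD

wms = WebMapService("https://wms.example.ch/service", version="1.3.0")  # placeholder URL
img = wms.getmap(
    layers=["ortho_layer"],         # placeholder layer name
    srs="EPSG:2056",                # Swiss projected CRS (assumption)
    bbox=(xmin, ymin, xmax, ymax),
    size=(256, 256),
    format="image/png",
)
with open("tile.png", "wb") as fp:
    fp.write(img.read())
```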
As known from the STDL\u2019s previous work, tiles with a GSD of ~60cm/Px (tile zoom level 18) offer a decent tradeoff between reaching high accuracies during training and keeping the computational effort manageable; this setting was therefore used for the test with the cantonal imagery of Geneva and Neuch\u00e2tel.
Using SWISSIMAGE for training, zoom levels in a range between 15 (~480 cm/Px) and 19 (~30 cm/Px) were tested.
"},{"location":"PROJ-TGPOOL/#training","title":"Training","text":""},{"location":"PROJ-TGPOOL/#transfer-learning","title":"Transfer Learning","text":"The choice of a relevant predictive approach fell on a \u201cCOCO-pretrained\u201d deep learning model of the type \"ResNet 50 FPN\" structured in a \u201cMask-RCNN\u201d architecture and implemented with Python and the Detectron2 API. In a transfer learning process about 44 million trainable statistical parameters are adapted (\u201cfinetuned\u201d) as edge weights in a pretrained neural network graph through a number of iterations trying to minimize the value of the so-called \u201closs function\u201d (which is a primary measure for inaccuracy in classification).
Transfer Learning is common practice with Deep Learning models. The acquired knowledge gained from massive datasets allows an adaptation of the model to smaller new datasets.
Training is performed through highly multithreaded GPU parallelisation of the necessary tensor/matrix operations to shorten training duration. For this purpose, the vector annotations are converted into per-pixel binary masks aligned with the respective input image.
Network- or training-specific pre-set variables (\u201chyperparameters\u201d) such as learning rate, learning rate decay, optimizer momentum, batch size or weight decay were either used in their standard configuration or manually tuned in an iterative fashion until comparatively high accuracies (e.g. in terms of the F1 score) could be reached. More systematic approaches such as hyperparameter grid search or advanced (e.g. Bayesian) optimization strategies could be implemented in follow-up studies. A hedged configuration sketch follows below.
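A sketch of such a transfer-learning setup with the Detectron2 API; the dataset names and hyper-parameter values below are illustrative assumptions, not the values used in this study:

```python
# Hedged sketch: fine-tune a COCO-pretrained Mask R-CNN R50-FPN on a single
# "swimming pool" class with Detectron2. Assumes the tile datasets have been
# registered beforehand under the (illustrative) names used here.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # pretrained weights
cfg.DATASETS.TRAIN = ("pools_trn",)   # assumed registered dataset names
cfg.DATASETS.TEST = ("pools_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1   # single class: swimming pool
cfg.SOLVER.IMS_PER_BATCH = 4          # illustrative hyper-parameter values
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 10000

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```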
"},{"location":"PROJ-TGPOOL/#dataset-split","title":"Dataset Split","text":"Tiles falling into the \u201cTraining AoI\u201d but not exhibiting any intersecting area with the Ground Truth Labels are discarded. The remaining ground truth tile datasets are randomly sampled into three disjunct subsets:
The \u201cTraining Subset\u201d consists of 70% of the ground truth tiles and is used to change the network graph edge weights.
The \u201cValidation Subset\u201d consists of another 15% of the ground truth tiles and is used to validate the generalization performance of the network during training. The iteration cycling is stopped when the loss on the validation dataset is minimized.
The \u201cTest Subset\u201d consists of the last 15% of the ground truth tiles and is entirely withheld from the training process to allow for an independent and unbiased assessment in the post-processing.
Subdivision of Ground Truth Datasets
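A minimal sketch of this random split, assuming `tiles` holds the remaining ground truth tiles as a (Geo)DataFrame:

```python
# Minimal sketch of the random 70/15/15 split of the ground truth tiles.
shuffled = tiles.sample(frac=1.0, random_state=42).reset_index(drop=True)

n = len(shuffled)
trn = shuffled.iloc[: int(0.70 * n)]               # 70%: adapts the edge weights
val = shuffled.iloc[int(0.70 * n): int(0.85 * n)]  # 15%: monitors generalization
tst = shuffled.iloc[int(0.85 * n):]                # 15%: unbiased post-training assessment
```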
"},{"location":"PROJ-TGPOOL/#inference-and-assessment","title":"Inference and Assessment","text":"After training, tile by tile the entire \u201cPrediction AoI\u201d as well as the ground truth datasets presented to the final model for prediction generation. From a minimum confidence threshold up to 100% the model produces a segmentation mask for each swimming pool detection delimiting its proposed outer boundary. This boundary can be vectorized and transformed back from image space into map coordinates during post-processing. Through this process we can accumulate a consistent GIS-compatible vector layer for visualization, counting and further analysis.
In the case of the ground truth data, the resulting vector layer can be intersected with the original input data (especially the \u201cTest Subset\u201d) to obtain unbiased model performance metrics. In the case of a well-performing model, the resulting vector layer can then be intersected with the \u201cPrediction AoI\u201d-derived Thurgau dataset to identify missing or surplus swimming pools in the cadastre.
"},{"location":"PROJ-TGPOOL/#results","title":"Results","text":""},{"location":"PROJ-TGPOOL/#metrics-and-model-selection","title":"Metrics and Model Selection","text":"Results of different training runs using SWISSIMAGE depending on the chosen zoom level
The choice of a suitable confidence threshold ("THR") is of central importance for the interpretation of the results. The division of a dataset into true/false positives/negatives is a function of the confidence threshold. A high threshold keeps only detections the model is very confident about; a low threshold misses as few detections as possible, but at the same time triggers more false positive ("FP") detections.
There are several standardized metrics to evaluate model performance on unknown data. The most important are "Precision" (user accuracy), "Recall" (hit rate or producer accuracy) and the "F1 Score" (the harmonic mean of the other two). "Precision" should increase with higher THR, "Recall" should decrease. The maximum F1 score can be used as a measure of how well the model performs independently of the chosen threshold.
Using the cantonal orthomosaics as training input at zoom level 18, the F1 score reached a maximum of 81.0%. Using SWISSIMAGE as training input at zoom level 18, a slightly higher maximum F1 score of 83.4% was achieved, resulting in the choice of an \u201conly SWISSIMAGE\u201d approach for both training and inference.
The best detection performance in terms of maximum F1 score was reached using tiles at zoom level 19, displaying a GSD of approx. 30 cm/Px. Since the Slippy Map tile system is based on an equal division of squares, increasing the zoom level by one step roughly quadruples the number of tiles presented for analysis. Hence, computational demand grows exponentially with the zoom level, in particular for file system read/write and sequential processing operations.
On the other hand, increasing the zoom level (and thereby refining the GSD) also boosts the visibility and apparent size of the target objects, which in turn increases detection accuracy. The comparatively slight increases in F1 score between zoom levels 17, 18 and 19 suggest an asymptotic behaviour, where massively higher amounts of computing resources will no longer result in much higher detection accuracy. Zoom level 20 (GSD ~15cm/Px) was not computed for this reason.
"},{"location":"PROJ-TGPOOL/#true-positives","title":"True Positives","text":"A detection is considered \"True Positive\" (TP) if the algorithm detected a pool that was listed at the same position in the cadastral layer. Setting the threshold very low (THR \u2265 5%), 2'227 of 2\u2019959 swimming pools were detected. This corresponds to a detection proportion of 75% of the recorded water pools. Conversely, this could mean that 25% or 732 objects are False Negatives and therefore \"erroneously\" recorded in the cadastre as swimming pools or missed by the algorithm.
\u201cTrue Positive\u201d detections \u2013 note that even empty and covered swimming pools are detected with a very high confidence score in this example.
"},{"location":"PROJ-TGPOOL/#false-negatives","title":"False Negatives","text":"FN describe those objects that the algorithm completely failed to detect, no matter what threshold is set. A total of 732 objects were not detected. FN easily occur when there are obvious discrepancies between orthophoto and cadastre - for example, a pool may have been constructed after the time of flight.
The combined number of FN and TP corresponds to the number of analyzed labels from the water pool layer (2\u2019959 objects). Due to the splitting of pools at the segmentation grid boundaries, this value is slightly higher than the 2\u2019895 objects in the \u201cPrediction AoI\u201d. Here, only objects larger than 5m\u00b2 in area were counted, since the segmentation grid cuts some pools into several parts: tiny residual polygons of only a few pixels in total area might otherwise be counted as FN even though the largest part of a swimming pool was detected (and therefore counted as TP).
\u201cFalse Negatives\u201d \u2013 (Left) An obvious mismatch between the cadastre and the orthophoto, an update should be considered. (Right) An ambiguous swimming pool which might be covered by a white canvas and was therefore missed by the detector.
"},{"location":"PROJ-TGPOOL/#false-positives","title":"False Positives","text":"Swimming pools that were recognized as such in the orthophoto but are not found in the cadastre represent the FP group. If the threshold is set very low (e.g. THR \u2265 5%), a total of 9'427 additional pools would be found in the settlement area. However, this number is not realistic, since most of the detections at such a low threshold do not correspond to pools, but only mark image areas that are related to a pool in a very distant way.
Therefore, to get a better estimation of objects that really represent private pools but are still missing in the cadastre, the choice of a very high threshold is recommended. For example, the geoinformation services of the Canton of Geneva work with a threshold of THR \u2265 97%. Applying this threshold, 271 unrecorded swimming pools remain in the dataset with an extremely high probability of correct redetection (9% of the cadastre).
However, it is still worth looking at slightly less likely FP detections with a threshold of THR \u2265 90% here. Filtering with this value, a total of 672 unregistered swimming pools were found, which would correspond to 23% of the cadastre layer. At the same time the risk for clear errors by the object detector also increases at lower thresholds, leading to some misclassifications.
\u201cFalse Positive\u201d detections \u2013 (Top) Two clear examples of detected swimming pools that are missing in the cadastre. (Bottom Left) More ambiguous examples of detected swimming pools which might be missing in the cadastre. (Bottom Right) A clear error of the detector misclassifying a photovoltaic installation as a swimming pool.
"},{"location":"PROJ-TGPOOL/#conclusion","title":"Conclusion","text":""},{"location":"PROJ-TGPOOL/#manual-evaluation","title":"Manual Evaluation","text":"In the city of Frauenfeld a sample district was chosen for manual evaluation by a STDL data scientist. Even though this task should ideally be performed by a local expert this analysis does provide some insight on the potential errors currently existing within the cadastre as well as the object detection quality. Within the sampled area a total of 99 identifiable swimming pool objects were found to be present.
Table: Manually evaluated dataset accuracy vs. detector performance comparison. Green indicates the preferred value.
Overall, the STDL detector was more accurate than the provided dataset, with an F1 score of ~90% vs. ~87%. In particular, far fewer swimming pools were missing in the detections (5 FN) than in the cadastre (18 FN). Room for improvement exists with the false positives: our detector identified 16 surplus objects as potential swimming pools, which could be falsified manually, whereas only 9 surplus objects were found in the cadastre.
"},{"location":"PROJ-TGPOOL/#interpretation","title":"Interpretation","text":"We can conclude that the use of annotation data gathered in another canton of Switzerland allows for highly accurate predictions in Thurgau using the freely and publicly available SWISSIMAGE dataset. We demonstrate that such a transferrable approach can therefore be applied within a relatively short time span to other cantons without the effort of manually digitizing objects in a new area. This is supported by the assumption that SWISSIMAGE is of the same consistent radiometrical and spatial quality we see in Thurgau over the whole country.
Manual evaluation will remain paramount before authorities, for example, take legal action or perform updates and changes to the cadastre. Nevertheless, a great amount of workload reduction can be achieved by redirecting the eyes of the experts to the detected or undetected areas that are worth looking at.
"},{"location":"PROJ-TGPOOL/#references","title":"References","text":"Federal Office of Topography swisstopo (2020). SWISSIMAGE 10 cm - The Digital Color Orthophotomosaic of Switzerland. https://www.swisstopo.admin.ch/en/geodata/images/ortho/swissimage10.html
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440-1448). https://openaccess.thecvf.com/content_iccv_2015/html/Girshick_Fast_R-CNN_ICCV_2015_paper.html
He, K., Gkioxari, G., Doll\u00e1r, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969). https://arxiv.org/abs/1703.06870
OpenStreetMap Foundation (2021). Slippy Map. https://wiki.openstreetmap.org/wiki/Slippy_Map
QGIS.org (2021). QGIS Geographic Information System. QGIS Association. https://qgis.org/en/site/
Wu, Y., Kirillov, A., Massa, F., Lo, W. Y., & Girshick, R. (2019). Detectron2. https://github.com/facebookresearch/detectron2
Nils Hamel (UNIGE) - Huriel Reichel (FHNW) Supervision : Roxane Pott (swisstopo)
Project in collaboration with Geneva and Neuch\u00e2tel States - TASK-TPNL July 2021 to February 2022 - Published on February 22, 2022
Abstract: The deployment of renewable energy is becoming a major stake in light of the challenges our societies face. It requires authorities and domain experts to promote and to demonstrate the deployment of such energy solutions. In the case of thermal panels, policy makers ask domain experts to certify, year after year, the amount of deployed panel surface. Facing this challenge, this project aims to determine to what extent data science can ease the survey of thermal panel installations and lighten the work of the domain experts.
"},{"location":"PROJ-TPNL/#introduction","title":"Introduction","text":"For authorities, being able to track the deployment of renewable energy is becoming a major challenge in front of stakes of our societies. In addition, following the deployment of installations on territory is difficult, as construction permits are not sufficient evidences. Indeed, the construction permits materialize a will, but the actual deployment and its specifications can differ from paperwork to reality. In case of thermal panels, domain experts are then put in front of a major challenge, as they have to certify of the surface of solar thermal energy that is deployed and active on their territory on a regular basis. This reporting is made for politics that aim to deploy a certain amount of renewable energy, part of territories energetic politic.
Mainly based on paperwork, the current survey of thermal panel deployment suffers from drawbacks. Indeed, it is currently complicated to determine whether a construction permit led to the deployment of a thermal panel installation and whether this installation is still active. The goal of this project is to determine whether data science can provide new solutions for the survey of thermal energy production, in order to report more accurate surface values to policy makers.
"},{"location":"PROJ-TPNL/#research-project-specification","title":"Research Project Specification","text":"In this project, the goal is to determine whether it is possible to track down thermal panels installation on territory by using aerial images and deep learning methods. The main axis are :
Train a deep learning model on aerial images to detect thermal panels
Assess the performance of the deep learning model
Determine to what extent it is possible to link the predictions to the existing domain expert database
This research project was conducted in collaboration with the States of Neuch\u00e2tel and Geneva. Both domain experts face similar challenges and their needs are nearly identical, even though their current processes differ. For each collaboration the goals are similar, but the methodology is different: with Neuch\u00e2tel, the domain expert database is taken into account, while with Geneva it is not.
Considering the database in the collaboration with Neuch\u00e2tel leads to a much larger amount of work, as the database needs to be pre-processed before it can be put into perspective with the deep learning network results. It is nevertheless important to assess the possibility of inserting our demonstrator into the existing procedures used by the domain expert to track thermal panel installations.
"},{"location":"PROJ-TPNL/#research-data-selected-areas","title":"Research Data & Selected Areas","text":"As mentioned, the best (and probably the only) solution to track down thermal panels is to use aerial images. Indeed, due to their nature, thermal panels are always visible on aerial images. Exceptions to this rule are unusual. In addition, aerial images are acquired regularly and a full set of orthomosaic can be easily obtained each five years (at least in Switzerland). For Geneva and Neuch\u00e2tel, it is not impossible to obtain a set of images each two years.
Nevertheless, using aerial images comes with drawbacks, the main one being resolution (GSD). Aerial image sets used to compose orthomosaics are acquired to cover the whole territory, so the resolution is limited. For a large number of applications the available resolution is sufficient, but for thermal panels it starts to become challenging.
Illustration of the resolution at which thermal panels can be viewed on aerial images - Data : swisstopo, SITG (GSD ~ 10 cm/pixel)
Despite the resolution, aerial images are selected to train a deep learning network. Mainly SWISSIMAGE from swisstopo is considered for this research project. At this time, the 2020 version of the orthomosaic is used for both Neuch\u00e2tel and Geneva.
For both cases, a test area is defined. On the side of Neuch\u00e2tel, a large test area is chosen in order to cover a large portion of the territory that mixes constructed zones and more rural ones. On the side of Geneva, the test area is defined by the domain expert and consists of a rectangular zone.
Illustration of the test areas defined on Neuch\u00e2tel (left) and Geneva (right) - Data : swisstopo
The research project thus focuses only on portions of the territory, to keep the scale of the demonstrator realistic with respect to the available time.
"},{"location":"PROJ-TPNL/#deep-learning-model-initial-training","title":"Deep Learning Model Initial Training","text":"In this project, it is not possible to extract a ground truth, that is annotations on aerial images, from the domain expert databases. Thankfully, the FHNW, partner of the STDL, conducted some year ago annotations for thermal panels on the States of Aargau. The set consists of thousands of annotated tiles of 80x80m in size made on the SWISSIMAGE images set (2020). The annotation work was made by students of the FHNW and supervised by the D. Jordan scientists team.
Such a dataset is exactly the bootstrap data required to train an initial deep learning model. The only constraint comes from the fact that the ground truth is defined by the 80x80m tiles on which the annotations are made.
Illustration of the FHNW ground truth - Labels in white, tiles in red - Data : swisstopo, FHNW
Several training sessions are conducted in order to determine which sub-tiling system leads to the best performance scores. Due to the predefined ground truth, only sub-tiles of the 80x80m original tiles are possible. As a result, 80x80m, 40x40m and 26x26m tiles are considered for the network training.
In all training sessions, the results are quite stable around an F1-score of 0.8-0.85, always with a non-negligible proportion of false positives. The best results are obtained for the smallest tiles: 26x26m. This is unfortunate, as small tiles come with drawbacks: covering a large area requires a heavier tiling strategy, and small tiles induce a larger number of cuts that have to be merged afterwards to create a usable geographical layer. Despite these drawbacks, as a demonstrator is desired, performance is favored.
The following plot shows the precision, recall and F1-score obtained for the initial training using the FHNW data. These values are computed over the test set, which consists of 15% of the total dataset.
Scores obtained with the FHNW ground truth - Precision, Recall and F1-score
On the previous plot, the scores are all computed entity-wise and not pixel-wise. This choice is made to fit the main need of domain experts, which is to inventory thermal panel installations rather than estimate their surfaces, a secondary goal. One can see that encouraging results are obtained, but also that the F1-score plateau is not strongly marked, a sign that the model is not yet optimal, despite the large amount of data.
As we are working with domain experts, presenting the F1-score as a function of the threshold can be challenging and difficult to understand. Previous research projects made clear that efforts have to be made on our side to present the performance of our tools in a way that is informative and understandable by the domain experts, in order to ensure a working collaboration and dialogue, without which such research projects can be difficult to conduct.
This is the reason why an alternative representation of the performance is introduced. It shows the performance of the neural network in a more compact and readable way, focusing on elements that matter for the domain experts and their real-world needs. The proposed plot is as follows :
Simplified representation of the obtained scores used with domain experts - The green area represents the true positives, the yellow one the false negatives and the red one the false positives. The upper percentage gives the inventory capacity, the lower one adds the false positives to the percentage.
The bar contains three proportions: the true positives, the false negatives and the false positives. The first two proportions are grouped into one in order to represent the capacity of the network to create a reliable inventory: it shows the amount of thermal panels detected over their total amount (recall). The overall bar adds the proportion of false positives, which domain experts see as pollution of the obtained inventory. Showing these proportions indicates to the domain experts, in a simple way, how usable the inventory is.
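A hedged sketch of the proportions behind this bar, assuming the layout reads as described above; the example counts are those reported for Geneva in the results below:

```python
# Hedged sketch of the figures on the simplified bar. TP + FN is the total
# number of real installations; TP / (TP + FN) is the inventory capacity
# (recall), and the FP share represents pollution of the inventory. The
# exact layout of the published figure is assumed.
def bar_proportions(tp, fn, fp):
    total = tp + fn + fp  # full length of the bar
    return {
        "TP (green)": tp / total,
        "FN (yellow)": fn / total,
        "FP (red)": fp / total,
        "inventory capacity": tp / (tp + fn),
    }

# Illustrative counts: the Geneva assessment reported below (TP=47, FN=35, FP=63).
print(bar_proportions(tp=47, fn=35, fp=63))
```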
"},{"location":"PROJ-TPNL/#assessment-and-improvement-of-the-model","title":"Assessment and Improvement of the Model","text":"This section is split into two separated parts, one for the Geneva case and the other for the Neuch\u00e2tel one, as the chosen strategy is different. The case of Geneva, with a more direct approach (not considering the domain expert pre-existing database), is presented first.
"},{"location":"PROJ-TPNL/#case-of-geneva","title":"Case of Geneva","text":"In the case of Geneva, the choice is made to not consider existing databases and to proceed on detecting thermal panel installations directly on images to create an inventory that can then be assessed by the domain expert to extract reliable performance scores.
"},{"location":"PROJ-TPNL/#assessment-by-the-domain-expert","title":"Assessment by the Domain Expert","text":"In order to produce the predictions over the test area, in this case defined by the domain expert, the area is split into tiles with the chosen size. The tiles are then sent to the deep learning network in order to produce the predictions of thermal panel installations. The following image shows the tiling system over the test area :
Illustration of the tiling system applied on the Geneva test area (26x26m tiles)
A set of tiles with predictions on them is obtained. The optimal threshold, deduced from the initial training on the FHNW dataset, is used to filter the predictions over the Geneva test area. Tiles containing no prediction are removed by an automated process. The remaining tiles are associated with the geographical footprints of their predictions, stored in a shapefile to keep the format simple and easy to exchange with the domain expert.
After defining a common language with the domain expert on how to validate the predictions, the shapefile containing the predictions is sent to the domain expert along with the aerial images on which the predictions were made. The role of the domain expert is to assess the predictions by indicating, on the tiles containing at least one prediction, the true positives, the false positives and the false negatives.
Illustration of the common language defined to assess the predictions - The domain expert simply puts a mark on each determined false positive and at the location of each false negative. The true positives are left untouched
Having the predictions assessed by a domain expert ensures that the obtained scores are reliable, as thermal panels are difficult for a non-expert to identify on aerial images. Without such an assessment, the scores would be unreliable.
"},{"location":"PROJ-TPNL/#results","title":"Results","text":"The assessment of the predictions made by the domain expert lead to the following results on the test area. A total of 89 tiles are formally certified by the domain expert with the following counts :
Category  Count
TP        47
FP        63
FN        35
Of a total of 110 predictions on the certified tiles, 47 are true positives and 63 are false positives. A total of 35 missing predictions were pointed out by the domain expert. It follows that 47 thermal panel installations are found out of 47+35=82. This leads to the following performance scores for the initial deep learning model over the Geneva test area :
Score      Value
Precision  0.43
Recall     0.57
F1         0.49
From the inventory point of view, nearly 60% of the thermal panel installations are found by the initial deep learning model on the test area. This is clearly below the initial model scores, showing that the dataset is not sufficient to obtain stable results at this stage. The following plot shows the results in the simplified form :
Score obtained on Geneva with the initial deep learning model - Simple representation
Taking into account the large amount of false positives, the initial training is clearly not at the level required for the domain expert to produce a reliable geographical layer of thermal panel installations. These numbers are nevertheless important, as they are certified by a domain expert, ensuring that the ground truth used to assess the predictions is reliable.
"},{"location":"PROJ-TPNL/#improvement-of-the-deep-learning-network","title":"Improvement of the Deep Learning Network","text":"With the assessment made by the domain expert, reliable scores are obtained. In addition, as predictions are marked as correct or incorrect, with addition of missing thermal panel installations on the certified tiles, it was possible to create an extension to the ground truth. Indeed, driven by the corrections of the domain expert, new annotations are made on the certified tiles, including true positives and false negatives.
These annotations are made by the STDL on the images used to produce the predictions. The predictions themselves are not reliable enough to be directly translated into labels, and the false negatives have to be added anyway.
Annotations created on the Geneva area driven by the assessment of the domain expert - The labels are in white and the certified tiles in red
In the case of Geneva, the ground truth extension is made on the same images used to produce the predictions. As the number of certified tiles is low, a strategy is tested to enlarge the ground truth extension. The idea consists in looking along the time dimension: in Switzerland, aerial images are acquired on a regular basis, so a history of aerial images is available.
The time range from 2000 to 2020 is then considered in terms of available images. For each set of images, the annotations created on the 2020 image set are transferred to the older images. This process is not straightforward, as each annotation has to be checked to certify that the thermal panel installation is present on the older images. In addition, each tile has to be inspected individually to verify that no older thermal panel installation existed there and was dismantled before 2020.
Illustration of the propagation of the ground truth along the time dimension - The image on the right illustrates the limit of the process
Through this exploration along the time dimension, it was possible to increase the ground truth extracted from the assessment procedure made by the domain expert. From only 41 tiles and 64 annotations extracted from the initial test zone for the year 2020, 394 tiles and 623 annotations are obtained by considering the 2000 to 2020 time range of aerial images.
Considering the time dimension allows better leveraging of the assessment made by the domain expert, even though the procedure is time-consuming (a minimal sketch follows below). One has to keep in mind that such a process is not ideal, as the same examples are simply repeated. It still has some value, as it shows the same examples under different conditions of luminosity and orientation, which can improve the detection ability of the deep learning model.
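A minimal sketch of this propagation, assuming `labels_2020` is a GeoDataFrame of the certified 2020 annotations; every generated candidate still has to be verified manually against that year's imagery:

```python
# Minimal sketch: duplicate the 2020 annotations once per available image
# year as candidates for manual verification.
import pandas as pd

years = range(2000, 2021)
candidates = pd.concat(
    [labels_2020.assign(year=year) for year in years],
    ignore_index=True,
)
# Each (year, geometry) candidate must still be checked against that year's
# orthophoto: the installation may not exist yet, or may have been removed.
```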
With this new ground truth, it was possible to re-train the initial network, using both the initial FHNW ground truth and the annotations made on Geneva. The following results are obtained, shown using the simple representation :
Scores obtained on Geneva with consideration of the new annotations certified by the domain expert - Simple representation
This plot shows the results on the test set limited to the Geneva test area. Again, the test set contains 15% of the ground truth, and limiting it to the Geneva area leaves only a few tens of tiles. This number of tiles is quite low to draw conclusions from the encouraging results obtained with the extended ground truth, a caution reinforced by the lack of stability already observed in the previous results.
"},{"location":"PROJ-TPNL/#conclusion","title":"Conclusion","text":"It is clear that the initial deep learning model, trained with the FHNW ground truth is not satisfying for a real-world usage by domain experts. Its ability to produce an inventory is not optimal, and the amount of false positives make the produced geographical layer difficult to use.
Nevertheless, reliable scores are obtained and can be trusted on the Geneva area thanks to the domain expert assessment. In addition, the assessment made by the domain expert, as it also included the false negatives (at least on the considered tiles), allowed extending the ground truth. Extending the ground truth along the time dimension takes as much advantage as possible of the work of the domain expert, leading to more certified tiles.
The new training clearly improved the situation on the Geneva area: the inventory capacity of the deep learning model went from around 60% to around 80%, and the amount of false positives is drastically reduced. These are encouraging results, but the small number of new tiles and the repetition of the same examples along the time dimension call for a certain caution, especially given the instabilities of the results.
"},{"location":"PROJ-TPNL/#case-of-neuchatel","title":"Case of Neuch\u00e2tel","text":"The case of Neuch\u00e2tel is clearly more complex than the case of Geneva. In this case, the database of the domain expert is considered in order to try to link the predictions with the entries of the existing database. This choice is made to demonstrate the ability to integrate data science technology in existing pipeline, in order to avoid creating disruptive effect.
"},{"location":"PROJ-TPNL/#processing-and-linkage-with-the-domain-expert-database","title":"Processing and Linkage with the Domain Expert Database","text":"In the first stage, the domain expert database is analyzed in order to determine the best solution to link the prediction made by the deep learning model and the entries of the database.
The database itself is a simple Excel sheet, with each line corresponding to a subsidy request that accompanies the construction permit. Subsidies are provided by the authorities to promote the deployment of renewable energy, which is another reason why authorities need to track the construction of thermal panel installations.
The major issue with the database is the localization of the thermal panel installations. Over the years, the database being quite old, different ways of localizing the installations were used. Three different localization systems are available: postal addresses, geographical coordinates and the EGID (federal building identifier). Unfortunately, these standards are mixed and entries are localized inconsistently: sometimes only one localization is available, sometimes two or three. In some cases, the different pieces of localization information are not consistent, which leads to contradictions in the installation position.
For some entries, the localization information is also incorrect or only approximate, which can make it difficult to associate a geo-referenced prediction with an entry of the database.
For these reasons, a lot of effort is put into the pre-processing of the database to make the link between predictions and entries as reliable as possible. The RegBL (federal register of buildings and dwellings) is used to verify the EGIDs and the postal addresses and to track down contradictions. In addition, the postal address register of the State of Neuch\u00e2tel is also considered to match addresses with geographical positions, for the same reason.
In this way, several candidate positions are extracted for each entry of the database. This allows assessing the contradictions in order to retain the most probable and reliable localization for each entry. Of course, in many cases the assessment is quite weak, as the amount of localization information is low (this is especially true for older installations, newer ones being localized much more reliably through the EGID).
At the end of this complex and time-consuming task, almost all entries of the database are associated with a geographical position. This allows matching the predictions, which are geographically localized, with the most probable entry of the database. This process is important, as it provides the domain expert not only with a geographical layer of the thermal panel installations but also with the link to the pre-existing database. Predictions and database can then be put into perspective to track the construction and dismantling of installations along the time dimension.
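A hypothetical sketch of this linkage, assuming per-entry candidate positions have already been derived; all column names, the priority order and the search distance are illustrative assumptions:

```python
# Hypothetical sketch: choose the most reliable position per database entry
# (EGID-derived first, then raw coordinates, then geocoded address), then
# link each prediction to its nearest localized entry.
import geopandas as gpd

PRIORITY = ("egid_point", "coord_point", "address_point")  # assumed columns

def best_position(entry):
    # Return the first available candidate position, by decreasing reliability.
    for candidate in PRIORITY:
        if entry[candidate] is not None:
            return entry[candidate]
    return None

entries["geometry"] = entries.apply(best_position, axis=1)
entries = entries.dropna(subset=["geometry"])

# Nearest-neighbour join between geo-referenced predictions and entries;
# the 50 m search radius is an illustrative assumption.
linked = gpd.sjoin_nearest(predictions, entries, max_distance=50.0)
```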
"},{"location":"PROJ-TPNL/#assessment-by-the-domain-expert_1","title":"Assessment by the Domain Expert","text":"After pre-processing of the domain expert database, a region of Neuch\u00e2tel state is defined. A tiling strategy is made to translate the defined area in tiles of the appropriated size according to the initial training of the deep learning model. Predictions are then made on each of the tiles. Again, the optimal threshold is selected according to the initial training to filter the predictions made on the test area.
At this stage, the procedure differs from the case of Geneva. Here, tiles are not filtered based on whether they contain predictions. The pre-processed database is considered, and each prediction is linked to the best-matching entry according to its most reliable localization. As a result, a set of predictions linked to specific entries of the database is obtained; the other predictions are simply discarded for this specific assessment procedure.
In order to serve the interests of the domain expert as much as possible, a specific assessment procedure is set up. It is designed both to assess the predictions and to help the domain expert correct poor localizations of thermal panel installations in his database. The chosen approach is based on a dictionary of GeoTIFF images on which the predictions are shown and on which additional information is provided to help the domain expert assess the localization given by the database.
Illustration of one page of the dictionary corresponding to one database entry - For each entry, such an image is provided, showing information on the entry, its localization consistency and the prediction made by the model - Each image is a geo-referenced TIFF
The dictionary consists of one GeoTIFF per prediction linked to a unique entry of the database. In addition to the prediction geometry drawn on the image, basic information on the linked database entry is provided. The optimal localization (among postal address, coordinates or EGID) used to link the prediction and the database entry is also indicated, to help the domain expert understand the link, along with information about the estimated quality of the localization of the thermal panel installation.
This quality indicator is based on the consistency of the multiple localization sources (postal address, coordinates and EGID): the more consistent they are, the better the localization is considered. In case of a potentially bad localization, the domain expert is invited to check the database entry and correct the position.
In parallel, a simple Excel file is set up and filled in by the domain expert along the procedure. It allows recording the corrected positions, when required, and indicating whether the prediction is correct and correctly linked to the database entry. This establishes a win-win strategy in which incorrectly located installations are fixed on the database side while the predictions are assessed against the correct localization.
The procedure for the domain expert then consists only in parsing a sequence of images on which all the information is shown, and filling in columns of the assessment Excel sheet. This allows the predictions to be assessed quickly and efficiently while correcting the inconsistencies in the database; a hedged sketch of this round-trip follows below.
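A hedged sketch of reading back such an assessment sheet with pandas and joining it to the linked predictions; the file, sheet and column names are assumptions:

```python
# Hedged sketch: read the expert's assessment back with pandas and keep
# only the certified predictions. File and column names are assumed.
import pandas as pd

assessment = pd.read_excel("neuchatel_assessment.xlsx")  # placeholder file name
# Assumed columns: entry_id, prediction_ok ("yes"/"no"), corrected_x, corrected_y
certified = assessment[assessment["prediction_ok"] == "yes"]

# Join back onto the predictions previously linked to database entries.
linked_certified = linked.merge(certified, on="entry_id", how="inner")
print(f"{len(linked_certified)} predictions certified by the domain expert")
```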
"},{"location":"PROJ-TPNL/#results_1","title":"Results","text":"Thanks to the assessment procedure, part of the predictions are certified by the domain expert. This allows to compute scores on the capacity of the initial deep learning model to compute inventory of thermal panel installations. Unfortunately, this assessment procedure does not allow the computation of the formal scores, as the false negative are not considered. This is the main drawback coming from the fact that we work in parallel with the domain expert database.
Of the 354 predictions linked to the database, 316 correspond to correctly localized entries. Of these 316 correct entries, the domain expert reported 255 visible installations. This shows that many installations present in the database as an entry are not visible in reality: one can deduce that 61 installations are reported in the database through paperwork but cannot be found in the real world. The explanation is probably complex, but this shows how difficult it is to keep a database of installations up to date with reality.
Without a formal historical analysis, it is not possible to determine what happened to these missing installations. For some of them, one needs to consider the natural life cycle of such installations: thermal panels have a finite lifetime and need to be replaced or decommissioned. It is also possible that, for some of them, the construction permit was requested without leading to the actual construction of a thermal panel installation; this case is expected to be less common.
Back to the scores of the initial deep learning model: of the 255 visible installations, the domain expert determined that 204 are correctly detected by the model. This leads to an inventory capacity of 204/255 = 0.8, in line with the initial model scores. It is interesting to observe that the initial model scores seem to hold in the case of Neuch\u00e2tel but not in the previous case of Geneva, where the inventory capacity dropped to 0.6.
"},{"location":"PROJ-TPNL/#improvement-of-the-deep-learning-network_1","title":"Improvement of the Deep Learning Network","text":"With the assessment made by the domain expert, despite false negatives are not considered, it was possible to increase the ground truth with new annotation on the test area of Neuch\u00e2tel.
The procedure starts by isolating all predictions that are marked as correct (true positive) by the domain expert. A tiling system is then set up to cover the entire test area, with a tile size matching the initial choices. The certified true positives are then manually processed to create proper annotations, as the predictions themselves are not reliable enough. The certifications made by the domain expert are sufficiently clear for a data scientist to do this task autonomously.
The last stage consists in validating the tiles containing a new annotation. This part is the most complex one, as the data scientist has to work autonomously. The tiles containing a new annotation can only be validated, and enter the ground truth, if and only if no other ambiguous element appears in the validated tiles. If any ambiguity arises for a tile, it has to be dropped and not considered for the ground truth. In the case of Neuch\u00e2tel, a few tiles were removed for this reason.
With this procedure, 272 new annotations were added to the ground truth on 254 tiles. These annotations, as for Geneva, are certified by a domain expert, providing a reliable ground truth. With this new set of annotations, and considering the new annotations made in the case of Geneva, it was possible to conduct a new training of the deep learning model. For this last training, the full ground truth was considered, with the FHNW annotations and those coming from the domain experts of Geneva and Neuch\u00e2tel.
The following plot gives a simple overall representation of the obtained results:
Score obtained using all the available ground truth, including FHNW, Geneva and Neuch\u00e2tel - Simple representation

On the test set, an F1-score of 0.82 is obtained, which is slightly worse than for the initial training (0.85). Overall, one can also see that the inventory capacity decreased while the amount of false positives was reduced. Again, one can see here the instability of the results, showing that the available data is not sufficient, or not well enough suited, for such a task.
The following plots give the simple representation of the scores restricted to the Geneva and Neuch\u00e2tel areas:
Score obtained restricted to the Geneva area (test set) - Simple representation

Score obtained restricted to the Neuch\u00e2tel area (test set) - Simple representation
One has to take into account that restricting the scores to such small areas leads to very few predictions, and therefore to poor statistics. It is nevertheless clear that the results on the Neuch\u00e2tel restriction demonstrate the instabilities observed all along the project. On Neuch\u00e2tel, choosing a different threshold could lead to a better inventory capacity, but the fact that the threshold needs to be adapted to the situation shows that the model was not able to generalise.
It is most likely that the nature of the objects, their similarity with other objects and the resolution of the images play a central role in this lack of generalisation. As a conclusion, detecting thermal panels requires higher-resolution images, so that the model can extract more reliable features from the objects themselves instead of relying only on their context.
"},{"location":"PROJ-TPNL/#conclusion_1","title":"Conclusion","text":"In the case of Neuch\u00e2tel, the procedure is more complex, as the database is considered. The work on the database is time-consuming and the linkage of the predictions with the entries of the database is not straightforward, mainly due to the inconsistencies on thermal panel installation localization.
In addition, taking the database into account makes it the main point of view from which the predictions are analyzed, assessed and understood. It is a very interesting point of view, as it allows assessing the synchronization between the database and the current state of thermal panel installation deployment. Nevertheless, such a point of view also introduces drawbacks, as it does not allow directly assessing the false negatives, and only part of the false positives. This leads to intermediate scores that are more focused on the database-reality synchronization than on the performance of the deep learning model.
It is nonetheless clearly demonstrated that a deep learning model can be interfaced with an existing database to ensure process continuity when introducing new technologies in territory management. It shows that new methods can be introduced without abandoning the previous processes, which is always complicated and undesirable.
On the initial deep learning model assessment, with an inventory capacity of around 0.85 (recall), one can observe a difference between Neuch\u00e2tel and Geneva. Indeed, in Geneva, the recall dropped to around 0.6, while it stayed around 0.8 in the Neuch\u00e2tel case. A possible explanation is the similarity between Aargau (used to train the initial deep learning model) and Neuch\u00e2tel in terms of geography. The case of Geneva is more urban than the two others. This confirms the instabilities already observed and seems to indicate that thermal panels remain complex objects to detect at this stage, given the available data.
"},{"location":"PROJ-TPNL/#conclusion-and-perspectives","title":"Conclusion and Perspectives","text":"As a main conclusion, this project, performed in two stage with Geneva and Neuch\u00e2tel states, is a complex task. The nature of the object of interest is the main source of difficulty.
The currently available aerial images make the detection of such objects possible, but the resolution of the images (GSD) makes the task very difficult. Indeed, as mentioned, the thermal panel installations visible on the images are at the limit of resolution. This forces the deep learning model to learn more from the context than from the object features themselves.
To add complexity, thermal panels look very much like electrical panels in images, which is a major source of confusion. The fact that the deep learning model relies more on context than on object features leads electrical panels to be reported as thermal ones, reducing the efficiency of the inventory and producing a large amount of false positives.
Despite that, interesting results were obtained, and they do not lead to the conclusion that inventorying such objects is currently impossible. It remains very challenging, but data science can already help in the tracking and surveillance of thermal panel installations.
The collaboration with the domain experts is a necessity here. Indeed, such installations, especially at this image resolution, are extremely difficult to confirm as such (mainly due to the confusion with electrical panels and other roof elements). Even for the domain expert, determining whether a prediction is a true positive or not is challenging and time-consuming. Without the help of domain experts, data scientists are not able to tackle such a problem.
Another positive outcome is the demonstration that data science can be interfaced smoothly with existing processes. This is shown in the Neuch\u00e2tel case, where the predictions can be directly linked to the entries of the pre-existing domain expert database. This eases the domain expert's assessment procedure and also helps assess the synchronization between the database and reality.
As a final word, the obtained deep learning model is not formally ready to enter territory management. It is demonstrated that the nature of the object and the available data make the model unstable from one situation to another. This shows that the currently available data is not sufficient to produce a fully working prototype able to satisfy the specifications of the domain experts. Nevertheless, such a model can already perform pre-processing to ease the work of domain experts in the complex task of tracking the deployment of thermal energy generators on Swiss territory.
"},{"location":"PROJ-TREEDET/","title":"Tree Detection from Point Clouds over the Canton of Geneva","text":"Alessandro Cerioni (Canton of Geneva) - Flann Chambers (University of Geneva) - Gilles Gay des Combes (CJBG - City of Geneva and University of Geneva) - Adrian Meyer (FHNW) - Roxane Pott (swisstopo)
Proposed by the Canton of Geneva - PROJ-TREEDET May 2021 to March 2022 - Published on April 22, 2022
Abstract: Trees are essential assets, in urban contexts among others. For several years, the Canton of Geneva has maintained a digital inventory of isolated (or \"urban\") trees. This project aimed at designing a methodology to automatically update Geneva's tree inventory, using high-density LiDAR data and off-the-shelf software. Eventually, only the sub-task of detecting and geolocating trees was explored. Comparisons against ground truth data show that the task can be more or less tricky depending on how sparse or dense trees are. In mixed contexts, we managed to reach an accuracy of around 60%, which unfortunately is not high enough to foresee a fully unsupervised process. Still, as discussed in the concluding section, there may be room for improvement.
"},{"location":"PROJ-TREEDET/#1-introduction","title":"1. Introduction","text":""},{"location":"PROJ-TREEDET/#11-context","title":"1.1 Context","text":"Human societies benefits from the presence of trees in cities and their surroundings. More specifically, as far as urban contexts are concerned, trees deliver many ecosystem services such as:
Moreover, they play an important role in supporting biodiversity by offering resources and shelter to numerous animal, plant and fungus species.
The quality and quantity of such benefits depend on various parameters, such as the height, the age, the leaf area and the species diversity within a given population of trees. Therefore, the preservation and development of a healthy and functional tree population is one of the key elements of public policies aiming at increasing resilience against climate change.
For these reasons, the Canton of Geneva has set the ambitious goal of increasing its canopy cover (= ratio between the area covered by foliage and the total area) from 21% (as estimated in 2019) to 30% by 2050. In order to reach this goal, the concerned authorities (i.e. the Office cantonal de l\u2019agriculture et de la nature) need detailed data and tools to keep track of the cantonal tree population and drive its development.
The Inventaire Cantonal des Arbres Isol\u00e9s (ICA) is the most extensive and detailed source of data on isolated trees (= trees that do not grow in forests) within the Canton of Geneva. This dataset is maintained by a joint effort of several public administrations (green spaces departments of various municipalities, the Office cantonal de l\u2019agriculture et de la nature, the Geneva Conservatory and Botanical Garden, etc.). For each tree, several attributes are provided: geographical coordinates, species, height, plantation date, trunk diameter, crown diameter, etc.
To date, the ICA includes data about more than 237\u00a0000 trees. However, it comes with a host of known limitations:
In light of Geneva's ambitions in terms of canopy growth, the latter observations call for a more efficient methodology to improve the exhaustivity and veracity of the ICA. Over the last few years, several joint projects of the Canton, the City and the University of Geneva explored the potential of using LiDAR point clouds and tailored software to characterize trees in a semi-automatic way, following practices that are already established in forestry. Yet, forest and urban settings are quite different from each other: forests exhibit higher tree density, which can hinder tree detection; forests exhibit lower heterogeneity in terms of species and morphology, which can facilitate tree detection. Hence, the task of automatic detection is likely to be harder in urban contexts than in forests.
The study reported in this page, proposed by the Office cantonal de l\u2019agriculture et de la nature (OCAN) and carried out by the STDL, represents a further yet modest step ahead towards the semi-automatic digitalisation of urban trees.
"},{"location":"PROJ-TREEDET/#12-objectives","title":"1.2 Objectives","text":"The objectives of this project was fixed by the OCAN domain experts and, in one sentence, amount to designing a robust and reproducible semi-automatic methodology allowing one to \"know everything\" about each and every isolated tree of the Canton of Geneva, which means:
Regarding quality, the following requirements were fixed:
| Property | Expected precision |
| --- | --- |
| Trunk geolocation | 1 m |
| Top geolocation | 1 m |
| Height | 2 m |
| Trunk diameter at 1 m height | 10 cm |
| Crown diameter | 1 m |
| Canopy area | 1 m\u00b2 |
| Canopy volume | 1 m\u00b3 |
In spite of such thorough and ambitious objectives, the time span of this project was not long enough to address them all. As a matter of fact, the STDL team only managed to tackle tree detection and trunk geolocation.
"},{"location":"PROJ-TREEDET/#13-methodology","title":"1.3 Methodology","text":"As shown in Figure 1.1 here below, algorithms and software exist, which can detect individual trees from point clouds.
Figure 1.1: The two panels represent a sample of a point cloud before (top panel) and after (bottom) tree detection.
Not only do such tools take point clouds as input data, but the values of a number of parameters also have to be chosen by the user. The quality of the results depends both on the input data and on the input parameters. Pre-processing applied to the input point cloud has an impact, too. Therefore, it becomes clear that in order to find the optimal configuration for a given context, one should be able to measure the quality of the results as a function of the chosen parameters as well as of the pre-processing operations. To this end, the STDL team called for the acquisition of ground truth data. Further details about input data (point cloud and ground truth), software and methodology will be provided shortly.
"},{"location":"PROJ-TREEDET/#14-input-data","title":"1.4 Input data","text":""},{"location":"PROJ-TREEDET/#141-lidar-data","title":"1.4.1 LiDAR data","text":"A high-density point cloud dataset was produced by the Flotron Ingenieure company, through Airborne Laser Scanning (ALS, also commonly known by the acronym LiDAR - Light Detection And Ranging). Thanks to a lateral overlap of flight lines of ~80%, more than 200 pts/m\u00b2 were collected, quite a high density when compared to more conventional acquisitions (30 \u2013 40 pts/m\u00b2). Flotron Ingenieure took care of the point cloud classification, too.
The following table summarizes the main features of the dataset:
| LIDAR 2021 | OCAN, Flotron Ingenieure |
| --- | --- |
| Coverage | Municipalities of Ch\u00eane-Bourg and Th\u00f4nex (GE) |
| Date of acquisition | March 10, 2021 |
| Density | > 200 pts/m\u00b2 |
| Planimetric precision | 20 mm |
| Altimetric precision | 50 mm |
| Tiles | 200 tiles of 200 m x 200 m |
| Format | LAS 1.2 |
| Classes | 0 - Unclassified, 2 - Ground, 4 - Medium vegetation (0.5 - 3 m), 5 - High vegetation (> 3 m), 6 - Building, 7 - Low points, 10 - Error points, 13 - Bridges, 16 - Noise / Vegetation < 0.5 m |
Figs.\u00a01.2 and 1.3 represent the coverage of the dataset and a sample, respectively.
Figure 1.2: Coverage and tiling of the 2021 high-density point cloud dataset.
Figure 1.3: A sample of the 2021 high-density point cloud. Colors correspond to different classes: green = vegetation (classes 4 and 5), orange = buildings (class 6), grey = ground or unclassified points (class 2 and 0, respectively).
"},{"location":"PROJ-TREEDET/#142-test-sectors-and-ground-truth-data","title":"1.4.2 Test sectors and ground truth data","text":"In order to be able to assess the exhaustivity and quality of our results, we needed reference (or \"ground truth\") data to compare with. Following the advice of domain experts, it was decided to acquire ground truth data regarding trees within three test sectors, which represent three different types of contexts: [1] alignment of trees, [2] park, [3] a mix of [1] and [2]. Of course, these types can also be found elsewhere within the Canton of Geneva.
Ground truth data was acquired through surveys conducted by geometers, who recorded
for every tree having a trunk diameter larger than 10 cm.
Details about the three test sectors are provided in the following, where statistics on species, height, age and crown diameter stem from the ICA.
"},{"location":"PROJ-TREEDET/#avenue-de-bel-air-chene-bourg-ge","title":"Avenue de Bel-Air (Ch\u00eane-Bourg, GE)","text":"Property Value Type Alignment of trees Trees 135 individuals Species monospecific (Tilia tomentosa) Height range 6 - 15 m Age range 17 - 28 yo Crown diameters 3 - 10 m Comments Well separated trees, heights and morphologies are relatively homogenous, no underlying vegetation (bushes) around the trunks.
Figure 1.4: \"Avenue de Bel-Air\" test sector in Ch\u00eane-Bourg (GE). Orange dots represents ground truth trees as recorded by geometers.
"},{"location":"PROJ-TREEDET/#parc-floraire-chene-bourg-ge","title":"Parc Floraire (Ch\u00eane-Bourg, GE)","text":"Property Value Type Park with ornemental trees Trees 95 individuals Species 65 species Height range 1.5 - 28 m Age range Unknown Crown diameters 1 - 23 m Comments Many ornemental species of all sizes and shapes, most of them not well separated. Very heterogenous vegetation structure.
Figure 1.5: \"Parc Floraire\" test sector in Ch\u00eane-Bourg (GE). Orange dots represents ground truth trees as recorded by geometers.
"},{"location":"PROJ-TREEDET/#adrien-jeandin-thonex-ge","title":"Adrien-Jeandin (Th\u00f4nex, GE)","text":"Property Value Type Mixed (park, alignment of tree, tree hedges, etc.) Trees 362 individuals Species 43 species Height range 1 - 34 m Age range Unknown Crown diameters 1 - 21 m Comments Mix of different vegetation structures, such as homogenous tree alignments, dense tree hedges and park with a lot of underlying vegetation under big trees.Figure 1.6: \"Adrien-Jeandin\" test sector in Th\u00f4nex (GE). Orange dots represents ground truth trees as recorded by the geometers.
"},{"location":"PROJ-TREEDET/#15-off-the-shelf-software","title":"1.5 Off-the-shelf software","text":"Two off-the-shelf software products were used to detect trees from LiDAR data, namely TerraScan and the Digital Forestry Toolbox (DFT). The following table summarizes the main similarities and differences between the two:
| Feature | Terrascan | DFT |
| --- | --- | --- |
| Licence | Proprietary (*) | Open Source (GPL-3.0) |
| Price | See here | Free |
| Standalone | No: requires MicroStation or Spatix | No: requires Octave or MATLAB |
| Graphical User Interface | Yes | No |
| In-app point cloud visualization | Yes (via MicroStation or Spatix) | No (**) |
| Scriptable | Partly (via macros) | Yes |
| Hackable | No | Yes |
(*) Unfortunately, we must acknowledge that using network licenses turned out to be quite problematic. Weeks of unexpected downtime were experienced, due to puzzling issues related to the interplay between the self-hosted license server, firewalls, VPN and end-devices. (**) We used the excellent Potree Free and Open Source software for visualization.
The following sections are devoted to brief descriptions of these two tools; further details will be provided in Section 4 and Section 5.
"},{"location":"PROJ-TREEDET/#151-terrascan","title":"1.5.1 Terrascan","text":"Terrascan is a proprietary software, developed and commercialized by Terrasolid, a MicroStation and Spatix plugin which is capable of performing several tasks on point clouds, including visualisation, classification. As far as tree detection is concerned, Terrascan offers multiple options to
Two methods are provided to group (one may also say \"to segment\") points into individual trees:
For further details on these two methods, we refer the reader to the official documentation.
"},{"location":"PROJ-TREEDET/#152-digital-forestry-toolbox-dft","title":"1.5.2 Digital Forestry Toolbox (DFT)","text":"The Digital Forestry Toolbox (DFT) is a
collection of tools and tutorials for Matlab/Octave designed to help process and analyze remote sensing data related to forests (source: official website)
developed and maintained by Matthew Parkan, released under an Open Source license (GPL-3.0).
The DFT implements algorithms allowing one to perform
We refer the reader to the official documentation for further information.
"},{"location":"PROJ-TREEDET/#2-method","title":"2. Method","text":"As already stated, in spite of the thorough and ambitious objectives of this project (cf. here), only the
sub-tasks could be tackled given the resources (time, people) which were allocated to the STDL.
The method we followed goes through several steps,
which are documented here-below.
"},{"location":"PROJ-TREEDET/#21-pre-processing-point-cloud-reclassification-and-cleaning","title":"2.1 Pre-processing: point cloud reclassification and cleaning","text":"[1] In some cases, points corresponding to trunks may be misclassified and lay in class 0 \u2013 Unclassified instead of class 4 \u2013 Medium vegetation. As the segmentation process only takes vegetation classes (namely classes 4 and 5) into account, the lack of trunk points can make some trees \"invisibles\".
[2] We suspected that the standard classification of vegetation in LiDAR point clouds could be too basic for the task at hand. Indeed, vegetation points found at less (more) than 3 m above the ground are classified as 4 \u2013 Medium Vegetation (5 \u2013 High Vegetation). This may cause one potential issue: all the points of a given tree that are located at up to 3 meters above the ground (think about the trunk!) belong to a class (namely class no.\u00a04) which can also be populated by bushes and hedges. The \"contamination\" by bushes and hedges may spoil the segmentation process, especially in situations where dense low vegetation exists around higher trees. Indeed, it was acknowledged that in such situations the segmentation algorithm fails to properly identify trunk locations and distinguish one tree from another.
Issues [1] and [2] can be solved, or at least mitigated, by reclassifying and cleaning the input point cloud, respectively. Figures\u00a02.1 and 2.2 show how tree grouping (or \"segmentation\") yields better results if pre-processed point clouds are used.
Figure 2.1: Tree grouping (or \"segmentation\") applied to the original (top panel) vs pre-processed (bottom) point cloud. Without pre-processing, two trees connected by a hedge are segmented as one single individual. Therefore, only one detection is made (green circle slightly above the ground). With pre-processing, we get rid of the hedge and recover the lowest trunk points belonging to the tree on the left. Eventually, both trees are properly segmented and we end up having two detections (green circles).
Figure 2.2: Tree grouping (or \"segmentation\") applied to the original (left panel) vs reclassified (right) point cloud. Without pre-processing, segmentation yields a spurious detection (= false positive, red circle slightly above the ground), resulting from the combination of a pole and a hedge. With pre-processing, we get rid of most of the points belonging to the hedge and the pole; no false positive shows up.
"},{"location":"PROJ-TREEDET/#211-reclassification-with-terrascan-and-fme-desktop","title":"2.1.1 Reclassification with Terrascan and FME Desktop","text":"The reclassification step aims at recovering trunk points which might be misclassified and hence found in some class other than class 4 \u2013 Medium Vegetation (e.g. class 0 - Unclassified). It was carried out with Terrascan using the Classify by normal vectors tool, which
Finally, during the cleaning process with FME Desktop (cf.\u00a0Chapter 2.1.2 here below), these points are reclassified in class 4.
The outcome of this reclassification step is shown in Figure\u00a02.3.
Figure 2.3: Outcome of reclassification. In the upper picture, the trunk of the tree on the left is partially misclassified, while the trunk of the tree in the middle is completely misclassified. After reclassification, almost all the points belonging to trunks are back in class 4.
Let us note that the reclassification process may also recover some unwanted objects enjoying linear features similar to trees (poles, power lines, etc.). However, such spurious objects can at least partly filtered out by cleaning step described here below.
"},{"location":"PROJ-TREEDET/#212-cleaning-point-clouds-with-fme-desktop","title":"2.1.2 Cleaning point clouds with FME Desktop","text":"The cleaning step aims to filter as many \"non-trunk\" points as possible out of class 4 \u2013 Medium Vegetation, in order to isolate trees from other types of vegetation. Vegetation is considered as part of a tree if higher than 3 m.
Cleaning consists in two steps:
Note that in case the point cloud is reclassified in order to recover missing trunks, the cleaning step also allow to get rid of unwanted linear objects (poles, electric lines, etc) that have been recovered during the reclassification. The class containing reclassified points (class 10) will simply be process together with class 4 and receive the same treatment. Eventually, reclassified points that are kept (discarded) by the cleaning process will be integrated in class 4 (3).
Figure 2.4: Outcome of the cleaning process. Red points correspond to the \"cleaned\" points that were moved to class 3.
Figure 2.5: Outcome of the cleaning process. Red points correspond to the \"cleaned\" points that were moved to class 3. Hedges under trees escape the cleaning.
"},{"location":"PROJ-TREEDET/#213-fme-files-and-documentation-of-pre-processing-steps","title":"2.1.3 FME files and documentation of pre-processing steps","text":"More detailed information about the reclassification and cleaning of the point cloud can be found here.
FME files can be downloaded by following these links:
Further information on the generation of a Canopy Cover Layer can be found here.
"},{"location":"PROJ-TREEDET/#22-running-terrascan","title":"2.2 Running Terrascan","text":"Terrascan offers multiple ways to detect trees from point clouds. In this project, we focused on the fully automatic segmentation, which is available through the \"Assign Groups\" command.
As already said (cf.\u00a0here), two methods are available: highest point (aka \"watershed\") method and trunk method. In what follows, we introduce the reader to the various parameters that are involved in such methods.
"},{"location":"PROJ-TREEDET/#221-watershed-method-parameters","title":"2.2.1 Watershed method parameters","text":""},{"location":"PROJ-TREEDET/#group-planar-surfaces","title":"Group planar surfaces","text":"Quoting the official documentation,
If on, points that fit to planes are grouped. Points fitting to the same plane get the same group number.
"},{"location":"PROJ-TREEDET/#min-height","title":"Min height","text":"This parameter defines a minimum threshold on the distance from the ground that the highest of a group of points must have, in order for the group to be considered as a tree. The default value is 4 meters. The Inventaire Cantonal des Arbres Isol\u00e9s includes trees which are at least 3 m high. This parameter ranged from 2 to 6 m in our tests.
Figure 2.6: Cross-section view of two detected trees. The left tree would not be detected if the parameter \"Min height\" were larger than 3.5 m.
"},{"location":"PROJ-TREEDET/#require","title":"Require","text":"This parameter defines the minimum number of points which are required to form a group (i.e. a tree). The default value is 20 points, which is very low in light of the high density of the dataset we used. Probably, the default value is meant to be used with point clouds having a one order of magnitude smaller density.
In our analysis, we tested the following values: 20 (default), 50, 200, 1000, 2000, 4000, 6000.
"},{"location":"PROJ-TREEDET/#222-trunk-method-parameters","title":"2.2.2 Trunk method parameters","text":""},{"location":"PROJ-TREEDET/#group-planar-surfaces_1","title":"Group planar surfaces","text":"See here.
"},{"location":"PROJ-TREEDET/#min-height_1","title":"Min Height","text":"Same role as in the watershed method, see here.
"},{"location":"PROJ-TREEDET/#max-diameter","title":"Max diameter","text":"This parameter defines the maximum diameter (in meters) which a group of points identified as trunk can reach. Default value is 0.6 meters. Knowing that
we used the following values: 0.20, 0.30, 0.40, 0.60 (default), 0.80, 1.00, 1.50 meters.
"},{"location":"PROJ-TREEDET/#min-trunk","title":"Min trunk","text":"This parameter defines a minimum threshold on the length of tree trunks. Default value is 2 m. We tested the following values: 0.50, 1.00, 1.50, 2.00 (default), 2.50, 3.00, 4.00, 5.00 meters.
"},{"location":"PROJ-TREEDET/#group-by-density","title":"Group by density","text":"Quoting the official documentation,
If on, points are grouped based on their distance to each other. Close-by points get the same group number.
"},{"location":"PROJ-TREEDET/#gap","title":"Gap","text":"Quoting the official documentation,
Distance between consecutive groups:
Automatic: the software decides what points belong to one group or to another. This is recommended for objects with variable gaps, such as moving objects on a road.
User fixed: the user can define a fixed distance value in the text field. This is suited for fixed objects with large distances in between, such as powerline towers.
We did not attempt the optimization of this parameter but kept the default value (Auto).
"},{"location":"PROJ-TREEDET/#223-visualizing-results","title":"2.2.3 Visualizing results","text":"Terrascan allows the user to visualize the outcome of the tree segmentation straight from within the Graphical User Interface. Points belonging to the same group (i.e. to the same tree) are assigned the same random color, which allows the user to perform intuitive, quick, qualitative in-app assessments. An example is provided in Figure 2.7.
Figure 2.7: Three examples of tree segmentations. From a qualitative point of view, we can acknowledge that the leftmost (rightmost) example is affected by undersegmentation (oversegmentation). The example in the middle seems to be a good compromise.
"},{"location":"PROJ-TREEDET/#224-exporting-results","title":"2.2.4 Exporting results","text":"As already said, Terrascan takes point clouds as input data and can run algorithms which form group out of these points, each group corresponding to an individual tree. A host of \"features\" (or \"measurements\"/ \"attributes\"/...) are generated for each group, which the user can export to text files using the \"Write group info\" command. The set of exported features can be customized through a dedicated configuration panel which can be found within the software settings (\"File formats / User group formats\").
The list and documentation of all the exportable features can be found here. Let us note that
The following table summarizes the features which the watershed and trunk methods can export:
Feature Watershed Method Trunk Method Group ID Yes Yes Point Count Yes Yes Average XY Coordinates Yes Yes Ground Z at Avg. XY Yes Yes Trunk XY No Yes Ground Z at Trunk XY No Yes Trunk Diameter See here below See here below Canopy Width Yes Yes Biggest Distance above Ground (Max. Height) Yes Yes Smallest Distance above Ground Yes Yes Length Yes Yes Width Yes Yes Height Yes Yes
"},{"location":"PROJ-TREEDET/#225-trunk-diameters","title":"2.2.5 Trunk Diameters","text":"Terrascan integrates a functionality allowing users to measure trunk diameters (see Figure 2.8).
Figure 2.8: Screenshots of the trunk diameter measurement function.
Let us note that the measurement of trunk diameters can be feasible or not, depending on the number of points which sample a given trunk.
We performed some rapid experiments, which showed that some diameters could actually be estimated, given the high density of the point cloud we used (cf. here). Still, we did not analyzed the reliability of such estimations against reference/ground truth data.
"},{"location":"PROJ-TREEDET/#23-running-dft","title":"2.3 Running DFT","text":"As already said, DFT consists of a collection of functions which can be run either with Octave or MATLAB. The former software was used in the frame of this context. A few custom Octave scripts were written to automatize the exploration of the parameter space.
Our preliminary, warm-up tests showed that we could not obtain satisfactory results by using the \"tree top detection method\" (cf. here). Indeed, upon using this method the F1-score topped at around 40%. Therefore, we devoted our efforts to exploring the parameter space of the other available method, namely the \"tree stem detection method\" (cf.\u00a0this tutorial). In the following, we provide a brief description of the various parameters involved in such a detection method.
"},{"location":"PROJ-TREEDET/#232-parameters-concerned-by-the-tree-stem-detection-method","title":"2.3.2 Parameters concerned by the tree stem detection method","text":"Quoting the official tutorial,
The stem detection algorithm uses the planimetric coordinates and height of the points above ground as an input.
To compute the height, DFT provides a function called elevationModels
, which takes the classified 3D point cloud as input, as well as some parameters. Regarding these parameters, we stuck to the values suggested by the official tutorial, except for
cellSize
parameter (=\u00a0size of the raster cells) which was set to 0.8 (meters);searchRadius
parameter which was set to 10 (meters).Once that each point is assigned an height above the ground, the actual tree stem detection algorithm can be invoked (treeStems
DFT function, cf.\u00a0DFT Tree Stem Detection Tutorial / Step 4 - Detect the stems), which takes a host of parameters. While referring the reader to the official tutorial for the definition of these parameters, we provide the list of values we used (unit\u00a0=\u00a0meters):
Parameter Value cellSize
0.9 bandWidth
0.7 verticalStep
0.15 searchRadius
from 1 to 6, step = 0.5 minLength
from 1 to 6, step = 0.5
searchRadius
(minLength
) was fixed to 4 (meters) when minLength
(searchRadius
) was let vary between 1 and 6 meters.
DFT does not include any specific Graphical User Interface. Still, users can rely on Octave/MATLAB to generate plots, something useful and clever especially when performing analysis in an interactive way. In our case, DFT was used in a non-interactive way and visualisation was delayed until the assessment step, which we describe in Section\u00a02.4.
"},{"location":"PROJ-TREEDET/#234-exporting-results","title":"2.3.4 Exporting results","text":"Thanks to the vast Octave/MATLAB ecosystem, DFT results can be output to disk in several ways and using data formats. More specifically, we used the ESRI Shapefile file format to export the average (x, y) coordinates of the detected stems/peaks.
"},{"location":"PROJ-TREEDET/#235-trunk-diameters","title":"2.3.5 Trunk diameters","text":"This feature is missing in DFT.
"},{"location":"PROJ-TREEDET/#24-post-processing-assessment-algorithm-and-metrics-computation","title":"2.4 Post-processing: assessment algorithm and metrics computation","text":"As already said, the STDL used a couple of third-party tools, namely TerraScan and the Digital Forestry Toolbox (DFT), in order to detect trees from point clouds. Both tools can output
one (X, Y, Z) triplet per detected tree, where the X, Y and Z (optional) coordinates are
computed either as the centroid of all the points which get associated to a given tree, or - under some conditions - as the centroid of the trunk only;
As the ground truth data the STDL was provided with take the form of one (X', Y') pair per tree, with Z' implicitly equal to 1 meter above the ground, the comparison between detections and ground truth trees could only be performed on the common ground of 2D space. In other words, we could not assess the 3D point clouds segmentations obtained by either TerraScan or DFT against reference/ground truth segmentations in the 3D space.
The problem which needed to be solved amounts to finding matching and unmatching items between two sets of 2D points:
In order to fulfill the requirement of a 1 meter accuracy which was set by the beneficiaries of this project, the following matching rule was adopted:
a detection (D) matches a ground truth tree (GT) (and vice versa) if and only if the Cartesian distance between D and GT is less or equal to 1 meter
Figure 2.9 shows how such a rule would allow one to tag
in the most trivial case.
Figure 2.9: Tagging as True Positive (TP), False Positive (FP), False Negative (FN) ground truth and detected trees in the most trivial case.
Actually, far less trivial cases can arise, such as the one illustrated in Figure 2.10.
Figure 2.10: Only one detection can exist for two candidate ground truth trees, or else two detections can exist for only one candidate ground truth tree.
The STDL designed and implemented an algorithm, which would produce relevant TP, FP, FN tags and counts even in such more complex cases. For instance, in a setting like the one in the image here above, one would expect the algorithm to count 2 TPs, 1 FP, 1 FN.
Details are provided here below.
"},{"location":"PROJ-TREEDET/#241-the-tagging-and-counting-algorithm","title":"2.4.1 The tagging and counting algorithm","text":""},{"location":"PROJ-TREEDET/#1st-step-geohash-detections-and-ground-truth-trees","title":"1st step: geohash detections and ground truth trees","text":"In order to keep track of the various detections and ground truth trees all along the execution of the assessment algorithm, each item is given a unique identifier, computed as the geohash of its coordinates, using the pygeohash
Python module. Such identifier is not only unique (as far as a sufficiently high precision is used), but also stable across subsequent executions. The latter property allows analysts to \"synchronise\" the concerned objects between the output of the (Python) code and the views generated with GIS tools such as QGIS, which turns out to be quite useful especially at development and debugging time.
As a 2nd step, each detection is converted to a circle,
This operation can be accomplished by generating a 1 m buffer around each detection. For the sake of precision, this method was used, which generates a polygonal surface approximating the intended circle.
"},{"location":"PROJ-TREEDET/#3rd-step-perform-left-and-right-outer-spatial-joins","title":"3rd step: perform left and right outer spatial joins","text":"As a 3rd step, the following two spatial joins are computed:
left outer join between the circles generated at the previous step and ground truth trees;
right outer join between the same two operands.
In both cases, the \"intersects\" operation is used (cf.\u00a0this page for more technical details).
"},{"location":"PROJ-TREEDET/#4th-step-tag-trivial-false-positives-and-false-negatives","title":"4th step: tag trivial False Positives and False Negatives","text":"All those detections output by the left outer join for which no right attribute exists (in particular, we focus on the right geohash) can trivially be tagged as FPs. As a matter of fact, this means that the 1 m circular buffer surrounding the detection does not intersect any ground truth tree; in other words, that no ground truth tree can be found within 1 m from the detection. The same reasoning leads to trivially tagging as FNs all those ground truth trees output by the right outer join for which no left attribute exists. These cases correspond to the two rightmost items in Fig.\u00a06.1.
For reasons which will be clarified here below, the algorithm does not actually tag items as either FPs or FNs; instead,
Here's how:
TP charge FP charge 0 1
TP charge FN charge 0 1
"},{"location":"PROJ-TREEDET/#5th-step-tag-non-trivial-false-positives-and-false-negatives","title":"5th step: tag non-trivial False Positives and False Negatives","text":"The left outer spatial join performed at step 3 establishes relations between each detection and those ground truth trees which are located no further than 1 meter, as shown in Figure 2.11.
Figure 2.11: The spatial join between buffered detections and ground truth trees establishes relations between groups of items of these two populations. In the sample setting depicted in this picture, two unrelated groups can be found.
The example here above shows 4 relations,
which can be split (see the red dashed line) into two unrelated, independent groups:
In order to generate this kind of groups in a programmatic way, the algorithm first builds a graph out of the relations established by the left outer spatial join, then it extracts the connected components of such a graph (cf.\u00a0this page).
The tagging and counting of TPs, FPs, FNs is performed on a per-group basis, according to the following strategy:
if a group contains more ground truth than detected trees, then the group is assigned an excess \"FN charge\", equal to the difference between the number of ground truth trees and detected trees. This excess charge is then divided by the number of ground truth trees and the result assigned to each of them. For instance, the {D1 - GT1, D1 - GT2} group in the image here above would be assigned an FN charge equal to 1; then, each ground truth tree would be assigned an FN charge equal to 1/2.
Similarly, if a group contains more detected trees than ground truth trees, then the group is assigned an excess FP charge, equal to the difference between the number of detected trees and ground truth trees. This excess charge is then divided by the number of detections and the result assigned to each of them. For instance, the {D2 - GT3, D3 - GT3} group in the image here above would be assigned an excess FN charge equal to 1; then, each detection would be assigned an FP charge equal to 1/2.
In case the number of ground truth trees be the same as the number of detections, no excess FN/FP charge is assigned to the group.
Concerning the assignment of TP charges, the per-group budget is established as the minimum between the number of ground truth and detected trees, then equally split between the items of these two populations. In the example above, both groups would be assigned TP charge = 1.
Wrapping things up, here are the charges which the algorithm would assign to the various items of the example here above:
item TP charge FP charge Total charge D1 1 0 1 D2 1/2 1/2 1 D3 1/2 1/2 1 Sum 2 1 3
item TP charge FN charge Total charge GT1 1/2 1/2 1 GT2 1/2 1/2 1 GT3 1 0 1 Sum 2 1 3
Let us note that:
TP, FP, FN counts are extensive properties, out of which we can compute some standard metrics such as
which are intensive, instead. While referring the reader to this paragraph for the definition of these metrics, let us state the interpretation which holds in the present use case:
Typically, one cannot optimize both precision and recall for the same values of a set of parameters. Instead, they can exhibit opposite trends as a function of a given parameter (e.g. precision increases while recall decreases). In such cases, the F1-score would exhibit convexity and could be optimized.
"},{"location":"PROJ-TREEDET/#3-results-and-discussion","title":"3. Results and discussion","text":"Figure 3.1 shows some of the tree detection trials we performed, using Terrascan and DFT. Each trial corresponds to a different set of parameters and is represented either by gray dots or colored diamonds in a precision-recall plot (see the image caption for further details).
Figure 3.1: Precision vs. Recall of a subset of the tree detections we attempted, using different parameters in Terrascan and DFT. Colored diamonds represent the starting point (red) as well as our \"last stops\" in the parameter space, with (yellow, green) and without (orange) pre-processing. All the three test sectors are here combined.
Let us note that:
More detailed comments follow, concerning the best trials made with Terrascan and DFT.
"},{"location":"PROJ-TREEDET/#31-the-best-trial-made-with-terrascan","title":"3.1 The best trial made with Terrascan","text":"Among the trials we ran with Terrascan, the one which yielded the best F1-score was obtained using the following parameters:
Parameter Value Method / Algorithm Trunk Classes 4+5, cleaned and reclassified Group planar surfaces Off Min height 3.00 m Max diameter 0.40 m Min trunk 3.00 m Group by density On Gap Auto Require 1500 pts
This trial corresponds to the green diamond shown in Figure 3.1.
Figure 3.2: Test sectors as segmented by the best trial made with Terrascan.
Figure 3.2 provides a view of the outcome on the three test sectors. Metrics read as follows:
Sector TP FP FN Detectable (TP+FN) Precision Recall F1-Score Global 323 137 234 557 70.2% 58.0% 63.5% Adrien-Jeandin 177 69 160 337 72.0% 52.5% 60.7% Bel-Air 114 15 11 125 88.4% 91.2% 89.8% Floraire 32 53 63 89 37.6% 33.7% 35.6%
Figure 3.3 provides a graphical representation of the same findings, with the addition of the metrics we computed before cleaning and reclassifying the LiDAR point cloud.
Figure 3.3: Cleaning and reclassifying the point cloud has a positive influence on precision and recall, although modest.
Our results confirm that the tree detection task is more or less hard depending on the sector at hand. Without any surprise, we acknowledge that:
Cleaning and Reclassification have a benificial impact on Precision and Recall for all sectors as well as the global context (TOT). While for BEL mainly Recall profited from preprocessing, ADR and FLO showed a stronger increase in Precision. For the global context both, Precision and Recall, could be increased slighty.
Figure 3.4: The F1-score attained by our best Terrascan trial.
Figure 3.4 shows how our best Terrascan trial performed in terms of F1-score: globally, on a per-sector basis; with and without pre-processing.
We can notice that pre-processing slightly improves the F1-score for the global context as well as for the individual sectors. The largest impact was observed for the Bel-Air sector, especially for preprocessing including Reclassification.
"},{"location":"PROJ-TREEDET/#32-the-best-trial-made-with-dft","title":"3.2 The best trial made with DFT","text":"The DFT trial yielding the highest global F1-score was obtained using the stem detection method and the following parameters:
Parameter Value Method / Algorithm Stem detection Classes 4+5, cleaned and reclassified Search radius 4.00 Minimum length 4.00
Here's a summary of the resulting metrics:
Sector Precision Recall F1-score Adrien-Jeandin 75.4% 36.5% 49.2% Bel-Air 88.0% 82.4% 85.1% Floraire 47.9% 36.8% 41.7% Global 74.0% 46.6% 57.2%
Similar comments to those formulated here apply: the \"Avenue de Bel-Air\" sector remains the easiest to process; \"Parc Floraire\" the hardest. However, here we acknowledge a bigger gap between the global F1-score and the F1-score related to the \"Adrien-Jeandin\" test sector.
Figure 3.5 shows how our best DFT trial performed in terms of F1-score: globally, on a per-sector basis; with and without pre-processing. We can notice that the impact of point cloud reclassification can be slightly positive or negative depending on the test sector.
Figure 3.5: The F1-score attained by our best DFT trial.
"},{"location":"PROJ-TREEDET/#33-comparison-terrascan-vs-dft","title":"3.3 Comparison: Terrascan vs. DFT","text":"Figure 3.6: Comparison of Terrascan and DFT in terms of F1-score.
The comparison of the best Terrascan trial vs. the best DFT trial in terms of F1-score shows that there is no clear winner (see Figure 3.6). Still, we can notice that:
In addition to applying our method to the 2021 high-density (HD) LiDAR dataset, we also tried using two other datasets exhibiting a by far more standard point density (20-30 pt/m\u00b2):
The goal was twofold:
Concerning the 1st point, lower point densities make the \"trunk method\" unreliable (if not completely unusable). In Figure 3.7, we report results obtained with the watershed method, along with results related to the best performing trials obtained with the 2021 HD dataset. The scores we obtained with the SD dataset are far below the best we obtained with the HD dataset, confirming the interest of high-density acquisitions.
Figure 3.7: Comparison of F1-scores of the best performing trials. Parameters were optimized for each model individually.
Concerning the 2nd point, without any surprise we confirmed that parameters must be re-optimized for SD datasets. The usage of the set of parameters which were optimized on the basis of the HD dataset yielded poor results, as shown in Figure 3.8.
Figure 3.8: Using the parameters which were optimized for the high-density dataset leads to poor results (strong under-segmentation) on SD datasets. In accordance with the TS documentation we can see that the trunk method is unusable for lower and medium density datasets.
The watershed algorithm produces a more realistic segmentation pattern on the SD dataset but still cannot reach the performance levels of the trunk or the watershed method on the HD dataset. After optimizing parameters, we could obtain quite decent results though (see Figure 3.9).
Figure 3.9: After a dataset-specific parameter optimization, convincing results can be achieved on the medium-density 2019 dataset (Terrascan's watershed method was used).
"},{"location":"PROJ-TREEDET/#35-tree-detection-over-the-full-2021-high-density-lidar-dataset","title":"3.5 Tree detection over the full 2021 high-density LiDAR dataset","text":"Clearly, from a computational point of view processing large point cloud dataset is not the same as processing small datasets. Given the extremely high density of the 2021 LiDAR datasets, we wanted to check whether and how Terrascan could handle such a resource-intensive task. Thanks to Terrascan's macro actions, one can split the task into a set of smaller sub-tasks, each sub-task dealing with a \"tile\" of the full dataset. Additionally, Terrascan integrates quite a smart feature, which automatically merges groups of points (i.e. trees) spanning multiple tiles.
Figure 3.10 provides a static view of the results we obtained, using the parameters which globally performed the best on the three sectors. We refer the reader to this Potree viewer (kindly hosted by the G\u00e9oportail du SITN) for an interactive view.
Figure 3.10: Result of the application of the best performing Terrascan parameters to the full dataset.
"},{"location":"PROJ-TREEDET/#4-conclusion-and-outlook","title":"4. Conclusion and outlook","text":"Despite all the efforts documented here above, the results we obtained are not as satisfactory as expected. Indeed, the metrics we managed to attain all sectors combined indicate that tree detections are neither reliable (low precision) nor exhaustive (low recall). Still, we think that results may be improved by further developing some ideas, which we sketch in the following.
"},{"location":"PROJ-TREEDET/#41-further-the-dft-parameter-space-exploration","title":"4.1 Further the DFT parameter space exploration","text":"We devoted much more time to exploring Terrascan's parameter space than DFT's. Indeed, as already stated here, we only explored the two parameters searchRadius
and minLenght
. Other parameters such as cellSize
, bandwidth
and verticalStep
were not explored at all (we kept default values). We think it is definitely worth exploring these other parameters, too.
Moreover,
We showed that the algorithms implemented by TerraScan and DFT yield much better results in sparse contexts (ex.: the \"Avenue de Bel-Air\" test sector) than in dense ones (ex.: the \"Parc Floraire\" test sector). This means that precision may be improved (at the expense of recall, though) if one could restrain the tree detection to sparse contexts only, either as a pre- or post-processing step. We can think of at least a couple of methods which would allow one to (semi-)automatically tell sparse from dense contexts:
intrinsic method: after segmenting the point cloud into individual trees, one could analyze how close (far) each individual is to (from) the nearest neighbor and estimate the density of trees on some 2D or 3D grid;
extrinsic method: territorial data exist (see for instance the dataset \u00a0\"Carte de couverture du sol selon classification OTEMO\" distributed by the SITG), providing information about urban planning and land use (e.g.\u00a0roads, parks, sidewalks, etc.). These data may be analyzed in order to extract hints on how likely it is for a tree to be in close proximity with another, according to its position.
Detections coming from two or more independent trials (obtained with different software or else with the same software but different parameters) could be combined in order to improve either precision or recall:
recall would be improved (i.e.\u00a0the number of false negatives would be reduced) if detections coming from multiple trials were merged. In order to prevent double counting, two or more detections coming from two or more sources could be counted as just one if they were found within a given distance from each other. The algorithm would follow along similar lines as the ones which led us to the \"tagging and counting algorithm\" presented here above;
precision would be improved (i.e.\u00a0the number of false positives would be reduced) if we considered only those detections for which a consensus could be established among two or more trials, and discarded the rest. A distance-based criterion could be used to establish such consensus, along similar lines as those leading to our \"tagging and counting algorithm\".
Generic (i.e. not tailored for tree detection) clustering algorithms exist, such as DBSCAN (\"Density-Based Spatial Clustering of Applications with Noise\", see e.g. here), which could be used to segment a LiDAR point cloud into individual trees. We think it would be worth giving these algorithms a try!
"},{"location":"PROJ-TREEDET/#45-use-machine-learning","title":"4.5 Use Machine Learning","text":"The segmentation algorithms we used in this project do not rely on Machine Learning. Yet, alternative/complementary approaches might me investigated, in which a point cloud segmentation model would be first trained on reference data, then used to infer tree segmentations within a given area of interest. For instance, it would be tempting to test this Deep Learning model published by ESRI and usable with their ArcGIS Pro software. It would be also worth deep diving into this research paper and try replicating the proposed methodology. Regarding training data, we could generate a ground truth dataset by
The work documented here was the object of a Forum SITG which took place online on March 29, 2022. Videos and presentation materials can be found here.
"},{"location":"PROJ-TREEDET/#6-acknowledgements","title":"6. Acknowledgements","text":"This project was made possible thanks to a tight collaboration between the STDL team and some experts of the Canton of Neuch\u00e2tel (NE), the Canton of Geneva (GE), the Conservatoire et Jardin botaniques de la Ville de Gen\u00e8ve (CJBG) and the University of Geneva (UNIGE). The STDL team acknowledges key contributions from Marc Riedo (SITN, NE), Bertrand Favre (OCAN, GE), Nicolas Wyler (CJBG) and Gregory Giuliani (UNIGE). We also wish to warmly thank Matthew Parkan for developing, maintaining and advising us on the Digital Forestry Toolbox.
"},{"location":"TASK-4RAS/","title":"TASK-4RAS - HR, NH","text":"Schedule : September 2020 to February 2021 (initially planned from August 2021 February 2022)
This document describe the state of an ongoing task (DIFF) and is subject to daily revision and evolution
"},{"location":"TASK-4RAS/#context","title":"Context","text":"The 4D platform developed at EPFL with the collaboration of Cadastre Suisse is able to ingest both large scale point-based and vector-based models. During the previous development, the possibility to have this different type of data in a single framework lead to interesting results, showing the interest to have the possibility to put this different type of data into perspectives.
Illustrations of mixed models in the 4D platform : INTERLIS, Mesh and LIDAR - Data : SITN
Taking into account point-based and vector-based model allows to almost cover all type of data that are traditionally considered for land registering.
The only type of data that is currently missing is the two-dimensional rasters. Indeed, due to their nature, image are more complicated to put in perspective of other three-dimensional data. The goal of this task is then to address the management of the raster by the platform in order to be able to ingest, store and broadcast any type of data with the 4D platform.
"},{"location":"TASK-4RAS/#specifications","title":"Specifications","text":"In order to address this task, a step-by-step approach is defined. In the first place, a set of data has to be gathered from the STDL partners :
Gathering a dataset of geo-referenced ortho-photography of a chosen place of reasonable size
The dataset has to provide ortho-photography for at least two different times
The format of the dataset has to be analyzed in order to be able to extract the image pixels with their position (CH1903+)
As the platform indexation formalism is not straightforward, the images are treated as point-based models, each pixel being one colored point of the model. This will provide a way of starting to analyze and understand the indexation formalism while having first results on image integration (a minimal sketch of the pixel-to-point conversion is given after the list below) :
Transform images into simple point-based models (each pixel being one point)
Injection of the point-based model in an experimental instance of the platform
Understanding the indexation formalism for point-based models and, subsequently, its adaptation for the vector-based models
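As a minimal sketch of this pixel-to-point idea, assuming a geo-referenced ortho-photo readable with rasterio; the plain-text x y z r g b output is only a stand-in for the platform's actual ingestion format, whose layout is not detailed here:

```python
import numpy as np
import rasterio

# Read the three bands of a geo-referenced ortho-photo.
with rasterio.open('ortho.tif') as src:          # hypothetical input file
    r, g, b = src.read(1), src.read(2), src.read(3)
    rows, cols = np.indices(r.shape)
    # Pixel centers -> map coordinates via the raster's affine transform.
    xs, ys = rasterio.transform.xy(src.transform, rows.ravel(), cols.ravel())

# Each pixel becomes one colored point; ortho-photos carry no height, so a
# constant (or DTM-derived) elevation is used for the third coordinate.
with open('ortho_points.txt', 'w') as out:
    for x, y, rr, gg, bb in zip(xs, ys, r.ravel(), g.ravel(), b.ravel()):
        out.write(f'{x} {y} 0.0 {rr} {gg} {bb}\n')
```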
Once the indexation formalism is understood for point-based models, the following adaptation will be performed :
At this point, a first reporting is required :
Is there an advantage to adding rasters to such a platform in perspective of the other types of models (points, vectors, meshes) ?
How does the adaptation of the point-based indexation perform for images ?
How does taking advantage of color accumulation enrich the image integration ?
What is the cost of rendering the image with the adaptation of the point-based indexation ?
Based on the formulated answers, the following strategic choice has to be discussed :
Depending on the answer, a new set of specifications will be decided (if this direction is favored).
Depending on the remaining time and on the obtained results, the question of time management in the platform will be addressed. Currently, time is treated linearly in the platform and a multi-scale approach, as for the spatial dimensions, could be interesting. The specifications will be decided as the previous points are fulfilled.
"},{"location":"TASK-4RAS/#resources","title":"Resources","text":"List of the resources initially linked to the task :
Other resources will be provided according to requirements.
"},{"location":"TASK-DIFF/","title":"AUTOMATIC DETECTION OF CHANGES IN THE ENVIRONMENT","text":"Nils Hamel (UNIGE)
Project scheduled in the STDL research roadmap - TASK-DIFF September 2020 to November 2020 - Published on December 7, 2020
Abstract : Developed at EPFL with the collaboration of Cadastre Suisse to handle large scale geographical models of different natures, the STDL 4D platform offers a robust and efficient indexation methodology making it possible to manage storage of, and access to, large-scale models. In addition to spatial indexation, the platform also includes time as part of the indexation, allowing any area to be described by models in both spatial and temporal dimensions. In this development project, the notion of model temporal derivative is explored and proof-of-concepts are implemented in the platform. The goal is to demonstrate that, in addition to their formal content, models coming with different temporal versions can be derived along the time dimension to compute difference models. Such proof-of-concepts are developed for both point cloud and vectorial models, demonstrating that the indexation formalism of the platform considerably eases the computation of difference models. This research project demonstrates that the time dimension can be fully exploited in order to access the data it holds.
"},{"location":"TASK-DIFF/#task-context-difference-detection","title":"Task Context : Difference Detection","text":"As the implemented indexation formalism is based on equivalences classes defined on space and time, a natural discretization along all the four dimensions is obtained. In the field of difference detection, it allowed implementing simple logical operators on the four-dimensional space. The OR, AND and XOR operators were then implemented allowing the platform to compute, in real time, convolutions to compare models with each others across the time.
The implementation of these operators was simple due to the natural spatio-temporal discretization obtained from the indexation formalism. Nevertheless, two major drawbacks appeared : the first one is that such operators only work for point-based models. Computing and rendering differences and similarities between any type of data is not possible with such formal operators.
The second drawback comes from the nature of point-based capturing devices. Indeed, taking the example of a building, even without any change to its structure, two digitization campaigns can lead to disparities due only to measurement sampling. The XOR operator is the natural choice to detect and render differences, but it is very sensitive to sampling disparities. Computing the XOR convolution between two point-based models leads the rendering to be dominated by sampling variations rather than the desired structural differences.
This drawback was partially solved by considering the AND operator. Indeed, the AND operator shows only the structural elements that are constant between two different positions in time and is insensitive to sampling disparities. As shown on the following images, the AND operator shows differences as black spots (missing parts) :
AND convolution between two LIDAR models : Geneva 2005 and 2009 - Data : SITG
As one can see, AND convolutions allow detecting, through the black spots, large areas of structural changes between the two times and also, with more care, allow guessing smaller differences. Nevertheless, reading and interpreting such a representation remains complex for users.
The goal of this task is then to tackle these two drawbacks, allowing the platform to detect changes not only for point-based models but also for vector-based models, and to implement a variation of the XOR operator for point-based models able to efficiently highlight the structural evolution. The task then consists in the implementation, testing and validation of a difference detection algorithm suitable for any type of model, and in a formal analysis of the best rendering techniques.
"},{"location":"TASK-DIFF/#methodology","title":"Methodology","text":"A step by step methodology is defined to address the problem of difference detection in the platform. In a first phase, the algorithm will be developed and validated on vector-based models as follows :
Obtaining a large scale vector-based model on which synthetic variations are introduced
Development of the algorithm using the synthetic variations model
Testing and validation of the algorithm (using the known synthetic variations)
First conclusion
In a second phase, true land register data will be used to formally detect real evolutions of the territory :
Obtaining true land register vector-based models (INTERLIS) at different times
Analysis of the difference detection algorithm on true land register vector-based models
Second conclusion
In a third phase, the algorithm will be validated and adapted to work on point-based models :
Obtaining true land register point-based models (LAS) at different positions in time
Verifying the performances of the vector-based detection algorithm on point-based data
Adaptation of the algorithm for point-based models
Analysis of the difference detection algorithm on true land register point-based models
Comparison of the detected differences on point-based models and on their corresponding land register vector-based models (INTERLIS)
Third conclusion
In addition, the development of the difference detection algorithm has to be conducted keeping in mind possible future evolutions of the platform, such as the addition of layers (separation of data), the implementation of a multi-scale approach of the time dimension and the addition of raster data in the platform.
"},{"location":"TASK-DIFF/#first-phase-synthetic-variations","title":"First Phase : Synthetic Variations","text":"In order to implements the vector-based difference detection algorithm, sets of data are considered as base on which synthetic differences are applied to simulate the evolution of the territory. This approach allows focusing on well controlled data to formally benchmark the results of the implemented algorithm. Experiments are conducted using these data to formally evaluate the performance of the developed algorithm.
"},{"location":"TASK-DIFF/#selected-resources-and-models","title":"Selected Resources and Models","text":""},{"location":"TASK-DIFF/#vector-models-line-based","title":"Vector Models : Line-based","text":"In this first phase, line-based data are gathered from openstreetmap in order to create simple models used during the implementation and validation of the detection algorithm. A first set of vector-based models are considered made only of lines. Three sets are created each with a different scale, from city to the whole Switzerland.
The line-based sets of data are extracted from openstreetmap shapefiles and the elevations are restored using the SRTM geotiff data. The EGM96-5 geoid model is then used to convert the elevations from MSL to ellipsoid heights, following the standard relation below.
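The conversion follows the standard relation between orthometric and ellipsoidal heights, where the undulation N is given by the geoid model at the point's latitude and longitude:

$$ h_{\text{ellipsoid}} = H_{\text{MSL}} + N(\varphi, \lambda) $$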
The following images give an illustration of these sets of data :
Line-based data-sets : Switzerland - Data : OSM
The following table gives a summary of the model sizes and primitive counts :
Model | Size (UV3) | Primitive Count
--- | --- | ---
Frauenfeld | 5.0 Mio | 93.3 K-Lines
Neuch\u00e2tel | 33.1 Mio | 620.2 K-Lines
Switzerland | 1.3 Gio | 25.0 M-Lines

In order to simulate the evolution of the territory in time, synthetic variations are added to these models. A script is developed and used to insert controlled variations on selected primitives. The script works by randomly selecting a user-defined number of primitives of a model and by adding a variation on one of their vertex positions, using a user-specified amplitude. The variation is applied on the three dimensions of space.
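A minimal sketch of such a variation script, with primitives modeled as lists of [x, y, z] vertices; the real script operates on the platform's own files, whose parsing is omitted here:

```python
import random

def perturb(primitives, count, amplitude, seed=0):
    # Randomly select 'count' primitives and displace one vertex of each
    # by up to 'amplitude' on all three dimensions of space.
    rng = random.Random(seed)
    for primitive in rng.sample(primitives, count):
        vertex = rng.choice(primitive)           # one vertex per primitive
        for axis in range(3):
            vertex[axis] += rng.uniform(-amplitude, amplitude)
    return primitives

# Toy model: one hundred horizontal line primitives.
lines = [[[float(i), 0.0, 0.0], [float(i) + 5.0, 0.0, 0.0]] for i in range(100)]
perturb(lines, count=8, amplitude=1.0)           # eight ~1 m variations
```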
"},{"location":"TASK-DIFF/#vector-models-triangle-based","title":"Vector Models : Triangle-based","text":"A second set of triangle-based models is also considered for implementing and validating the difference detection algorithm. The selected model is a mesh model of the Swiss buildings provided by swisstopo. It comes aligned in the CH1903+ frame with elevations. It is simply converted into the WGS84 frame using again the EGM96-5 geoid model :
Triangle-based data-sets : Switzerland - Data : swisstopo
The following table gives a summary of the model sizes and primitive counts :
Model | Size (UV3) | Primitive Count
--- | --- | ---
Frauenfeld | 116.9 Mio | 1.4 M-Triangles
Neuch\u00e2tel | 842.2 Mio | 10.5 M-Triangles
Switzerland | 30.5 Gio | 390.6 M-Triangles

These models are very interesting for difference detection as the ratio between primitive size and model extent is very low. It means that all the primitives are small relative to the model coverage, especially for the Switzerland one.
The script developed for line-based models is also used here to add synthetic variations to the model primitives in order to simulate an evolution of the territory.
"},{"location":"TASK-DIFF/#models-statistical-analysis","title":"Models : Statistical Analysis","text":"Before using the models in the following developments, a statistical analysis is performed on the two Switzerland models, line and triangle-based. Each primitive of these two models are considered and their edges size are computed to deduce their distribution :
Statistical analysis : Model primitive edge size distribution, in meters, for the Switzerland models : line-based (left) and triangle-based (right)
One can see that the line-based model comes with a much broader distribution of primitive sizes. Most of the model is made of lines between zero and twenty meters. In the case of the triangle-based model, the primitives are much smaller : most of them are less than ten meters, and a significant fraction is below one meter.
"},{"location":"TASK-DIFF/#implementation-of-the-algorithm","title":"Implementation of the Algorithm","text":"In order to compare two models at two different positions in time to detect differences, the solution is of course to search for each primitive of the primary time if it has a corresponding one in the secondary time. In such case, the primitives can be concluded as static in time and only the primitives that have no correspondence will be highlighted as differences.
A first approach was initially tested : a vertex-based comparison. As every primitive (points, lines and triangles) is supported by vertices, vertices can be seen as a common denominator on which the comparison can take place. Unfortunately, it is not a relevant approach, as it leads to an asymmetric detection algorithm. To illustrate the issue, the following image shows the situation of a group of line-based primitives at two different times, with an evolution on one of the primitive vertices :
Asymmetric approach : The variation is detected only when comparing backward in time
When the comparison occurs between the second time and the first one, the modified vertex correspondence is not found, and the vertex can be highlighted as a difference. The asymmetry appears when the first time is compared to the second one. In this case, despite the primitive vertex having changed, the vertex-based approach is able to find another vertex, part of another primitive, and interprets it as a vertex identity, leading the modified primitive to be considered as static.
In order to obtain a fully symmetric algorithm, one that does not depend on the direction in which models are compared in time, a primitive-attached approach is considered. The implemented algorithm treats the correspondence problem from the whole-primitive point of view, by checking that the whole primitive can be found in the other model to which it is compared. This allows highlighting any primitive showing a modification, regardless of the way models are compared and of the nature of the modification.
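A minimal sketch of this primitive-attached comparison, using an order-independent signature of each primitive's vertices; real models would require an efficient spatial index rather than a plain hash set:

```python
def signature(primitive):
    # Order-independent signature built from the whole set of vertices.
    return frozenset(tuple(vertex) for vertex in primitive)

def differences(primary, secondary):
    # A primary primitive is a difference when no primitive with exactly
    # the same vertices exists in the secondary time.
    known = {signature(p) for p in secondary}
    return [p for p in primary if signature(p) not in known]

time_1 = [[(0, 0, 0), (1, 0, 0)], [(2, 0, 0), (3, 0, 0)]]
time_2 = [[(0, 0, 0), (1, 0, 0)], [(2, 0, 0), (3, 0, 1)]]  # one moved vertex
print(differences(time_1, time_2))   # the modified line, as seen from time 1
```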
In addition to highlighting the primitives that changed through time, the implemented algorithm also renders the primitives that have not changed. Color modulation is used to emphasize the modifications : modified primitives keep their original color, while static primitives are shown in dark gray. This allows not only showing the modifications but also keeping their context, helping the user to fully understand the nature of the territory evolution.
In addition to color modulation, a variation of the difference rendering is analyzed, in which a visual and artificial marker is added to ease the search. The visual marker is a simple line emanating from the primitive and going straight up, with a size of 512 meters. Such markers are introduced to ease the detection of small primitives that can be difficult to spot from large points of view.
Additional developments were required for triangle-based models : indeed, such models need to be subjected to a light source during rendering for the user to understand the model (face shading). The previously implemented lighting model is modified to take color modulation into account, in order to correctly render the triangles that are highlighted. Moreover, the lighting model was modified to light both faces of the triangles, so that they are lit regardless of the point of view.
In addition, as mesh models are made of triangles, primitives can hide one another. It can then be difficult for the user to spot the highlighted primitives, as they can be hidden by others. An option was added to the rendering client allowing the user to request the rendering of triangles as line-loops or points, in order to make them transparent. Finally, an option allowing the user to enable or disable render face culling was added, so that primitives can be seen from behind.
"},{"location":"TASK-DIFF/#results-and-experiments","title":"Results and Experiments","text":"With the implemented algorithm, a series of experiments are conducted in order to validate its results and to analyze the efficiency of the difference detection and rendering from a user point of view. In addition, experiments are also conducted to quantify the efficiency of the difference detection for automated processes.
"},{"location":"TASK-DIFF/#difference-detection-overview","title":"Difference Detection : Overview","text":"Considering the selected data-sets, each original model is injected at a given time and synthetic variations are added to a copy of it to create a second model injected at another time. The synthetic variations are randomly added to a small amount of primitives of the original model and are of the order of one meter. On the following examples, the detection is operated considering the original model as primary and the modified one as secondary.
The following images show examples of how the detection algorithm highlights the detected differences while keeping the rest of the model in a darker color, in the case of line-based models :
Example of difference detection on line-based Frauenfeld (left) and Neuch\u00e2tel (right) models - Data : OSM
One can see how the modified primitives are highlighted while keeping the context of the modifications. The highlighted primitive is the one belonging to the primary time. Comparing the models the other way around would lead the secondary model primitives to be highlighted.
Considering the Frauenfeld example, the following images show the situation in the primary time (original model) and the secondary time (model with synthetic variations) :
Primary model (left) and secondary one (right) showing the formal situations - The modified primitive is circled in red - Data : OSM
As a result, the user can choose which differences to highlight through the choice of the primary model, and can also switch back and forth between the models themselves through the platform interface.
Of course, the readability of the difference detection models depends on the size of the modified primitives and on the scale at which the model is looked at by the user. If the user adopts a large-scale point of view, the differences, even highlighted, can become difficult to spot. This issue worsens when triangle-based models are considered : in addition to primitive size, triangles also bring occlusions.
The visual markers added to the highlighted primitives can considerably ease the user's search for differences. The following images give an example of difference detection without and with the visual markers added by the algorithm :
Example of highlighted primitives without (left) and with (right) visual markers - Data : OSM
Considering the triangle-based models, difference detection is made more complicated by at least three aspects. The first one is that 3D vector models are more complex than 2D ones, in the sense that primitives (triangles) are more densely packed in the same regions of space in order to correctly model the buildings. The second one is that triangles are solid primitives that bring occlusions in the rendering, hiding other primitives. The last aspect is that such a model can contain very small primitives in order to model the details of the buildings. In such a case, the primitives can be difficult to see, even when highlighted.
The following images show an example of highlighted triangles on the Frauenfeld model :
Example of highlighted primitive on the Frauenfeld building model - Data : swisstopo
On the right image above, the highlighted triangle is underneath the roof of the house, forcing the user to adopt an unconventional point of view (from above the house) to see it. In addition, some primitives can be defined fully inside a volume closed by triangles, making them impossible to see without going inside the volume or playing with the triangle rendering mode.
In such a context, the usage of visual markers becomes very important for models coming with large amounts of occlusion and small primitives :
Example of highlighted primitives without (left) and with (right) visual markers - Data : swisstopo
In the case of triangle-based models, the usage of markers appears to be mandatory for the user to be able to locate the position of the detected differences in a reasonable amount of time.
"},{"location":"TASK-DIFF/#difference-detection-user-based-experiments","title":"Difference Detection : User-Based Experiments","text":"In any case, for both line and triangle-based models, the difference detection algorithm is only able to highlight visible primitives. Depending on the point of view of the user, part of the primitives are not provided by the platform because of their small size. Indeed, the whole point of the platform is to allow the user to browse through arbitrary large models, which implies to provided only the relevant primitives according to its point of view.
As a result, the detection algorithm will not be able to highlight the variations when the involved primitives are not part of a query answer by the platform. The user then has to narrow the point of view in order to zoom in on the small primitives and make them appear, thus allowing the algorithm to highlight them.
In order to show this limitation, an experiment is performed. For each model, a copy is made on which eight synthetic differences are randomly introduced. The variations are of the order of one meter. The models and their modified copies are injected in the platform. The rule is the following : the user uses the detection algorithm on each model and its modified copy and has five minutes to detect the eight differences. Each time a difference is seen by the user, the detection time is recorded. The user is allowed to use the platform in any way he wants. In each case, the experiment is repeated five times to get a mean detection rate.
These measures are made by the user and are difficult to interpret without a reference. In order to provide such a reference, the following additional experiment is conducted : each model and its modified copy are submitted to a naive automated detection process. This process parses each primitive of the original model and searches its modified copy to determine whether the primitive appears in it. If the primitive is not found, the process triggers a difference detection. This process is called naive as it simply implements two nested loops, which is the simplest searching algorithm implementation (a sketch is given below). The process is written in C with full code optimization and executed by a single thread.
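For reference, a sketch of this naive process; the original is a C program, transcribed here in Python for consistency with the other sketches:

```python
def naive_differences(original, modified):
    # Two nested loops: search each primitive of the original model in the
    # modified copy, and report those that cannot be found.
    found_differences = []
    for primitive in original:
        found = False
        for candidate in modified:
            if primitive == candidate:
                found = True
                break
        if not found:
            found_differences.append(primitive)
    return found_differences
```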
Starting with the line-based models, the following figures show the difference detection rates according to time. For each of the three models, the left plots show the rate without visual markers, the middle ones with visual markers and the right ones the naive process detection rate :
Frauenfeld : The black curve shows the mean detection rate while the blue (left, middle) and red (right) areas give the worst and best rates - Left : without visual markers - Middle : with visual markers - Right : automated process
Canton of Neuch\u00e2tel : The black curve shows the mean detection rate while the blue (left, middle) and red (right) areas give the worst and best rates - Left : without visual markers - Middle : with visual markers - Right : automated process
Switzerland : The black curve shows the mean detection rate while the blue (left, middle) and red (right) areas give the worst and best rates - Left : without visual markers - Middle : with visual markers - Right : automated process
As expected, the larger the model is, the more difficult it is for the user to find the highlighted differences, with or without visual markers. Considering a city, the differences, even of the order of one meter, are easy to spot quickly. The larger the model gets, the more time it takes the user to find the differences. On a model covering a whole canton (Neuch\u00e2tel), one can see that most of the differences are detected in a reasonable amount of time despite their small size relative to the overall model. On the Swiss model, things get more complicated, as simply looking at each part of the country is already difficult in only five minutes, leading the detection rate to be lower, even when using the visual markers.
These results are consistent with the statistical analysis made on the line-based Switzerland model. Detection on a city or even a whole canton leads the user to adopt a point of view sufficiently close to make most of the primitives appear. For the Switzerland model, the user is forced to adopt a larger point of view, leaving a significant proportion of primitives hidden.
These results also show that adding visual markers to the highlighted primitives increases the user detection rate, meaning that the markers lead to a more suitable rendering from the user experience point of view.
Considering the user results and the naive detection process, one can see that the user obtains at least similar results, and most of the time outperforms the automated process. This demonstrates how the implementation and data broadcasting strategy of the platform provides an efficient way to access models and composite models, here in the context of difference detection.
The following figures show the experiments results for the triangle-based models, which were not performed on the whole Switzerland model due to limited rendering capabilities :
Frauenfeld : The black curve shows the mean detection rate while the blue (left, middle) and red (right) areas give the worst and best rates - Left : without visual markers - Middle : with visual markers - Right : automated process
Canton of Neuch\u00e2tel : The black curve shows the mean detection rate while the blue (left, middle) and red (right) areas give the worst and best rates - Left : without visual markers - Middle : with visual markers - Right : automated process
Similar conclusions apply for the triangle-based models : the larger the model is, the more difficult the difference detection is. These results also confirm that adding visual markers in addition to primitive highlighting significantly helps the user, particularly in the case of triangle-based models.
The results obtained on triangle-based models are lower than for line-based models. A first explanation is the greater amount of primitives, which leads the user to spend more time at each successive point of view. The occlusion problem also seems to play a role, but to a lesser extent, as the visual markers seem to largely solve it. The difference between detection on line and triangle-based models has to be searched for in the statistical analysis of the triangle-based models. Indeed, for these models, a large proportion of the primitives are very small (less than a meter), leading them to be rendered only when the user adopts a close point of view, which makes detection much more complicated in such a small amount of time.
The triangle-based models being larger than the line-based ones, the results of the naive process are very poor. As for the line-based model experiments, the user outperforms this automated process, in a much more significant way.
"},{"location":"TASK-DIFF/#difference-detection-process-based-experiments","title":"Difference Detection : Process-Based Experiments","text":"In the previous experiments, the user ability to find the differences on the data-sets, using synthetic variations, was benchmark in perspective of the results provided by a naive automated process. The user performs quite well using the platform, but start to struggle as the data-sets get bigger according to the sizes of their primitives.
In this second set of experiments, the platform is used through an automated process instead of a user. The process has the same task as the user, that is, finding the eight synthetic differences introduced in the model copies. The process starts with a list of indices (the discretization cells of the platform), used to query the corresponding data from the platform before searching for differences in each cell. The process then implements a systematic difference detection covering the whole model.
In order for the process to work, it requires an input index list. To create it, the primitive injection condition of the platform is used to determine the maximal depth of these indices. The following formula gives the poly-vertex (lines and triangles) primitive injection condition according to the platform scale. In other words, the formula gives the shallowest scale at which the primitive is considered through queries, according to its size :
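A plausible form of this injection condition, reconstructed here from the definitions and from the ~30 cm (s = 26) and ~15 cm (s = 27) figures quoted in this section, is the following :

$$ e \geq \frac{\pi R}{2^{s}} \qquad \Longleftrightarrow \qquad s = \left\lceil \log_2\!\left(\frac{\pi R}{e}\right) \right\rceil $$

With R = 6378137 m and s = 26, the threshold is indeed e \u2248 0.30 m.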
where s gives the shallowest scale, R is the WGS84 semi-major axis and e is the largest distance, in meters, between the primitive's first vertex and its other ones. For example, choosing s = 26 allows the index to reach any primitive greater than ~30 cm over the whole model covered by the index.
The scale 26 is then chosen as the deepest search scale in the following experiments. This value can be adapted according to the primitive sizes and to the nature of the detection process. The larger it is, the more data are broadcast by the platform, increasing the processing time.
In order to compare the user-based experiments, the naive automated approach and this process-based exhaustive search, the same protocol is considered. The process addresses queries to the platform, based on the index list, and saves the detection time of each difference. The detection rate is plotted in the same way as for the previous experiments. Again, eight synthetic differences are randomly introduced and the experiment is repeated five times for the line-based model and only two times for the triangle-based model.
As the scale 26 is chosen as the deepest search scale, the index list can be built in different ways. Indeed, as a query is made of one spatial index, which points at the desired cell, and an additional depth (span), which specifies the density of data, the only constraint to maintain the deepest search scale at 26 is the following :
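From the description that follows, this constraint amounts to the spatial index size and the span value summing to the deepest search scale :

$$ s_{\text{index}} + s_{\text{span}} = 26 $$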
where the two left-hand side terms are the spatial index size and the span value. In these experiments, a first list of indices is built using a span of 9 and a second with a span of 10. As the deepest scale is kept constant, increasing the span reduces the index list size, but the queried cells contain more data to analyze.
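A minimal sketch of this exhaustive, cell-by-cell strategy; query() is a hypothetical stand-in for the platform's remote query interface, returning the primitives of a model found under a given spatial index and span:

```python
def exhaustive_search(index_list, span, query, time_1, time_2):
    # Systematic detection: every cell of the index list is queried for
    # both times and compared locally, which is far cheaper than a global
    # nested-loop search over the whole models.
    found = []
    for index in index_list:
        primary = query(index, span, time_1)
        secondary = set(query(index, span, time_2))
        found += [p for p in primary if p not in secondary]
    return found
```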
The following figures show the mean detection rate for the Switzerland line-based model with the deepest scale at 26 and span at 9 and 10. The plots are scaled in the same way as for the user-based experiments :
Switzerland : The black curve shows the mean detection rate while the blue area gives the worst and best rates - Span at 9 (left) and 10 (right)
One can see that the detection rate on such a model is much better than the user-based or naive approach ones. In about five minutes, with the span set to 10, the eight differences can be detected and reported. The full detection process took ~5 minutes with the span set to 10 and ~8 minutes with the span set to 9. This shows how the platform can be used by automated processes as an efficient data provider. In addition, as the data are queried by the automated process, the detected primitive geometry is directly available, allowing all sorts of subsequent processes to take place.
As the deepest scale was set to 26, in one of the five measure sessions, one of the eight differences was not detected at all. It means that the primitive on which a synthetic variation was introduced is smaller than 30 cm and was then not reached by any index. This shows the importance of defining the spatial indices and spans according to the process needs. For example, increasing the deepest scale to 27 would allow reaching primitives down to ~15 cm over the whole of Switzerland, and so on.
The following figures show the mean detection rate for the Switzerland triangle-based model. In this case, only two measure sessions were made to limit the time spent on this analysis :
Switzerland : The black curve shows the mean detection rate while the blue area gives the worst and best rates - Span at 9 (left) and 10 (right)
The conclusion remains, but the rate is slower in this case, as the model contains many more primitives than the line-based one. Here, the full detection process took ~15 minutes with the span set to 10 and ~20 minutes with the span set to 9. Again, in one of the two measure sessions, one difference was not detected due to the size of the primitive. Nevertheless, these results show how the platform, seen as a process data provider, allows outperforming user-based and classic detection algorithms.
Such a process-based strategy can be performed in many ways depending on the needs. For example, the index list can be limited to a specific area, or set to focus on spread and defined locations (for example at the intersections of the Swiss hectometric grid). The following image gives a simple example of how the detected differences can be leveraged. As the geometry of the differences is known by the process, a summary of the differences can be provided through a simple map :
Example of a differences map based on the results of the detection process - Data : SRTM
The eight synthetic differences are easily presented, allowing a user to analyze them in more detail in the platform interface, for example. This map was created by detecting the eight differences on the line-based Switzerland model in about 5 minutes with a span set to 10.
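As an illustration, producing such a summary map from the detected geometries is straightforward; the coordinates below are made-up placeholders:

```python
import matplotlib.pyplot as plt

# Made-up planar coordinates of detected differences (one tuple each).
detected = [(2533000, 1152000), (2600500, 1199700), (2694200, 1283900)]
xs, ys = zip(*detected)

fig, ax = plt.subplots()
ax.scatter(xs, ys, marker='x', color='red')   # one marker per difference
ax.set_title('Detected differences')
ax.set_aspect('equal')
fig.savefig('differences_map.png', dpi=150)
```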
"},{"location":"TASK-DIFF/#conclusion-first-phase","title":"Conclusion : First Phase","text":"During this first phase, the difference detection algorithm was developed and validated on both line-based and triangle-based data. An efficient algorithm is then implemented in the platform allowing emphasizing differences between models at different temporal positions. The algorithm is able to perform the detection on the fly with good performances allowing the users to dynamically browse the data to detect and analyze the territory evolutions.
The performance of the detection algorithm makes the platform suitable for automated detection processes, as a data provider answering large amounts of queries in an efficient and remote manner.
Two variations of the difference detection algorithm are implemented. The first consists in highlighting the primitives that are subject to modifications over time. This variation is suitable for automated processes that can rely on simple search methods to list the differences.
For users, this first variation can make visual detection of the differences more difficult, especially when the highlighted primitives are small or hidden by others. For this reason, visual markers were added on top of the highlighted primitives so that they can be seen from far away, regardless of the primitive size. The measure sessions made during the user-based experiments showed a clear improvement of the detection rate when using the visual markers. This was especially true for triangle-based models, where the primitives bring occlusions.
The user-based experiments showed that, using the platform interface, a human can significantly outperform the result of a naive automated process operating on the models themselves. The experiments showed that the user is able to efficiently search, through space and time, for the evolutions of the territory appearing in the data.
Of course, as the model size and complexity increase, the user-driven interface starts to show its limits. In such a case, the process-based experiments showed that automated processes can take over these more complicated searches, through methods allowing exhaustive detection over wide models in a matter of minutes.
At this point, the developments and validations of the algorithm, and its variations, were conducted on synthetic modifications introduced in models using controlled procedures. The next phase focuses on formal data extracted from land registers.
"},{"location":"TASK-DIFF/#second-phase-true-variations","title":"Second Phase : True Variations","text":"In this second phase, also dedicated to vector-based models, the focus is set on applying the developed difference detection algorithm on true land register models. Two sets of data are considered in order to address short-term and long-term difference detection.
"},{"location":"TASK-DIFF/#selected-resources-and-models_1","title":"Selected Resources and Models","text":"In both cases, short-term and long-term, INTERLIS data are considered. A selection of tables in different topics is performed to extract the most interesting geometries of the land registering. For all models, the following colors are used to distinguish the extracted layers :
INTERLIS selected topics and tables colors - Official French and German designations
The layers are chosen according to their geometric content. The color assignment is arbitrary and does not correspond to any official colorization standard.
"},{"location":"TASK-DIFF/#short-term-difference-detection-thurgau","title":"Short-Term Difference Detection : Thurgau","text":"For the short-term application of the difference detection algorithm, the case of the Thurgau canton is considered. Two set of INTERLIS data are considered that are very close in time, of the order of days. The selected layers are extracted from the source files before to be converted to the WGS84 frame using the EGM95-6 geoid model. The heights are restored using the SRTM topographic model. The following images give an illustration of the considered data :
Canton of Thurgau (left) and close view of Frauenfeld (right) - Data : Kanton Thurgau
Two INTERLIS models are considered, with the times 2020-10-13 and 2020-10-17 corresponding to the model gathering times. The following table gives the model sizes and primitive counts :
Model | Size (UV3) | Primitive Count
--- | --- | ---
Thurgau 2020-10-13 | 203.7 Mio | 3.8 M-Lines
Thurgau 2020-10-17 | 203.8 Mio | 3.8 M-Lines

As the two models are very close in time, they are very similar in size and content, the number of corrections made during the considered time range being small.
"},{"location":"TASK-DIFF/#long-term-difference-detection-geneva","title":"Long-Term Difference Detection : Geneva","text":"For the long-term difference detection analysis, the Geneva case is selected as the canton of Geneva keeps a copy of each land register model for each month from at least 2009. This allows to compare INTERLIS models that are further away from each other from a temporal point of view. The selected layers are extracted and converted to the WGS84 coordinates system using the EGM96-6 geoid model. Again, the SRTM model is used to restore the heights. The following images give an illustration of the selected models :
Canton of Geneva in 2019-04 (left) and close view of Geneva in 2013-04 (right) - Data : SITG
The selected models are not chosen randomly along the time dimension. Models that correspond to the Geneva LIDAR campaigns are selected, as they are used in the next phase. In addition, as the LIDAR campaigns are well spread along the time dimension, the selected models are far away from each other in time, by at least two years. The following table summarizes the model sizes and primitive counts :
Model | Size (UV3) | Primitive Count
--- | --- | ---
Geneva 2009-10 (MN03) | 550.2 Mio | 10.3 M-Lines
Geneva 2013-04 | 407.0 Mio | 7.6 M-Lines
Geneva 2017-04 | 599.6 Mio | 11.2 M-Lines
Geneva 2019-04 | 532.6 Mio | 9.9 M-Lines

As the temporal gaps between the models are much larger than for the Thurgau models, the size and primitive count show larger variations across time, indicating that numerous differences should be detected on these data.
"},{"location":"TASK-DIFF/#models-statistical-analysis_1","title":"Models : Statistical Analysis","text":"As in the first phase, a statistical analysis of the Thurgau and Geneva models is conducted. The following figures show the line length distribution of the two Thurgau models :
Statistical analysis : Primitive size distribution, in meters, for Thurgau 2020-10-13 (left) and 2020-10-17 (right)
As expected, since the models are very similar, their distributions are almost identical. In both cases, the distribution is centered around two meters and is mostly contained within the [0,5] range. The following figures show the same statistical analysis for the Geneva models, more spread along the time dimension :
Statistical analysis : Primitive size distribution, in meters, for Geneva 2009-10 (top-left), 2013-04 (top-right), 2017-04 (bottom-left) and 2019-04 (bottom-right)
One can see that the distribution varies more from one time to another. In addition, compared with the Thurgau models, the Geneva models tend to have smaller primitives, mostly distributed in the [0,1] range with a narrower distribution.
"},{"location":"TASK-DIFF/#results-and-analysis","title":"Results and Analysis","text":""},{"location":"TASK-DIFF/#short-term-thurgau","title":"Short-Term : Thurgau","text":"In the case of Thurgau data, the models are only separated in time by a few days. It follows that only a small amount of differences is expected. As an introduction, the following images show the overall situation of the difference detection between the two models. The differences are highlighted by keeping the primitive original color while identities are shown in dark gray to allow context conservation :
Overall view of difference detection : Thurgau (right) and Amriswil (left)
As expected, as the two models are very close in time, only a limited number of differences is detected. Such a situation allows having a clear view and understanding of each difference.
In order to analyze the results of the difference detection algorithm on real cases, selected differences, found using the algorithm itself, are studied in more detail to emphasize the ability of the algorithm to detect differences and make them understandable for the user. As a first example, the case of the Bielackerstrasse in Amriswil is considered and illustrated by the following images :
Example of difference detection : Bielackerstrasse in Amriswil - 2020-10-17 (right) and 2020-10-13 (left) as primary time
In this case, new buildings are added to the official land register. As 2020-10-17 is selected as primary, the highlighted elements correspond to the footprints of the added buildings. When the 2020-10-13 time is set as primary, as it does not contain the building footprints, the highlighted elements only correspond to the elements re-measured for land register correction. This illustrates the asymmetry of the difference detection algorithm, which only highlights primitives of the primary time.
In addition, by keeping the color of the highlighted primitives, the difference detection algorithm allows immediately seeing that three layers of the land register have been affected by the modification (German : Einzelobjekte, Flaechenelement Geometrie; Bodenbedeckung, BoFlaeche Geometrie; Einzelobjekte, Linienelement). The following images show the respective situations of the 2020-10-13 and 2020-10-17 models :
Situation of Bielackerstrasse in Amriswil - 2020-10-17 (right) and 2020-10-13 (left)
This confirms the analysis deduced from the difference detection algorithm, namely that a group of new buildings was added to the land register. In this example, if the inner road had not been re-measured, at least on some portion, the difference detection with 2020-10-13 as primary time would have shown nothing.
To illustrate the asymmetry of the algorithm more clearly, the example of Mammern is considered. On the following images, the result of the difference detection is illustrated with both times chosen successively as primary :
Example of difference detection : Mammern - 2020-10-17 (right) and 2020-10-13 (left) as primary time
On this specific example, one can see that when choosing the 2020-10-17 time as primary, which is the most recent time, nothing is highlighted by the detection algorithm. But when the 2020-10-13 time is set as primary, a specific element appears as highlighted, showing an evolution of the land register. This example illustrates the deletion of a sequence of primitives of the property (German : Liegenschaften, ProjLiegenschaft Geometrie) layer of the land register, which then only appears when the oldest time is set as primary. The following images show both time situations :
Situation of Mammern - 2020-10-17 (right) and 2020-10-13 (left)
This example shows the opposite situation of the previous one : elements were deleted from the land register instead of added.
As a last example, an in-between situation is selected. The case of the Trungerstrasse in M\u00fcnchwilen is considered and illustrated by the following images showing both time as primary :
Example of difference detection : Trungerstrasse in M\u00fcnchwilen - 2020-10-17 (right) and 2020-10-13 (left) as primary time
This situation is in-between the two previous ones, as nothing really appeared and nothing really disappeared from the land register. A modification was made on the situation of this specific property and thus appears no matter which of the two times is selected as primary. The following images show the formal situation of the land register for the two times :
Situation of Trungerstrasse in M\u00fcnchwilen - 2020-10-17 (right) and 2020-10-13 (left)
One can see that the corrections made are around the pointed house, such as the access road and the rear delimitation. For this type of situation, the algorithm recovers some kind of symmetry, as the choice of primary time is not relevant to detect the difference.
To conclude this short-term difference detection analysis, the efficiency of visual markers is illustrated on the region of Romanshorn and Amriswil on the following images. Both images show the difference detection rendering without and with the visual markers :
Illustration of difference detection without (right) and with (left) visual markers - 2020-10-17 as primary time for both images
One can see that, for small highlighted primitives, the usage of visual markers eases the viewing of differences for the user. Of course, when the highlighted primitives are big enough, or if the point of view is very close to the model, the efficiency of the visual markers decreases.
"},{"location":"TASK-DIFF/#long-term-geneva","title":"Long-Term : Geneva","text":"Considering the Geneva land register, the compared model are much more spread along the time dimension, leading to a much richer difference model. Starting with the 2019-04 and 2017-04 models, the following images gives an overview of the detected differences on the whole canton :
Overall view of difference detection between the Geneva 2019-04 and 2017-04 models with 2019-04 as primary
On this example, one can see that a much larger number of differences is detected, as the models are separated by two years. As a first observation, one can see that large portions of the model seem to have entirely moved between the two dates. Three of these zones are clearly visible on the images above, as all their content is highlighted by the difference detection algorithm : the upper half of the Geneva commune, the Carouge commune and the left half of the Plan-les-Ouates commune; more can be seen by looking more closely.
These zones have been subjected to corrections during the time interval separating the two models. These corrections mainly come from the FINELTRA [1] adjustment used to ensure conversion between the old Swiss coordinate system MN03 and the current MN95 standard. As these corrections operate on each coordinate, the whole area is modified on the order of a few centimeters. In these conditions, the whole area is then highlighted by the difference detection algorithm, as illustrated by the following image on the Carouge commune :
Closer view of the Carouge 2019-04 and 2017-04 differences with 2019-04 as primary
On this closer view, one can see that almost all the primitives of this specific commune have been corrected. Some exceptions remain : it is the case of the train tracks, for example, which appear as static between the two models. Looking more closely, one can also observe that some primitives were not affected by the correction.
Looking at the areas that have not been corrected through the FINELTRA triangular model, one can see that a lot of modifications appear. For example, the following two images give the differences of the Geneva historical part and the Verbois dam :
Closer view of the Historical city (left) and Verbois dam (right) 2019-04 and 2017-04 differences with 2019-04 as primary
One can see that, although very few elements truly changed, a lot of primitives are highlighted as differences. This can be explained by a constant work of correction based on in-situ measurements. Some other factors can also explain this large amount of differences, such as scripts used to correct the data and bring them to the expected Swiss standards.
In such a context, detecting real changes of the territory becomes much more complicated, as large amounts of detected differences are due to corrections of the model itself, without any underlying true modification of the territory. Nevertheless, differences that correspond to a true territory modification can be found. The following images show an example on the Chemin du Signal in Bernex :
Differences on Chemin du Signal in Bernex with 2019-04 (left) and 2017-04 (right) as primary
These differences can be detected by the user on the difference model as they appear more clearly, due to an accumulation of highlighted primitives. Indeed, in the case of simple corrections, the highlighted primitives appear more isolated. The following images give the formal situation for the two times :
Situation of Chemin du Signal in Bernex in 2019-04 (left) and 2017-04 (right)
On this example, one can see that, with either time as primary, the territory evolution can be seen by the user, as the highlighted primitives are more consistent. Nevertheless, territory changes are more difficult to list in such a case than in the previous short-term analysis. The following images give two examples of visible territory changes in the difference model :
La Gradelle (left) and Puplinge (right) 2019-04 and 2017-04 differences with 2019-04 as primary
On the previous left image, a clear block of buildings can be seen as more highlighted than the rest of the difference model and corresponds to new buildings. To its right, a smaller block can also be seen that also corresponds to new buildings. On the right image, a clear block of new buildings is also visible, as more highlighted. In such a case, the user has to make more effort to detect the differences that correspond to true changes in the territory, the difference model showing the land register modifications in the first place rather than the proper territory evolution.
Considering the 2013-04 model, similar observations apply, with a stronger effect due to the larger temporal gap. The difference models are dominated by corrections made to the model rather than proper territory changes. Comparing the 2017-04 and 2013-04 models leads to even more difficult detection of these true modifications, as the corrections widely dominate the difference models.
The case of the 2009-10 model is made even worse by its coordinate system, as it is expressed in the old MN03 frame. This model is very difficult to compare with the three others, expressed in the MN95 frame, as all its primitives are highlighted in difference models due to the conversion performed between the MN03 and MN95 frames. Comparing the 2009-10 model with the 2013-04 one leads to no primitive being detected as an identity, leaving only differences.
"},{"location":"TASK-DIFF/#conclusion-second-phase","title":"Conclusion : Second Phase","text":"Two cases have been addressed in this phase showing each specific interesting application of the difference detection applied on land register data through the INTERLIS format. Indeed, short and long term differences emphasize two different points of view according to the analysis of the land register and its evolution in time.
In the first place, the short-term application clearly showed how difference detection and its representation open a new point of view on the evolution of the land register, as it allows focusing on clear and well identified modifications. As the compared models are close in time, one is able to produce difference models allowing to clearly see, modification by modification, what happened between the two compared situations, and to focus on each evolution to fully understand the modification.
It follows that this short-term difference detection can provide a useful approach for users of the land register who are more interested in the evolution of the model than in the model itself. The difference models can provide users a clear and simple view of what to search for and analyze to understand the evolution of such complex models. In some way, the differences on land register models can be seen as an additional layer proposed to the user, allowing him to reach information that is not easy to extract from the models themselves.
The case of Geneva, illustrating the long-term difference detection case, showed another interesting point of view. In the first place, one has to understand that land register models are complex and living models, not only affected by the transcription of the real-world situation across time.
Indeed, on the Geneva models, a large number of differences is detected even over a relatively short period of time (two years). In addition to the regular updates following the territory evolution, a large number of corrections is made to keep the model in the correct reference frame. The Swiss federal system can also add complexity, as all Cantons have to align themselves with a common set of expectations.
In such a case, difference detection turned out to be an interesting tool to understand and follow the corrections made to the model in addition to the regular updates. On the Geneva case, we illustrated this by detecting, in the difference model, the correction of the coordinate frame applied to large pieces of the territory. This shows how difference detection can be seen as a service helping to keep track of the life of the model by detecting and checking these types of modifications.
As a result, difference detection can be a tool for users of the land register, but also for the land register authorities themselves. The difference models can be used to check and audit the evolution of the models, helping the required follow-up on the applied corrections and updates.
"},{"location":"TASK-DIFF/#third-phase-point-based-models","title":"Third Phase : Point-Based Models","text":"In this third and last phase, the developed algorithm for difference detection on vector models is tested on point-based ones. As mentioned in the introduction, the platform was already implementing logical operators allowing comparing point-based models across time. As illustrated in the introduction, only the AND operator allowed emphasizing differences, but rendering them as missing part of the composite models. It was then difficult for the user to determine and analyze those differences.
The goal of this last phase is to determine to what extent the developed algorithm is able to improve the initial results of the point-based logical operators, and how it can be adapted to provide a better detection of differences.
"},{"location":"TASK-DIFF/#selected-resources-and-models_2","title":"Selected Resources and Models","text":""},{"location":"TASK-DIFF/#point-based-models-lidar","title":"Point-Based Models : LIDAR","text":"Smaller data-sets are considered as point-based models are usually much larger. The city of Geneva is chosen as an example. Four identical chunks of LIDAR data are considered covering the railway station and its surroundings. The four models correspond to the digitization campaigns of 2005, 2009, 2013 and 2017. The data are converted from LAS to UV3 and brought to WGS84 using the EGM96-5 geoid model. The following images give an overview of the selected models :
Point-based data-sets : Geneva LIDAR of 2005 (left) and 2009 (right) - Data : SITG
The following table gives a summary of the model sizes and primitive counts :
Model | Size (UV3) | Primitive Count
--- | --- | ---
Geneva 2005 | 663.2 Mio | 24.8 M-Points
Geneva 2009 | 1.2 Gio | 46.7 M-Points
Geneva 2013 | 3.9 Gio | 4.2 G-Points
Geneva 2017 | 7.0 Gio | 7.5 G-Points

The color of the models corresponds to the point classification. In addition, the models have a density that considerably increases with time, from 1 point/m^2 (2005) to 25 points/m^2 (2017). This disparity of density is considered part of the sampling disparity, leading to a set of data very interesting for analyzing and benchmarking the difference detection algorithm.
"},{"location":"TASK-DIFF/#models-statistical-analysis_2","title":"Models : Statistical Analysis","text":"As for line and triangle-based models, a statistical analysis of the point-based models is performed. The analysis consists in computing an approximation of the nearest neighbor distance distribution of points. The following figure shows the distribution of the 2005 and 2009 models :
Statistical analysis : Nearest neighbor distribution approximation of the 2005 (left) and 2009 (right) models. The following figure shows the results for the 2013 and 2017 models :
Statistical analysis : Nearest neighbor distribution approximation of the 2013 (left) and 2017 (right) models. The nearest neighbor distribution tends toward zero with the year of acquisition, showing that modern models are significantly denser than the older ones, which makes these models interesting for analyzing the difference detection algorithm.
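As an illustration, the following minimal Python sketch approximates such a nearest neighbor distance distribution on an array of points; it is our own simplification (scipy is assumed), not the platform's actual C implementation:

```python
# Approximate nearest neighbor distance distribution of a point cloud.
# Assumption: points are given as an (N, 3) array; the platform's own
# C implementation may proceed differently.
import numpy as np
from scipy.spatial import cKDTree

def nn_distances(points, sample_size=100000, seed=0):
    """Return nearest neighbor distances for a random subset of points."""
    rng = np.random.default_rng(seed)
    sample = points[rng.choice(len(points), min(sample_size, len(points)), replace=False)]
    tree = cKDTree(points)
    # k=2: the first neighbor of each sampled point is the point itself
    dist, _ = tree.query(sample, k=2)
    return dist[:, 1]

points = np.random.rand(500000, 3)              # stand-in for a LIDAR tile
hist, edges = np.histogram(nn_distances(points), bins=100)
```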
"},{"location":"TASK-DIFF/#differences-detection-algorithm-direct-application-on-point-based-models","title":"Differences Detection Algorithm : Direct Application on Point-Based Models","text":"In order to determine the performances of the difference detection algorithm on the selected point-based models, the algorithm is simply applied without any adaptation on the data-sets and the results are analyzed. The following images give an overview of the obtained results comparing the 2005 and 2009 models :
Application of the difference detection algorithm on point-based models : Geneva model of 2005 and 2009 with 2005 as primary (left) and inversely (right) - Data SITG. One can see that the obtained results are very similar to those obtained with the previously implemented XOR logical operator. The only difference is that the identical points are shown (in dark gray) along with the highlighted points (showing the differences). The same conclusion applies : the obtained composite model is difficult to read, as it is dominated by sampling disparities. By looking carefully at the model, one can end up detecting large modifications by searching for accumulations of highlighted points. In addition, taking one model or the other as primary for the algorithm does not really help, as shown in the images above. The same conclusion applies even when the two compared models come with a similar point density, as with the 2013 and 2017 models :
Application of the difference detection algorithm on point-based models : Geneva model of 2013 and 2017 with 2013 as primary (left) and inversely (right) - Data SITG. One can nevertheless observe that choosing the less dense model as primary leads to slightly clearer results for difference detection, but they remain very hard to interpret for a user, and even more so for automated processes.
In addition, the performance of the algorithm is very poor, as point-based models are much denser in terms of primitives than line or triangle-based models. These reasons lead to the conclusion that the algorithm cannot be directly used on point-based models and needs a more specific approach.
"},{"location":"TASK-DIFF/#differences-detection-algorithm-adaptation-for-point-based-models","title":"Differences Detection Algorithm : Adaptation for Point-Based Models","text":"In order to adapt the difference detection algorithm for point-based models, two aspects have to be addressed : the efficiency of the detection and the reduction of the sampling disparities over-representation, which are both server-side operations.
The problem of efficiency can be solved quite easily if the adaptation of the difference detection algorithm goes in the direction of the logical operators, for which an efficient methodology is already implemented. Solving the over-representation of sampling disparities is more complicated.
The adopted solution is inspired by a simple observation : the shallower (in terms of cell density) the queries are, the clearer the obtained representation is. This can be illustrated by the following images, showing the 2005 model compared with the 2009 one with depth equal to 7, 6 and 5, from left to right :
Example of decreasing query depth on the comparison of 2005 and 2009 models - Data SITG. This is expected, as the sampling disparities can only appear at scales corresponding to the nearest neighbor distribution. Nevertheless, as the depth is decreased, the models become less and less dense. The increase in difference readability is then offset by the lack of density, making the structures, and thus their subsequent modifications, more difficult to identify. The goal of the algorithm adaptation is to keep both readability and density.
To achieve this goal, the implementation of the previous XOR operator is taken as a base, mostly for its efficiency. As the XOR simply detects whether a cell of the space-time discretization at a given time is in a different state than its counterpart at another time, it can be modulated to introduce a scale delay mechanism that only applies the detection on low-valued scales, broadcasting the results to the daughter cells. This preserves the density while performing the detection only on sufficiently shallow scales, preventing sampling disparities from becoming dominant.
The question is how to set the scale delay according to the scale itself. Indeed, for wide points of view, the delay is not necessary, as the model is viewed from far away. The necessity of the scale delay appears as the point of view narrows, and the narrower it is, the larger the scale delay needs to be. A scale-attached delay is then defined, associating a specific value with each depth.
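To make the principle concrete, here is a simplified, self-contained Python sketch of the scale-delay idea; the grid-based discretization and all names are our own simplifications of the platform's space-time index, not its actual C implementation:

```python
# Conceptual sketch of the XOR-with-scale-delay principle: the occupancy
# comparison is made at a shallower depth than the query depth, and the
# outcome is broadcast to the points (the daughter cells).
import numpy as np

def cell_indices(points, depth, origin, extent):
    """Map points to integer indices of a regular grid with 2**depth cells
    per axis (a simplified stand-in for the platform's discretization)."""
    size = extent / (1 << depth)
    return np.floor((points - origin) / size).astype(np.int64)

def xor_scale_delay(primary, secondary, depth, delay, origin, extent):
    """Flag primary points whose cell, taken at the delayed (shallower)
    depth, is not occupied by the secondary model. The delay prevents
    sampling disparities, visible only at the deepest scales, from
    dominating the difference model."""
    shallow_depth = depth - delay
    occupied = {tuple(c) for c in cell_indices(secondary, shallow_depth, origin, extent)}
    cells = cell_indices(primary, shallow_depth, origin, extent)
    return np.array([tuple(c) not in occupied for c in cells])

# Example: points flagged True would be highlighted as differences.
rng = np.random.default_rng(1)
a, b = rng.random((10000, 3)), rng.random((10000, 3))
flags = xor_scale_delay(a, b, depth=7, delay=2, origin=np.zeros(3), extent=1.0)
```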
"},{"location":"TASK-DIFF/#results-and-experiments_1","title":"Results and Experiments","text":"The adaptation of the difference detection algorithm for point-based models is analyzed using the selected data-sets. An overview of its result is presented before a more formal analysis is made using difference detection made on line-based official land register data to be compared with the differences on point-based models.
"},{"location":"TASK-DIFF/#differences-detection-overview","title":"Differences Detection : Overview","text":"Considering the two first models, from 2005 and 2009 campaigns, the following images shows the results of the initial version of the difference detection algorithm (similar to XOR operator) and its adapted version implementing the scale delay :
Differences detection on 2005 and 2009 models with 2005 as primary - Left : without scale delay - Right : with scale delay - Data SITG. One can see how the scale delay is able to drastically reduce the effect of sampling disparities when comparing two point-based models. The effect is even more obvious when the 2009 model is set as primary for difference detection :
Differences detection on 2005 and 2009 models with 2009 as primary - Left : without scale delay - Right : with scale delay - Data SITG. This improvement becomes clearer as the point of view is reduced. The following image shows the initial algorithm and the scale delay algorithm on a specific area of the city, with 2005 as the primary model :
Differences detection on 2005 and 2009 models with 2005 as primary - Left : without scale delay - Right : with scale delay - Data SITG. Inverting the model roles and making the 2009 model primary for difference detection leads to similar results :
Differences detection on 2005 and 2009 models with 2009 as primary - Left : without scale delay - Right : with scale delay - Data SITG. Considering the denser models of the 2013 and 2017 campaigns, the introduction of the scale delay also leads to a better understanding of the differences, as shown in the following images :
Differences detection on 2013 and 2017 models with scale delay - Left : 2013 as primary - Right : 2017 as primary - Data SITG. Nevertheless, one can see that the scale delay is not able to entirely get rid of sampling disparities. The right image above, comparing the 2017 model to the 2013 one, shows sampling disparities being highlighted as differences on the wall of the building in the background. This does not affect user readability too much, but it still makes the model a bit more complicated to understand.
In addition, the models themselves play an important role in the way differences can be detected through such a classic approach. For example, focusing on a specific building, the obtained highlighted differences :
Differences detection on 2013 and 2017 models with scale delay with 2013 (left) and 2017 (right) as primary - Data SITG. could lead the user to consider the building wall as a difference. Looking at the formal situation in both the 2013 and 2017 models :
Structural situation in 2013 (left) and 2017 (right) - Data SITG. One can see that the detected difference comes from the missing wall in the 2013 model, and not from a formal evolution of the building. This example illustrates that sampling disparity is not the only factor that can reduce the readability of the model for the user.
"},{"location":"TASK-DIFF/#differences-detection-comparison-with-land-register-differences","title":"Differences Detection : Comparison with Land Register Differences","text":"As the algorithm is already tested for land register models, one can use its results on these data in order to put them into perspective of the detected differences on point cloud. As the methodology is not the same for vector-based and point-based models, it is interesting to see the coherence and deviations of both approaches.
One important thing to underline is that difference detection on land register models does not detect changes in the environment directly, but detects revisions of the land register itself, as discussed in the previous phase. Of course, land register models evolve with the environment, but they also come with a large amount of modifications that only represent corrections of the model and not formal changes in the environment. This reinforces the interest of comparing the differences detected on point-based models with those detected on the land register models.
In the previous phase, the land register models of Geneva were selected to be the closest to the LIDAR campaigns. It follows that these models can be directly used here, each corresponding to a compared point-based model of this phase.
As a first example, the following case is studied : Rue de Bourgogne and Rue de Lyon. Looking at the following images, giving the situation in 2013-04 and 2017-04 through the LIDAR models, one can see that an industrial building was partially demolished.
Structural situation in 2013 (left) and 2017 (right) - Data SITG. The following images show the differences computed on both point-based and line-based models :
Difference models between 2013 and 2017 of LIDAR (left) and INTERLIS (right), with 2013 as primary - Data SITG. One can clearly see that the difference detection on the LIDAR models correctly emphasized a true structural difference between the two times. The situation is much less clear on the land register model. Indeed, as the time separating the two models is quite long, four years in this case, a large amount of corrections dominates the difference model, making the change in the building situation difficult to interpret. The following images give the situation of the land register model in 2013 and 2017 that leads to the difference model above :
Land register situation in 2013 (left) and 2017 (right) - Data SITG. Looking at the land register models, one can also see that such a large-scale modification of the building situation does not appear clearly. Indeed, it takes some effort to detect minor changes between the two models, without obtaining a clear indication of the modification. This shows how the LIDAR and its differences can help detect and analyze differences in complement to the land register itself.
Considering the second example, Avenue de France and Avenue Blanc, the following images give the structural situation at the two times as captured by the LIDAR campaigns :
Structural situation in 2013 (left) and 2017 (right) - Data SITG. One can clearly see the demolition of the two 2013 buildings, replaced by a parking lot in 2017. The differences detected on the LIDAR and land register models are presented in the following images :
Difference models between 2013 and 2017 of LIDAR (left) and INTERLIS (right), with 2013 as primary - Data SITG. Again, although the differences are clearly and correctly highlighted on the LIDAR difference model, the situation remains unclear on the difference model of the land register. One can again observe that the land register was heavily corrected between the two dates, making the modification and its nature difficult to understand. Looking at the respective land register models :
Land register situation in 2013 (left) and 2017 (right) - Data SITG. the modification appears a bit more clearly. One can see the disappearance of the two 2013 buildings in the land register, replaced by a large empty area. Again, difference detection on the LIDAR seems clearly more relevant for detecting and analyzing structural differences than the land register itself.
An interesting example is provided by the situation just east of the Basilique Notre-Dame. The two situations, as captured by the LIDAR campaigns, are presented in the following images :
Structural situation in 2013 (left) and 2017 (right) - Data SITG. One can observe two structures mounted on top of two building roofs in the 2013 situation. These structures are used to ease the work that has to be performed on the roofs. They are no longer present in the 2017 situation. The following images give the difference detection models for the LIDAR and the land register :
Difference models between 2013 and 2017 of LIDAR (left) and INTERLIS (right), with 2013 as primary - Data SITG. In such a case, as the structural modification between 2013 and 2017 occurs on top of the buildings, their footprints are not affected and the differences have no chance of appearing in the land register models, even when looking at them individually, as in the following images :
Land register situation in 2013 (left) and 2017 (right) - Data SITG. This is another example where the LIDAR difference detection leads to more, and clearer, information on the structural modifications that appeared in Geneva between the two times.
"},{"location":"TASK-DIFF/#conclusion-third-phase","title":"Conclusion : Third Phase","text":"The main element of this third phase conclusion is that difference detection on point-based models is less straightforward than for other models. Indeed, applied naively, the algorithm is dominated by the sampling disparities of the compared models. This illustrate that point-based models, being a close mirror of the true territory state, have a large information density that is more difficult to reach, especially from their evolution point of view.
Nevertheless, we showed that the algorithm can be adapted, with relatively simple adjustments, to perform well on the point-based model difference detection problem. The implemented algorithm is able to track and represent the differences appearing between the models in a way that is useful and comprehensible for users. The proposed examples showed that the difference models are able to guide the user toward interesting structural changes in the territory, with a clear view of the third dimension.
Of course, the highlighted differences in point-based models are more complex and require a trained user able to correctly interpret the details of the highlighted parts of the model. Trees are a good example : as trees re-grow each year, they will always appear as differences in the compared models. A user only interested in building changes has to be aware of this and be able to separate the relevant differences from the others.
Following the comparison between the LIDAR and land register (INTERLIS) difference models, a rather surprising conclusion appears. At first sight, one could assume that the land register is the proper way of detecting changes, which could then be analyzed in more detail in the point-based difference models. It turns out that the opposite is true. Several reasons explain this surprising situation.
In the first place, LIDAR models are only available with large temporal gaps between them, at least two to three years. This leaves time for the land register models to be filled with a large amount of updates and corrections, causing the difference model over this temporal gap to contain much more than structural modifications. In addition, the LIDAR models come with the third dimension, where the land register models are flat. The third dimension carries a large amount of differences that cannot be seen in the land register.
To some extent, the land register and its evolution reflect the way the territory is surveyed, not the formal evolution of the territory. On the opposite, as LIDAR models are a structural snapshot of the territory's situation, analyzing their differences across time leads to better tracking of the formal modifications of the real world.
"},{"location":"TASK-DIFF/#conclusion","title":"Conclusion","text":""},{"location":"TASK-DIFF/#first-phase","title":"First Phase","text":"In the first phase, the difference detection algorithm was implemented for vector models and tested using synthetic differences on selected models. The results showed the interest of the obtained differences models to emphasize evolution of models from both user and process points of view. It was demonstrated that the information between models exists and can be extracted and represented in a relevant way for both users and processes.
"},{"location":"TASK-DIFF/#second-phase","title":"Second Phase","text":"In the second phase, the difference detection algorithm was tested on the Swiss land register models on which the results obtained during the first phase were confirmed. The differences models are able to provide both user and process a clear and understandable view of the modification brought to the models.
In addition, through the short and long-term perspectives, it was possible to demonstrate how the difference detection algorithm is able to provide different points of view on the model evolution. From a short-term perspective, the difference models are able to provide a clear and individual view of the modifications, while the long-term perspective reveals the large-scale evolution and transformation of the models. It follows that the difference models can be used as a tool by various actors using or working with the land register models.
"},{"location":"TASK-DIFF/#third-phase","title":"Third Phase","text":"In the third phase, the difference detection algorithm, developed on vector models, was applied on point-based models, showing that a direct application on these models lead to the same issue as the logical operators : the differences models are dominated by sampling disparities, making them complicated to read. The solution of scale delay brought to the algorithm allowed to produce much clearer differences models for point-based data, allowing to generalize the difference detection on any models.
In addition to these results, the comparison of difference models on the land register and on the corresponding LIDAR point-based models showed an interesting result : for structural changes, the point-based models lead to much more interesting results through the highlighted differences. Indeed, land register models, considered from a long-term perspective, are dominated by a large amount of corrections and adjustments in addition to territory evolution updates, making the structural changes difficult to detect and understand. From this point of view, the difference models are clearer with point-based models.
In addition, as point-based models such as LIDAR come with the third dimension, a large amount of structural differences can only be seen through such data, as many structural changes occur along the third dimension. It then follows that difference detection applied to point-based models offers a very interesting point of view for the survey of structural changes in the territory.
"},{"location":"TASK-DIFF/#synthesis","title":"Synthesis","text":"As a synthesis, it is clear that models are carrying a large amount of richness themselves, that is already a challenge to exploit, but it is also clear that a large amount of information can be found between the versions of the models. The difference detection algorithm brings a first tool that demonstrate the ability to reach and start to exploit these informations.
More than the content of the models itself, understanding the evolution of this content is a major topic, especially in the field of geodata, as geodata represent, or transcribe, the evolution of the surveyed territory. It then appears clear that being able to reach and exploit the information contained in-between the models is a major advantage, as it allows understanding what these models truly are : four-dimensional objects.
"},{"location":"TASK-DIFF/#perspectives","title":"Perspectives","text":"Many perspectives are opened following the implementation and analysis of the difference detection. Several perspectives, mostly technical, are presented here as a final section.
In the first place, as rasters are entering the set of data that can be injected into the platform, an evolution of the difference detection could be brought to the platform, taking advantage of the progress of machine learning. The possibility of detecting differences in images could open very interesting perspectives through the data communication features of the platform.
Another perspective could be to allow the platform to separate the data into formal layers, the separation currently being ensured only by type and time. Splitting data into layers would allow applying difference detection in a much more controlled manner, leading to difference models focused on very specific elements of the temporal evolution of the model.
The addition of layers could also be the starting point for a notion of a data convolution micro-language. Currently, data communication and difference detection only apply through the specification of two distinct and parallel navigation times. Users, or processes, have to specify each of the two time positions in order to obtain the mixed or difference models they need.
An interesting evolution would be to replace these two navigation times with a small and simple micro-language allowing the user to compare more than two times in more complex ways. This could also benefit from the separation of data into layers. Such a micro-language could allow comparing two, three or more models, or layers, and would also open access to mixed models of difference models, such as comparing the difference detection between point-based and vector-based models, which would then be a comparison of a comparison. A toy sketch of such a language is given below.
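As a purely hypothetical illustration of what such a micro-language could express, consider the following Python toy sketch; none of these primitives exist in the platform yet, and all names are invented for the example:

```python
# A toy sketch of a data convolution micro-language (purely hypothetical;
# no such primitives exist in the platform yet).
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    expr: str

def model(name: str, time: str) -> Query:
    """Reference a model (or layer) at a given navigation time."""
    return Query(f"{name}@{time}")

def diff(a: Query, b: Query) -> Query:
    """Compose a difference query from two sub-queries."""
    return Query(f"DIFF({a.expr},{b.expr})")

# A comparison of a comparison: LIDAR differences versus register differences.
q = diff(diff(model("lidar", "2013-04"), model("lidar", "2017-04")),
         diff(model("register", "2013-04"), model("register", "2017-04")))
print(q.expr)
```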
"},{"location":"TASK-DIFF/#reproduction-resources","title":"Reproduction Resources","text":"To reproduce the presented experiments, the STDL 4D framework has to be used and can be found here :
You can follow the instructions on the README to both compile and use the framework.
Only part of the considered datasets are publicly available. For the OpenStreetMap datasets, you can download them from the following source :
For the Swiss 3D buildings model, you can contact swisstopo :
For the land register datasets of Geneva and Thurgau, you can contact the SITG and the Thurgau Kanton :
INTERLIS land register, Thurgau Kanton
INTERLIS land register, SITG (Geneva)
The point-based models of Geneva can be downloaded from the SITG online extractor :
To extract and convert the data from planimetric shapefiles, the following code is used :
where the README gives all the information needed. In case of shapefiles containing 3D models, please ask the STDL for advice and tools.
To extract and convert the data from INTERLIS and LAS, the following codes are used :
INTERLIS to UV3 (dalai-suite), STDL/EPFL
LAS to UV3 (dalai-suite), STDL/EPFL
where the README gives all the information needed.
For the 3D geographical coordinate conversion and height restoration, we used two STDL internal tools. You can contact the STDL to obtain the tools and support in this direction :
ptolemee-suite : 3D coordinate conversion tool (EPSG:2056 to WGS84)
height-from-geotiff : Restoring geographical heights using topographic GeoTIFF (SRTM)
You can contact STDL for any question regarding the reproduction of the presented results.
"},{"location":"TASK-DIFF/#auxiliary-developments-corrections","title":"Auxiliary Developments & Corrections","text":"In addition to the main developments made, some additional scripts and other corrections have been made to solve auxiliary problems or to improve the code according to the developed features during this task. The auxiliary developments are summarized here :
Correction of socket read function to improve server-client connectivity.
Creation of scripts allowing the insertion of synthetic modifications (random displacements of vertex coordinates) into UV3 models.
Creation of a script to convert CSV exports from shapefiles to the UV3 format. The script code is available here.
Addition of temporary address (space-time index) exportation in the platform 3D interface.
Correction of the cell enumeration process in the platform 3D interface (wrong depth limit implementation).
Creation of a script allowing the segmentation of a UV3 model according to a geographical bounding box.
Creation of C codes to perform statistical analysis of the point, line and triangle-based models : computation of edge size and nearest neighbor distributions.
Creation of a C code allowing the enumeration of non-empty cell indexes over the Switzerland models injected in the platform.
Creation of a C code allowing the automation of the difference detection, based on an index list and searching in the data queried from the platform.
Development of various scripts for the creation of plots and figures.
"},{"location":"TASK-IDET/","title":"Object Detection Framework","text":"Alessandro Cerioni, Etat de Geneve - Cl\u00e9mence Herny, Exolabs - Adrian F. Meyer, FHNW - Gwena\u00eblle Salamin, Exolabs
Published on November 22, 2021 Updated on December 12, 2023
Abstract: The STDL develops a framework allowing users to train and use deep learning models to detect objects from aerial images. While relying on a general purpose third-party open source library, the STDL's framework implements an opinionated workflow, targeting georeferenced aerial images and labels. After a brief introduction to object detection, this article provides detailed information about this framework. References to successful applications are provided along with concluding remarks.
"},{"location":"TASK-IDET/#introduction","title":"Introduction","text":"Object detection is a computer vision task which aims at detecting instances of objects of some target classes (e.g. buildings, swimming pools, solar panels, ...) in digital images and videos.
According to the commonly adopted terminology, a distinction is made between the following tasks:
This distinction is well illustrated by the bottom half of the following image:
Object Detection vs Instance Segmentation. Image credit: Waleed Abdulla.
Significant progress has been made over the past decades in the domain of object detection and instance segmentation (see e.g. this review paper). Applications of object detection methods are today popular also in consumer products: for instance, some cars are already capable of detecting and reading speed limit signs; social media applications integrate photo and video effects based on face and pose detection. All these applications usually rely on deep learning methods, which are the subset of machine learning methods leveraging deep neural networks. While referring the reader to other sources for further information on these methods (see e.g. these lecture notes), we wish to highlight a key point in all these learning-based approaches: no rigid, static, human-engineered rule is given to the machine to accomplish the task. Instead, the machine is provided with a collection of input-output pairs, where the output represents the outcome of a properly solved task. As far as object detection is concerned, we provide deep learning algorithms with a set of images accompanied by reference annotations (\"ground truth labels\"), which the machine is expected to reproduce. Things become particularly interesting when the machine learns how to generate acceptable detections/segmentation on previously unseen images; such a crucial ability is referred to as \"generalization\".
A generic framework is being developed within the STDL, allowing the usage of state-of-the-art machine learning methods to detect objects from aerial images. Among other possible applications, such framework allows one to leverage aerial images to provide valuable hints towards the update of cadastral information.
At its core, the STDL's object detection framework is powered by Detectron2, a Python library developed by the Facebook Artificial Intelligence Research group and released under the Apache 2.0 open-source license. Detectron2 features built-in methods to train models performing various tasks, object detection and instance segmentation to name a few. Our framework includes pre- and post-processing scripts allowing to use Detectron2 with georeferenced images and labels.
The workflow goes through the steps described here-below.
"},{"location":"TASK-IDET/#workflow","title":"Workflow","text":""},{"location":"TASK-IDET/#1-tileset-generation","title":"1. Tileset generation","text":"Typically, aerial coverages are made accessible through web services, publicly or privately. While making opaque to the user the server-side tiling and file-based structure, these web services can efficiently generate raster images on-demand depending on the parameters sent by the requesting client. These parameters include:
GIS tools such as QGIS and ArcGIS Pro as well as Web Applications powered by Web Mapping clients such as Leaflet, OpenLayers, MapLibre GL, etc. actually rely on this mechanism to let end users navigating through tons of bits in quite a seamless, fluent, reactive way. As a matter of fact, zooming in and out in such 2D scenes amounts to fetching and visualizing different images depending on the zoom level, instead of \"simply\" increasing/decreasing the size of the various image pixels as displayed on screen.
Through this 1st step, several requests are issued against a web service in order to generate a consistent set of tiled images (\"tileset\") covering the area of interest (AoI), namely the area over which the user intends to train a detection model and/or to perform the actual object detection. Connectors for the following web services have been developed so far:
Except when using the XYZ connector, our framework is agnostic with respect to the tiling scheme. The user just has to provide an input file compliant with some requirements. We refer the user to the code documentation for detailed information.
Concerning the AoI and its extension, the following scenarios are supported:
In the case of scenarios no. 1 and 3, ground truth labels are necessary. Provided by the user as polygons in some geographic coordinate system, these polygons are then mapped onto each image coordinate system - the latter ranging from (0, 0) to (<image width in pixels> - 1, <image height in pixels> - 1) - in order to generate ground truth segmented images. Such a mapping is achieved by applying an affine transformation and encoded using the COCO format, which is natively supported by Detectron2. Labels can optionally be provided in inference-only scenarios as well, should the user be willing to check non-ground truth labels against detections and vice versa.
Various independent COCO tilesets are generated, depending on the scenario:
in training-only scenarios, three COCO tilesets are generated:
trn
;val
);tst
).For the time being, training, validation and test tiles are chosen exclusively among the tiles within the AoI which include one or more ground truth labels.
In inference-only scenarios, a single COCO tileset labeled as \"other\" is generated (oth
is the abbreviation we use).
In training + inference scenarios, the full collection of tilesets is generated: trn
, val
, tst
, oth
.
The 1st step provides a collection of tiled images, sharing the same size and resolution, plus the corresponding COCO files (trn
+ val
+ tst
and/or oth
depending on the scenario).
The 2nd step performs the actual training of a predictive model, iterating over the training dataset. As already mentioned, we delegate this crucial part of the process to the Detectron2 library; support for other libraries may be implemented in the future, if suitable. Detectron2 comes with a large collection of pre-trained models tailored for various tasks. In particular, as far as instance segmentation is concerned, pre-trained models can be selected from this list.
In our workflow, we setup Detectron2 in such a way that inference is made on the validation dataset every N training iterations, being N a user-defined parameter. By doing this, we can monitor both the training and validation losses all along the iterative learning and decide when to stop. Typically, learning is stopped when the validation loss reaches a minimum (see e.g. this article for further information on early stopping). As training and validation loss curves are somewhat noisy, these curves can be smoothed on the fly in order to reveal steady trends. Other metrics may be tracked and used to decide when to stop. For now, within our framework (early) stopping can be done manually and is left to the user; it will be made automatic in the future, following some suitable criterion.
Training and validation losses in a sample object detection task. In this case, one could stop the training after the first ~1400 iterations. Note that, in this example, the validation loss is evaluated every 200 iterations.
Let us note that the learning process is regulated by several parameters, which are usually called \"hyperparameters\" in order to distinguish them from the learned \"parameters\", the latter being - in our deep learning context - the coefficients of the many neurons populating the various layers of the deep neural network. In successful scenarios, the iterative learning process does actually lower the validation loss until a minimum value is reached. Yet, such a minimum is likely to be a \"local\" one (i.e. relative to a given set of hyperparameters); indeed, the global minimum may be found along a different trajectory, corresponding to a different set of hyperparameters. Actually, even finding the global minimum of the validation loss could be not as relevant as checking how different models compare with each other on the common ground of more meaningful \"business metrics\". Our code does not implement any automatic hyper-parameter tuning, it just outputs business metrics, as explained here-below.
"},{"location":"TASK-IDET/#3-detection","title":"3. Detection","text":"The model trained at the preceding step can be used to perform the actual object detection or instance segmentation over the various tilesets concerned by a given study:
Depending on the configuration, Detectron2 can perform either object detection and instance segmentation at once, or object detection only. In both cases, every detection is accompanied by the following information:
In the case of object detection only, a bounding box is output as a list of vertices relative to the image coordinate system. In the case of instance segmentation, detections are also output as binary masks, one per input tile/image, in which pixels belonging to target objects are encoded with ones whereas background pixels are encoded with zeros. Our code can then generate a vector layer out of these binary masks. Optionally, polygons can be simplified using the Ramer-Douglas-Peucker algorithm (RDP).
"},{"location":"TASK-IDET/#4-assessment","title":"4. Assessment","text":"Results are assessed by matching detections against ground truth labels. For a detection and a ground truth label to be matched with each other, the intersection over union (IoU) between the two polygons must be greater than a user-defined threshold (default value = 0.25). Let us remind that the intersection over union is defined as follows:
\\[\\mbox{IoU} = \\frac{\\mbox{Area}({\\mbox{label} \\cap \\mbox{detection}})}{\\mbox{Area}({\\mbox{label} \\cup \\mbox{detection}})}\\]If multiple detections and ground truth labels intersect, the detection which exhibits the largest IoU is tagged as true positive, the other detections as false positives.
Detections are then tagged according to the following criteria:
The reader may wonder why there are no true negatives (TN) in the list. Actually, all the pixels which are not associated with any target class can be considered as \"true negatives\". Yet, as far as object detection and instance segmentation are concerned, we do not need to group leftover pixels into \"dummy objects\". Should the user need to model such a scenario, one idea might consist in introducing a dummy class (e.g. \"background\" or \"other\").
Metrics are calculated on a class-by-class basis, in order to take into account possible imbalances between classes. Detections in the wrong class are classified as FN, i.e. missed object, or false positive (FP), i.e. detections not matching any object, depending on the target class we are making the computation for.
Precision and recall by class are used here:
While referring the reader to this page for further information on these metrics, let us note that:
Each metric can be aggregated to keep only one value per dataset, rather than one per class.
As already mentioned, each detection is assigned a confidence score, ranging from 0 to 1. By filtering out all the detections exhibiting a score smaller than some cut-off/threshold value, one would end up having more or less detections to compare against ground truth data; the higher the threshold, the smaller the number of detections, the better their quality in terms of the confidence score. By sampling the threshold from a minimum user-defined value to a maximum value (e.g. 0.95) and counting TPs, FPs, FNs at each sampling step, meaningful curves are obtained representing counts and metrics like precision and recall as a function of the threshold. Typically, precision (recall) is monotonically increasing (decreasing) as a function of the threshold. As such, neither the precision nor the recall can be used to determine the optimal value of the threshold, which is why precision and recall are customarily aggregated in order to form a third metric which can be convex if computed as a function of the threshold or, at least, can exhibit local minima. This metric is named \"\\(F_1\\) score\" and is defined as follows:
Different models can then be compared with each other in terms of \\(F_1\\) scores; the best model can be selected as the one exhibiting the maximum \\(F_1\\) over the validation dataset. At last, the test dataset can be used to assess the selected model and provide the end user with an objective measure of its reliability.
Other approaches exist, allowing one to summarize metrics and eventually come up with threshold-independent scores. One of these approaches consist in computing the \"Area Under the ROC curve\" (AUC, cf. this page).
"},{"location":"TASK-IDET/#5-iterate-until-results-are-satisfactory","title":"5. Iterate until results are satisfactory","text":"Several training sessions can be executed, using different values of the various hyperparameters involved in the process. As a matter of fact, reviewing and improving ground truth data is also part of the hyper-parameter tuning (cf. \"From Model-centric to Data-centric Artificial Intelligence''). Keeping track of the above-mentioned metrics across multiple realizations, eventually an optimal model should be found (at least, a local optimum).
The exploration of the hyper-parameter space is a tedious task, which consumes time as well as human and computing resources. It can be performed in a more or less systematic/heuristic way, depending on the experience of the operator as well as on the features offered by the code. Typically, a partial exploration is enough to obtain acceptable results. Within the STDL team, it is customary to first perform some iterations until \"decent scores\" are obtained, then to involve beneficiaries and domain experts in the continuous evaluation and improvement of results, until satisfactory results are obtained. These exchanges between data scientists and domain experts are also key to raise both communities' awareness of the virtues and flaws of machine learning approaches.
"},{"location":"TASK-IDET/#use-cases","title":"Use cases","text":"Here is a list of the successful applications of the framework described in this article:
The STDL's object detection framework is still under development and receives updates as new use cases emerge. The source code can be found here.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Swiss Territorial Data Lab - STDL","text":"The STDL aims to promote collective innovation around the Swiss territory and its digital copy. It mainly explores the possibilities provided by data science to improve official land registering.
A multidisciplinary team composed of cantonal, federal and academic partners is reinforced by engineers specialized in geographical data science to tackle the challenges around the management of territorial data-sets.
The developed STDL platform codes and documentation are published under open licenses to allow partners and Swiss territory management actors to leverage the developed technologies.
"},{"location":"#exploratory-projects","title":"Exploratory Projects","text":"Exploratory projects in the field of the Swiss territorial data are conducted at the demand of institutions or actors of the Swiss territory. The exploratory projects are conducted with the supervision of the principal in order to closely analyze the answers to the specifications along the project. The goal of exploratory project aims to provide proof-of-concept and expertise in the application of technologies to Swiss territorial data.
Detection of occupied and free surfaces on rooftops May 2024Cl\u00e9mence Herny (Exolabs) - Gwena\u00eblle Salamin (Exolabs) - Alessandro Cerioni (\u00c9tat de Gen\u00e8ve) - Roxane Pott (swisstopo) Proposed by the Canton of Geneva- PROJ-ROOFTOPS Free roof surfaces offer great potential for the installation of new infrastructure, such as solar panels and vegetated rooftops. In this project, in collaboration with the Canton of Geneva, we have developed and tested three methods to automatically identify occupied and free surfaces on roofs: (1) classification of roof plane occupancy based on a random forest, (2) segmentation of objects in LiDAR point clouds based on a clustering and (3) segmentation of objects in aerial imagery based on a deep learning. The results are vector layers containing information about surface occupancy. The methods developed on a subset of 122 buildings achieved satisfactory performance. About 85% of the roof planes were correctly classified. The segmentation method was able to detect most of the objects with f1 scores of 0.78 and 0.75 for the LiDAR-based segmentation and the image-based segmentation respectively. The global shape of the occupied surface was more difficult to reproduce with a median intersection over the union of 0.35 and 0.37 respectively. The results of all three methods were considered satisfactory by the experts, with 70% to 95% of the results considered acceptable. Considering the quality of the results and the computational time, only the classification method was selected for application at the cantonal level.
Full article
Automatic Soil Segmentation April 2024Nicolas Beglinger (swisstopo) - Clotilde Marmy (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Canton of Fribourg - PROJ-SOILS This project focuses on developing an automated methodology to distinguish areas covered by pedological soil from areas comprised of non-soil. The goal is to generate high-resolution maps (10cm) to aid in the location and assessment of polluted soils. Towards this end, we utilize deep learning models to classify land cover types using raw, raster-based aerial imagery and digital elevation models (DEMs). Specifically, we assess models developed by the Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN), the Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD), and the Office F\u00e9d\u00e9ral de la Statistique (OFS). The performance of the models is evaluated with the Matthew's correlation coefficient (MCC) and the Intersection over Union (IoU), as well as with qualitatifve assessments conducted by the beneficiaries of the project. In addition to testing pre-existing models, we fine-tuned the model developed by the HEIG-VD on a dataset specifically created for this project. The fine-tuning aimed to optimize the model performance on the specific use-case and to adapt it to the characteristics of the dataset: higher resolution imagery, different vegetation appearances due to seasonal differences, and a unique classification scheme. Fine-tuning with a mixed-resolution dataset improved the model performance of its application on lower-resolution imagery, which is proposed to be a solution to square artefacts that are common in inferences of attention-based models. Reaching an MCC score of 0.983, the findings demonstrate promising performance. The derived model produces satisfactory results, which have to be evaluated in a broader context before being published by the beneficiaries. Lastly, this report sheds light on potential improvements and highlights considerations for future work.
Full article
Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control April 2024Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR The acquisition of LiDAR data has become standard practice at national and cantonal levels during the recent years in Switzerland. In 2024, swisstopo will complete a comprehensive campaign of 6 years covering the whole Swiss territory. The produced point clouds are classified post-acquisition, i.e. each point is attributed to a certain category, such as \"building\" or \"vegetation\". Despite the global control performed by providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, extensive time is invested by swisstopo in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted in voxels, summarizing the distribution of classes and comparable one-to-one. This method highlights zones of change by clustering the concerned voxels. Experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method.
Full article
Automatic detection and observation of mineral extraction sites in Switzerland January 2024Cl\u00e9mence Herny (ExoLabs) - Shanci Li (Uzufly) - Alessandro Cerioni (Etat de Gen\u00e8ve) - Roxane Pott (Swisstopo) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY The study of the evolution of mineral extraction sites (MES) is primordial for the management of mineral resources and the assessment of their environmental impact. In this context, swisstopo has solicited the STDL to automate the vectorisation of MES over the years. This tedious task was previously carried out manually and was not regularly updated. Automatic object detection using a deep learning method was applied to SWISSIMAGE RGB orthophotos with a spatial resolution of 1.6 m px-1. The trained model proved its ability to accurately detect MES, achieving a f1-score of 82%. Detection by inference was performed on images from 1999 to 2021, enabling us to track the evolution of potential MES over several years. Although the results are satisfactory, a careful examination of the detections must be carried out by experts to validate them as true MES. Despite this remaining manual work involved, the process is faster than a full manual vectorisation and can be used in the future to keep MES information up-to-date.
Full article
Dieback of beech trees: methodology for determining the health state of beech trees from airborne images and LiDAR point clouds August 2023Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Republic and Canton of Jura - PROJ-HETRES Beech trees are sensitive to drought and repeated episodes can cause dieback. This issue affects the Jura forests requiring the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors has an overall accuracy up to 0.79, as well as the model trained only with descriptors derived from airborne imagery. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors.
Full article
Using spatio-temporal neighbor data information to detect changes in land use and land cover April 2023Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo) Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS From 2020 on, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To help and lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point. The methodolgy focused on change detection, by finding as many unchanged tiles as possible and miss as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. While time deactivation and less-neighbors have a 0.2% decrease on the balanced accuracy, the space deactivation causes 1% decrease. Furthermore, XGBoost, random forest (RF), fully convolutional network (FCN) and recurrent convolutional neural network (RCNN) performance are compared by the means of a custom metric, established with the help of the interpretation team. For the spatial-temporal module, FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression indicates a custom metrics of 0.249. Then, FCN and RF are tested to combine the best performing model with the model trained by OFS on image data only. When using temporal-spatial neighors and image data as inputs, the final integration module achieves 0.438 in custom metric, against 0.374 when only the the image data is used.It was conclude that temporal-spatial neighbors showed that they could light the process of tile interpretation.
Full article
Classification of road surfaces March 2023Gwena\u00eblle Salamin (swisstopo) - Cl\u00e9mence Herny (Exolabs) - Roxane Pott (swisstopo) - Alessandro Cerioni (Canton of Geneva) Proposed by the Federal Office of Topography swisstopo - PROJ-ROADSURF The Swiss road network extends over 83\u2019274 km. Information about the type of road surface is useful not only for the Swiss Federal Roads Office and engineering companies, but also for cyclists and hikers. Currently, the data creation and update is entirely done manually at the Swiss Federal Office of Topography. This is a time-consuming and methodical task, potentially suitable to automation by data science methods. The goal of this project is classifying Swiss roads according to their surface type, natural or artificial. We first searched for statistical differences between these two classes, in order to then perform supervised classification based on machine-learning methods. As we could not find any discriminant feature, we used deep learning methods.
Full article
Tree Detection from Point Clouds for the Canton of Geneva March 2022Alessandro Cerioni (Canton of Geneva) - Flann Chambers (University of Geneva) - Gilles Gay des Combes (CJBG - City of Geneva and University of Geneva) - Adrian Meyer (FHNW) - Roxane Pott (swisstopo) Proposed by the Canton of Geneva - PROJ-TREEDET Trees are essential assets, in urban context among others. Since several years, the Canton of Geneva maintains a digital inventory of isolated (or \"urban\") trees. This project aimed at designing a methodology to automatically update Geneva's tree inventory, using high-density LiDAR data and off-the-shelf software. Eventually, only the sub-task of detecting and geolocating trees was explored. Comparisons against ground truth data show that the task can be more or less tricky depending on how sparse or dense trees are. In mixed contexts, we managed to reach an accuracy of around 60%, which unfortunately is not high enough to foresee a fully unsupervised process. Still, as discussed in the concluding section there may be room for improvement.
Full article
Detection of thermal panels on canton territory to follow renewable energy deployment February 2022Nils Hamel (UNIGE) - Huriel Reichel (FHNW) Project in collaboration with Geneva and Neuch\u00e2tel States - TASK-TPNL Deployment of renewable energy becomes a major stake in front of our societies challenges. This imposes authorities and domain expert to promote and to demonstrate the deployment of such energetic solutions. In case of thermal panels, politics ask domain expert to certify, along the year, of the amount of deployed surface. In front of such challenge, this project aims to determine to which extent data science can ease the survey of thermal panel installations deployment and how the work of domain expert can be eased.
Full article
Automatic detection of quarries and the lithology below them in Switzerland January 2022Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY Mining is an important economic activity in Switzerland and therefore it is monitored by the Confederation through swisstopo. To this points, the identification of quarries has been mode manually, which even being done with very high quality, unfortunately does not follow the constant changing and updating pattern of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries through the whole country. The training was done using SWISSIMAGE with 10cm spatial resolution and the Deep Learning Framework from the STDL. Moreover there were two iteration steps with the domain expert which included the manual correction of detection for new training. Interaction with the domain expert was very relevant for final results and summing to his appreciation, an f1-score of 85% was obtained in the end, which due to peculiar characteristics of quarries can be considered an optimal result.
Full article
Updating the \u00abCultivable Area\u00bb Layer of the Agricultural Office, Canton of Thurgau June 2021Adrian Meyer (FHNW) - Pascal Salath\u00e9 (FHNW) Proposed by the Canton of Thurgau - PROJ-TGLN The Cultivable agricultural area layer (\"LN, Landwirtschaftliche Nutzfl\u00e4che\") is a GIS vector product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of direct subsidy contributions to farms. The canton of Thurgau requested a spatial vector layer indicating locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area, since areas used for silage bale storage are not eligible for subsidies. Having detections of such objects readily available greatly reduces the workload of the responsible official by directing the monitoring process to the relevant hotspots. Ultimately public economical damage can be prevented which would result from the payout of unjustified subsidy contributions.
Full article
Swimming Pool Detection for the Canton of Thurgau April 2021Adrian Meyer (FHNW) - Alessandro Cerioni (Canton of Geneva) Proposed by the Canton of Thurgau - PROJ-TGPOOL The Canton of Thurgau entrusted the STDL with the task of producing swimming pool detections over the cantonal area. Of specific interest was leveraging the ground truth annotation data from the Canton of Geneva to generate a predictive model in Thurgau, while using the publicly available SWISSIMAGE aerial imagery datasets provided by swisstopo. The STDL object detection framework produced highly accurate predictions of swimming pools in Thurgau and thereby proved transferability from one canton to another without having to manually redigitize annotations. These promising detections showcase the highly useful potential of this approach, which greatly reduces the need for repetitive manual labour.
Full article
Completion of the federal register of buildings and dwellings February 2021Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Proposed by the Federal Statistical Office - TASK-REGBL The Swiss Federal Statistical Office is in charge of the national Register of Buildings and Dwellings (RBD), which keeps track of every existing building in Switzerland. Currently, the register is being completed with buildings in addition to regular dwellings to offer a reliable and official source of information. The completion of the register introduced issues due to missing information that is difficult to collect. The construction year of buildings is one piece of information missing for a large number of register entries. The Statistical Office mandated the STDL to investigate the possibility of using the Swiss National Maps to extract this missing information through an automated process. Research was conducted in this direction, with the development of a proof-of-concept and a reliable methodology to assess the obtained results.
Full article
Swimming Pool Detection from Aerial Images over the Canton of Geneva January 2021Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW) Proposed by the Canton of Geneva - PROJ-GEPOOL Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to detect potentially unregistered swimming pools over the Canton of Geneva. Despite the success of our approach, we will argue that domain expertise still remains key to post-processing detections in order to tell objects which are subject to registration from those which aren't. Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date.
Full article
Difference models applied to the land register November 2020Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Project scheduled in the STDL research roadmap - TASK-DTRK Being able to track modifications in the evolution of geographical datasets is an important aspect of territory management, as a large amount of information can be extracted from difference models. Difference detection can also be a tool to assess the evolution of a geographical model through time. In this research project, we apply difference detection to INTERLIS models of the official Swiss land register in order to emphasize and follow its evolution, and to demonstrate that changes in reference frames can be detected and assessed.
Full article
"},{"location":"#research-developments","title":"Research Developments","text":"Research developments are conducted aside of the research projects to provide a framework of tools and expertise around the Swiss territorial data and related technologies. The research developments are conducted according to the research plan established by the data scientists and validated by the steering committee.
OBJECT DETECTION FRAMEWORK November 2021Alessandro Cerioni (Canton of Geneva) - Cl\u00e9mence Herny (Exolabs) - Adrian Meyer (FHNW) - Gwena\u00eblle Salamin (Exolabs) Project scheduled in the STDL research roadmap - TASK-IDET This strategic component of the STDL consists of the automated analysis of geospatial images using deep learning while providing practical applications for specific use cases. The overall goal is the extraction of vectorized semantic information from remote sensing data. The involved case studies revolve around concrete object detection use cases deploying modern machine learning methods and utilizing a multitude of available datasets. The goal is to arrive at a prototypical platform for object detection which is highly useful not only for cadastre specialists and authorities but also for stakeholders at various contact points in society.
Full article
AUTOMATIC DETECTION OF CHANGES IN THE ENVIRONMENT November 2020Nils Hamel (UNIGE) Project scheduled in the STDL research roadmap - TASK-DIFF Developed at EPFL with the collaboration of Cadastre Suisse to handle large-scale geographical models of different natures, the STDL 4D platform offers a robust and efficient indexation methodology for managing the storage of, and access to, large-scale models. In addition to spatial indexation, the platform also includes time as part of the indexation, allowing any area to be described by models in both the spatial and temporal dimensions. In this development project, the notion of model temporal derivative is explored and proof-of-concepts are implemented in the platform. The goal is to demonstrate that, in addition to their formal content, models coming in different temporal versions can be derived along the time dimension to compute difference models. Such a proof-of-concept is developed for both point cloud and vectorial models, demonstrating that the indexation formalism of the platform considerably eases the computation of difference models. This research project demonstrates that the time dimension can be fully exploited in order to access the data it holds.
Full article
"},{"location":"#steering-committee","title":"Steering Committee","text":"The steering committee of the Swiss Territorial Data Lab is composed of Swiss public administrations bringing their expertise and competences to guide the conducted projects and developments.
Members of the STDL steering committee"},{"location":"#submitting-a-project","title":"Submitting a project","text":"To submit a project to the STDL, simply fill in this form. To contact the STDL, please write an email to info@stdl.ch. We will reply as soon as possible!
"},{"location":"PROJ-DQRY/","title":"Automatic Detection of Quarries and the Lithology below them in Switzerland","text":"Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Supervision : Nils Hamel (UNIGE) - Raphael Rollier (swisstopo)
Proposed by swisstopo - PROJ-DQRY June 2021 to January 2022 - Published on January 30th, 2022
Abstract: Mining is an important economic activity in Switzerland and is therefore monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although performed with very high quality, this manual work unfortunately cannot keep up with the constantly changing and updating pattern of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries across the whole country. The training was done using SWISSIMAGE with 10 cm spatial resolution and the Deep Learning Framework of the STDL. Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. Interaction with the domain expert was very relevant for the final results; in addition to his appreciation, an F1 Score of 85% was obtained in the end, which, given the peculiar characteristics of quarries, can be considered an optimal result.
"},{"location":"PROJ-DQRY/#1-introduction","title":"1 - Introduction","text":"Mining is an important economic activity worldwide and this is also the case in Switzerland. The Confederation topographic office (swisstopo) is responsible for monitoring the presence of quarries and also the materials being explored. This is extremely relevant for planning the demand and shortage of explored materials and also their transportation through the country. As this of federal importance the mapping of these features is already done. Although this work is very detailed and accurate, quarries have a very characteristical updating pattern. Quarries can appear and disappear in a matter of a few months, in especial when they are relatively small, as in Switzerland. Therefore it is of interest of swisstopo to make an automatic detection of quarries in a way that it is also reproducible in time.
A strategy often offered by the Swiss Territorial Data Lab is the automatic detection of objects in aerial imagery through deep learning, following our Object Detection Framework. It is fully applicable in this case: as quarries in Switzerland are relatively small, high-resolution imagery is required, which is something our neural network has proven to handle well in past projects. This high-resolution imagery is available through SWISSIMAGE, the aerial imagery from swisstopo that covers almost the whole country with a 10 cm pixel size (GSD).
Nevertheless, in order to train our neural network, and as is usually the case in deep learning, many labelled images are required. These data serve as ground truth so that the neural network \"learns\" what is the object to be detected and what is not. For this purpose, the work of the topographic landscape model (TLM) team of swisstopo has been of extreme importance. Among other surface features, quarries have been mapped all over Switzerland at a highly detailed scale.
Despite the high quality and precision of the labels from the TLM, quarries are constantly changing, appearing and disappearing, and therefore the labels are not always synchronized with the images from SWISSIMAGE. This lack of synchronization between these two datasets can be seen in Figure 1, where the left shows the year of TLM mapping and the right the year of the SWISSIMAGE flights.
Figure 1 : Comparison of TLM (left) and SWISSIMAGE (right) temporality.For this reason, a two-stage interaction with the domain expert was necessary. In order to have a ground truth fully synchronized with SWISSIMAGE, we required two stages of training: one making use of the TLM data, and a second one using a manual correction of the labels predicted in the first iteration. It is of crucial importance to state that this correction needed to be made by the domain expert, so that he could carefully check each detection in pre-defined tiles. With that in hand, we could proceed with a more trustworthy training.
As stated, swisstopo is also interested in identifying the material extracted from every quarry. For that purpose, the use of the GeoCover dataset from swisstopo was recommended. This dataset is a vector layer of the geological cover of the whole of Switzerland, which challenged us to cross the detector predictions with such vector information.
In summary, the challenge of the STDL was to investigate to what extent it is possible to automatically detect quarries in aerial imagery using deep learning, considering their high update rate.
"},{"location":"PROJ-DQRY/#2-methodology","title":"2 - Methodology","text":"First of all the \"area of interest\" must be identified. This is where the detection and training took place. In this case, a polygon of the whole Switzerland was used. After that, the area of interest is divided in several tiles of fixed size. This is then defining the slicing of SWISSIMAGE (given as WMS). For this study, tiles of different sizes were tested, being 500x500m tiles defined for final usage. Following the resolution of the images must be defined, which, again, after several tests, was defined as 512x512 pixels.
For validation purposes, the data is then split into training, validation and testing datasets. The training dataset is used by the network for its learning; the validation dataset is kept completely apart from training and used only to check results; and the testing dataset is used for cross-validation. 70% of the data was used for training, 15% for validation and 15% for testing.
As far as the labels are concerned, the ones from the TLM were manually checked, so that a group of approximately 250 labels fully synchronized with SWISSIMAGE was found and recorded. The first round of training then passes through the same framework as former STDL projects. We make use of a predictive Region-based Convolutional Neural Network with a ResNet-50 backbone provided by Detectron2. A deeper explanation of the network functionality can be found here and here.
Even with different parameter sets, it was observed that the predictions included too many false positives, which mainly consisted of snow. Most probably the reflectance of snow is similar to that of quarries, and this needed special treatment. For this purpose, a filtering of the results was used. First, the features were filtered based on the score values (0.9) and then by elevation, using the SRTM digital elevation model: as snow usually does not persist below around 1155 m, this was used as a threshold. Finally, an area threshold is also applied (based on the smallest prediction area) and predictions are merged. A more detailed description of how to operate this first filter can be seen here.
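A minimal sketch of such a score/elevation/area filter is shown below; file names, column names and the area value are assumptions rather than the actual implementation, while the 0.9 score and 1155 m elevation thresholds follow the text:

```python
# Hedged sketch of the result filtering described above.
import geopandas as gpd
import rasterio

SCORE_MIN = 0.9       # confidence score threshold (from the text)
ELEVATION_MAX = 1155  # metres, snow threshold (from the text)
AREA_MIN = 2000       # m2, illustrative value for the smallest predictions

preds = gpd.read_file("predictions.geojson").to_crs(2056)  # hypothetical file
preds = preds[preds["score"] >= SCORE_MIN]

# Sample the SRTM digital elevation model at each prediction centroid
with rasterio.open("srtm_switzerland.tif") as dem:
    pts = preds.to_crs(dem.crs).geometry.centroid
    preds["elevation"] = [v[0] for v in dem.sample(zip(pts.x, pts.y))]

preds = preds[preds["elevation"] <= ELEVATION_MAX]
preds = preds[preds.geometry.area >= AREA_MIN]

# Merge touching/overlapping predictions into single polygons
merged = preds.dissolve().explode(index_parts=False)
```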
Once several tests had been performed, the new predictions were sent back to the domain experts for detailed revision under a strict protocol. This mainly included the removal of false positives and the addition of false negatives. It was performed by 4 different experts from swisstopo in 4 regions with the same number of tiles to be analyzed. It is important to state again the importance of domain expertise in this step, as a very careful and manual evaluation of what is and what is not a quarry must be made.
Once the predictions were corrected, a new training session was performed using different parameters. The same resolution and tile size were used as in the first iteration (512x512 m tiles with a resolution of 512x512 pixels), although this time a new filtering was developed: very similar to the first one, but applied in a different order, allowing more aesthetically pleasing predictions in the end, something the domain expert also cared about.
This procedure is summarized in figure 2.
Figure 2 : Methodology applied for the detection of quarries and new training sessions.In the end, in order to also include the geological information of the detected quarries, a third layer resulting from the intersection of the predictions and the GeoCover labels is created. This was done so that the final user can click to obtain both the information on the quarry (when not a pure prediction) and the information on the geology/lithology of this part of the quarry. As a result, each resulting intersection polygon contains information from both the quarry and GeoCover.
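Such an intersection layer boils down to a standard GIS overlay; a minimal sketch, assuming both layers are available as local files (hypothetical names):

```python
# Intersect detected quarries with the GeoCover lithology layer, so that each
# resulting polygon carries the attributes of both inputs.
import geopandas as gpd

quarries = gpd.read_file("quarry_predictions.gpkg").to_crs(2056)
geocover = gpd.read_file("geocover.gpkg").to_crs(2056)

lithology_per_quarry = gpd.overlay(quarries, geocover, how="intersection")
lithology_per_quarry.to_file("quarries_with_lithology.gpkg", driver="GPKG")
```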
In order to evaluate the obtained results, the F1 Score was computed, and the final predictions were also compared to the labels corrected by the domain experts. This was done visually, by extracting the centroid of each detected quarry, and by a heat-map, allowing one to detect the spatial pattern of the detections. The heat-map was computed using a 10'000 m radius and a 100 m pixel size.
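One possible way to compute such a heat-map is to rasterize the centroids at the 100 m pixel size and smooth the counts with a kernel matching the 10'000 m radius; this is an assumption about the method, not the exact implementation used:

```python
# Hedged sketch: centroid heat-map on a 100 m grid, smoothed over ~10 km.
import numpy as np
import geopandas as gpd
from scipy.ndimage import gaussian_filter

PIXEL = 100      # m, heat-map pixel size (from the text)
RADIUS = 10_000  # m, smoothing radius (from the text)

pts = gpd.read_file("quarry_predictions.gpkg").geometry.centroid  # hypothetical
xmin, ymin, xmax, ymax = pts.total_bounds
nx = int((xmax - xmin) // PIXEL) + 1
ny = int((ymax - ymin) // PIXEL) + 1

counts, _, _ = np.histogram2d(pts.x, pts.y, bins=[nx, ny],
                              range=[[xmin, xmax], [ymin, ymax]])
heatmap = gaussian_filter(counts, sigma=RADIUS / PIXEL)
```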
"},{"location":"PROJ-DQRY/#3-results-discussion","title":"3 - Results & Discussion","text":"In the first iteration, when the neural was trained with some labels of the TLM vector data, an optimal F1 score of approximately 0.78 was obtained. The figure 3 shows the behavior of the precision, recall and F1 score for the final model selected.
Figure 3 : Precision, Recall and F1 score of the first iteration (using TLM data).Given the predictions resulting from the correction by the domain experts, there was an outstanding improvement in the F1 score, which reached approximately 0.85 at its optimum, as seen in figure 4. A total of 1265 quarries were found in Switzerland after filtering.
Figure 4 : Precision, Recall and F1 score of the second iteration (using data corrected by the domain expert).Figure 5 shows some examples of detected quarries, giving a notion of the quality of the shape of the detections and how well they mark the real-world quarries. Examples of false positives and false negatives, unfortunately still present in the detections, are also shown. They illustrate how some objects look very similar to quarries from the point of view of non-experts, and how they may influence the results. These examples of errors are also an interesting indication of the importance of domain expertise in evaluating machine-made results.
Figure 5 : Examples of detected quarries, with true positive, false negative and false positive.To check the validity of the new predictions, their centroids were plotted along with the centroids of the corrected labels, so that their spatial patterns could be compared and one could evaluate whether they exhibit the same behavior. Figure 6 shows this plot.
Figure 6 : Disposition of the centroids of assessed predictions and final predictions.One can see that, despite some slight differences, the overall spatial pattern of the predictions is very similar. A very similar result can be seen with the computed heat-map of these points, shown in figure 7.
Figure 7 : Heatmap of assessed predictions and final predictions.There is a small area in the west of the country with fewer detections than desired, and in general there were more predictions than before. The objective of the heat-map is to give a general view of the results rather than an exact comparison, as a point is created for every feature, and the new filter tended to smooth the results and join many features into a single one.
In the end, the results were also intersected with GeoCover, which provides a detailed lithology of the Swiss soil; an example of the results can be seen below in the QGIS software.
Figure 8 : Intersection of predictions with GeoCover seen in QGIS.Finally, and most importantly, the domain expert was highly satisfied with this work, due to the support it can give to swisstopo and the TLM team in mapping future quarries. The domain expert also expressed interest in pursuing the work by investigating the temporal pattern of quarries and estimating the volume of material in each quarry.
"},{"location":"PROJ-DQRY/#4-conclusion","title":"4 - Conclusion","text":"Through this collaboration with swisstopo, we managed to demonstrate that data science is able to provide relevant and efficient tool to ease complex and time-consuming task. With the produced inventory of the quarries on the whole Swiss territory, we were able to provide a quasi-exhaustive view of the situation to the domain expert, leading him to have a better view of the exploitation sites.
This is important and a major step forward compared to the previous situation. Indeed, before this project, the only solution available to the domain expert was to gather all the federal and cantonal data through non-standardized and time-consuming processes, in the hope of obtaining the beginnings of an inventory, with temporality issues. With the developed prototype, the entire SWISSIMAGE dataset can be processed within hours and turned into a full-scale inventory, guiding the domain expert directly toward his interests.
The resulting geographical layer can then be seen as the result of this demonstrator, able to turn the aerial images into a simple polygonal layer representing the quarries, with few false positives and false negatives, providing the view required for the domain expert's understanding of the Swiss situation. With such a result, it is possible to combine it with all the other existing data, with GeoCover in the first place. This lithology model of the Swiss soil can be intersected with the produced quarries layer in order to create a secondary geographical layer merging both quarry location and quarry soil type, leading to a powerful analysis tool for the domain expert.
The produced demonstrator shows that it is possible, within hours, to deduce a simple and reliable geographical layer from a simple set of orthomosaics. The STDL was then able to show that the process can be repeated along the time dimension, for future and past images, opening the way to building and rebuilding the history and evolution of the quarries. With such a process, it will be possible to compute statistical quantities over the long term to capture the evolution of the resources, leading to a more reliable strategic understanding of Swiss resources and sovereignty.
"},{"location":"PROJ-DQRY-TM/","title":"Automatic detection and observation of mineral extraction sites in Switzerland","text":"Cl\u00e9mence Herny (Exolabs), Shanci Li (Uzufly), Alessandro Cerioni (\u00c9tat de Gen\u00e8ve), Roxane Pott (swisstopo)
Proposed by swisstopo - PROJ-DQRY-TM October 2022 to February 2023 - Published in January 2024
Abstract: Studying the evolution of mineral extraction sites (MES) is of primary importance for assessing the availability of mineral resources, managing MES and evaluating the impact of mining activity on the environment. In Switzerland, MES are inventoried at the local level by the cantons and at the federal level by swisstopo. The latter performs manual vectorisation of MES boundaries. Unfortunately, although the data is of high quality, it is not regularly updated. To automate this tedious task and to better observe the evolution of MES, swisstopo solicited the STDL to carry out an automatic detection of MES in Switzerland over the years. We performed instance segmentation using a deep learning method to automatically detect MES in RGB aerial images with a spatial resolution of 1.6 m px-1. The detection model was trained with 266 labels and orthophotos from the SWISSIMAGE RGB mosaic published in 2020. The selected trained model achieved an f1-score of 82% on the validation dataset. The model was then used to detect potential MES by inference in SWISSIMAGE RGB orthophotos from 1999 to 2021. The model shows a good ability to detect potential MES, with about 82% of labels detected for the 2020 SWISSIMAGE mosaic. The detections obtained with SWISSIMAGE orthophotos acquired in different years can be tracked to observe their temporal evolution. The framework developed can perform detection over an area of interest (about a third of Switzerland at most) in just a few hours, which is a major advantage over manual mapping. We acknowledge that there are some missed and false detections in the final product, and the results need to be reviewed and validated by domain experts before being analysed and interpreted. The results can be used to compute statistics over time and to update the MES evolution with future image acquisitions.
"},{"location":"PROJ-DQRY-TM/#1-introduction","title":"1. Introduction","text":""},{"location":"PROJ-DQRY-TM/#11-context","title":"1.1 Context","text":"Mineral extraction constitutes a strategic activity worldwide, including in Switzerland. Demand for mineral resources has been growing significantly in recent decades1, mainly due to the rapid increase in the production of batteries and electronic chips, or buildings construction, for example. As a result, the exploitation of some resources, such as rare earth elements, lithium, or sand, is putting pressure on their availability. Being able to observe the development of mineral extraction sites (MES) is of primary importance to adapting mining strategy and anticipating demand and shortage. Mining has also strong environmental and societal impact23. It implies the extraction of rocks and minerals from water ponds, cliffs, and quarries. The surface affected, initially natural areas, can reach up to thousands of square kilometres1. The extraction of some minerals could lead to soil and water pollution and involves polluting truck transport. Economic and political interests of some resources might overwhelm land protection, and conflicts are gradually intensifying2.
MES are dynamic features that can evolve according to singular patterns, especially if they are small, as is the case in Switzerland. A site can expand horizontally and vertically or be filled to restore the site4235. Changes can happen quickly, in a couple of months. As a result, keeping the MES inventory up to date can be challenging. There is a significant demand for effective observation of MES development worldwide. The majority of MES mapping is performed manually by visual inspection of images1. Alternatively, recent improvements in the availability of high spatial and temporal resolution space/airborne imagery and computational methods have encouraged the development of automated image processing. Supervised classification of spectral images is an effective method but requires a complex workflow642. More recently, a few studies have implemented deep learning algorithms to train models to detect extraction sites in images and have shown high levels of accuracy3.
In Switzerland, MES management is historically regulated at the cantonal level using GIS data, including information about the MES location, extent, and extracted materials, among others. At the federal level, swisstopo and the Federal Office of Statistics (FSO) observe the development of MES. swisstopo has carried out a detailed manual delineation of MES over Switzerland based on the SWISSIMAGE dataset.
With the aim of speeding up and improving the process of MES mapping in Switzerland, we developed a method for automating MES detection over the years. Ultimately, the goal is to keep the database up to date as new images are acquired. The results can be statistically processed to better assess the evolution of MES over time in Switzerland.
"},{"location":"PROJ-DQRY-TM/#12-approach","title":"1.2. Approach","text":"The STDL has developed a framework named object-detector to automatically detect objects in a georeferenced imagery dataset based on deep learning method. The framework can be adapted to detect MES (also referred as quarry in the project) in Switzerland.
A project to automatically detect MES in Switzerland7 was carried out by the STDL in 2021 (detector-interface framework). Detections of potential MES obtained by automatic detection on the 2020 SWISSIMAGE mosaic have already been delivered to swisstopo (layer 2021_10_STDL_QC1). The method has proven its efficiency in detecting MES: the numerical model trained with the object detector achieved an f1-score of 82% and detected about 1200 potential MES over Switzerland.
In this project, we aim to continue this work and extend it to a second objective, that of observing MES evolution over time. The main challenge is to prove the algorithm's reliability for detecting objects in multi-year image datasets acquired with different sensors.
The project workflow is synthesised in Figure 1. First, a deep learning algorithm is trained using a manually mapped MES dataset that serves as ground truth (GT). After evaluating the performance of the trained models, the selected one was used to perform inference detection for a given year's dataset and area of interest (AoI). The results were filtered to discard irrelevant detections. The operation was repeated over several years. Finally, each potential MES detected was tracked over the years to observe its evolution.
Figure 1: Workflow diagram for automatic MES detection.In this report, we first describe the data used, including the image description and the definition of the AoI. Then we explain the model training, evaluation and object detection procedure. Next, we present the results of potential MES detection and the MES tracking strategy. Finally, we provide conclusions and perspectives.
"},{"location":"PROJ-DQRY-TM/#2-data","title":"2. Data","text":""},{"location":"PROJ-DQRY-TM/#21-images-and-area-of-interest","title":"2.1 Images and area of interest","text":"Automatic detection of potential MES over the years in Switzerland was performed with aerial orthophotos from the swisstopo product SWISSIMAGE Journey. Images are georeferenced RGB TIF tiles with a size of 256 x 256 pixels (1 km2).
| Product | Year | Coordinate system | Spatial resolution |
| --- | --- | --- | --- |
| SWISSIMAGE 10 cm | 2017 - current | CH1903+/MN95 (EPSG:2056) | 0.10 m (\\(\\sigma\\) \\(\\pm\\) 0.15 m) - 0.25 m |
| SWISSIMAGE 25 cm | 2005 - 2016 | MN03 (2005 - 2007) and MN95 (since 2008) | 0.25 m (\\(\\sigma\\) \\(\\pm\\) 0.25 m) - 0.50 m (\\(\\sigma\\) \\(\\pm\\) 3.00 - 5.00 m) |
| SWISSIMAGE 50 cm | 1998 - 2004 | MN03 | 0.50 m (\\(\\sigma\\) \\(\\pm\\) 0.50 m) |

Table 1: SWISSIMAGE products characteristics.
Several SWISSIMAGE products exist, produced with different instrumentation (Table 1). SWISSIMAGE mosaics are built and published yearly. The year of the mosaic corresponds to the last year of the dataset publication, and the most recent orthophoto datasets available are used to complete the mosaic. For example, the 2020 SWISSIMAGE mosaic is a combination of 2020, 2019 and 2018 image acquisitions. The 1998 mosaic release corresponds to a year of transition from black and white images (SWISSIMAGE HIST) to RGB images. For this study, only RGB data from 1999 to 2021 were considered.
Figure 2: Acquisition footprint of SWISSIMAGE aerial orthophotos for the years 2016 to 2021. The SWISSIMAGE Journey mosaic in the background is the 2020 release.Acquisition footprints of the yearly acquired orthophotos were used as AoI to perform MES detection through time. Over the years, the footprints may spatially overlap (Fig. 2). Since 2017, the geometry of the acquisition footprints has been quasi-constant, dividing Switzerland into three more or less equal areas and ensuring that the orthophotos are updated every three years. For the years before 2017, the acquisition footprints were not systematic and do not guarantee a periodic update of the orthophotos. The acquisition footprint may also not be spatially contiguous.
Figure 3: Illustration of the combination of SWISSIMAGE images and FSO images for the 2007 SWISSIMAGE mosaic. (a) Overview of the 2007 SWISSIMAGE mosaic. The red polygon corresponds to the provided SWISSIMAGE acquisition footprint for 2007. The orange polygon corresponds to the surface covered by the new SWISSIMAGE for 2007. The remaining area of the red polygon corresponds to the FSO image dataset acquired in 2007. The black box indicates the panel (b) location, and the white box indicates the panel (c) location. (b) Side-by-side comparison of image composition in the 2006 and 2007 SWISSIMAGE mosaics. (c) Examples of detection polygons (white polygons) obtained by inference on the 2007 SWISSIMAGE dataset (red box) and FSO images 2007 (outlined by black box).The SWISSIMAGE Journey mosaics of 2005, 2006, and 2007 present a particularity, as they are composed not only of 25 cm resolution SWISSIMAGE but also of orthophotos acquired for the FSO. These are TIFF RGB orthophotos with a spatial resolution of 50 cm px-1 (coordinate system: CH1903/LV03 (EPSG:21781)) that have been integrated into the SWISSIMAGE Journey products. However, these images were discarded from our dataset (modifying the footprint shape) because they were causing issues in the automatic MES detection, producing oddly segmented detection shapes (Fig. 3). This is probably due to the different stretching of pixel colour between the datasets.
It also has to be noted that there are currently missing images (about 88 tiles at zoom level 16) in the 2020 SWISSIMAGE dataset.
"},{"location":"PROJ-DQRY-TM/#22-image-fetching","title":"2.2 Image fetching","text":"Pre-rendered SWISSIMAGE tiles (256 x 256 px, 1 km2) are downloaded using the Web Map Tile Service (WMTS) wmts.geo.admin.ch via an XYZ connector. Tiles are served on a cartesian coordinates grid using a Web Mercator Quad projection and a coordinate reference system EPGS 3857. Position of a tile on the grid is defined by x and y coordinates and the pixel resolution of the image is defined by z, its zoom level. Changing the zoom level affects the resolution by a factor of 2 (Fig. 4). For instance a zoom level of 17 corresponds to a resolution of 0.8 m px-1 and a zoom level of 16 to a resolution of 1.6 m px-1.
Figure 4: Examples of tiles geometry at zoom level 16 (z16, black polygons) and at zoom level 17 (z17, blue polygons). The number of tiles for each zoom level is indicated in square brackets. The tiles are selected for model training, i.e. only tiles intersecting swissTLM3D labels (tlm-hr-trn-topo, yellow polygons).Note that in the previous project carried out by Reichel and Hamel (2021)7, the tiling method adopted was slightly different from the one adopted for this project: custom size and resolution tiles were built. A sensitivity analysis of these two parameters was conducted and led to the choice of tiles with a size of about 500 m and a pixel resolution of about 1 m (beyond this, performance did not significantly improve).
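As an illustration of the XYZ logic, the sketch below converts WGS84 coordinates to tile indices with the standard Web Mercator (slippy map) formulas and downloads one tile; the URL template is an assumption and should be checked against the capabilities of wmts.geo.admin.ch:

```python
# Hedged sketch of XYZ tile fetching; only the indexing math is standard.
import math
import requests

def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> tuple:
    """Convert WGS84 coordinates to XYZ tile indices at a given zoom level."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y

# Hypothetical template for a SWISSIMAGE XYZ layer
URL = "https://wmts.geo.admin.ch/1.0.0/ch.swisstopo.swissimage/default/current/3857/{z}/{x}/{y}.jpeg"

x, y = lonlat_to_tile(6.14, 46.20, zoom=16)  # somewhere near Geneva
tile = requests.get(URL.format(z=16, x=x, y=y), timeout=30)
with open(f"tile_16_{x}_{y}.jpeg", "wb") as f:
    f.write(tile.content)
```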
"},{"location":"PROJ-DQRY-TM/#23-ground-truth","title":"2.3 Ground truth","text":"The MES labels originate from the swiss Topographic Landscape Model 3D (swissTLM3D) produced by swisstopo. swissTLM3D is a large-scale topographic landscape model of Switzerland, including manually drawn and georeferenced vectors of objects of interest at a high resolution, including MES features. Domain experts from swisstopo have carried out extensive work to review the labeled MES and to synchronise them with the 2020 SWISSIMAGE mosaic to improve the quality of the labeled dataset. A total of 266 labels are available. The mapped MES reveal the diversity of MES characteristics, such as the presence or absence of buildings/infrastructures, trucks, water pounds, and vegetation (Fig. 5).
Figure 5: Examples of MES mapped in swissTLM3D and synchronised to 2020 SWISSIMAGE mosaic.These labels are used as the ground truth (GT) i.e. the reference dataset indicating the presence of a MES in an image. The GT is used both as input to train the model to detect MES and to evaluate the model performance.
"},{"location":"PROJ-DQRY-TM/#3-automatic-detection-methodology","title":"3. Automatic detection methodology","text":""},{"location":"PROJ-DQRY-TM/#31-deep-learning-algorithm-for-object-detection","title":"3.1 Deep learning algorithm for object detection","text":"Training and inference detection of potential MES in SWISSIMAGE were performed with the object detector framework. This project is based on the open source detectron2 framework8 implemented with PyTorch by the Facebook Artificial Intelligence Research group (FAIR). Instance segmentation (delineation of object) was performed with a Mask R-CNN deep learning algorithm9. It is based on a Recursive-Convolutional Neural Network (CNN) with a backbone pre-trained model ResNet-50 (50 layers deep residual network).
Images were annotated with custom COCO annotations based on the labels (class 'Quarry'). The model is trained with this dataset to later perform inference detection on images. If an object is detected by the algorithm, a pixel mask is produced, with a confidence score (0 to 1) attributed to the detection (Fig. 6).
Figure 6: Example of detection mask. The pink rectangle corresponds to the bounding box of the object; the object is segmented by the pink polygons associated with the detection class ('Quarry') and a confidence score.The object detector framework permits converting detection masks to georeferenced polygons that can be used in GIS software. The implementation of the Ramer-Douglas-Peucker (RDP) algorithm allows the simplification of the derived polygons by discarding non-essential points based on a smoothing parameter. This considerably reduces the amount of data to be stored and prevents potential memory saturation when deriving detection polygons over large areas, as is the case in this study.
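Shapely exposes this simplification directly; a minimal sketch with an illustrative tolerance playing the role of the smoothing parameter:

```python
# Ramer-Douglas-Peucker simplification of a detection polygon with shapely;
# preserve_topology=False selects the plain Douglas-Peucker algorithm.
from shapely.geometry import Polygon

detection = Polygon([(0, 0), (5, 0.1), (10, 0), (10, 10), (5, 10.1), (0, 10)])

# tolerance: maximum allowed deviation (in CRS units) from the original outline
simplified = detection.simplify(tolerance=0.5, preserve_topology=False)
print(len(detection.exterior.coords), "->", len(simplified.exterior.coords))
```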
"},{"location":"PROJ-DQRY-TM/#32-model-training","title":"3.2 Model training","text":"Orthophotos from the 2020 SWISSIMAGE mosaic, for which the GT has been defined, were chosen to proceed the model training. Tiles intersecting labels were selected and split randomly into three datasets: the training dataset (70%), the validation dataset (15%), and the test dataset (15%). Addition of empty tiles (no annotation) to confront the model to landscapes not containing the target object has been tested (Appendix A.1) but did not provide significant improvement in the model performance to be adopted.
Figure 7: Training curves obtained at zoom level 16 on the 2020 SWISSIMAGE mosaic. The curves were obtained for the trained model 'replicate 3'. (a) Learning rate as a function of iteration. The step was defined every 500 iterations. The initial learning rate was 5.0 x 10-3 with a weight and bias decay of 1.0 x 10-4. (b) Total loss as a function of iteration. The raw measurement (light red) and the smoothed curve (0.6 factor, solid red) are superposed. (c) Validation loss as a function of iteration. The raw measurement (light red) and the smoothed curve (0.6 factor, solid red) are superposed. The vertical dashed black lines indicate the iteration minimising the validation loss curve, i.e. 3000.Models were trained with two images per batch (Appendix A.2), a learning rate of 5 x 10-3, and a learning rate decay of 1 x 10-4 every 500 steps (Fig. 7 (a)). For the given model, parameters and a zoom level of 16 (Section 3.3.3), the training is performed over 7000 iterations and lasts about 1 hour on a CUDA-compatible machine with a 16 GiB GPU (NVIDIA Tesla T4). The total loss curve decreases until reaching a quasi-steady state around 6000 iterations (Fig. 7 (b)). The optimal detection model corresponds to the one minimising the validation loss curve; this minimum is reached between 2000 and 3000 iterations (Fig. 7 (c)).
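A detectron2 setup consistent with these parameters might look as follows; dataset names and COCO file paths are hypothetical, and this is a sketch under stated assumptions rather than the actual object-detector configuration:

```python
# Hedged sketch: Mask R-CNN with a ResNet-50 backbone, two images per batch,
# base learning rate 5e-3, weight decay 1e-4, LR steps every 500 iterations.
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

register_coco_instances("mes_trn", {}, "COCO_trn.json", "images/")  # hypothetical
register_coco_instances("mes_val", {}, "COCO_val.json", "images/")  # hypothetical

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("mes_trn",)
cfg.DATASETS.TEST = ("mes_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1     # single class: 'Quarry'
cfg.SOLVER.IMS_PER_BATCH = 2            # two images per batch
cfg.SOLVER.BASE_LR = 5e-3               # initial learning rate
cfg.SOLVER.WEIGHT_DECAY = 1e-4          # weight and bias decay
cfg.SOLVER.STEPS = tuple(range(500, 7000, 500))  # LR decay every 500 iterations
cfg.SOLVER.MAX_ITER = 7000

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```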
"},{"location":"PROJ-DQRY-TM/#33-metrics","title":"3.3 Metrics","text":"The model performance and detection reliability were assessed by comparing the results to the GT. The detection performed by the model can be either (1) a True Positive (TP), i.e. the detection is real (spatially intersecting the GT) ; (2) a False Positive i.e. the detection is not real (not spatially intersecting the GT) or (3) a False Negative (FN) i.e. the labeled object is not detected by the algorithm (Fig. 8). Tagging the detection (Fig. 9(a)) allows to calculate several metrics (Fig. 9(b)) such as:
Figure 8: Examples of different detection cases. The label is represented with a yellow polygon and the detection with a red polygon. (a) True Positive (TP) detection intersecting the GT, (b) a potential True Positive (TP?) detection with no GT, (c) False Negative (FN) case with no detection while GT exists, (d) False Positive (FP) detection of an object that is not a MES.the recall, reflecting the proportion of actual objects detected by the model:
\\[recall = \\frac{TP}{(TP + FN)}\\]the precision, translating the number of well-predicted TP among all the detections:
\\[precision = \\frac{TP}{(TP + FP)}\\]the f1-score, the harmonic average of the precision and the recall:
\\[f1 = 2 \\times \\frac{recall \\times precision}{recall + precision}\\]Trained models reached f1-scores of about 80% with a standard deviation of 2% (Table 2). The performances are similar to the model trained by Reichel and Hamel (2021)7.
| model | precision | recall | f1 |
| --- | --- | --- | --- |
| replicate 1 | 0.84 | 0.79 | 0.82 |
| replicate 2 | 0.77 | 0.76 | 0.76 |
| replicate 3 | 0.83 | 0.81 | 0.82 |
| replicate 4 | 0.89 | 0.77 | 0.82 |
| replicate 5 | 0.78 | 0.82 | 0.80 |
Table 2: Metrics values computed on the validation dataset for the trained model replicates with the 2020 SWISSIMAGE mosaic at zoom level 16.
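These definitions translate directly into a small helper; the counts in the example below are purely illustrative:

```python
# Compute precision, recall and f1-score from detection counts, following the
# formulas defined above.
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return {"precision": precision, "recall": recall, "f1": f1}

print(detection_metrics(tp=180, fp=35, fn=45))  # illustrative counts only
```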
Some variability is expected, as the deep learning algorithm displays some random behavior, but it is supposed to be negligible. However, the observed model variability is enough to affect the final results, which might slightly change when using different trained models with the same input parameters (Fig. 10).
Figure 10: Detection polygons obtained for the different trained model replicates (Table 2), highlighting the variability of the results. The labels correspond to orange polygons. The number in square brackets corresponds to the number of polygons. The inference detections have been performed on a subset of 2000 tiles of the 2020 SWISSIMAGE at zoom level 16. Detections have been filtered according to the parameters defined in Section 5.1.To reduce the variability of the trained models, the random seeds of both detectron2 and Python were fixed. These attempts were not successful, and the variability remains. The nondeterministic behavior of detectron2 has been recognised (issue 1, issue 2), but no suitable solution has been provided yet. Further investigation of the model performance and consistency should be carried out in the future.
To mitigate the variability of the results across model replicates, we could consider combining the results of several model replicates in the future, removing FP while preserving TP and potential TP detections. The choice and number of models used should be evaluated. This method is tedious, as it requires inference detections from several models, which can be time-consuming and computationally intensive.
"},{"location":"PROJ-DQRY-TM/#42-sensitivity-to-the-zoom-level","title":"4.2 Sensitivity to the zoom level","text":"Image resolution is dependent on the zoom level (Section 2.2). To select the most suitable zoom level for MES detection, we performed a sensitivity analysis on trained model performance. Increasing the zoom level increases the value of the metrics following a global linear trend (Fig. 11).
Figure 11: Metrics values (precision, recall and f1) as function of zoom level for the validation dataset. The results of the replicates performed at each zoom level are included (Table A1).Models trained at a higher zoom level performed better. However, a higher zoom level implies smaller tile and thus, a larger number of tiles to fill the AoI. For a typical AoI, i.e up to a third of Switzerland, this can lead to a large number of tiles to be stored and processed, leading to potential RAM and/or disk space saturation. For 2019 AoI, 89'290 tiles are required at zoom level 16 while 354'867 tiles are required at zoom level 17, taking respectively 3 hours and 11 hours to process on a 30 GiB RAM machine with a 16 GiB GP.
Visual comparison of inference detections reveals no significant improvement in object detection quality from zoom level 16 to zoom level 17. Both zoom levels present a similar proportion of detections intersecting labels (82% and 79% for zoom levels 16 and 17 respectively). On the other hand, the quality of object detection at zoom level 15 was degraded: detection scores were lower, with only tens of detections scoring above 0.95 (compared with about 400 at zoom level 16), and only about 64% of detections intersected labels.
"},{"location":"PROJ-DQRY-TM/#43-model-choice","title":"4.3 Model choice","text":"Based on tests performed, we selected the 'replicate 3' model, obtained (Tables 2 and A1) at zoom level 16, to perform inference detection.
Models trained at zoom level 16 (1.6 m px-1 pixel resolution) have shown satisfying results, accurately detecting MES contours and limiting the number of FP with high detection scores (Fig. 11). This represents a good trade-off between result reliability (f1-score between 76% and 82% on the validation dataset) and computational resources. Then, among all the replicates performed at zoom level 16, we selected the trained model 'replicate 3' (Table 2) because it combines the highest metrics values (for the validation dataset but also the train and test datasets), close precision and recall values, and a rather low number of low-score detections.
"},{"location":"PROJ-DQRY-TM/#5-automatic-detection-of-mes","title":"5. Automatic detection of MES","text":""},{"location":"PROJ-DQRY-TM/#51-detection-post-processing","title":"5.1 Detection post-processing","text":"Detection by inference was performed over AoIs with a threshold detection score of 0.3 (Fig. 12). The low score filtering results in a large amount of detections. Several detections may overlap, potentially segmenting a single object. In addition a detection might be split into multiple tiles. To improve the pertinence and the aesthetics of the raw detection polygons, a post-processing procedure was applied.
First, a large proportion of FP occurred in mountainous areas (rock outcrops and snow, Fig. 12(a)). We assumed that MES are not present (or at least sparse) above a given altitude. Elevation filtering was applied using a digital elevation model of Switzerland (about 25 m px-1) derived from the SRTM instrument (USGS - SRTM). The maximum elevation of the labeled MES is about 1100 m.
Second, detection aggregation was applied:
- Polygons were clustered (K-means) according to their centroid position. The method involves setting a predefined number k of clusters. Manual tests performed by Reichel and Hamel (2021)7 concluded in setting k equal to the number of detections divided by three. The highest detection score was assigned to the clustered detection. This method preserves the final integrity of the detection polygons by retaining detections that potentially have a low confidence score but belong to a cluster with a higher confidence score, improving the final segmentation of the detected object. The value of the threshold score must be kept relatively low (i.e. 0.3) when performing the detection, to prevent removing too many polygons that could potentially be part of the detected object. We acknowledge that determining the optimal number of clusters by clustering validation indices rather than manual adjustment would be more robust. In addition, exploring other clustering methods based on local density, such as DBSCAN, can be considered in the future.
- Score filtering was applied.
- Spatially close polygons were assumed to belong to the same MES and were merged according to a distance threshold. The averaged score of the merged detection polygons was ultimately computed.
Finally, we assumed that a MES covers a minimal area: detections with an area smaller than a given threshold were filtered out. The minimum MES area in the GT is 2270 m2. A sketch of this post-processing chain is given below.
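A hedged sketch of the aggregation (clustering, score propagation, score filtering and distance-based merging); the input file, column names and the buffer-based merge are assumptions, with threshold values taken from filter combination 3 of Table 3:

```python
# Hedged sketch of the detection aggregation described above.
import geopandas as gpd
from sklearn.cluster import KMeans

SCORE_MIN = 0.95  # score threshold (filter combination 3)
DISTANCE = 10     # m, merging distance threshold (filter combination 3)

dets = gpd.read_file("raw_detections.gpkg").to_crs(2056)  # hypothetical file

# 1. Cluster centroids (k = number of detections / 3) and propagate the best
#    score of each cluster to its members
coords = [(p.x, p.y) for p in dets.geometry.centroid]
k = max(1, len(dets) // 3)
dets["cluster"] = KMeans(n_clusters=k, n_init="auto").fit_predict(coords)
dets["score"] = dets.groupby("cluster")["score"].transform("max")

# 2. Score filtering
dets = dets[dets["score"] >= SCORE_MIN]

# 3. Merge polygons closer than DISTANCE (buffer out, dissolve, buffer in)
merged_geom = dets.buffer(DISTANCE / 2).unary_union.buffer(-DISTANCE / 2)
merged = gpd.GeoDataFrame(geometry=[merged_geom], crs=dets.crs)
merged = merged.explode(index_parts=False).reset_index(drop=True)
```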
Figure 12: MES detection filtering. (a) Overview of the automatic detection of MES obtained with 2020 SWISSIMAGE at zoom level 16. Transparent red polygons (with the associated confidence score in white) correspond to the raw object detection output, and the red line polygons (with the associated confidence score in red) correspond to the final filtered detections. The black box outlines the location of the (b) and (c) panel zooms. Note the large number of detections in the mountains (right area of the image). (b) Zoom on several raw detection polygons of a single object with their respective confidence scores. (c) Zoom on a filtered detection polygon of a single object with the resulting score.The sensitivity of the detections to these filters was investigated (Table 3). The quantitative evaluation of the relevance of a filter combination is tricky, as potential MES are detected by inference and the GT provided by swissTLM3D constitutes only an incomplete portion of the MES in Switzerland (2020). As an indication, we computed the number of spatial intersections between the ground truth and the detections obtained with the 2020 SWISSIMAGE mosaic. Filter combination number 3 was adopted, allowing the detection of about 82% of the GT with a relatively limited number of FP detections compared to filter combinations 1 and 2 (from visual inspection).
| filter combination | score threshold | area threshold (m2) | elevation threshold (m) | distance threshold (m) | number of detections | label detection (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.95 | 2000 | 1100 | 10 | 1745 | 85.1 |
| 2 | 0.95 | 2000 | 1200 | 10 | 1862 | 86.6 |
| 3 | 0.95 | 5000 | 1200 | 10 | 1347 | 82.1 |
| 4 | 0.96 | 2000 | 1100 | 10 | 1331 | 81.3 |
| 5 | 0.96 | 2000 | 1200 | 8 | 1445 | 78.7 |
| 6 | 0.96 | 5000 | 1200 | 10 | 1004 | 74.3 |
Table 3: Threshold values of filtering parameters and their respective number of detections and intersection proportion with swissTLM3D labels. The detections have been obtained for the 2020 SWISSIMAGE mosaic.
We acknowledge that, for the selected filter combination, the area threshold value is higher than the smallest area value of the GT polygons (thirteen labels display an area below 5000 m2). However, reducing the area threshold significantly increases the presence of FP.
"},{"location":"PROJ-DQRY-TM/#52-inference-detections","title":"5.2 Inference detections","text":"The trained model was used to perform inference detection on SWISSIMAGE orthophotos from 1999 to 2021. The automatic detection model shows good capabilities to detect MES in different years orthophotos (Fig. 13), despite being trained on the 2020 SWISSIMAGE mosaic. The model also demonstrates capabilities to detect potential MES that have not been mapped yet but are strong candidates. However, the model misses some labeled MES or potential MES (FN, Fig. 8). However, when the model process FSO images, with different colour stretching, it failed to correctly detect potential MES (Fig. 3). It reveals that images must have characteristics close to the training dataset for optimal results with a deep learning model.
Figure 13: Examples of object detection segmented by polygons in orthophotos from different years. The yellow polygon in the year 2020 panel of object ID 3761 corresponds to the label. Other coloured polygons correspond to the algorithm detections.We also acknowledge that a significant number of FP detections can still be observed in our filtered detection dataset (Figs. 8 and 14). The main sources of FP are the presence of large rock outcrops, mountainous areas without vegetation, snow, river sand beds, brownish-coloured fields, or construction areas. MES present a large variety of features (buildings, water ponds, trucks, vegetation) (Fig. 5), which can be a source of confusion for the algorithm, and sometimes even for the human eye. Therefore, the robustness of the GT is crucial for reliable detection. The algorithm's results should be interpreted carefully.
Figure 14: Examples of FP detection. (a) Snow patches (2019); (b) River sand beds and gullies (2019); (c) Brownish field (2020); (d) Vineyards (2005); (e) Airport tarmac (2020); (f) Construction site (2008).The detections produced by the algorithm are potential MES, and the final results must be reviewed by experts in the field to discard the remaining FP detections and correct the FN before any processing or interpretation.
"},{"location":"PROJ-DQRY-TM/#6-observation-of-mes-evolution","title":"6. Observation of MES evolution","text":""},{"location":"PROJ-DQRY-TM/#61-object-tracking-strategy","title":"6.1 Object tracking strategy","text":"Switzerland is covered by RGB SWISSIMAGE product over more than 20 years (1999 to actual), allowing changes to be detected (Fig. 13).
Figure 15: Strategy for MES tracking over time. ID assignment to detections. Spatially intersecting polygons share the same ID, allowing the MES to be tracked in a multi-year dataset.We assumed that detection polygons that overlap from one year to another describe a single object (Fig. 15). Overlapping detections and unique detections (which do not overlap with polygons from other years) in the multi-year dataset were assigned a unique object identifier (ID). A new object ID in the timeline indicates either:
- the first occurrence of an object detected in the dataset of the first year available for the area (it does not mean that the object was not present before), or
- the creation of a potential new MES.
The disappearance of an object ID indicates its potential refill. The chronology of a MES (creation, evolution and filling) can therefore be constrained.
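One possible implementation of this ID assignment is a union-find over the pairwise spatial intersections of the detections from all years; a sketch with hypothetical file names:

```python
# Hedged sketch: assign a shared object ID to spatially intersecting
# detections across a multi-year dataset.
import geopandas as gpd
import pandas as pd

frames = []
for year in range(1999, 2022):
    gdf = gpd.read_file(f"detections_{year}.gpkg")  # hypothetical per-year file
    gdf["year"] = year
    frames.append(gdf)
dets = gpd.GeoDataFrame(pd.concat(frames, ignore_index=True))

# Union-find over overlapping polygons
parent = list(range(len(dets)))

def find(i: int) -> int:
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

pairs = gpd.sjoin(dets, dets, predicate="intersects")
for i, j in zip(pairs.index, pairs["index_right"]):
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj

dets["object_id"] = [find(i) for i in range(len(dets))]
```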
"},{"location":"PROJ-DQRY-TM/#62-evolution-of-mes-over-years","title":"6.2 Evolution of MES over years","text":"Figures 13 and 16 illustrate the ability of the trained model to detect and track a single object in a multi-year dataset. The detection over the years appears reliable and consistent, although object detection may be absent from a year dataset (e.g. due to shadows or colour changes in the surroundings). Remember that the image coverage of a given area is not renewed every year. Characteristics of the potential MES, such as surface evolution (extension or retreat), can be quantified. For example, the surfaces of object IDs 239 and 3861 have more than doubled in about 20 years. Tracking object ID along with image visualisation allows observation of the opening and the closing of potential MES, as object IDs 31, 44, and 229.
Figure 16: Detection area (m2) as a function of years for several object IDs. Figure 13 provides the visualisation of the selected object IDs. Each point corresponds to an object ID occurrence in the corresponding year's dataset.The presence of an object in the datasets of several years strengthens the likelihood that the detected object is an actual MES. Conversely, an object detected in only one year is more likely an FP detection.
"},{"location":"PROJ-DQRY-TM/#7-conclusion-and-perspectives","title":"7. Conclusion and perspectives","text":"The project demonstrated the ability to automatically, quickly (a matter of hours for one AoI), and reliably detect potential MES in orthophotos of Switzerland with an automatic detection algorithm (deep learning). The selected trained model achieved a f1-score of 82% on the validation dataset. The final detection polygons accurately delineate the potential MES. We can track single MES through multiple years, emphasising the robustness of the method to detect objects in multi-year datasets despite the detection model being trained on a single dataset (2020 SWISSIMAGE mosaic). However, image colour stretching different from that used to train the model can significantly affect the model's ability to provide reliable detection, as was the case with the FSO images.
Although the performance of the trained model is satisfactory, FP and FN are present in the datasets. They are mainly due to confusion by the algorithm between MES and rock outcrops, river sandbeds or construction sites. A manual verification of the relevance of the detections by experts in the field is necessary before processing and interpreting the data. Revision of all the detections from 1999 to 2021 is a time-consuming effort but is necessary to guarantee detection reliability. Despite the required manual checks, the provided framework and detection results constitute a valuable contribution that can greatly assist the inventory and the observation of MES evolution in Switzerland. It provides state-wide detection in a matter of hours, which is a considerable time-saving compared with manual mapping. It also enables MES detection with a standardised method, independent of the information or methods adopted by the cantons.
Further model improvements could be considered, such as increasing the metrics by improving the GT quality, improving the model learning strategy, mitigating the model learning variability, or testing supervised clustering methods to find relevant detections.
This work can be used to compute statistics to study MES in Switzerland over the long term and to better manage resources and land use in the future. MES detection can be combined with other data, such as the geological layer, to identify the minerals/rocks exploited, and the high-resolution DEM (swissALTI3D) to infer elevation changes and observe the excavation or filling of MES5. So far, only RGB SWISSIMAGE orthophotos from 1999 to 2021 have been processed. Prior to 1999, black and white orthophotos exist, but the model trained on RGB images could not be applied reliably to black and white images. Image colourisation tests (with the help of a deep learning algorithm[@farella_colour_2022]) were performed and provided encouraging detection results. This avenue needs to be explored.
Finally, automatic detection of MES is rare13, and most studies perform manual mapping. The framework could therefore be extended to other datasets and/or other countries to provide a valuable asset to the community. A global mapping of MES has been completed with over 21'000 polygons1 and could be used as a GT database to train an automatic detection model.
"},{"location":"PROJ-DQRY-TM/#code-availability","title":"Code availability","text":"The codes are stored and available on the STDL's github repository:
This project was made possible thanks to a tight collaboration between the STDL team and swisstopo. In particular, the STDL team acknowledges the key contribution of Thomas Galfetti (swisstopo). This project has been funded by \"Strategie Suisse pour la G\u00e9oinformation\".
"},{"location":"PROJ-DQRY-TM/#appendix","title":"Appendix","text":""},{"location":"PROJ-DQRY-TM/#a1-influence-of-empty-tiles-addition-to-model-performance","title":"A.1 Influence of empty tiles addition to model performance","text":"By selecting tiles intersecting only labels, the detection model is mainly confronted with the presence of the targeted object to be detected. Addition of non-label-intersecting tiles, i.e. empty tiles, provides landscape diversity that might help to improve the object detection performance.
In order to evaluate the influence of empty tiles on the model performance, empty tiles (not intersecting any label) were chosen randomly within the Swiss boundaries and added to the tile dataset used for the model training (Fig. A1). Empty tiles were added (1) to the whole dataset, split as for the initial dataset (training: 70%, test: 15%, and validation: 15%), and (2) only to the training dataset. A visual inspection must be performed to ensure that no unlabeled MES is present in the added images, which would disturb the learning of the algorithm.
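As a minimal sketch of this random selection (hypothetical file names; the actual implementation lives in the project repository), using geopandas:

```python
import geopandas as gpd

# Hypothetical inputs: the tile grid and the MES labels, in the same CRS.
tiles = gpd.read_file("tiles.gpkg")
labels = gpd.read_file("labels.gpkg")

# Tiles intersecting at least one label...
joined = gpd.sjoin(tiles, labels, how="left", predicate="intersects")
label_tiles = joined[joined.index_right.notna()].index.unique()

# ...and candidate empty tiles, from which a given proportion is sampled.
empty_candidates = tiles.drop(index=label_tiles)
n_empty = int(0.35 * len(label_tiles))  # e.g. the 35% case of Fig. A1
empty_tiles = empty_candidates.sample(n=n_empty, random_state=42)
```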
Figure A1: View of tiles intersecting (black) labels (yellow) and randomly selected empty tiles (red) in Switzerland. This case corresponds to the addition of 35% empty tiles.Figure A2 reveals that adding empty tiles to the dataset does not significantly influence the metrics values. The numbers of TP, FP, and FN do not show significant variation. However, when performing an inference test on a subset of tiles (2000) for an AoI, it appears that the number of raw (unfiltered) detections is reduced as the number of empty tiles increases. Still, a visual inspection of the final detections after applying the filters does not show significant improvement compared to a model trained without empty tiles.
Figure A2: Influence of the addition of empty tiles (relative to the number of tiles intersecting labels) on the trained model performance for zoom levels 16 and 17, with (a) the f1-score as a function of the percentage of added empty tiles and (b) the number of detections, normalised by the number of sampled tiles (2000), as a function of the percentage of added empty tiles. Empty tiles were added only to the training dataset for the 5% and 30% cases, and to all datasets for the 9%, 35%, 70%, and 140% cases.A considered solution to improve the results could be to specifically select tiles on which FP occurred and include them in the training dataset as empty tiles. This way, the model would be trained with relevant confounding features, such as snow patches, river sandbeds, or gullies, not labeled as GT.
"},{"location":"PROJ-DQRY-TM/#a2-sensitivity-of-the-model-to-the-number-of-images-per-batch","title":"A.2 Sensitivity of the model to the number of images per batch","text":"During the model learning phase, the trained model is updated after each batch of samples was processed. Adding more samples, i.e. in our case images, to the batch can influence the model learning capacity. We investigated the role of adding more images per batch for a dataset with and without adding a portion of empty tiles to the learning dataset. Adding more images per batch speeds up the model learning (Table A1), and the minimum of the loss curve is reached for a smaller number of iterations.
Figure A3: Metrics (precision, recall and f1-score) evolution with the number of images per batch during the model training. Results were obtained on a dataset without empty tiles (red) and with the addition of 23% of empty tiles to the training dataset.Figure A3 reveals that the metrics values remain within a constant range when adding extra images to the batch, in all cases (with or without empty tiles). A potential effect of adding more images to the batch is a reduction of the metrics variability between replicates of trained models, as the range of metrics values is smaller for 8 images per batch than for 2 images per batch. However, this observation has to be interpreted with caution, as fewer replicates were performed with 8 images per batch than with 2 or 4 images per batch. Further investigation would provide stronger insights into this effect.
"},{"location":"PROJ-DQRY-TM/#a3-evaluation-of-trained-models","title":"A.3 Evaluation of trained models","text":"Table A1 sumup metrics value obtained for all the configuration tested for the project.
| zoom level | model | empty tiles (%) | images per batch | optimum iteration | precision | recall | f1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 15 | replicate 1 | 0 | 2 | 1000 | 0.727 | 0.810 | 0.766 |
| 16 | replicate 1 | 0 | 2 | 2000 | 0.842 | 0.793 | 0.817 |
| 16 | replicate 2 | 0 | 2 | 2000 | 0.767 | 0.760 | 0.763 |
| 16 | replicate 3 | 0 | 2 | 3000 | 0.831 | 0.810 | 0.820 |
| 16 | replicate 4 | 0 | 2 | 2000 | 0.886 | 0.769 | 0.826 |
| 16 | replicate 5 | 0 | 2 | 2000 | 0.780 | 0.818 | 0.798 |
| 16 | replicate 6 | 0 | 2 | 3000 | 0.781 | 0.826 | 0.803 |
| 16 | replicate 7 | 0 | 4 | 1000 | 0.748 | 0.860 | 0.800 |
| 16 | replicate 8 | 0 | 4 | 1000 | 0.779 | 0.785 | 0.782 |
| 16 | replicate 9 | 0 | 8 | 1500 | 0.800 | 0.793 | 0.797 |
| 16 | replicate 10 | 0 | 4 | 1000 | 0.796 | 0.744 | 0.769 |
| 16 | replicate 11 | 0 | 8 | 1000 | 0.802 | 0.769 | 0.785 |
| 16 | ET-250_allDS_1 | 34.2 | 2 | 2000 | 0.723 | 0.770 | 0.746 |
| 16 | ET-250_allDS_2 | 34.2 | 2 | 3000 | 0.748 | 0.803 | 0.775 |
| 16 | ET-1000_allDS_1 | 73.8 | 2 | 6000 | 0.782 | 0.815 | 0.798 |
| 16 | ET-1000_allDS_2 | 69.8 | 2 | 6000 | 0.786 | 0.767 | 0.776 |
| 16 | ET-1000_allDS_3 | 70.9 | 2 | 6000 | 0.777 | 0.810 | 0.793 |
| 16 | ET-1000_allDS_4 | 73.8 | 2 | 6000 | 0.768 | 0.807 | 0.787 |
| 16 | ET-2000_allDS_1 | 143.2 | 2 | 6000 | 0.761 | 0.748 | 0.754 |
| 16 | ET-80_trnDS_1 | 5.4 | 2 | 2000 | 0.814 | 0.793 | 0.803 |
| 16 | ET-80_trnDS_2 | 5.4 | 2 | 2000 | 0.835 | 0.752 | 0.791 |
| 16 | ET-80_trnDS_3 | 5.4 | 2 | 2000 | 0.764 | 0.802 | 0.782 |
| 16 | ET-400_trnDS_1 | 29.5 | 2 | 6000 | 0.817 | 0.777 | 0.797 |
| 16 | ET-400_trnDS_2 | 29.5 | 2 | 5000 | 0.848 | 0.785 | 0.815 |
| 16 | ET-400_trnDS_3 | 29.5 | 2 | 4000 | 0.758 | 0.802 | 0.779 |
| 16 | ET-400_trnDS_4 | 29.5 | 4 | 2000 | 0.798 | 0.818 | 0.808 |
| 16 | ET-400_trnDS_5 | 29.5 | 4 | 1000 | 0.825 | 0.777 | 0.800 |
| 16 | ET-1000_trnDS_1 | 0 | 2 | 4000 | 0.758 | 0.802 | 0.779 |
| 17 | replicate 1 | 0 | 2 | 5000 | 0.819 | 0.853 | 0.835 |
| 17 | replicate 1 | 0 | 2 | 5000 | 0.803 | 0.891 | 0.845 |
| 17 | replicate 1 | 0 | 2 | 5000 | 0.872 | 0.813 | 0.841 |
| 17 | ET-250_allDS_1 | 16.8 | 2 | 3000 | 0.801 | 0.794 | 0.797 |
| 17 | ET-1000_allDS_1 | 72.2 | 2 | 7000 | 0.743 | 0.765 | 0.754 |
| 18 | replicate 1 | 0 | 2 | 10000 | 0.864 | 0.855 | 0.859 |
Table A1: Metrics computed on the validation dataset for all the models trained on the 2020 SWISSIMAGE Journey mosaic.
Victor Maus, Stefan Giljum, Jakob Gutschlhofer, Dieison M. Da Silva, Michael Probst, Sidnei L. B. Gass, Sebastian Luckeneder, Mirko Lieber, and Ian McCallum. A global-scale data set of mining areas. Scientific Data, 7(1):289, September 2020. URL: https://www.nature.com/articles/s41597-020-00624-w, doi:10.1038/s41597-020-00624-w.
Vicenç Carabassa, Pau Montero, Marc Crespo, Joan-Cristian Padró, Xavier Pons, Jaume Balagué, Lluís Brotons, and Josep Maria Alcañiz. Unmanned aerial system protocol for quarry restoration and mineral extraction monitoring. Journal of Environmental Management, 270:110717, September 2020. URL: https://linkinghub.elsevier.com/retrieve/pii/S0301479720306496, doi:10.1016/j.jenvman.2020.110717.
Chunsheng Wang, Lili Chang, Lingran Zhao, and Ruiqing Niu. Automatic Identification and Dynamic Monitoring of Open-Pit Mines Based on Improved Mask R-CNN and Transfer Learning. Remote Sensing, 12(21):3474, January 2020. URL: https://www.mdpi.com/2072-4292/12/21/3474, doi:10.3390/rs12213474.
Haoteng Zhao, Yong Ma, Fu Chen, Jianbo Liu, Liyuan Jiang, Wutao Yao, and Jin Yang. Monitoring Quarry Area with Landsat Long Time-Series for Socioeconomic Study. Remote Sensing, 10(4):517, April 2018. URL: https://www.mdpi.com/2072-4292/10/4/517, doi:10.3390/rs10040517.
Valentin Tertius Bickel and Andrea Manconi. Decadal Surface Changes and Displacements in Switzerland. Journal of Geovisualization and Spatial Analysis, 6(2):24, December 2022. URL: https://link.springer.com/10.1007/s41651-022-00119-9, doi:10.1007/s41651-022-00119-9.
George P. Petropoulos, Panagiotis Partsinevelos, and Zinovia Mitraka. Change detection of surface mining activity and reclamation based on a machine learning approach of multi-temporal Landsat TM imagery. Geocarto International, 28(4):323–342, July 2013. URL: http://www.tandfonline.com/doi/abs/10.1080/10106049.2012.706648, doi:10.1080/10106049.2012.706648.
Huriel Reichel and Nils Hamel. Automatic Detection of Quarries and the Lithology below them in Switzerland. Swiss Territorial Data Lab, 2022.
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. 2019. URL: https://github.com/facebookresearch/detectron2.
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. January 2018. arXiv:1703.06870 [cs]. URL: http://arxiv.org/abs/1703.06870, doi:10.48550/arXiv.1703.06870.
Nils Hamel (UNIGE) - Huriel Reichel (swisstopo)
Project scheduled in the STDL research roadmap - PROJ-DTRK September 2020 to November 2020 - Published on April 23, 2021
Abstract: Being able to track modifications in the evolution of geographical datasets is one important aspect of territory management, as a large amount of information can be extracted out of difference models. Difference detection can also be used as a tool to assess the evolution of a geographical model through time. In this research project, we apply difference detection to INTERLIS models of the official Swiss land registers in order to emphasize and follow their evolution, and to demonstrate that changes in reference frames can be detected and assessed.
"},{"location":"PROJ-DTRK/#introduction","title":"Introduction","text":"Land register models are probably to most living of the geographical models as they are constantly updated to offer a rigorous and up-to-date view of the territory.
The applied corrections are always the result of a complex process, involving different territory actors, until the decision is made to integrate them into the land register. In addition, land register models come with an additional constraint linked to political decisions. Indeed, the land register models are the result of a political mission conducted under federal laws, making these models of high importance and requiring constant care. We show in this research project how the difference detection tool [1] of the STDL 4D framework can be used to emphasize and analyze these corrections along the time dimension.
In addition to the constant updates of the models, changes in the reference frame can also lead to large-scale corrections of the land register models. These global corrections are then made even more complex by the federal laws that impose a high degree of correctness and accuracy.
In the context of the introduction of the new reference frame DM.flex [2] for the Swiss land register, being able to assess the changes applied to the geographical model appears as an important aspect. Indeed, changing the reference frame of the land register models is a long and complex technical process that can be error-prone. We also show in this research project how the difference detection algorithm can help assess and verify the performed corrections.
"},{"location":"PROJ-DTRK/#research-project-specifications","title":"Research Project Specifications","text":"In this research project, the difference detection algorithm implemented in the STDL 4D framework is applied on INTERLIS data containing the official land register models of different Swiss Canton. As introduced, two main directions are considered for the difference detection algorithm :
Demonstrating the ability to extract information in between land register models
Demonstrating the ability of difference models to be used as an assessment tool
Through the first direction, the difference detection algorithm is presented. Considering the difference models it allows computing, it is shown how such models are able to extract information in between the compared models, in order to emphasize the ability to represent, and then to verify, the evolution of the land register models.
The second direction focuses on demonstrating that difference models are a helpful representation of the large-scale corrections that can be applied to the land register during reference frame modifications, and how they can be used as a tool to assess the modifications and to help fulfil the complex task of verifying the corrected models.
"},{"location":"PROJ-DTRK/#research-project-data","title":"Research Project Data","text":"For the first research direction, the land register models of the Thurgau Kanton are considered. They are selected in order to have a small temporal distance allowing to focus on a small amount of well-defined differences :
Thurgau Kanton, 2020-10-13, INTERLIS
Thurgau Kanton, 2020-10-17, INTERLIS
For the second direction, which focuses on more complex differences, the models of the Canton of Geneva land register are considered, with a much larger temporal gap between them:
Canton of Geneva, 2009-10, INTERLIS
Canton of Geneva, 2013-04, INTERLIS
Canton of Geneva, 2017-04, INTERLIS
Canton of Geneva, 2019-04, INTERLIS
This first section focuses on short-term differences to show how difference models work and how they are able to represent the modifications extracted out of the two compared models. The following images give an illustration of the considered dataset, i.e. the land register models of the Thurgau Kanton:
Illustration of the Thurgau Kanton INTERLIS models - Data: Kanton ThurgauThe models are made of vector lines, well geo-referenced in the Swiss coordinate frame EPSG:2056. The models are also made of different layers that are colored differently with the following correspondences:
INTERLIS selected topics and tables colors - Official French and German designationsThese legends are used throughout this research project.
Considering two temporal versions of this geographical model, separated by a few days, one is able to extract difference models using the 4D framework algorithm. As an example, one can consider this very specific view of the land register, focusing on a few houses:
Close view of the Thurgau INTERLIS model in 2020-10-13 (left) and 2020-10-17 (right) - Data: Kanton ThurgauIt is clear that most of the close view is identical in the two models, except for a couple of houses that were added to the land register model between these two temporal versions. By applying the difference detection algorithm, one is able to obtain a difference model comparing the two previous models. The following image gives an illustration of the obtained difference model, considering the most recent temporal version as the reference:
Difference model obtained by comparing the two temporal versions - Data: Kanton ThurgauOne can see how the difference algorithm is able to emphasize the differences and to represent them in a human-readable third model. The algorithm also displays the identical parts in dark gray to offer the context of the differences to the operator.
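As a rough illustration of the underlying idea only - not the actual algorithm of the 4D framework [1] - a geometry-level difference between two vector snapshots can be sketched as follows, assuming hypothetical file names and exact geometry matching:

```python
import geopandas as gpd

# Hypothetical INTERLIS extracts of the same area at two dates,
# converted beforehand to a vector format readable by geopandas.
old = gpd.read_file("thurgau_2020-10-13.gpkg")
new = gpd.read_file("thurgau_2020-10-17.gpkg")

# Index features by an exact geometry key: identical elements appear
# in both versions, the rest constitutes the differences.
old_keys = set(old.geometry.apply(lambda g: g.wkb))
new_keys = set(new.geometry.apply(lambda g: g.wkb))

unchanged = new[new.geometry.apply(lambda g: g.wkb).isin(old_keys & new_keys)]
added = new[~new.geometry.apply(lambda g: g.wkb).isin(old_keys)]
removed = old[~old.geometry.apply(lambda g: g.wkb).isin(new_keys)]
# A difference model then renders `unchanged` in dark gray and
# highlights `added` and `removed` elements.
```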
Of course, at such a close view, difference detection can appear irrelevant, as one is clearly able to see that something changed in the selected example without any help. But difference models can be computed at any scale. Consider, for example, the city of Amriswil:
View of the Amriswil model in 2020-10-13 (left) and 2020-10-17 (right) - Data: Kanton ThurgauIt becomes more complicated to track down the differences that can appear between the two temporal versions. By computing their difference model, one is able to access a third model that eases the analysis of the evolution at the scale of the city itself, as illustrated in the following image:
Difference model computed for the city of Amriswil - Data: Kanton ThurgauOne can see how difference models can be used to track down modifications brought to the land register in a simple manner, while keeping the information on the unchanged elements between the two compared models. This demonstrates that the information that exists between models can be extracted and represented for further users or automated processes. In addition, such difference models can be computed at any scale, from small areas up to whole countries.
"},{"location":"PROJ-DTRK/#difference-models-an-assessment-tool","title":"Difference Models : An Assessment Tool","text":"On the previous section, the difference models are computed using two models only separated of a few days, containing only a small amount of clear and simple modifications. This section focuses on detecting differences on larger models, separated by several years. In this case, the land register of the Canton of Geneva is considered :
Illustration of the Geneva land register in 2017-04 (left) and 2019-04 (right) - Data: Canton of GenevaOne can see that at such a scale, even taking into account that the Canton of Geneva is one of the smallest in Switzerland, getting a clear vision and understanding of the modifications made between these two models is difficult when considering the two models separately.
This is precisely where difference models can be useful to understand and analyze the evolution of the land register, along both the space and time dimensions.
"},{"location":"PROJ-DTRK/#large-scale-analysis","title":"Large-Scale Analysis","text":"A first large-scale evaluation can be made on the overall models. A difference model can be computed considering the land register of Geneva in 2019 and 2017 as illustrated on the following image :
Difference model of the Geneva land register between 2019-04 and 2017-04 - Data: Canton of GenevaTwo observations can already be made by looking at the difference model. In the first place, one can see that the amount of modifications brought to the land register in only two years is large. A large portion of the land register was subject to modifications or corrections, the unchanged parts being mostly limited to areas outside the populated zones.
In the second place, one can observe large portions where differences seem to accumulate over this period of time. Looking at them more closely leads to the conclusion that these zones were actually completely modified, as all their elements are highlighted by the difference detection algorithm. The following image gives a closer view of such an area of difference accumulation:
Focus on the Carouge area of the 2019-04 and 2017-04 difference model - Data: Canton of GenevaAlthough the amount of modifications outside this specific zone is also high, it is clear that the pointed zone contains more of them. Looking at it more closely leads to the conclusion that everything changed.
In order to understand these areas of difference accumulation, the land register experts of the Canton of Geneva (SITG) were questioned. They provided an explanation for these specific areas. Between 2017 and 2019, these areas were subjected to a global correction in order to release the tension between the old reference frame LV03 [3] and the current one LV95 [4]. These corrections were made using the FINELTRA algorithm to shift the elements of the land register by the order of a few centimeters.
The land register of Geneva provided the following illustration summarizing these reference frame corrections made between 2017 and 2019 on the Geneva territory:
Reference frame corrections performed between 2017 and 2019 - Data: SITGComparing this map from the land register with the computed model shows how difference detection can efficiently emphasize this type of correction, as the corrected zones in the previous image correspond to the difference accumulation areas in the computed difference model.
"},{"location":"PROJ-DTRK/#small-scale-analysis","title":"Small-Scale Analysis","text":"One can also dive deep into the details of the difference models. As we saw on the large scale analysis, two types of areas can be seen on the 2019-04-2017-04 difference model of Geneva : regular evolution with an accumulation of corrections and areas on which global corrections were applied. The following images propose a close view of these two types of situation :
Illustration of the two observed types of evolution of the land register - Data: Canton of GenevaIn the left image above, one can observe the regular evolution of the land register, where modifications are brought to the model in order to follow the evolution of the territory. In the right image above, one can see a close view of an area subjected to a global correction (reference frame), leading to a difference model highlighting all the elements.
Analyzing the right image above more closely leads the observer to conclude that not all the elements are actually highlighted by the difference detection algorithm. Indeed, some elements are rendered in gray in the difference model, indicating a lack of modification between the two compared times. The following image emphasizes the unchanged elements that can be observed:
Unchanged elements in the land register after the reference frame correction - Data: SITGThese unchanged elements can be surprising, as they are found in an area that was subject to a global reference frame correction. This shows how difference models can be helpful to track down this type of event, in order to check whether these unchanged elements are expected or are the result of a discrepancy in the land register evolution.
Other examples can be found in this very same area of the city of Geneva. The following images give an illustration of two other close views where unchanged elements can be seen despite the reference frame correction:
Unchanged elements in the land register after the reference frame correction - Data: SITGIn the left image above, one can observe that the unchanged elements are the railway tracks within the commune of Carouge. This is an interesting observation, as railway tracks can be considered as specific elements subject to different legislation regarding the land register. It is clear that the railway tracks were not considered in the reference frame correction.
In the right image above, one can see another example of unchanged elements that are more complicated to explain, as they are in the middle of other modified elements. This clearly demonstrates how difference models can be helpful for analyzing and assessing the evolution of the land register models. Such models are able to guide users or automated processes, leading them to focus on relevant aspects and to ask the right questions when analyzing the evolution of the land register.
"},{"location":"PROJ-DTRK/#conclusion","title":"Conclusion","text":"The presented difference models computed based on two temporal versions of the land register and using the 4D framework algorithm showed how differences can be emphasized for users and automated processes [1]. Difference models can be helpful to determine the amount and nature of changes that appear in the land register. Applying such an algorithm on land register is especially relevant as it is a highly living model, that evolves jointly with the territory it describes.
Two main applications can be considered for difference models applied to the land register. In the first place, difference models can be used to assess and analyze the regular evolution of the territory. Indeed, updating the land register is not a simple task. Such modifications involve a whole chain of decisions and verifications, from surveyors to the highest land register authority, before being integrated into the model. Being able to assess and analyze the modifications in the land register through difference models could be an interesting strengthening of the overall process.
The second application of difference models could be as an assessment tool for global corrections applied to the land register or parts of it. These modifications are often linked to the reference frame and its evolution. Being able to assess the corrections through difference models could add a helpful tool to verify that the elements of the land register were correctly processed. In this direction, difference models could be used during the introduction of the DM.flex reference frame, both for analyzing its introduction and for demonstrating that difference models offer an interesting point of view.
"},{"location":"PROJ-DTRK/#reproduction-resources","title":"Reproduction Resources","text":"To reproduce the presented experiments, the STDL 4D framework has to be used and can be found here :
You can follow the instructions on the README to both compile and use the framework.
Unfortunately, the data used are not currently public. In both cases, the land register INTERLIS datasets were provided to the STDL directly. You can contact both the Thurgau Kanton and the SITG:
INTERLIS land register, Thurgau Kanton
INTERLIS land register, SITG (Geneva)
to query the data.
In order to extract and convert the data from the INTERLIS models, the following code is used:
where the README gives all the information needed.
For the 3D geographical coordinate conversion and height restoration, we used two STDL internal tools. You can contact the STDL to obtain the tools and support in this direction:
ptolemee-suite: 3D coordinate conversion tool (EPSG:2056 to WGS84)
height-from-geotiff: restoring geographical heights using topographic GeoTIFF (SRTM)
You can contact STDL for any question regarding the reproduction of the presented results.
"},{"location":"PROJ-DTRK/#references","title":"References","text":"[1] Automatic Detection of Changes in the Environment, N. Hamel, STDL 2020
[2] DM.flex reference frame
[3] LV03 Reference frame
[4] LV95 Reference frame
"},{"location":"PROJ-GEPOOL/","title":"Swimming Pool Detection from Aerial Images over the Canton of Geneva","text":"Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW)
Proposed by the Canton of Geneva - PROJ-GEPOOL September 2020 to January 2021 - Published on May 18, 2021
Abstract: Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to detect potentially unregistered swimming pools over the Canton of Geneva. Despite the success of our approach, we will argue that domain expertise still remains key to post-process detections, in order to tell objects which are subject to registration from those which aren't. Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date.
"},{"location":"PROJ-GEPOOL/#introduction","title":"Introduction","text":"The Canton of Geneva manages a register of swimming pools, counting - in principle - all and only those swimming pools that are in-ground or, at least, permanently fixed to the ground. The swimming pool register is part of a far more general cadastre, including several other classes of objects (cf. this page).
Typically, the swimming pool register is updated either by taking building/demolition permits into account, or by manually checking its multiple records (4000+ to date) against aerial images, which is quite a long and tedious task. Exploring the opportunity of leveraging Machine Learning to help domain experts in such otherwise tedious tasks was one of the main motivations behind this study. As such, no prior requirements/expectations were set by the recipients.
The study was autonomously conducted by the STDL team, using Open Source software and Open Data published by the Canton of Geneva. Domain experts were asked for feedback only at a later stage. In the following, details are provided regarding the various steps we followed. We refer the reader to this page for a thorough description of the generic STDL Object Detection Framework.
"},{"location":"PROJ-GEPOOL/#method","title":"Method","text":"Several steps are required to set the stage for object detection and eventually reach the goal of obtaining - ideally - even more than decent results. Despite the linear presentation that the reader will find here-below, multiple back-and-forths are actually required, especially through steps 2-4.
"},{"location":"PROJ-GEPOOL/#1-data-preparation","title":"1. Data preparation","text":"As a very first step, one has to define the geographical region over which the study has to be conducted, the so-called \"Area of Interest\" (AoI). In the case of this specific application, the AoI was chosen and obtained as the geometric subtraction between the following two polygons:
The so-defined AoI covers both the known \"ground-truth\" labels and regions over which hypothetical unknown objects are expected to be detected.
The second step consists in downloading aerial images from a remote server, following an established tiling strategy. We adopted the so-called "Slippy Map" tiling scheme. Aerial images were fetched from a raster web service hosted by the SITG and powered by ESRI ArcGIS Server. More precisely, the following dataset was used: ORTHOPHOTOS AGGLO 2018. According to our configuration, this second step produces a folder including one GeoTIFF image per tile, each image having a size of 256x256 pixels. In terms of resolution - or better, in terms of "Ground Sampling Distance" (GSD) - the combination of the chosen zoom level (18) and the 256x256 pixel tile size
yields a GSD of approximately 60 cm/pixel. The tests we performed at twice this resolution showed little gain in terms of predictive power, surely not enough to justify engaging 4x more resources (storage, CPU/GPU, ...).
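For reference, the GSD of the Slippy Map scheme follows from the Web Mercator ground resolution formula; the ~60 cm/pixel figure corresponds to the nominal value at the equator for zoom level 18, a minimal sketch being:

```python
from math import cos, pi, radians

# Ground resolution of the "Slippy Map" (Web Mercator) tiling scheme:
# earth circumference / (tile size in pixels * number of tiles at a zoom level).
EARTH_CIRCUMFERENCE = 2 * pi * 6378137  # meters, WGS84 equatorial radius

def slippy_map_gsd(zoom: int, latitude_deg: float = 0.0, tile_px: int = 256) -> float:
    """Meters per pixel at a given zoom level and latitude."""
    return EARTH_CIRCUMFERENCE * cos(radians(latitude_deg)) / (tile_px * 2**zoom)

print(slippy_map_gsd(18))        # ~0.60 m/px at the equator
print(slippy_map_gsd(18, 46.2))  # ~0.41 m/px at Geneva's latitude
```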
The third step amounts to splitting the tiles covering the AoI (let's label them \"AoI tiles\") twice:
first, tiles are partitioned into two subsets, according to whether they include (GT tiles) or not (oth tiles) ground-truth labels:
\\(\\mbox{AoI tiles} = (\\mbox{GT tiles}) \\cup (\\mbox{oth tiles}),\\; \\mbox{with}\\; (\\mbox{GT tiles}) \\cap (\\mbox{oth tiles}) = \\emptyset\\)
Then, ground-truth tiles are partitioned into three other subsets, namely the training (trn), validation (val) and test (tst) datasets:
\\(\\mbox{GT tiles} = (\\mbox{trn tiles}) \\cup (\\mbox{val tiles}) \\cup (\\mbox{tst tiles})\\)
with \\(A \\neq B \\Rightarrow A \\cap B = \\emptyset, \\quad \\forall A, B \\in \\{\\mbox{trn tiles}, \\mbox{val tiles}, \\mbox{tst tiles}, \\mbox{oth tiles}\\}\\)
We opted for the 70%-15%-15% dataset splitting strategy.
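A minimal sketch of such a split (a hypothetical helper, not the framework's actual code; in practice the tiles are identified by their Slippy Map (x, y, z) coordinates):

```python
import random

def split_gt_tiles(gt_tiles, seed=42):
    """Partition ground-truth tiles into trn/val/tst (70%-15%-15%)."""
    tiles = list(gt_tiles)
    random.Random(seed).shuffle(tiles)  # reproducible shuffle
    n = len(tiles)
    n_trn, n_val = int(0.7 * n), int(0.15 * n)
    return {
        "trn": tiles[:n_trn],
        "val": tiles[n_trn:n_trn + n_val],
        "tst": tiles[n_trn + n_val:],
    }
```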
Slippy Map Tiles at zoom level 18 covering the Area of Interest, partitioned into several subsets: ground-truth (GT = trn + val + tst), other (oth).
Zoom over a portion of the previous image.
Concerning the ground-truth labels, the final results of this study rely on a curated subset of the public dataset including polygons corresponding to the Canton of Geneva's registered swimming pools, cf. PISCINES. Indeed, some "warming-up" iterations of this whole process allowed us to semi-automatically identify tiles where the swimming pool register was inconsistent with aerial images, and vice versa. By manually inspecting the tiles displaying inconsistencies, we discarded those tiles for which the swimming pool register seemed to be wrong (at least through the eyes of a Data Scientist; in a further iteration, this data curation step should be performed together with domain experts). While not having the ambition to return a "100% ground-truth" training dataset, this data curation step yielded a substantial gain in terms of \(F_1\) score (from ~82% to ~90%, to be more precise).
"},{"location":"PROJ-GEPOOL/#2-model-training","title":"2. Model training","text":"A predictive model was trained, stemming from one of the pre-trained models provided by Detectron2. In particular, the \"R50-FPN\" baseline was used (cf. this page), which implements a Mask R-CNN architecture leveraging a ResNet-50 backbone along with a Feature Pyramid Network (FPN). We refer the reader e.g. to this blog article for further information about this kind of Deep Learning methods.
Training a (Deep) Neural Network model means running an algorithm which iteratively adjusts the various parameters of a Neural Network (40+ million parameters in our case), in order to minimize the value of some "loss function". In addition to the model parameters (also called "weights"), multiple "hyper-parameters" exist, affecting the model and the way the optimization is performed. In theory, one should automate the hyper-parameter tuning, in order to eventually single out the best setting among all the possible ones. In practice, the hyper-parameter space is never fully explored; at a minimum, a systematic search should be performed, in order to find a "sweet spot" among a finite, discrete collection of settings. In our case, no systematic hyper-parameter tuning was actually performed. Instead, a few man-hours were spent manually tuning the hyper-parameters, until a setting was found which the STDL team judged to be reasonably good (~90% \(F_1\) score on the test dataset, see details here-below). The optimal number of iterations was chosen so as to approximately minimize the loss on the validation dataset.
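As an illustration, setting up such a model with Detectron2 looks roughly as follows; this is a minimal sketch with placeholder values, not the exact configuration used in this study:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Load the "R50-FPN" Mask R-CNN baseline and its pre-trained weights.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # a single "swimming pool" class
cfg.SOLVER.BASE_LR = 0.00025         # example learning rate (placeholder)
cfg.SOLVER.MAX_ITER = 3000           # upper bound; early stopping applies
cfg.TEST.EVAL_PERIOD = 200           # evaluate on the validation set every N iterations
```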
"},{"location":"PROJ-GEPOOL/#3-prediction","title":"3. Prediction","text":"Each image resulting from the tiling of the AoI constitutes - let's say - the \"basic unit of computation\" of this analysis. Thus, the model optimized at the previous step was used to make predictions over:
oth images, meaning images covering no already known swimming pools;
trn, val and tst images, meaning images covering already known swimming pools.
The combination of predictions 1 and 2 covers the entire AoI and allows us to discover potential new objects as well as to check whether some of the known objects are outdated, respectively.
Image by image, the model produces one segmentation mask per detected object, accompanied by a score ranging from a custom minimum value (5% in our setting) to 100%. The higher the score, the more confident the model is about a given prediction.
Sample detections of swimming pools, accompanied by scores. Note that multiple detections can concern the same object, if the latter extends over multiple tiles.
Let us note that not only swimming pools exhibiting only \"obvious\" features (bluish color, rectangular shape, ...) were detected, but also:
As a matter of fact, the training dataset was rich enough to also include samples of such somewhat tricky cases.
"},{"location":"PROJ-GEPOOL/#4-prediction-assessment","title":"4. Prediction assessment","text":"As described here in more detail, in order to assess the reliability of the predictive model predictions have to be post-processed so as to switch from the image coordinates - ranging from (0, 0) to (255, 255) in our case, where 256x256 pixel images were used - to geographical coordinates. This amounts to applying an affine transformation to the various predictions, yielding a vector layer which we can compare with ground-truth (GT
) data by means of spatial joins:
detections intersecting GT data are referred to as "true positives" (TPs);
detections not intersecting any GT data are referred to as "false positives" (FPs);
GT objects which are not detected are referred to as "false negatives" (FNs).
Example of a true positive (TP), a false positive (FP) and a false negative (FN). Note that both the TP and the FP object are detected twice, as they extend over multiple tiles.
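As a rough sketch of this spatial-join-based counting (hypothetical file names and a simplified matching rule; not the framework's actual implementation), using geopandas:

```python
import geopandas as gpd

# Hypothetical GeoDataFrames of predicted and ground-truth polygons, same CRS.
preds = gpd.read_file("predictions.geojson")
gt = gpd.read_file("ground_truth.geojson")

hits = gpd.sjoin(preds, gt, how="left", predicate="intersects")
# One boolean per prediction: does it intersect at least one GT object?
is_tp = hits["index_right"].notna().groupby(hits.index).any()

TP = int(is_tp.sum())                # predictions intersecting GT
FP = int((~is_tp).sum())             # predictions with no GT match
detected_gt = set(hits["index_right"].dropna())
FN = len(gt) - len(detected_gt)      # GT objects never detected

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
```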
The counting of TPs, FPs and FNs allows us to compute some standard metrics such as precision, recall and \(F_1\) score (cf. this Wikipedia page for further information). Actually, one count (hence one set of metrics) can be produced per choice of the minimum score that one is willing to accept. Choosing a threshold value (= thr) means keeping all the predictions having a score >= thr and discarding the rest. Intuitively, the lower the threshold, the more detections are kept, increasing recall at the expense of precision; the higher the threshold, the fewer the FPs but the more the FNs.
Such intuitions can be confirmed by the following diagrams, which we obtained by sampling the values of thr by steps of 0.05 (= 5%), from 0.05 to 0.95.
True positives (TPs), false negatives (FNs), and false positives (FPs) counted over the test dataset, as a function of the threshold on the score: for a given threshold, all and only the predictions exhibiting a bigger score are kept.
Performance metrics computed over the test dataset as a function of the threshold on the score: for a given threshold, all and only the predictions exhibiting a bigger score are kept.
The latter figure was obtained by evaluating the predictions of our best model over the test dataset. Inferior models exhibited a similar behavior, with a downward offset in terms of \\(F_1\\) score. In practice, upon iterating over multiple realizations (with different hyper-parameters, training data and so on) we aimed at maximizing the value of the \\(F_1\\) score on the validation dataset, and stopped when the \\(F_1\\) score went over the value of 90%.
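This threshold sweep is straightforward to reproduce; a minimal sketch (hypothetical helper and argument names, assuming the TP/FP matching has already been done as described above) could be:

```python
import numpy as np

def sweep_thresholds(scores, is_tp, n_gt, thresholds=np.arange(0.05, 1.0, 0.05)):
    """Precision/recall/F1 as a function of the minimum accepted score.

    scores: array of prediction scores; is_tp: boolean array flagging
    predictions matching a GT object; n_gt: total number of GT objects.
    Simplification: assumes one kept TP per matched GT object.
    """
    for thr in thresholds:
        keep = scores >= thr
        tp = int(np.sum(is_tp & keep))
        fp = int(np.sum(~is_tp & keep))
        fn = n_gt - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        yield thr, p, r, f1
```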
As the ground-truth data we used turned out not to be 100% accurate, the responsibility for mismatching predictions has to be shared between ground-truth data and the predictive model, at least in some cases. In a more ideal setting, ground-truth data would be 100% accurate and differences between a given metric (precision, recall, \\(F_1\\) score) and 100% should be imputed to the model.
"},{"location":"PROJ-GEPOOL/#domain-experts-feedback","title":"Domain experts feedback","text":"All the predictions having a score \\(\\geq\\) 5% obtained by our best model were exported to Shapefile and shared with the experts in charge of the cadastre of the Canton of Geneva, who carried out a thorough evaluation. By checking predictions against the swimming pool register as well as aerial images, it was empirically found that the threshold on the minimum score (= thr
) should be set as high as 97%, in order not to have too many false positives to deal with. In spite of such a high threshold, 562 potentially new objects were detected (over 4652 objects which were known when this study started), of which:
These figures show that:
Examples of \"actual false positives\": a fountain (left) and a tunnel (right).
Examples of detected swimming pools which are not subject to registration: placed on top of a building (left), inflatable hence temporary (right).
"},{"location":"PROJ-GEPOOL/#conclusion","title":"Conclusion","text":"The analysis reported in this document confirms the opportunity of using state-of-the-art Deep Learning approaches to assist experts in some of their tasks, in this case that of keeping the cadastre up to date. Not only the opportunity was explored and actually confirmed, but valuable results were also produced, leading to the detection of previously unknown objects. At the same time, our study also shows how essential domain expertise still remains, despite the usage of such advanced methods.
As a concluding remark, let us note that our predictive model may be further improved. In particular, it may be rendered less prone to false positives, for instance by:
Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo)
Proposed by the Canton of Jura - PROJ-HETRES October 2022 to August 2023 - Published on November 13, 2023
All scripts are available on GitHub.
Abstract: Beech trees are sensitive to drought and repeated episodes can cause dieback. This issue affects the Jura forests and requires the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors has an overall accuracy of up to 0.79, as does the model trained only with descriptors derived from airborne imagery. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors.
"},{"location":"PROJ-HETRES/#1-introduction","title":"1. Introduction","text":"Since the drought episode of 2018, the canton of Jura and other cantons have noticed dieback of the beech trees in their forests 1. In the canton of Jura, this problem mainly concerns the Ajoie region, where 1000 hectares of deciduous trees are affected 2. This is of concern for the productivity and management of the forest, as well as for the security of walkers. In this context, the R\u00e9publique et Canton du Jura has contacted the Swiss Territorial Data Lab to develop a new monitoring solution based on data science, airborne images and LiDAR point clouds. The dieback symptoms are observable in the mortality of branches, the transparency of the tree crown and the leaf mass partition 3.
The vegetation health state influences the reflectance in images (airborne and satellite), which is often used as a monitoring tool, in particular in the form of vegetation indices:
For instance, Brun et al. studied early-wilting in Central European forests with time series of the Normalized Difference Vegetation Index (NDVI) and estimated the surface concerned by early leaf-shedding 4.
Another technology used to monitor forests is light detection and ranging (LiDAR) as it penetrates the canopy and gives 3D information on trees and forest structures. Several forest and tree descriptors such as the canopy cover 5 or the standard deviation of crown return intensity 6 can be derived from the LiDAR point cloud to monitor vegetation health state.
In 5, the study was conducted at the tree level, whereas in 6 the stand level was studied. To work at the tree level, it is necessary to segment individual trees in the LiDAR point cloud. In complex forests, for example with a dense understory near the tree stems, it is challenging to get correct segments without manual corrections.
The aim of this project is to provide foresters with a map to help plan the felling of beech trees in the Ajoie's forests. To do so, we developed a combined method using LiDAR point clouds and airborne and satellite multispectral images to determine the health state of beech trees.
"},{"location":"PROJ-HETRES/#2-study-area","title":"2. Study area","text":"The study was conducted in two areas of interest in the Ajoie region (Fig. 1.A); one near Mi\u00e9court (Fig. 1.B), the other one near Beurnev\u00e9sin (Fig. 1.C). Altogether they cover 5 km2, 1.4 % of the Canton of Jura's forests 7.
The Mi\u00e9court sub-area is southwest and south oriented, whereas the Beurnev\u00e9sin sub-area is rather southeast and south oriented. They are in the same altitude range (600-700 m) and 2 km away from each other, thus near the same weather station.
Figure 1: The study area is composed of two areas of interest."},{"location":"PROJ-HETRES/#3-data","title":"3. Data","text":"The project makes use of different data types: LiDAR point cloud, airborne and satellite imagery, and ground truth data. Table 1 gives an overview of the data and their characteristics. Data have been acquired in late summer 2022 to have an actual and temporally correlated information on the health state of beech trees.
Table 1: Overview of the data used in the project.
| Data | Resolution | Acquisition time | Owner |
| --- | --- | --- | --- |
| LiDAR | 50-100 pts/m2 | 08.2022 | R\u00e9publique et Canton du Jura |
| Airborne images | 0.03 m | 08.2022 | R\u00e9publique et Canton du Jura |
| Yearly variation of NDVI | 10 m | 06.2015-08.2022 | Bern University of Applied Science (HAFL) and the Federal Office for Environment (BAFU) |
| Weekly vegetation health index | 10 m | 06.2015-08.2022 | ExoLabs |
| Ground truth | - (point data) | 08.-10.2022 | R\u00e9publique et Canton du Jura |
"},{"location":"PROJ-HETRES/#31-lidar-point-cloud","title":"3.1 LiDAR point cloud","text":"The LiDAR dataset was acquired on the 16th of August 2022 and its point density is 50-100 pts/m\u00b2. It is classified into the following classes: ground, low vegetation (2-10 m), middle vegetation (10-20 m) and high vegetation (20 m and above). It was delivered in the LAS format, with reflectance values 8 stored in the intensity field.
"},{"location":"PROJ-HETRES/#32-airborne-images","title":"3.2 Airborne images","text":"The airborne images have a ground resolution of 3 cm and were acquired simultaneously to the LiDAR dataset. The camera captured the RGB bands, as well as the near infrared (NIR) one. The acquisition of images with a lot of overlap and oblique views allowed the production of a true orthoimage for a perfect match with the LiDAR point cloud and the data of the ground truth.
"},{"location":"PROJ-HETRES/#33-satellite-images","title":"3.3 Satellite images","text":"The Sentinel-2 mission from the European Space Agency is passing every 6 days over Switzerland and allows free temporal monitoring at a 10 m resolution. The archives are available back to the beginning of beech tree dieback in 2018.
"},{"location":"PROJ-HETRES/#331-yearly-variation-of-ndvi","title":"3.3.1 Yearly variation of NDVI","text":"The Bern University of Applied Science (HAFL) and the Federal Office for Environment (BAFU) have developed Web Services for vegetation monitoring derived from Sentinel-2 images. For this project, the yearly variation of NDVI 9 between two successive years is used. It measures the decrease in vegetation activity between August of one year (e.g. 2018) and June of the following year (e.g. 2019). The decrease is derived from rasters made of maximum values of the NDVI in June, July or August. The data are downloaded from the WCS service which delivers \"row\" indices: the NDVI values are not cut for a minimal threshold.
"},{"location":"PROJ-HETRES/#332-vhi","title":"3.3.2 VHI","text":"The Vegetation Health Index (VHI) was generated by ETHZ, WSL and ExoLab within the SILVA project 10 which proposes several indices for forest monitoring. VHI from 2016 to 2022 is used. It is computed mainly out of Sentinel-2 images, but also out of images from other satellite missions, in order to have data to obtain a weekly index with no time gap.
"},{"location":"PROJ-HETRES/#34-ground-truth","title":"3.4 Ground truth","text":"The ground truth was collected between August and October 2022 by foresters. They assessed the health of the beech trees based on four criteria 3:
In addition, each tree was associated with its coordinates and pictures, as illustrated in Figure 1 and Figure 2 respectively. The foresters surveyed 75 healthy, 77 unhealthy and 56 dead trees.
Tree locations were first identified in the field with a GPS-enabled tablet on which the 2022 SWISSIMAGE mosaic was displayed. Afterwards, the tree locations were precisely adjusted to the trunk locations by visually locating the corresponding stems in the LiDAR point cloud, with the help of the pictures taken in the field. The location and health status of a further 18 beech trees were added in July 2023. These 226 beeches - among which 76 healthy, 77 unhealthy and 73 dead trees - surveyed at the two dates are defined as the ground truth for this project.
Figure 2: Examples of the three health states: left, a healthy tree with a dense green tree crown; center, an unhealthy tree with dead twigs and scarce foliage; right, a dead tree, completely dry."},{"location":"PROJ-HETRES/#4-method","title":"4. Method","text":"The method developed is based on the processing of LiDAR point clouds and of airborne images. Ready-made vegetation indices derived from satellite imagery were also used. First, a segmentation of the trees in the LiDAR point cloud was carried out using the Digital-Forestry-Toolbox (DFT) 11. Then, descriptors for the health state of the beech trees were derived from each dataset. Boxplots and the corresponding t-tests were computed to evaluate the ability of the descriptors to differentiate the three health states; a t-test p-value below 0.01 indicates a significant difference between the means of two classes, as sketched below. Finally, the descriptors were used jointly with the ground truth to train a random forest (RF) algorithm, before inferring over the study area.
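A minimal sketch of such a test (illustrative helper; descriptor values are those extracted for the GT trees), using scipy:

```python
from scipy.stats import ttest_ind

def significant_descriptor(values_class_a, values_class_b, alpha=0.01):
    """Welch's t-test between two health classes for one descriptor.

    A p-value below `alpha` suggests the descriptor separates the classes.
    """
    _, p_value = ttest_ind(values_class_a, values_class_b, equal_var=False)
    return p_value < alpha
```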
Figure 3: Overview of the methodology, which processes the data into health descriptors for beech trees, before training and evaluating a random forest."},{"location":"PROJ-HETRES/#41-lidar-processing","title":"4.1 LiDAR processing","text":"At the beginning of the LiDAR processing, an exploration of the data motivated the segmentation and the computation of descriptors.
"},{"location":"PROJ-HETRES/#412-data-exploration","title":"4.1.2 Data exploration","text":"In order to get an understanding of the available information at the tree level, we manually segmented three healthy, five unhealthy and three dead trees. More unhealthy trees have been segmented to better represent dieback symptoms. Vertical slices of each tree were rotary extracted, providing visual information on the health state.
"},{"location":"PROJ-HETRES/#413-segmentation","title":"4.1.3 Segmentation","text":"To be able to describe the health state of each tree, segmentation of the forest was performed using the DFT. Parameters have been tuned to find an appropriate segmentation. Two strategies for peak isolation were tested on the canopy height model (CHM):
Each peak isolation method was tested on a range of parameters and on different cell resolutions for the CHM computation. The detailed plan of the simulation is given in Appendix 1. The minimum tree height was set to 10 m. For computation time reasons, only 3 LiDAR tiles with 55 ground truth (GT) trees located on them were processed.
To find the best segmentation, the locations of the GT trees were compared to the locations of the segment peaks. GT trees with a segmented peak less than 4 m away were considered as True Positives (TP). The best segmentation was the one with the most TP.
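A minimal sketch of this matching rule (hypothetical helper and inputs), using a k-d tree:

```python
import numpy as np
from scipy.spatial import cKDTree

def count_matched_gt(gt_xy: np.ndarray, peak_xy: np.ndarray, max_dist: float = 4.0) -> int:
    """Count GT trees with a segment peak closer than `max_dist` meters.

    gt_xy and peak_xy are (n, 2) arrays of planar coordinates.
    """
    tree = cKDTree(peak_xy)
    dist, _ = tree.query(gt_xy, distance_upper_bound=max_dist)
    return int(np.sum(np.isfinite(dist)))  # unmatched queries return inf
```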
"},{"location":"PROJ-HETRES/#414-structural-descriptors","title":"4.1.4 Structural descriptors","text":"An alternative to the segmentation is to change of paradigm and perform the analyses at the stand level. Meng et al. 6 derived structural descriptors for acacia dieback at the stand level based on LiDAR point cloud. By adapting their method to the present case, the following descriptors were derived from the LiDAR point cloud using the LidR library from R 12:
Descriptors 1 to 6 are directly adopted from Meng et al. All the descriptors were first computed for three grid resolutions: 10 m, 5 m and 2.5 m. In a second step, the DFT segments were considered as an adaptive grid around the trees, under the assumption that it is more natural than a regular grid. Then, the structural descriptors for the vertical point distribution (descriptors 1 to 4) were computed on each segment, whereas the descriptors for the horizontal point distribution (descriptors 5 to 7) were processed on the 2.5 m grid. A weight was applied to the value of the latter descriptors according to the area of the grid cells included in the footprint of the segments.
Furthermore, LiDAR reflectance mean and standard deviation (sd) were computed for the segment crowns to differentiate them by their reflectance.
"},{"location":"PROJ-HETRES/#42-image-processing","title":"4.2 Image processing","text":"For the image processing, an initial step was to compute the normalized difference vegetation index (NDVI) for each raster image. The normalized difference vegetation index (NDVI) is an index commonly used for the estimation of the health state of vegetation 51314.
\[ NDVI = \frac{NIR - R}{NIR + R} \]where NIR and R are the values of the pixel in the near-infrared and red bands respectively.
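As a minimal sketch (assuming the bands are available as arrays, e.g. read with rasterio), the index can be computed per pixel as follows:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI from near-infrared and red bands."""
    nir = nir.astype("float64")
    red = red.astype("float64")
    # Avoid division by zero on pixels where both bands are null.
    return np.where(nir + red == 0, 0.0, (nir - red) / (nir + red))
```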
To uncover potential distinctive features between the classes, boxplots and principal component analysis were used on the images' four bands (RGB-NIR) and on the NDVI.
Firstly, we tested if the raw pixel values allowed the distinction between classes at the pixel level. This method avoids the pitfall of segmenting the forest into trees. Secondly, we tested the same method, but with a low-pass filter to reduce the noise in the data. Thirdly, we tried to find distinct statistical features at the tree level. This approach decreases the noise that can be present in high-resolution information. However, it necessitates a reasonably good segmentation of the trees. Finally, color filtering and edge detection were tested in order to highlight and extract the linear structure of the branches.
Each treatment can be applied with or without a mask on the tree height. As only trees between 20 m and 40 m tall are affected by dieback, a mask based on the Canopy Height Model (CHM) raster derived from the LiDAR point cloud was tested.
Figure 4: Overview of the different possible data treatments for the statistical analysis."},{"location":"PROJ-HETRES/#421-statistical-tests-on-the-original-and-filtered-pixels","title":"4.2.1 Statistical tests on the original and filtered pixels","text":"The statistical tests were performed on the original and filtered pixels.
Two low-pass filters were tested:
In the original and the filtered cases, the pixels of each GT tree were extracted from the images and sorted by class. Then, the corresponding NDVI was computed. Each pixel has 5 attributes, corresponding to its value in the four bands (R, G, B, NIR) and its NDVI. First, the per-class boxplots of the attributes were produced to see if the distinction between classes was possible on one or several bands or on the NDVI. Then, a principal component analysis (PCA) was computed on the same values to see if a linear combination of them allowed the distinction of the classes.
"},{"location":"PROJ-HETRES/#422-statistical-tests-at-the-tree-level","title":"4.2.2. Statistical tests at the tree level","text":"For the tests at the tree level, the GT trees were segmented by hand. For each tree, the statistics of the pixels were calculated over its polygon, on each band and for the NDVI. Then, the results were sorted by class. Each tree has five attributes per band or index corresponding to the statistics of its pixels: minimum (min), maximum (max), mean, median and standard deviation (std).
As with the pixels, the per-class boxplots of the attributes were produced to see if the distinction between classes was possible. Then, the PCA was computed.
"},{"location":"PROJ-HETRES/#423-extraction-of-branches","title":"4.2.3 Extraction of branches","text":"One of the beneficiaries noted that the branches are clearly visible on the RGB images. Therefore, it may be possible to isolate them with color filtering based on the RGB bands. We calibrated an RGB filter through trial and error to produce a binary mask indicating the location of the branches. A sieve filter was used to reduce the noise due to the lighter parts of the foliage. Then, a binary dilation was performed on the mask to highlight the results. Otherwise, they would be too thin to be visible at a 1:5'000 scale. A mask based on the CHM is integrated to the results to limit the influence of the ground.
The branches have a characteristic linear structure. In addition, the branches of dead trees tend to appear as very light lines on the dark forest ground and understory. Therefore, we thought that we might detect the dead branches thanks to edge detection. We used the Canny edge detector and tested the Python functions of the libraries OpenCV and skimage.
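As an illustration of the two tested implementations (the file name, thresholds and sigma below are placeholders, not the calibrated values):

```python
import cv2
from skimage import feature

# Hypothetical grayscale input tile.
gray = cv2.imread("tile_rgb.tif", cv2.IMREAD_GRAYSCALE)

# OpenCV Canny: hysteresis thresholds on the gradient magnitude.
edges_cv = cv2.Canny(gray, 100, 200)

# skimage Canny: Gaussian smoothing controlled by sigma.
edges_sk = feature.canny(gray, sigma=2.0)
```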
"},{"location":"PROJ-HETRES/#43-satellite-based-indices","title":"4.3 Satellite-based indices","text":"The yearly variation of NDVI and the VHI were used to take account of historical variations of NDVI from 2015 to 2022. For the VHI, the mean for each year is computed over the months considered for the yearly variation of NDVI.
The pertinence of using these indices was explored: the values for each tree of the ground truth were extracted and observed in boxplots, per 2022 health class and per year pair, over the time span from 2015 to 2022.
"},{"location":"PROJ-HETRES/#44-random-forest","title":"4.4 Random Forest","text":"In R 12, the caret and randomForest packages were used to train the random forest and make predictions. First, the ground truth was split into the training and the test datasets, with each class being split 70 % into the training set and 30 % into the test set. Health classes with not enough samples were completed with copies. Optimization of the RF was performed on the number of trees to develop and on the number of randomly sampled descriptors to test at each split. In addition, 5-fold cross-validation was used to ensure the use of different parts of the dataset. The search parameter space was from 100 to 1000 decision trees and from 4 to 10 descriptors as the default value is the square root of all descriptors, i.e. 7. RF was assessed using a custom metric, which is an adaptation of the false positive rate for the healthy class. It minimizes the amount of false healthy detections and of dead trees predicted as unhealthy (false unhealthy). It is called custom false positive rate (cFPR) in the text. It was preferred to have a model with more unhealthy predictions to control on the field, than missing unhealthy or dead trees. The cFPR goes from 0 (best) to 1 (worse).
Table 2: Confusion matrix for the three health classes.
| Prediction \ Ground truth | Healthy | Unhealthy | Dead |
|---|---|---|---|
| Healthy | A | B | C |
| Unhealthy | D | E | F |
| Dead | G | H | I |

According to the confusion matrix in Table 2, the cFPR is computed as follows:
\[ cFPR = \frac{B+C+F}{B+C+E+F+H+I} \]

In addition, the overall accuracy (OA), i.e. the ratio of correct predictions over all the predictions, and the sensitivity, which is, per class, the number of correct predictions divided by the number of samples from that class, are used.
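For illustration, the three metrics can be computed from the confusion matrix of Table 2 as follows (a sketch in Python, with rows as predictions and columns as ground truth):

```python
import numpy as np

def rf_metrics(cm):
    """Metrics from a 3x3 confusion matrix ordered healthy, unhealthy, dead
    (rows: predictions, columns: ground truth), as in Table 2."""
    cm = np.asarray(cm, dtype=float)
    # cFPR: false healthy (B, C) and dead predicted unhealthy (F), relative
    # to all unhealthy and dead ground-truth samples (B+C+E+F+H+I).
    cfpr = (cm[0, 1] + cm[0, 2] + cm[1, 2]) / cm[:, 1:].sum()
    oa = np.trace(cm) / cm.sum()                # overall accuracy
    sensitivity = np.diag(cm) / cm.sum(axis=0)  # per class, over ground truth
    return cfpr, oa, sensitivity
```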
An ablation study was performed on the descriptors to assess the contribution of the different data sources to the final performance. An "important" descriptor is one for which randomly permuting its values in the training set strongly increases the prediction error.
After the optimization, predictions for each DFT segment were computed using the best model according to the cFPR. The inferences were delivered as a thematic map, with colors indicating the health state and hues indicating the fraction of decision trees in the RF that voted for the class (vote fraction). The purpose is to provide confidence information, a high vote fraction indicating robust predictions.
Furthermore, the quantity and quality of the ground truth were evaluated by two means: the progressive removal of samples from the training set and the subsampling of the training set with different random seeds (see Section 5.4.2).
Finally, after having developed the descriptors and the routine on high-quality data, we downgraded the data to resolutions similar to those of the swisstopo products (LiDAR: 20 pt/m2, orthoimage: 10 cm) and performed the optimization and prediction steps again. Indeed, the data acquisition was especially commissioned for this project and only covers the study area. If the method is to be extended in the future, it would be worth testing whether a lower resolution, such as that of the standard nationwide SWISSIMAGE product, could be sufficient.
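A possible way to emulate such a downgrade, assuming a 3 cm source orthoimage and simple random thinning of the point cloud (the project's actual downgrading procedure may differ):

```python
import numpy as np
from skimage.transform import rescale

def downgrade_image(img, src_res=0.03, dst_res=0.10):
    """Resample an orthoimage from `src_res` to `dst_res` meters per pixel.
    The 3 cm source resolution is an assumption for illustration."""
    return rescale(img, src_res / dst_res, channel_axis=-1, anti_aliasing=True)

def thin_point_cloud(points, target_density, area_m2, seed=0):
    """Randomly thin a LiDAR point cloud to about `target_density` pts/m2."""
    rng = np.random.default_rng(seed)
    keep_prob = min(1.0, target_density * area_m2 / len(points))
    return points[rng.random(len(points)) < keep_prob]
```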
"},{"location":"PROJ-HETRES/#5-results-and-discussion","title":"5 Results and discussion","text":"In this section, the results obtained during the processing of each data source into descriptors are presented and discussed, followed by a section on the random forest results.
"},{"location":"PROJ-HETRES/#51-lidar-processing","title":"5.1 LiDAR processing","text":"For the LiDAR data, the reader will first discover the aspect of beech trees in the LiDAR point cloud according to their health state as studied in the data exploration. Then, the segmentation results and the obtained LiDAR-based descriptors will be presented.
"},{"location":"PROJ-HETRES/#512-data-exploration-for-11-beech-trees","title":"5.1.2 Data exploration for 11 beech trees","text":"The vertical slices of 11 beech trees provided visual information on health state: branch shape, clearer horizontal and vertical point distribution. In Figure 5, one can appreciate the information shown by these vertical slices. The linear structure of the dead branches, the denser foliage of the healthy tree and the already smaller tree crown of the dead tree are well recognizable.
Figure 5: Slices for three trees with different health states. Vertical slices of each tree were extracted rotationally, providing visual information on the health state. Dead twigs and density of foliage are particularly distinctive. A deep learning image classifier could treat LiDAR point cloud slices as artificial images and learn from them before classifying any arbitrary slice from the LiDAR point cloud. However, the subject is not suited to transfer learning because 200 samples are not enough to train a model to classify three new classes, especially via images bearing no resemblance to the datasets used to pre-train deep learning models.
"},{"location":"PROJ-HETRES/#513-segmentation","title":"5.1.3 Segmentation","text":"Since the tree health classes were visually recognizable for the 11 trees, it was very interesting to individuate each tree in the LiDAR point cloud.
After the search for optimal DFT parameters, the best realization of each peak isolation method either slightly oversegmented or slightly undersegmented the forest. The forest has a complex structure with dominant and co-dominant trees, and with understory. A simple yet frequent example is a small pine growing in the shadow of a beech tree: it is difficult for an algorithm to differentiate between the points belonging to the pine and those belonging to the beech. Complex tree crowns (not spherical, with two maxima) especially lead to oversegmentation.
The smoothing of maxima on a 0.5 m resolution CHM was identified as the best segmentation. Out of 55 GT trees, 52 were within a 4 m distance of the centroid of a segment. The total number of segments is 7347, which corresponds to 272 trees/ha. The report of a forest inventory conducted in the Jura forest between 2003 and 2005 indicated a density of 286 trees/ha in high forest 7. Since the ground truth is only made of point coordinates, it is difficult to quantitatively assess the correctness of the segments, i.e. the attribution of each point to the right segment. Therefore, the work at the tree level is only approximate.
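The 4 m matching between GT trees and segment centroids can be sketched with a k-d tree; the function below is illustrative, not the project's code:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_gt_to_segments(gt_xy, centroid_xy, max_dist=4.0):
    """Match ground-truth tree coordinates to the nearest segment centroid.

    Returns the segment index per GT tree, or -1 when no centroid lies
    within `max_dist` meters (the 4 m tolerance used in the text).
    """
    dist, idx = cKDTree(centroid_xy).query(gt_xy, k=1)
    return np.where(dist <= max_dist, idx, -1)
```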
"},{"location":"PROJ-HETRES/#514-structural-descriptors","title":"5.1.4 Structural descriptors","text":"Nevertheless, the structural descriptors for each tree were computed from the segmented LiDAR point cloud. The t-test between health classes for each descriptor at each resolution (10 m, 5 m, 2.5 m and per-tree grid) are given in Appendices 2, 3, 4 and 5. The number of significant descriptors per resolution is indicated to understand better the effect on the RF:
The simulations at 5 m and at 2.5 m seemed a priori the most promising. In both configurations, t-tests indicated significantly different distributions for:
The maximal height and the sdCHM appear to be the descriptors best suited to separate the three health states. The other descriptors differentiate healthy trees from the others or dead trees from the others. Of the 11 LiDAR-based descriptors, 8 are significant for at least one comparison between two classes.
"},{"location":"PROJ-HETRES/#52-image-processing","title":"5.2 Image processing","text":"Boxplots and PCA are given to illustrate the results of the image processing exploration. As the masking of pixels below and above the affected height made no difference in the interpretation of the results, they are presented here with the height mask.
"},{"location":"PROJ-HETRES/#521-boxplots-and-pca-over-the-pixel-values-of-the-original-images","title":"5.2.1 Boxplots and PCA over the pixel values of the original images","text":"When the pixel values of the original images per health class are compared in boxplots (ex. Fig. 6), the sole brute value of the pixel is not enough to clearly distinguish between classes.
Figure 6: Boxplots of the unfiltered pixel values on the different bands and the NDVI index by health class. The PCA in Figure 7 shows that it is not possible to distinguish the groups based on a linear combination of the raw pixel values of the bands and the NDVI.
Figure 7: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI. "},{"location":"PROJ-HETRES/#522-boxplots-and-pca-over-the-pixel-values-of-the-filtered-images","title":"5.2.2 Boxplots and PCA over the pixel values of the filtered images","text":"A better separation of the different classes is noticeable after the application of a Gaussian filter. The most promising band for separating the healthy and dead classes is the NIR one. On the NDVI, the distinction between those two classes should also be possible, as illustrated in Figure 8. In all cases, no distinction between the healthy and unhealthy classes is possible. The separation between the healthy and dead trees on the NIR band would be around 130, and the slight overlap on the NDVI is between approx. 0.04 and approx. 0.07.
Figure 8: Boxplots of the pixel values on the different bands and the NDVI by health class after a Gaussian filter with sigma=5. As for the raw pixels, the overlap between the different classes is still very present in the PCA (Fig. 9).
Figure 9: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI after a Gaussian filter with sigma=5. The boxplots produced on the resampled images (Figure 10) give results similar to the ones with the Gaussian filter. The healthy and dead classes are separated on the NIR band around 130. The unhealthy class stays similar to the healthy one.
Figure 10: Boxplots of the pixel values on the different bands and the NDVI by health class after a downsampling filter with a factor of 1/3. According to the PCA in Figure 11, it indeed seems impossible to distinguish between the classes with only the information presented in this section.
Figure 11: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI after a downsampling filter with a factor of 1/3. When the factor for the resampling is decreased, i.e. when the resulting resolution becomes coarser, the separation on the NIR band becomes stronger. With a factor of 1/17, the healthy and dead classes on the NDVI are almost entirely separated around the value of 0.04.
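The two pre-processing variants compared above can be sketched as follows; the sigma and factor values follow the text:

```python
from scipy.ndimage import gaussian_filter
from skimage.transform import rescale

def smooth_band(band, sigma=5):
    """Gaussian filter applied before the per-class boxplots (sigma=5)."""
    return gaussian_filter(band.astype(float), sigma=sigma)

def resample_band(band, factor=1/3):
    """Downsample a band by a given factor (1/3 or 1/17 in the text)."""
    return rescale(band.astype(float), factor, anti_aliasing=True)
```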
"},{"location":"PROJ-HETRES/#523-boxplots-and-pca-over-the-tree-statistics","title":"5.2.3 Boxplots and PCA over the tree statistics","text":"As an example for the per-tree statistics, the boxplots and PCA for the blue band are presented in Figures 12 to 14. On the mean and on the standard deviation, healthy and dead classes are well differentiated on the blue band as visible on Figure 12. The same is observed on the mean, median, and minimum of the NDVI, as well as on the maximum, mean, and median of the NIR band. However, there is no possible differentiation on the red and green bands.
Figure 12: Boxplots of the statistical values for each tree on the blue band by health class. In the PCA in Figure 13, the groups of the healthy and dead trees are quite well separated, mostly along the first component.
Figure 13: Distribution of the trees in the space of the principal components based on their statistical values on the blue band. In Figure 14, the first principal component is influenced principally by the standard deviation of the blue band. The mean, the median and the max have an influence too. This is in accordance with the boxplots, where the standard deviation values presented the largest gap between classes.
Figure 14: Influence of the statistics for the blue band on the first and second principal components. The clusters of the dead and healthy classes are also well separated on the PCA of the NIR band and of the NDVI. No separation is visible on the PCA of the green and red bands.
"},{"location":"PROJ-HETRES/#524-extraction-of-branches","title":"5.2.4 Extraction of branches","text":"Finally, the extraction of dead branches was performed.
"},{"location":"PROJ-HETRES/#use-of-an-rgb-filter","title":"Use of an RGB filter","text":"The result of the RGB filter is displayed in Figure 15. It is important to include the binary CHM in the visualization. Otherwise, the ground can have a significant influence on certain zones and distract from the dead trees. Some interferences can still be seen among the coniferous trees that have a similar light color as dead trees.
Figure 15: Results produced by the RGB filter for the detection and highlighting of dead branches over a zone with coniferous, healthy deciduous and dead deciduous trees. The parts in grey are the zones masked by the filter on the height."},{"location":"PROJ-HETRES/#use-of-the-canny-edge-detector","title":"Use of the canny edge detector","text":"Figure 16 presents the result for the blue band, which was the most promising one. The dead branches are well captured. However, there is a lot of noise around them due to the high contrast in some parts of the foliage. The result is not usable as is. Using a stricter filter decreased the noise, but it also decreased the captured pixels of the branches. In addition, using a sieve filter or combining the results with those of the RGB filter did not improve the situation.
Figure 16: Test of the Canny edge detector from skimage over a dead tree using only the blue band. The parts in grey are the zones masked by the CHM filter on the height. The results for the other bands, the RGB images or the NDVI were not usable either.
"},{"location":"PROJ-HETRES/#525-discussion","title":"5.2.5 Discussion","text":"The results at the tree level are the most promising ones. They are integrated into the random forest. Choosing to work at the tree-level means that all the trees must be segmented with the DFT. This adds uncertainties to the results. As explained in the dedicated section, the DFT has a tendency of over/under-segmenting the results. The procedures at the pixel level, whether on filtered or unfiltered images, are abandoned.
For the branch detection, the results were compared with field observations by a forest expert, who assessed the result as incorrect in several parts of the forest. Therefore, dead branch detection was not integrated into the random forest. In addition, edge detection was maybe not the right choice for dead branches; an approach more focused on the detection of straight lines or graphs might have been preferable. The chances of success of such methods are difficult to predict, as there can be a lot of variation in the form of the dead branches.
"},{"location":"PROJ-HETRES/#53-vegetation-indices-from-satellite-imagery","title":"5.3 Vegetation indices from satellite imagery","text":"The t-test used to evaluate the ability of satellite indices to differentiate between health states are given in Appendices 6 and 7. In the following two subsections, solely the significant tested groups are mentioned for understanding the RF performance.
"},{"location":"PROJ-HETRES/#531-yearly-variation-of-ndvi","title":"5.3.1 Yearly variation of NDVI","text":"t-test on the yearly variation of NDVI indicated significance between:
The t-tests on the VHI indicated significance between:
Explanations similar to those for the NDVI may partly explain the significance obtained. In any case, it is encouraging that the VHI helps to differentiate health classes thanks to their different evolution through the years.
"},{"location":"PROJ-HETRES/#54-random-forest","title":"5.4 Random Forest","text":"The results of the RF that are presented and discussed are: (1) the optimization and ablation study, (2) the ground truth analysis, (3) the predictions for the AOI and (4) the performance with downgraded data.
"},{"location":"PROJ-HETRES/#541-optimization-and-ablation-study","title":"5.4.1 Optimization and ablation study","text":"In Table 3, performance for VHI and yearly variation of NDVI (yvNDVI) descriptors using their value at the location of the GT trees are compared. VHI (cFPR = 0.24, OA = 0.63) performed better than the yearly variation of NDVI (cFPR = 0.39, OA = 0.5). Both groups of descriptors are mostly derived from satellite data with the same resolution (10 m). A conceptual difference is that the VHI is a deviation to a long-term reference value; whereas the yearly variation of NDVI reflects the change between two years. For the latter, values can be high or low independently of the actual health state. Example, a succession of two bad years will indicate few to no differences in NDVI.
Table 3: RF performance with satellite-based descriptors.
| Descriptors | cFPR | OA |
|---|---|---|
| VHI | 0.24 | 0.63 |
| yvNDVI | 0.39 | 0.5 |

Nonetheless, only the yearly variation of NDVI is used hereafter as it is available free of charge.
Regarding the LiDAR descriptors, the tested resolutions indicated that the 5 m resolution (cFPR = 0.2, OA = 0.65) performed best on the cFPR, but that the per-tree descriptors had the highest OA (cFPR = 0.33, OA = 0.67). At 5 m resolution, fewer affected trees are missed, but there are more classification errors, so more control in the field would have to be done. The question of which grid resolution to use on the forest is a complex one, as the forest consists of trees of different sizes. Further, even if dieback affects some areas more severely than others, it is not a continuous phenomenon, and it is important to be able to clearly delimit each tree. However, a fine grid, such as the 2.5 m one, can also fail to capture the entirety of some trees, and the performance may decrease (LiDAR, 2.5 m, OA=0.63).
Table 4: RF performance with LiDAR-based descriptors at different resolutions.
| Descriptors | cFPR | OA |
|---|---|---|
| LiDAR, 10 m | 0.3 | 0.6 |
| LiDAR, 5 m | 0.2 | 0.65 |
| LiDAR, 2.5 m | 0.28 | 0.63 |
| LiDAR, per tree | 0.33 | 0.67 |

The 5 m resolution descriptors are therefore kept for the rest of the analysis, in line with the decision to reduce the number of missed dying trees.
The ablation study performed on the descriptor sources is summarized in Table 5.A and Table 5.B. The two tables reflect the performance for two different partitions of the samples into training and test sets. Since the performance varies by several percent, it is impacted by the partition of the samples. Following those values, the best setups for each partition are respectively the full model (cFPR = 0.13, OA = 0.76) and the airborne-based model (cFPR = 0.11, OA = 0.79).
One notices that not all health classes are predicted with the same accuracy. The airborne-based model, as described in Section 5.2.3, is less sensitive to the healthy class, whereas the satellite-based and LiDAR-based models are more polarized towards the healthy and dead classes, with low sensitivity for the unhealthy class.
Table 5.A: Ablation study results, partition A of the dataset.
| Descriptor sources | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
|---|---|---|---|---|---|
| LiDAR | 0.2 | 0.65 | 0.65 | 0.61 | 0.71 |
| Airborne images | 0.18 | 0.63 | 0.43 | 0.61 | 0.94 |
| yvNDVI | 0.4 | 0.49 | 0.78 | 0.26 | 0.41 |
| LiDAR and yvNDVI | 0.23 | 0.7 | 0.74 | 0.61 | 0.76 |
| Airborne images and yvNDVI | 0.15 | 0.73 | 0.65 | 0.7 | 0.88 |
| LiDAR, airborne images and yvNDVI | 0.13 | 0.76 | 0.65 | 0.74 | 0.94 |

Table 5.B: Ablation study results, partition B of the dataset.
| Descriptor sources | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
|---|---|---|---|---|---|
| LiDAR | 0.19 | 0.71 | 0.76 | 0.5 | 0.88 |
| Airborne images | 0.11 | 0.79 | 0.62 | 0.8 | 1 |
| yvNDVI | 0.38 | 0.62 | 0.81 | 0.4 | 0.65 |
| LiDAR and yvNDVI | 0.27 | 0.74 | 0.86 | 0.5 | 0.88 |
| Airborne images and yvNDVI | 0.14 | 0.78 | 0.62 | 0.8 | 0.94 |
| LiDAR, airborne images and yvNDVI | 0.14 | 0.79 | 0.71 | 0.7 | 1 |

Even if the performance varies according to the dataset partition, the important descriptors remain quite similar between the two partitions, as displayed in Figure 17.A and Figure 17.B. The yearly difference of NDVI between 2018 and 2019 (NDVI_diff_1918) is the most important descriptor; the standard deviation on the blue band (b_std) and the means on the NIR band and NDVI (nir_mean and ndvi_mean) stand out in both cases; from the LiDAR, the standard deviations of the canopy cover (sdcc) and of the LiDAR reflectance (i_sd_seg) are the most important descriptors. The orders of magnitude explain the better performance of the airborne-based model on partition B: for instance, b_std reaches a magnitude of 7.6 with partition B instead of 4.6.
Figure 17.A: Important descriptors for the full model, dataset partition A. Figure 17.B: Important descriptors for the full model, dataset partition B. The most important descriptor of the full model turned out to be the yearly variation of NDVI between 2018 and 2019. 2018 was a year with a dry and hot summer, which stressed beech trees and probably contributed to forest damage 1. This corroborates the ability of our RF method to monitor the response of trees to extreme drought events. However, the 10 m resolution of the index and the varying adaptability of individual beech trees to drought may weaken the relationship between the current health status and the index. This can explain why the presence of this descriptor in the full model does not yield better performance than the airborne-based model in predicting the health state.
Both the mean on the NIR band and the standard deviation on the blue band play an important role. The statistical study in Section 5.2.3 indicated that the models might confuse the healthy and unhealthy classes. On the one hand, airborne imagery only sees the top of the crown and may miss useful information on the hidden parts. On the other hand, airborne imagery has a good ability to detect dead trees thanks to distinctive reflectance values in the NIR and blue bands.
One argument that could explain the lower performance of the model based on LiDAR-based descriptors is the difficulty of finding the right scale for the analysis, as beech trees can show a wide range of crown diameters.
"},{"location":"PROJ-HETRES/#542-ground-truth-analysis","title":"5.4.2 Ground truth analysis","text":"With progressive removal of sample individuals from the training set, impact of individual beech trees on the performance is further analyzed. The performance variation is shown in Figure 18. The performance is rather stable in the sense that the sensitivities stay in a range of values similar to the initial one up to 40 samples removed, but with each removal, a slight instability in the metrics is visible. The size of the peaks indicates variations of 1 prediction for the dead class, but up to 6 predictions for the unhealthy class and up to 7 for the healthy class. During the sample removal, some samples were always predicted correctly, whereas others were often misclassified leading to the peaks in Figure 18. With the large number of descriptors in the full model, there is no straightforward profile of outliers to identify.
Figure 18: Evolution of the per-class sensitivity with removal of samples. In addition, the subsampling of the training set in Table 6 shows that the OA varies by at most 3% depending on the subset used. This indicates again that the amount of ground truth allows reaching a stable OA range, but that the characteristics of the samples do not allow a stable OA value. The sensitivity for the dead class is stable, whereas the sensitivities for the healthy and unhealthy classes vary.
Table 6: Performance according to different random seeds for the creation of the training subset.
| Training set subpartition | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
|---|---|---|---|---|---|
| Random seed = 2 | 0.13 | 0.76 | 0.61 | 0.83 | 0.88 |
| Random seed = 22 | 0.15 | 0.78 | 0.70 | 0.78 | 0.88 |
| Random seed = 222 | 0.18 | 0.75 | 0.65 | 0.74 | 0.88 |
| Random seed = 2222 | 0.13 | 0.76 | 0.65 | 0.78 | 0.88 |
| Random seed = 22222 | 0.10 | 0.78 | 0.65 | 0.83 | 0.88 |

"},{"location":"PROJ-HETRES/#543-predictions","title":"5.4.3 Predictions","text":"The full model and the airborne-based model were used to infer the health state of trees in the study area (Fig. 19). As indicated in Table 7, with the full model, 35.1 % of the segments were predicted as healthy, 53 % as unhealthy and 11.9 % as dead. With the airborne-based model, 42.6 % of the segments were predicted as healthy, 46.2 % as unhealthy and 11.2 % as dead. The two models agree on 74.3 % of the predictions. Within the 25.6 % of disagreements, about 77.1 % are between healthy and unhealthy predictions. Finally, 1.5 % are critical disagreements (between the healthy and dead classes).
Table 7: Percentages of health classes in the AOI.
| Model | Healthy [%] | Unhealthy [%] | Dead [%] |
|---|---|---|---|
| Full | 35.1 | 53 | 11.9 |
| Airborne-based | 42.6 | 46.2 | 11.2 |

Control by forestry experts reported that the predictions mostly correspond to the field situation and that a weak vote fraction often corresponds to false predictions. They confirmed that the map delivers useful information to help plan beech tree felling. The final model, retained after a field visit, is the full model.
Figure 19: Extract of the predicted thematic health map. Green is for healthy, yellow for unhealthy, and red for dead trees. Hues indicate the RF fraction of votes. The predictions can be compared with the true orthophoto in the background. The polygons approximating the tree crowns correspond to the delimitation of the segmented trees."},{"location":"PROJ-HETRES/#544-downgraded-data","title":"5.4.4 Downgraded data","text":"Finally, random forest models were trained and tested on downgraded data with partition A of the ground truth, for all descriptors and by descriptor source. With this partition, the RFs have a better cFPR for the full model (0.08 instead of 0.13), the airborne-based model (0.08 instead of 0.21) and the LiDAR-based model (0.28 instead of 0.31). The OA is also better (full model: 0.84 instead of 0.76, airborne-based model: 0.77 instead of 0.63), except in the case of the LiDAR-based model (0.63 instead of 0.66). This indicates that a 10 cm resolution in the aerial imagery does not weaken the model and can even improve it. For the LiDAR point cloud, reducing the density by a factor of 5 did not change the performance much.
Table 7.A: Performance of RF trained and tested with partition A of the dataset on downgraded data.
| Simulation | cFPR | OA |
|---|---|---|
| Full | 0.08 | 0.84 |
| Airborne-based | 0.08 | 0.77 |
| LiDAR-based | 0.28 | 0.63 |

Table 7.B: Performance of RF trained and tested with partition A of the dataset on the original data.
| Simulation | cFPR | OA |
|---|---|---|
| Full | 0.13 | 0.76 |
| Airborne-based | 0.21 | 0.63 |
| LiDAR-based | 0.31 | 0.66 |

When the important descriptors are compared between the original and downgraded models, one notices that the airborne descriptors gained in importance in the full model when the data are downgraded. The downgraded model showed sufficient accuracy for the objective of the project.
"},{"location":"PROJ-HETRES/#6-conclusion-and-outlook","title":"6 Conclusion and outlook","text":"The study has demonstrated the ability of a random forest algorithm to learn from structural descriptors derived from LiDAR point clouds and from vegetation reflectance in airborne and satellite images to predict the health state of beech trees. Depending on the used datasets for training and test, the optimized full model including all descriptors reached an OA of 0.76 or of 0.79, with corresponding cFPR values of 0.13 and 0.14 respectively. These metrics are sufficient for the purpose of prioritizing beech tree felling. The produced map, with the predicted health state and the corresponding votes for the segments, delivers useful information for forest management. The cantonal foresters validated the outcomes of this proof-of-concept and explained how the location of affected beech trees as individuals or as groups are used to target high-priority areas. The full model highlighted the importance of the yearly variation of NDVI between a drought year (2018) and a normal year (2019). The airborne imagery showed good ability to predict dead trees, whereas confusion remained between healthy and unhealthy trees. The quality of the LiDAR point cloud segmentation may explain the limited performance of the LiDAR-based model. Finally, the model trained and tested on downgraded data gave an OA of 0.84 and a cFPR of 0.08. In this model, the airborne-based descriptors gained in importance. It was concluded that a 10 cm resolution may help the model by reducing the noise in the image.
Outlooks for improving the results include improving the representativeness of the ground truth with regard to the symptoms observed in the field, and continuing research into descriptors for differentiating between healthy and unhealthy trees:
Possible further developments aside, the challenge is now the extension of the methodology to a larger area. The simultaneity of the data acquisitions is necessary for an accurate analysis. It has been shown that the representativeness of the ground truth has to be improved to obtain better and more stable results. Thus, for an extension to further areas, we recommend collecting additional ground truth measurements. The health states of the trees showed some autocorrelation that could have boosted our results and made them less representative of the whole forest; future samples should be more scattered across the forest.
Furthermore, the required data are a true orthophoto and a LiDAR point cloud for the per-tree analysis. It should be possible to use an old LiDAR acquisition to produce a CHM and to forgo the LiDAR-based descriptors without degrading the performance of the model too much.
"},{"location":"PROJ-HETRES/#7-appendixes","title":"7 Appendixes","text":""},{"location":"PROJ-HETRES/#71-simulation-plan-for-dft-parameter-tuning","title":"7.1 Simulation plan for DFT parameter tuning","text":"
Table 8: Parameter tuning for the DFT.
| CHM cell size [m] | Maxima smoothing | Local maxima within search radius |
|---|---|---|
| 0.50 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 0.50 | 0.3 | (1.7425 * h^0.5566)/2 |
| 0.50 | 0.5 | (1.2 + 0.16 * h)/2 |
| 1.00 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 1.00 | 0.3 | (1.7425 * h^0.5566)/2 |
| 1.00 | 0.5 | (1.2 + 0.16 * h)/2 |
| 1.50 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 1.50 | 0.3 | (1.7425 * h^0.5566)/2 |
| 1.50 | 0.5 | (1.2 + 0.16 * h)/2 |
| 2.00 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 2.00 | 0.3 | (1.7425 * h^0.5566)/2 |
| 2.00 | 0.5 | (1.2 + 0.16 * h)/2 |

"},{"location":"PROJ-HETRES/#72-t-tests","title":"7.2 t-tests","text":"t-tests were computed to evaluate the ability of the descriptors to differentiate the three health states. A p-value below 0.01 indicates a significant difference between the means of two classes.
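Such pairwise tests can be reproduced, for one descriptor, along the lines below; the unequal-variance (Welch) variant is an assumption, the project may have used the standard two-sample test.

```python
from scipy.stats import ttest_ind

def pairwise_ttests(values_by_class):
    """p-values of t-tests between health classes for one descriptor.

    values_by_class: dict like
    {"healthy": [...], "unhealthy": [...], "dead": [...]}.
    """
    pairs = [("healthy", "unhealthy"), ("healthy", "dead"),
             ("unhealthy", "dead")]
    return {f"{a} vs. {b}": ttest_ind(values_by_class[a], values_by_class[b],
                                      equal_var=False).pvalue
            for a, b in pairs}
```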
"},{"location":"PROJ-HETRES/#721-t-tests-on-lidar-based-descriptors-at-10-m","title":"7.2.1 t-tests on LiDAR-based descriptors at 10 m","text":"
Table 9: t-test on LiDAR-based descriptors at 10 m.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
|---|---|---|---|
| maximal height | 0.002 | 1.12E-11 | 3.23E-04 |
| scale parameter | 0.005 | 0.014 | 0.964 |
| shape parameter | 0.037 | 0.002 | 0.269 |
| cvLAD | 0.001 | 2.22E-04 | 0.353 |
| VCI | 0.426 | 0.094 | 0.358 |
| mean reflectance | 4.13E-05 | 0.002 | 0.164 |
| sd of reflectance | 0.612 | 3.33E-06 | 9.21E-05 |
| canopy cover | 0.009 | 0.069 | 0.340 |
| sdCC | 0.002 | 0.056 | 0.324 |
| sdCHM | 0.316 | 0.262 | 0.892 |
| AGH | 0.569 | 0.055 | 0.120 |

"},{"location":"PROJ-HETRES/#722-t-test-on-lidar-based-descriptors-at-5-m","title":"7.2.2 t-test on LiDAR-based descriptors at 5 m","text":"
Table 10: t-test on LiDAR-based descriptors at 5 m.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
|---|---|---|---|
| maximal height | 0.001 | 4.67E-12 | 1.73E-04 |
| scale parameter | 0.072 | 0.831 | 0.204 |
| shape parameter | 0.142 | 0.654 | 0.361 |
| cvLAD | 9.14E-06 | 3.22E-05 | 0.667 |
| VCI | 0.006 | 0.104 | 0.485 |
| mean reflectance | 6.60E-05 | 2.10E-06 | 0.249 |
| sd of reflectance | 0.862 | 2.26E-08 | 9.24E-08 |
| canopy cover | 0.288 | 0.001 | 0.003 |
| sdCC | 1.42E-05 | 1.94E-11 | 0.001 |
| sdCHM | 0.004 | 1.94E-08 | 0.002 |
| AGH | 0.783 | 0.071 | 0.095 |

"},{"location":"PROJ-HETRES/#723-t-test-on-lidar-based-descriptors-at-25-m","title":"7.2.3 t-test on LiDAR-based descriptors at 2.5 m","text":"
Table 11: t-test on LiDAR-based descriptors at 2.5 m.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
|---|---|---|---|
| maximal height | 3.76E-04 | 7.28E-11 | 4.80E-04 |
| scale parameter | 0.449 | 0.283 | 5.60E-01 |
| shape parameter | 0.229 | 0.087 | 0.462 |
| cvLAD | 3.59E-04 | 1.06E-07 | 0.012 |
| VCI | 0.004 | 1.99E-05 | 0.072 |
| mean reflectance | 3.15E-04 | 5.27E-07 | 0.068 |
| sd of reflectance | 0.498 | 1.10E-10 | 4.66E-11 |
| canopy cover | 0.431 | 0.004 | 0.019 |
| sdCC | 0.014 | 1.94E-13 | 6.94E-09 |
| sdCHM | 0.003 | 5.56E-07 | 0.006 |
| AGH | 0.910 | 0.132 | 0.132 |

"},{"location":"PROJ-HETRES/#724-t-test-on-lidar-based-descriptors-per-tree","title":"7.2.4 t-test on LiDAR-based descriptors per tree","text":"
Table 12: t-test on LiDAR-based descriptors per tree.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
|---|---|---|---|
| maximal height | 0.001 | 1.98E-11 | 2.61E-04 |
| scale parameter | 0.726 | 0.618 | 0.413 |
| shape parameter | 0.739 | 0.795 | 0.564 |
| cvLAD | 0.001 | 4.23E-04 | 0.526 |
| VCI | 0.145 | 0.312 | 0.763 |
| mean reflectance | 1.19E-04 | 0.001 | 0.949 |
| sd of reflectance | 0.674 | 3.70E-07 | 4.79E-07 |
| canopy cover | 0.431 | 0.005 | 0.023 |
| sdCC | 0.014 | 4.43E-13 | 1.10E-08 |
| sdCHM | 0.003 | 2.71E-07 | 0.004 |
| AGH | 0.910 | 0.090 | 0.087 |

"},{"location":"PROJ-HETRES/#725-t-tests-on-yearly-variation-of-ndvi","title":"7.2.5 t-tests on yearly variation of NDVI","text":"
Table 13: t-test on yearly variation of NDVI.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
|---|---|---|---|
| 2016 | 0.177 | 0.441 | 0.037 |
| 2017 | 0.079 | 2.20E-06 | 0.004 |
| 2018 | 0.093 | 1.57E-04 | 0.132 |
| 2019 | 0.003 | 0.001 | 0.816 |
| 2020 | 0.536 | 0.041 | 0.005 |
| 2021 | 0.002 | 0.894 | 0.003 |
| 2022 | 0.131 | 0.103 | 0.002 |

"},{"location":"PROJ-HETRES/#726-t-test-on-vhi","title":"7.2.6 t-test on VHI","text":"
Table 14: t-test on VHI.
| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
|---|---|---|---|
| 2015-2016 | 0.402 | 0.572 | 0.767 |
| 2016-2017 | 0.005 | 0.002 | 0.885 |
| 2017-2018 | 0.769 | 0.329 | 0.505 |
| 2018-2019 | 2.64E-05 | 3.98E-14 | 0.001 |
| 2019-2020 | 7.86E-06 | 9.55E-05 | 0.427 |
| 2020-2021 | 0.028 | 0.790 | 0.018 |
| 2021-2022 | 0.218 | 0.001 | 0.080 |

"},{"location":"PROJ-HETRES/#8-sources-and-references","title":"8 Sources and references","text":"Indications on software and hardware requirements, as well as the code used to perform the project, are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-hetres/tree/main.
Other sources of information mentioned in this documentation are listed here:
OFEV et al. (éd.). La canicule et la sécheresse de l'été 2018. Impacts sur l'homme et l'environnement. Technical Report 1909, Office fédéral de l'environnement, Berne, 2019.
Benoît Grandclement and Daniel Bachmann. 19h30 - En Suisse, la sécheresse qui sévit depuis plusieurs semaines frappe durement les arbres - Play RTS. February 2023. URL: https://www.rts.ch/play/tv/19h30/video/en-suisse-la-secheresse-qui-sevit-depuis-plusieurs-semaines-frappe-durement-les-arbres?urn=urn:rts:video:13829524 (visited on 2023-03-28).
Xavier Gauquelin, editor. Guide de gestion des forêts en crise sanitaire. Office National des Forêts, Institut pour le Développement Forestier, Paris, 2010. ISBN 978-2-84207-344-2.
Philipp Brun, Achilleas Psomas, Christian Ginzler, Wilfried Thuiller, Massimiliano Zappa, and Niklaus E. Zimmermann. Large-scale early-wilting response of Central European forests to the 2018 extreme drought. Global Change Biology, 26(12):7021–7035, 2020. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.15360 (visited on 2022-10-13), doi:10.1111/gcb.15360.
Run Yu, Youqing Luo, Quan Zhou, Xudong Zhang, Dewei Wu, and Lili Ren. A machine learning algorithm to detect pine wilt disease using UAV-based hyperspectral imagery and LiDAR data at the tree level. International Journal of Applied Earth Observation and Geoinformation, 101:102363, September 2021. URL: https://www.sciencedirect.com/science/article/pii/S0303243421000702 (visited on 2022-10-13), doi:10.1016/j.jag.2021.102363.
Pengyu Meng, Hong Wang, Shuhong Qin, Xiuneng Li, Zhenglin Song, Yicong Wang, Yi Yang, and Jay Gao. Health assessment of plantations based on LiDAR canopy spatial structure parameters. International Journal of Digital Earth, 15(1):712–729, December 2022. URL: https://www.tandfonline.com/doi/full/10.1080/17538947.2022.2059114 (visited on 2022-12-07), doi:10.1080/17538947.2022.2059114.
Patrice Eschmann, Pascal Kohler, Vincent Brahier, and Joël Theubet. La forêt jurassienne en chiffres, Résultats et interprétation de l'inventaire forestier cantonal 2003 - 2005. Technical Report, République et Canton du Jura, St-Ursanne, 2006. URL: https://www.jura.ch/Htdocs/Files/Departements/DEE/ENV/FOR/Documents/pdf/rapportinventfor0305.pdf?download=1.
Agnieszka Ptak. Amplitude vs Reflectance | LinkedIn. June 2020. URL: https://www.linkedin.com/pulse/amplitude-vs-reflectance-agnieszka-ptak/ (visited on 2023-08-11).
BFH-HAFL and BAFU. Waldmonitoring.ch: wcs_ndvi_diff_2016_2015, wcs_ndvi_diff_2017_2016, wcs_ndvi_diff_2018_2017, wcs_ndvi_diff_2019_2018, wcs_ndvi_diff_2020_2019, wcs_ndvi_diff_2021_2020, wcs_ndvi_diff_2022_2021. URL: https://geoserver.karten-werk.ch/wfs?request=GetCapabilities.
Reik Leiterer, Gillian Milani, Jan Dirk Wegner, and Christian Ginzler. ExoSilva - ein Multi-Sensor-Ansatz für ein räumlich und zeitlich hochaufgelöstes Monitoring des Waldzustandes. In Neue Fernerkundungstechnologien für die Umweltforschung und Praxis, 17–22. Swiss Federal Institute for Forest, Snow and Landscape Research, WSL, April 2023. URL: https://www.dora.lib4ri.ch/wsl/islandora/object/wsl%3A33057 (visited on 2023-11-13), doi:10.55419/wsl:33057.
Matthew Parkan. Mparkan/Digital-Forestry-Toolbox: Initial release. April 2018. URL: https://zenodo.org/record/1213013 (visited on 2023-08-11), doi:10.5281/ZENODO.1213013.
R Core Team. R: A Language and Environment for Statistical Computing. 2023. URL: https://www.R-project.org/.
Olga Brovkina, Emil Cienciala, Peter Surový, and Přemysl Janata. Unmanned aerial vehicles (UAV) for assessment of qualitative classification of Norway spruce in temperate forest stands. Geo-spatial Information Science, 21(1):12–20, January 2018. URL: https://www.tandfonline.com/doi/full/10.1080/10095020.2017.1416994 (visited on 2022-07-15), doi:10.1080/10095020.2017.1416994.
N.K. Gogoi, Bipul Deka, and L.C. Bora. Remote sensing and its use in detection and monitoring plant diseases: A review. Agricultural Reviews, December 2018. doi:10.18805/ag.R-1835.
Samuli Junttila, Roope Näsi, Niko Koivumäki, Mohammad Imangholiloo, Ninni Saarinen, Juha Raisio, Markus Holopainen, Hannu Hyyppä, Juha Hyyppä, Päivi Lyytikäinen-Saarenmaa, Mikko Vastaranta, and Eija Honkavaara. Multispectral Imagery Provides Benefits for Mapping Spruce Tree Decline Due to Bark Beetle Infestation When Acquired Late in the Season. Remote Sensing, 14(4):909, February 2022. URL: https://www.mdpi.com/2072-4292/14/4/909 (visited on 2023-10-27), doi:10.3390/rs14040909.
Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo)
Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS September 2022 to March 2023 - Published in April 2023
All scripts are available on GitHub.
Abstract: From 2020 on, the Swiss Federal Statistical Office started updating the land use/cover statistics over Switzerland for the fifth time. To lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point.
The methodology focused on change detection, finding as many unchanged tiles as possible while missing as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. While deactivating the temporal neighbors or using fewer neighbors decreases the balanced accuracy by 0.2%, deactivating the spatial neighbors causes a 1% decrease. Furthermore, the performances of XGBoost, random forest (RF), a fully convolutional network (FCN) and a recurrent convolutional neural network (RCNN) are compared by means of a custom metric, established with the help of the interpretation team. For the spatial-temporal module, the FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression reaches 0.249.
Then, FCN and RF are tested for combining the best performing model with the model trained by the FSO on image data only. When using temporal-spatial neighbors and image data as inputs, the final integration module achieves 0.438 on the custom metric, against 0.374 when only the image data is used.
It was concluded that spatio-temporal neighbors could lighten the process of tile interpretation.
"},{"location":"PROJ-LANDSTATS/#1-introduction","title":"1. Introduction","text":"The introduction presents the background and the objectives of the projects, but also introduces the input data and its specific features.
"},{"location":"PROJ-LANDSTATS/#11-background","title":"1.1 Background","text":"Since 1979, the Swiss Federal Statistical Office (FSO) provides detailed and accurate information on the state and evolution of the land use and the land cover in Switzerland. It is a crucial tool for long-term spatial observation. With these statistics, it is possible to determine whether and to what extent changes in land cover and land use are consistent with the goals of Swiss spatial development policies (FSO).
Figure 1: Visualization of the land cover and land use classification.
Every few years, the FSO carries out a survey based on aerial or satellite images covering all of Switzerland. A grid with sample points spaced 100 meters apart overlays the images, providing 4.1 million sample points on which the statistics are based. The classification of each hectare tile is assigned based on its center point, as shown in Figure 1. Currently, a time series of four surveys is available, based on aerial images captured in the following years:
The first two surveys of the land statistics, in 1979 and 1992, were made by visual interpretation of analogue aerial photos using stereoscopes. Since the 2004 survey, the methodology has been deeply renewed, in particular through the use of digital aerial photographs, which are observed stereoscopically on workstations using specific photogrammetry software.
A new nomenclature (NOAS04) was also introduced in 2004, which systematically distinguishes 46 land use categories and 27 land cover categories. A numerical label from this catalogue is assigned to each point by a team of trained interpreters. The 1979 and 1992 surveys have been revised according to the NOAS04 nomenclature, so that all readings (1979, 1992, 2004, 2013) are comparable. The geodata of the land use statistics at the hectare level since 1979, as well as documentation on the data and the methodology used to produce them, can be found on this page. Detailed information on basic categories and principal domains can be found in Appendix 1.
"},{"location":"PROJ-LANDSTATS/#12-objectives","title":"1.2 Objectives","text":"It is known that manual interpretation work is time-consuming and expensive. However, in a feasibility study, the machine learning technique showed great potential capacity to help speed up the interpretation, especially with deep learning algorithms. According to the study, 50% of the estimated interpretation workload could be saved.
Therefore, the FSO is currently carrying out a project to assess the relevance of learning and mastering the use of artificial intelligence (AI) technologies to automate, even partially, the interpretation of aerial images for change detection and classification. The project is called Area Statistics Deep Learning (ADELE).
The FSO had already developed tools for change detection and multi-class classification using the image data. However, the current workflow does not exploit the spatial and temporal dependencies between different points in the surveys.
The aim of this project is therefore to evaluate the potential of spatial-temporal neighbors in predicting whether or not points in the land statistics will change class. The methodology focuses on change detection, finding as many unchanged tiles as possible (automation capacity) while missing as few changed tiles as possible. The detailed objectives of this project are to:
The raw data delivered by the domain experts is a table with 4'163'496 records containing the interpretation results of both land cover and land use from survey 1 to survey 4. An example record is shown in Table 1 and gives the following information:
Table 1: Example record of raw data delivered by the domain experts.
| RELI | EAST | NORTH | LU4* | LC4 | LU3 | LC3 | LU2 | LC2 | LU1 | LC1 | training |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 74222228 | 2742200 | 1222800 | 242 | 21 | 242 | 21 | 242 | 21 | 242 | 21 | 0 |
| 75392541 | 2753900 | 1254100 | 301 | 41 | 301 | 41 | 301 | 41 | 301 | 41 | 0 |
| 73712628 | 2737100 | 1262800 | 223 | 46 | 223 | 46 | 223 | 46 | 223 | 46 | 0 |

*The shortened notation LC1/LU1 to LC4/LU4 will be used for the land cover/use of surveys 1 to 4 in the following documentation.
For machine learning, the quality of the training data has a strong influence on model performance. Via the training label, domain experts from the FSO selected data points that are more reliable and representative. These 348'474 tiles and their neighbors compose the training and testing dataset for the machine learning methodology.
"},{"location":"PROJ-LANDSTATS/#2-exploratory-data-analysis","title":"2. Exploratory data analysis","text":"As suggested by domain experts, exploratory data analysis (EDA) is of significance to understand the data statistics and find the potential internal patterns of class transformation. The EDA is implemented from three different perspectives: distribution, quantity and probability. With the combination of the three, we can find that there do exist certain trends in the transformation of both land cover and land use classes.
For the land cover, the main findings are:
quantity: there are some clear patterns in quantitative changes
For the land use, the main findings are:
Readers particularly interested in the change detection methods can go directly to Section 3; otherwise, readers are welcome to read the illustrated and detailed EDA given hereafter.
"},{"location":"PROJ-LANDSTATS/#21-distribution-statistics","title":"2.1 Distribution statistics","text":"Figure 2: Land cover distribution plot.
Figure 3: Land use distribution plot.
First, a glance at the overall distribution of land cover and land use is shown in Figures 2 and 3. The X-axis shows the label of each class while the Y-axis shows the number of tiles on a log scale. The records of the four surveys are plotted in different colors, chronologically. By observation, some trends can be found across the four surveys.
Artificial areas only take up a small portion of the land cover (labels between 10 and 20), while most of the surface of Switzerland is covered by vegetation or forest (20 - 50). Bare land (50 - 60) and water areas (60 - 70) take up a considerable portion as well. For land use, the agricultural (200 - 250) and forest (300 - 310) areas are clearly the main components, while the unused area (421) also stands out.
Most classes kept the same tendency during the past 40 years: 11 out of 27 land cover classes and 32 out of 46 land use classes are continuously increasing or decreasing over the whole period. For land use especially, 10 classes rise with time while 22 classes drop, which indicates transformation patterns causing a leakage from some classes towards those 10 classes. We will dive into these patterns in the following sections.
"},{"location":"PROJ-LANDSTATS/#22-quantity-statistics","title":"2.2 Quantity statistics","text":"The data are explored in a quantitative way by three means:
Figure 4: Land cover transformation from 1985 to 2018.
The analysis of the transformation patterns from a quantitative perspective is implemented in the interactive visualization in Figure 4. Nodes of the same color belong to a common superclass (principal domain). The size of a node represents the number of tiles in the class and the width of a link reflects the number of transformations on a log scale. When hovering over these elements, detailed information such as the class label code and the number of transformations is shown. Clicking the legend enables selecting the superclasses in which the transformations should be analyzed.
The transformation data were pre-processed. To simplify the graph and highlight the major transformations, links with a number of transformations below 0.1% of the total were removed from the graph. The filter avoids too many trivial links (580) connecting nearly all the nodes, leaving only the significant links (112). The process filtered out 6.5% of the transformations in land cover and 11.5% in land use, which is acceptable considering it is a quantitative analysis focusing on the major transformations.
"},{"location":"PROJ-LANDSTATS/#222-sequential-transformation-visualization","title":"2.2.2 Sequential transformation visualization","text":"Figure 5: Land cover sequential transformation.
In addition to the transformations between two surveys, the sequential transformation over time was also visualized. Here, a similar filter is implemented to simplify the result, and only tiles that changed during the 4 surveys are visualized. In Figure 5, the box of a class in the 1985 column (survey 1) is composed of different colors while the box of a class in the 2018 column (survey 4) only has one color. This is because the color of a link encodes one sequential transformation. The different colors of a class in the first column show the end status (classification) of the tiles in survey 4.
Some clear patterns can be found in the graph. For example, the red lines point out four diamond patterns. A diamond pattern with edges of the same color illustrates the continuous trend of one class of tiles being transferred to another class. In this figure, it is obvious that Tree Clusters are degraded to Grass and Herb, while Grass and Herb is transferred to Consolidated Surfaces, showing the expansion of urban areas and the degradation of the natural environment.
"},{"location":"PROJ-LANDSTATS/#223-quantity-statistics-analysis","title":"2.2.3 Quantity statistics analysis","text":"Comparing the visualization of different periods, a constant pattern has been spotted in both land cover and land use. For example in land cover, the most transformation happened between the superclass of Tree Vegetation and Brush Vegetation. Also, a visible bi-direction transformation between Grass and Herb Vegetation and Clusters of Trees is witnessed. Greenhouses, wetlands and reedy marshes hardly have edges linked to them all over time, which illustrates that either they have a limited area or they hardly change.
A similar property can also be observed in the land use classes. Most transformations happened inside the superclasses of Arable and Grassland and of Forest not Agricultural. Also visible is a transformation from Unused to Forest.
Combining the findings above, it is clear that the transformations related to Forest and Vegetation are the main part of the story. The forest shrinks or expands over time, changing to shrubs and coming back later. The Arable and Grassland areas keep changing based on the needs of agriculture or animal husbandry in the survey year. Different kinds of forests interconvert with each other, which is a natural phenomenon.
"},{"location":"PROJ-LANDSTATS/#23-probability-matrix","title":"2.3 Probability matrix","text":"The above analysis demonstrates the occurrence of transformation with quantitative statistics. However, the number of tiles for different classes is not a uniform distribution as shown in the distribution analysis. The largest class is thousands of times more than the smallest one. Sometimes, the quantity of a transformation is trivial compared with the majority, but it is caused by the small amount of tiles for the class. Even if the negligible class would not have a significant impact on the performance of change detection, it is of great importance to reveal the internal transformation pattern of the land statistics and support the multi-class classification task. Therefore, the probability analysis is designed as below:
The probability analysis for land cover/use contains 3 parts:
The probability is calculated from the status change between the beginning survey and the end survey stated in the figure title. For example, Figure 6 is calculated from the transformations between survey 1 and survey 4, without taking into account possible intermediate changes in surveys 2 and 3.
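With the records in a table as in Section 1.3, such a probability matrix boils down to a normalized cross-tabulation; a sketch with pandas:

```python
import pandas as pd

def transition_matrix(df, col_from="LC1", col_to="LC4"):
    """Row-normalized transition probabilities between two surveys.

    Cell (i, j) holds the probability that a tile of class i in the first
    survey ends up as class j in the last one, ignoring the intermediate
    surveys, as for Figure 6.
    """
    return pd.crosstab(df[col_from], df[col_to], normalize="index")
```

The diagonal of this matrix corresponds to the probabilities of not changing (Figure 7), and the row-wise maximum over the off-diagonal cells to the maximum transformation probabilities (Figure 8).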
"},{"location":"PROJ-LANDSTATS/#231-land-cover-analysis","title":"2.3.1 Land cover analysis","text":"Figure 6: Land cover probability matrix from LC1 to LC4.
The first information that the matrix provides is the blank blocks with zero probability of conversion. This discloses that transformations between some classes never happened during the past four decades. Besides, all the diagonal blocks have a distinct color depth, illustrating that all land cover classes are more likely to keep their status than to change.
Other evident features of this matrix are the columns of the destination classes Grass and Herb Vegetation (21) and Closed Forest (41). A few classes such as Shrubs (31), Fruit Trees (33), Garden Plants (35) and Open Forest (44) show a noticeable trend of converting to these two classes, which is partially consistent with the quantity analysis while revealing some new findings.
Figure 7: Land cover transformation probability without change.
When it comes to the refined visualization of the diagonal blocks, it is clear that half of the classes have more than an 80% probability of not transforming, while the minimum is only about 35%. This is caused by the accumulation over the 4 surveys, which span 40 years. For a single decade, as in the first 3 sub-graphs of Figure 23 in Appendix A2.1, the majority of classes have a probability over 90% and the minimum rises to 55%.
Figure 8: Maximum transformation probability to a certain class when land cover changes.
For the transformed tiles, the maximum probability of converting into another class is shown in Figure 8. This graph, together with the matrix in Figure 6, points out the internal transformation pattern. The top 5 transformations between the first survey and the fourth survey are:
1. 38% Open Forest (44) --> Closed Forest (41)
2. 36% Brush Meadows (32) --> Shrubs (31)
3. 34% Garden Plants (35) --> Grass and Herb Vegetation (21)
4. 29% Shrubs (31) --> Closed Forest (41)
5. 26% Cluster of Tree (47) --> Grass and Herb Vegetation (21)
Here too, accumulation takes effect. For a single decade, the maximum probability decreases to 25%, but the general distribution of the probabilities is consistent across the four surveys according to Figure 24 in Appendix A2.1.
"},{"location":"PROJ-LANDSTATS/#232-land-use-analysis","title":"2.3.2 Land use analysis","text":"Figure 9: Land use probability matrix from LU1 to LU4.
The land use probability matrix has different features compared with the land cover probability matrix. Although most diagonal blocks have the deepest color depth, two areas highlighted by red lines present different statistics. The upper area is related to Construction sites (146) and Unexploited urban areas (147). These two classes tend to change to other classes rather than remain unchanged, which is reasonable since the construction time of buildings or infrastructure hardly exceeds 10 years. This is confirmed by the left side of the red-edged rectangular block, which has a deeper color depth, illustrating that construction and unexploited areas ended up in Settlement and Urban Areas (superclass of 100 - 170).
The lower red area accounts for the pattern concerning the Forest areas (301 - 304). Afforestation (302), Lumbering areas (303) and Damaged forest (304) thrive and recover between the surveys, finally becoming Forest (301) again.
Figure 10: Land use transformation probability without change.
Figure 10 further validates these assumptions. While most classes have a high probability of not changing, there are two deep valleys for classes 144 to 147 and 302 to 304, which are exactly the results of the stories mentioned above.
Figure 11: Maximum transformation probability to a certain class when land use changes.
Figure 11 shows the difference in the diversity of transformation destinations. The construction and unexploited areas turn into all kinds of urban areas: more than 95% of them changed, and the maximum probability towards any fixed class is less than 35%. In contrast, Afforestation, Lumbering areas and Damaged forest returned to Forest with a probability of more than 90%; the transformation pattern within these four classes is fairly fixed.
The distribution statistics, the quantity statistics and the probability matrices validate and complement each other throughout the exploratory analysis of the data.
"},{"location":"PROJ-LANDSTATS/#3-methods","title":"3. Methods","text":"The developed method should be integrated in the OFS framework for change detection and classification of land use and land cover illustrated in Figure 12. The interesting parts for this project are highlighted in orange and will be presented in the following.
Figure 12: Planned structure in FSO framework for final prediction.
Figure 12 shows on the left the input data types in the FSO framework. The current project works on the LC/LU neighbors introduced in Section 1.3. The main objective of the project - to detect change by means of these neighbors - corresponds to the temporal-spatial module in Figure 12.
As proposed by the feasibility study, FSO conducted studies on change detection and multi-class classification on the swisstopo aerial image time series to improve the efficiency of the interpretation work. The predicted LC and LU probabilities and the information obtained by deep learning are defined as the image-level module.
In a second stage of the project, the best model for combining the outputs of the temporal-spatial and image-level modules is explored to evaluate the gain in performance after integration of the temporal-spatial module in the FSO framework. This is the so-called integration module. The rest of the input data is not part of the performance evaluation.
"},{"location":"PROJ-LANDSTATS/#31-temporal-spatial-module","title":"3.1 Temporal-spatial module","text":"Figure 13: Time and space structure of a tile and its neighbors.
The input data of the temporal-spatial module are the historical interpretation results of the tile to predict and of its 8 neighbors. The first three surveys are used as inputs to train the models, while the fourth survey serves as the ground truth of the prediction. This utilizes both the time and space information in the dataset, as depicted in Figure 13.
During the preprocessing, the tiles with missing neighbors were discarded from the dataset to keep the data format consistent; this loss is insignificant (about 400 out of 348'868 tiles). The determination of change is influenced by both land cover and land use. When there is a disparity between the classifications of the fourth survey and the third one for a specific tile, it is identified as changed (positive) in change detection. The joint prediction of land cover and land use is based on the assumption that a correlation may exist between them: if the land cover of a tile changes, it is probable that its land use also changes.
Moreover, the tiles are assigned numerical labels. Nevertheless, the model should not infer a numerical relationship between classes, even when they belong to the same superclass and are closely related. To address this, we employ one-hot encoding, which transforms a single land cover column into 26 columns, with all values set to '0' except for one column marked as '1' to indicate the class. Despite increasing the model's complexity to almost two thousand input columns, this is a necessary trade-off to eliminate the risk of numerical misinterpretation. A minimal sketch of this encoding step is given below.
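As a minimal illustration of this encoding step, assuming a pandas DataFrame with one hypothetical land cover column (the column name and values below are not the project's actual schema):

```python
import pandas as pd

# Hypothetical land cover column for three tiles (FSO class numbers)
df = pd.DataFrame({"lc_tile_s1": [41, 31, 21]})

# One 0/1 indicator column per class; repeated over all surveys and neighbors,
# this is what inflates the input to almost two thousand columns.
encoded = pd.get_dummies(df, columns=["lc_tile_s1"], dtype=int)
print(encoded)
#    lc_tile_s1_21  lc_tile_s1_31  lc_tile_s1_41
# 0              0              0              1
# 1              0              1              0
# 2              1              0              0
```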
"},{"location":"PROJ-LANDSTATS/#32-change-detection","title":"3.2 Change detection","text":"Usually, spatial change detection is a remote sensing application performed on aerial or satellite images for multiclass change detection. However, in this project, a table of point records is used for binary classification into changed and not changed classes. Different traditional and new deep learning approach have been explored to perform this task. The motivations to use them are given hereinafter. An extended version of this section with detailed introduction to the machine learning models is available in Appendix A3.
Three traditional classification models, logistic regression (LR), XGBoost and random forest (RF), are tested. The three models represent the most popular approaches in the field: linear, boosting and bagging models. In this project, logistic regression is well adapted because it can explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. XGBoost has the advantage that weaker classifiers are introduced sequentially to focus on the areas where the current model is struggling, while misclassified observations receive extra weight during training. Finally, with random forest, higher accuracy may be obtained while still avoiding overfitting, thanks to the large number of trees and the sampling process. A sketch of these baselines is given below.
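As a rough sketch of fitting these three baselines on stand-in data (synthetic features and labels generated with scikit-learn, not the project's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in shaped like the problem: imbalanced binary labels (~17% changed)
X, y = make_classification(n_samples=5000, n_features=50, weights=[0.83], random_state=0)
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=0.2, random_state=0)

for clf in (LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=200, random_state=0),
            XGBClassifier(n_estimators=200)):
    clf.fit(X_trn, y_trn)
    print(type(clf).__name__, balanced_accuracy_score(y_tst, clf.predict(X_tst)))
```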
Beyond these traditional approaches, two deep learning algorithms are explored as well: the fully connected network and the convolutional recurrent neural network. Unlike traditional machine learning algorithms, deep learning does not require manual feature extraction or engineering: deep neural networks capture the desired features through the back-propagation optimization process. Besides, these networks can have designs dedicated to temporal or spatial inputs; the assumption is that when the internal pattern of the dataset matches the network structure, the model performs better.
"},{"location":"PROJ-LANDSTATS/#321-focal-loss","title":"3.2.1 Focal loss","text":"Deep neural networks need differentiable loss function for optimization training. For this project with imbalanced classification task, the local loss was chosen rather than the traditional (binary) cross entropy loss.
\\[\\begin{align} \\\\ FL(p_t) = -{\\alpha}(1-p_t)^{\\gamma} \\ log(p_t) \\\\ \\end{align}\\]where \\(p_t\\) is the probability of predicting the correct class, \\(\\alpha\\) is a balance factor between positive and negative classes, and \\(\\gamma\\) is a modulation factor that controls how much weight is given to examples hard to classify.
Focal loss is a type of loss function that aims to solve the problem of class imbalance in tasks like classification. Focal loss modifies the cross entropy loss by adding a factor that reduces the loss for easy examples and increases the loss for examples that are hard to classify. This way, focal loss focuses more on learning from misclassified examples, which is its main advantage over loss functions such as cross entropy, binary cross entropy and dice loss.
\\(\\alpha\\) should be chosen based on the class frequency. A common choice is to set \\(\\alpha_t\\) is 1 minus the frequency of class t. This way, rare classes get more weight than frequent classes. \\(\\gamma\\) should be chosen based on how much you want to focus on hard samples. A larger gamma means more focus on hard samples, while a smaller gamma means less focus. The original paper suggested that a gamma equal to 2 is an effective value for most cases.
"},{"location":"PROJ-LANDSTATS/#322-fully-connected-network-fcn","title":"3.2.2 Fully connected network (FCN)","text":"Fully connected network (FCN) in deep learning is a type of neural network that consists of a series of fully connected layers. The major advantage of fully connected networks for this project is that they are structure agnostic. That is, no special assumptions need to be made about the input (for example, that the input consists of images or videos).
A disadvantage of FCN is that it can be very computationally expensive and prone to overfitting due to the large number of parameters involved. Another disadvantage is that it does not exploit any spatial or temporal structure in the input data, which can lead to poor performance for some tasks.
For the implementation, the FCN employs 4 hidden layers (2048, 2048, 1024 and 512 neurons respectively) besides the input and output layers. ReLU activation functions are used in the hidden layers, while a sigmoid function is applied at the output to scale the result to a probability. A sketch of this architecture is given below.
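Assuming roughly two thousand one-hot input columns (see Section 3.1), the described architecture can be sketched in PyTorch as follows (the class name and default input size are ours):

```python
import torch.nn as nn

class ChangeFCN(nn.Module):
    """Sketch of the FCN above: hidden layers of 2048, 2048, 1024 and 512
    neurons with ReLU, and a sigmoid output giving a change probability."""
    def __init__(self, n_inputs=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```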
"},{"location":"PROJ-LANDSTATS/#323-convolutional-recurrent-neural-network-convrnn","title":"3.2.3 Convolutional recurrent neural network (ConvRNN)","text":"Convolutional recurrent neural network (ConvRNN) is a type of neural network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are good at extracting spatial features from images, while RNNs are good at capturing temporal features from sequences. ConvRNNs can be used for tasks that require both spatial and temporal features as it is meant to be achieved in this project. Furthermore, the historical data of the land cover and land use can be translated to some synthetic images. The synthetic images use channels to represent sequence of surveys and the pixel value represents ground truth label. Thus, the spatial relationship of the neighbour tiles could be extracted from the data structure with the CNN.
Figure 14: Convolutional Recurrent Neural Network Pipeline.
In this project, we explored a ConvRNN with the structure shown in Figure 14. The sequence of surveys is treated as the sequence of inputs \(x^t\). With the recurrent structure and hidden states \(h^t\) transmitting information, the temporal information can be extracted. Unlike in a traditional RNN, the function \(f\) in the hidden layers of the recurrent structure uses convolutional operations instead of matrix computations, and an additional CNN module is applied to the sequence output to detect the spatial information. A minimal sketch of such a recurrent cell is given below.
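A minimal sketch of such a recurrent cell, where the hidden-state update is a convolution over the synthetic-image channels rather than a matrix product (shapes and names are illustrative, not the project's implementation):

```python
import torch
import torch.nn as nn

class ConvRNNCell(nn.Module):
    """One step of a convolutional RNN: h_t = tanh(Conv([x_t, h_{t-1}]))."""
    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + hidden_channels, hidden_channels,
                              kernel_size=3, padding=1)

    def forward(self, x, h):
        return torch.tanh(self.conv(torch.cat([x, h], dim=1)))

# One survey step on a 3x3 neighborhood image with 1 channel and 8 hidden channels
cell = ConvRNNCell(1, 8)
x = torch.randn(1, 1, 3, 3)   # synthetic image of the tile and its 8 neighbors
h = torch.zeros(1, 8, 3, 3)   # initial hidden state
h = cell(x, h)
```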
"},{"location":"PROJ-LANDSTATS/#33-performance-metric","title":"3.3 Performance metric","text":"Once the different machine learning models are trained for the respective module, comparison has to be made on the test set to evaluate their performance. This will be performed with the help of metrics.
"},{"location":"PROJ-LANDSTATS/#331-traditional-metrics","title":"3.3.1 Traditional metrics","text":"As discovered in the distribution analysis, the dataset is strongly unbalanced. Some class is thousands of others. This is of importance to change detection. Moreover, among 348'474 tiles in the dataset, only 58'737 (16.86%) tiles have changed. If the overall accuracy is chosen as the performance metric, the biased distribution would make the model tend to predict everything unchanged. In that case, the accuracy of the model can achieve 83.1%, which is a quite high value achieved without any effort. Therefore, avoiding the problem during the model training and selecting the suitable metric that can represent the desired performance are the initial steps.
The constant model is defined as a model which returns the third survey interpretation values as the prediction for the fourth survey. In simple words, the constant model predicts that nothing changes. With this definition, we can calculate all kinds of metrics for other change detection models and compare them to the constant model metrics to identify models with better performance.
For change detection with the constant model, the performance is as below:
Figure 15: Confusion matrix of constant distribution as prediction: TP=True Positive, TN=True Negative, FP=False Positive, FN=False Negative.
Table 2: Metrics evaluation for constant model.
Model: Constant | Accuracy: 0.831 | Balanced Accuracy: 0.500
Positive class: Precision (PPV) = 0.000 | Recall (TPR) = 0.000 | F1-score = 0.000
Negative class: Precision (NPV) = 0.831 | Recall (TNR) = 1.000 | F1-score = 0.907
Definition of abbreviations:
For the positive case:\n\nPrecision = TP / (TP + FP) (positive predictive value, PPV)\nRecall = TP / (TP + FN) (true positive rate, TPR)\n\nFor the negative case:\n\nPrecision = TN / (TN + FN) (negative predictive value, NPV)\nRecall = TN / (TN + FP) (true negative rate, TNR)\n
The aim of the change detection is to predict with high confidence the tiles that do not change, so that the interpretation from the last survey can be used directly. However, the negative-case-related metrics above and the accuracy are not suitable for the present task because of the imbalanced nature of the problem: due to the large amount of unchanged tiles, they indicate a high performance for the constant model, which we know does not depict reality. After testing, the balanced accuracy, i.e. the mean of the true positive rate and the true negative rate, is considered a suitable metric for change detection.
"},{"location":"PROJ-LANDSTATS/#332-specific-weighted-metric-for-change-detection","title":"3.3.2 Specific weighted metric for change detection","text":"In theory, true negative rate is equivalent to 1 minus false positive rate. Optimizing balanced accuracy typically results in minimizing the false positive rate. However, our primary objective is to reduce false negative instances (i.e., changed cases labeled as unchanged), while maximizing the true positive rate and true negative rate. False positives are of lesser concern, as they will be manually identified in subsequent steps. Consequently, balanced accuracy does not adequately reflect the project's primary objective. With the help of FSO interpretation team, an additionnal, specific metric targeting on the objective has been designed to measure the model performance. Reminding the Exploratory Data Analysis, some transformation patterns have been found and applied in this metric as well.
Figure 16: Workflow with multiple input to define a weighted metric.
As depicted in Figure 16, the FSO interpretation team designed two filters to derive a custom metric. The first filter combines inputs from all the possible modules (in this case, the image-level and temporal-spatial modules). The input modules give the probability of change detection or the multi-class classification prediction with its confidence. As the predictions from the modules might differ, the first filter sets the final prediction of a tile as positive if any input module gives a positive prediction. Here, the threshold defining a positive is a significant hyperparameter to fine-tune.
The weight matrix defined by the human experts is the core of the entire metric. Based on professional experience and on observations from the EDA, the experts assigned different weights to all possible transformations. These weights reflect the importance of each transformation to the overall statistics. Besides, part of the labels are defined as small classes, meaning that these classes are negligible or not considered in this study. The second filter removes all the transformations related to the small classes and applies the weight matrix to all the remaining tiles. Finally, the weighted metric is calculated as below:
\\[\\begin{align} Automatized \\ Tiles &= {\\#Predicted \\ Negatives} \\\\ \\\\ Automatized \\ Capacity &= {{\\#Automatized \\ Tiles} \\over {\\#Negatives \\ (ground \\ truth)}} \\\\ \\\\ Missed \\ Weighted \\ Changed \\ Ratio &= {{\\sum \\{Missed \\ Change \\times Weight\\}} \\over {\\sum \\{All \\ Change \\times Weight\\}}} \\\\ \\\\ Weighted \\ Metric &= Automatized \\ Capacity \\times (0.1 - Missed \\ Weighted \\ Changed \\ Ratio) \\ / \\ 0.1 \\end{align}\\]From now on, we will still calculate metrics like balanced accuracy and recall for reference and analysis; however, the Weighted Metric is the decisive metric for model selection.
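Assuming one boolean entry per tile and the expert weight of each tile's transformation, the weighted metric can be sketched as follows (array and function names are ours, not the project's):

```python
import numpy as np

def weighted_metric(pred_changed, true_changed, weights):
    """Sketch of the weighted metric above; small classes are assumed to be
    filtered out already. All arguments are per-tile NumPy arrays."""
    automatized = ~pred_changed                            # predicted negatives
    capacity = automatized.sum() / (~true_changed).sum()   # automatized capacity
    missed = automatized & true_changed                    # changes labeled unchanged
    missed_ratio = weights[missed].sum() / weights[true_changed].sum()
    return capacity * (0.1 - missed_ratio) / 0.1
```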
"},{"location":"PROJ-LANDSTATS/#34-training-and-testing-plan","title":"3.4 Training and testing plan","text":"Introduced in Section 1.3, the 348'474 tiles with temporal-spatial information are selected for training. The 80%-20% split is applied to the selected tiles to create the train set and the test set respectively. Adam optimizer and multi-step learning rate scheduler are deployed for better convergence.
For the temporal-spatial module, metrics for an ablation study on the descriptors and the descriptor importance are first computed. The descriptor importance is taken from the XGBoost simulations. The ablation study is performed with the logistic regression and consists of training the model with time information deactivated, with space information deactivated, with only 4 neighbors, and with the full baseline configuration (8 neighbors, time and space activated); the configurations are listed in Table 3.
Then, the baseline configuration is used to train the traditional algorithms and the deep learning ones. Metrics are compared and the best performing models are kept for the integration module.
Finally, the performance of several configurations is compared for the integration module.
The extra information gained from the temporal-spatial module is studied by comparison with the image-level performance alone. The image-level data contain the multi-class classification prediction and its confidence. We can calculate the change probability from the probability of each class; therefore, the weighted metric can also be applied at the image level only. Then, RF and FCN are tested for the integration module, which combines the various types of information sources.
"},{"location":"PROJ-LANDSTATS/#4-experiments","title":"4. Experiments","text":"The Experiments section covers the results obtained when performing the planned simulations for the temporal-spatial module and the integration module.
"},{"location":"PROJ-LANDSTATS/#41-temporal-spatial-module","title":"4.1 Temporal-spatial module","text":""},{"location":"PROJ-LANDSTATS/#411-feature-engineering-time-and-space-deactivation","title":"4.1.1 Feature engineering (time and space deactivation)","text":"In the temporal-spatial module, the studied models take advantages of both the space (the neighbors) and the time (different surveys) information as introduced in Section 3.1. Ablation study is performed here to acknowledge the feature importance and which information really matters in the model.
Table 3: Model metrics for ablation plan.
Logistic Regression | Best threshold | Accuracy | Balanced Accuracy | Precision (PPV/NPV) | Recall (TPR/TNR) | F1-score (pos/neg)
Time deactivated | 0.515 | 0.704 | 0.718 | 0.330/0.930 | 0.740/0.696 | 0.457/0.796
Space deactivated | 0.505 | 0.684 | 0.711 | 0.316/0.930 | 0.752/0.670 | 0.445/0.779
4 neighbors | 0.525 | 0.707 | 0.718 | 0.332/0.929 | 0.734/0.701 | 0.458/0.799
Baseline* | 0.525 | 0.711 | 0.720 | 0.337/0.928 | 0.734/0.706 | 0.462/0.802
*Baseline: 8 neighbors with time and space activated
Table 3 reveals the performance change when the time or space information is totally or partially (4 neighbors instead of 8) deactivated. While time deactivation and fewer neighbors hardly influence the balanced accuracy (only a 0.2% decrease), space deactivation decreases it by about 1%. The result demonstrates that space information is more vital to the algorithm than time information, even though both have a minor impact.
Figure 17: Feature importance analysis comparison of 4 (left) and 8 (right) neighbors.
Figure 18: Feature importance analysis comparison of time (left) and space (right) deactivation.
Figures 17 and 18 give the feature importance analysis from the XGBoost model. The feature importance summed over the variables related to the tile itself and over those related to its neighbors is plotted in the charts. The 4-neighbor and 8-neighbor configurations have similar capacities, but the summed importance of the neighbors is much higher for the latter than for the former. This is caused by the number of variables: with more neighbors, the number of variables related to the neighbors increases and the sum of their feature importance grows as well.
The feature importance illustrates the weight assigned to the input variables. From Figure 17, it is obvious that the variables related to the tile itself in past surveys are the most critical; furthermore, the more recent, the more important. The neighbors on the east and west (neighbors 3 and 4) are more significant than the others, and even more than the land use of the tile in the first survey.
In conclusion, the feature importance is not evenly distributed. However, the ablation study shows that the model with all the features as input achieved the best performance.
"},{"location":"PROJ-LANDSTATS/#412-baseline-models-with-probability-or-tree-models","title":"4.1.2 Baseline models with probability or tree models","text":"Utilizing the time and space information from the neighbors, three baseline methods with probability or tree model are fine-tuned. The logistic regression outperforms the other two, achieving 72.0% balanced accuracy. As result, more than 41'000 tiles are correctly predicted as unchanged while only about 3'000 changed tiles are missed as they are the false negatives. Detailed metrics of each method are listed in Table 4.
Table 4: Performance metrics for the traditional machine learning simulations of the spatial-temporal model.
Models | Accuracy | Balanced Accuracy | Precision (PPV/NPV) | Recall (TPR/TNR) | F1-score (pos/neg)
Logistic Regression | 0.711 | 0.720 | 0.337/0.928 | 0.734/0.706 | 0.462/0.802
Random Forest | 0.847 | 0.715 | 0.775/0.849 | 0.134/0.992 | 0.229/0.915
XGBoost | 0.837 | 0.715 | 0.533/0.869 | 0.297/0.947 | 0.381/0.906
Constant | 0.830 | 0.500 | 0.000/0.830 | 0.000/1.000 | 0.000/0.907
Figure 19: Metric changes with different thresholds for the logistic regression.
Besides delivering the best balanced accuracy, the logistic regression allows adjusting its behavior by changing the decision threshold, as its output is a change probability rather than a hard prediction. For example, we can trade off between the true positive rate and the negative predictive value. As shown in Figure 19, if we decrease the threshold probability, the precision of the negative case (NPV) increases while the true negative rate goes down. This means more tiles need manual checks; however, fewer changed tiles are missed. Considering both the performance and these characteristics, the logistic regression is selected as the baseline model. The sketch below illustrates this threshold trade-off.
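The trade-off can be illustrated on synthetic data (a scikit-learn stand-in, not the project's results): lowering the threshold yields purer predicted negatives (higher NPV) at the cost of a lower true negative rate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, weights=[0.83], random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

for threshold in (0.50, 0.35, 0.20):        # lower threshold -> fewer missed changes
    pred_neg = proba < threshold            # tiles predicted as unchanged
    npv = (pred_neg & (y == 0)).sum() / pred_neg.sum()
    tnr = (pred_neg & (y == 0)).sum() / (y == 0).sum()
    print(f"threshold={threshold:.2f}  NPV={npv:.3f}  TNR={tnr:.3f}")
```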
"},{"location":"PROJ-LANDSTATS/#413-neural-networks-fcn-and-convrnn","title":"4.1.3 Neural networks: FCN and ConvRNN","text":"FCN and ConvRNN work differently: FCN does not have special structure designed for temporal-spatial data while ConvRNN has specific designation for time and space information respectively. To study these two extreme situations, we explored their performance and compared with the logistic regression which is the best of the baseline models.
Table 5: Performance metrics for the deep learning simulations of the spatial-temporal model.
Models | Weighted Metric | Raw Metric | Balanced Accuracy | Recall | Missed Changes | Missed Changes Ratio | Missed Weighted Changes | Missed Weighted Changes Ratio | Automatized Points | Automatized Capacity
LR (Macro)* | 0.237 | 0.197 | 0.655 | 0.954 | 349 | 0.046 | 18995 | 0.035 | 14516 | 0.364
LR (BA)* | 0.249 | 0.207 | 0.656 | 0.957 | 326 | 0.043 | 17028 | 0.031 | 14478 | 0.363
FCN | 0.259 | 0.210 | 0.656 | 0.958 | 322 | 0.042 | 15563 | 0.029 | 14490 | 0.363
ConvRNN | 0.176 | 0.133 | 0.606 | 0.949 | 388 | 0.051 | 19026 | 0.035 | 10838 | 0.272
Constant | -10.717 | -10.72 | 0.500 | 0.000 | 7607 | 1.000 | 542455 | 1.00 | 47491 | 1.191
*Macro: the model is trained with the Macro F1-score; BA: the model is trained with the Balanced Accuracy.
As a result of its implementation (see Section 3.2.2), the FCN outperforms all the models with a value of 0.259 for the weighted metric, slightly above the logistic regression with 0.249. The ConvRNN does not perform well, even after we increased the size of the hidden states to 1024. Following deliberation, we posit that the absence of one-hot encoding during the generation of the synthetic images may be the cause, given that an increased number of channels would substantially increase the computational expense. Since the ground truth label is directly used as pixel value, the model may attempt to discern numerical relationships among distinct pixel values that, in reality, do not exist. This warrants further investigation in subsequent phases of our research.
"},{"location":"PROJ-LANDSTATS/#42-integration-module","title":"4.2 Integration module","text":"Table 5 compares the performance of FCN or image-level only to several configurations for the integration module.
Table 6: Performance metrics for the integration model in combination with a spatial-temporal model.
Model | Weighted Metric | Raw Metric | Balanced Accuracy | Recall | Missed Changes | Missed Changes Ratio | Missed Weighted Changes | Missed Weighted Changes Ratio | Automatized Points | Automatized Capacity
FCN | 0.259 | 0.210 | 0.656 | 0.958 | 322 | 0.042 | 15563 | 0.029 | 14490 | 0.363
image-level | 0.374 | 0.305 | 0.737 | 0.958 | 323 | 0.042 | 15735 | 0.029 | 20895 | 0.524
LR + RF | 0.434 | 0.372 | 0.752 | 0.969 | 241 | 0.031 | 10810 | 0.020 | 21567 | 0.541
FCN + RF | 0.438 | 0.373 | 0.757 | 0.968 | 250 | 0.032 | 11277 | 0.021 | 22010 | 0.552
FCN + FCN | 0.438 | 0.376 | 0.750 | 0.970 | 229 | 0.030 | 9902 | 0.018 | 21312 | 0.534
LR + FCN | 0.423 | 0.354 | 0.745 | 0.967 | 255 | 0.033 | 10993 | 0.020 | 21074 | 0.528
The study demonstrates that the image level contains more information related to change detection than the temporal-spatial neighbors (FCN row in Table 6). However, performance improves when the temporal-spatial module is combined with the image-level data, reaching a weighted metric of 0.438 (FCN+RF and FCN+FCN).
Regarding the composition of models for the two modules, the FCN proved to be the best for the temporal-spatial module, while RF and FCN show similar performance in the integration module. The choice of the integration module could be influenced by the data format of other potential modules. This will be further studied by the FSO team.
"},{"location":"PROJ-LANDSTATS/#5-conclusion-and-outlook","title":"5. Conclusion and outlook","text":"This project studied the potential of historical and spatial neighbor data in change detection task for the fifth interpretation process of the areal statistic of FSO. For the evaluation of this specific project, a weighted metric was defined by the FSO team. The temporal-spatial information was proved not to be as powerful as image-level information which directly detects change within visual data. However, an efficient prototype was built with 6% performance improvement in weighted metric combining the temporal-spatial module and the image-level module. It is validated that integration of modules with different source information can help to enhance the final capacity of the entire workflow.
The next research step of the project would be to modify the current implementation of the ConvRNN. If the numerical relationship is removed from the synthetic image data, the ConvRNN should theoretically reach a performance similar to the FCN. A plain CNN is also worth trying, to validate whether the temporal pattern matters in this dataset. Besides, by changing the size of the synthetic images, we can figure out how the number of neighbour tiles impacts the model performance.
"},{"location":"PROJ-LANDSTATS/#appendix","title":"Appendix","text":""},{"location":"PROJ-LANDSTATS/#a1-classes-of-land-cover-and-land-use","title":"A1. Classes of land cover and land use","text":"Figure 20: Land Cover classification labels. Figure 21: Land Use classification labels."},{"location":"PROJ-LANDSTATS/#a2-probability-analysis-of-different-periods","title":"A2. Probability analysis of different periods","text":""},{"location":"PROJ-LANDSTATS/#a21-land-cover","title":"A2.1 Land cover","text":"Figure 22: Land cover probability matrix. Figure 23: Land cover transformation probability without change. Figure 24: Maximum transformation probability to a certain class when land cover changes."},{"location":"PROJ-LANDSTATS/#a22-land-use","title":"A2.2 Land use","text":"Figure 25: Land use probability matrix. Figure 26: Land use transformation probability without change. Figure 27: Maximum transformation probability to a certain class when land use changes."},{"location":"PROJ-LANDSTATS/#a3-alternative-version-of-section-32","title":"A3 Alternative version of Section 3.2","text":""},{"location":"PROJ-LANDSTATS/#a31-logistic-regression","title":"A3.1 Logistic regression","text":"Logistic regression is a kind of Generalized Linear Model. It is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis in this project. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
"},{"location":"PROJ-LANDSTATS/#a32-xgboost-random-forest","title":"A3.2 XGBoost & random forest","text":"Figure 28: Comparison of boosting and bagging models.
XGBoost and Random Forest both originate from the tree model, while one is the sequential variant and the other is the parallel variant.
Extreme Gradient Boosting (XGBoost) is a distributed, scalable gradient-boosted decision tree (GBDT) machine learning algorithm. Gradient boosting is a flexible method used for regression, multi-class classification and other tasks, since it is compatible with all kinds of loss functions. It recasts boosting as a numerical optimization problem, with the goal of reducing the loss function of the model by adding weak classifiers while employing gradient descent. Gradient descent, a first-order iterative approach, is then used to find a local optimum of the differentiable loss function. Weaker classifiers are introduced sequentially to focus on the areas where the current model is struggling, while misclassified observations receive extra weight during training.
Random forest is a bagging technique that contains a number of decision trees generated from the dataset. Instead of relying solely on one decision tree, it averages over a number of trees to improve the predictive accuracy. For each tree, the input features are a different sampled subset of all the features, making the model more robust and avoiding overfitting. These trees are then trained on bootstrap-sampled subsets of the dataset. Finally, the random forest takes the prediction from each tree and makes the final decision based on the majority vote. Higher accuracy is obtained and overfitting is avoided through the large number of trees and the sampling process.
"},{"location":"PROJ-LANDSTATS/#a33-focal-loss","title":"A3.3 Focal loss","text":"The next two methods are Deep Neural Networks which need differentiable loss function for optimization training. Here we first tell the difference between the loss function and evaluation metric.
The choice of loss function and evaluation metric depends on the task and data. The loss function should be chosen based on whether it is suitable for the model architecture and output type, while the evaluation metric should be relevant for the problem domain and application objectives.
The loss function and the evaluation metric are two different concepts in deep learning. The loss function is used to optimize the model parameters during training, while the evaluation metric is used to measure the performance of the model on a test set. They need not be the same: for example, here we use the focal loss to train a classification model, but use the balanced accuracy or a specifically defined metric to evaluate its performance. The reason is that some evaluation metrics may not be differentiable or easy to optimize, or they may not match the objective of the model.
For this project and its imbalanced classification task, we consider the focal loss a better choice than the traditional (binary) cross entropy loss.
\[\begin{align} \\ FL(p_t) = -{\alpha}(1-p_t)^{\gamma} \ log(p_t) \\ \end{align}\]where \(p_t\) is the probability of predicting the correct class, \(\alpha\) is a balance factor between positive and negative classes, and \(\gamma\) is a modulation factor that controls how much weight is given to examples hard to classify.
Focal loss is a type of loss function that aims to solve the problem of class imbalance in tasks like classification. Focal loss modifies the cross entropy loss by adding a factor that reduces the loss for easy examples and increases the loss for examples that are hard to classify. This way, focal loss focuses more on learning from misclassified examples, which is its main advantage over loss functions such as cross entropy, binary cross entropy and dice loss.
\\(\\alpha\\) should be chosen based on the class frequency. A common choice is to set \\(\\alpha_t\\) = 1 - frequency of class t. This way, rare classes get more weight than frequent classes. \\(\\gamma\\) should be chosen based on how much you want to focus on hard samples. A larger gamma means more focus on hard samples, while a smaller gamma means less focus. The original paper suggested gamma = 2 as an effective value for most cases.
"},{"location":"PROJ-LANDSTATS/#a34-fully-connected-network-fcn","title":"A3.4 Fully connected network (FCN)","text":"Figure 29: Network structure of FCN.
The fully connected network (FCN) in deep learning is a type of neural network that consists of a series of fully connected layers. A fully connected layer is a function from \(\mathbb{R}^m\) to \(\mathbb{R}^n\) that maps each input dimension to each output dimension. The FCN can learn complex patterns and features from data using the backpropagation algorithm.
The major advantage of fully connected networks is that they are \u201cstructure agnostic.\u201d That is, no special assumptions need to be made about the input (for example, that the input consists of images or videos). Fully connected networks are used for thousands of applications, such as image recognition, natural language processing, and recommender systems.
A disadvantage of FCNs is that they can be very computationally expensive and prone to overfitting due to the large number of parameters involved. Another disadvantage is that they do not exploit any spatial or temporal structure in the input data, which can lead to poor performance for some tasks. A possible alternative to the fully connected network is the convolutional neural network (CNN), which uses convolutional layers that apply filters to local regions of the input data, reducing the number of parameters and capturing spatial features.
"},{"location":"PROJ-LANDSTATS/#a35-convolutional-neural-network-cnn","title":"A3.5 Convolutional neural network (CNN)","text":"CNN stands for convolutional neural network, which is a type of deep learning neural network designed for processing structured arrays of data such as images. CNNs are very good at detecting patterns in the input data, such as lines, shapes, colors, or even faces and objects. CNNs use a special technique called convolution, which is a mathematical operation that applies a filter (also called a kernel) to each part of the input data and produces an output called a feature map. Convolution helps to extract features from the input data and reduce its dimensionality.
CNNs usually have multiple layers of convolution, followed by other types of layers such as pooling (which reduces the size of the feature maps), activation (which adds non-linearity to the network), dropout (which prevents overfitting), and fully connected (which performs classification or regression tasks). CNNs can be trained using backpropagation and gradient descent algorithms.
CNNs are widely used in computer vision and have become the state of the art for many visual applications such as image classification, object detection, face recognition, semantic segmentation, etc. They have also been applied to other domains such as natural language processing for text analysis.
Figure 30: Workflow of Convolutional Neural Network.
In this project, the historical data of the land cover and land use can be translated into synthetic images. The synthetic images use channels to represent the sequence of surveys, and the pixel values represent the ground truth labels. Thus, the spatial relationship of the neighbour tiles can be extracted from this data structure with the CNN.
"},{"location":"PROJ-LANDSTATS/#a36-convolutional-recurrent-neural-network-convrnn","title":"A3.6 Convolutional recurrent neural network (ConvRNN)","text":"A convolutional recurrent neural network (ConvRNN) is a type of neural network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are good at extracting spatial features from images, while RNNs are good at capturing temporal features from sequences. ConvRNNs can be used for tasks that require both spatial and temporal features, such as image captioning and speech recognition.
A ConvRNN consists of two main parts: a CNN part and an RNN part. The CNN part takes an input image or signal and applies convolutional filters to extract features. The RNN part takes these features as a sequence and processes them with recurrent units that have memory. The output of the RNN part can be a single vector or a sequence of vectors, depending on the task. A ConvRNN can learn both spatial and temporal patterns from data that have both dimensions, such as audio signals or video frames. For example, a ConvRNN can detect multiple sound events from an audio signal by extracting frequency features with CNNs and capturing temporal dependencies with RNNs.
Figure 31: Convolutional Recurrent Neural Network Pipeline.
In this project, we explored a ConvRNN with the structure shown in Figure 31. The sequence of surveys is treated as the sequence of inputs \(x^t\). With the recurrent structure and hidden states \(h^t\) transmitting information, the temporal information can be extracted. Unlike in a traditional recurrent neural network, the function \(f\) in the hidden layers of the recurrent structure uses convolutional operations instead of matrix computations, and an additional CNN module is applied to the sequence output to detect the spatial information.
"},{"location":"PROJ-QALIDAR/","title":"Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control","text":"Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo)
Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR September 2023 to February 2024 - Published in March 2024
All scripts are available on GitHub.
Abstract: The acquisition of LiDAR data has become standard practice at national and cantonal levels during the recent years in Switzerland. In 2024, the Federal Office of Topography (swisstopo) will complete a comprehensive campaign of 6 years covering the whole Swiss territory. The point clouds produced are classified post-acquisition, i.e. each point is attributed to a certain category, such as \"building\" or \"vegetation\". Despite the global control performed by providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, extensive time is invested by swisstopo in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted in voxels, summarizing the distribution of classes and comparable one-to-one. This method highlights zones of change by clustering the concerned voxels. Experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method.
"},{"location":"PROJ-QALIDAR/#1-introduction","title":"1. Introduction","text":"The usage of light detection and ranging (LiDAR) technology has seen a large increase in the field of geo-surveying over the recent years 1. Data obtained from airborne acquisition provides rich 3D information about land cover in the form of a point cloud. These point clouds are typically processed after acquisition in order to assign a class to each point, as displayed in Figure 1.
Figure 1: View of the Rhine Falls in the classified point cloud of the product swissSURFACE3D. To conduct their LiDAR surveys, the Federal Office of Topography (swisstopo) mandates external companies, which are in charge of the airborne acquisition and of the classification in post-processing. The process of verifying the quality of the supplied data is tedious, with an estimated duration of 42 working hours for the verification of an area of 216 km2. A significant portion of this verification process is dedicated to ensuring the precision of the point classification. With the first generation of the LiDAR product 2 nearing completion, swisstopo is keen to leverage the considerable time and effort invested to facilitate the quality assessment of the next generation. In this context, swisstopo's LiDAR development team contacted the STDL to develop a change detection method.
As reviewed by Stilla & Xu (2023), change detection in point clouds has already been explored in numerous ways3. The majority of the research focuses, however, on changes in geometry. Deep learning solutions are being extensively researched to apply the advancements in this field to change detection in point clouds4. However, to the best of our knowledge, no solution currently addresses the problem of change detection in the classification of two point clouds. Most challenges of change detection in point clouds come from the unstructured nature of LiDAR data, which makes it impossible to reproduce the same result across acquisitions. Therefore, the production of ground truth and the application of deep learning to point clouds of different generations can be challenging. To overcome this, data discretization by voxelization has already been studied in several works on change detection in point clouds, with promising results56.
The goal of this project is to create a mapping of the changes observed between two generations of point clouds for a common scene, with an emphasis on classification changes. The proposed method creates a voxel map for the reference point cloud and the new point cloud for which classification was not controlled as thoroughly. By using the same voxel grid for both generations, direct comparisons can be performed on the occupancy of voxels by the previous and the new classes. Based on the domain expert's criteria, an urgency level is assigned to all voxels: non-problematic, grey zone or problematic. Problematic voxels are then clustered into high priority areas. The summarized process is displayed in Figure 2.
Figure 2: Overview of the workflow for change detection and assignment of a criticality level to the detected changes."},{"location":"PROJ-QALIDAR/#2-data","title":"2. Data","text":""},{"location":"PROJ-QALIDAR/#21-lidar-point-clouds","title":"2.1 LiDAR point clouds","text":"The algorithm requires two temporally distinct acquisitions of the same area. Throughout the document, we refer to the first point cloud as v.1. It serves as reference data and is assumed to have a properly controlled classification. The subsequent point cloud, representing a new generation, is referred to as v.2.
"},{"location":"PROJ-QALIDAR/#211-choice-of-the-lidar-products","title":"2.1.1 Choice of the LiDAR products","text":"The swissSURFACE3D product was extensively controlled by swisstopo's LiDAR team before its publication. Therefore, its classification has the quality expected by the domain expert. It acted as the v.1, i.e as the generation of reference.
We thus needed to find some newer acquisition which fulfilled the following conditions:
For our v.2, we used the point cloud produced by the State of Neuch\u00e2tel, which covers the area within its cantonal borders. The characteristics of each point cloud are summarized in Table 1.
Table 1: Characteristics of swissSURFACE3D, used as v.1, and the LiDAR product of the State of Neuch\u00e2tel, used as v.2.
Characteristic | swissSURFACE3D | Neuch\u00e2tel
Acquisition period | 2018-19 | 2022
Planimetric precision | 20 cm | 10 cm
Altimetric precision | 10 cm | 5 cm
Spatial density | ~15-20 pts/m2 | ~100 pts/m2
Number of classes | 7 | 21
Dimension of one tile | 1000 x 1000 m | 500 x 500 m
Provided file format | LAZ | LAZ
"},{"location":"PROJ-QALIDAR/#212-area-of-interest","title":"2.1.2 Area of interest","text":"The delimitation of the LiDAR tiles used in this project is shown in Figure 3. We chose to work with tiles of the dimensions of the Neuch\u00e2tel data, i.e. 500 x 500 m. The tiles are designated by a letter that we refer to in the continuation of this document.
The tiles are located in the region of Le Locle. The zone covers an urban area, where quality control is the most time-consuming. It also possesses a variety of land covers, such as a large band of dense forest or agricultural fields.
Figure 3: Tiles used for the development of our method: A for a result control for the hyperparameter tuning, B for the choice of the voxel size and C for a control of the results by the domain expert."},{"location":"PROJ-QALIDAR/#22-annotations-by-the-domain-expert","title":"2.2 Annotations by the domain expert","text":"To understand the expected result, the domain expert controlled the v.2 point cloud in the region of Le Locle as if it was a new acquisition. A perimeter of around 1.2 km2 was controlled.
The problematic zones were each defined by a polygon with a textual description, as well as the current and the correct class as numbers. A sample of annotations is shown in Figure 4.
Figure 4: Controlled area (left) and examples of control annotations within the detail zone, with the reported error as color and with the original and the corrected class as labels (right).This provided us with annotations of areas where the point cloud data were incorrect. The annotations were used to calibrate the change detection.
It must be noted that this control was achieved without referring to the v.1 point cloud. In this case, we assume that v.1 contains no classification error and that the annotated areas therefore represent classification changes between the two generations.
"},{"location":"PROJ-QALIDAR/#3-method","title":"3. Method","text":""},{"location":"PROJ-QALIDAR/#31-correspondence-between-classes","title":"3.1 Correspondence between classes","text":"To compare the classes between generations, we needed to establish their correspondence. We selected the classes from the swisstopo point cloud, i.e the reference generation, as the common ground. Any added classes in the new generation must come from a subdivision of an existing class, as explained in the requirements for the v.2 point cloud. This is the case with Neuch\u00e2tel data. Each class from Neuch\u00e2tel data was mapped to an overarching class from the reference generation, in accordance with the inputs from the domain expert. The details of this mapping are given in table 2. Notice that the class Ground level noise received the label -1. It means that this class was not treated in our algorithm and every such point is removed from the point cloud. This was agreed with the domain expert as this class is very different from the class Low Point (Noise) and doesn't provide any useful information.
Table 2: Mapping between the v.2 and v.1 point clouds. The field \"Original ID\" provides the class number in v.2, the class name corresponds to the class description from the metadata, and the corresponding ID gives the class number in v.1 to which it is assigned.
Original ID | Class name | Corresponding ID
1 | Unclassified | 1
2 | Ground | 2
3 | Low vegetation | 3
4 | Medium vegetation | 3
5 | High vegetation | 3
6 | Building roofs | 6
7 | Low Point (Noise) | 7
9 | Water | 9
11 | Piles, heaps (natural materials) | 1
14 | Cables | 1
15 | Masts, antennas | 1
17 | Bridges | 17
18 | Ground level noise | -1
19 | Street lights | 1
21 | Cars | 1
22 | Building facades | 6
25 | Cranes, trains, temporary objects | 1
26 | Roof structures | 6
29 | Walls | 1
31 | Additional ground points | 2
41 | Water (synthetic points) | 9
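For illustration, the mapping of Table 2 written out as a plain Python dictionary (the constant name is ours, not the project's code):

```python
# v.2 class ID -> v.1 class ID, per Table 2; points of class 18 (Ground level
# noise) map to -1 and are removed from the point cloud.
CLASS_MAP = {
    1: 1, 2: 2, 3: 3, 4: 3, 5: 3, 6: 6, 7: 7, 9: 9,
    11: 1, 14: 1, 15: 1, 17: 17, 18: -1, 19: 1, 21: 1,
    22: 6, 25: 1, 26: 6, 29: 1, 31: 2, 41: 9,
}
```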
Figure 5: Reallocation of points from the v.2 classes (left) to the v.1 classes (right) for tile B with the class numbers from the second generation indicated between parenthesis.
As visible in Figure 5, seven classes were reassigned to class 1 Undefined. However, they represent a small part of the point cloud. The most important classes are ground, with ground and additional ground points in equal parts, vegetation, with mainly points in high vegetation, and building, with mainly points on building roofs.
"},{"location":"PROJ-QALIDAR/#32-voxelization-of-the-point-clouds","title":"3.2 Voxelization of the point clouds","text":"The method relies on the voxelization of both point clouds. As defined in Xu et al. (2021)7, voxels are a geometry in 3D space, defined on a regular 3D grid. They can be seen as the 3D equivalent to pixels in 2D. Figure 68 shows how a voxel grid is defined over a point cloud.
Figure 6: Representation of a point cloud (a) and its voxel grid (b), courtesy of Shi et al. (2018)."},{"location":"PROJ-QALIDAR/#321-preprocessing-of-lidar-tiles","title":"3.2.1 Preprocessing of LiDAR tiles","text":"It must be noted that the approach operated under the assumption that both point clouds were already projected in the same reference frame, and that the 3D positions of the points were accurate. We did not perform any point-set registration as part of the workflow, as the method focuses on finding errors of classification in the point cloud.
Before creating the voxels, the tiles were cropped to the size of the generation with the smallest tiling grid. Here, the v.1 tiles were cropped from 1000 x 1000 m to the dimensions of v.2, i.e. 500 x 500 m. A v.2 tile corresponds exactly to one quarter of a v.1 tile, so no additional operations were needed.
"},{"location":"PROJ-QALIDAR/#323-voxelization-process","title":"3.2.3 Voxelization process","text":"In the interest of keeping our solution free of charge for users, and to have greater flexibility in the voxelization process, we chose to develop our own solution, rather than use pre-existing tools.
We used the Python libraries laspy and pandas. Given a point cloud provided as a LAS or LAZ file, our script returns a table with one row per voxel. The voxels are identified by their center coordinates. In addition, the columns provide the number of points of each class contained within the voxel for each generation. Figure 7 shows a visual representation of the voxelization process for one voxel element; a condensed sketch of this step is given below.
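A condensed sketch of this voxelization with laspy and pandas, under the grid defined in Section 3.3 (function and column names are ours, not the project's actual API):

```python
import laspy
import numpy as np
import pandas as pd

VOXEL_SIZE = 1.5  # edge length in meters (see Section 3.3)

def voxelize(las_path, generation):
    """Count the points per class in each voxel of a regular grid."""
    las = laspy.read(las_path)
    df = pd.DataFrame({
        "i": np.floor(np.asarray(las.x) / VOXEL_SIZE).astype(int),
        "j": np.floor(np.asarray(las.y) / VOXEL_SIZE).astype(int),
        "k": np.floor(np.asarray(las.z) / VOXEL_SIZE).astype(int),
        "cls": np.asarray(las.classification),
    })
    counts = df.value_counts(["i", "j", "k", "cls"]).unstack("cls", fill_value=0)
    return counts.add_prefix(f"{generation}_class_")

# Joining both generations on the voxel index gives one comparable row per voxel:
# voxels = voxelize("tile_v1.laz", "v1").join(voxelize("tile_v2.laz", "v2"), how="outer")
```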
Figure 7: Summarized process for the creation of one voxel in the v.1 (left) and the v.2 (right) generation from the point cloud to the class distribution as a vector. The class distribution is saved for both generations in a table."},{"location":"PROJ-QALIDAR/#33-determination-of-the-voxel-size","title":"3.3 Determination of the voxel size","text":"The voxels must be sized to efficiently locate areas of change without being sensitive to negligible local variations in point location and density.
We assumed that although a point cloud changes between two generations, the vast majority of its features would remain consistent on a tile of 500 x 500 m. Following this hypothesis, we evaluated how the voxel size influenced the proportion of voxels not filled with the same classes in two separate generations. We called this situation a \"categorical change\". A visual example is given in Figure 8.
Figure 8: Example of a situation with no categorical change (left) and a second situation with a categorical change (right). When the proportion of voxels presenting a categorical change was calculated for different voxel sizes, it rose drastically around a size of 1.5 m, as visible in Figure 9. We postulated that this is the minimum voxel size allowing changes to be observed without interference from the noisy nature of point clouds.
Figure 9: Proportion of categorical changes for different voxel sizes in tile B. The horizontal axis is the voxel size. The vertical axis represents the percentage of voxels experiencing a categorical change between the two generations. For the rest of the development process, cubic voxels with a 1.5 m edge are used. However, the voxel width and height can be modified in the scripts if desired.
"},{"location":"PROJ-QALIDAR/#34-criticality-tree","title":"3.4 Criticality tree","text":"The algorithm must not only detect changes, but also assign them a criticality level. We translated the domain expert's criteria into a decision tree, which sorts the voxels into different criticality levels for control. The decision tree went through several iterations, in a dialogue with the domain expert. Figure 10 provides the final architecture of the tree.
Figure 10: Decision tree used to classify the voxels based on the different types of changes and their criticality.The decision tree classifies the voxels into three buckets of criticality level: \"non-problematic\", \"grey zone\" and \"problematic\".
Let us note that although only three final buckets are output, we preserved an individual number for each outgoing branch of the criticality tree, as these numbers provide more detailed information. They are referred to as \"criticality numbers\".
The decisions of the criticality tree are divided into two major categories: some are based on qualitative criteria, which are by definition true or false; others depend on thresholds which had to be defined.
"},{"location":"PROJ-QALIDAR/#341-qualitative-decisions","title":"3.4.1 Qualitative decisions","text":"Decision A: Is there only one class in both generations and is it the same? Every voxel that contains a single, common class in both generations is automatically identified as non-problematic.
Decision B: Is noise absent from the new generation? Any noise presence is possibly an object wrongly classified and necessitates a control. Any voxel containing noise in the new generation is directed to the \"problematic\" bucket.
Decision G: Is the change a case of complete appearance or disappearance of a voxel? If the voxel is only present in one generation, it is part of a new or disappearing geometry that might or might not be problematic, depending on decisions H and J. If the voxel is present in both generations, we are facing a change in the class distribution due to new classes in it; decision I then compares the voxel with its neighbors to determine whether it is problematic.
Decision J: Is it the specific case of building facade or vegetation? Due to the higher point density in the v.2 point cloud, point proportions may change in voxels compared to the v.1 point cloud, even though the geometry already existed. We particularly noticed this on building facades and under dense trees, as shown in the example given in Figure 11. To avoid classifying these detections as problematic, a voxel with an appearance of points in the class building or vegetation is not problematic if it is located under a non-problematic voxel containing points of the same class.
Figure 11: Example of non-problematic appearance of points in the v.2 point cloud due to the difference of density between the two generations."},{"location":"PROJ-QALIDAR/#342-threshold-based-decisions","title":"3.4.2 Threshold based decisions","text":"The various thresholds were set iteratively by visualizing the results on tile A and comparing them with the expert's annotations described in Section 2.2. Once the global result seemed satisfying, we assessed the criticality labels for a subset of voxels: eight voxels were selected randomly for each criticality number. Given that there are 13 possible outcomes, 104 voxels were evaluated. A first evaluation was performed on tile A without the input of the domain expert; it allowed for the hyperparameter tuning. A second evaluation was conducted by the domain expert on tile C, and he declared that no further adjustment of the thresholds was necessary.
Cosine similarity
The decisions C, D and E require evaluating the similarity between the distributions of the previous and the new classes occupying a voxel. We thus sought a metric adapted to comparing two distributions. Many ways exist to measure the similarity between two distributions9. We settled for the well-known cosine similarity. Given two vectors X and Y, it is defined as: \(\text{Cosine Similarity}(\mathbf{X}, \mathbf{Y}) = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\| \|\mathbf{Y}\|}\)
This metric measures the angle between two vectors. The magnitude of the vectors holds no influence on the results. Therefore, this measure is unaffected by the density of the point clouds. The more the two vectors point in the same direction, the closer the metric is to one. Vectors having null cosine similarity correspond to voxels where none of the classes present in the previous generation match those from the new one.
One limitation of the cosine similarity is its requirement for both vectors to be non-zero. For cases where a voxel is only occupied in a single generation, an arbitrary cosine similarity of -1 is set.
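A short sketch of this similarity with the -1 convention (the function name is ours, not from the project's scripts); it reproduces the value of Graph 1 further below:

```python
import numpy as np

def class_cosine_similarity(v1_counts, v2_counts):
    """Cosine similarity between the per-class point counts of a voxel in the
    two generations; -1 by convention if the voxel is empty in one of them."""
    x, y = np.asarray(v1_counts, float), np.asarray(v2_counts, float)
    if not x.any() or not y.any():
        return -1.0
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

print(class_cosine_similarity([0, 0, 4, 2, 0, 0, 7],
                              [25, 0, 20, 0, 0, 5, 40]))  # ~0.84, as in Graph 1
```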
Decision C: Do the class proportions stay similar and do the classes remain the same? We assessed whether the proportions of the classes stay similar between generations. A threshold of 0.8 is set on the cosine similarity.
flowchart LR\n A[Prev. gen.<br> 0 | 0 | 4 | 2 | 0 | 0 | 7] --> E(Cosine similarity)\n B[New gen.<br> 25 | 0 | 20 | 0 | 0 | 5 | 40] --> E\n\n E-->F[0.84]
Graph 1: Example of vectors and their resulting cosine similarity when considering all the classes. Decision D: Do the previous classes keep the same proportions? We computed the cosine similarity based only on the vector elements which are non-empty in the previous generation. A threshold of 0.8 is set as the limit.
Let us note that voxels present only in one of the two generations are here artificially considered to retain the same class proportion. They are treated further down the decision tree by the decision G.
flowchart LR\n A[Prev. gen.<br> 0 | 0 | 4 | 2 | 0 | 0 | 7] --> C[4 | 2 | 7]\n B[New gen.<br> 25 | 0 | 20 | 0 | 0 | 5 | 40] --> D[20 | 0 | 40]\n C-->E(Cosine similarity)\n D-->E\n E-->F[0.97]
Graph 2: Example of vectors and their resulting cosine similarity when considering only the classes present in the reference generation v.1. Decision E: Is the change due to class 1? We assessed whether the change is due to the influence of the unclassified points (class 1). To do so, we computed the cosine similarity with all vector elements except the first one, which corresponds to the unclassified points. If the cosine similarity was low when considering all vector elements (decision C), but high when discarding the quantity of unclassified points, this indicates that the change is due to class 1. A threshold of 0.8 is set as the limit.
flowchart LR\n A[Prev. gen.<br> 0 | 0 | 4 | 2 | 0 | 0 | 7] --> C[0 | 4 | 2 | 0 | 0 | 7]\n B[New gen.<br> 25 | 0 | 20 | 0 | 0 | 5 | 40] --> D[0 | 20 | 0 | 0 | 5 | 40]\n C-->E(Cosine similarity)\n D-->E\n E-->F[0.96]
Graph 3: Example of vectors and their resulting cosine similarity when excluding the first class. Decision F: Is class 1 presence low in the new generation? In the case where the change is due to the unclassified points, we wished to evaluate whether such points are present in large quantity in the new voxel occupancy. Because the number of points depends on the density of the new point cloud, we cannot simply set a threshold on the raw count. To solve this issue, we normalize the number of unclassified points in the voxel, \\(n_{unclassified}\\). Let \\(N_{reference}\\) and \\(N_{new}\\) be the total number of points in the v.1 and v.2 point clouds respectively. The normalized number of unclassified points \\(\\tilde{n}_{unclassified}\\) is defined as:
\\[\\tilde{n}_{unclassified} = n_{unclassified} \\cdot \\frac{N_{reference}}{N_{new}} \\] An arbitrary threshold of 1 is set on \\(\\tilde{n}_{unclassified}\\). Under this threshold, the presence of class 1 is considered low in the new generation.
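As a short illustration of this normalization (the point-cloud totals below are hypothetical):

```python
def normalized_unclassified(n_unclassified, n_reference, n_new):
    # n_tilde = n_unclassified * N_reference / N_new
    return n_unclassified * n_reference / n_new

# Hypothetical totals: the v.2 point cloud is four times denser than v.1
n_tilde = normalized_unclassified(3, n_reference=2_000_000, n_new=8_000_000)
print(n_tilde < 1)  # True: 0.75 < 1, so class 1 presence is considered low
```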
Decision H & I: Do the neighboring voxels share the same characteristics? For both decisions, we searched the neighbors of a given voxel to evaluate whether they share the same characteristics. To make this neighbor search efficient, we built a KD-tree from the voxel locations. For each voxel, we then assessed whether the neighbors shared the same classes: each class of the evaluated voxel must be present in at least one neighbor. The search radius determines the number of voxels used for comparison. Let \\(x\\) be the voxel edge length; using search radii of \\(x\\), \\(\\sqrt{2}x\\) or \\(\\sqrt{3}x\\) leads to considering 6, 18 or 26 neighbors respectively, as displayed in Figure\u00a01210. Note that the radius is not limited to these options and searching among farther adjacent voxels is possible.
Figure 12: Possible connectivity types to define the neighborhood of a voxel, from the website brainvisa.info. In the case where the voxel is only present in one generation, i.e. for decision H, the neighbors considered are the following:
In the case where the class distribution changes due to new classes being present in the voxel compared to v.1, i.e. for decision I, the class distribution of the voxel in v.2 is compared to that of its neighbors in v.2. Therefore, if the entire area shares the same classes, the voxel is classified in the grey zone, but if the change is isolated, it goes into the "problematic" bucket.
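As an illustration of the neighbor search described above, a minimal sketch with SciPy's cKDTree could look as follows; the 1.5 m edge length is consistent with the 3.375 m3 voxels mentioned in section 5.2, and the function name is ours:

```python
import numpy as np
from scipy.spatial import cKDTree

VOXEL_EDGE = 1.5  # metres; voxel edge length x

def neighbor_indices(voxel_centers, radius=np.sqrt(2) * VOXEL_EDGE):
    """Indices of the neighbors of each voxel located within `radius`.

    A radius of sqrt(2)*x yields the 18-connectivity of Figure 12;
    x and sqrt(3)*x would yield 6 and 26 neighbors respectively.
    """
    tree = cKDTree(voxel_centers)
    # A tiny epsilon guards against floating-point issues at the boundary;
    # query_ball_point also returns the voxel itself, dropped below.
    hits = tree.query_ball_point(voxel_centers, r=radius + 1e-9)
    return [[j for j in h if j != i] for i, h in enumerate(hits)]
```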
"},{"location":"PROJ-QALIDAR/#343-description-of-the-grey-zone-and-problematic-buckets","title":"3.4.3 Description of the \"grey zone\" and \"problematic\" buckets","text":"We provide a brief description of the output for each branch of the decision tree ending in the grey zone and problematic buckets. They are identified by their criticality number. Let us note that the criticality numbers are not a ranking of the voxel priority level for a control, but identifiers for the different types of change.
Grey zone:
Problematic:
Voxels ending in the \"grey zone\" and \"problematic\" buckets were often isolated. This creates a noisy map, making its usage for quality control challenging. To provide a less granular change map, we chose to cluster the change detections, highlighting only areas with numerous problematic detections in close proximity. In practice, we leveraged the DBSCAN algorithm. Then, the smallest clusters were filtered out and their cluster number is set to one. They are not treated as clusters in the rest of the processing. The hyperparameters for the clustering process are shown in Table 3. They were determined by the expert through the visualization of the results. The epsilon parameter was chosen to correspond to a neighborhood of 18 voxels, as illustrated on Figure 12.
Table 3: Hyperparameters used for the DBSCAN clustering and the filtering of the clusters.
Hyperparameter Description Value Epsilon radius of the neighborhood for a given voxel in meters 2.13 Minimum number of samples minimum number of problematic voxels in the epsilon neighborhood for a voxel to be a core point of the cluster 5 Minimum cluster size minimum number of voxels needed inside a cluster for it to be preserved 10 The clusters should be controlled in priority; they form the primary control. Voxels outside a cluster go into the secondary control, as illustrated in the schema of the workflow in Figure 13. The cluster number of those voxels is set to zero.
Figure 13: Schema of the additional step of clustering for the problematic voxels and assignment of the voxels falling inside a cluster to the primary control. All problematic voxels went through this DBSCAN algorithm at once, without distinction based on the criticality number. That way, detections related to the same geometry were grouped together even if their voxels are not all labeled with the same criticality number. In the end, the label most present inside a cluster is attributed to it.
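For illustration, the clustering and filtering steps could be sketched as follows with scikit-learn's DBSCAN; the cluster-number convention (zero outside clusters, one for filtered-out clusters) follows the description above, while the function itself is a sketch rather than the project code:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import DBSCAN

def cluster_problematic_voxels(centers, criticality, eps=2.13,
                               min_samples=5, min_cluster_size=10):
    """Cluster problematic voxel centers and give each kept cluster the
    majority criticality label of its voxels.

    Cluster numbers follow the convention above: 0 for voxels outside any
    cluster (secondary control), 1 for voxels of filtered-out small
    clusters, and 2+ for the clusters kept in the primary control.
    """
    raw = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centers)
    cluster_ids = np.zeros(len(centers), dtype=int)  # 0: DBSCAN noise
    majority = {}
    next_id = 2
    for lab in sorted(set(raw) - {-1}):
        members = np.flatnonzero(raw == lab)
        if len(members) < min_cluster_size:
            cluster_ids[members] = 1                 # filtered-out cluster
        else:
            cluster_ids[members] = next_id
            # the most frequent criticality number labels the cluster
            majority[next_id] = Counter(criticality[m] for m in members).most_common(1)[0][0]
            next_id += 1
    return cluster_ids, majority
```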
"},{"location":"PROJ-QALIDAR/#36-visualization-of-detections","title":"3.6 Visualization of detections","text":"Several possibilities were considered for the visualization of the results, as shown in Figure 14.
Figure 14: Comparison of a voxel mesh in green (a), a LAS point cloud (b), and a shapefile with the most represented criticality number of the cluster (c) for the visualization of the detections. The clusters in the point cloud and shapefile are colored in orange and blue depending on their criticality number. The v.2 point cloud is visible as background. Table 4 shows the advantages and drawbacks of the different methods. In the end, the domain expert required that we provide the results as a shapefile.
Table 4: Comparison of the visualization methods Voxel mesh LAS point cloud shapefile 2D representation of the space occupied by voxels Yes No Yes 3D representation of the space occupied by voxels Yes No No Visualization of the voxel height Yes Yes No Numerical attributes No Yes Yes Textual attributes No No Yes
"},{"location":"PROJ-QALIDAR/#4-results","title":"4. Results","text":""},{"location":"PROJ-QALIDAR/#41-granularity-of-results","title":"4.1 Granularity of results","text":"Figure 15 shows the voxels produced by the algorithm for the different priority levels. From the base with all the created voxels, each level reduces the number of considered voxels. At the last level, the clustering effectively reduces the dispersion of voxels, keeping only clearly defined groups.
Figure 15: Voxels by their center coordinates in a point cloud for the different levels of priority for tile C. Going from the clustered detections at the top to all the voxels at the bottom.Table 5 gives the number and percentage of voxels retained at each level. For tile C, the voxels falling into the \"grey zone\" and the \"problematic\" buckets represent 14.86% of all voxels. If only the problematic ones are retained, this percentage is reduced to 4.77%. Finally, after removing the voxels which do not belong to a cluster, only 2.30% remains.
Meanwhile, the covered part of the tile decreases from 35.80% with all the problematic and grey-zone voxels, to 11.89% with only the problematic voxels, and to 4.53% with only the clustered voxels. In the end, an expert controlling the classification would have to check in priority about 5% of the total tile area.
Table 5: Number of voxels preserved in each urgency level on tile C. Urgency level Number of voxels Percentage of all voxels Covered tile area Clustered detections 8'756 2.30 % 4.53 % Problematic detections 18'146 4.77 % 11.89 % Problematic + grey zone detections 56'363 14.83 % 35.80 % All voxels 380'165 100 % 100 %
The percentage of voxels and the covered tile area decrease considerably between each granularity level. The higher the granularity, the larger the difference in voxel number and covered area between two levels. The covered tile area decreases more slowly than the percentage of voxels retained.
"},{"location":"PROJ-QALIDAR/#42-distribution-of-the-decision-tree-outcomes","title":"4.2 Distribution of the decision tree outcomes","text":""},{"location":"PROJ-QALIDAR/#421-distribution-of-the-points-in-the-criticality-numbers-and-buckets","title":"4.2.1 Distribution of the points in the criticality numbers and buckets","text":"Figure 16 shows the percentage of points from the new point cloud coming out of each branch of the decision tree. The distribution between criticality buckets is given at the top of the figure. The vast majority of points belongs to non-problematic voxels, with around 80% of them being from the first tree branch. This corresponds to the case where only one class is present in both generations. We notice that 10% of the points end up in voxels assigned to the grey zone. It is mostly due to the output of the 8th tree branch. For this specific tile, 1.81% of points from the new point cloud end up in problematic voxels. Let us note that no point ends up in voxels with the 4th and 9th criticality number. This is because those correspond to case of geometry disappearances.
Figure 16: Relative distribution of the points from the new point cloud depending on the criticality number and bucket of their voxel. Results for tile A.Figure 17 shows the same plot, but with the v.1 point cloud for tile C. In that case the percentage of non-problematic points is smaller than for tile A, with more points falling in the \"grey zone\" and \"problematic\" buckets, but the overall trend stays similar. The only changes over 1% are for criticality numbers 1 (-6.99 points), 8 (+5.08 points), and 12 (+1.44 points). Fewer voxels present one same class across generations, marked with the criticality number 1. More voxels present a change in the distribution. This change can be non-problematic if due to the presence of extra classes in the voxel and reflected by the neighboring voxels (criticality number 8). It is problematic if there is a drastic change in the distribution of all classes in the voxel (criticality number 12).
Figure 17: Relative distribution of the points from the new point cloud depending on the criticality number and bucket of their voxel. Results for tile C."},{"location":"PROJ-QALIDAR/#422-distribution-of-the-criticality-numbers-in-the-clusters","title":"4.2.2 Distribution of the criticality numbers in the clusters","text":"Figure\u00a018 displays a sample of clusters as an example. These are shown as a shapefile, which is the visualization format required by the domain expert. One cluster (#1) indicates the disappearance of a tree. Another cluster (#2) designates an appearance. Upon closer examination, the voxels contributing to the cluster comprise different types: \"appearance\" and \"class change\". The most present label is assigned to the cluster. Finally, two zones with differences in classification are highlighted: one (#3) for a building structure going from class unclassified to building, and the other (#4) for a shed going from unclassified to vegetation.
Figure 18: Example of resulting clusters with the corresponding point cloud for the reference generation (v.1) and the uncontrolled generation (v.2). The 8'756 problematic voxels for the primary control are grouped in 263 individual clusters. The distribution of clusters and voxels among the criticality numbers is given in Table 6. Among the clusters, 67% contain mostly voxels with the criticality number 12, meaning that there is a major change in the class distribution for the delineated area. Then, 13% and 12% of the clusters are dominated by a geometry appearance and disappearance respectively. Only 7% of the clusters are dominated by an occurrence of the noise class. It is normal that no cluster is tagged with the criticality number 11, because it is by definition assigned to isolated class changes.
The criticality number 12 is the most present among the clustered voxels. However, its percentage decreases by 18 points when going from the cluster scale to the voxel scale. On the other hand, the presence of the criticality number 9 increases by 15 points at the voxel scale compared to the cluster scale. The other percentages remain stable.
Table 6: Number of clusters and number of voxels in the primary control for each criticality number on tile C. Criticality number and its description Distribution in the clusters Distribution in the voxels in the primary control 9. Appearance of a geometry 13.31 % 27.91 % 10. Disappearance of a geometry 12.17 % 16.47 % 11. Isolated minor class change 0 % 0.13 % 12. Major change in the class distribution 67.30 % 49.63 % 13. Noise 7.22 % 5.86 %
"},{"location":"PROJ-QALIDAR/#423-distribution-of-the-lidar-classes-in-the-criticality-buckets","title":"4.2.3 Distribution of the LiDAR classes in the criticality buckets","text":"Figure 19 shows the distribution of the LiDAR classes in the criticality buckets. We see that for the three main classes of this tile, ground, vegetation and building, the vast majority of points fall in non-problematic voxels, with the ground class having a higher proportion of points falling in \"grey zone\" voxels than the others. Unclassified points fall predominantly in the grey zone voxels. The grey zone gets a lot of voxels due to the decision C of the criticality tree, which requires that the voxels share the same classes in both generations. It is difficult for voxels to end up in the \"non-problematic\" bucket, if they did not pass the decision C. All points classified as noise end up in the problematic bucket, as required by the domain expert. Finally, points from the bridge class fall in \"problematic\" and \"grey zone\" voxels. This class is, however, in very low quantity in the new point cloud (only 0.014% of all points) and is thus not statistically significant.
Figure 19: Distribution of the points among criticality buckets relative to their LiDAR class, as well as the percentage represented by each class in the point cloud. Let us note that the results are for the v.2 point cloud on tile C and that no point was classified as water for this tile.
"},{"location":"PROJ-QALIDAR/#43-assessment-of-a-subset-of-detections","title":"4.3 Assessment of a subset of detections","text":"As mentioned in section 3.4.2, 104 voxels were evaluated on tile C, i.e 13 per criticality number. Per the expert review, all the non-problematic and \"grey zone\" voxels were deemed rightfully attributed. However out of the 40 selected problematic voxels, nine detections did not justify their status. Three of those were for cases of appearance and disappearance of geometry. Out of those, two were due to an isolated change of density in the area of the voxel, a situation which can occur in vegetated areas. The other six came from the tree branch 11, which detects small changes that are not present in the neighboring voxels. After discussion with the domain expert, it was agreed that such changes still needed to be classified as problematic, but due to their isolated nature, would not be checked as a priority. After the implementation of the clustering via the DBSCAN algorithm, these voxels of criticality number 11 and isolated changes in vegetation are filtered out.
"},{"location":"PROJ-QALIDAR/#5-discussion","title":"5. Discussion","text":""},{"location":"PROJ-QALIDAR/#51-interpretation-of-the-results","title":"5.1 Interpretation of the results","text":"In Section 4.1, the voxel count for the different granularity levels highlights the number of detections that would have to be controlled at each level. For the clustered detections, which would be the principal mapping to use, only 2.30% of all evaluated voxels are to be controlled. It represents 4.53% of the tile area. The domain expert confirmed that the final amount of voxel to control is reasonable and would allow saving resources compared to the actual situation.
For each granularity level, the percentage of the tile area covered is 2 to 3 times higher than the percentage of voxels considered. It means that, between granularity levels, some of the eliminated voxels do not impact the covered tile area. The reason must be that the area is a 2D measurement while the voxels are positioned in 3D space and can cover the same area by belonging to the same grid column. The voxels of a same column must frequently be assigned to different criticality buckets. Therefore, the covered tile area decreases more slowly than the percentage of voxels considered.
From the results obtained in Section 4.2.1, we see that the vast majority of points from the new point cloud end up in non-problematic voxels. The number of points falling in problematic voxels is limited, which is desired, as a high quantity of problematic detections would not help in making the quality assessment faster. We notice, however, a relatively large number of points falling in voxels classified as \"grey zone\", due to the 8th tree branch. These voxels typically exhibit high similarity in their distribution between v.1 and v.2, but do not retain precisely the same classes. Decision C therefore excludes them from a quick assignment to the non-problematic voxels. Such a situation occurs, for example, if a few points of vegetation appear in a zone previously filled only with ground points. This situation generally is not a classification error and reflects the reality of the terrain. However, if it were a widespread classification problem, it would need to be raised to the controller. This is why we preserve those rules which lead to a lot of \"grey zone\" detections instead of redirecting them to \"non-problematic\".
In Section 4.2.1, results are presented for tiles A and C in Figures 16 and 17 respectively. Tile A has fewer voxels in the \"problematic\" and \"grey zone\" buckets than tile C. This is in accordance with our expectation that urban zones would have more detected changes, as they evolve faster than other areas and have complex landscapes to classify.
The numbers of Section 4.2.2 show that the majority of clusters, as well as the majority of the voxels in clusters, have the criticality number 12, indicating a major change in the class distribution. It is a satisfying point, as the variations of the classification across generations were the main focus of this work. Let us note, however, that this criticality number dominates 67% of the clusters, but only 50% of the voxels in clusters are assigned to it. On the other hand, the criticality number 9, standing for the appearance of a geometry, represents 28% of the voxels present in clusters while it represents only 13% of the clusters. Two possibilities can explain that: the clusters with a geometry appearance are larger than the ones with a major change in the class distribution, or this type of voxel is more present in clusters that were assigned to another criticality number.
Results of Section 4.2.3 show that the points of the three main LiDAR classes are assigned predominantly to the \"non-problematic\" bucket, which makes the map usable. The majority of the unclassified points are deemed \"grey zone\". Because these points comprise, among other things, mobile and temporary objects, it is not desirable that every such appearance or disappearance ends up in the primary control. However, geometries which transform from a given class to unclassified, or the opposite, are problematic. That situation happens quite often, as indicated by the 17.43% of points ending up in this level. For the bridge class, none of the points fall in \"non-problematic\" because, in this specific tile, a small zone was classified as bridge in v.2 while no point of that class is present in v.1.
Finally, from the evaluation by the domain expert described in Section 4.3, we understand that the voxels are correctly classified into their criticality level, except for some minor cases. Some of the problematic voxels were not rightfully attributed. Even so, six out of nine of those voxels had the criticality number 11, whose detections are removed when applying the clustering. This sample evaluation instills confidence that the level of urgency attributed to the voxels corresponds well to the situation contained within, making it relevant for usage in a control of the classification.
"},{"location":"PROJ-QALIDAR/#52-discussion-of-the-results","title":"5.2 Discussion of the results","text":"As seen in the previous section, the proposed method generates a somewhat reasonable amount of problematic detections, accompanied by a considerable volume of instances falling within the \"grey zone\". The map for this intermediate level may not be suitable for initial quality control but can offer a more detailed delineation for precise assessment. The map of non-problematic voxels could also be used to highlight the areas requiring no quality assessment given the absence of changes in the distribution.
The proportion of points identified as problematic is very low (1-4%). However, their visual representation can be overwhelming for the controller, given the high number of scattered detections. To address this challenge, we introduced the clustering and filtering of detections. Though this allows for visually more understandable areas, it naturally sacrifices the exhaustiveness of the detections. For example, low walls and hedges were frequently classified differently between the v.1 and the v.2. Due to the clustering favoring areas with grouped elements, such elongated and thin objects can be cut out of the mapping. Possible future works could study other filtering methods to attenuate this issue.
Currently, no full assessment of the detections on a tile was performed. Therefore, it is hard to estimate the quantity of detections that would be missing in clusters or in the \"problematic\" bucket.
Regarding the precision of the results, the evaluation of the small subset of detections by the domain expert indicates that they are relevant and potentially useful as a tool for quality assessment.
While the developed method allows for finding changes between two point clouds, it has some limitations. First, it only works if the classes from v.2 can be mapped to overarching classes in v.1, which is not always the case due to the lack of consensus between LiDAR providers. Another limitation comes from the voxel size. Indeed, by employing fixed volumes of 3.375 m3 to detect the changes, points not contributing to the actual change will also be included in the highlighted areas. A possible improvement would be to refine the detection area after the clustering. Another thing to consider is that the method works on a single tile at a time, without consideration of the surrounding tiles. This can potentially affect the clustering step, as voxels on the border have fewer neighbors. To ensure that this does not affect the results, a buffer could be taken around the tile. This buffer could also ensure that the total tile size is a multiple of the voxel size. The method is currently limited to the use of a single reference generation. However, with the frequent renewal of LiDAR acquisitions, it should soon be possible to compare several generations with a new acquisition. The decision tree could then be adapted to take into account the stability of the classification in the voxels and prioritize changes in stable areas over areas with high variation, such as forests.
"},{"location":"PROJ-QALIDAR/#6-conclusion","title":"6. Conclusion","text":"Quality assessment of LiDAR classification is a demanding task, requiring a considerable amount of work by an operator. With the proposed method, controllers can leverage a previous acquisition to highlight changes in the new one. The detections are divided in different levels of urgency, allowing for control of various granularity levels.
The limited number of voxels preserved in the map of primary changes encourages the prospect of its usefulness in a quality assessment process. The positive review of a sample of voxels by the domain expert further confirms the method's quality.
A possible step to make the detections more suited to experts' specific needs could be to review a broader sample of voxels per criticality bucket in order to optimize the thresholds of the decision tree.
In the near future, the clusters produced by the algorithm will be tested on tiles in another region and with other LiDAR data. If the results are deemed satisfying, the method will be tested in swisstopo's workflow when production for the next generation of swissSURFACE3D begins. The test in the workflow should enable a control of the detection precision and exhaustiveness, as well as an estimation of the time spared by an operator working with the developed algorithm.
According to the domain expert's evaluation, the developed method touches on operations which make up 52.7% of the control time in a quality assessment. These operations could be made faster by having zones of interest already precomputed.
"},{"location":"PROJ-QALIDAR/#7-acknowledgements","title":"7. Acknowledgements","text":"This project was made possible thanks to the swisstopo's LiDAR team that submitted this task to the STDL and provided regular feedback. Special thanks are extended to Florian Gandor for his expertise and his meticulous review of the method and results. In addition, we are very appreciative of the active participation of Matthew Parkan and Mayeul Gaillet to our meetings.
"},{"location":"PROJ-QALIDAR/#8-bibliography","title":"8. Bibliography","text":"Xin Wang, HuaZhi Pan, Kai Guo, Xinli Yang, and Sheng Luo. The evolution of LiDAR and its application in high precision measurement. IOP Conference Series: Earth and Environmental Science, 502(1):012008, May 2020. URL: https://iopscience.iop.org/article/10.1088/1755-1315/502/1/012008 (visited on 2024-02-20), doi:10.1088/1755-1315/502/1/012008.\u00a0\u21a9
swissSURFACE3D. URL: https://www.swisstopo.admin.ch/fr/modele-altimetrique-swisssurface3d#technische_details (visited on 2024-01-16).\u00a0\u21a9
Uwe Stilla and Yusheng Xu. Change detection of urban objects using 3D point clouds: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 197:228\u2013255, March 2023. URL: https://linkinghub.elsevier.com/retrieve/pii/S0924271623000163 (visited on 2023-10-05), doi:10.1016/j.isprsjprs.2023.01.010.\u00a0\u21a9
Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep Learning for 3D Point Clouds: A Survey. June 2020. arXiv:1912.12033 [cs, eess]. URL: http://arxiv.org/abs/1912.12033 (visited on 2024-01-18).\u00a0\u21a9
Harith Aljumaily, Debra F. Laefer, Dolores Cuadra, and Manuel Velasco. Voxel Change: Big Data\u2013Based Change Detection for Aerial Urban LiDAR of Unequal Densities. Journal of Surveying Engineering, 147(4):04021023, November 2021. Publisher: American Society of Civil Engineers. URL: https://ascelibrary.org/doi/10.1061/%28ASCE%29SU.1943-5428.0000356 (visited on 2023-11-20), doi:10.1061/(ASCE)SU.1943-5428.0000356.\u00a0\u21a9
J. Gehrung, M. Hebel, M. Arens, and U. Stilla. A voxel-based metadata structure for change detection in point clouds of large-scale urban areas. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2:97\u2013104, May 2018. Conference Name: ISPRS TC II Mid-term Symposium \"Towards Photogrammetry 2020\" (Volume IV-2), 4–7 June 2018, Riva del Garda, Italy. Publisher: Copernicus GmbH. URL: https://isprs-annals.copernicus.org/articles/IV-2/97/2018/isprs-annals-IV-2-97-2018.html (visited on 2024-01-18), doi:10.5194/isprs-annals-IV-2-97-2018.\u00a0\u21a9
Yusheng Xu, Xiaohua Tong, and Uwe Stilla. Voxel-based representation of 3D point clouds: Methods, applications, and its potential use in the construction industry. Automation in Construction, 126:103675, June 2021. URL: https://www.sciencedirect.com/science/article/pii/S0926580521001266 (visited on 2024-01-18), doi:10.1016/j.autcon.2021.103675.\u00a0\u21a9
Zhenwei Shi, Zhizhong Kang, Yi Lin, Yu Liu, and Wei Chen. Automatic Recognition of Pole-Like Objects from Mobile Laser Scanning Point Clouds. Remote Sensing, 10(12):1891, 2018. Number: 12, Publisher: Multidisciplinary Digital Publishing Institute. URL: https://www.mdpi.com/2072-4292/10/12/1891 (visited on 2024-01-19), doi:10.3390/rs10121891.\u00a0\u21a9
Sung-Hyuk Cha. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences, 4(1):300\u2013307, 2007. URL: https://pdodds.w3.uvm.edu/research/papers/others/everything/cha2007a.pdf (visited on 2024-03-14).\u00a0\u21a9
[Volume of Labels] Compute Clique Statistics. URL: https://brainvisa.info/axon/fr/processes/AtlasComputeCliqueFromLabels.html (visited on 2024-02-23).\u00a0\u21a9
Nils Hamel (UNIGE) - Huriel Reichel (swisstopo)
Proposed by the Federal Statistical Office - TASK-REGBL December 2020 to February 2021 - Published on March 2, 2021
Abstract: The Swiss Federal Statistical Office is in charge of the national Register of Buildings and Dwellings (RBD), which keeps track of every existing building in Switzerland. Currently, the register is being completed with buildings in addition to regular dwellings to offer a reliable and official source of information. The completion of the register introduced issues due to missing information and the difficulty of collecting it. The construction year of the buildings is missing for a large number of register entries. The Statistical Office mandated the STDL to investigate the possibility of using the Swiss national maps to extract this missing information through an automated process. Research was conducted in this direction, with the development of a proof-of-concept and a reliable methodology to assess the obtained results.
"},{"location":"PROJ-REGBL/#introduction","title":"Introduction","text":"The Swiss Federal Statistical Office [1] is responsible of maintaining the Federal Register of Buildings and Dwellings (RBD) in which a collection of information about buildings and homes are stored. Currently, a completion operation of the register is being conducted to include to it any type of construction on the Swiss territory.
Such a completion operation comes with many challenges, including the gathering of the information related to the constructions currently being integrated into the register. Among this information are the construction years of the buildings. This information is important to efficiently characterise each Swiss building and to allow the Statistical Office to provide a reliable register to all actors relying on it.
The construction year of buildings turns out to be complicated to gather, as adding new buildings to the register already imposes a significant workload, even for the simplest information. In addition, in many cases, the construction year of the building is missing or cannot be easily collected to update the register.
The Statistical Office mandated the STDL to research the possibility of automatically gathering the construction year by analysing the swisstopo [3] National Maps [4]. Indeed, the Swiss national maps are known for their excellence, their availability for any geographical area, and their temporal coverage. The national maps have been made with a rigorous and well-controlled methodology since the 1950s and can therefore be used as a reliable source of information to determine the buildings' construction year.
The STDL was then responsible for performing the research and developing a proof-of-concept, providing the Statistical Office with all the information needed to make an informed decision on whether national maps can be considered a reliable way of assigning a construction year to the buildings lacking this information.
"},{"location":"PROJ-REGBL/#research-project-specifications","title":"Research Project Specifications","text":"Extracting the construction date out of the national maps is a real challenge, as the national maps are a heavy dataset, they are not easy to be considered as a whole. In addition, the Statistical Office needs the demonstration that it can be done in a reliable way and within a reasonable amount of time to limit the cost of such process. They are also subjected to strict tolerances on the efficiency of the construction years extraction through an automated process. The goal of at least 80% of overall success was then provided as a constraint to the STDL.
As a result, the research specifications for the STDL were:
Gathering and understanding the data related to the problem
Developing a proof-of-concept demonstrating the possibility to extract the construction years from the national maps
Assessing the results with a reliable metric, to demonstrate the quality and reliability of the obtained construction years
In this research project, two datasets were considered: the building register itself and the national maps. As both datasets are heavy and complex, considering them in their entirety would have been too complicated and unnecessary for such a research project. It was therefore decided to focus on four areas selected for their representativeness of the Swiss landscape:
Basel (BS): Urban area
Bern (BE): Urban and peri-urban area
Biasca (TI): Rural and mountainous
Caslano (TI): Peri-urban and rural
The following images give a geographical illustration of the selected areas through their most recent map:
Illustration of the selected areas: Basel (2015), Bern (2010), Biasca (2012) and Caslano (2009) - Data: swisstopo. Basel was selected as an example of an area for which the building register was already well filled in terms of construction years. The four regions are 6 km by 6 km square areas, which allows up to twenty thousand buildings to be considered in a single one.
"},{"location":"PROJ-REGBL/#federal-register-of-buildings-and-dwellings","title":"Federal Register of Buildings and Dwellings","text":"The register of buildings is a formal database composed with entries, each of them representing a specific building. Each entry comes with a set of information related to the building they describe. In this project, a sub-set of these informations was considered:
Federal identifier of the building (EGID)
The position of the building, expressed in EPSG:2056 (GKODE, GKODN)
The building construction year, when available (GBAUJ)
The surface of the building, when available, expressed in square metres (GAREA)
In addition, tests were conducted considering the positions of the entrances of each building. It rapidly turned out that they were not useful in this research project, as they were missing for a large fraction of the register and only provided information redundant with the position of the buildings.
The following table gives a summary of the availability of the construction year in the register according to the selected areas:
Area Buildings Available years Missing fraction Basel 17\u2019088 16\u2019584 3% Bern 21\u2019251 4\u2019499 79% Biasca 3\u2019774 1\u2019346 64% Caslano 5\u2019252 2\u2019452 53% One can see that the number of missing construction years can be large depending on the considered area.
"},{"location":"PROJ-REGBL/#national-maps","title":"National Maps","text":"On the side of the national maps, the dataset is more complex. In addition to the large number of available maps, variations of them can also be considered. Indeed, maps are made for different purposes and come with variations in their symbology to emphasise elements on which they focus. Moreover, for modern years, sets of vector data can also be considered in parallel to maps. Vector data are interesting as they allow to directly access the desired information, that is the footprint of the building without any processing required. The drawback of the vector data is their temporal coverage which is limited to the last ten to twenty years.
The following images give an illustration of the aspect of the available maps and vector datasets considering the example of the Bern area. Starting with the traditional maps:
Available map variations: KOMB, KGRS and KREL - Data: swisstopo. And the more specific and vector ones:
Available map variations: SITU, GEB and DKM25-GEB (vector) - Data: swisstopo. In addition to the number of available variations and data types, they all come with their specific temporal coverage. In the case of this research project, we tried to go back in time as far as possible, simplifying the choice for the older maps. The question still remains for more modern times.
As we are mostly interested in buildings, the availability of already extracted building layers, either as raster or vector data, is highly interesting. But the problem of data selection is complex in our case. Indeed, no matter the choice, for the older times, the only available maps have to be considered. In addition to building footprint access, the question of the continuity of the data has to be considered with care. More than building footprints, we are interested in the continuity of these footprints, in order to be able to safely assume the life cycle of the tracked buildings.
This consideration led us to discover variations in methodologies depending on the considered set of data. Indeed, buildings are not shaped in the same way on traditional maps as they are in layers focusing on them. It follows that variations in the symbology, and thus in the shape of the buildings, appear between traditional maps and building layers (raster and vector). These variations can lead to discontinuities when going from a map to the one preceding it in time. This can break the continuity of the building footprints along time, making them much more difficult to track safely.
This is the reason we chose to focus on the KOMB variation of the maps. These maps are very stable and cover the largest temporal range. The methodology was kept very similar along the years, making this dataset much more reliable to work with when the time dimension is considered. Only considering the KOMB variation of the maps also ensures that all source data is treated the same way in the processing pipeline, easing the assessment of the results.
In addition, the KOMB maps are dense in information and come with colored symbology. This opens the possibility to more easily extract the information we need in this project, namely the building footprints. One exception was made concerning the KOMB maps: in their very latest version, the methodology changed, causing the symbology to differ from the older KOMB maps. In this latest version, texts are much more numerous and tend to cover a large number of the buildings, making them invisible. For this reason, the latest version was dropped, slightly reducing the temporal coverage over the 2015-2020 period.
Selecting the KOMB variation allowed us to obtain the following temporal coverage for the four selected areas:
Area Oldest map Latest map Mean separation Basel 1955 2015 5.5 years Bern 1954 2010 5.6 years Biasca 1970 2012 6.0 years Caslano 1953 2009 6.2 years One can see that a large portion of the 20th century can be covered using the maps, with a very good temporal resolution of around five to six years between maps.
"},{"location":"PROJ-REGBL/#research-approaches","title":"Research Approaches","text":"In this research project, the main focus was put on the national maps to extract the construction year of buildings as the maps are sources on which we can rely and assess the results. The only drawback of the maps is their limited temporal coverage, as they only start to be available in the 1950s.
This is the reason why another, experimental, approach was added to address the case of buildings built before the 1950s. This secondary approach relies on a statistical methodology to verify to what extent it could be possible to assign a construction date even when no maps are available.
National Maps: This main approach focuses on the national maps, from which the construction year of a building is deduced through a temporal analysis of the maps. Each building is tracked back until it disappears or changes its shape on a given map, allowing us to deduce that the construction of the building took place in the gap separating that map from its successor.
Statistical Analysis: This method is based on the principle of spatial dependence, and furthermore on concentric zones of urban development. It is technically an interpolator which deduces construction years based, first, on different search radii for different variances, second, by splitting the data into quantiles and, finally, by a Gaussian mixture model, an unsupervised learning technique, to gather the final predictions.
The statistical analysis then makes it possible to take the buildings detected on all maps, meaning their construction predates the oldest available map, and assign them an estimate of their construction year, knowing only that it has to be earlier than the oldest map.
"},{"location":"PROJ-REGBL/#research-approach-national-maps","title":"Research Approach: National Maps","text":"In order to detect construction year of buildings, we need to be able to track them down on the maps across the temporal coverage. The RBD is providing the reference list of the building, each coming with a federal identifier (EGID) and a position. This position can then be used to track down the building on maps for its appearance or morphological change.
With the maps and the research areas already selected, this research approach can be summarised in the following way:
Translating maps into binary images containing only buildings
Extracting the RBD buildings related to the analysed area
Detection procedure of the buildings on the maps
Detection of the morphological variation of the buildings
Assessment of the obtained results
The first four points are related to the development of the proof-of-concept. The last one concerns a very sensitive and complicated question relative to the considered problem: how to analyse and assess the obtained results. This was the most difficult question in this research, and finding a clear and reliable answer is mandatory before developing anything. For this reason, it is considered first.
"},{"location":"PROJ-REGBL/#reliability-of-the-data","title":"Reliability of the Data","text":"Assessing the results is essentially having a strong reference allowing to compare both in order to obtain a reliable characterisation of the success rate in the deduction of the construction years. This question leads to the discovery that this problem is much more complex that and can appear in the first place.
Indeed, we were warned by the Statistical Office that the RBD, considering the construction years it already gives, can be unreliable in some of its portions. This can be explained by the fact that collecting such information is a long and complicated administrative process. As an example, the following image gives an illustration of a building tracked on each of the available selected maps:
Temporal track of a selected building. On this illustration, one can see two things: the RBD announces a construction year of 1985, while the maps clearly indicate something different, locating the construction between 1963 and 1969. The two datasets thus contradict each other. In order to resolve the contradiction, we manually searched for historical aerial images. The following images illustrate what was found:
Aerial view of the building situation: 1963, 1967 and 1987 - Data: swisstopo. One can clearly see that the maps seem to give the correct answer concerning the construction date of this specific building, the RBD being contradicted by two other sources. This illustrates the fact that the RBD cannot be directly considered as a reliable reference to assess the results.
The same question applies to the maps. Even if they are believed to be highly reliable, one has to be careful with such a consideration. Indeed, consider the following example:
Temporal track of a selected building. In this case, the RBD gives 1986 as the construction date of the pointed building, while the maps give a construction year between 1994 and 2000. Again, the two datasets contradict each other. The same procedure was conducted to resolve the contradiction:
Aerial view of the building situation: 1970, 1986 and 1988 - Data: swisstopo. Looking at the aerial images, it seems that the tracked building was already there in 1988. One can see that the 1994 map continues to represent the four old buildings instead of the new one. It is only in 2000 that the maps correctly represent the new building. This shows that although maps are a reliable source of geo-information, they can also be subject to delays in their symbology.
The maps also come with the problem of the consistency of the building footprint symbology. Looking at the following example:
Temporal track of a selected building. One can see that the maps seem to indicate a strange evolution of the situation: a first building appears in 1987, is destroyed and replaced by a larger one in 1993, and this new large building seems to have been destroyed right after its construction to be replaced by a new one in 1998. Considering aerial images of the building situation:
Aerial image view of the building situation: 1981, 1987 and 1993 - Data: swisstopo. One can clearly see that a first building was constructed and then completed by an extension between 1987 and 1993. This is an illustration of how the symbology of the building footprints can be subject to variations that are de-synchronised from the true situation.
"},{"location":"PROJ-REGBL/#metric","title":"Metric","text":"In such context, neither the RBD or the national maps can be formally considered as a reference. It follows that we are left without a solution to assess our results, and more problematically, without any metric able to guide the developments of the proof-of-concept in the right direction.
To solve the situation, one hypothesis is made in this research project. Taking into account both the RBD and the national maps, one can observe that both are built using very different methodologies. On one hand, the RBD is built out of a complex administrative process, gathering the required information step by step, going from communes to cantons, and finally to the Statistical Office. On the other hand, the national maps are built using regular aerial image campaigns conducted over the whole of Switzerland. The process of establishing maps is quite old and can therefore be considered as well controlled and stable.
Both datasets are then made with methodologies that can be considered as fully independent from each other. This led us to the formulation of our hypothesis: when the RBD and the national maps agree on the construction year of a building, this construction year can be considered correct.
One should remain careful with this hypothesis, even though it sounds reasonable. It would be very difficult to verify, as this would require gathering complex confirmation data that would have to be independent of the RBD, the national maps, and the aerial images (as the maps are based on them). This assumption is the only one made in this research project.
Accepting this assumption makes it possible to establish a formal reference that can be used as a metric to assess the results and to guide the development of the proof-of-concept. But such a reference has to be built with care, as the problem remains complex. To illustrate this complexity, the following figure gives a set representation of our problem:
Set representation of the RBD completion problem. The two rectangles represent the set of buildings for a considered area. On the left, one can see the building set from the RBD point of view. The grey area shows the buildings without construction year information. Its complementary set is split into two sub-sets: the buildings whose construction year is absolutely correct and those whose construction year is absolutely incorrect (the limit between both is subject to a bit of interpretation, as the construction year is not a strict concept). If a reference can be extracted, it should be in the green sub-set. The problem is that we have no way of knowing which buildings are in which sub-set. The national maps were therefore considered to define another sub-set: the synchronous sub-set, where the RBD and the national maps agree.
To build the metric, the RBD sub-set of buildings coming with construction year information is randomly sub-sampled to extract a representative sub-set: the potentials. This sub-set of potentials is then manually analysed to keep the buildings on which both datasets agree and to reject the others. At the end of the process, the metric sub-set is obtained and should remain representative.
On the right of the set representation is the view of the building set through the national maps. One can see that the same sub-sets appear, but with the construction years replaced by the representation of the buildings on the maps. The grey part then represents the buildings that are not represented on the maps, because of their size or because they can be hidden by the symbology, for example. The difference is that the maps do not give access to the construction years directly: they are read from the maps through our developed detector. The detector having a success rate, it cuts the whole set of sub-sets in half, which is exactly what we need for our metric. If the metric sub-set remains representative, the success rate of the detector evaluated on it should generalise to the whole set of represented buildings.
This set representation demonstrates that the problem is very complex and has to be handled with care. Considering only the six most important sub-sets, and considering that construction years are extracted by the detector from the maps, up to 72 specific cases (six RBD sub-sets, six map sub-sets, and two detector outcomes) can apply to each randomly selected building.
To perform the manual selection, a random selection of potential buildings was made on the RBD set of buildings coming with a construction year. The following table summarises the selection and manual validation:
Area Potentials Metric Basel 450 EGIDs 209 EGIDs Bern 450 EGIDs 180 EGIDs Biasca 336 EGIDs 209 EGIDs Caslano 450 EGIDs 272 EGIDs The previous table gives the result of the second manual validation. Indeed, two manual validation sessions were held, several weeks apart, to check the validation process and how it evolved as our understanding of the problem grew.
Three main criticisms can be addressed to the metric. The first one is that establishing validation criteria is not simple, as the number of cases in which buildings can fall is very high. Understanding the problem takes time and requires seeing a lot of these cases. It follows that the second validation session was more stable and rigorous than the first one.
The second criticism that can be made of our metric is the selection bias. As the process is performed by a human, it is affected by their way of applying the criteria, and more specifically by the severity of their application. Considering the whole potentials sub-set, one can conclude that a few buildings could be rejected or validated depending on the person doing the selection.
The last criticism concerns specific cases for which the asynchronicity criterion used to reject them is weak. Indeed, for some buildings, the situation is very unclear, in the sense that the RBD and the maps give information that cannot be reconciled. This is the case, for example, when the building is not represented on the map. The position in the RBD or the lack of information on the maps can lead to such an unclear situation. These cases are then rejected, but without being fully sure of the asynchronous aspect between the maps and the RBD.
"},{"location":"PROJ-REGBL/#methodology","title":"Methodology","text":"With a reliable metric, results can be assessed and the development of the proof-of-concept can be properly guided. As mentioned above, the proof-of-concept can be split in four major steps that are the processing of the maps, the extraction of the RBD buildings, detection of the building on the maps and detection in morphological changes.
"},{"location":"PROJ-REGBL/#national-maps-processing","title":"National Maps Processing","text":"In order to perform the detection of building on the maps, a reliable methodology is required. Indeed, one could perform the detection directly on the source maps but this would lead to a complicated process. Indeed, maps are mostly the result of the digitisation of paper maps creating a large number of artefacts on the digital images. This would lead to an unreliable way of detecting building as a complicated decision process would have to be implemented each time a RBD position is checked on each map.
A map processing step was therefore introduced first, to translate the digitised color images into reliable binary images on which building detection can be performed safely and easily. The goal of this process is to create a binary version of each map with black pixels indicating building presence. A method for extracting buildings from the maps was then designed.
Considering the following example of a map cropped according to a defined geographical area (Basel):
Example of a considered map: Basel in 2005 and closer view - Data: swisstopo
The first step of the map processing methodology is to correct and standardise the exposure of the digitised maps. Indeed, as the maps mostly result from a digitisation process, they are subject to exposure variations. A simple standardisation is therefore applied.
The next step consists in black pixel extraction. Each pixel of the input map is tested, using specific thresholds, to determine whether or not it can be considered black. As the buildings are drawn in black, extracting black pixels is a first way of separating the buildings from the rest of the symbology. The following result is obtained:
Result of the black extraction process
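A minimal sketch of such a black-pixel test could look as follows; the threshold value is a placeholder, the actual thresholds being tuned on the exposure-standardised maps:

```python
import numpy as np
from PIL import Image

def extract_black_pixels(map_image, threshold=80):
    """Binary building mask: a pixel is kept when all RGB channels are dark.

    The threshold of 80 is a placeholder; the actual values are tuned on
    the exposure-standardised maps.
    """
    rgb = np.asarray(map_image.convert("RGB"))
    return np.all(rgb < threshold, axis=-1)

# mask = extract_black_pixels(Image.open("komb_basel_2005.tif"))  # hypothetical file name
```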
As one can see on the result of the black extraction process, the buildings are still highly connected to other symbological elements, and to each other in some cases. Having the building footprints well separated and well defined is an important point for the subsequent processes responsible for construction year deduction. To achieve this, two steps are added. The first one uses a variation of the Conway game of life [5] to implement a morphological operator able to disconnect pixel groups. The following image gives the result of this separation process along with the previous black extraction result on which it is based:
Result of the morphological operator (right) compared to the previous black extraction (left)
While the morphological operator provides the desired result, it also shrinks the footprint of the elements. It eliminates a lot of structures that are not buildings, but it also reduces the footprint of the buildings themselves, which can increase the amount of work the subsequent processes have to perform to properly detect a building. To solve this issue and to obtain building footprints that are as close as possible to the original map, a controlled re-growing step is added. It uses a region threshold and the black extraction result to re-grow the buildings without going beyond their original definition. The following images give a view of the final result along with the original map:
Final result of the building footprints extraction (right) compared to the original map
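A rough sketch of these two steps is given below, under the assumption of a simple Conway-like survival rule and a re-growing by dilation constrained to the black-extraction mask; the exact rules and iteration counts of the proof-of-concept may differ:

```python
import numpy as np
from scipy.ndimage import convolve

KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])  # 8-neighborhood, center excluded

def conway_separation(black, survive_min=5):
    """One pass of a Conway-like survival rule: a black pixel is kept only
    if at least `survive_min` of its eight neighbors are black, which erodes
    thin connections between pixel groups (`survive_min` is illustrative)."""
    neighbors = convolve(black.astype(int), KERNEL, mode="constant")
    return black & (neighbors >= survive_min)

def controlled_regrow(seeds, black, iterations=3):
    """Re-grow the shrunk footprints by dilation, constrained so that no
    pixel outside the original black-extraction mask is ever added."""
    grown = seeds.copy()
    for _ in range(iterations):
        dilated = convolve(grown.astype(int), KERNEL, mode="constant") > 0
        grown = black & (grown | dilated)
    return grown
```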
As the Conway morphological operator is not able to get rid of all the non-building elements, such as large and bold texts, the final re-growing step also thickens them along with the building footprints. Nevertheless, the obtained binary image keeps most of the building footprints intact while eliminating most of the other elements of the map, as illustrated on the following image:
Extracted building footprints, in pink, superimposed on the Bern map. The obtained binary images are then used for both building detection and morphological change detection, as the buildings are easy to access and analyse in such a representation.
"},{"location":"PROJ-REGBL/#building-extraction-from-rbd","title":"Building Extraction from RBD","text":"In the case of limited geographical areas as in this research project, extracting the relevant buildings from the RBD was straightforward. Indeed, the RBD is a simple DSV database that is very easy to understand and to process. The four areas were packed into a single DSV file and the relevant building were selected through a very simple geographical filtering. Each area being defined by a simple geographical square, selecting the buildings was only a question of checking if their position was in the square or not.
"},{"location":"PROJ-REGBL/#building-detection-process","title":"Building Detection Process","text":"Based on the computed binary images, each area can be temporally covered with maps on which building can be detected. Thanks to the processed maps, this detection is made easily, as it was reduced to detect black pixels in a small area around the position of the building provided in the RBD. For each building in the RBD, its detection on each temporal version of the map is made to create a presence table of the building. Such table is simply a Boolean value indicating whether a building was there or not according to the position provided in the RBD.
The following images give an illustration of the building detection process on a given temporal version of a selected map:
Detection overlay superimposed on its original map (left) and on its binary counterpart (right)
One can see that for each building and for each temporal version of the map, the decision of a building presence can be made. At the end of this process, each building is associated with a presence value for each year for which a map is available.
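A minimal sketch of how such a presence table could be built from the binary maps. The `to_pixel` callable, the dictionary layout and the window radius are hypothetical conveniences; the actual C++ implementation may differ.

```python
def presence_table(position, binary_maps, radius=5):
    '''Build the presence table of one building: for each dated binary map,
    check whether black pixels exist in a small window around the RBD
    position. binary_maps maps year -> (boolean array, to_pixel callable
    converting map coordinates to pixel indices); radius is in pixels
    and illustrative.'''
    table = {}
    for year, (mask, to_pixel) in sorted(binary_maps.items()):
        r, c = to_pixel(position)
        window = mask[max(r - radius, 0):r + radius + 1,
                      max(c - radius, 0):c + radius + 1]
        table[year] = bool(window.any())
    return table
```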
"},{"location":"PROJ-REGBL/#morphological-change-detection","title":"Morphological Change Detection","text":"Detecting the presence of a building on each temporal version of the map is a first step but is not enough to determine whether or not it is the desired building. Indeed, a building can be replaced by another along the time dimension without creating a discontinuity in the presence timeline. This would lead to misinterpret the presence of building with another one, leading the construction year to be deduced too far in time. This can be illustrated by the following example:
Example of a building being replaced by another one without introducing a gap in the presence tableAs the detection of the presence of the building is not enough to correctly deduce a construction year, a morphological criterion is added. Many different methodologies were tried in this project, ranging from signatures to various quantities deduced from the footprint of the building. The simplest and most reliable way was to focus on the pixel count of the building footprint, which corresponds to its surface in geographical terms.
A morphological change is detected when the surface of the building footprint changes beyond a given threshold along the building presence timeline. In such a case, the presence timeline is broken at the position of the morphological change, which is interpreted in the same way as a formal appearance of a building.
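A sketch of how the presence timeline could be broken on a morphological change, under the assumption that the criterion is a relative change of the footprint pixel count; the 0.5 threshold is illustrative, not the value used in the project.

```python
def break_on_surface_change(presence, surfaces, threshold=0.5):
    '''Break a presence timeline when the footprint surface (pixel count)
    changes too much between two consecutive maps on which the building
    is present, and return the year of the latest (re)appearance.'''
    appearance, previous = None, None
    for year in sorted(presence):
        if not presence[year]:
            previous = None            # gap in the timeline
            continue
        if previous is None:
            appearance = year          # formal (re)appearance
        elif abs(surfaces[year] - previous) / max(previous, 1) > threshold:
            appearance = year          # morphological change
        previous = surfaces[year]
    return appearance
```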
Introducing this criterion significantly improved our results, especially in the case of urban centres. Indeed, in modern cities, large numbers of new buildings were built just after a previous building was destroyed, due to the lack of space left for new constructions.
"},{"location":"PROJ-REGBL/#results","title":"Results","text":"The developed proof-of-concept is applied on the four selected areas to deduce construction year for each building appearing in the RBD. With the defined metric, it is possible to assess the result in a reliable manner. Nevertheless, assessing the results with clear representations is not straightforward. In this research project, two representations were chosen:
Histogram of the success rate: For this representation, the buildings of the metric are assigned to ten-year temporal bins and the success rate of the deduced construction years is computed for each bin.
Distance and pseudo-distance distribution: As the previous representation only gives access to a binary view of the results, a distance representation is added to understand to which extent mistakes are made in the deduction of a construction year. For buildings detected between two maps, the temporal middle is taken as the guessed construction year, allowing to compute a formal distance to its reference. In case a building is detected before or beyond the map range, a pseudo-distance of zero is assigned if the result is correct according to the reference. Otherwise, the deduced year (which is necessarily between two maps) is compared to the extremal map date of its reference to obtain an error pseudo-distance. A sketch of this computation is given below.
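A minimal sketch of this distance/pseudo-distance logic, assuming `presence` is a Boolean table aligned with the sorted map years; the sign conventions and edge-case handling of the actual C++ implementation may differ.

```python
def construction_year_distance(reference, map_years, presence):
    '''Distance / pseudo-distance of one deduced construction year.
    map_years is the sorted list of available map years, presence the
    aligned Boolean presence table, reference the validated year.'''
    first = next((i for i, p in enumerate(presence) if p), None)
    if first is None:
        # never detected: deduced beyond the most recent map
        return 0 if reference > map_years[-1] else map_years[-1] - reference
    if first == 0:
        # already present on the oldest map: deduced before the map range
        return 0 if reference <= map_years[0] else map_years[0] - reference
    guess = (map_years[first - 1] + map_years[first]) // 2  # temporal middle
    return guess - reference
```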
In addition to the manually defined metric, the full RBD metric is also considered. As the construction years provided in the RBD have to be considered with care, part of them being incorrect, comparing the results obtained with the full RBD metric and with the metric we manually defined opens the important question of the synchronisation between the maps and the RBD, viewed from the construction perspective.
"},{"location":"PROJ-REGBL/#results-basel-area","title":"Results: Basel Area","text":"The following figures give the Basel area result using the histogram representation. The left plot uses the full RBD metric while the right one uses the manually validated one:
Histogram of the success rate - Ten years binsOne obvious element is that the results provided by the full RBD metric (left) and by the manually validated metric (right) differ. This is a clear sign that the RBD and the maps are de-synchronised on a large fraction of the building set of Basel. The other element that can be seen on the right plot is that the deduction of the construction years is more challenging where maps are available. Indeed, on the temporal range covered by the maps (vertical white lines), the results drop from the overall level to 50-60% in some of the histogram bins.
The following figures show the distance and pseudo-distance distribution of the error made on the deduced construction year according to the chosen metric:
Distance (red and blue) and pseudo-distance (red) of the error on the construction yearsThe same differences as previously observed between the two metrics can also be seen here. Another important observation is that the distribution seems mostly symmetrical. This indicates that no clear deduction bias can be observed in the results provided by the proof-of-concept.
"},{"location":"PROJ-REGBL/#results-bern-area","title":"Results: Bern Area","text":"The following figures give the histogram view of the results obtained on the Bern area:
Histogram of the success rate - Ten years binsOne can observe that the results are similar to those of Basel while being a bit better. In addition, one can clearly see that the difference between the full RBD metric and the manually validated metric is huge here. This is probably a sign that the RBD is mostly incorrect in the case of Bern.
The following figures show the distance distributions for the case of Bern:
Distance (red and blue) and pseudo-distance (red) of the error on the construction yearsAgain, the distribution of the error on the deduced construction year is symmetrical in the case of Bern.
"},{"location":"PROJ-REGBL/#results-biasca-area","title":"Results: Biasca Area","text":"The following figures give the histogram view of the success rate for the case of Biasca:
Histogram of the success rate - Ten years binsIn this case, the results are much better according to the manually validated metric. This can be explained by the fact that Biasca is a rural/mountainous area in which the growth of urban areas is much simpler: once built, buildings tend to remain unchanged, making it easier to deduce a reliable construction year.
The following figures show the distance distribution for Biasca:
Distance (red and blue) and pseudo-distance (red) of the error on the construction yearsThis confirms the results seen on the histogram figure and shows that the results are very good in such areas.
"},{"location":"PROJ-REGBL/#results-caslano-area","title":"Results: Caslano Area","text":"Finally, the following figures show the histogram view of the success rate of the proof-of-concept on the case of Caslano:
Histogram of the success rate - Ten years binsThe same considerations apply as for the Biasca case. The results are very good, as part of the Caslano area can be considered as rural or at least peri-urban. The results are slightly worse than in the Biasca case, confirming the picture that urban centres are more difficult to infer than rural areas.
The following figures show the error distribution for Caslano:
Distance (red and blue) and pseudo-distance (red) of the error on the construction years"},{"location":"PROJ-REGBL/#results-synthesis","title":"Results: Synthesis","text":"In order to synthesise the previous results, which were a bit dense due to the consideration of two representations and two metrics, the following summary is given:
Basel: 78.0% success rate and 80.4% of buildings correctly placed within \u00b15.5 years
Bern: 84.4% success rate and 85.0% of buildings correctly placed within \u00b15.6 years
Biasca: 93.5% success rate and 93.9% of buildings correctly placed within \u00b16.0 years
Caslano: 90.8% success rate and 91.2% of buildings correctly placed within \u00b16.2 years
These results only consider the manually validated metric for all four areas. By weighting each area with its number of buildings, one can deduce the following numbers:
These last numbers can be considered as a reasonable extrapolation of the proof-of-concept performance over the whole of Switzerland.
"},{"location":"PROJ-REGBL/#conclusion","title":"Conclusion","text":"As a main conclusion to the national maps approach, one can consider the results as good. It was possible to develop a proof-of-concept and to apply it on selected and representative areas of Switzerland.
In this approach, it turns out that developing the proof-of-concept was the easy part. Finding a metric and demonstrating its representativeness and reliability was much more complicated: as the two datasets cannot be considered as fully reliable in the first place, a strategy had to be defined in order to demonstrate that the chosen metric was able to assess our results in the way expected by the Statistical Office.
In addition, the metric only required one additional hypothesis on top of the two datasets. This hypothesis, which consists in assuming that the synchronous sub-set is a quasi-sub-set of the absolutely correct construction years, can be considered reasonable. Nevertheless, it is important to emphasise that it was necessary to make it, leading us to remain critical and careful while reading the results given by our metric.
The proof-of-concept was developed in C++, resulting in efficient code able to be used for the processing of the whole of Switzerland without the need to deeply modify it.
"},{"location":"PROJ-REGBL/#research-approach-statistical","title":"Research Approach: Statistical","text":"As the availability of the topographic/national maps does not reach the integrity of all building's year of construction in the registry, an add-on was developed to infer this information, whenever there was this need for extrapolation. Usually, the maps availability reaches the 1950s, whilst in some cities the minimum year of construction can be in the order of the 12th century, e.g. The core of this statistical model is based on the Concentric Zones Model (Park and Burgess, 1925)[6] extended to the idea of the growth of the city from the a centre (Central Business District - CBD) to all inner areas. The concept behind this statistical approach can be seen below using the example of a crop of Basel city:
Illustration of the Burgess concentric zone modelThe limits of this model are well known and thoroughly described in other famous urban models such as those of Hoyt (1939)[7] and Harris and Ullman (1945)[8]. In general, these critiques refer to the simplicity of the model, which is accepted and compensated for in this application, especially since the main prediction targets are older buildings that are assumed to follow the concentric zones pattern, unlike newer ones (Duncan et al., 1962)[9]. This is the pattern commonly seen in many cities: older buildings were built in these circular patterns up to some point in time, after which reconstructions and renovations are almost randomly placed in spatial and temporal terms. Moreover, processes like gentrification have been shown to be dispersed and quite recent (R\u00e9rat et al., 2010)[10].
In summary, a first predictor is built on the basis that the data present a spatial dependence, as in many geostatistical models (Kanevski and Maignan, 2004[11]; Diggle and Ribeiro, 2007[12]; Montero and Mateu, 2015[13]). This way, we assume that closer buildings are more related than distant ones (Tobler, 1970[14]) in terms of year of construction, and ergo the time dimension is interpolated based on the principles of spatial models. We are thereby also demonstrating how those two dimensions interact. After that, concentric zones are embedded through the use of quantiles, whose values are then used in a probabilistic unsupervised learning technique. Finally, the predicted years are computed from the generated clusters.
"},{"location":"PROJ-REGBL/#metric_1","title":"Metric","text":"Similar to the detection situation, generating a validation dataset was an especially challenging task. First of all, the dates in the RBD database could not be trusted in their integrity and the topographic maps used did not reach this time frame. In order to ascertain the construction year in the database, aerial images from swisstopo (Swiss Federal Office of Topography) were consulted and this way buildings were manually selected to compound a validation dataset.
References extraction from aerial images manual analysisOne of the problems related to this approach is the gap between the surveys from which the images originate, which prevents stating the construction date with precision. These gaps between surveys were approximately in the range of 5 years, although in Basel, for some areas, they reached 20 years. An example of this methodology to create a trustworthy validation set can be seen below. On the left-hand side, one can see the year of the first image survey (top) and the year registered in the RBD (bottom); on the right-hand side, one can see the year of the next image survey at the same temporal resolution.
"},{"location":"PROJ-REGBL/#methodology_1","title":"Methodology","text":"First of all, a prior searching radius is defined as half of the largest distance (between random variables). For every prediction location, the variance between all points in the prior searching radius will be used to create a posterior searching radius. This way, the higher the variance, the smaller the searching radius, as we tend to trust data less. This is mainly based on the principle of spatial dependence used in many geostatistical interpolators. The exception to this rule is for variances that are higher than 2 x the mean distance between points. In this case, the searching radius increases again in order to avoid clusters of very old houses that during tests caused underestimation. The figure below demonstrates the logic being the creation of searching radii.
Searching radii computation processwhere d is the distance between points, \u03bc the mean and s\u00b2 the variance of the random variable values within the prior searching radius.
It is important to mention that in case of a very large number of missing values, if the searching radius does not find enough information, the posterior mean will be the same as the prior mean, possibly causing over- or underestimation in those areas.
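The exact posterior-radius formula is the one shown in the figure above; the sketch below only illustrates the described behaviour (shrinking with variance, growing again past the 2 x mean-distance exception). The scaling used here is an illustrative assumption, not the published formula.

```python
import numpy as np

def posterior_radius(prior_radius, values, mean_distance):
    '''Shrink the searching radius when the local variance is high (data
    is trusted less), except when the variance exceeds twice the mean
    distance between points, in which case the radius grows again to
    escape clusters of very old houses.'''
    s2 = np.var(values)
    if s2 > 2 * mean_distance:
        return prior_radius                              # grow back to the prior
    return prior_radius * mean_distance / (mean_distance + s2)
```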
This first procedure is used to fill the gaps in the entry database so that clustering can be computed. The next step is then splitting the data into 10 quantiles, which conveys the idea of concentric growth zones, inspired by the Burgess Model (1925)[6]. Every point in the database then assumes the value of its quantile. It is also possible to skip this step and pass to clustering directly, which can be useful in two situations: if a more general purpose is intended, or if the concentric zones pattern is not observed in the study area. By default, this step is used and is followed by an unsupervised learning technique: a Gaussian mixture model, which does not only segment the data into clusters, but also indicates the probability of each point belonging to every cluster. The number of components computed is a linear function of the total number of points being used, including the ones that previously had gaps. The function to find the number of components is the following:
where np is the number of components/clusters and nc the total number of points used. The number of clusters is usually very large compared to a standard clustering exercise. To avoid this, the value is divided by ten, but the number of clusters will never be smaller than five. An example of the clustering performed by the embedded Gaussian mixture model can be seen below:
Example of clustering process on the Basel areaHence, the matrix of probabilities of every point belonging to each cluster (\u03bb, which can be considered a matrix of weights) is multiplied by the mean of each cluster (a 1 x nc matrix mc), forming the A matrix:
or in matrices:
Finally, the predictions can then be made using the sum of each row in the A matrix.
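A condensed sketch of this pipeline; scikit-learn's GaussianMixture is used here as a stand-in and is an assumption about the implementation, as is the component rule max(5, n // 10), which follows the description above but may not match the exact published formula.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def predict_years(coords, years):
    '''Cluster buildings in (x, y, year) space with a Gaussian mixture
    model and predict each construction year as the probability-weighted
    sum of the cluster mean years, i.e. the row sums of A = lambda * mc.'''
    n = len(years)
    features = np.column_stack([coords, years])
    gmm = GaussianMixture(n_components=max(5, n // 10), random_state=0)
    lam = gmm.fit(features).predict_proba(features)   # weights, n x n_clusters
    mc = gmm.means_[:, -1]                            # mean year per cluster
    return lam @ mc                                   # row sums of A
```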
It is important to state that the same crops (study areas) were used for this test, although Caslano was not used in this case, as it possesses too few houses with a construction date older than the oldest map available. Using the metric explained above, a hold-out cross-validation was performed: a group of points was used only for validation and not for training. After that, the RMSE (Root Mean Squared Error) was calculated using the difference between the date in the RBD database and the predicted one. This RMSE was also extrapolated to the whole of Switzerland, so one could have a notion of what the overall error could be, using the following equation (for the expected error):
where E is the error and n the number of buildings in each region.
In addition to the RMSE, the 95th percentile was computed for every study area, and for all of them combined as well. This allows discussing the spread and predictability of the errors.
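Assuming the expected-error equation above is the building-count-weighted mean of the per-region errors, the computation reduces to a few lines; the region counts in the usage comment are hypothetical placeholders.

```python
def extrapolated_error(errors, counts):
    '''Expected error over Switzerland as the building-count-weighted
    mean of the per-region errors.'''
    return sum(e * n for e, n in zip(errors, counts)) / sum(counts)

# e.g. with the three evaluated regions (RMSE in years):
# extrapolated_error([9.78, 20.64, 13.13], [n_basel, n_bern, n_biasca])
```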
"},{"location":"PROJ-REGBL/#results_1","title":"Results","text":"The first case analysed was Basel, where the final RMSE was 9.78 years. The density plot below demonstrates the distribution of errors in Basel, considering the difference between the year of construction in the RBD database and the predicted one.
Distribution of error on construction year extrapolationAmong the evaluated cases, Basel presented a strong, visible spatial dependence, and it was also the case with the largest estimated proportion of houses with construction years older than the oldest map (1955): 11336, or approximately 66% of the buildings. Based on the validation dataset only, there was an overall trend of underestimation, and the 95th percentile reached 20 years, showing a rather concentrated distribution of errors.
Bern was the second case evaluated, and it proved to be an atypical one. This starts from the fact that a big portion of the dates seemed incongruent with reality, based on the aerial images observed and as seen in the previous detection approach. Not only that, but almost 80% of the buildings in Bern had missing data as far as the year of construction is concerned. This is especially complicated as the statistical method presented here is in essence an interpolator (intYEARpolator). Basically, as in any inference problem, known data is used to fill unknown data; therefore, a reasonable split between known and unknown inputs is expected, as well as a considerable confidence in the data. On the other hand, an estimated 1079 buildings (approximately 27%) were probably older than the oldest map available (1954) in the Bern crop. Therefore, reliability was lower in this case, but the number of prediction points was smaller too. The following figure displays the density of errors in Bern, where an RMSE of 20.64 years was computed.
Distribution of error on construction year extrapolationThere was an overall trend of overestimation, though the errors remained relatively concentrated, especially considering the 95th percentile of 42 years.
Finally, the crop on Biasca was evaluated. The computed RMSE was 13.13 years, which is closer to the Basel case, and the 95th percentile was 17 years, thus presenting the least spread error distribution. In Biasca, an estimated 1007 buildings (32%) were older than the oldest map, which is not much more than the proportion in Bern, but the oldest topographic map used for Biasca was from 1970, making it an especially interesting case. The density plot below demonstrates the concentrated error case of Biasca:
Distribution of error on construction year extrapolationOnce the RMSE was computed for the three regions, it was extrapolated to the whole of Switzerland by taking into consideration the size of each dataset:
Extrapolation of the error distribution on the whole SwitzerlandThe expected extrapolated error calculated was 15.6 years and the 95th percentile was then 31 years.
"},{"location":"PROJ-REGBL/#conclusion_1","title":"Conclusion","text":"This add-on allows extrapolating the predictions to beyond the range of the topographical maps. Its predictions are limited, but the accuracy reached can be considered reasonable, once there is a considerable lack of information in this prediction range. Nor the dates in the RBD, nor the topographic maps can be fully trusted, ergo 15.6 years of error for the older buildings is acceptable, especially by considering the relative lack of spread in errors distribution. If a suggestion for improvement were to be given, a method for smoothing the intYEARpolator predictions could be interesting. This would possibly shift the distribution of the error into closer to a gaussian with mean zero. The dangerous found when searching for such an approach is that the year of construction of buildings does not seem to present a smooth surface, despite the spatial dependence. Hence, if this were to be considered, a balance between smoothing and variability would need to found.
We also demonstrated a completely different perspective on how the spatial and temporal dimensions can be joined, as the random variable predicted through spatial methodology was actually time. Therefore, a strong demonstration of the importance of time in spatially related models and approaches was also given. The code for the intYEARpolator was developed in Python and runs smoothly even with this quite big proportion of data. The singular case in which it can be quite time-demanding is a high proportion of prediction points (missing values). It should also be reproducible for the whole of Switzerland with no need for modification. A conditional argument is the use of concentric zones, which can be excluded in case a totally different growth pattern is observed.
"},{"location":"PROJ-REGBL/#reproduction-resources","title":"Reproduction Resources","text":"The source code of the proof-of-concept for national maps can be found here :
The README provides all the information needed to compile and use the proof-of-concept. The presented results and plots can be computed using the following tools suite :
with again the README giving the instructions.
The proof-of-concept source code for the statistical approach can be found here :
with its README giving the procedure to follow.
The data needed to reproduce the national maps approach are not publicly available. For the national maps, a temporal series of the 1:25'000 maps of the same location is needed. They can be requested from swisstopo :
With the maps, you can follow the instructions for cutting and preparing them in the proof-of-concept README.
The RBD data, used for both approaches, are not publicly available either. You can query them using the request form on the website of the Federal Statistical Office :
Both proof-of-concept READMEs provide the required information to use these data.
"},{"location":"PROJ-REGBL/#references","title":"References","text":"[1] Federal Statistical Office
[2] Federal Register of Buildings and Dwellings
[3] Federal Office of Topography
[4] National Maps (1:25'000)
[5] Conway, J. (1970), The game of life. Scientific American, vol. 223, no 4, p. 4.
[6] Park, R. E.; Burgess, E. W. (1925). \"The Growth of the City: An Introduction to a Research Project\". The City (PDF). University of Chicago Press. pp. 47\u201362. ISBN 9780226148199.
[7] Hoyt, H. (1939), The structure and growth of residential neighborhoods in American cities (Washington, DC).
[8] Harris, C. D., and Ullman, E. L. (1945), \u2018The Nature of Cities\u2019, Annals of the American Academy of Political and Social Science, 242/Nov.: 7\u201317.
[9] Duncan, B., Sabagh, G., & Van Arsdol,, M. D. (1962). Patterns of City Growth. American Journal of Sociology, 67(4), 418\u2013429. doi:10.1086/223165
[10] R\u00e9rat, P., S\u00f6derstr\u00f6m, O., Piguet, E., & Besson, R. (2010). From urban wastelands to new\u2010build gentrification: The case of Swiss cities. Population, Space and Place, 16(5), 429-442.
[11] Kanevski, M., & Maignan, M. (2004).\u00a0Analysis and modelling of spatial environmental data\u00a0(Vol. 6501). EPFL press.
[12] Diggle, P. J. Ribeiro Jr., P. J. (2007). Model-based Geostatistics. Springer Series in Statistics.
[13] Montero, J. M., & Mateu, J.\u00a0(2015). Spatial and spatio-temporal geostatistical modeling and kriging\u00a0(Vol. 998). John Wiley & Sons.
[14] Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic geography, 46(sup1), 234-240.
"},{"location":"PROJ-ROADSURF/","title":"Classification of road surfaces","text":"Gwena\u00eblle Salamin (swisstopo), Cl\u00e9mence Herny (Exolabs), Roxane Pott (swisstopo), Alessandro Cerioni (\u00c9tat de Gen\u00e8ve)
Proposed by the Federal Office of Topography swisstopo - PROJ-ROADSURF August 2022 to March 2023 - Published on August 28, 2023 All scripts are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-roadsurf
Abstract: The Swiss road network extends over 83\u2019274 km. Information about the type of road surface is useful not only for the Swiss Federal Roads Office and engineering companies, but also for cyclists and hikers. Currently, data creation and updates are done entirely manually at the Swiss Federal Office of Topography. This is a time-consuming and methodical task, potentially suitable for automation by data science methods. The goal of this project is to classify Swiss roads according to their surface type, natural or artificial. We first searched for statistical differences between these two classes, in order to then perform supervised classification based on machine-learning methods. As we could not find any discriminant feature, we used deep learning methods. In terms of balanced F1 score, we obtained a global score of 0.74 over the training, validation and test area, and 0.56 over the inference-only area.
"},{"location":"PROJ-ROADSURF/#1-introduction","title":"1. Introduction","text":"The Swiss road network extends over 83'274 km 1. Not only cyclists and hikers can be interested in knowing whether a given road section is covered by a natural or an artificial surface, but also the Swiss Federal Roads Office, which is in charge of road maintenance, and engineering companies. This information is found within the swissTLM3D 2 dataset, the large-scale topographic model of Switzerland produced by the Federal Office of Topography (swisstopo). Keeping the swissTLM3D dataset up to date is a time-consuming work that has to be done methodically. Operators draw and georeference new elements and fill in their attributes based on the visual interpretation of stereoscopic aerial images. The update of existing elements also follows this manual approach. Data science can help by autmatizing this time-consuming and systematic tasks.
So far, the majority of data science studies on the identification of the road surface type, in particular those based on artificial intelligence, have been conducted in the context of improving the driving and security of autonomous vehicles 3456. These works rely on images shot by cameras mounted at the front of the moving vehicle itself. To our knowledge, only one study, carried out by Mansourmoghaddam et al. (2022) 7, proposed a method based on object-based classification from aerial imagery, which could successfully tell artificial roads from natural ones. Another possible approach is to use spectral indices, as done by Zhao & Zhu (2022) 8 working on the distinction between artificial surfaces and bare land. However, their method is not specifically designed for road surfaces.
The goal of this project was to determine whether the road cover is artificial or natural with the development of data science tools. For this first test, only the roads of the class \"3m Strasse\" are considered.
Figure 1: Overview of the workflow for this project.As the location of roads was known, we faced a problem of supervised classification. Two approaches were tested to address it: machine learning (ML) and deep learning (DL). Both approaches used the same input data, aerial images and vector road location.
"},{"location":"PROJ-ROADSURF/#2-data","title":"2. Data","text":"As input data, this project used two datasets produced by the Federal Office of Topography: swissTLM3D and SWISSIMAGE RS. We worked with data for the year 2018, for which the images and ground truth, i.e. the manually vectorized and classified roads, are available for the area of interest (AOI). Coordinates are expressed in the EPSG:2056 reference system.
"},{"location":"PROJ-ROADSURF/#21-area-of-interest","title":"2.1. Area of interest","text":"Figure 2: Delimitation of the area of interest with the tile numbers of the 1:25'000 Swiss national map.The area of interest (AOI) defined for this study was represented by the tiles 1168, 1188, 1208 and 1228 of the Swiss national map at a scale of 1:25'000. This zone covers an area of 840 km2 and was chosen because of its representativeness of the Swiss territory.
"},{"location":"PROJ-ROADSURF/#22-swisstlm3d","title":"2.2. swissTLM3D","text":"The swissTLM3D 2 dataset is a large-scale topographic model of Switzerland. It contains geospatial data necessary o the national map, such as roads, buildings and land cover. Periodical updates rely on the manual work of specialized operators. They interpret stereoscopic images and fill in attributes with the help of some additional information, like cadastral surveys and terrestrial images. The specification of aerial imagery is similar to the SWISSIMAGE RS product. The road layer contains lines with the identifier, the structure (none, bridge, tunnel, etc.), the object type (highways, 8m roads, 1 m paths, etc.) and the surface type as attributes. The two possible classes of the surface type are defined in the metadata: artificial (German: Hart) and natural (Natur). The artificial class contains surfaces of hard artificial materials like asphalt, concrete or slabs. The natural class contains roads with a surface of natural materials like gravel or dirt, and untreated surfaces.
In this project, it was decided to test the classification for the type \"3m Strasse\" (3 m roads). This class encompasses roads that are between 2.81 m and 4.20 m wide. Within this subset, 6486 roads have an artificial surface and 289 a natural one. The dataset is heavily unbalanced toward the artificial roads.
In addition, the swissTLM3D dataset was used to identify forests. Indeed, forests prevent us from observing roads on aerial images; hence the roads they cover cannot be used in our study. As no layer in the swissTLM3D is specifically devoted to forested areas, they were deduced from the land cover classes. A filter was applied to only keep forests (\"Wald\") and open forests (\"Wald offen\").
Over the AOI, all the roads in quarries have a natural surface. We used our own layer from the project on the detection of mineral extraction sites to know their location. However, it is possible to use the information on the area of use from the swissTLM3D dataset which has a class on gravel quarries and one on stone quarries.
"},{"location":"PROJ-ROADSURF/#23-swissimage-rs","title":"2.3. SWISSIMAGE RS","text":"The product SWISSIMAGE RS 9 contains aerial images of Switzerland composed by four bands: near-infrared (NIR), red (R), green (G) and blue (B). The ground resolution equals 0.10 m over the area of interest, except in some high altitude regions or regions with complex topography, where a resolution of 0.25 m is deemed sufficient. The standard deviation is +/- 0.15 m (1 sigma) for a ground resolution of 0.10 m and +/- 0.25 m (1 sigma) for a ground resolution of 0.25 m, +/- 3-5 m (1 sigma). The dataset is composed of a collection of 16-bit encoded GeoTIFF orthorectified images. The overlap between images varies, but stays always present.
"},{"location":"PROJ-ROADSURF/#3-preprocessing","title":"3. Preprocessing","text":"Both the swissTLM3D and SWISSIMAGE RS dataset were processed to be suitable for the algorithms we wanted to develop. This was achieved with two procedures: the generation of the road domain and the creation of a raster mosaic.
"},{"location":"PROJ-ROADSURF/#31-generation-of-the-road-domain","title":"3.1. Generation of the Road Domain","text":"The swissTLM3D contains a vector layer representing every road section as a 3D line with some attached attributes. As a first test, the beneficiaries requested us to perform the analysis only on roads of the type \"3m Strasse\", i.e the roads wider than 2.81 m and thinner than 4.20 m. The engineered structures were excluded based on the attribute \"KUNSTBAUTE\". Only bridges and road sections without structures were kept. Data preparation differs slightly between the two performed analyses, machine and deep learning. Results for both approaches are shown here below.
Figure 3: Resulting labels (left) from the initial TLM lines (right) in the case of machine learning. Figure 4: Resulting labels (left) from the initial TLM lines (right) in the case of deep learning.For the machine learning analysis, only the 3m roads were kept (figure 3). For the deep learning analysis, we judged it safer to keep all the visible roads (figure 4). Therefore, the neighboring roads were also considered. We made the hypothesis that we would obtain better results by training the model on all the visible roads rather than on the 3m ones only. Still, the focus on the \"3m Strasse\" class was enforced through the selection of raster tiles: only those tiles containing the specific class were used as input data. Road geometries, originally linear, were transformed into polygons by adding a buffer with a flat cap style. This procedure generated unwanted overlapping areas in the neighborhood of the intersection points between contiguous road sections. Such artifacts were handled differently depending on the road type.
Once the polygons were generated, the sections hidden by the forest canopy were excluded. A buffer of 2 m was also added around forests, as the canopy was often seen extending beyond the forest delimitation recorded in the swissTLM3D dataset.
We considered adding information about the altitude and the length of the roads to the labels. Natural and artificial roads share pretty much the same distribution in terms of altitude. Regarding length, the longest roads all had an artificial surface; however, the experts could not tell us whether this holds for the whole of Switzerland or is a coincidence in our AOI. For the deep learning analysis, we tried to improve the overlap between labels and images by taking cadastral data into account. A larger buffer was used on the TLM lines; then, only the parts of the buffer intersecting the road surfaces from cadastral surveying were kept. As described in the deep learning analysis section, we tested both the labels straight out of the TLM and the ones augmented with the cadastral surveying. We also tried merging the labels by width type or by surface type.
After the pre-processing step described here above, the labels were ready to be used. Let us recall that there were many more roads labeled in the second case, as we considered all the visible roads. Especially for natural roads, the vast majority did not belong to the class of interest, but rather to the \"1m Weg\" and \"2m Weg\" classes.
"},{"location":"PROJ-ROADSURF/#32-raster-mosaic-generation","title":"3.2. Raster Mosaic Generation","text":"As said in the description of SWISSIMAGE RS, a large overlap between images is present in the dataset. To remove this overlap, a mosaic was created. Instead of merging all the images into one, we decided to set up a XYZ raster tile service, allowing us to work at different resolutions. The first step consists in reprojecting images in the EPSG:3857 projection, compliant with standard tile map services. Then, to save memory and disk space, images were converted from 16 to 8 bits. Besides, normalization was performed to optimize the usage of the available dynamic range. Finally, images were exported to the Cloud-Optimized GeoTIFF (COG) format. COG files can then be loaded by the TiTiler application, an Open Source dynamic tile server application 10. The MosaicJSON specification was used to store image metadata 11. Zoom levels were bound between 17 and 20, corresponding to resolutions between 1.20 m and 0.15 m.
"},{"location":"PROJ-ROADSURF/#4-machine-learning-analysis","title":"4. Machine Learning Analysis","text":""},{"location":"PROJ-ROADSURF/#41-methodology","title":"4.1. Methodology","text":"Before delving into machine learning, we performed some exploratory data analysis, aiming at checking whether already existing features were discriminant enough to tell natural roads from artificial ones. Additional predictive features were also generated, based on
The machine learning analysis was performed only on the two middle tiles of the AOI.
The most promising spectral index we found in the literature is the Artificial Surface Index (ASI) defined by Zhao & Zhu (2022) 12. Unfortunately, the computation of the ASI requires the shortwave infrared (SWIR) band which is not available in the SWISSIMAGE RS data. The SWIR band can be available in satellite imagery (e.g.: Landsat 8, Sentinel 2), yet spatial resolution (20-30 m/px) is not enough for the problem at hand.
Instead, the VgNIR-BI index 13 could be computed in our case, since it combines the green and NIR bands:
\\[\\begin{align} \\ \\mbox{VgNIR-BI} = {\\rho_{green} - \\rho_{NIR} \\over \\rho_{green} + \\rho_{NIR}} \\ \\end{align}\\]where \u03c1 stands for the atmospherically corrected surface reflectance values of the band. In our case, no atmospheric correction was applied, because we dealt with aerial imagery instead of satellite imagery.
Boxplots were generated to visualize the distribution of the aforementioned predictive features. Principal component analysis (PCA) were performed, too. The group of values passed to the PCA were the following: - pixel values: Each pixel displays 11 attributes corresponding to (1) its values on the different bands (R, G, B, NIR), (2) the ratio between bands (G/R, B/R, NIR/R, G/B, G/NIR, B/NIR), and (3) the VgNIR-BI spectral index 13. - summary statistics: Each road has 5 attributes for each band: the mean, the median, the minimum (min), the maximum (max), and the standard deviation (std).
Let us note that:
In order not to make the presentation too cumbersome, here we only show results produced at zoom level 18, on the entire dataset, and considering road sections corresponding to the following criteria:
We can see on the figure 5 that both the median and the upper quartile are systematically higher for natural than for artificial roads across all the bands, meaning the natural roads have brighter parts. Unfortunately, we have that pixel value statistics do not allow a sharp distinction between the two classes, as the lower quartile are very close.
Figure 6: Boxplots of the pixel distribution on the VgNIR-BI index and the ratios between bands. Each graph represents a ratio or the index and each boxplot a surface type. Figure 6bis: Boxplots of the pixel distribution on the ratios between bands. Each graph represents a ratio and each boxplot a surface type.The ratios between bands and the VgNIR-BI present similar values for the artificial and natural roads, allowing no distinction between the classes.
Figure 7: Boxplots of the distribution for the road summary statistics on the blue band. Each graph represents a statistic and each boxplot a type of road surface.Boxplots produced with the summary statistics computed per band and per road section lead to similar conclusions. Natural roads tend to be lighter than artificial ones. However, the difference is not strong enough to affect the lower quartiles and allow a sharp distinction between classes.
Figure 8: PCA of the pixels based on their value on each band. Figure 9: PCA of the roads based on their statistics on the blue band.The figures 8 and 9 present respectively the results of the PCA on the pixel values and on the statistics over road sections. Once more, we have to acknowledge that, unfortunately, artificial and natural roads cannot be separated.
"},{"location":"PROJ-ROADSURF/#43-discussion","title":"4.3. Discussion","text":"Although boxplots reveal that some natural roads can be brighter than artificial roads, statistical indicators overlap in such a way that no sharp distinction between the two classes can be drawn. The PCA confirms such an unfortunate finding.
Those results are not surprising. As a matter of fact, natural roads which are found in the \"3m Strasse\" type are mainly made by gravel or similar materials which, color-wise, make them very similar to artificial roads.
"},{"location":"PROJ-ROADSURF/#5-deep-learning-analysis","title":"5. Deep Learning Analysis","text":""},{"location":"PROJ-ROADSURF/#51-methodology","title":"5.1. Methodology","text":"To perform the detection and classification of roads, the object detector (OD) framework developed by the STDL 14 was used. It is described in details in the dedicated page.
The two central parts of the AOI constitute the training zone, i.e. the zone for the training, validation and test datasets. The two exterior parts constitute the inference-only zone, i.e. for the \"other\" dataset, to test the trained model on an entirely new zone.
To assess the predictions, a script was written, final_metrics.py
instead of using the one directly from the STDL's OD. We decided to take advantage that: 1. Predictions are not exclusive between classes. Every road section was detected several times with predictions of different class overlapping. 2. The delimitation of the roads are already known.
Therefore, rather than choosing one correct prediction, we aggregated the predictions in a natural and an artificial index over each label. Those indices were defined as follows:
\\[\\begin{align} \\ \\mbox{index}_{class} = \\frac{\\sum_{i=1}^{n} (A_{\\%,i} \\cdot \\mbox{score}_{class,i})}{\\sum_{i=1}^{n} A_{\\%,i}} \\ \\end{align}\\]where n is the number of predictions belonging to the class, \\(A_{\\%, i}\\) is the percentage of overlapping area between the label and the prediction, \\(\\mbox{score}_{class,i}\\) is its confidence score.
\\[\\begin{align} \\ \\text{final class} = \\begin{cases} \\mbox{artificial} \\quad \\mbox{ if } \\quad \\mbox{index}_{artificial} \\gt \\mbox{index}_{natural}\\\\ \\mbox{natural} \\quad \\mbox{ if } \\quad \\mbox{index}_{artificial} \\lt \\mbox{index}_{natural} \\\\ \\mbox{undetected} \\quad \\text{ if } \\quad \\mbox{index}_{artificial} = 0 \\; \\text{ and } \\; \\mbox{index}_{natural} = 0 \\\\ \\mbox{undetermined} \\quad \\text{ if } \\quad \\mbox{index}_{artificial} = \\mbox{index}_{natural} \\; \\text{ and }\\; \\mbox{index}_{artificial} \\neq 0\\\\ \\end{cases} \\ \\end{align}\\]The largest index indicates the right class as better predictions are supporting it. Once every road has an attributed class, the result was evaluated in terms of recall, precision and balanced F1 score.
\\[\\begin{align} \\ P_{class} = \\frac{TP_{class}}{TP_{class}+FP_{class}} \\text{ and } P = \\frac{P_{natural} + P_{artificial}}{2} \\ \\end{align}\\] \\[\\begin{align} \\ R_{class} = \\frac{TP_{class}}{TP_{class}+FN_{class}} \\text{ and } R = \\frac{R_{natural} + R_{artificial}}{2} \\ \\end{align}\\] \\[\\begin{align} \\ F1\\text{ }score = \\frac{2PR}{P + R} \\ \\end{align}\\]where
The predictions are not necessarily all taken into account. They are filtered based on their confidence score. Thresholds were tested over the balanced F1 score of the validation dataset.
The current dataset exhibits a very strong class imbalance. Therefore, we decided to use balanced metrics, giving the same weight to both classes. The balanced F1 score was chosen as the determining criterion between the different tested models. As it gives equal weight to both classes, the quality of the classification for the natural road was well taken into consideration. However, we have to keep in mind that we gave great importance to this class compared to its number of individuals.
A great risk exists that the model would be biased toward artificial roads, because of the imbalance between classes. Therefore, we decided on a baseline model (BLM) where all the roads in the training zone are classified as artificial. Its metrics are the following:
Artificial Natural Global Precision 0.97 0 0.49 Recall 1 0 0.5 F1 score 0.98 0 0.49Table 1: Metrics for the BLM with all the roads classified as artificial
The trained models should improve the global F1 score of 0.49 to be considered as an improvement.
Finally, we wanted to know if the artificial and natural index could constitute a confidence score for their respective classes. The reliability diagram has been plotted to visualize the accuracy of the classification at different levels of those indices.
Figure 10: Listing of the various tests carried out.To achieve the best possible results, several input parameters and files for the model training were tested. 1. We tried to improve the quality of the labels by integrating data from cadastral surveying and by merging the roads based on their cover, on their type, or not at all. 2. We trained the model with different zoom level images, from 17 to 20. 3. The influence of different band combinations on the model performance was investigated: true colors (RGB) and false colors (NirRG).
For each test, the best configuration was chosen based on the global balanced F1 score. This method supposes that the best choice for one parameter did not depend on the others.
"},{"location":"PROJ-ROADSURF/#52-results","title":"5.2. Results","text":"When testing different procedures to create the labels, using only the TLM and excluding the data from the cadastral survey gave the best metrics. Besides, cutting the label corresponding to the road sections and not merging them by road type or surface gave better metrics. Increasing the zoom level improved the balanced F1 score. Using the bands RGB and RG with NIR gave very similar results and an equal F1 score. Therefore, the best model is based on labels deduced from the TLM and using the RGB bands at a zoom level 20.
Artificial Natural Global Precision 0.99 0.74 0.87 Recall 0.97 0.74 0.86 F1 score (1) 0.98 0.74 0.86 F1 score for the BLM (2) 0.98 0 0.49 Improvement: (1)-(2) 0 0.74 0.32Table 2: metrics for the best model over the training, test and validation area.
The F1 score for the natural roads and the global one outperformed the BLM. The per-class F1 scores has been judged as satisfying by the beneficiaries.
Artificial Natural Global Precision 0.98 0.22 0.60 Recall 0.95 0.26 0.61 F1 score (1) 0.96 0.24 0.60 F1 score for the BLM (2) 0.98 0 0.49 Improvement: (1)-(2) -0.02 0.24 0.11Table 3: metrics for the best model over the inference-only area.
Those metrics are worse than the ones obtained over the training area. The global F1 score is still higher than for the BLM. However, the natural F1 score is not high enough.
Figure 11: Absolute and relative repartition of the roads in the inference-only zone.93.2% of the roads are correctly classified, 4.2% are in the wrong class and 2.6% are undetected or undetermined. Nearly half of the natural roads are either undetected or in the wrong class, but as they represent a tiny proportion of the dataset, they impact little the accuracy. In the training zone, only 2% of the roads are in the wrong class and 1.7% are undetected or undetermined.
Figure 12: Reliability curves for the training and the inference-only zone.The artificial index can be used as confidence score for the artificial roads. The natural index can be used as confidence score for the natural ones. Indeed, the accuracy of the results for each class increases with their value.
"},{"location":"PROJ-ROADSURF/#53-discussion","title":"5.3. Discussion","text":"The F1 score obtained is 0.86 over the area to train and validate the model and 0.60 over the rest of the AOI. The difference is essentially due to the decrease in the F1 score of the natural roads, passing from 0.74 to 0.24. The first intuition is that we were facing a case of overfitting. However, the validation loss was controlled in order to stop the training on time and avoid this problem. Another possibility would be that the two zones differ significantly and that a model trained on one cannot apply on the other. Hence, we also split the tiles randomly between the training and the inference-only zone. The gap between the balanced F1 score of the training and inference-only zone passed from 0.25 to 0.19 with the same hyper-parameters.
The high recall for artificial roads indicates that the model properly detects them. However, once the artificial recall is high, the high artificial precision is in this case necessarily due. As the roads have a known location, the false positives not due to a class confusion are eliminated from our assessment. Then, only the roads classified in the wrong class can affect precision. As there are not a lot of natural roads, even if they were all wrongly classified as artificial like in the BLM, the precision would still remain well at 0.97. In the current case, the precision of the trained model is 0.01 higher than the one of the BLM. The drop in the natural F1 score is due to all the roads predicted in the wrong class. As they are only a few natural roads, errors of the model affect them more heavily. The part of the misclassified road increased by 44% between the training and the inference-only zone. Meanwhile, the part of undetermined roads only increased by 1%
The F1 score could maybe be further improved by focusing more strictly on the 3m roads. Indeed, we considered it would be safer to teach the algorithm to differentiate only between surfaces and not between road types, which are defined by width. Therefore, the tiles were selected because they intersected 3m roads, but then all the roads on the tiles were transformed into labels. Because of the rarity of 3m natural roads, most of the natural roads seen by the algorithm are 2m roads and those often have a surface with grass, where the 3m natural roads have a surface made only of gravel or dirt. Over the training zone, 110 natural roads are 3m ones and 1183 ones are 2 m and 1 m paths. Maybe, labelling only the 3m roads would give better results than labelling all the visible roads. We did not tune the hyperparameter used by the deep learning model once we found a satisfying enough combination. In addition, as the algorithm is based on detectron2, not everything can easily be tuned. Using an entirely new framework and tuning the loss weights would allow better handling the class imbalance. A new framework could also allow integrating an attention mask and take advantage of the known road location like recommended by Epel (2018)15. Using a new framework could also allow to use images with 4 bands and integrating the NIR. However, we decided here to first try the tools we already had in our team.
We can say that there is a bias in the model encouraging it to predict artificial roads. However, it is still better than the BLM. Therefore, this model is adapted for its purpose.
"},{"location":"PROJ-ROADSURF/#531-elements-specific-to-the-application-on-the-swisstlm3d-product","title":"5.3.1. Elements specific to the application on the SwissTLM3D product","text":"All these findings seem negative, which is why it is appropriate to recall the significant imbalance between the classes. If we look at the percentages, 93.2% of the dataset is correctly classified over the inference-only zone. This could represent a significant gain of time compared to an operator who would do the classification manually. Indeed, once the model trained, the procedure documented here only needs 20 minutes to classify the roads of the AOI. Besides, the artificial and natural indices allow us to find most of the misclassified roads and limit the time needed for a visual verification. In addition, the information of the road surface type is already available for the whole Switzerland. When using the algorithm to update the swissTLM3D dataset, it would be possible to perform change detection between the previous and new surface type. Then, those changes could be visually verified.
"},{"location":"PROJ-ROADSURF/#6-conclusion","title":"6. Conclusion","text":"Keeping the swissTLM3D dataset up to date is a time consuming and methodical task. This project aimed at finding a method to automatize the determination of the road surface type (artificial vs. natural). We focused on roads belonging to the \"3m Strasse\" class and discovered that statistics stemming from pixel values are not enough discriminating to tell artificial roads from natural ones. Therefore, we decided not to attempt any supervised classification based on machine learning. Instead, deep learning methods are performed. With 93% of the roads classified correctly, this method gave better results in regard to the global F1 score than a baseline model classifying all the roads as artificial. However, the model classifies 4.2% of the roads in the wrong class and has difficulties performing new zones. To ensure the quality of the swissTLM3D product, we advise to first perform a classification with the algorithm, then to check roads with a low class index or a change in surface type compared to the previous version years. It could represent a huge time saver for the operators who currently classify and check a second time all the roads.
Despite our investigations, we could not find the cause of the gap between the metrics for the training and the inference-only zone. Further investigation is needed. The next step for this project would be to extend the algorithm to paths of 1 to 2 m wide. The natural roads of 3 m are mostly made of gravel, which strongly resembles asphalt, while natural paths are mostly made of dirt and can grow grass. Therefore, when mixing the two road width classes in one model, the natural roads of 3 m could be too difficult to distinguish from artificial roads and end up neglected.
"},{"location":"PROJ-ROADSURF/#7-references","title":"7. References","text":"Office f\u00e9d\u00e9ral de la statistique. Longueur des routes en 2020 | Office f\u00e9d\u00e9ral de la statistique. https://www.bfs.admin.ch/news/fr/2020-0273, November 2020.\u00a0\u21a9
swisstopo. swissTLM3D. https://www.swisstopo.admin.ch/de/geodata/landscape/tlm3d.html.\u00a0\u21a9\u21a9
Lushan Cheng, Xu Zhang, and Jie Shen. Road surface condition classification using deep learning. Journal of Visual Communication and Image Representation, 64:102638, October 2019. doi:10.1016/j.jvcir.2019.102638.\u00a0\u21a9
Susi Marianingsih, Fitri Utaminingrum, and Fitra Abdurrachman Bachtiar. Road Surface Types Classification Using Combination of K-Nearest Neighbor and Na\u00efve Bayes Based on GLCM. International Journal of Advanced Software Computer Application, 11(2):15\u201327, 2019.\u00a0\u21a9
Marcus Nolte, Nikita Kister, and Markus Maurer. Assessment of Deep Convolutional Neural Networks for Road Surface Classification. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 381\u2013386. Maui, HI, November 2018. IEEE. doi:10.1109/ITSC.2018.8569396.\u00a0\u21a9
Viktor Slavkovikj, Steven Verstockt, Wesley De Neve, Sofie Van Hoecke, and Rik Van De Walle. Image-Based Road Type Classification. In 2014 22nd International Conference on Pattern Recognition, 2359\u20132364. Stockholm, August 2014. IEEE. doi:10.1109/ICPR.2014.409.\u00a0\u21a9
Mohammad Mansourmoghaddam, Hamid Reza Ghafarian Malamiri, Fahime Arabi Aliabad, Mehdi Fallah Tafti, Mohamadreza Haghani, and Saeed Shojaei. The Separation of the Unpaved Roads and Prioritization of Paving These Roads Using UAV Images. Air, Soil and Water Research, 15:117862212210862, January 2022. doi:10.1177/11786221221086285.\u00a0\u21a9
Hailing Zhou, Hui Kong, Lei Wei, Douglas Creighton, and Saeid Nahavandi. On Detecting Road Regions in a Single UAV Image. IEEE Transactions on Intelligent Transportation Systems, 18(7):1713\u20131722, July 2017. doi:10.1109/TITS.2016.2622280.\u00a0\u21a9
swisstopo. SWISSIMAGE RS. https://www.swisstopo.admin.ch/fr/geodata/images/ortho/swissimage-rs.html.\u00a0\u21a9
TiTiler. https://developmentseed.org/titiler/.\u00a0\u21a9
Vincent Sarago, Sean Harkins, and Drew Bollinger. Developmentseed / mosaicjson-spec. https://github.com/developmentseed/mosaicjson-spec, 2021.\u00a0\u21a9
Yongquan Zhao and Zhe Zhu. ASI: An artificial surface Index for Landsat 8 imagery. International Journal of Applied Earth Observation and Geoinformation, 107:102703, March 2022. doi:10.1016/j.jag.2022.102703.\u00a0\u21a9
Ronald C. Estoque and Yuji Murayama. Classification and change detection of built-up lands from Landsat-7 ETM+ and Landsat-8 OLI/TIRS imageries: A comparative assessment of various spectral indices. Ecological Indicators, 56:205\u2013217, September 2015. doi:10.1016/j.ecolind.2015.03.037.\u00a0\u21a9\u21a9
Swiss Territorial Data Lab. Object detector. February 2023. URL: https://github.com/swiss-territorial-data-lab/object-detector.\u00a0\u21a9
Sagi Eppel. Classifying a specific image region using convolutional nets with an ROI mask as input. December 2018. arXiv:1812.00291.\u00a0\u21a9
Clémence Herny (Exolabs) - Gwenaëlle Salamin (Exolabs) - Alessandro Cerioni (État de Genève) - Roxane Pott (swisstopo)
Proposed by the Canton of Geneva - PROJ-ROOFTOPS - March 2023 to January 2024 - Published in May 2024
Abstract: Free roof surfaces offer great potential for the installation of new infrastructure such as solar panels and vegetated rooftops, which are essential for adapting cities to climate change. The arrangement of objects on rooftops can be complex and dynamic. Inventories of existing roof objects are often scarce, incomplete and difficult to update, making it difficult to assess the available potential. In this project, in collaboration with the Canton of Geneva, we developed and tested three methods to automatically identify occupied and free surfaces on roofs: (1) classification of roof plane occupancy based on a random forest, (2) segmentation of objects in LiDAR point clouds based on clustering and (3) segmentation of objects in aerial imagery based on deep learning. The results are vector layers containing information about surface occupancy. True orthophotos and LiDAR data acquired over the canton of Geneva in 2019 were used. The methods were developed using a subset of 122 buildings selected to be representative of a diversity of objects and roofs, on which the ground truth objects were manually vectorized. The developed methods achieved satisfactory performance. About 85% of the roof planes were correctly classified. The segmentation methods were able to detect most of the objects, with f1 scores of 0.78 and 0.75 for the LiDAR-based and the image-based segmentation respectively. The global shape of the occupied surface was more difficult to reproduce, with a median intersection over union of 0.35 and 0.37 respectively. The results of all three methods were considered satisfactory by the experts, with 70% to 95% of the results judged acceptable. Considering the quality of the results and the computational time, only the classification method was selected for an application at the cantonal level.
"},{"location":"PROJ-ROOFTOPS/#1-introduction","title":"1. Introduction","text":"To address the challenges of the climate crisis and the ecological transition, local authorities need to adapt their land use policies. One possible measure is to use the surface available on rooftops to install new infrastructure while minimizing the impact on land use. For instance, solar panels can be installed on rooftops to produce local energy with a minimal impact on the landscape1. Rooftops can also accommodate vegetated areas, promoting biodiversity in cities and mitigating the heat island effect2. Accurate knowledge of available rooftop surface and an inventory of the existing infrastructure, such as solar panels and vegetated rooftops, are required to plan and prioritize future investments. Ignoring rooftop objects could firstly lead to overestimating the potential for new infrastructure, such as the solar potential1, and secondly, slowing down the process of new installations. Unfortunately, information on this topic is often scarce and difficult to keep up to date, especially in big cities, limiting our understanding of the current situation. This can be explained by the number and diversity of roofs and roof objects. In addition, the rooftop landscape is dynamic and requires regular monitoring.
With increasing urbanization and the need for sustainable cities, there is a growing interest in knowing the potential of rooftops. The availability of high-resolution satellite and aerial imagery, as well as LiDAR data, along with the development of advanced numerical methods, has led to a multiplication of studies. The crowdsourcing approach3 makes it possible to vectorize objects on a large scale, but requires a large workforce and can suffer from a lack of homogeneity. Computer vision-based solutions show promising results for segmenting objects of interest. A deterministic approach based on pixel analysis and a 3D building model, developed by Narjabadifam et al. (2022)4, was able to detect suitable areas for installing solar panels, taking large roof objects (e.g. ventilation) into account. The watershed method is commonly used for image segmentation. It can detect small objects (e.g. roof windows) in high-resolution images but involves a complex workflow to achieve satisfactory results5. Deep learning (DL) methods are used to train detection models for objects of interest such as solar panels6, vegetated roofs5, superstructures on roofs7 or available roof area89, with variable performance depending on the study and the objects considered. The main difficulty in training DL models is the availability of a qualitative dataset of labels7, as the production of such a dataset is a time-consuming task. LiDAR data is often used to assess the solar potential of rooftops by segmenting their main planes1011. Continuous improvements in point density make it possible today to retrieve the detailed morphology of the roof, including superstructures (e.g. dormers) as well as smaller objects, such as chimneys. Therefore, the segmentation of objects protruding from flat roof planes provides valuable information about the area available on rooftops.
In this context, the State of Geneva, through the Cantonal Office for Energy (OCEN) and the Cantonal Office for Agriculture and Nature (OCAN), contacted the STDL to explore possibilities for improving knowledge of rooftops. Both offices have developed methods for producing vector layers for solar panels and vegetated rooftops, respectively, but neither reached a satisfactory level of automation, accuracy, or completeness. Besides, information on other objects present on the rooftops, like air conditioners, pipes or windows, is incomplete. Therefore, both offices expressed the need to further automate the detection of available roof surfaces in order to assess the potential, define realistic objectives and strategies to achieve them, and prioritize investments. The objective for the STDL was to produce a binary vector layer of the available and occupied surfaces on roofs in the canton of Geneva. In this report, we first describe the data used, including high-resolution aerial imagery, 3D LiDAR point clouds and available vector layers of rooftops. We then present the methods and results of the three approaches developed to evaluate the available rooftop surface, namely (1) LiDAR-based classification of roof occupancy, (2) LiDAR-based object segmentation, and (3) image-based object segmentation. Next, we discuss the possibility of combining the results of the different methods to improve them. Finally, we provide conclusions on the ability of the developed methods to address the problem and on the most appropriate solution.
"},{"location":"PROJ-ROOFTOPS/#2-input-data","title":"2. Input data","text":""},{"location":"PROJ-ROOFTOPS/#21-lidar-point-cloud","title":"2.1 LiDAR point cloud","text":"The LiDAR point cloud was acquired in March 2019 by the State of Geneva. It has a density of 25 pts/m2, an altimetric accuracy of +/- 10 cm and a planimetric accuracy of 20 cm. It is distributed in georeferenced tiles of 500 m each. The point cloud is classified into 11 classes, including a \"building\" class. This class includes the whole building without distinction for the facades, rooftop or roof superstructures. Within the framework of the classification of the roof plane occupancy, the presence of the class \"building\" was evaluated, as explained in Section 4.1.2. To avoid the influence of classification errors, points from all classes were considered in the LiDAR segmentation.
"},{"location":"PROJ-ROOFTOPS/#22-true-orthophotos","title":"2.2 True orthophotos","text":"The RGB aerial imagery was acquired in May 2019 by the State of Geneva with a ground sampling distance of 5 cm. A true orthophoto was derived based on a photomesh. It has a ground sampling distance of 6.8 cm. The product, available on request, is served as RGB GeoTIFF images with a size of 500 m. True orthophotos are more complicated to obtain than orthophotos, and thus rarer. Their use was motivated by the fact that orthorectification aligns the roofs and bases of buildings. As a result, the objects detected on true orthophotos have the true position, allowing us to compare our results with those obtained with LiDAR data.
"},{"location":"PROJ-ROOFTOPS/#23-delimitation-of-the-roofs","title":"2.3 Delimitation of the roofs","text":"Information on building roofs is provided by the roof vector layer produced by the State of Geneva. It includes the main roof planes and some superstructure elements, defined by their area between 1 m2 and 9 m2. Each roof has been assigned the following attributes:
The vector layer is regularly updated to reflect the destruction and construction of buildings. The version used for this project was downloaded in March 2023.
"},{"location":"PROJ-ROOFTOPS/#24-ground-truth","title":"2.4 Ground truth","text":"In the Canton of Geneva, several vector layers exist for roof objects (see the SITG catalog) but are incomplete for the purposes of our project. Consequently, it was decided to produce a precise ground truth (GT) dedicated to the project instead of using existing layers. It consists of a vector layer segmenting all the visible objects on the roofs, the objects partially covering the roofs, such as trees, as well as the delimitation of free surfaces. This work was performed manually on the 2019 true orthophotos. A single GT was produced for both the LiDAR and the true orthophoto datasets as they are aligned and synchronized in time. All vectorized objects were assigned to a class listed in Figure 1.
Figure 1: Number of objects per class of the ground truth for the training and test datasets.
The GT is a list of 122 buildings chosen to be representative of the building diversity (villas, industrial buildings, old town...). Of these, 105 were used to develop and optimize the workflows, i.e. as a training dataset, and 17 were used to check the stability of the metrics, i.e. as a test dataset. The labeled objects in the GT, occupying surfaces on the selected roofs, represent about 50% of the total surface in both the training and test datasets (Table 1).
| Dataset | Number of buildings | Occupied area (m2) | Free area (m2) |
| --- | --- | --- | --- |
| Training subset | 25 | 3,087 | 14,147 |
| Training | 105 | 57,303 | 60,526 |
| Test | 17 | 6,214 | 7,415 |

Table 1: Occupied and free surface areas for the different ground truth datasets. The training subset is specific to the image segmentation workflow (see Section 6.1.4).
Buildings were classified by occupation, hereinafter referred to as the building type, and by roof typology, hereinafter referred to as the roof type, to evaluate the impact of these parameters on the results. The following building types were selected: administrative, industrial and residential.
The following roof types were selected: flat, mixed and pitched.
Note that all administrative and industrial buildings have a flat roof and that all the pitched roofs belong to residential buildings. The GT was used to optimize and assess the different workflows. No custom training was done for this project.
"},{"location":"PROJ-ROOFTOPS/#3-evaluation-of-the-results","title":"3. Evaluation of the results","text":""},{"location":"PROJ-ROOFTOPS/#31-metrics","title":"3.1 Metrics","text":"The performances of the developed methods were evaluated by computing the number of GT labels detected, namely, the precision P and the ability of the algorithm to be exhaustive with its detections, namely the recall R. The two were combined to obtain the f1 score. The respective formulas are presented below:
with:
The main challenge in calculating these metrics is the counting of TP. Indeed, roof objects can have complex shapes, such as pipes, aeration outlets and solar panels, which can be vectorized in many ways, all of them possibly correct (Fig. 2). It can be difficult to reproduce the labels with detections, especially from one algorithm to another. Several detections may cover one label, just as one detection may cover several labels and be equally correct for all.
Figure 2: Illustration of different approaches to object vectorization and possible segmentation results. Solar panels can be vectorized as a group (left), as lines (middle) or as individual panels (right).
To account for this aspect, a connected-component method was adopted. Graphs of overlapping detections and labels were generated, as illustrated in Figure 3. A detection was considered to overlap a label when more than 10% of the detection surface was covered. All the elements in a connected graph were tagged as TP. Detections within the group were merged and the assigned TP value was set equal to the number of labels within the connected graph. The labels and detections that were not part of any connected graph were tagged as FN and FP respectively.
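A minimal sketch of this matching, assuming GeoPandas GeoDataFrames `labels` and `detections` (hypothetical names) holding the polygons of one dataset; the pairwise loop is kept naive for readability, whereas a spatial index would be preferable in practice:

```python
import geopandas as gpd
import networkx as nx

def tag_detections(labels: gpd.GeoDataFrame, detections: gpd.GeoDataFrame):
    # Count TP, FP and FN with the connected-component rule described above
    g = nx.Graph()
    g.add_nodes_from(('label', i) for i in labels.index)
    g.add_nodes_from(('det', j) for j in detections.index)
    # Edge when more than 10% of the detection surface is covered by a label
    for i, lab in labels.geometry.items():
        for j, det in detections.geometry.items():
            if det.intersection(lab).area > 0.1 * det.area:
                g.add_edge(('label', i), ('det', j))
    tp = fp = fn = 0
    for component in nx.connected_components(g):
        n_labels = sum(1 for kind, _ in component if kind == 'label')
        n_dets = len(component) - n_labels
        if n_labels and n_dets:
            tp += n_labels   # TP value = number of labels in the connected graph
        elif n_labels:
            fn += n_labels   # labels outside any connected graph
        else:
            fp += n_dets     # detections outside any connected graph
    return tp, fp, fn
```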
Figure 3: Labels (a) and detected obstacles (b) for the EGID 1005001, the corresponding graphs for the numbered elements on the balconies (c) and the resulting merged tagged detections (d).
In addition to object detection, the ability to reproduce the shape of the occupied surface was evaluated, since the main objective of the project is to recover the delimitation of free and occupied surfaces. Because of the difficulty of pairing detections and labels, and since it is not necessary to know the delimitation of individual objects inside an occupied surface, we calculated the intersection over union (IoU) of the detections and the labels at the roof scale:
\[\begin{align} \ \mbox{IoU} = {A_{detections \cap labels} \over A_{detections \cup labels}} \ \end{align}\]

with: A the area of the intersection, respectively of the union, of the detection and label polygons on the considered roof.
The median IoU (mIoU) over all the roofs provides the evaluation metric for the considered dataset.
The optimal value for the selected metrics, i.e. f1 score and mIoU, is 1.
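For illustration, the roof-scale IoU can be computed with shapely along these lines; this is a sketch assuming lists of shapely polygons for the detections and labels of a single roof, with the mIoU then being the median of this value over all roofs:

```python
from statistics import median
from shapely.ops import unary_union

def roof_iou(detections, labels) -> float:
    # IoU between the merged detections and the merged labels of one roof
    det_union = unary_union(detections)
    lab_union = unary_union(labels)
    union_area = det_union.union(lab_union).area
    if union_area == 0:
        return 1.0  # nothing labeled and nothing detected
    return det_union.intersection(lab_union).area / union_area

# mIoU of a dataset, given per-roof polygon lists (hypothetical variable):
# miou = median(roof_iou(d, l) for d, l in per_roof_pairs)
```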
"},{"location":"PROJ-ROOFTOPS/#32-hyperparameter-optimization","title":"3.2 Hyperparameter optimization","text":"The algorithms used and developed in this project involve numerous hyperparameters. We adopted the Optuna framework to automate the search for the value of each hyperparameter giving the best results. The optimization was performed for the LiDAR segmentation and the image segmentation workflows. Although the values to be optimized are different, the strategy is similar.
We sought to maximize the f1 score and the mIoU. The search for the best hyperparameter values was performed using the Tree-structured Parzen Estimator12 (TPE) algorithm. At each iteration, the workflow was executed from segmentation to assessment. At the end of the process, the best hyperparameter combinations optimizing the metrics were provided. In addition, the relative importance of precision compared to recall can be tuned by adding one of these metrics to the list of values to be optimized.
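As an illustration, such a multi-objective loop can be set up with Optuna roughly as follows; `run_workflow` is a hypothetical stand-in for the segmentation-plus-assessment pipeline, and the searched parameters are placeholders:

```python
import optuna

def run_workflow(params):
    # Hypothetical stand-in: run segmentation + assessment, return (f1, mIoU)
    return 0.0, 0.0

def objective(trial: optuna.Trial):
    params = {
        'alpha': trial.suggest_float('alpha', 0.1, 10.0),              # placeholder
        'min_plane_area': trial.suggest_float('min_plane_area', 1.0, 50.0),
    }
    f1, miou = run_workflow(params)
    return f1, miou

study = optuna.create_study(
    directions=['maximize', 'maximize'],          # maximize both f1 score and mIoU
    sampler=optuna.samplers.TPESampler(seed=42),  # Tree-structured Parzen Estimator
)
study.optimize(objective, n_trials=50)
print(study.best_trials)  # Pareto-optimal hyperparameter combinations
```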
The hyperparameters obtained for the whole training dataset are referred to as \"global\". Specific optimizations can be performed for a given building type or roof type to take specific features into account. In this case, the obtained hyperparameters are referred to as \"specialized\".
"},{"location":"PROJ-ROOFTOPS/#33-evaluating-the-relevance-of-the-detections","title":"3.3 Evaluating the relevance of the detections","text":"In addition to the selected metrics, the results were analyzed in terms of object characteristics relevant to the project objective, i.e. providing indications of potential surface available for the installation of new facilities such as solar panels and vegetated rooftops. The experts expect to get an estimate of the free surface available to estimate the potential. Therefore, the occupied and free areas obtained with the different methods were computed and compared to the GT to evaluate the accuracy. In addition, the continuity of the roof surface is an important parameter to consider when installing facilities. It depends on the size of the objects and their position on the roof. A large object or an object located in the middle of a roof can constitute an obstacle. To evaluate the models' ability to detect such objects, the surface area of the object and the position of its centroid relative to the roof edge were computed, and the metrics were analyzed accordingly.
"},{"location":"PROJ-ROOFTOPS/#4-classification-of-roof-plane-occupancy","title":"4. Classification of roof plane occupancy","text":"A first method was developed to identify potentially free and occupied surfaces on rooftops. It consists of using statistics derived from LiDAR data as an indicator of occupancy. We assumed that some LiDAR properties can capture the presence of objects on the target roofs. For instance, changes in intensity could be caused by the LiDAR hitting different objects. In addition, a surface covered with objects is likely to be rougher than a flat, free surface. Zonal statistics on these two parameters, intensity and roughness, were used in addition to the LiDAR classification and roof plane area to classify roof planes into three classes:
First, the intensity values of the LiDAR points classified as \"building\" were interpolated with inverse distance weighting and converted to a raster. Second, a DEM was computed from the LiDAR point cloud, from which the roughness was derived and saved as a raster. The Python library WhiteboxTools was used for this processing. The roughness was calculated at a scale of 1 m, the smallest possible scale. The produced intensity and roughness rasters have a resolution of 0.3 m/px.
Zonal statistics of intensity and roughness were computed for each roof plane and used to classify the planes with manual thresholds and with a random forest (RF), as described in the next two sections. If a roof plane extended over several tiles, the result was kept for the tile with the largest overlap.
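A sketch of such per-plane statistics with the rasterstats library; the file names and the exact set of statistics are assumptions for illustration:

```python
import geopandas as gpd
from rasterstats import zonal_stats

roof_planes = gpd.read_file('roof_planes.gpkg')   # hypothetical path
stats = zonal_stats(
    roof_planes,
    'roughness.tif',                              # raster from the previous step
    stats=['min', 'mean', 'median', 'std'],       # statistics fed to the classifiers
)
roof_planes[['rough_min', 'rough_mean', 'rough_median', 'rough_std']] = [
    [s['min'], s['mean'], s['median'], s['std']] for s in stats
]
```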
The initial processing was performed for all the roofs of the 45 LiDAR tiles containing GT and of eight test tiles selected in the city center, representing a total of 95,699 roof planes. It took around 30 minutes to create the rasters and compute the zonal statistics for the LiDAR tiles, while the classification took less than a minute with 32 GB of RAM and an i7-1260P CPU.
"},{"location":"PROJ-ROOFTOPS/#412-classification-with-manual-thresholds","title":"4.1.2 Classification with manual thresholds","text":"Roof planes smaller than 2 m2 were classified as \"occupied\", because they are too small for solar or vegetated installations. In addition, roof planes for which the LiDAR point cloud was classified as \"building\" for less than 25% of the area were classified as \"undefined\". To classify the remaining roof planes, thresholds were set on the statistical values presented in Table 2. They were selected to reflect the variations in intensity and roughness induced by the presence of objects on the roof, as well as the presence of non-building classes in the LiDAR point cloud.
| Variable | Threshold |
| --- | --- |
| Margin of error of intensity | 400 |
| Standard deviation of intensity | 5500 |
| Median roughness (m) | 7.5 |
| Overlap with interpolated pixels not classified as building (%) | 25 |
Table 2: Variables considered to classify the roof planes and the thresholds above which they are classified as occupied.
A roof plane was classified as \"occupied\" if it exceeded the threshold for at least one statistical value, as sketched below. The thresholds were set through trial and error until a satisfying result was reached. The resulting classification was reviewed by the experts for 650 roof planes and a satisfaction rate was calculated (Section 4.2.2). Further tests were performed to improve the classification by adjusting the thresholds, but no better combination could be found.
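A minimal sketch of these rules, assuming a pandas DataFrame with the relevant zonal statistics as columns; the column names and the toy row are hypothetical:

```python
import pandas as pd

planes = pd.DataFrame([  # one illustrative roof plane
    {'area_m2': 35.0, 'building_pct': 90.0, 'intensity_moe': 120.0,
     'intensity_std': 3000.0, 'roughness_median': 1.2},
])

def classify_plane(row: pd.Series) -> str:
    if row['area_m2'] < 2.0:                # too small for solar or vegetation
        return 'occupied'
    if row['building_pct'] < 25.0:          # little of the plane classified 'building'
        return 'undefined'
    exceeds = (
        row['intensity_moe'] > 400          # margin of error of intensity
        or row['intensity_std'] > 5500      # standard deviation of intensity
        or row['roughness_median'] > 7.5    # median roughness (m)
    )
    return 'occupied' if exceeds else 'potentially free'

planes['occupancy'] = planes.apply(classify_plane, axis=1)
print(planes['occupancy'].iloc[0])  # -> 'potentially free'
```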
"},{"location":"PROJ-ROOFTOPS/#413-classification-with-random-forest","title":"4.1.3 Classification with random forest","text":"To avoid classification based on arbitrary thresholds, RF was used with zonal statistics (Tables A1 and A2). The manual threshold classification, reviewed by the experts, was used as GT to train two RFs, one for each office. The roof planes of the class \"undefined\" and the ones smaller than 2 m2 were ignored. The number of roof planes used in the training of each RF is presented in Table 3.
| Office | Potentially free | Occupied |
| --- | --- | --- |
| OCAN | 258 | 324 |
| OCEN | 301 | 297 |
Table 3: Correct classifications of the roof planes reviewed by the experts and used as the ground truth for the random forests.
The GT was split with 80% of the roof planes for training and 20% for testing, as sketched below. Satisfaction rates were calculated on the test dataset to evaluate the performance.
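A sketch of this step with scikit-learn; the synthetic data stands in for the 14 zonal statistics and the expert-reviewed classes, which are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for the 14 zonal statistics (X) and expert-reviewed classes (y);
# in the real workflow these come from the reviewed threshold classification
X, y = make_classification(n_samples=582, n_features=14, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # 80/20 split
)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print('test accuracy:', rf.score(X_test, y_test))
# Relative variable importance, as discussed in Section 4.2.3
print(rf.feature_importances_)
```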
Only roof planes that could be used for potential solar and vegetated installations were classified by the RF: we excluded the roof planes smaller than 2 m2, which are automatically classified as \"occupied\", and the roof planes classified as \"undefined\".
"},{"location":"PROJ-ROOFTOPS/#42-results","title":"4.2 Results","text":""},{"location":"PROJ-ROOFTOPS/#421-classification","title":"4.2.1 Classification","text":"Examples of roof plane classification obtained with the manual thresholds and the RF models are shown in Figure 4. The results for the OCAN are closer to the results obtained with the manual thresholds than the ones for the OCEN. In addition, let us note that the RF for the OCAN is classifying more roof planes as \"occupied\" than the RF for the OCEN.
Figure 4: Results of the manual thresholds and of the random forests for the classification of occupancy for the OCAN and the OCEN.
The visualization of the results shows that not only roofs with obstacles are classified as \"occupied\", but also some small or narrow empty roof planes, because they display a high median and/or minimum roughness. The roof planes classified as \"undefined\" can often be considered as occupied due to the presence of vegetation or walkways. The corresponding areas in the LiDAR point cloud are mostly classified as ground and vegetation.
"},{"location":"PROJ-ROOFTOPS/#422-expert-assessment","title":"4.2.2 Expert assessment","text":"OCEN and OCAN experts are generally satisfied with the classification based on the manual thresholds (Table 4), with global satisfaction rates ranging from 83% to 89%.
| Office | Global | Occupied | Potentially free | Undefined |
| --- | --- | --- | --- | --- |
| OCAN | 89% | 86% | 93% | 66% |
| OCEN | 83% | 84% | 81% | - |

Table 4: Satisfaction rates of the OCAN and OCEN experts with the classification of 650 roof planes using manual thresholds. Global satisfaction rates were computed only for planes classified as \"occupied\" and \"potentially free\". The review of planes classified as \"undefined\" is not available for the OCEN.
Satisfaction rates for the \"occupied\" roof planes are similar for both offices, while the rate for \"potentially free\" roof planes is 12 points higher for the OCAN, reaching an excellent score of 93%. The OCEN expert considers small roof planes as occupied more readily than the OCAN expert. The OCAN expert approved the \"undefined\" class in 66% of the cases, while this class was not reviewed by the OCEN expert.
Manual threshold classification

| Office | Global | Occupied | Potentially free |
| --- | --- | --- | --- |
| OCAN | 79% | 70% | 91% |
| OCEN | 77% | 72% | 82% |

RF classification

| Office | Global | Occupied | Potentially free |
| --- | --- | --- | --- |
| OCAN | 86% | 78% | 96% |
| OCEN | 83% | 74% | 91% |

Table 5: Satisfaction rates of the OCAN and OCEN experts on the test dataset for the two classification methods.
The satisfaction rates obtained with the manual thresholds and with the RF on the test dataset are presented in Table 5. The classification with RF outperforms the manual thresholds, with global satisfaction rates increasing by 7 and 6 points for the OCAN and the OCEN respectively. The satisfaction rates for the \"occupied\" and \"potentially free\" classes improve by between 2 and 9 points.
"},{"location":"PROJ-ROOFTOPS/#423-variable-importance","title":"4.2.3 Variable importance","text":"The influence of the variables considered in the two RF models can be identified by their relative importance (Tables A1 and A2 in Appendix A) provided by the algorithm.
The models are consistent: four common variables, namely the margin of error (MOE) of intensity, the median roughness, the mean roughness and the minimum roughness, appear in the top 5 most influential variables of both models with an importance higher than 7%. Note, however, that the ranking differs. In particular, the median roughness shows the greatest divergence, with a difference of 11 points between the two models: it plays the most important role in the OCAN's RF (19.3%), while its role is limited in the OCEN's model (8.3%). The difference in importance for the other variables does not exceed 3.2 points. The roof plane area plays a non-negligible role in the OCEN's RF (13.6%), while this is less the case in the OCAN's RF (5.8%). The standard deviation of intensity has a greater influence in the OCAN's RF (7.8%) than in the OCEN's RF (4.6%). The percentage of overlap with non-building pixels, at less than 2%, is the least important parameter for both RFs.
"},{"location":"PROJ-ROOFTOPS/#43-discussion","title":"4.3 Discussion","text":""},{"location":"PROJ-ROOFTOPS/#431-manual-thresholds-vs-rf","title":"4.3.1 Manual thresholds vs RF","text":"Although both the manual threshold and the RF methods give satisfactory results (Tables 3 and 4), classification with RF is better. This result was expected, as RF is a machine learning algorithm based on 14 variables, whereas the threshold method involves manual adjustments of only 4 variables. The choice of a small number of variables for the manual thresholds was made for simplicity sake. Our choices of selecting the MOE of intensity and the median roughness were pertinent as these variables are among the most influential (Tables A1 and A2). The standard deviation of intensity plays a stronger role for the OCEN's model but its significance remains limited (7.8%). Selecting the percentage of overlap with non-building data appears to not be relevant as this variables comes last in the list of relative importance (< 2%) for both RF models. On the other hand, we missed important variables in the manual thresholds such as the minimum roughness and the mean roughness of roof planes playing a significant role (> 10%) in the RF models. The mean roughness was absent of the manual thresholds as it was considered redundant with the median roughness.
Both methods have their advantages. The manual threshold method is easy to set up and does not require GT, while the RF method is automated, i.e. it does not require an operator to perform manual testing, which can be tedious.
"},{"location":"PROJ-ROOFTOPS/#432-classification-of-small-roof-planes","title":"4.3.2 Classification of small roof planes","text":"When using the manual thresholds, small or narrow roof planes are often classified as \"occupied\", because of their median roughness above the threshold. As the roughness was calculated at a scale of 1 m (Section 4.1.1), objects located up to 1 m away from the pixel will affect its value. As a result, the roughness of small or narrow roof planes is more influenced by their surroundings than the larger ones. This is the case, for example, with empty roof planes receding or protruding from other planes. This interpretation is supported by the fact that the minimum roughness of roof planes is a critical parameter in the RF (Tables A1 and A2). For the considered roughness scale of 1 m, the minimum value strongly depends on the dimensions of the roof plane. A large roof plane can have a low minimum value, because the obstacles on it and its surroundings do not affect the roughness values over the whole plane as can be the case for a small one.
Small unobstructed roof planes could be used for the installation of solar panels or vegetation. However, the more receding or protruding they are, the more difficult it is to install facilities on them. In addition, due to the limited benefits they would represent in comparison to the effort necessary to develop them, they are not a priority in the planning strategy of the Canton of Geneva. Therefore, the fact that the algorithm often classifies small roof planes as occupied suited the experts.
"},{"location":"PROJ-ROOFTOPS/#433-differences-between-random-forests","title":"4.3.3 Differences between random forests","text":"The differences in the results obtained for OCAN and OCEN can be explained by their different requirements (Tables 3, A1 and A2).
From the OCAN's point of view, which aims to develop vegetated rooftops, some surfaces already covered with low vegetation can be considered as \"occupied\". Conversely, the presence of some obstacles on a roof plane may not prevent the installation of vegetated rooftops, so the plane can still be considered as \"potentially free\". This tolerance of sparse objects on roof planes could be captured by the median roughness driving the OCAN's RF classification. From the OCEN's point of view, which aims to install solar panels, large continuous areas are required for a roof plane to be considered as \"potentially free\". This is consistent with the fact that the surface area of roof planes and the minimum roughness are critical parameters in the OCEN's RF.
"},{"location":"PROJ-ROOFTOPS/#434-relevance-of-the-methods","title":"4.3.4 Relevance of the methods","text":"The primary goal of classifying roof planes is to provide a product that assists experts in identifying available surfaces for the installation of future equipment. The surfaces classified as \"potentially free\" need to be examined to assess their actual potential. The surfaces classified as \"occupied\" are assumed to be unusable and should not be taken into account when estimating potential. It is therefore important to obtain robust results for this class. The experts did not specify a minimum satisfaction rate, but were satisfied with the provided results. Thus, it is planned to apply the developed method on a larger scale for use by the experts. The generated vector layers can be used alone or combined with the results of other methods, such as the those presented in Sections 5 and 6.
It should be recognized that the classification only evaluates the occupancy of a roof plane. Other factors, such as the roof slope or the roof material, were ignored. In addition, although the LiDAR intensity was normalized, its value can vary from one acquisition campaign to another, potentially affecting the classification results.
"},{"location":"PROJ-ROOFTOPS/#5-lidar-segmentation","title":"5. LiDAR segmentation","text":"The goal of this second method based on LiDAR point cloud is to detect objects on rooftops. It is assumed that each roof plane can be approximated by a flat plane and that obstacles protrude from it. The processing resulted in the production of a vector layer of occupied and free surfaces per building.
"},{"location":"PROJ-ROOFTOPS/#51-method","title":"5.1 Method","text":"The roof plane vectors were merged by EGID to obtain the roof delimitation for each building. Next, the point cloud was clipped according to the roof shape using WhiteboxTools. If the building extended over several LiDAR tiles, the clipped point clouds were merged. Finally, the point clouds were filtered with the minimum altitude of the roof to retain only the roof points.
Roof segmentation was performed per building using Open3D. Each plane in the 3D point cloud was segmented using the RANSAC algorithm. The DBSCAN algorithm was applied to the points of the candidate plane to mitigate noise. The cluster with the largest number of points was retained and considered as a roof plane. This process was repeated on the rest of the point cloud up to the expected number of roof planes, given by the roof vector layer, as long as enough points remained. Finally, the remaining points were clustered using DBSCAN and considered as obstacles. Despite our endeavors to fix the seed and make the process deterministic, slight variations remained in the output of the RANSAC algorithm. However, the observed impact was only a few hundredths on the final metrics.
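A minimal sketch of this iterative extraction with Open3D; `pcd` (the clipped roof point cloud) and `expected_plane_count` (from the roof vector layer) are assumed inputs, and all thresholds are illustrative rather than the optimized values:

```python
import numpy as np
import open3d as o3d

assert isinstance(pcd, o3d.geometry.PointCloud)  # assumed input

remaining = pcd
planes = []
for _ in range(expected_plane_count):        # count given by the roof vector layer
    if len(remaining.points) < 100:          # stop when too few points remain
        break
    # RANSAC: fit one plane to the remaining points
    _, inliers = remaining.segment_plane(
        distance_threshold=0.1, ransac_n=3, num_iterations=1000
    )
    candidate = remaining.select_by_index(inliers)
    # DBSCAN on the candidate plane to drop noisy, disconnected points
    labels = np.asarray(candidate.cluster_dbscan(eps=0.5, min_points=10))
    if (labels >= 0).any():
        main = int(np.bincount(labels[labels >= 0]).argmax())
        planes.append(candidate.select_by_index(np.flatnonzero(labels == main)))
    remaining = remaining.select_by_index(inliers, invert=True)

# Remaining points are clustered with DBSCAN and treated as obstacles
obstacle_labels = np.asarray(remaining.cluster_dbscan(eps=0.5, min_points=10))
```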
The planes and obstacles were transformed from point clusters to concave polygons using the alpha shape algorithm. A minimum threshold was set on the projected area of the planes and a maximum threshold on that of the obstacles. If a polygon did not meet the threshold of its category, its category was changed, as sketched below.
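A sketch of the polygonization and category swap, using the alphashape package as a stand-in for the alpha shape implementation; the point clusters and thresholds are assumed inputs:

```python
import alphashape

def to_polygon(points_2d, alpha=1.0):
    # Concave hull (alpha shape) of a 2D point cluster
    return alphashape.alphashape(points_2d, alpha)

planes = [to_polygon(c) for c in plane_clusters]        # hypothetical inputs
obstacles = [to_polygon(c) for c in obstacle_clusters]

# Category swap on projected area: too-small planes become obstacles,
# too-large obstacles become planes (thresholds illustrative, in m2)
MIN_PLANE_AREA, MAX_OBSTACLE_AREA = 10.0, 5.0
new_planes, new_obstacles = [], []
for poly, kind in [(p, 'plane') for p in planes] + [(o, 'obstacle') for o in obstacles]:
    if kind == 'plane' and poly.area < MIN_PLANE_AREA:
        kind = 'obstacle'
    elif kind == 'obstacle' and poly.area > MAX_OBSTACLE_AREA:
        kind = 'plane'
    (new_planes if kind == 'plane' else new_obstacles).append(poly)
```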
The results were evaluated using the metrics described in Section 3.1. Note that the GT was adapted for the optimization of this method. Indeed, the LiDAR segmentation is unable to detect low objects such as lawns, extensive vegetation and empty terraces and balconies. However, these objects can occupy entire flat roofs, creating a bias in the optimization that would tend to segment entire roof planes as objects. Therefore, the aforementioned objects were excluded from the ground truth when running the optimization. As explained in Section 3.2, the hyperparameters of the RANSAC and DBSCAN algorithms, as well as the area thresholds, were optimized for the training dataset and for subsets based on the building type and the roof type. Combinations of results obtained with different sets of hyperparameters were tested, depending on the types.
The resulting detection shapes were unsatisfactory: they were shaky and sometimes overlapped considerably due to the 3D component of the LiDAR data, as visible in Figure 5 (left). To improve the rendering, the polygons were smoothed by buffering and cropping and by applying the Visvalingam-Wyatt algorithm, as sketched below. Though the aspect of the polygons was improved by these simplifications, they still look shaky (Fig. 5, right). Then, the overlapping detection polygons were merged for each EGID and compared to the roof extent to create a partition of the occupied and free surfaces on roofs. This post-processing was performed after the optimization.
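The smoothing can be sketched with shapely plus the simplification package, one available Visvalingam-Wyatt implementation used here as a stand-in; the parameters are illustrative, and multi-part buffer results would need extra handling:

```python
from shapely.geometry import Polygon
from simplification.cutil import simplify_coords_vw  # Visvalingam-Wyatt

def smooth(poly: Polygon, buffer_m: float = 0.5, vw_tol: float = 0.05) -> Polygon:
    # Buffer out then in: a morphological closing that removes shaky,
    # high-frequency vertices along the outline
    rounded = poly.buffer(buffer_m).buffer(-buffer_m)
    # Visvalingam-Wyatt simplification of the exterior ring
    coords = simplify_coords_vw(list(rounded.exterior.coords), vw_tol)
    return Polygon(coords)
```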
Figure 5: Original (left) and simplified (right) polygons obtained with the LiDAR segmentation.
Finally, the results were submitted to the OCAN and OCEN experts for assessment.
"},{"location":"PROJ-ROOFTOPS/#52-results","title":"5.2 Results","text":"Figure 6: Example of LiDAR segmentation results.Figure 6 shows the results for seven buildings of the GT.
"},{"location":"PROJ-ROOFTOPS/#521-effect-of-the-optimized-hyperparameters","title":"5.2.1 Effect of the optimized hyperparameters","text":"A f1 score of 0.70 and a mIoU of 0.34 were obtained on the adapted GT with the global hyperparameters. The specialized hyperparameters have different influence according to the building and roof type considered (Table 6). Administrative buildings and pitched roofs show lower f1 scores than those of other categories with values around 0.70. The mIoU is lower than 0.5 for all subsets.
Global hyperparameters

| Metric | Administrative | Industrial | Residential | Flat | Mixed | Pitched |
| --- | --- | --- | --- | --- | --- | --- |
| f1 score | 0.41 | 0.70 | 0.72 | 0.73 | 0.71 | 0.59 |
| mIoU | 0.14 | 0.50 | 0.30 | 0.42 | 0.45 | 0.13 |

Specialized hyperparameters

| Metric | Administrative | Industrial | Residential | Flat | Mixed | Pitched |
| --- | --- | --- | --- | --- | --- | --- |
| f1 score | 0.63 | 0.72 | 0.72 | 0.67 | 0.68 | 0.49 |
| mIoU | 0.11 | 0.49 | 0.38 | 0.44 | 0.23 | 0.31 |

Table 6: Metrics obtained with the global and specialized hyperparameters on the subsets for each type of building and roof with the adapted GT.
The f1 score obtained for administrative buildings using specialized hyperparameters is improved by about 50%, while the mIoU is reduced by about 20%. The impact of specialized hyperparameters on the segmentation of industrial and residential buildings is rather limited, with variations of less than 2.5%, except for the mIoU of residential buildings, which increases by about 25%. Flat roofs do not benefit from a specific optimization, with variations in f1 score and mIoU of less than 8%. On the contrary, pitched roofs are strongly affected by the use of specialized hyperparameters, with a decrease in f1 score of about 27% and an increase in mIoU of 150%. In this specific case, the global hyperparameters favor the segmentation of the entire roof as an obstacle, while the specialized hyperparameters improve the distinction between roof planes and obstacles (Fig. 7). The metrics of mixed roofs are a combination of the two previous types, with a f1 score little affected by the use of specialized hyperparameters (< 5%) and a mIoU 50% lower.
Figure 7: Comparison of results obtained with global (left) and specialized (right) hyperparameters on buildings with pitched roofs.
To take advantage of the best segmentation results, combinations were produced (Table 7).
| Metric | Global | Combined with administrative buildings | Combined with pitched roofs |
| --- | --- | --- | --- |
| f1 score | 0.72 | 0.73 | 0.70 |
| mIoU | 0.37 | 0.37 | 0.41 |
Table 7: Metrics obtained for the global results and for their combination with specialized results on the training dataset with the adapted GT.
The influence of combining the global results with those obtained with specialized hyperparameters is limited. The metrics vary by less than 4%, except for the mIoU, which improves by 11% when the global results are combined with the results optimized for pitched roofs. This improvement in object segmentation is sufficient to justify using specialized hyperparameters for the pitched roofs. In addition, this is necessary to ensure sufficiently discriminating results, as visible in Figure 7.
After applying the post-processing procedure to the combined results, the final metrics are a f1 score of 0.77 and a mIoU of 0.42. The metrics are improved by the better coverage of the detected objects thanks to the polygon smoothing and the merging of the detections, as visible in Figure 5.
"},{"location":"PROJ-ROOFTOPS/#522-global-results","title":"5.2.2 Global results","text":"Ground truth Precision Recall f1 score mIoU Relative error (%) adapted GT, training set 0.77 0.77 0.77 0.42 11 whole GT, training set 0.78 0.77 0.78 0.35 38 whole GT, test set 0.75 0.80 0.77 0.38 26
Table 8: Metrics and relative error on the occupied area for the training dataset when using the GT adapted for the LiDAR optimization or the whole GT, as well as for the test set on the whole GT.
The f1 score remains stable when using the whole GT (Table 8), meaning that the extensive vegetation, lawns and terraces that had been removed are detected. However, they are not correctly delineated, resulting in a drop of about 17% in the mIoU and an increase of the relative error on the occupied area by a factor of 3.5. The precision and recall values are always close. The results obtained with the test dataset are consistent with those obtained with the training dataset (Table 8). The observations made in the following sections about the characteristics of the detections in the training dataset are also valid for the test dataset.
"},{"location":"PROJ-ROOFTOPS/#523-detection-characteristics","title":"5.2.3 Detection characteristics","text":"Figure 8: Number of TP, FP and FN as a function of object area.Figure 8 shows that labeled objects with an area greater than 1 m2 are well detected, with f1 scores between 0.82 and 0.92. On the other hand, detection of objects with an area lower than 1 m2 is less trustworthy with a majority of FP detections and almost half of the labels tagged as FN.
Figure 9: Number of TP, FP and FN as a function of the distance of the object centroid from the roof edge.
Figure 9 shows that labeled objects whose centroid is more than 1 m from the roof edge are well detected, with a f1 score between 0.80 and 0.85. On the other hand, among the detections whose centroid is less than 1 m from the roof edge, 65% are FP, making them less trustworthy.
When visualizing the detections, we note that FP covering no obstacle, although they do exist, are rare. Most of the FP form groups of small detections delimiting a roof edge, sometimes detecting barriers that were not vectorized as obstacles in the GT. Therefore, the FP with an area smaller than 1 m2 must often be the same as those with a centroid closer than 1 m to the roof edge.
| Object class | Recall |
| --- | --- |
| Antenna | 0.24 |
| Pipe | 0.59 |
| Lawn | 0.70 |
| Other obstacle | 0.70 |
| Extensive vegetation | 0.72 |
| Window | 0.76 |
| Chimney | 0.79 |
| Aero | 0.83 |
| Solar thermal | 0.83 |
| Intensive vegetation | 0.88 |
| Solar unknown | 0.89 |
| Balcony / terrace | 0.90 |
| Solar photovoltaic | 0.92 |
Table 9: Recall for each object class of the ground truth.
The developed method shows a good performance in detecting most of the object classes (Table 9). Aeration outlets, balconies and terraces, intensive vegetation and solar facilities all have recalls greater than 0.80. Antennas are the most difficult class to detect, with a recall of 0.24. Next come pipes, lawns, other obstacles and extensive vegetation, with recall values between 0.59 and 0.72. Windows and chimneys, which are low and thin objects respectively, are detected satisfactorily with recalls of 0.76 and 0.79 respectively.
Although the detection of objects is globally satisfactory, the reproduction of their shape was not assessed. For example, only the upper parts of solar panels are generally detected as shown in Figure 10.
Figure 10: Example of results for the segmentation of solar panels using the segmentation of LiDAR data.
The same goes for lawns and extensive vegetation, which are only ever partially detected (Fig. 11).
Figure 11: Roof with a terrace and an area of extensive vegetation (left). Both are detected as TP, but are not covered by the area detected as occupied (right)."},{"location":"PROJ-ROOFTOPS/#524-estimated-area","title":"5.2.4 Estimated area","text":"| | Administrative | Industrial | Residential | Flat | Mixed | Pitched |
| --- | --- | --- | --- | --- | --- | --- |
| Area labeled as occupied | 4,986 | 32,720 | 19,399 | 54,875 | 1,386 | 844 |
| Area detected as occupied | 1,195 | 20,953 | 12,986 | 30,980 | 3,052 | 1,102 |
| Total area | 6,692 | 78,011 | 33,278 | 108,415 | 5,018 | 4,584 |
Table 10: Occupied area for the labels and the detections, as well as the total roof area in m2.
In total, 35,134 m2 of roofs were detected as occupied while 57,105 m2 were labeled as such (Table 10), which represents an error of 38%. Administrative buildings show the largest error in the estimation of the occupied area, with 76%, compared to less than 37% for the other building types. The occupied area is underestimated for flat roofs, while it is overestimated for pitched and mixed roofs. Mixed roofs have an error of 120%, i.e. the estimated occupied area is about twice the actual value. Flat and pitched roofs have errors of 44% and 30% respectively.
"},{"location":"PROJ-ROOFTOPS/#525-expert-assessment","title":"5.2.5 Expert assessment","text":"The experts were at least partially satisfied by more than 69% of the segmented roofs (Table 11).
| Evaluation | OCAN | OCEN |
| --- | --- | --- |
| Not satisfied | 22% | 31% |
| Partially satisfied | 54% | 33% |
| Satisfied | 24% | 36% |
Table 11: Experts' satisfaction with the results produced using the segmentation of LiDAR data. The OCAN's expert assessed 122 buildings, while the OCEN's expert assessed 39 buildings.
The most satisfactory types were the administrative buildings and the flat roofs, while the least satisfactory were the industrial buildings and the mixed roofs. This is in contradiction with the metrics, for which the administrative buildings have the lowest mIoU.
"},{"location":"PROJ-ROOFTOPS/#53-discussion","title":"5.3 Discussion","text":""},{"location":"PROJ-ROOFTOPS/#531-global-capability-of-the-method","title":"5.3.1 Global capability of the method","text":"The method proved its ability to detect objects with a f1 score of 0.78 on the whole GT. The primary goal, which was to detect in priority large and roof-centered objects, was satisfactorily achieved. Indeed, objects larger than 1 m2 have a f1 score higher than 0.81 and objects with their centroid at more than 1 m from the roof edge have a f1 score higher than 0.79.
However, the mIoU remains lower than 0.50, indicating that the shapes of the detections are poorly reproduced. It should be noted that this metric is sensitive to small variations in shape, making it a very strict metric. In addition, it should be remembered that the mIoU evaluates the delimitation of the occupied surface at the roof scale, including TP, FP and FN detections.
In addition to the low mIoU, the global estimation of the occupied area is mediocre, with an error of 38%. However, this value drops to 11% when the lawns, the empty balconies and most of the extensive vegetation are removed from the ground truth. This highlights that the method performs generally well, except for the segmentation of low objects. Indeed, although the aforementioned classes have a recall of 0.70 or higher, their total area is largely underestimated, as visible in Figures 10 and 11. Once those classes are removed from the GT, most of the false detections and missed objects have a small area (Fig. 8).
"},{"location":"PROJ-ROOFTOPS/#532-problematic-detections-and-labels","title":"5.3.2 Problematic detections and labels","text":""},{"location":"PROJ-ROOFTOPS/#5321-false-positive-detections","title":"5.3.2.1 False positive detections","text":"The majority of FP detections cover small areas (Fig. 8) and are located near the roof edges (Fig. 9). In many cases, FP detections near to the edge actually detect barriers that are not labeled in the GT. It may be considered that the protruding roof edges should be included in the annotation to improve the precision and better reflect the actual performance of the method.
"},{"location":"PROJ-ROOFTOPS/#5322-by-object-type","title":"5.3.2.2 By object type","text":"Antennas are often missed (Table 9). We think that LiDAR points due to the presence of an antenna were considered as noise during point clustering, because the object is represented by only a few points due to its morphology and the LiDAR density. Improving antenna detection in the LiDAR point cloud may require specific developments or the use of a denser point cloud. Other thin objects, such as small chimneys, are also missed.
As shown in Section 5.2.3, low objects such as windows, extensive vegetation, lawns and pipes are more difficult to detect, as they do not protrude above the roof planes. Their recall, between 0.59 and 0.76, is acceptable, but the shape of these objects is only ever partially detected.
Finally, the method also has trouble detecting objects of the \"other obstacle\" class (Table 9). However, the lack of a precise definition for this category makes it difficult to label, as it encompasses several types of objects. In particular, it would be necessary to define with measurable values when a roof part is labeled as a free surface or as an \"other obstacle\".
"},{"location":"PROJ-ROOFTOPS/#5323-by-building-type-and-roof-type","title":"5.3.2.3 By building type and roof type","text":"We notice that administrative buildings tend to have low or small objects, such as windows, extensive vegetation, or small chimneys. In addition, there are many FP detections because of the roof edges. The use of specialized hyperparameters does not improve the metrics on the adapted GT used for the optimization (Tables 6 and 7), supporting the difficulty to detect these objects. At the cantonal level, the error on the estimated surface should have a limited impact, since the administrative buildings represent only a small fraction of all the buildings. However, as they belong to the state, they could be prioritized for the installation of facilities on their roof.
Pitched roofs are often segmented into a single obstacle when using the global hyperparameters (Fig. 7). This could be explained by the fact that they have a different typology than the other roofs. In addition, the training dataset is dominated by flat roofs, with 72 buildings against 29 buildings with pitched roofs. The hyperparameters resulting from the optimization are therefore better suited to the typology of flat roofs, motivating our choice to use specialized hyperparameters for pitched roofs. Pitched roofs can be automatically identified in the Canton of Geneva using the slope available in the roof and building vector layers. If this information is not available, for instance in another city or canton, areas of interest can be defined, as pitched roofs are generally located in residential areas and old towns.
Most of the 21 roofs assigned to the \"mixed\" type have the entirety of their flat or pitched planes segmented as obstacles. Some of them would have benefited from being segmented with the parameters for pitched roofs. The definition of a pitched roof has to be studied further to determine more precisely when to use the specialized hyperparameters.
"},{"location":"PROJ-ROOFTOPS/#533-limitation-and-further-developments","title":"5.3.3 Limitation and further developments","text":"The process relies on a roof vector layer. Methods exist to produce this information automatically1013. Their application should be tested in order to extend the project to areas where a roof vector layer does not yet exist. The one for the Canton of Geneva is produced manually to guarantee its quality.
Variations in detection quality were observed from one building to another. In addition, the detection shapes are not intuitive, making them difficult to interpret and less pleasing to the eye. Therefore, despite the method's respectable results, the experts were not interested in taking the algorithm to the production stage.
The visual aspect of the results could be improved by modifying the vectorization function to smooth the polygons directly during their production. Alternatively, more advanced processing could take advantage of the fact that obstacles have simple geometries, like a cylinder for straight pipes or a parallelepiped for aeration outlets. By trying to match these shapes to the clustered point cloud, more precise and visually pleasing detections could be produced.
"},{"location":"PROJ-ROOFTOPS/#6-image-segmentation","title":"6. Image segmentation","text":"The third method consists of segmenting all the potential objects present in a given image. The processing resulted in the production of a vector layer of occupied surfaces per building.
"},{"location":"PROJ-ROOFTOPS/#61-method","title":"6.1 Method","text":"Figure 12: Illustration of the different steps in the image segmentation workflow for EGID 1005027. Black polygons correspond to the roof delimitation. (a) Bounding box (blue polygon) used to clip the true orthophotos for a given roof with a 1 m positive buffer. (b) Segmentation masks (colored pixels) obtained by processing the tile with SAM. (c) Vector masks (red polygons) of the detected objects after post-processing. (d) Detection tags assigned to the vectors."},{"location":"PROJ-ROOFTOPS/#611-image-preparation","title":"6.1.1 Image preparation","text":"Similar to the LiDAR segmentation workflow, we adopted a per-roof processing strategy to process the true orthophotos. For each selected building, the roof delimitation was used to derive a bounding box from which the true orthophoto was clipped (Fig. 12(a)). In case the roof was spread over several true orthophotos, the images were first merged. One tile was obtained for each roof considered. The tiles have the same pixel resolution as the true orthophotos, but different sizes according to the roof size.
"},{"location":"PROJ-ROOFTOPS/#612-object-segmentation-and-vectorization","title":"6.1.2 Object segmentation and vectorization","text":"First, potential objects visible in images were segmented using Segment Anything Model14 (SAM) implemented with PyTorch. It aims to be an open-source foundation model for object segmentation in images with strong zero-shot generalization capabilities. Instance segmentation is performed using a vision transformer (ViT-H) model and a mask is produced for each detected object (Fig. 12(b)). For the project, the default pre-trained model (checkpoints: sam_vit_h_4b8939
) was used without any fine-tuning specific to roof objects. Although the object classes are available in the GT dataset, no classification was performed. Second, SAM does not handle georeferenced datasets. To simplify the process of leveraging SAM for geospatial data analysis, we used the Python library segment-geospatial15 (samgeo). Georeferenced tiles are used as input to the algorithm. The coordinate reference system of the image is assigned to the SAM masks and their corresponding polygon vectors (Fig. 12(c)).
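A hedged sketch of this step with samgeo; method and argument names may differ between samgeo versions, and the file names are hypothetical:

```python
from samgeo import SamGeo

sam = SamGeo(
    model_type='vit_h',                     # ViT-H backbone
    checkpoint='sam_vit_h_4b8939.pth',      # default pre-trained weights
    sam_kwargs={'points_per_side': 64},     # see Section 6.1.4
)
sam.generate('roof_tile.tif', output='masks.tif')  # georeferenced masks
sam.tiff_to_vector('masks.tif', 'objects.gpkg')    # polygons in the tile CRS
```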
Some large buildings, up to 300 m in length, may be encountered. In this case, the number of pixels in the tile can saturate the RAM during image segmentation. To handle this issue, large tiles are split into smaller sub-tiles of 512 px. Boundary effects are the downside of this method: the sub-tiles are processed individually by SAM and the output masks are then merged to recover the original tile extent, but the joints between the sub-tile masks may not match, causing artifacts in the vector layer (Fig. 13).
Figure 13: The squared orange polygon is an artifact due to the tiling performed to process large tiles (EGID 1011376). The tagged detections are superimposed on (left) the segmented masks (white: detection, black: background) and (right) the true orthophoto. Grey and black polygons correspond to the building delineation."},{"location":"PROJ-ROOFTOPS/#613-result-filtering","title":"6.1.3 Result filtering","text":"To improve the quality of the results, post-processing tasks were performed. Polygons were discarded based on geometric considerations:
The vector layer of detected objects was clipped with the roof delimitation polygon to ensure that objects did not overlap several roofs (Fig. 12(c)).
Each building was processed independently. The vector layers were finally merged into a single layer.
"},{"location":"PROJ-ROOFTOPS/#614-assessment-and-hyperparameter-optimization","title":"6.1.4 Assessment and hyperparameter optimization","text":"The detections were compared to the GT labels (Fig. 12(d)) and metrics were calculated (Section 3.1) to evaluate the performance of the algorithm and the choice of post-processing parameters.
SAM exposes numerous hyperparameters, whose values were assigned after running the optimization workflow presented in Section 3.2. Depending on some hyperparameter values and on the image size, processing a single image can take several minutes. Since the optimization requires tens of iterations, it was unreasonable to run the process on the entire training dataset. Therefore, we chose to sub-sample the training dataset down to 25 roofs (Table 1), selected to be representative of the entire dataset. Running the optimization process on this subset for 50 iterations took between 1 and 2 days on a 16 GiB GPU machine. Based on several replications of the optimization process, including one performed with 100 trials, four of the most influential hyperparameters14 were identified: (1) the threshold on the stability score of the predicted mask, (2) the stability score offset, (3) the box IoU cutoff used by non-maximum suppression to filter duplicated masks and (4) the prediction threshold. The other SAM hyperparameters have a limited impact over the value ranges explored. However, we noticed that the number of points sampled per side strongly influences the processing duration (Table 12). 64 points per side is a good trade-off between performance and computation time, which guided our final choice of this value.
| Points per side | f1 score | mIoU | Duration (min) |
| --- | --- | --- | --- |
| 128 | 0.75 | 0.40 | 43 |
| 96 | 0.75 | 0.44 | 25 |
| 64 | 0.74 | 0.41 | 12 |
| 32 | 0.66 | 0.40 | 4 |
Table 12: Influence of the number of points sampled per image side on the f1 score, mIoU and duration of segmentation using SAM on a 16 GiB GPU machine. Results obtained on the training sub-sampled dataset with the optimized hyperparameter values set in the configuration file and only varying the points per side.
The selected hyperparameter values can be found in the configuration file of the image segmentation workflow.
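As an illustration of the optimization workflow of Section 3.2, the sketch below uses Optuna's TPE sampler over the four influential SAM hyperparameters. The parameter names match SAM's automatic mask generator, but the ranges are illustrative, and the objective function is a placeholder standing in for segmenting the 25-roof subset and scoring it against the GT.

```python
# Hedged sketch of the hyperparameter optimization with Optuna's TPE sampler.
import optuna

def run_segmentation_and_assess(params: dict) -> float:
    # Placeholder for the real objective: segment the 25-roof training subset
    # with SamAutomaticMaskGenerator(**params) and return the f1 score.
    # A dummy value is returned here so the sketch runs end to end.
    return 0.0

def objective(trial: optuna.Trial) -> float:
    params = {
        "stability_score_thresh": trial.suggest_float("stability_score_thresh", 0.5, 0.99),
        "stability_score_offset": trial.suggest_float("stability_score_offset", 0.1, 2.0),
        "box_nms_thresh": trial.suggest_float("box_nms_thresh", 0.1, 0.9),
        "pred_iou_thresh": trial.suggest_float("pred_iou_thresh", 0.5, 0.99),
    }
    return run_segmentation_and_assess(params)

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=50)
print(study.best_params)
```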
"},{"location":"PROJ-ROOFTOPS/#62-results","title":"6.2 Results","text":""},{"location":"PROJ-ROOFTOPS/#621-global","title":"6.2.1 Global","text":"Figure 14: Example of a result obtained with the image segmentation workflow. Free surfaces were obtained by subtracting detected objects from the roof boundary (black polygons).The image segmentation method produced vectors of detected objects for each roof considered (Fig. 14). The metrics obtained for the different datasets are presented in Table 13. They were obtained for a set of hyperparameters that balanced the precision and the recall as much as possible. Alternative result, obtained with a set of hyperparameters promoting the recall over the precision was also produced and evaluated but was not preferred by the experts, in particular due to the presence of large FP detections segmenting whole roofs. Overall, similar metric values are obtained for the different datasets, demonstrating the consistency of the method.
| Dataset | Precision | Recall | f1 score | mIoU | Relative error (%) |
| --- | --- | --- | --- | --- | --- |
| Training subset | 0.73 | 0.78 | 0.75 | 0.41 | 7 |
| Training | 0.75 | 0.82 | 0.78 | 0.37 | 42 |
| Test | 0.75 | 0.71 | 0.73 | 0.37 | 23 |
Table 13: Metrics and relative errors on the occupied area for the training and the test datasets.
Satisfactory f1 scores, between 0.73 and 0.78, are achieved. We note a slight imbalance, from 4 to 7 points, between the precision and the recall depending on the dataset. The values of the mIoU are modest, ranging between 0.37 and 0.41, with a standard deviation of about 0.20 (noted +/- 0.20 hereafter). High mIoU values (>= 0.70) are associated with high f1 scores (0.87 +/- 0.11 on average), but the opposite is not true (Fig. 15). When an object is detected, the method usually shows a good ability to segment it accurately. However, small discrepancies with the GT shapes lower the IoU value significantly.
Figure 15: Examples of detections with high f1 scores but variable IoU. (left) Roof segmentation (EGID 295060134) with both a high f1 score and a high IoU and (right) roof segmentation (EGID 1023590) with a high f1 score and an average IoU. Finally, the surface area occupied by the detected objects in the training and test datasets, which represents about 30% of the total area, is significantly underestimated compared with the 50% of the GT (Table 1). This results in relative errors of 23% and 42% for the test and training datasets, respectively. Note the significant variation in relative errors depending on the dataset considered, in particular the variability between the training subset and the full training dataset. This highlights that errors concerning large objects can drastically increase the relative error on area estimations.
"},{"location":"PROJ-ROOFTOPS/#622-roof-characteristics","title":"6.2.2 Roof characteristics","text":"Metric Administrative Industrial Residential Flat Mixed Pitched f1 score 0.81 0.80 0.77 0.79 0.75 0.74 mIoU 0.23 0.33 0.38 0.30 0.39 0.51
Table 14: Metrics calculated by building and roof type for the training dataset.
Table 14 shows that the model detects objects similarly for the different building and roof types, with f1 scores within 5 points of each other for all types. The mIoU depends on the roof characteristics. Industrial and residential buildings have a mIoU of about 0.35, while the value for the administrative buildings is about 35% lower. The mIoU increases from flat, to mixed, to pitched roofs over a range of about 20 points.
Figure 16: Comparison of the occupied and free surface areas of the GT labels and the detections according to (top) the building types and (bottom) the roof types for the training dataset. The detected occupied areas are underestimated regardless of the building type (Fig. 16, top). Industrial and residential buildings have a relative error of about 40%, while the administrative buildings have a higher error of 67%. The occupied areas of pitched and mixed roofs are accurately estimated with a relative error of about 10%, while the performance for flat roofs is worse, with an error of 43% (Fig. 16, bottom). Remember that all administrative and industrial buildings have flat roofs.
Considering the similar f1 scores obtained for all the roof properties and the significant amount of time required to run the optimization workflow, no specific optimization was carried out to date.
"},{"location":"PROJ-ROOFTOPS/#623-object-characteristics","title":"6.2.3 Object characteristics","text":""},{"location":"PROJ-ROOFTOPS/#6231-class","title":"6.2.3.1 Class","text":"The image segmentation method detects objects of different classes (Fig. 17) with an average recall of 0.80 +/- 0.11, with the exception of pipes, which performs significantly worse with a recall of 0.27.
Figure 17: Recall for each object class. The results are obtained for the training dataset.Lawns, PV panels and windows are particularly well detected with recall values above 0.93.
"},{"location":"PROJ-ROOFTOPS/#6232-surface-area","title":"6.2.3.2 Surface area","text":"Figure 18 shows that objects with a surface area between 0.5 m2 and 100 m2 are detected with equal performances by the algorithm with a recall of 0.84 +/- 0.02. Smaller and larger objects are more difficult to detect, with 65% and 76% of GT objects detected, respectively.
Figure 18: Number of TP and FN labels, as well as FP detections, depending on the object area (m2). The results are obtained on the training dataset.The proportion of FP detections increases for surface areas of less than 1 m2 leading to an average precision of 0.60 +/- 0.06, while the average precision for larger objects is 0.83 +/- 0.05.
"},{"location":"PROJ-ROOFTOPS/#6233-position-on-the-roof","title":"6.2.3.3 Position on the roof","text":"Objects are well detected, with an average recall of 0.83 +/- 0.04, as long as their centroid is more than 1 m from the roof edge (Fig. 19). For objects closer to the roof edge, the recall is only 0.56.
Figure 19: Number of TP and FN labels, as well as FP detections, depending on the distance of the object centroid to the roof edge (m). The results are obtained for the training dataset. The precision also decreases significantly for objects located near the roof edge, from an average of 0.77 +/- 0.03 for object centroids more than 1 m away to 0.51 below that distance, due to an increase in FP detections.
"},{"location":"PROJ-ROOFTOPS/#624-expert-assessment","title":"6.2.4 Expert assessment","text":"The experts are at least partially satisfied by over 86% with the image segmentation method (Table 15).
| Evaluation | OCAN | OCEN |
| --- | --- | --- |
| Not satisfied | 6% | 14% |
| Partially satisfied | 40% | 49% |
| Satisfied | 54% | 37% |
Table 15: Experts' satisfaction with the results produced using image segmentation. OCAN's expert assessed 122 buildings, while OCEN's expert assessed 39.
Satisfaction is independent of the building type and the roof type, which is consistent with the f1 score (Table 13). Slightly lower satisfaction is attributed to the administrative buildings and the flat roofs, which is consistent with the fact that these types have the lowest mIoU.
The experts are generally satisfied with the shapes of the detection polygons and the consistency of the results from one building to another.
"},{"location":"PROJ-ROOFTOPS/#63-discussion","title":"6.3 Discussion","text":""},{"location":"PROJ-ROOFTOPS/#631-limits-to-object-segmentation","title":"6.3.1 Limits to object segmentation","text":"Although the workflow provides overall satisfactory results, there are inherent limitations when using the SAM algorithm to detect roof objects:
SAM is a pre-trained model showing good zero-shot generalization performance, but it is not dedicated to detecting objects on roofs. Fine-tuning the model could be considered to improve the performance. It would require additional training with the dataset at our disposal (true orthophotos plus GT annotations).
"},{"location":"PROJ-ROOFTOPS/#632-reproduction-of-object-shape","title":"6.3.2 Reproduction of object shape","text":"We acknowledge that the mIoU has low values on the training and test datasets (Table 13), but note that this metric is strict. It is sensitive to the detection or not of an object as it was computed on all the polygons present on the roof, including TP, FP and FN. Thus, if a large object is not detected or if there is a large FP detection, the mIoU value will be strongly affected. In addition, the IoU metric is also sensitive to discrepancies in polygon shapes with the GT (Fig. 16(b)). While the object delineation may appear satisfactory from visual inspection, the metric may display low value. This aspect is difficult to improve as it dependents on the segmentation model and the GT delineation strategy. Overall, the shape of the object is usually satisfactorily reproduced when detected (Figs. 16 and 21).
The method tends to underestimate the occupied surface area (Fig. 17). Thus, the estimated free surface constitutes an upper limit to assess the potential. The small relative error of 10% obtained on the occupied area for the mixed and pitched roofs can be explained by the fact that they generally correspond to villas with limited roof surfaces and a \"simple\" arrangement of small objects. In comparison, industrial roofs can be large with complex arrangements of objects such as pipes, solar panels or ventilation systems. Therefore, detection errors on villas usually have less impact on the area estimation than detection errors on large industrial buildings.
"},{"location":"PROJ-ROOFTOPS/#633-relevance-of-the-method","title":"6.3.3 Relevance of the method","text":"The results provide strong arguments in favor of the ability of the image segmentation method to correctly detect and segment objects. The fact that the metrics are consistent between the different datasets (Table 13) is encouraging for its applicability to a wider area with a variety of buildings.
The performance of the method is lower for small objects (Fig. 19) and objects close to the roof edge (Fig. 20). However, the accurate detection of these objects is less critical as they interfere less with the continuity of the roof for the potential installation of solar panels and vegetated roofs.
The experts were satisfied with the results and interested in putting the method into production. However, the current processing time, about 12 min for 25 buildings, is a hindrance to extending the method to the whole canton of Geneva, which gathers about 80,000 buildings. Parallelizing the algorithm should be considered before applying the method to a large area of interest.
Finally, true orthophotos were used in this case. They provide the actual position of an object on a roof. However, such a product is rare because it is more expensive to produce; it may therefore not be available or regularly updated. We are nevertheless confident that this segmentation method can be applied to standard orthophotos, which are acquired more regularly. In this case, methods for reprojecting the position of roofs and/or vectors will need to be explored.
"},{"location":"PROJ-ROOFTOPS/#7-combination-of-results","title":"7. Combination of results","text":"The developed methods display different strengths and weaknesses. For instance, LiDAR segmentation has difficulty detecting low and thin objects, which image segmentation does not. Conversely, image segmentation has difficulty with color change segmentation and pipe detection, which LiDAR segmentation does not. Therefore, combining the two results could yield interesting outcomes.
"},{"location":"PROJ-ROOFTOPS/#71-method","title":"7.1 Method","text":"Two combinations of results were tested:
The resulting combined vector layers were then assessed with metrics but not by the experts.
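A minimal sketch of the two combinations, assuming geopandas; file names are hypothetical and the actual STDL implementation may differ.

```python
# Sketch of the two tested combinations, assuming geopandas.
import geopandas as gpd
import pandas as pd

img = gpd.read_file("image_segmentation.gpkg")    # SAM-based detections
lidar = gpd.read_file("lidar_segmentation.gpkg")  # LiDAR-based detections

# (1) Concatenation: keep every detection from both methods (favors recall).
concatenated = gpd.GeoDataFrame(pd.concat([img, lidar], ignore_index=True), crs=img.crs)

# (2) Spatial join: keep only detections intersecting a detection of the other
#     method (favors precision; discards single-method FPs, but also
#     non-overlapping TPs).
joined = gpd.sjoin(img, lidar[["geometry"]], predicate="intersects", how="inner")
```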
"},{"location":"PROJ-ROOFTOPS/#72-results","title":"7.2 Results","text":"Combination method Precision Recall f1 score mIoU Relative error (%) Concatenation 0.68 0.94 0.79 0.45 8 Spatial join 0.81 0.69 0.75 0.33 48
Table 16: Metrics obtained for the training dataset.
Comparing Tables 8 and 13 with Table 16, we note that the combination results in similar f1 scores, around 0.77. However, precision and recall are affected differently. The recall increases by more than 10 points with the concatenation method, reaching the excellent value of 0.94. This means that most of the GT objects are detected, including pipes, whose recall reaches the satisfactory value of 0.68. On the other hand, the proportion of FPs increases, diminishing the precision by about 8 points. The spatial join discards all the single-method FP detections, improving the precision by 3 to 6 points. Non-overlapping TPs are discarded as well, reducing the recall by more than 8 points.
The concatenation improves the mIoU by more than 10 points compared to the spatial join, and yields higher values than the individual segmentation methods. Finally, the relative error on the occupied surface is significantly reduced, to less than 10%, by concatenating the results, while it increases to about 50% with the spatial join method.
"},{"location":"PROJ-ROOFTOPS/#73-discussion","title":"7.3 Discussion","text":"Combining the results does not improve the f1 score, but allows for modulation of the results, i.e. whether favor precision or recall, depending on the needs (Table 16).
The high recall value obtained with concatenation proves the complementarity of the two methods for detecting different objects. Note that in this case, the final vector layer contains polygons with different aspects. A higher recall value tends to favor the mIoU, since more GT objects are detected, despite the addition of FPs. The estimation of the detected object surface is thus improved, but the added FPs also contribute to the reduction of the relative error on the occupied surface, which must therefore be analyzed carefully.
Note that the results of the object segmentation can also be combined with occupancy classification to refine the information on the \"potentially free\" roof planes. Finally, although incomplete, the roof and roof superstructure vector layers produced by the State of Geneva contain vectors of some roof objects that can be used additionally to improve the accuracy of the results.
"},{"location":"PROJ-ROOFTOPS/#8-conclusion","title":"8. Conclusion","text":"Detecting objects on rooftops is a key aspect of assessing the potential for installing facilities in cities, such as solar panels and vegetated rooftops. The STDL explored three methods to achieve this objective, based on machine learning and deep learning algorithms and on LiDAR, aerial imagery and vector data. All methods provided satisfactory results. Occupancy classification enabled roof planes to be classified with 85% accuracy. The two segmentation methods reached similar results, with a f1-score of about 0.77, a mIoU of about 0.36 and a relative error on the detected occupied area of 40%. In particular, segmentation methods have made it possible to accurately detect large objects and objects centered on the roof, which are most likely to constitute obstacles to the installation of facilities.
Overall, the beneficiaries were satisfied with all the methods, with at least 70% of buildings having satisfactory detections. Despite similar performance to image segmentation, LiDAR segmentation was considered the least satisfactory due to the appearance of the detection shapes and the varying results between buildings and object classes. Image segmentation gives satisfactory results overall, but at the current stage, the processing time is unrealistic to consider scaling up the method at the cantonal level. Further developments are required to reduce the computational cost. Finally, the classification method reconciles both accurate results and fast processing time. Therefore, it was selected for an application at the cantonal level. A vector layer indicating the presumed occupancy of roof planes will be produced helping the beneficiaries to find and assess areas potentially available for new installations.
Combining the results is an asset to enhance the strengths of the different methods. Combining segmentation results increases either precision or recall, depending on the chosen method, without changing the f1 score. A better recall translates into an enhanced delineation of the occupied area on a roof. Cross-referencing information sources, such as occupancy classification and published vector layers, can improve the accuracy of the results and help identify areas of interest.
It should be noted that the results are in line with the STDL's objective to automatically detect occupied and free surfaces on roofs. These results from numerical models are indications that need to be verified by an expert as part of an installation project. Our results do not indicate whether a facility can actually be installed. Additional parameters such as roof material, slope, solar potential, protected buildings, etc., which affect the possibility and prioritization of an installation, are not taken into account and are the responsibility of the beneficiaries.
"},{"location":"PROJ-ROOFTOPS/#code-availability","title":"Code availability","text":"The codes are available on the STDL's GitHub repository: proj-rooftops
"},{"location":"PROJ-ROOFTOPS/#acknowledgements","title":"Acknowledgements","text":"This project was made possible thanks to a tight collaboration between the STDL team and beneficiaries from the offices of the Etat de Gen\u00e8ve. In particular, the STDL team acknowledges the key contributions from Basile Grandjean (OCEN), Benjamin Guinaudeau (OCAN), Alisa Freyre (PanData), Mayeul Gaillet (DIT) and Geraldine Chollet (OCAN). We thank PanData for the production of the ground truth. This project has been funded by Strategie Suisse pour la G\u00e9oinformation.
"},{"location":"PROJ-ROOFTOPS/#appendix","title":"Appendix","text":""},{"location":"PROJ-ROOFTOPS/#a-variable-importance-in-the-random-forests","title":"A. Variable importance in the random forests","text":"Variable Importance for OCAN median roughness 19.3 margin of error of intensity 16.8 mean roughness 15.7 minimum roughness 10.6 standard deviation of intensity 7.8 area 5.8 mean intensity 4.5 median intensity 3.9 minimum altitude 3.5 standard deviation of roughness 3.3 maximum intensity 3.0 maximum roughness 2.5 minimum intensity 2.4 % of overlap with non-building data 1.1Table A1: List of the variables considered in the random forest and their importance in the classification for the OCAN.
| Variable | Importance for OCEN |
| --- | --- |
| margin of error of intensity | 17.6 |
| minimum roughness | 17.4 |
| area | 13.5 |
| mean roughness | 10.7 |
| median roughness | 8.3 |
| standard deviation of roughness | 5.5 |
| standard deviation of intensity | 4.6 |
| maximum roughness | 4.2 |
| maximum intensity | 4.1 |
| minimum altitude | 3.6 |
| mean intensity | 3.1 |
| minimum intensity | 3.0 |
| median intensity | 2.4 |
| % of overlap with non-building data | 2 |

Table A2: List of the variables considered in the random forest and their importance in the classification for the OCEN.
"},{"location":"PROJ-ROOFTOPS/#references","title":"References","text":"Qing Zhong, Jake R. Nelson, Daoqin Tong, and Tony H. Grubesic. A spatial optimization approach to increase the accuracy of rooftop solar energy assessments. Applied Energy, 316:119128, June 2022. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261922005062 (visited on 2022-05-27), doi:10.1016/j.apenergy.2022.119128.\u00a0\u21a9\u21a9
Junjing Yang, Devi Llamathy Mohan Kumar, Andri Pyrgou, Adrian Chong, Mat Santamouris, Denia Kolokotsa, and Siew Eang Lee. Green and cool roofs\u2019 urban heat island mitigation potential in tropical climate. Solar Energy, 173:597\u2013609, October 2018. URL: https://linkinghub.elsevier.com/retrieve/pii/S0038092X18307667 (visited on 2024-03-21), doi:10.1016/j.solener.2018.08.006.\u00a0\u21a9
Dan Stowell, Jack Kelly, Damien Tanner, Jamie Taylor, Ethan Jones, James Geddes, and Ed Chalstrey. A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK. Scientific Data, 7(1):394, November 2020. URL: https://www.nature.com/articles/s41597-020-00739-0 (visited on 2024-03-21), doi:10.1038/s41597-020-00739-0.\u00a0\u21a9
Nima Narjabadifam, Mohammed Al-Saffar, Yongquan Zhang, Joseph Nofech, Asdrubal Cheng Cen, Hadia Awad, Michael Versteege, and Mustafa G\u00fcl. Framework for Mapping and Optimizing the Solar Rooftop Potential of Buildings in Urban Systems. Energies, 15(5):1738, February 2022. URL: https://www.mdpi.com/1996-1073/15/5/1738 (visited on 2024-03-21), doi:10.3390/en15051738.\u00a0\u21a9
Youssef El Merabet, Cyril Meurie, Yassine Ruichek, Abderrahmane Sbihi, and Raja Touahni. Building Roof Segmentation from Aerial Images Using a Line and Region-Based Watershed Segmentation Technique. Sensors, 15(2):3172\u20133203, February 2015. URL: http://www.mdpi.com/1424-8220/15/2/3172 (visited on 2023-03-28), doi:10.3390/s150203172.\u00a0\u21a9\u21a9
Jordan M. Malof, Kyle Bradbury, Leslie M. Collins, and Richard G. Newell. Automatic detection of solar photovoltaic arrays in high resolution aerial imagery. Applied Energy, 183:229\u2013240, December 2016. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261916313009 (visited on 2024-03-21), doi:10.1016/j.apenergy.2016.08.191.\u00a0\u21a9
Sebastian Krapf, Lukas Bogenrieder, Fabian Netzler, Georg Balke, and Markus Lienkamp. RID\u2014Roof Information Dataset for Computer Vision-Based Photovoltaic Potential Assessment. Remote Sensing, 14(10):2299, May 2022. URL: https://www.mdpi.com/2072-4292/14/10/2299 (visited on 2022-05-27), doi:10.3390/rs14102299.\u00a0\u21a9\u21a9
Roberto Castello, Simon Roquette, Martin Esguerra, Adrian Guerra, and Jean-Louis Scartezzini. Deep learning in the built environment: automatic detection of rooftop solar panels using Convolutional Neural Networks. Journal of Physics: Conference Series, 1343(1):012034, November 2019. URL: https://iopscience.iop.org/article/10.1088/1742-6596/1343/1/012034 (visited on 2024-03-21), doi:10.1088/1742-6596/1343/1/012034.\u00a0\u21a9
Alexander Apostolov, August Baum, Ghali Chraibi, and Roberto Castello. Automatic detection of available area for rooftop solar panel installation. Technical Report, EPFL, December 2020. URL: https://www.epfl.ch/labs/mlo/wp-content/uploads/2021/05/crpmlcourse-paper859.pdf.\u00a0\u21a9
Fayez Tarsha Kurdi, Mohammad Awrangjeb, and Alan Wee-Chung Liew. Automated Building Footprint and 3D Building Model Generation from Lidar Point Cloud Data. In 2019 Digital Image Computing: Techniques and Applications (DICTA), 1\u20138. Perth, Australia, December 2019. IEEE. URL: https://ieeexplore.ieee.org/document/8946008/ (visited on 2024-03-21), doi:10.1109/DICTA47822.2019.8946008.\u00a0\u21a9\u21a9
Mohammad Aslani and Stefan Seipel. Automatic identification of utilizable rooftop areas in digital surface models for photovoltaics potential assessment. Applied Energy, 306:118033, January 2022. URL: https://www.sciencedirect.com/science/article/pii/S0306261921013283 (visited on 2023-03-24), doi:10.1016/j.apenergy.2021.118033.\u00a0\u21a9
Shuhei Watanabe. Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance. Technical Report, University of Freiburg, May 2023. arXiv:2304.11127 [cs]. URL: http://arxiv.org/abs/2304.11127 (visited on 2024-04-29), doi:10.48550/arXiv.2304.11127.\u00a0\u21a9
Zhen Qian, Min Chen, Teng Zhong, Fan Zhang, Rui Zhu, Zhixin Zhang, Kai Zhang, Zhuo Sun, and Guonian L\u00fc. Deep Roof Refiner: A detail-oriented deep learning network for refined delineation of roof structure lines using satellite imagery. International Journal of Applied Earth Observation and Geoinformation, 107:102680, March 2022. URL: https://linkinghub.elsevier.com/retrieve/pii/S030324342200006X (visited on 2022-05-27), doi:10.1016/j.jag.2022.102680.\u00a0\u21a9
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Doll\u00e1r, and Ross Girshick. Segment Anything. April 2023. arXiv:2304.02643 [cs]. URL: http://arxiv.org/abs/2304.02643 (visited on 2024-04-09).\u00a0\u21a9\u21a9
Qiusheng Wu and Lucas Prado Osco. Samgeo: A Python package for segmenting geospatial data with the Segment Anything Model (SAM). Journal of Open Source Software, 8(89):5663, September 2023. URL: https://joss.theoj.org/papers/10.21105/joss.05663 (visited on 2024-03-22), doi:10.21105/joss.05663.\u00a0\u21a9
Xiaoxia Liu, Fengbao Yang, Hong Wei, and Min Gao. Shadow Removal from UAV Images Based on Color and Texture Equalization Compensation of Local Homogeneous Regions. Remote Sensing, 14(11):2616, May 2022. URL: https://www.mdpi.com/2072-4292/14/11/2616 (visited on 2024-03-22), doi:10.3390/rs14112616.\u00a0\u21a9
Nicolas Beglinger (swisstopo) - Clotilde Marmy (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) - Thilo D\u00fcrr-Auster (Canton of Fribourg) - Daniel K\u00e4ser (Canton of Fribourg)
Proposed by the Service de l'environnement (SEn) of the Canton of Fribourg - PROJ-SOILS May 2023 to April 2024 - Published in April 2024
All code is available on GitHub.
Abstract: This project focuses on developing an automated methodology to distinguish areas covered by pedological soil from areas comprised of non-soil. The goal is to generate high-resolution maps (10cm) to aid in the location and assessment of polluted soils. Towards this end, we utilize deep learning models to classify land cover types using raw, raster-based aerial imagery and digital elevation models (DEMs). Specifically, we assess models developed by the Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN), the Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD), and the Office F\u00e9d\u00e9ral de la Statistique (OFS). The performance of the models is evaluated with the Matthew's correlation coefficient (MCC) and the Intersection over Union (IoU), as well as with qualitative assessments conducted by the beneficiaries of the project. In addition to testing pre-existing models, we fine-tuned the model developed by the HEIG-VD on a dataset specifically created for this project. The fine-tuning aimed to optimize the model performance on the specific use-case and to adapt it to the characteristics of the dataset: higher resolution imagery, different vegetation appearances due to seasonal differences, and a unique classification scheme. Fine-tuning with a mixed-resolution dataset improved the model's performance when applied to lower-resolution imagery, which is proposed as a solution to the square artefacts that are common in inferences of attention-based models. Reaching an MCC score of 0.983, the findings demonstrate promising performance. The derived model produces satisfactory results, which have to be evaluated in a broader context before being published by the beneficiaries. Lastly, this report sheds light on potential improvements and highlights considerations for future work.
"},{"location":"PROJ-SOILS/#1-introduction","title":"1. Introduction","text":"Polluted soils present diverse health risks. In particular, contamination with lead, mercury, and polycyclic aromatic hydrocarbons (PAHs) currently mobilizes the Federal Office for the Environment 1. Therefore, it is necessary to know about the location of contaminated soils, like for prevention and management of soil displacement during construction works.
Current maps indicating the land cover or land use are often only accurate to the parcel level and therefore imprecise near houses (a property often includes a house and a garden), although those areas are especially prone to contamination 2. The Fribourgese Service de l'environnement wants to improve the knowledge about the location of contaminated soils. In this process, two phases can be distinguished:
The aim of this project is to explore methodologies for the first step only, creating a high-resolution map that distinguishes soil from non-soil areas. The problem of this project can be stated as follows:
Identify or develop a model that is able to distinguish areas covered by pedological soil from areas covered by non-soil land cover, given raster-based input in the form of aerial imagery and digital elevation models (DEMs).
"},{"location":"PROJ-SOILS/#2-acceptance-criteria-and-concerned-metrics","title":"2. Acceptance criteria and concerned metrics","text":"The acceptance criteria describe the conditions that must be met by the outcome of the project, by which the proof-of-concept is considered a success.
These conditions can be of a qualitative or quantitative nature. In the present case, the former rely on visual interpretation; the latter consist of metrics that measure the performance of the evaluated methodologies and are easily standardized.
The chosen evaluation strategies are described below.
"},{"location":"PROJ-SOILS/#21-metrics","title":"2.1 Metrics","text":"As metrics, the Mathew's correlation coefficient and the intersection over union have been used.
Matthew's Correlation Coefficient (MCC) The Matthew's correlation coefficient (MCC) offers a balanced evaluation of model performance by incorporating and combining all four components of the confusion matrix: true positives, false positives, true negatives, and false negatives. This makes the metric effective even in cases of class imbalance, which can be a challenge when working with aerial imagery.
\\[MCC = \\frac{TP \\times TN - FP \\times FN}{\\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\\]Where:
The MCC is the only binary classification rate that generates a high score only if the binary predictor was able to correctly predict the majority of positive data instances and the majority of negative data instances. It ranges from -1 to 1, where 1 indicates a perfect prediction, 0 indicates a random prediction, and -1 indicates a perfectly wrong prediction3.
Intersection over Union (IoU) The IoU, also known as the Jaccard index, measures the overlap between two datasets. In the context of image segmentation, it calculates the ratio of the intersection (the area correctly identified as a certain class) to the union (the total area predicted and actual, combined) of these two areas. This makes the IoU a valuable metric for evaluating the performance of segmentation models. However, it's important to note that the IoU does not take true negatives into account, which can make interpretation challenging in certain cases.
\\[IoU = \\frac{TP}{TP + FP + FN}\\]Where:
The IoU ranges from 0 to 1, where 1 indicates a perfect prediction and 0 indicates no overlap between the ground truth and the prediction. The mIoU is the mean of the IoU values of all classes and is a common metric for semantic segmentation tasks.
In a binary scenario, the IoU does not render the same score for the two classes. This means that either the mIoU is considered to be the final metric, or one of the two classes, soil or non-soil, is considered to be the positive or the negative class, respectively. The decision was made to use the mIoU, i.e. the mean of the IoU for soil and the IoU for non-soil, in the binary case.
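For concreteness, here is a small sketch computing both metrics from the four confusion-matrix counts; the counts used are purely illustrative.

```python
# Worked sketch of the two metrics from the four confusion-matrix counts.
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def iou(tp: int, fp: int, fn: int) -> float:
    return tp / (tp + fp + fn)

tp, tn, fp, fn = 900, 800, 100, 50  # illustrative pixel counts, soil = positive
# Binary mIoU: mean of the soil IoU and the non-soil IoU. For the non-soil
# class, TN plays the role of TP while FP and FN swap roles.
miou = (iou(tp, fp, fn) + iou(tn, fn, fp)) / 2
print(f"MCC={mcc(tp, tn, fp, fn):.3f}  mIoU={miou:.3f}")
```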
"},{"location":"PROJ-SOILS/#22-qualitative-assessment","title":"2.2 Qualitative Assessment","text":"To incorporate a holistic perspective of the results and to make sure that the evaluation and ranking based on the above metrics correspond to the actually perceived quality of the models, a qualitative assessment is also conducted. For this reason, the beneficiaries were asked to rank predictions of the models qualitatively. If the qualitatively assessed ranking corresponds to the ranking based on the above metrics, we can be confident that the chosen metrics are a good proxy for the actual perceived quality and usability of the models.
"},{"location":"PROJ-SOILS/#3-data","title":"3. Data","text":"The models evaluated in the project make use of different data: after inference on images with or without DEM, the obtained predictions were compared to ground truth data. All these data are described in this section.
"},{"location":"PROJ-SOILS/#31-input-data","title":"3.1 Input Data","text":"As stated in the introduction, the explored methodology should work with raw, raster-based data. The following data is provided by swisstopo and well-adapted to our problem:
The imagery and the data for the DEMs computation were not acquired at the same time, which means that the depicted land cover can differ between the two datasets. An important factor in this respect is the season (leaf-on or leaf-off). Data for swisstopo's DEMs are always acquired in the leaf-off period, which means that the used imagery should also have been acquired in the leaf-off period. To get the best fit regarding temporal and seasonal similarity, imagery from 2020 and DEMs from 2019 were used.
"},{"location":"PROJ-SOILS/#32-ground-truth","title":"3.2 Ground Truth","text":"The ground truth data for this project is used to compare the predictions of the models to the actual land cover types and to fine-tune an existing model for the project's specific needs. It was digitized by the beneficiaries of the project and is based on the SWISSIMAGE RS acquisition from 2020. As vector data allows for a more precise delineation of the land cover types, the ground truth data was digitized in a vector format. All contiguous areas comprised of the same land cover type were digitized as polygons.
"},{"location":"PROJ-SOILS/#classification-scheme","title":"Classification Scheme","text":"Although the goal of this project is to distinguish soil from non-soil areas, the ground truth data is classified into more detailed classes. This is due to the fact that it is easier to identify possible shortcomings of the models when the classes are more detailed. With a classification, techniques like confusion matrices can be used to identify which classes are often confused with each other, leading also to a better understanding about what areas should be covered in additional ground truth digitizations.
During the development of the classification scheme, the focus lay on the distinction between soil and non-soil, which means that every class can be attributed to either soil or non-soil, thereby respecting the legal definitions of soil according to the Federal Ordinance on Soil Pollution4. The final classification scheme of the ground truth data is the product of an iterative process and has been the subject of compromises between an optimal fit to the legal definitions and practical limitations, such as the possibility of a mere optical identification of the classes. Essentially, the scheme consists of 17 classes. However, during fine-tuning, it was found that some classes are too heavily underrepresented to be learnt by the model. As a result, the classes were merged into a new scheme consisting of 12 classes. Another feature of the classification scheme to keep in mind is that it is optimized for the Fribourgese territory, which means that some classes may not be directly applicable to other regions. The classification scheme is depicted in Figure 1.
Figure 1: Classification scheme of the ground truth data. Soil classes are depicted in green."},{"location":"PROJ-SOILS/#extent","title":"Extent","text":"The ground truth has been digitized on the Fribourgese territory on about 9.6 km\u00b2, including diverse land cover types. The area of interest is depicted in Figure 2.
Figure 2: Ground truth of the area of interest."},{"location":"PROJ-SOILS/#4-existing-models","title":"4. Existing Models","text":"There are no existing models that directly fit the project's problem, i.e. models directly outputting georeferenced, binary raster images distinguishing soil from non-soil. However, there are models that are able to classify land cover types on aerial imagery. Three institutions have developed such models, which are assessed in the evaluation section. All of them are deep learning neural networks. In the following subchapters, the models are discussed briefly.
"},{"location":"PROJ-SOILS/#41-institut-national-de-linformation-geographique-et-forestiere-ign","title":"4.1 Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN)","text":"The D\u00e9partement d'Appui \u00e0 l'Innovation (DAI) at IGN has implemented three AI models for land cover segmentation: odeon-unet-vgg16, smp-unet-resnet34-imagenet, and smp-fpn-resnet34-imagenet, each trained with two input modalities, named RVBI and RVBIE, resulting in six configurations.
The model architectures are the three mentioned above: odeon-unet-vgg16 (a U-Net with a VGG16 encoder), smp-unet-resnet34-imagenet (a U-Net with a ResNet34 encoder pre-trained on ImageNet), and smp-fpn-resnet34-imagenet (an FPN with a ResNet34 encoder pre-trained on ImageNet).
The input modalities are RVBI (red, green, blue, and near-infrared channels) and RVBIE (the same four channels plus an elevation channel).
IGN's own assessment of these 6 configurations suggests that resnet34 encoder models, with their larger receptive fields, generally outperform vgg16 models, benefiting from the spatial context in prediction. The pre-training with ImageNet further enhances model performance. Current evaluations of IGN are focusing on models from the FLAIR-1 challenge, which may replace existing models in production.
The FLAIR-1 Challenge was designed to enhance artificial intelligence (AI) methods for land cover mapping. Launched on November 21, 2022, the challenge focused on the FLAIR-1 (French Land cover from Aerospace ImageRy) dataset, one of the largest datasets for training AI models in land cover mapping. The dataset included data from over 50 departments, encompassing more than 20 billion annotated pixels, representing the diversity of the French metropolitan territory. The total area of the ground truth data is calculated as:
\\[A = \\frac{(512px*0.2\\frac{m}{px})^2 * 77412\\ tiles}{10^6} = 811.7 km^2\\]All of the used model architectures of the IGN are in the family of the convolutional neural networks (CNNs), which are a type of deep learning algorithm. Inspired by biological processes, CNNs implement patterns of connectivity between artificial neurons similar to the organization in the biological visual system. CNNs are particularly effective in image recognition tasks, as they can automatically learn features from the input data6.
According to IGN, the main challenges in model performance lie in adapting to varying radiometric calibrations and vegetation appearances in different datasets, such as the lack of orthophotos taken during winter (\u201cleaf-off\u201d) in the French training data. Ongoing efforts are aimed at improving model generalization across different types of radiometry and training with winter images to account for leafless vegetation appearances.
"},{"location":"PROJ-SOILS/#42-haute-ecole-dingenierie-et-de-gestion-du-canton-de-vaud-heig-vd","title":"4.2 Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD)","text":"The Institute of Territorial Engineering (INSIT) at the HEIG-VD has participated in the FLAIR-1 challenge.
INSIT uses a Mask2Former7 architecture, which is an attention-based model. Attention-based models in computer vision are neural networks that selectively focus on certain areas of an image during processing. They also mimic the biological visual system by concentrating on specific parts of an image while ignoring others 8.
The researchers at HEIG-VD could not demonstrate a significant performance increase when including the near-infrared (NIR) channel and/or a DEM. As a result, their model works with RGB imagery only.
"},{"location":"PROJ-SOILS/#43-office-federal-de-la-statistique-ofs","title":"4.3 Office F\u00e9d\u00e9ral de la Statistique (OFS)","text":"OFS has also created a deep learning model prototype to automatically segment land cover types. However, different than the models of IGN and HEIG-VD, it works with two steps:
The Methodology section describes the infrastructure used to run the models and to reproduce the project. Furthermore, it describes precisely the evaluation and fine-tuning approaches.
"},{"location":"PROJ-SOILS/#51-infrastructure","title":"5.1 Infrastructure","text":"The term \u201cinfrastrucutre\u201d refers here to both hardware and software resources.
"},{"location":"PROJ-SOILS/#hardware","title":"Hardware","text":"Most of the development of this project was conducted on a MacBook Pro (2021) with an M1 Pro chip. To accelerate the inference and fine-tuning of the models, virtual machines (VMs) were used. The VMs were provided by Infomaniak and were equipped with 16 CPUs, 32 GB of RAM, and an NVIDIA Tesla T4 GPU.
"},{"location":"PROJ-SOILS/#reproducibility","title":"Reproducibility","text":"The code is versioned using Git and hosted on the Swiss Territorial Data Lab GitHub repository. To ensure reproducibility across different environments, the environment is containerized using Docker12.
"},{"location":"PROJ-SOILS/#deep-learning-framework","title":"Deep Learning Framework","text":"We received the source code and the model weights of the HEIG-VD model and of the OFS model. Both models are implemented using the deep learning framework PyTorch13. The HEIG-VD model uses an additional library called mmsegmentation14, which is built on top of PyTorch and provides a high-level interface for training and evaluating semantic segmentation models.
"},{"location":"PROJ-SOILS/#52-evaluation","title":"5.2 Evaluation","text":"To realize the evaluation of the afore-mentionned models, reclassification of land cover classes into soil classes were necessary, as well as the definition of a common extent to the availabe inferences. Furthermore, the metrics were implemented in the workflow and a rigorous qualitative assessment was defined.
"},{"location":"PROJ-SOILS/#inference","title":"Inference","text":"In the beginning of the evaluation phase, the inferences of the models were generated directly by the aforementioned institutions. Later, after receiving the model weights and the source codes, we could infere from the models of the HEIG-VD and OFS directly.
"},{"location":"PROJ-SOILS/#reclassification","title":"Reclassification","text":"As already touched upon, the above-stated models do not directly output binary (soil/non-soil) raster images, but output segmented rasters with multiple classes. The classification scheme depends on the data that was used for training. The classes of the models of IGN and HEIG-VD are almost identical, since they have both been trained on French imagery and ground truth. They differ only in the numbering of the classes. The model of OFS, however outputs completely different classes. To harmonize the results of all three institution\u2019s models and to make them fit for out problem at hand, all outputs have been reclassified to the same classification scheme named \u201cPackage ID\u201d. The reason for this name is that there is an N:M relationship between the IGN-originated classes and the Fribourg ground truth classes. One \u201cpackage\u201d thus consists of all the classes that are connected via N:M relationships. The mapping of the classes is depicted in Figures 3 and 4.
Figure 3: Mapping between the package ID and the classification schemes of IGN and HEIG-VD. Figure 4: Mapping between the package ID and the classification scheme of OFS."},{"location":"PROJ-SOILS/#extents","title":"Extents","text":"From the extent originally covered by the ground truth, a smaller extent had to be defined during the evaluation, both for reasons of inference availability and to understand the performance of the different models.
Extent 1 Because we did not have inferences of all models for the whole area of the ground truth, we could not evaluate all the models on the whole extent. To allow for a fair comparison between the models, the evaluation was therefore conducted on the largest possible extent, which is the intersection between all the received inferences and the ground truth. This extent is called \u201cExtent 1\u201d and makes up a total area of about 0.42 km\u00b2. The area can be seen in Figure 5.
Figure 5: Ground truth of Extent 1. Masked Extent 1: Areas around buildings In Extent 1, a great share of the area consists of vegetated soil. To check the performance in urban areas only, the evaluation is also conducted on the subset of pixels that are within 20 m of buildings. Although the extent of this modification is the same as Extent 1, it is treated as a separate extent, called \u201cextent1-masked\u201d, which is depicted in Figure 6.
Figure 6: Ground truth of Extent 1, masked to focus on the areas around buildings. Extent 2 The output of the HEIG-VD model is affected by square-shaped artefacts, which can be seen in Figure 7. The squares coincide with the size of the model's receptive field, which is 512x512 pixels. With an image resolution of 10 cm, the artefacts are thus of size 51.2x51.2 m; with an image resolution of 20 cm, of size 102.4x102.4 m.
There are two observations regarding the occurrence of the artefacts:
The artefacts, then, are probably a combination of those two factors.
Figure 7: Representative map showing square artefacts in areas without high-frequency context. Lines: GT, fills: predictions. The artefacts produce large areas of false predictions that presumably have a strong influence on any computed evaluation metric. To obtain a clearer understanding of the influence of those artefacts on the metrics, a second extent, Extent 2 (shown in Figure 8), has been created, which excludes all the tiles where the HEIG-VD model produces those artefacts.
Figure 8: Ground truth of Extent 2."},{"location":"PROJ-SOILS/#metrics","title":"Metrics","text":"Both the MCC and the IoU values are computed in a raster-based fashion. This means that the spatially overlapping pixels of the predictions and the GT are compared. For each class, each pixel is therefore classified as one of the following: true positive, false positive, true negative, or false negative.
As described in the acceptance criteria, the MCC and the IoU are then calculated as specific combinations of these values. As the MCC is only suited to binary classification, it is computed for the binary output of the models only; the IoU is computed for both the binary and the multiclass outputs. The general workflow of the evaluation pipeline is the following:
More details about the technical implications of the evaluation pipeline can be found in the GitHub repository of the project.
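The core of this raster-based comparison can be sketched as follows, assuming rasterio and scikit-learn, and assuming two binary rasters sharing grid, extent and CRS; file names are hypothetical.

```python
# Sketch of the raster-based metric computation; not the project's exact code.
import rasterio
from sklearn.metrics import matthews_corrcoef, jaccard_score

with rasterio.open("prediction_binary.tif") as p, rasterio.open("gt_binary.tif") as g:
    pred = p.read(1).ravel()  # per-pixel predicted class (0 = non-soil, 1 = soil)
    gt = g.read(1).ravel()    # per-pixel ground truth class

mcc = matthews_corrcoef(gt, pred)
miou = jaccard_score(gt, pred, average="macro")  # mean IoU over soil and non-soil
print(f"MCC={mcc:.3f}  mIoU={miou:.3f}")
```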
"},{"location":"PROJ-SOILS/#qualitative-assessment","title":"Qualitative assessment","text":"As stated in the Qualitative Assessment section, this visual assessment serves to ensure that the chosen metrics (MCC, IoU) correspond to the qualitative evaluation of the beneficiaries. Three models where chosen, such that (regarding the metrics) high- and low-performing models were included. As the problem with the artefacts in the HEIG-VD model\u2019s predictions is very evident, for this assessment, only a subset from Extent 2 has been taken into account.
To conduct the qualitative assessment, the beneficiaries were given the predictions of the three chosen models on 4 representative tiles. The tiles were chosen to represent different land cover types (as far as possible on this area). The beneficiaries were then asked to rank the predictions of the models from best to worst.
As the inferences for the OFS model were not available at the relevant point in time, the qualitative assessment was conducted only for the IGN and the HEIG-VD models for the tiles displayed in Figure 9.
Figure 9: Tiles that were used for the qualitative assessment. Ground truth depicted as outlines, predictions as fills. IGN1: smp-unet-resnet34-imagenet_RVBI, IGN2: odeon-unet-vgg16_RVBIE."},{"location":"PROJ-SOILS/#53-fine-tuning","title":"5.3 Fine-Tuning","text":"After the evaluation, the HEIG-VD model was identified as the most promising candidate for further progress in the project, in terms of both performance and availability (more in the Discussion section). The model has been trained on the FLAIR-1 dataset (see Existing Models), which differs from the present dataset in several aspects. Fine-tuning allows retraining a model so that it adapts to the specifics of a dataset. In this case, fine-tuning aims to adjust the model to the following specifics:
The Swiss imagery is of a higher resolution than the French imagery (10 cm vs 20 cm), which means that the model has to be able to work with more detailed information.
The Swiss imagery is of a different season than the French imagery, which means that the model has to be able to work with different vegetation appearances.
The classification scheme of the Swiss ground truth is different from the French ground truth, which means that the model has to be able to work with different classes.
For fine-tuning, the dataset is split into a training dataset and a validation dataset. The training dataset consists of 80% of the input imagery and ground truth, while the validation dataset consists of the remaining 20%. The dataset is split in a stratified manner, which means that the distribution of the classes in the training dataset is as close as possible to the distribution of the classes in the validation dataset. This is important to ensure that the model is trained on a representative sample of the data. The split is conducted in a semi-random and tile-based manner; one plausible implementation is sketched below.
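The sketch below assumes a per-tile class-frequency table is available; the actual STDL procedure may differ in its details.

```python
# One plausible sketch of a semi-random, tile-based stratified split.
import numpy as np

def stratified_split(class_freq: np.ndarray, val_share=0.2, n_draws=1000, seed=42):
    """class_freq: (n_tiles, n_classes) pixel counts per tile.
    Draw random tile subsets and keep the one whose validation class
    distribution is closest to the global class distribution."""
    rng = np.random.default_rng(seed)
    n_tiles = class_freq.shape[0]
    global_dist = class_freq.sum(0) / class_freq.sum()
    best, best_gap = None, np.inf
    for _ in range(n_draws):
        val = rng.choice(n_tiles, size=int(val_share * n_tiles), replace=False)
        val_dist = class_freq[val].sum(0) / class_freq[val].sum()
        gap = np.abs(val_dist - global_dist).sum()
        if gap < best_gap:
            best, best_gap = val, gap
    return best  # indices of validation tiles; the rest form the training set
```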
As stated in the Ground Truth section, the fine-tuning is conducted using the classification scheme consisting of 12 classes.
Figure 10: Class frequency distribution of the training and validation dataset. Mind that the y-axis is logarithmic.To mitigate the effect of the artefacts, mentioned in the Extent section, we propose to decrease the spatial resolution of the input (and thus also the output) of the model to increase the spatial receptive field of the model. With input tiles covering a larger area, the chance of the occurrence of high-frequency features that give context to the image increases. A visualization of this proposal is shown in Figure 11: while the 10 cm input tile has only low-frequency agricultural context, the 40 cm input tile has high-frequency context in the form of a road. This context, as proposed, could help the model to make a more informed decision.
Figure 11: Visualization of the changing receptive field of the model with different input resolutionsAs a model adjusts for a certain resolution during training, we test the effect of training on different resolutions. Thus, the model is fine-tuned on two different datasets, one with a spatial resolution of 10 cm and one with mixed resolutions of 10 cm, 20 cm, and 40 cm. The input shape of all the image tiles, regardless of the ground sampling distance, is 512x512 pixels. The spatially largest tiles (40 cm) were assigned to either the training or the validation set, and all the smaller tiles that are contained within the larger tiles were assigned to the same set. This way, the model can be trained and evaluated, respectively, on the same area at different resolutions. The resulting nested grid is depicted in Figure 12.
Figure 12: Example of the used grid. The shape of the tiles is always 512 by 512 pixels, only the ground sampling distance changes. Borders have an offset to increase legibility, in reality, they're perfectly overlapping.The obtained datasets and their sizes are summarized here:
Training Dataset
Validation Dataset
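The resolution changes used to build the mixed dataset can be sketched with rasterio, assuming bilinear resampling; paths and the scale factor are illustrative (scale=4 turns a 10 cm tile into a 40 cm one, so a 512x512 px tile covers 16 times the area).

```python
# Sketch of the resampling used to enlarge the spatial receptive field.
import rasterio
from rasterio.enums import Resampling

def resample(path: str, out_path: str, scale: int = 4) -> None:
    """Downsample a tile by an integer factor (e.g. 10 cm -> 40 cm)."""
    with rasterio.open(path) as src:
        data = src.read(
            out_shape=(src.count, src.height // scale, src.width // scale),
            resampling=Resampling.bilinear,
        )
        # Scale the affine transform so the output stays georeferenced.
        transform = src.transform * src.transform.scale(scale, scale)
        profile = src.profile.copy()
        profile.update(height=data.shape[1], width=data.shape[2], transform=transform)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(data)
```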
Both the model trained on the single-resolution dataset and the one trained on the mixed-resolution dataset were trained for a total of 160'000 iterations using the mmsegmentation library14. One iteration in this context means that one batch of data has been processed. Because of memory limitations, the models were trained with a batch size of 1, which means that one iteration corresponds to one tile being processed. Thus, one epoch (one pass through the whole dataset) consists of 2'640 iterations for the single-resolution dataset and 3'460 iterations for the mixed-resolution dataset. During training, the models were evaluated on the validation set after every epoch by computing the mIoU metric. If the mIoU increased, a model checkpoint was saved and the old one deleted. After training for the predefined number of iterations, the model with the highest mIoU on the validation set was chosen as the final model.
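The corresponding training options could look like the following mmsegmentation (0.x-style) configuration fragment; the exact keys and values used in the project's configuration may differ.

```python
# Hedged sketch of mmsegmentation config options for this training setup
# (values for the single-resolution dataset; one epoch = 2'640 iterations).
runner = dict(type='IterBasedRunner', max_iters=160_000)
data = dict(samples_per_gpu=1)          # batch size 1 due to memory limits
evaluation = dict(
    interval=2_640,                     # evaluate on the validation set each epoch
    metric='mIoU',
    save_best='mIoU',                   # keep only the checkpoint with the best mIoU
)
checkpoint_config = dict(interval=2_640, max_keep_ckpts=1)
```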
"},{"location":"PROJ-SOILS/#6-results","title":"6. Results","text":"The metrics values for the evaluation and the fine-tuning parts of the project are first presented. Afterwards, a close view of the final product is shown.
"},{"location":"PROJ-SOILS/#61-evaluation","title":"6.1 Evaluation","text":"The multiclass evaluation is briefly presented before showing in details, from a quantitative and qualitative perpectives, the evaluation of the models for the binary classification in soil and non-soil classes.
"},{"location":"PROJ-SOILS/#multiclass-evaluation","title":"Multiclass Evaluation","text":"As the focus of this project lies in the binary distinction between soil and non-soil areas, the multiclass classification results are not discussed in further detail. However, plots displaying the class-IoU values of the different models are depicted in Figures 13 and 14. Confusion matrices can be found in the Appendices.
Figure 13: IoU values for different models and classes on Extent 1. Soil-classes are depicted in the green rectangles. Figure 14: IoU values for different models and classes on Extent 2. Soil-classes are depicted in the green rectangles."},{"location":"PROJ-SOILS/#quantitative-evaluation","title":"Quantitative Evaluation","text":"Figures 15 and 16 show the MCC values and mIoU values, respectively, computed for the binary classification of different models. As the distribution of the metrics across the models is very similar, only the MCC values are discussed in the following and are precisely given in Table 1.
Figure 15: MCC values of the binary predictions of the models on the two extents. Figure 16: mIoU values of the binary predictions of the models on the two extents.

| Model | MCC (Extent 1) | MCC (Masked Extent 1) | MCC (Extent 2) |
| --- | --- | --- | --- |
| IGN_smp-unet-resnet34-imagenet_RVBI | 0.825 | 0.808 | 0.813 |
| OFS_ADELE2(+SAM) | 0.818 | 0.802 | 0.794 |
| IGN_smp-unet-resnet34-imagenet_RVBIE | 0.810 | 0.798 | 0.824 |
| HEIG-VD | 0.789 | 0.839 | 0.859 |
| IGN_smp-fpn-resnet34-imagenet_RVBIE | 0.714 | 0.749 | 0.795 |
| IGN_odeon-unet-vgg16_RVBI | 0.710 | 0.706 | 0.794 |
| IGN_smp-fpn-resnet34-imagenet_RVBI | 0.710 | 0.745 | 0.792 |
| IGN_odeon-unet-vgg16_RVBIE | 0.640 | 0.613 | 0.713 |

Table 1: MCC of the binary predictions of the models on the three extents.

Extent 1 The best-performing model in Extent 1 is the IGN_smp-unet-resnet34-imagenet_RVBI model, with an MCC of 0.825. The inclusion of the elevation channel does not seem to have a significant impact on the model's performance, as the IGN_smp-unet-resnet34-imagenet_RVBIE model only achieves an MCC of 0.810. The OFS_ADELE2(+SAM) model follows closely with an MCC of 0.818. The HEIG-VD model, on the other hand, performs significantly worse, with an MCC of 0.789. The models IGN_smp-fpn-resnet34-imagenet_RVBIE, IGN_odeon-unet-vgg16_RVBI, and IGN_smp-fpn-resnet34-imagenet_RVBI all achieve an MCC of around 0.710. The IGN_odeon-unet-vgg16_RVBIE model performs the worst, with an MCC of 0.640.
masked Extent 1 & Extent 2 The greatest difference with respect to Extent 1 is that the HEIG-VD model performs significantly better in masked Extent 1 and in Extent 2, with MCCs of 0.839 and 0.859, respectively. Apart from that, the models generally perform similarly in masked Extent 1 and in Extent 1, and generally perform better in Extent 2.
"},{"location":"PROJ-SOILS/#qualitative-evaluation","title":"Qualitative Evaluation","text":"The results of the qualitative assessment are depicted in Figure 17. The qualitative assessment rendered the following ranking:
The ranking corresponds to the ranking based on the metric measures.
Figure 17: Qualitative assessment by the beneficiaries."},{"location":"PROJ-SOILS/#62-fine-tuning","title":"6.2 Fine-Tuning","text":"For the second part of the project - fine-tuning of the HEIG-VD model - the quantitative binary performance is first presented. Afterwards, the multiclass outputs are presented quantitatively and visually. This helps to understand what lies behind the binary outputs, which are qualitatively discussed in the final subsection.
"},{"location":"PROJ-SOILS/#binary-results","title":"Binary Results","text":"Figure 18 shows the progress of the models during fine-tuning, with a datapoint after every epoch. The curves are quite similar to each other, but both are rather noisy. The best checkpoint of the 10 cm model is at epoch 71 with an mIoU of 0.939. The best checkpoint of the mixed model is at epoch 145 with an mIoU of 0.930. The names of the two models are thus HEIG-VD-10cm-71k and HEIG-VD-mixed-145k.
Training the models for 160'000 iterations on the above-stated hardware took about 7 days. Performing inference on one single tile of 512x512 pixels takes about 1 second. This means that with 10 cm input tiles, the model takes about 380 seconds, or 6 minutes and 20 seconds, to process 1 km\u00b2. As the canton of Fribourg has an area of about 1'670 km\u00b2, the model would take about one week to process the whole canton, which is a reasonable amount of time.
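The estimate can be verified with a few lines of arithmetic; the following sketch uses only the figures quoted above:

```python
# Back-of-the-envelope estimate of the inference time, using the figures
# given above (1 s per 512x512 tile at 10 cm GSD).
gsd_m = 0.10                 # ground sampling distance in metres
tile_px = 512                # tile edge length in pixels
seconds_per_tile = 1.0       # measured inference time per tile

pixels_per_km2 = (1000 / gsd_m) ** 2           # 10^8 pixels per km²
tiles_per_km2 = pixels_per_km2 / tile_px ** 2  # ~381 tiles per km²
seconds_per_km2 = tiles_per_km2 * seconds_per_tile

canton_km2 = 1670  # approximate area of the canton of Fribourg
days = canton_km2 * seconds_per_km2 / 86400
print(f"{seconds_per_km2:.0f} s per km², {days:.1f} days for the whole canton")
# -> ~381 s per km², ~7.4 days
```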
Figure 18: Training progress of the models. Figure 19 and Table 2 show the MCC values for the original HEIG-VD model, as well as for the two fine-tuned models, on the evaluation extent. When comparing the MCC values of the original HEIG-VD model (MCC = 0.553) with those of the fine-tuned models (MCC after 10 cm training: 0.939; MCC after mixed training: 0.938), the fine-tuned models perform significantly better. However, one should note that the original HEIG-VD model was trained on a different dataset and with a different classification scheme. It was evaluated on the same extent but using the package ID, which is introduced in the Reclassification section.
Regarding the performance of the two models on inference with different input resolutions, they perform quite similarly on the 10 cm resolution input. Both models perform worse as the ground sampling distance increases:
However, the performance of the HEIG-VD-mixed-145k model does not decrease as much as that of the HEIG-VD-10cm-71k model on the 20 cm and 40 cm resolution inputs.
Figure 19: MCC values of the binary predictions of the model fine-tuned on different resolutions.

| Model | MCC (10 cm input) | MCC (20 cm input) | MCC (40 cm input) |
| --- | --- | --- | --- |
| HEIG-VD-original | 0.553 | | |
| HEIG-VD-10cm-71k | 0.939 | 0.884 | 0.795 |
| HEIG-VD-mixed-145k | 0.938 | 0.930 | 0.893 |

Table 2: MCC values of the binary predictions of the models fine-tuned on different resolutions."},{"location":"PROJ-SOILS/#multi-class-results","title":"Multi-Class Results","text":"As in the Evaluation results section, the results are not discussed in further detail. Confusion matrices can be found in the Appendices. Figures 20 and 21 show the IoU values of the mixed-resolution model (Figure 20) and the 10 cm model (Figure 21) on different resolutions.
Figure 20: IoU values of the fine-tuned models on the 10 cm dataset. Figure 21: IoU values of the fine-tuned models on the mixed-resolution dataset."},{"location":"PROJ-SOILS/#qualitative-analysis-of-the-outputs","title":"Qualitative Analysis of the Outputs","text":"Figures 22, 23, and 24 show the outputs of the two models for 10, 20, and 40 cm input resolution, respectively, on three different areas. The areas were chosen to represent different land cover types. Since there is no ground truth in these areas, the inferences can only be analyzed qualitatively. From the inferences it is apparent that the models still struggle in regions with little high-frequency context and are prone to square artefacts. In the urban and countryside areas (Figures 22 and 23), the combination of decreased resolution (and thus increased spatial receptive field) and the fine-tuning on the mixed dataset seems to reduce the occurrence of the square artefacts. In the mountainous area (Figure 24), however, the artefacts are even more pronounced in the outputs of the mixed-resolution model than in the 10cm-only model.
An effect of the decreased resolution is that the predictions are generally less impacted by the artefacts. However, when artefacts do occur, their spatial extent, which equals the spatial receptive field of the model, is larger.
Figure 22: Comparison of the predictions of the model fine-tuned on different resolutions in urban areas. Figure 23: Comparison of the predictions of the model fine-tuned on different resolutions in countryside areas. Figure 24: Comparison of the predictions of the model fine-tuned on different resolutions in mountainous areas."},{"location":"PROJ-SOILS/#qualitative-analysis-of-the-binary-outputs","title":"Qualitative Analysis of the Binary Outputs","text":"Figures 25, 26, and 27 show the binary output versions of the three figures above (22, 23, and 24). Looking at the inferences on the same areas, one can see that the artefacts are much less of an issue in the binary outputs. They are still present to some extent; however, since many of the artefacts and their surroundings are in fact soil or non-soil, respectively, the artefacts dissolve in the binary outputs. The artefacts remain visible in the mountainous area, where the mixed model predicts large areas of water, which is a non-soil class.
Figure 25: Comparison of the binary predictions of the model fine-tuned on different resolutions in urban areas. Figure 26: Comparison of the binary predictions of the model fine-tuned on different resolutions in countryside areas. Figure 27: Comparison of the binary predictions of the model fine-tuned on different resolutions in mountainous areas."},{"location":"PROJ-SOILS/#63-examplary-inference","title":"6.3 Exemplary Inference","text":"Finally, to show an example of the model output, an inference of the HEIG-VD-mixed-145k model on a 10 cm input resolution tile is given in Figure 28. The inference shows a zoomed-in part of the north-east of the extent shown in Figures 22 and 25. It illustrates that the model is capable of distinguishing between different land cover classes in great detail.
Figure 28: Representative inference of the HEIG-VD-mixed-145k model on a 10 cm input resolution tile."},{"location":"PROJ-SOILS/#7-discussion","title":"7. Discussion","text":"Following the presentation of the results, the evaluation and fine-tuning outcomes are discussed in turn.
"},{"location":"PROJ-SOILS/#71-evaluation","title":"7.1 Evaluation","text":"All institutions and models have their strengths and weaknesses:
IGN Regarding Extent 1, the model IGN_smp-unet-resnet34-imagenet_RVBI produced the best metrics. Furthermore, the CNN models of IGN are computationally less expensive than the other models and the inferences are not prone to the square artefacts that the HEIG-VD model produces.
HEIG-VD The HEIG-VD model, although it is outperformed by the other two institutions' models on Extent 1, performs significantly better in masked Extent 1 and in Extent 2. The model also performed best in the qualitative assessment. The assessment of the performance in Extent 2 shows that the square artefacts are responsible for a great share of false predictions.
OFS The OFS model OFS_ADELE2(+SAM) performs similarly to the best-performing IGN model, its outputs are not prone to square artefacts, and the inferences are very clean due to its usage of the SAM model. The downside of the OFS model is that it is specifically adapted for the Statistique suisse de la superficie10 and thus cannot be retrained on a different dataset.
The goal of the evaluation phase was to identify the most promising model for further steps in the project. Based on the results of the evaluation, the HEIG-VD model was chosen. It performed best in masked Extent 1 and in Extent 2, and it performed best in the qualitative assessment. Additionally, the model needs only aerial imagery with the three RGB channels, which makes reproduction easier. The model weights and source code of the HEIG-VD model were kindly shared with us, which enabled us to fine-tune the model to adapt to the specifics of this project. However, the premise of choosing the HEIG-VD model was that the square artefacts could be mitigated to an acceptable degree.
"},{"location":"PROJ-SOILS/#72-fine-tuning","title":"7.2 Fine-Tuning","text":"The following keypoints can be extracted from the fine-tuning results:
"},{"location":"PROJ-SOILS/#performance-increase","title":"Performance Increase","text":"The fine-tuning procedure could improve the model performance substantially, even though a small dataset was used. For comparison: The FLAIR-1 dataset comprises more than 800 km\u00b2, which is more than 80 times the size of our used dataset. The improvement is especially impressing, since the chosen model is an attention-based model, which is known to be dependent on large amounts of data 15. A possible explanation for the success of the fine-tuning is that most of the features that the model has to learn are already present in the pre-trained model. The adjustments of the weights needed to adapt to the specifics of the dataset may, in comparison to the vast amount of information needed to train a model from scratch, be quite small.
"},{"location":"PROJ-SOILS/#adaptability","title":"Adaptability","text":"Fine-Tuning allows to adjust for different specifics of new datasets. In this case, the model was able to adjust for different resolutions, a different acquisition season, and a new classification scheme. However, also the model that has been trained on the mixed-resolution dataset performed worse on 20 cm and 40 cm resolution input than on 10 cm resolution input. This could be due to the fact that the model has been trained on 4 times as many 10 cm resolution tiles as on 20 cm resolution tiles and 16 times as many 10 cm resolution tiles as on 40 cm resolution tiles. As a result, the model could be biased towards the 10 cm resolution. Another explanation imaginable could be that the defined classes are more easily identifiable in high-resolution input in general. While e.g., the IoU values for the class \"sol_vegetalise\" does not fluctuate much between the different resolutions, the IoU values for e.g., the class \"roche_dure_meuble\" seems to depend considerably on the resolution.
"},{"location":"PROJ-SOILS/#square-artefacts","title":"Square Artefacts","text":"While a decreased resolution and fine-tuning could not remove the square artefacts completely, their occurence could be drastically reduced. Even more so in the binary case, where the depicted confusion between water and vegetated soil in Figure 27 seems to contribute the most to the square artefacts, which could be reduced by a post-processing step, using known waterbodies as a mask. These water square artefacts that appear by decreasing the resolution, show that the model depends on both resolution and context. Indeed, the lower resolution seems to have removed the specific texture of mountainous meadow and rendered it similar to waterbody. Another factor contributing to this confusion could be a possible bias in the ground truth caused by overrepresented lake sediments that resemble soil.
The decrease in resolution also affects the size of the smallest segmentable object. Luckily, urban areas profit the most from high resolution and are not prone to square artefacts, which means that a trade-off could be circumvented by a spatial separation of high- and low-resolution inferences (e.g., urban: 10 cm, countryside: 40 cm).
"},{"location":"PROJ-SOILS/#73-remarks-from-beneficiaries","title":"7.3 Remarks from Beneficiaries","text":"The beneficiaries provided a feedback of the final state of the model. They were especially content with the performance in heterogeneous areas (i.e., urban areas) and stressed the quality of the inference regarding ambiguous features: the model is able to distinguish between soil and non-soil even in areas where the ground is covered by large objects (e.g., truck trailers), or where the soil is covered by canopies. The model is also not affected by shadows, which was a great concern at the beginning of the project, and shows a good separability of gravel and concrete, which could be used for mapping impervious surfaces. However, the square artefacts are still leading to soil/non-soil confusion, typically appearing as 51.2x51.2m squares in homogeneous areas (with 10 cm resolution). The beneficiaries concluded that around buildings, the soil map produced by the model appears more reliable than other available products and offers the opportunity to cross-reference the binary result (soil \u2013 non-soil) with existing indicative maps and to improve the quantitative assessment of the soil concerned by pollutions.
"},{"location":"PROJ-SOILS/#8-conclusion","title":"8. Conclusion","text":"One of the main findings of this project is that modern deep learning models are feasible tools to segment various land cover classes on aerial imagery. Furthermore, even complicated models can be fine-tuned for derived specifics and enhanced performance, even in the case of small datasets.
As the mixed-resolution model produces overall better results than the 10cm-only model, it can be considered the main output of this project. It performs quite well, with an MCC value of 0.938 on the 10 cm validation set. It was able to adapt to the specifics of the Swiss dataset, which incorporates different resolutions, a different acquisition season, and a new classification scheme. The model performs especially well in urban and other high-frequency context areas, where the issue with square artefacts is less pronounced.
The project provides a methodology for comparing different segmentation models in the geographic domain, gives insights into how a best-suited model can be chosen, and shows how such a model can be fine-tuned to adapt to a specific dataset.
"},{"location":"PROJ-SOILS/#81-limitations","title":"8.1 Limitations","text":"The main limitation of this project is the extent of the ground truth data. The ground truth data is only available for a small area in the canton of Fribourg. With a larger dataset, the model may have been able to perform even better and the generalization to other areas may have been better, because each class could have been presented to the model in a more nuanced way.
Another limitation is the very limited seasonal diversity of the imagery used. We showed that the model is able to adapt to different vegetation appearances, but the produced model has only been fine-tuned for the vegetation period of the imagery used for training. It might perform worse in other vegetation periods.
Last but not least, the square artefacts, which were a main concern in the project, still occur within the inferences. The fine-tuning of the model on a mixed-resolution dataset was able to mitigate the effect of the artefacts, but not to remove it completely.
"},{"location":"PROJ-SOILS/#82-outlook","title":"8.2 Outlook","text":"Some ideas that emerged during the project but could not be implemented due to time constraints are:
As the square artefacts are not much of a problem in urban, high-frequency areas, and a lower resolution can help to mitigate their effect in low-frequency, countryside areas, a possible approach could be to run inference at 10 cm in urban areas and at 40 cm in countryside areas. Another way to combine low- and high-resolution inferences could be to make use of an ensemble technique, which combines the predictions of different models to get \u201cthe best of both worlds\u201d.
The confusion between water and vegetated soil is a main cause of error in the binary predictions. A post-processing step to remove these square artefacts could be conducted by using known waterbodies as a mask.
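A minimal sketch of such a masking step with GeoPandas, assuming hypothetical file names for the binary predictions and the waterbody layer:

```python
import geopandas as gpd

# Hypothetical input files: binary soil predictions and known waterbodies.
preds = gpd.read_file("binary_predictions.gpkg")
water = gpd.read_file("waterbodies.gpkg").to_crs(preds.crs)

# Remove the parts of the predictions that fall on known waterbodies,
# suppressing the water/vegetated-soil square artefacts there.
masked = gpd.overlay(preds, water, how="difference")
masked.to_file("binary_predictions_masked.gpkg", driver="GPKG")
```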
We would like to express our gratitude to the people working at HEIG-VD, IGN, and OFS, who contributed significantly to this project by sharing not only their code and models, but also their thoughts and experiences with us. It was a pleasure to collaborate with them.
"},{"location":"PROJ-SOILS/#9-appendices","title":"9. Appendices","text":"Figure 29: Confusion matrix of the HEIG-VD-10cm-71k model on the 10 cm validation set. Figure 30: Confusion matrix of the HEIG-VD-mixed-145k model on the 10 cm validation set. Figure 31: Confusion matrix of the HEIG-VD-10cm-71k model on the 20 cm validation set. Figure 32: Confusion matrix of the HEIG-VD-mixed-145k model on the 20 cm validation set. Figure 33: Confusion matrix of the HEIG-VD-10cm-71k model on the 40 cm validation set. Figure 34: Confusion matrix of the HEIG-VD-mixed-145k model on the 40 cm validation set. Figure 35: Confusion matrix of the original HEIG-VD model on Extent 1. Figure 36: Confusion matrix of the original HEIG-VD model on Extent 2. Figure 37: Confusion matrix of the IGN model smp-unet-resnet34-imagenet_RVBI on Extent 1. Figure 38: Confusion matrix of the IGN model smp-unet-resnet34-imagenet_RVBI on Extent 2. Figure 39: Confusion matrix of the OFS model OFS_ADELE2(+SAM) on Extent 1."},{"location":"PROJ-SOILS/#10-bibliography","title":"10. Bibliography","text":"Pieter Poldervaart and Bundesamt f\u00fcr Umwelt BAFU \\textbar Office f\u00e9d\u00e9ral de l'environnement OFEV \\textbar Ufficio federale dell'ambiente UFAM. Bleibelastung: Schweres Erbe in G\u00e4rten und auf Spielpl\u00e4tzen. September 2020. URL: https://www.bafu.admin.ch/bafu/de/home/themen/thema-altlasten/altlasten--dossiers/bleibelastung-schweres-erbe-in-gaerten-und-auf-spielplaetzen.html (visited on 2024-01-04).\u00a0\u21a9
Christian Niederer. Schwermetallbelastungen in Hausg\u00e4rten in Freiburgs Altstadt (Kurzfassung), Studie im Auftrag des Amtes f\u00fcr Umwelt des Kantons Freiburg. Technical Report, BMG Engineering AG, 2015.\u00a0\u21a9
Davide Chicco and Giuseppe Jurman. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1):6, January 2020. URL: https://doi.org/10.1186/s12864-019-6413-7 (visited on 2024-02-05), doi:10.1186/s12864-019-6413-7.\u00a0\u21a9
Conseil f\u00e9d\u00e9ral suisse. Ordonnance sur les atteintes port\u00e9es aux sols. 1998. URL: https://www.fedlex.admin.ch/eli/cc/1998/1854_1854_1854/fr.\u00a0\u21a9
Pavel Iakubovskii. Segmentation Models Pytorch. 2019. Publication Title: GitHub repository. URL: https://github.com/qubvel/segmentation_models.pytorch.\u00a0\u21a9
Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. Convolutional neural networks: an overview and application in radiology. Insights into Imaging, 9(4):611\u2013629, August 2018. Number: 4 Publisher: SpringerOpen. URL: https://insightsimaging.springeropen.com/articles/10.1007/s13244-018-0639-9 (visited on 2024-04-08), doi:10.1007/s13244-018-0639-9.\u00a0\u21a9
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention Mask Transformer for Universal Image Segmentation. June 2022. arXiv:2112.01527 [cs]. URL: http://arxiv.org/abs/2112.01527 (visited on 2024-03-21), doi:10.48550/arXiv.2112.01527.\u00a0\u21a9
Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, and Shi-Min Hu. Attention Mechanisms in Computer Vision: A Survey. Computational Visual Media, 8(3):331\u2013368, September 2022. arXiv:2111.07624 [cs]. URL: http://arxiv.org/abs/2111.07624 (visited on 2024-04-08), doi:10.1007/s41095-022-0271-y.\u00a0\u21a9
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Doll\u00e1r, and Ross Girshick. Segment Anything. April 2023. arXiv:2304.02643 [cs]. URL: http://arxiv.org/abs/2304.02643 (visited on 2024-04-09).\u00a0\u21a9
Unknown. Arealstatistik Schweiz. Erhebung der Bodennutzung und der Bodenbedeckung. (Ausgabe 2019 / 2020). Number 9406112. Bundesamt f\u00fcr Statistik (BFS), Neuch\u00e2tel, September 2019. Backup Publisher: Bundesamt f\u00fcr Statistik (BFS). URL: https://dam-api.bfs.admin.ch/hub/api/dam/assets/9406112/master.\u00a0\u21a9\u21a9
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. March 2022. arXiv:2201.03545 [cs]. URL: http://arxiv.org/abs/2201.03545 (visited on 2024-03-21), doi:10.48550/arXiv.2201.03545.\u00a0\u21a9
Dirk Merkel. Docker: lightweight linux containers for consistent development and deployment. Linux journal, 2014(239):2, 2014.\u00a0\u21a9
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. December 2019. arXiv:1912.01703 [cs, stat]. URL: http://arxiv.org/abs/1912.01703 (visited on 2024-04-02), doi:10.48550/arXiv.1912.01703.\u00a0\u21a9
MMSegmentation Contributors. MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. 2020. URL: https://github.com/open-mmlab/mmsegmentation.\u00a0\u21a9\u21a9
Abdul Mueed Hafiz, Shabir Ahmad Parah, and Rouf Ul Alam Bhat. Attention mechanisms and deep learning for machine vision: A survey of the state of the art. June 2021. arXiv:2106.07550 [cs]. URL: http://arxiv.org/abs/2106.07550 (visited on 2024-04-09).\u00a0\u21a9
Adrian Meyer (FHNW) - Contributions to Background & Agricultural Law: Pascal Salath\u00e9 (FHNW)
Proposed by the Canton of Thurgau - PROJ-TGOBJ March 2021 to June 2021 - Published on July 7, 2021
Abstract: The Cultivable agricultural area layer (\"LN, Landwirtschaftliche Nutzfl\u00e4che\") is a GIS vector product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of direct subsidy contributions to farms. The canton of Thurgau requested a spatial vector layer indicating locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area, since areas used for silage bale storage are not eligible for subsidies. Having detections of such objects readily available greatly reduces the workload of the responsible official by directing the monitoring process to the relevant hotspots. Ultimately, economic damage to the public, which would result from the payout of unjustified subsidy contributions, can be prevented.
"},{"location":"PROJ-TGLN/#1-introduction","title":"1 Introduction","text":""},{"location":"PROJ-TGLN/#11-background","title":"1.1 Background","text":"Switzerland's direct payment system is the basis for sustainable, market-oriented agriculture. The federal government supports local farms in the form of various types of contributions and enables farming families to claim an adequate income. (cf. Art. 104 BV)
In the years 2014-2017 a new agricultural policy system was introduced in Switzerland. With specialized direct payment subsidies named \u00abLandscape Quality Contributions\u00bb (\u00abLQ\u00bb, Landschaftsqualit\u00e4tsbeitr\u00e4ge in German, Contributions \u00e0 la qualit\u00e9 du paysage in French) farms and agricultural businesses can be awarded for complying with measures that aim at increasing biodiversity and maintaining extensively cultivated open grasslands.
Subsidies are calculated by area and the agricultural offices of the respective cantonal administration have to constantly monitor the landscape status as well as the compliance of the business operations in order to approve the requested amounts. Only certain land usage profiles are eligible for subsidies payment.
According to Art. 104 \u00a71 BV, the agricultural sector, for its part, has to make a substantial, decisive contribution to:
In order to be able to claim direct payments, farms are subject to various conditions. The Cultivable agricultural area layer (\u00abLN\u00bb, from German Landwirtschaftliche Nutzfl\u00e4che) is a GIS product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of contributions. (cf. Art. 35 DZV).
The registration and adjustment of the LN is part of the periodic update (\u00abPNF\u00bb, Periodische Nachf\u00fchrung) within the framework of the official cadastral survey (\u00abAV\u00bb, Amtliche Vermessung) and is usually carried out every 6 years (Gamma 2021). Its correct determination is of immense importance, because if the LN area derived from the cadastral survey data deviates from the actual conditions on site, incorrect contribution amounts may be paid out (swisstopo/BLW/BUWAL 2000).
Farm areas that are not eligible for contributions, in particular areas that are not usable for effective agriculture such as farmyards or storage areas (e.g. for silage hay bales), are constantly changing due to the high degree of mechanization in agriculture and often fall within the perimeter of the LN. The tracking of these areas with conventional surveying such as repeated field visits or the visual interpretation of current aerial imagery proves to be very time-consuming and costly. This use case project explores possible alternative approaches.
Artificial neural networks based on Deep Learning (DL) have been used for automated detection and classification of image features for quite some time. Reliable detection from aerial imagery using applications of DL would enable cost-effective detection of ineligible areas and provide added value to agricultural offices in all cantons.
The Swiss Territorial Data Lab (STDL) is a project of co-creation and a space of experimentation which aims to solve concrete problems of public administrations by using data science applied to geodata. These characteristics make it the perfect environment to conduct this project. Research in the agricultural domain had already been led by the project's partners at Fachhochschule Nordwestschweiz (FHNW) using machine learning. Furthermore, students are regularly involved in these projects, for example to automatically define the agricultural cultivation boundaries in collaboration with the Canton of Thurgau.
"},{"location":"PROJ-TGLN/#12-silage-bales","title":"1.2 Silage Bales","text":"Photo of wrapped and stacked silage hay bales (Source Wikimedia).
One of several features of interest specifically excluded from the subsidized cultivable LN area are silage hay bales. These bales are processed and compacted fermenting grass cuttings wrapped in plastic foil. They often measure roughly 1 - 2 cubic meters in volume and weigh in at around 900 kg. They are mainly used as animal food during winter when no fresh hay is available. Farmers are encouraged to compactly (\u00abdiscretely\u00bb) stack them in regular piles at a few locations rather than keeping them in scattered collections consuming large areas.
The agricultural office can assess the silage bale stack locations and sizes in order to approve the application for subsidies, since areas where silage bales are stored do not count towards the cultivable LN area. Farmers can specify the areas for which they must not receive contributions in a specialized webGIS system by digitizing them manually with the attribute \u00abcode 898\u00bb. For validation purposes, specialists manually evaluate aerial imagery and conduct field visits. The process of aerial imagery evaluation is arduous and monotonous and could therefore greatly profit from automation.
The agricultural office of the Canton of Thurgau (LWA) requested a spatial vector layer indicating locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area. The delivered dataset should be compatible with their webGIS workflow and should be made available with new acquisitions of aerial imaging campaigns. Having such detections readily available would reduce the workload of the responsible official by directing the monitoring to the relevant hotspots. Ultimately, economic damage to the public, which would result from the payout of unjustified subsidy contributions, can be prevented. This project therefore aims at the development of an efficient silage bale detection algorithm which offers highly accurate performance and can be quickly deployed over imaged areas as large as the complete canton of Thurgau (approx. 992 km\u00b2).
"},{"location":"PROJ-TGLN/#2-method","title":"2 Method","text":""},{"location":"PROJ-TGLN/#21-overview","title":"2.1 Overview","text":"Sileage bale stacks are clearly visible on the newest 2019 layer of the 10cm Swissimage orthophoto provided by Swisstopo. A few hundred of these stacks were manually digitized as vector polygons with QGIS in a semi-automatic approach.
Following the structure of the STDL Object Detection Framework, an Area of Interest (AoI) was defined (most of the cantonal area of Thurgau) and tiled into smaller quadratic images (tiles). Tiles overlapping an annotation were subsequently fed to a neural object detection network for training in a process known as Transfer Learning. A random portion of the dataset was kept aside from the training process in order to allow an unbiased evaluation of the detector performance.
Multiple iterations were performed in order to find near-optimal input parameters such as tile size, zoom level, or network- and training-specific variables termed \u00abhyperparameters\u00bb. All detector models were evaluated for their prediction performance on the reserved test dataset. The best model was chosen by means of its optimal overall performance.
This model was used in turn to perform a prediction operation (\u00abInference\u00bb) on all tiles comprising the AoI \u2013 thereby detecting silage hay bale stacks over the whole canton of Thurgau.
Postprocessing included filtering the resulting polygons by a high confidence score threshold provided by the detector for each detection, in order to reduce the risk of false positive results (misidentification of an object as a silage bale stack). Subsequently, adjacent polygons on separate tiles were merged by standard vector operations. A spatial intersection with the known LN layer was performed to identify the specific areas occupied by silage stacks which should not receive contributions but potentially did in last year's rolling payout. Only stacks covering more than 50 m\u00b2 of LN area are considered \u00abrelevant\u00bb for the final delivery, which translates to the equivalent of max. 10 CHF subsidy payment difference. For completeness, all LN-intersecting polygons of detections covering at least 20 m\u00b2 are included in the final delivery. Filtering can be undertaken easily on the end user side by sorting the features along a precalculated area column.
"},{"location":"PROJ-TGLN/#22-aerial-imagery","title":"2.2 Aerial Imagery","text":"The prototypical implementation uses the publically available Swissimage dataset. It was last flown for Thurgau in spring 2019 and offers a maximum spatial resolution of 10cm GSD (Ground Sampling Distance) at 3 year intervals. As the direct subsidies are paid out yearly the periodicity of Swissimage in theory is insufficient for annual use. In this case the high quality imagery on the one hand can serve as a proof of concept though. On the other hand the cantons have the option to order own flight campaigns to increase the periodicity of available aerial imagery if sufficient need can shown from several relevant administrative stakeholders. For our approach aerial images need to be downloaded as small quadratic subsamples of the orthomosaic called \u00abtiles\u00bb to be used in the Deep Learning process. The used tiling grid system follows the slippy map standard with an edge length of 256 pixels and a zoom level system which is derived from a quadaratic division on a mercator-projected world map (whole world equals zoom level = 0). A zoom level = 18 in this system would roughly equal to a ground sampling distance (GSD) of 60 cm.
"},{"location":"PROJ-TGLN/#23-labels-annotations","title":"2.3 Labels / Annotations","text":"As no conducive vector dataset for silage bale locations exists in Thurgau or other sources known at this point, the annotations for this use case had to be created manually by the data scientists at STDL. A specific labeling strategy to obtain such a dataset was therefore implemented.
Using Swissimage 10cm as a WMS-bound basemap in QGIS, a few rural areas throughout the canton of Thurgau were selected and initially approximately 200 stacks of silage bales were manually digitized as polygons. Clearly disjunct stacks were digitized as two separate polygons. For partially visible stacks, only the visible parts were included. Loose collections of bales were merged into one common polygon if the distances between the single bales did not exceed the diameter of a single bale. Ground imprints where silage bales were previously stored were not included. Shadows on the ground were also not part of the polygon. Plastic membrane remnants were not included unless they seemed to cover additional bales. Most bales were of circular shape with an approximate diameter of 1.2 \u2013 1.5 m, but smaller rectangular ones were also common. Colours ranged from the predominant white or green tints, through the still common dark green or grey, to more exotic variants such as pink, light blue and yellow (the latter three are related to a specific cancer awareness program).
Image: Example of the annotation rules.
With these initial 200 annotations a preliminary detector was trained on a relatively high zoom level (18, 60cm GSD, tiling grid at about 150m) and predictions were generated over the whole cantonal area (See section \u00abTraining\u00bb for details). Subsequently, the 300 highest scoring new predictions (all above 99.5%) were checked visually in QGIS, precisely corrected and then transferred into the training dataset.
Image: Example of label annotations manually drawn (left and top), as well as semiautomatically generated (right) \u2013 the pixel structure of the detector is visible in the label.
All tiles containing labels were checked visually again at full zoom and missing labels were created manually. The resulting annotation dataset consists of approximately 700 silage bale stacks.
Image: Positions of the Silage Bale Labels (red) within the borders of Thurgau.
"},{"location":"PROJ-TGLN/#24-training","title":"2.4 Training","text":"Training of the model was performed with the STDL Object Detection Framework. The technology is based on a Mask RCNN architecture implemented with the High-Level API Detectron2 and the Deep Learning framework Pytorch. Parallelisation is achieved with CUDA-enabled GPUs on the High-Performance Computing cluster at the FHNW server facility in Muttenz. The Mask RCNN Backbone is formed by a ResNet-50 implementation and is accompanied by a Feature Pyramid Network (FPN). This combination of code elements results in a neural network leveraging more than 40 Mio. parameters. The dataset consists of RGB images and feature regions represented by pixel masks superimposing the imagery in the shape of the silage bale stack vectors.
Training is performed iteratively by presenting subsets of the tiled dataset to modify \u00abedge weights\u00bb in the network graph. Progress is measured step by step by statistically minimizing the loss functions. Only tiles containing masks (labels) can be used for training. Two smaller subsets of all label-containing tiles are withheld from the training set (TRN), so that 70% of the trainable tiles are presented to the network for loss minimization. The validation set (VAL, 15%) and the test set (TST, 15%) also contain labels but are statistically independent from the TRN set. The VAL set is used to perform recurrent evaluation during training. Training can be stopped once the loss function on the validation set has reached a minimum, since after that point further training would push the model into an overfitting scenario. The TST set serves as an unbiased reserve to evaluate the detector performance on previously \u00abunseen\u00bb, but labelled data. Tiles not containing any label were assigned to a separate class called \u00abother\u00bb (OTH). This dataset was only used for generating predictions.
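A sketch of the random 70/15/15 split described above; tile_ids stands for the identifiers of the label-containing tiles:

```python
import random

def split_tiles(tile_ids, seed=42):
    """Randomly split the label-containing tiles into TRN/VAL/TST (70/15/15)."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_trn, n_val = int(0.7 * n), int(0.15 * n)
    return {
        "trn": ids[:n_trn],               # used to fit the network weights
        "val": ids[n_trn:n_trn + n_val],  # monitored during training
        "tst": ids[n_trn + n_val:],       # held back for unbiased evaluation
    }
```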
Image: Dataset Split \u2013 Grey tiles are only used in prediction (OTH); they do not contain any labels during training. The colourful tiles contain labels, but are scattered relatively sparsely. Green tiles are used for training the model weights (TRN); orange tiles validate the learning progress during training to avoid overfitting (VAL) and blue tiles are reserved for unbiased post-training evaluation (TST).
Multiple training runs were performed not only to optimize the network-specific variables called \u00abhyper-parameters\u00bb (such as batch size, learning rate or momentum), but also to test which zoom level (spatial resolution) would yield the best results.
"},{"location":"PROJ-TGLN/#25-prediction-and-assessment","title":"2.5 Prediction and Assessment","text":"For the TRN, VAL and TST subset, confusion matrix counts and classification metrics calculations can be performed since they offer a comparison with the digitized \u00abground truth\u00bb. For all subsets (including the rest of the canton as OTH), predictions are generated as vectors covering those areas of a tile that the detector algorithm identifies as target objects and therefore attributes a confidence score.
In case of the label-containing tiles, the overlap between the predictions and the labels can be checked. If an overlap is found between a label and a prediction, the detection is considered a \u00abTrue Positive\u00bb (TP). If the detector missed a label entirely, this label is counted as a \u00abFalse Negative\u00bb (FN). If the detector predicted a silage bale stack that was not present in the labelled data, it is considered a \u00abFalse Positive\u00bb (FP). On the unlabelled OTH tiles, all detections are therefore by definition considered FP.
The counting of TPs, FPs and FNs on the TST subset allows the calculation of standard metrics such as precision (user accuracy), recall (producer accuracy) and F1 score (a common overall performance metric calculated as the harmonic mean of precision and recall). The counts, as well as the metrics, can be plotted as a function of the minimum confidence score threshold (THR), which can be set to an acceptable percentage for a certain detection task. A low threshold should generally yield fewer FN errors, while a high threshold should yield fewer FP detections.
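A simplified sketch of how these metrics can be derived from the counts as a function of the threshold (it assumes at most one detection per label):

```python
def metrics_at_threshold(detections, n_labels, thr):
    """Compute precision, recall and F1 on a labelled subset.

    detections: list of (confidence_score, matches_a_label) tuples;
    n_labels: total number of ground truth labels in the subset.
    """
    kept = [hit for score, hit in detections if score >= thr]
    tp = sum(kept)       # kept detections overlapping a label
    fp = len(kept) - tp  # kept detections without a matching label
    fn = n_labels - tp   # labels missed at this threshold
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```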
The best performing model in terms of maximum F1 score was used to perform a prediction run over the entire cantonal surface area.
"},{"location":"PROJ-TGLN/#26-post-processing","title":"2.6 Post-Processing","text":"In order to obtain a consistent result dataset, detections need to be postprocessed. Firstly, the confidence score threshold operation is applied. Here, a comparatively high threshold can be used for this operation. \u00abMissing\u00bb the detection of a silage bale stack (FN) is not as costly for the analysis of the resulting dataset at the agricultural office as analyzing large numbers of FP detections would be. Also missing single individual silage bales is much less problematic than missing whole large stacks. These larger stacks are typically attributed with high confidence scores though and are therefore less likely to be missed.
In some cases, silage bale stacks cross the tiling grid and are therefore detected on multiple images. This results in edge artifacts along the tile boundaries, cutting through detections that should be unified. For this reason, adjacent detection polygons need to be merged into a single polygon. This is achieved by first buffering all detections with a 1.5m radius (about the diameter of a single bale). Then all touching polygons are dissolved into a single feature. Afterwards, negative buffering with a -1.5m radius is applied to restore the original boundary. This process also leads to edge smoothing by planing the pixel-step-derived vector boundary into curves.
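The buffer-dissolve-shrink operation can be sketched with Shapely as follows, assuming the detections are available as Shapely polygons in a metric coordinate system:

```python
from shapely.ops import unary_union

def merge_adjacent(polygons, radius=1.5):
    """Merge detections that touch across tile boundaries: buffer by ~one
    bale diameter, dissolve touching polygons, then shrink back."""
    grown = [p.buffer(radius) for p in polygons]
    dissolved = unary_union(grown)        # touching polygons become one feature
    restored = dissolved.buffer(-radius)  # restore the approximate boundary
    # restored may be a Polygon or a MultiPolygon; return a flat list of parts
    return list(getattr(restored, "geoms", [restored]))
```

As a side effect, the round trip through positive and negative buffering also smooths the pixel-step outlines into curves, as described above.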
Image: Example of adjacent detection polygons that need to be unified (buffer dissolved).
Curve polygons contain a high number of vertex points, which is why a simplification operation can be performed afterwards. The intersection with the LN layer required a preparation of that dataset. First, the perimeters of all LN polygons in Thurgau, stemming from the cadastre, were intersected with the layer \"LN difference\". Areas which carried the attribute \"No LN\" in the difference layer were removed; areas with the attribute \"LN\" or \"To be checked\" were kept or, if not yet present, added to the LN dataset. Areas excluded from the subsidy by the farmers themselves (so-called \"layer code 898\") were removed from the LN polygons. The silage bale detections were then intersected (clipped) with all remaining LN areas such that only those portions of the detections remained that lie within the LN perimeter. For all these remaining detection polygons, the area is calculated and added as an attribute to the polygon. With a threshold operation, all silage bale stacks with an area below 20 m\u00b2 are filtered out of the dataset in order to provide only economically relevant detections.
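In GeoPandas terms, the final intersection and area filter amount to roughly the following; the layer names are hypothetical:

```python
import geopandas as gpd

detections = gpd.read_file("silage_detections.gpkg")  # hypothetical inputs
ln = gpd.read_file("ln_prepared.gpkg")                # cleaned LN perimeter

# Keep only the parts of each detection that lie within the LN perimeter.
clipped = gpd.overlay(detections, ln, how="intersection")

# Store the remaining area and drop economically irrelevant stacks (< 20 m²).
clipped["area_m2"] = clipped.geometry.area  # assumes a metric CRS, e.g. LV95
relevant = clipped[clipped["area_m2"] >= 20]
relevant.to_file("silage_on_ln.gpkg", driver="GPKG")
```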
"},{"location":"PROJ-TGLN/#3-results","title":"3 Results","text":""},{"location":"PROJ-TGLN/#31-metrics-and-evaluation","title":"3.1 Metrics and Evaluation","text":"Figure: Performance of serveral detectors depending on zoom level (ground sampling distance) as measured by their maximum F1-Score.
The model trained with tiles at zoom level = 19 (every pixel approx. 30cm GSD) showed the highest performance with a maximum F1 Score of 92.3%. Increasing the resolution even further by using 15 cm/px GSD did not result in a gain in overall detection performance while drastically increasing storage needs and computational load.
Figure: Confusion matrix counts on the TST dataset as a function of the minimum confidence score threshold.
The detector model performs very well on the independent TST dataset, detecting the largest portion of silage bale stacks at any given confidence threshold. The number of FP detections reaches very low counts towards the higher end of the threshold range.
Figure: Performance metrics on the TST dataset as a function of the minimum confidence score threshold.
Precision, recall and F1 score all remain at very high values throughout the threshold range. The F1 score plateaus above 90% for thresholds between 5% and 93%, essentially allowing any threshold value to be chosen to adapt the model performance to the end user's needs.
For delivery of the dataset, the detector was subsequently used at a threshold of 96%. At this value, 809 silage bale stacks were rediscovered in the TRN, TST and VAL subsets. Just 10 FP detections were found in these subsets. 97 silage bale stacks were not rediscovered (FN). Hence, the model precision (user accuracy) amounts to approx. 99% and the recall (hit rate, producer accuracy) to approx. 89%.
The applied model detected a total of 2\u2019473 additional silage bale stacks over the rest of the canton of Thurgau (FP on OTH).
"},{"location":"PROJ-TGLN/#32-examples","title":"3.2 Examples","text":"Image: Raw detections (yellow) of silage bale stacks displaying very high confidence scores.
Image: Raw detections (yellow) and postprocessed detections (red) \u2013 the area occupied by these silage bale stacks does not intersect with the Cultivable land (LN, green hatched). Direct subsidies are correctly paid out in this case.
"},{"location":"PROJ-TGLN/#33-relevant-features-for-delivery","title":"3.3 Relevant Features for Delivery","text":"In total, 288 silage bale stack sections are placed within the subsidized LN area and exhibit an area consumption larger than 20m\u00b2. 87 silage bale stacks consume more than 50m\u00b2, 24 stacks consume more than 100m\u00b2. One has to keep in mind that many stacks only partially intersect with the LN layer. The overlap between all detected silage bale stacks over 20m\u00b2 and the LN layer amounts to 14\u2019200m\u00b2 or an estimated damage between CHF 1'420.- and CHF 2'840.- (assuming the subsidy payout ranges between CHF 10.- and CHF 20.- per 100m\u00b2). Considering only the overlap of the 87 largest stacks with the LN layer the area consumption amounts to 7\u2019900m\u00b2 or a damage between CHF 790.- and CHF 1'580.-.
Image: Undeclared silage bale stack (red and yellow) that intersects with the cultivable land layer \u00abLN\u00bb (green).
Image: The left side silage bale stack (red) is only touching the LN area (green). The center bottom silage bale stack is completely undeclared within the LN area.
Image: Approximately half of the center silage bale stack (red) is undeclared and situated within the LN area.
Image: This farm self-declared almost all areas needed (blue) for silage bales (red) to be excluded from direct subsidy areas (green). Pink areas are already pre-excluded by the agricultural office.
Image: The intersection between the silage bale stack (red) and the LN area (green) is so minute that it should not be found within the delivery dataset to the agricultural office.
Image: Small silage bale stacks in the very left and very right of the image (yellow) are undeclared but each detection falls below the relevance threshold.
"},{"location":"PROJ-TGLN/#4-discussion","title":"4 Discussion","text":""},{"location":"PROJ-TGLN/#41-feedback-by-the-agricultural-office","title":"4.1 Feedback by the Agricultural Office","text":"The contact person at the agricultural office, Mr. T. Froehlich describes the detections as very accurate with a very low percentage of wrong detections. As a GIS product the detections layer can be used in the standard workflow in order to cross-check base datasets or to perform updates and corrections.
In economic terms, the damage from misplaced silage bale stacks in the LN areas is not negligible but also not extremely relevant. Federal annual direct agricultural subsidies of approx. 110 Mio. CHF stand in stark contrast to the estimated economic damage of perhaps CHF 2'000.- that misplaced silage bales might have caused for the Canton of Thurgau in 2019.
Most farmers adhere to the policies, and false declaration of areas followed by sanctions is extremely rare. Silage bales are therefore not the first priority when monitoring advancements and updates concerning the LN layer. Nevertheless, these new detections allow the end users at the agricultural office to direct their eyes more quickly at relevant hotspots and spare them some aspects of the long and tedious manual search that was performed in the past.
"},{"location":"PROJ-TGLN/#42-outlook","title":"4.2 Outlook","text":"Silage bales are by far not the only object limiting the extent of the cultivable subsidized land. A much larger area is consumed by farm yards \u2013 heterogenous spaces around the central farm buildings. Monitoring the growth of these spaces into the LN layer would greatly diminuish the manual workload at the agricultural office. As these spaces might also be detectable by a similar approach, this project will continue to investigate the potential of the STDL Object Detection Framework now into this direction.
"},{"location":"PROJ-TGLN/#references","title":"References","text":"Federal Office of Topography swisstopo (2020). SWISSIMAGE 10 cm - The Digital Color Orthophotomosaic of Switzerland. https://www.swisstopo.admin.ch/en/geodata/images/ortho/swissimage10.html
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448). https://openaccess.thecvf.com/content_iccv_2015/html/Girshick_Fast_R-CNN_ICCV_2015_paper.html
He, K., Gkioxari, G., Doll\u00e1r, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969). https://arxiv.org/abs/1703.06870
OpenStreetMap Foundation (2021). Slippy Map. https://wiki.openstreetmap.org/wiki/Slippy_Map
QGIS.org (2021). QGIS Geographic Information System. QGIS Association. https://qgis.org/en/site/
Wu, Y., Kirillov, A., Massa, F., Lo, W. Y., & Girshick, R. (2019). Detectron2. https://github.com/facebookresearch/detectron2
Adrian Meyer (FHNW) - Alessandro Cerioni (Canton of Geneva)
Proposed by the Canton of Thurgau - PROJ-TGPOOL January 2021 to April 2021 - Published on April 21, 2021
Abstract: The Canton of Thurgau entrusted the STDL with the task of producing swimming pool detections over the cantonal area. Specifically interesting was to leverage the ground truth annotation data from the Canton of Geneva to generate a predictive model in Thurgau while using the publicly available SWISSIMAGE aerial imagery datasets provided by swisstopo. The STDL object detection framework produced highly accurate predictions of swimming pools in Thurgau and thereby proved transferability from one canton to another without having to manually redigitize annotations. These promising detections showcase the highly useful potential of this approach by greatly reducing the need of repetitive manual labour.
"},{"location":"PROJ-TGPOOL/#introduction","title":"Introduction","text":"Until February 2021 the Swiss Territorial Data Lab developed an approach based on Mask RCNN Deep Learning algorithms for the detection of objects on aerial images, with swimming pools serving as a demonstration object. The official cadastres of the Canton of Thurgau include \u2013 among many other objects \u2013 the registration of larger private swimming pools that are permanently anchored in the ground.
The challenge is to keep the cadastre up to date on a regular basis, which is usually done manually by surveying or verification with aerial imagery. Because the Canton of Thurgau (unlike the Canton of Geneva) does not maintain its own specific register of swimming pools, this study primarily serves as a technology demonstration.
A secondary goal encompasses detailed knowledge transfer from the data scientist team at the STDL to the cantonal authorities, such as providing insight into the performance metrics and guidance for their interpretation, and raising awareness for the prerequisites of the detector framework.
"},{"location":"PROJ-TGPOOL/#methodology","title":"Methodology","text":""},{"location":"PROJ-TGPOOL/#process-overview","title":"Process Overview","text":"Generating a Model from Cadastral Vectors and Aerial Images to Predict Objects in the Same or a New Area of Interest (AoI).
The STDL object detection framework is based on a bipartite approach of training and inference. This means that a predictive model is statistically adapted to known and verified data (\"training\") in order to then generate classification predictions on new, unknown data (\"inference\"). To achieve this we resample large high-resolution orthophoto mosaics by decomposing them into small square image tiles on which vectorized annotations of swimming pools are drawn.
Verified vector annotation data (\"ground truth\") for the training process was available for the cantonal area of Geneva, as well as for a smaller part of the cantonal area of Neuch\u00e2tel covering a total of almost 5'000 swimming pools present in 2019.
The predictive model used is a convolutional neural network developed for computer vision (Mask RCNN). It was trained on a high performance computing cluster at the University of Applied Sciences Northwestern Switzerland FHNW using the open source Detectron2 object detection library.
During inference, pixel-precise vector contours (\u201csegments\u201d) are produced over the tiled imagery of the canton of Thurgau. Each segment is attributed a confidence score which indicates the certainty of the detections when applied to new data. Using this score as a threshold level, performance metrics are computed in post-classification assessment.
"},{"location":"PROJ-TGPOOL/#ground-truth-dataset","title":"Ground Truth Dataset","text":"Label annotations are derived from cadastral data and manually curated
Vector ground truth annotations demarcating private swimming pools were available at two locations: A near-complete coverage of the cantonal area of Geneva which contains 4\u2019652 known objects, as well as a smaller subsection of the cantonal area of Neuchatel which contains 227 known objects. Label annotations in both cases are derived from cadastral surface vector datasets and then manually curated/verified. In case of the Geneva dataset the manual verification was performed by STDL data scientists in a previous study; in case of the Neuchatel dataset the manual verification was performed by the local cadastre experts.
"},{"location":"PROJ-TGPOOL/#reference-data-and-area-of-interest","title":"Reference Data and Area of Interest","text":"Approximately 5000 cross checked swimming pool annotations are available as vectorized shapes in the Cantons of Geneva and partially in Neuch\u00e2tel. They are compatible with orthophotos from 2018/19 such as the latest SWISSIMAGE 10cm layer.
The Area of Interest (AoI) for all tests conducted in this study are divided into two main sections:
Those areas in Geneva and Neuchatel containing vectorized ground truth labels are used as \u201cTraining AoI\u201d.
The cantonal area of Thurgau is used as \u201cPrediction AoI\u201d.
Only those parts of the cantonal surface of Thurgau are used as Prediction AoI which are designated as relevant settlement areas. For this purpose the STDL has received two additional reference datasets from the canton of Thurgau:
Vector layer: List of all water basins from the official survey; 3'131 objects.
Vector layer: Settlement areas / construction zones to delimit the study area.
2'895 objects from the water basin layer are located wholly or partially within the \u201cPrediction AoI\u201d. Only these objects were used for analysis (see Figure 4, light green objects). For each grid square, an image file with 256x256 pixels edge length and 60cm GSD was generated by WMS. Metadata and georeferencing were stored in an associated JSON. A quick qualitative review of the Thurgau datasets in QGIS revealed two limitations.
About 7.5% of the water basins are not located in the selected settlement area (e.g., on remote farmsteads or mixed industrial/commercial zones), so no detection attempt was initially undertaken for areas encompassing these objects. It is important to note that there are some objects in the water basin layer that are not comparable to private swimming pools in shape or size, such as large public swimming pools, but also sewage treatment plants, silos, tanks, reservoirs, or retention dams. By limiting the Prediction AoI to residential areas and adjacent land, the largest portion of these objects could be excluded.
Example of a water treatment plant that appears in the \u201cwater basin layer\u201d and had to be excluded by limiting the \u201cPrediction AoI\u201d to residential and adjacent areas.
To additionally calculate metrics on the quality of this reference dataset vs. the quality of the detections a small area over the city of Frauenfeld (Thurgau) containing approximately 100 swimming pools was manually curated and verified by the STDL data scientists.
"},{"location":"PROJ-TGPOOL/#orthocorrected-imagery","title":"Orthocorrected Imagery","text":"Orthoimagery tiles of 150m/256px edge length containing labelled annotations
Both AoIs are split by a regular checkerboard segmentation grid into squares (\u201ctiles\u201d), making use of the \u201cSlippy Map Tiles\u201d quadtree-style system. The image data used here was tested with different zoom level resampling resolutions (Ground Sampling Distance, GSD) between 30 cm and 480 cm edge length per pixel while maintaining a consistent extent of 256x256 pixels. Queries of the imagery were undertaken via public web map services using common protocols such as WMS or the MIL standard.
Three separate imagery sources were used over the course of the study. The 10cm GSD RGB orthophotomosaic layer SWISSIMAGE of Swisstopo was the primary target of investigation as it was used as the basis of prediction generation over the cantonal area of Thurgau. SWISSIMAGE was also used as the imagery basis for most of the training test runs over the ground truth areas of Geneva and Neuchatel. Additionally, a model was trained leveraging combined cantonal orthophoto imagery from Geneva (SITG) and Neuchatel (SITN) to comparatively test the prediction performance of such a model on the unrelated SWISSIMAGE inference dataset in Thurgau.
As the STDL\u2019s previous work had shown that the usage of tiles exhibiting a GSD of ~60cm/Px (tile zoom level 18) offers a decent tradeoff between reaching high accuracies during training and keeping the computational effort manageable, this approach was used for the test with the cantons' own imagery of Geneva and Neuchatel.
Using SWISSIMAGE for training, zoom levels in a range between 15 (~480 cm/Px) and 19 (~30 cm/Px) were tested.
"},{"location":"PROJ-TGPOOL/#training","title":"Training","text":""},{"location":"PROJ-TGPOOL/#transfer-learning","title":"Transfer Learning","text":"The choice of a relevant predictive approach fell on a \u201cCOCO-pretrained\u201d deep learning model of the type \"ResNet 50 FPN\" structured in a \u201cMask-RCNN\u201d architecture and implemented with Python and the Detectron2 API. In a transfer learning process about 44 million trainable statistical parameters are adapted (\u201cfinetuned\u201d) as edge weights in a pretrained neural network graph through a number of iterations trying to minimize the value of the so-called \u201closs function\u201d (which is a primary measure for inaccuracy in classification).
Transfer learning is common practice with deep learning models: the knowledge acquired from massive datasets allows the model to be adapted to smaller, new datasets.
Training is performed through highly parallelized GPU execution of the necessary tensor/matrix operations to shorten training time. For this purpose, the vector annotations are converted into pixel-wise binary masks aligned with the respective input images.
Network- or training-specific pre-set variables (\u201chyperparameters\u201d) such as learning rate, learning rate decay, optimizer momentum, batch size or weight decay were either used in their standard configuration or manually tuned in an iterative fashion until comparatively high accuracies (e.g. by means of the F1 score) could be reached. More systematic approaches such as hyperparameter grid search or advanced (e.g. Bayesian) optimization strategies could be implemented in follow-up studies. A minimal configuration sketch is given below.
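The following minimal sketch illustrates how such a COCO-pretrained Mask R-CNN ResNet-50 FPN model is typically set up for finetuning with the Detectron2 API. The dataset names are hypothetical placeholders (they would have to be registered beforehand via Detectron2's DatasetCatalog), and the hyperparameter values shown are illustrative, not the ones used in this study.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# COCO-pretrained Mask R-CNN with a ResNet-50 FPN backbone
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1       # single "swimming pool" class
cfg.DATASETS.TRAIN = ("pools_trn",)       # hypothetical registered datasets
cfg.DATASETS.TEST = ("pools_val",)
# hyperparameters: defaults or manual iterative tuning, as described above
cfg.SOLVER.IMS_PER_BATCH = 4              # batch size (illustrative value)
cfg.SOLVER.BASE_LR = 0.00025              # learning rate (illustrative value)
cfg.SOLVER.MOMENTUM = 0.9                 # optimizer momentum
cfg.SOLVER.WEIGHT_DECAY = 1e-4            # weight decay

trainer = DefaultTrainer(cfg)             # finetunes the ~44 M parameters
trainer.resume_or_load(resume=False)
trainer.train()
```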
"},{"location":"PROJ-TGPOOL/#dataset-split","title":"Dataset Split","text":"Tiles falling into the \u201cTraining AoI\u201d but not exhibiting any intersecting area with the Ground Truth Labels are discarded. The remaining ground truth tile datasets are randomly sampled into three disjunct subsets:
The \u201cTraining Subset\u201d consists of 70% of the ground truth tiles and is used to change the network graph edge weights.
The \u201cValidation Subset\u201d consists of another 15% of the ground truth tiles and is used to validate the generalization performance of the network during training. The iteration cycling is stopped when the loss on the validation dataset is minimized.
The \u201cTest Subset\u201d consists of the last 15% of the ground truth tiles and is entirely withheld from the training process, to allow for an independent and unbiased assessment during post-processing.
Subdivision of Ground Truth Datasets
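A minimal sketch of such a 70/15/15 random split is given below; `tile_ids` is a hypothetical list of ground truth tile identifiers.

```python
import random

def split_tiles(tile_ids: list, seed: int = 42):
    """Randomly sample ground truth tiles into three disjoint subsets:
    70% training, 15% validation, 15% test."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)      # reproducible in-place shuffle
    n_trn = int(0.70 * len(ids))
    n_val = int(0.15 * len(ids))
    return (ids[:n_trn],                  # training subset
            ids[n_trn:n_trn + n_val],     # validation subset
            ids[n_trn + n_val:])          # test subset
```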
"},{"location":"PROJ-TGPOOL/#inference-and-assessment","title":"Inference and Assessment","text":"After training, tile by tile the entire \u201cPrediction AoI\u201d as well as the ground truth datasets presented to the final model for prediction generation. From a minimum confidence threshold up to 100% the model produces a segmentation mask for each swimming pool detection delimiting its proposed outer boundary. This boundary can be vectorized and transformed back from image space into map coordinates during post-processing. Through this process we can accumulate a consistent GIS-compatible vector layer for visualization, counting and further analysis.
In the case of the ground truth data, the resulting vector layer can be intersected with the original input data (especially the \u201cTest Subset\u201d) to obtain unbiased model performance metrics. In the case of a well-performing model, the resulting vector layer can then be intersected with the \u201cPrediction AoI\u201d-derived Thurgau dataset to identify missing or surplus swimming pools in the cadastre.
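A hedged sketch of the vectorization step, using the rasterio and shapely libraries (not necessarily the exact tooling used in the project): the predicted binary mask of a tile is polygonized and georeferenced with the tile's bounding box.

```python
import numpy as np
import rasterio.features
from rasterio.transform import from_bounds
from shapely.geometry import shape

def mask_to_polygons(binary_mask: np.ndarray, tile_bounds: tuple):
    """Vectorize a predicted segmentation mask and transform it from
    image space back into map coordinates.

    tile_bounds: (west, south, east, north) of the tile in map coordinates.
    """
    height, width = binary_mask.shape
    transform = from_bounds(*tile_bounds, width, height)
    return [shape(geom)
            for geom, value in rasterio.features.shapes(
                binary_mask.astype("uint8"), transform=transform)
            if value == 1]  # keep foreground (pool) polygons only
```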
"},{"location":"PROJ-TGPOOL/#results","title":"Results","text":""},{"location":"PROJ-TGPOOL/#metrics-and-model-selection","title":"Metrics and Model Selection","text":"Results of different training runs using SWISSIMAGE depending on the chosen zoom level
The choice of a correct confidence threshold (\"THR\") is of central importance for the interpretation of the results. The division of a dataset into true/false positives/negatives is a function of the confidence threshold. A high threshold means that the model is very confident about a detection; a low threshold means that as few detections as possible are missed, but at the same time more false positive (\"FP\") detections are triggered.
Results of different training runs using SWISSIMAGE depending on the chosen zoom level
There are several standardized metrics to evaluate model performance on unknown data. The most important are \"Precision\" (user accuracy), \"Recall\" (hit rate or producer accuracy) and the \"F1 Score\" (the harmonic mean of the other two). \"Precision\" should increase with a higher THR, while \"Recall\" should decrease. The maximum F1 score can be used as a measure of how well the model performs regardless of the viewing direction (user or producer perspective). A minimal sketch of this threshold sweep follows.
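The sketch below makes these definitions concrete; `detections` is a hypothetical list of (confidence, is_true_positive) pairs and `n_labels` the number of ground truth objects.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall and their harmonic mean (F1 score)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_threshold(detections, n_labels):
    """Sweep the confidence threshold THR and return (THR, max F1)."""
    best = (0.0, 0.0)
    for thr in [t / 100 for t in range(5, 100, 5)]:
        kept = [is_tp for conf, is_tp in detections if conf >= thr]
        tp = sum(kept)                # detections matching a label
        fp = len(kept) - tp           # spurious detections
        fn = n_labels - tp            # labels missed at this threshold
        f1 = precision_recall_f1(tp, fp, fn)[2]
        if f1 > best[1]:
            best = (thr, f1)
    return best
```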
Results of different training runs using SWISSIMAGE depending on the chosen zoom level
Using the cantonal orthomosaics as training input at zoom level 18, the F1 score reached a maximum of 81.0%. Using SWISSIMAGE as training input at zoom level 18, a slightly higher maximum F1 score of 83.4% was achieved, resulting in the choice of a \u201cSWISSIMAGE only\u201d approach for both training and inference.
The best detection in terms of maximum F1 score was reached using tiles at zoom level 19, displaying a GSD of approx. 30 cm/Px. Since the Slippy Map tile system is based on the equal division of squares, increasing the zoom level by one step roughly quadruples the number of tiles presented for analysis. Hence, computational demand grows exponentially with the zoom level, in particular for file system read/write and sequential processing operations.
On the other hand, increasing the zoom level (and thereby refining the GSD) also boosts the visibility and size of the target objects, which in turn increases detection accuracy. The comparatively slight increases in F1 score between zoom levels 17, 18 and 19 suggest an asymptotic behaviour, where using massively larger amounts of computing resources would no longer result in much higher detection accuracy. Zoom level 20 (GSD ~15 cm/Px) was not computed for this reason.
"},{"location":"PROJ-TGPOOL/#true-positives","title":"True Positives","text":"A detection is considered \"True Positive\" (TP) if the algorithm detected a pool that was listed at the same position in the cadastral layer. Setting the threshold very low (THR \u2265 5%), 2'227 of 2\u2019959 swimming pools were detected. This corresponds to a detection proportion of 75% of the recorded water pools. Conversely, this could mean that 25% or 732 objects are False Negatives and therefore \"erroneously\" recorded in the cadastre as swimming pools or missed by the algorithm.
\u201cTrue Positive\u201d detections \u2013 note that cases of empty and covered swimming pools are detected with a very high confidence threshold in this example.
"},{"location":"PROJ-TGPOOL/#false-negatives","title":"False Negatives","text":"FN describe those objects that the algorithm completely failed to detect, no matter what threshold is set. A total of 732 objects were not detected. FN easily occur when there are obvious discrepancies between orthophoto and cadastre - for example, a pool may have been constructed after the time of flight.
The combined number of FN and TP corresponds to the number of analyzed labels from the water basin layer (2\u2019959 objects). Due to the splitting of pools at the segmentation grid boundaries, this value is slightly higher than the 2\u2019895 objects that were in the \u201cPrediction AoI\u201d. Here, only objects larger than 5 m\u00b2 in area were counted, since the segmentation grid cuts some pools into several parts, and tiny residual polygons of only a few pixels in total area might otherwise be counted as FN even though the largest part of the swimming pool was detected (and therefore counted as TP).
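A minimal sketch of this area filter, assuming the label fragments are available as a GeoPandas layer (the file name is hypothetical):

```python
import geopandas as gpd

# label fragments produced by cutting pools at the segmentation grid
fragments = gpd.read_file("water_basin_fragments.shp")  # hypothetical file
# areas are in m2 only in a projected, metric CRS (e.g. Swiss LV95)
fragments = fragments.to_crs(epsg=2056)
# drop tiny residual polygons so they are not counted as FN on their own
fragments = fragments[fragments.geometry.area > 5.0]
```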
\u201cFalse Negatives\u201d \u2013 (Left) An obvious mismatch between the cadastre and the orthophoto, an update should be considered. (Right) An ambiguous swimming pool which might be covered by a white canvas and was therefore missed by the detector.
"},{"location":"PROJ-TGPOOL/#false-positives","title":"False Positives","text":"Swimming pools that were recognized as such in the orthophoto but are not found in the cadastre represent the FP group. If the threshold is set very low (e.g. THR \u2265 5%), a total of 9'427 additional pools would be found in the settlement area. However, this number is not realistic, since most of the detections at such a low threshold do not correspond to pools, but only mark image areas that are related to a pool in a very distant way.
Therefore, to get a better estimation of objects that really represent private pools but are still missing in the cadastre, the choice of a very high threshold is recommended. For example, the geoinformation services of the Canton of Geneva work with a threshold of THR \u2265 97%. Applying this threshold, 271 unrecorded swimming pools remain in the dataset with an extremely high probability of correct redetection (9% of the cadastre).
However, it is still worth looking at the slightly less likely FP detections at a threshold of THR \u2265 90%. Filtering with this value, a total of 672 unregistered swimming pools were found, which would correspond to 23% of the cadastre layer. At the same time, the risk of clear errors by the object detector also increases at lower thresholds, leading to some misclassifications.
\u201cFalse Positive\u201d detections \u2013 (Top) Two clear examples of detected swimming pools that are missing in the cadastre. (Bottom Left) More ambiguous examples of detected swimming pools which might be missing in the cadastre. (Bottom Right) A clear error of the detector misclassifying a photovoltaic installation as a swimming pool.
"},{"location":"PROJ-TGPOOL/#conclusion","title":"Conclusion","text":""},{"location":"PROJ-TGPOOL/#manual-evaluation","title":"Manual Evaluation","text":"In the city of Frauenfeld a sample district was chosen for manual evaluation by a STDL data scientist. Even though this task should ideally be performed by a local expert this analysis does provide some insight on the potential errors currently existing within the cadastre as well as the object detection quality. Within the sampled area a total of 99 identifiable swimming pool objects were found to be present.
Table: Manually evaluated dataset accuracy vs. detector performance comparison. Green indicates the preferred value.
Overall, the STDL detector was more accurate than the provided dataset, with an F1 score of ~90% vs. ~87%. In particular, far fewer swimming pools were missing in the detections (5 FN) than in the cadastre (18 FN). Room for improvement exists with the false positives: our detector identified 16 surplus objects as potential swimming pools, which could be manually disproved, while only 9 surplus objects were found in the cadastre.
"},{"location":"PROJ-TGPOOL/#interpretation","title":"Interpretation","text":"We can conclude that the use of annotation data gathered in another canton of Switzerland allows for highly accurate predictions in Thurgau using the freely and publicly available SWISSIMAGE dataset. We demonstrate that such a transferrable approach can therefore be applied within a relatively short time span to other cantons without the effort of manually digitizing objects in a new area. This is supported by the assumption that SWISSIMAGE is of the same consistent radiometrical and spatial quality we see in Thurgau over the whole country.
Manual evaluation will remain paramount before authorities take legal action or, for example, perform updates and changes to the cadastre. Nevertheless, a great workload reduction can be achieved by redirecting the experts\u2019 eyes to the detected or undetected areas that are worth looking at.
"},{"location":"PROJ-TGPOOL/#references","title":"References","text":"Federal Office of Topography swisstopo (2020). SWISSIMAGE 10 cm - The Digital Color Orthophotomosaic of Switzerland. https://www.swisstopo.admin.ch/en/geodata/images/ortho/swissimage10.html
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448). https://openaccess.thecvf.com/content_iccv_2015/html/Girshick_Fast_R-CNN_ICCV_2015_paper.html
He, K., Gkioxari, G., Doll\u00e1r, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969). https://arxiv.org/abs/1703.06870
OpenStreetMap Foundation (2021). Slippy Map. https://wiki.openstreetmap.org/wiki/Slippy_Map
QGIS.org (2021). QGIS Geographic Information System. QGIS Association. https://qgis.org/en/site/
Wu, Y., Kirillov, A., Massa, F., Lo, W. Y., & Girshick, R. (2019). Detectron2. https://github.com/facebookresearch/detectron2
Nils Hamel (UNIGE) - Huriel Reichel (FHNW) - Supervision: Roxane Pott (swisstopo)
Project in collaboration with Geneva and Neuch\u00e2tel States - TASK-TPNL July 2021 to February 2022 - Published on February 22, 2022
Abstract: The deployment of renewable energy is becoming a major stake in the face of the challenges confronting our societies. This requires authorities and domain experts to promote, and to demonstrate, the deployment of such energy solutions. In the case of thermal panels, politicians ask domain experts to certify, year after year, the amount of deployed surface. Facing such a challenge, this project aims to determine to which extent data science can ease the survey of thermal panel installations and the work of the domain experts.
"},{"location":"PROJ-TPNL/#introduction","title":"Introduction","text":"For authorities, being able to track the deployment of renewable energy is becoming a major challenge in front of stakes of our societies. In addition, following the deployment of installations on territory is difficult, as construction permits are not sufficient evidences. Indeed, the construction permits materialize a will, but the actual deployment and its specifications can differ from paperwork to reality. In case of thermal panels, domain experts are then put in front of a major challenge, as they have to certify of the surface of solar thermal energy that is deployed and active on their territory on a regular basis. This reporting is made for politics that aim to deploy a certain amount of renewable energy, part of territories energetic politic.
Mainly based on paperwork, the current survey of thermal panel deployment is affected by drawbacks. Indeed, it is currently complicated to determine whether a construction permit led to the deployment of a thermal panel installation and whether this installation is still active. The goal of this project is to determine if data science can provide new solutions for the survey of thermal energy production, in order to report more accurate surface values to the politicians.
"},{"location":"PROJ-TPNL/#research-project-specification","title":"Research Project Specification","text":"In this project, the goal is to determine whether it is possible to track down thermal panels installation on territory by using aerial images and deep learning methods. The main axis are :
Train a deep learning model on aerial images to detect thermal panels
Assess the performances of the deep learning model
Determine to which extent it is possible to link the predictions to the existing domain expert database
This research project was made in collaboration with the States of Neuch\u00e2tel and Geneva. Both domain experts face similar challenges and their needs are nearly identical, although their current processes differ. For each collaboration the goals are similar, but the methodology is different: with Neuch\u00e2tel, the domain expert database is considered, while with Geneva it is not.
Considering the database in the collaboration with Neuch\u00e2tel leads to a much larger amount of work, as the database needs to be pre-processed before it can be put into perspective with the deep learning network results. It is nevertheless important to be able to assess the possibility of inserting our demonstrator into the existing procedures used by the domain expert to track thermal panel installations.
"},{"location":"PROJ-TPNL/#research-data-selected-areas","title":"Research Data & Selected Areas","text":"As mentioned, the best (and probably the only) solution to track down thermal panels is to use aerial images. Indeed, due to their nature, thermal panels are always visible on aerial images. Exceptions to this rule are unusual. In addition, aerial images are acquired regularly and a full set of orthomosaic can be easily obtained each five years (at least in Switzerland). For Geneva and Neuch\u00e2tel, it is not impossible to obtain a set of images each two years.
Nevertheless, using aerial images comes with drawbacks, the main one being, of course, the resolution (GSD). The aerial image sets used to compose orthomosaics are acquired to cover the whole territory; it follows that the resolution is limited. For a large number of applications, the available resolution is sufficient, but for thermal panels it starts to become challenging.
Illustration of the resolution at which thermal panels can be viewed on aerial images - Data: swisstopo, SITG (GSD ~ 10 cm/pixel)Despite the resolution, aerial images are selected to train a deep learning network. Mainly SWISSIMAGE from swisstopo is considered for this research project. At this time, the 2020 version of the orthomosaic is considered for both Neuch\u00e2tel and Geneva.
For both cases, a test area is defined. On the side of Neuch\u00e2tel, a large test area is chosen in order to cover a large portion of the territory that mixes constructed zones and more rural ones. On the side of Geneva, the test area is defined by the domain expert and consists of a rectangular zone.
Illustration of the test areas defined on Neuch\u00e2tel (left) and Geneva (right) - Data: swisstopoThe research project thus focuses only on portions of the territory, to keep the scale realistic for such a demonstrator given the available time.
"},{"location":"PROJ-TPNL/#deep-learning-model-initial-training","title":"Deep Learning Model Initial Training","text":"In this project, it is not possible to extract a ground truth, that is annotations on aerial images, from the domain expert databases. Thankfully, the FHNW, partner of the STDL, conducted some year ago annotations for thermal panels on the States of Aargau. The set consists of thousands of annotated tiles of 80x80m in size made on the SWISSIMAGE images set (2020). The annotation work was made by students of the FHNW and supervised by the D. Jordan scientists team.
Such a dataset is exactly the bootstrap data required to train an initial deep learning model. The only constraint comes from the fact that the ground truth is defined by the 80x80 m tiles on which the annotations were made.
Illustration of the FHNW ground truth - Labels in white, tiles in red - Data: swisstopo, FHNWSeveral training sessions are conducted in order to determine which sub-tiling system leads to the best performance scores. Due to the predefined ground truth, only sub-tiles of the 80x80 m original tiles are possible. As a result, 80x80 m, 40x40 m and 26x26 m tiles are considered for the network training.
In all training sessions, the results are quite stable around an F1 score of 0.8-0.85, always with a non-negligible proportion of false positives. The best results are obtained for the smallest tiles (26x26 m). This is unfortunate, as small tiles come with drawbacks: they impose a heavy tiling strategy to cover a large area, and they induce a larger number of cuts that have to be merged afterwards to create a usable geographical layer. Despite these drawbacks, as a demonstrator is desired, performance is favored.
The following plot shows the precision, recall and F1 score obtained for the initial training using the FHNW data. These values are computed over the test set, which consists of 15% of the total dataset.
Scores obtained with the FHNW ground truth - Precision, Recall and F1 scoreIn the previous plot, the scores are all computed entity-wise and not pixel-wise. This choice is made to fit the main necessity of the domain experts, which is to inventory thermal panel installations rather than to estimate their surfaces, which is a secondary goal. One can see that encouraging results are obtained, but also that the F1 score plateau is not significantly marked, a sign that the model is not yet optimal despite the large amount of data.
As we are working with domain experts, presenting the F1 score as a function of the threshold can be challenging and difficult to understand. During other research projects, it became clear that efforts have to be made on our side to present the performance of our tools in a way that is informative and understandable for the domain experts, in order to ensure a working collaboration and dialog, without which such research projects can be difficult to conduct.
This is the reason why an alternate representation of the performance is introduced. It shows the performance of the neural network in a more compact and readable way, focusing on elements that are of interest for the domain experts and their real-world necessities. The proposed plot is as follows:
Simplified representation of the obtained scores used with domain experts - The green area represents the true positives, the yellow one the false negatives and the red one the false positives. The upper percentage gives the inventory capacity, the lower one adds the false positives.The bar contains three proportions: the true positives, the false negatives and the false positives. The first two proportions are grouped into one in order to represent the capacity of the network to create a reliable inventory: it shows the number of detected thermal panels over their total number (recall). The overall bar adds the proportion of false positives, which are seen by domain experts as pollution of the obtained inventory. Showing these proportions indicates to the domain experts, in a simple way, how usable the inventory is.
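As an illustration, the following matplotlib sketch draws such a stacked bar from TP/FN/FP counts (here fed with the Geneva counts reported later in this document); it is a rough approximation of the plots shown here, not the code that produced them.

```python
import matplotlib.pyplot as plt

def simple_representation(tp: int, fn: int, fp: int):
    """One stacked bar: green = TP, yellow = FN, red = FP.
    TP / (TP + FN) is the inventory capacity (recall); appending FP
    shows how much the inventory is polluted by spurious detections."""
    total = tp + fn + fp
    fig, ax = plt.subplots(figsize=(8, 1.5))
    ax.barh(0, tp / total, color="tab:green", label="true positives")
    ax.barh(0, fn / total, left=tp / total, color="gold", label="false negatives")
    ax.barh(0, fp / total, left=(tp + fn) / total, color="tab:red", label="false positives")
    ax.set_xlim(0, 1)
    ax.set_yticks([])
    ax.legend(ncol=3, loc="upper center")
    ax.set_title(f"inventory capacity: {tp / (tp + fn):.0%}")
    plt.show()

simple_representation(tp=47, fn=35, fp=63)  # Geneva counts (see Results)
```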
"},{"location":"PROJ-TPNL/#assessment-and-improvement-of-the-model","title":"Assessment and Improvement of the Model","text":"This section is split into two separated parts, one for the Geneva case and the other for the Neuch\u00e2tel one, as the chosen strategy is different. The case of Geneva, with a more direct approach (not considering the domain expert pre-existing database), is presented first.
"},{"location":"PROJ-TPNL/#case-of-geneva","title":"Case of Geneva","text":"In the case of Geneva, the choice is made to not consider existing databases and to proceed on detecting thermal panel installations directly on images to create an inventory that can then be assessed by the domain expert to extract reliable performance scores.
"},{"location":"PROJ-TPNL/#assessment-by-the-domain-expert","title":"Assessment by the Domain Expert","text":"In order to produce the predictions over the test area, in this case defined by the domain expert, the area is split into tiles with the chosen size. The tiles are then sent to the deep learning network in order to produce the predictions of thermal panel installations. The following image shows the tiling system over the test area :
Illustration of the tiling system applied on the Geneva test area (26x26m tiles)A set of tiles with predictions on them is obtained. The optimal threshold, deduced from the initial training on the FHNW dataset, is used to filter the predictions over the Geneva test area. The tiles containing no prediction are removed by an automated process. The remaining tiles are associated with the predictions\u2019 geographical footprints, which are stored in a shapefile to keep the format simple and easy to exchange with the domain expert, as sketched below.
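A hedged sketch of this export step with GeoPandas; the variable names, example footprint, threshold value and CRS are assumptions, not the project's actual code.

```python
import geopandas as gpd
from shapely.geometry import box

# hypothetical predictions: (tile id, footprint polygon, confidence score)
footprints = [("26m_0001", box(2501000, 1117000, 2501004, 1117003), 0.98)]

gdf = gpd.GeoDataFrame(
    [{"tile_id": t, "score": s, "geometry": g} for t, g, s in footprints],
    crs="EPSG:2056",                  # assuming Swiss LV95 coordinates
)
optimal_thr = 0.5                     # placeholder for the FHNW-derived THR
gdf[gdf["score"] >= optimal_thr].to_file("predictions.shp")
```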
A common language on how to validate the predictions is defined with the domain expert; the shapefile containing the predictions is then sent to the domain expert along with the aerial images on which the predictions were made. The role of the domain expert is to assess the predictions and to indicate, on the tiles containing at least one prediction, the true positives, the false positives and the false negatives.
Illustration of the common language defined to assess the predictions - The domain expert simply puts a mark on each determined false positive and at the location of each false negative; the true positives are left untouchedHaving the predictions assessed by a domain expert ensures that the obtained scores are reliable, as thermal panels are difficult to identify on aerial images for a non-expert. Without such an assessment, the scores would be unreliable.
"},{"location":"PROJ-TPNL/#results","title":"Results","text":"The assessment of the predictions made by the domain expert lead to the following results on the test area. A total of 89 tiles are formally certified by the domain expert with the following counts :
| Category | Count |
| --- | --- |
| TP | 47 |
| FP | 63 |
| FN | 35 |

Out of a total of 110 predictions on the certified tiles, 47 are true positives and 63 are false positives. A total of 35 missing predictions are pointed out by the domain expert. It follows that 47 thermal panel installations are found out of 47+35=82. This leads to the following performance scores for the initial deep learning model over the Geneva test area:
| Score | Value |
| --- | --- |
| Precision | 0.43 |
| Recall | 0.57 |
| F1 | 0.49 |

From the inventory point of view, nearly 60% of the thermal panel installations are found by the initial deep learning model on the test area. This is clearly below the initial model's performance, showing that the dataset is not sufficient to obtain stable results at this stage. The following plot shows the results presented in the simplified form:
Score obtained on Geneva with the initial deep learning model - Simple representationTaking into account the large amount of false positives, the initial training is clearly not at the desired level to be usable by the domain expert for producing a reliable geographical layer of thermal panel installations. But these numbers are important, as they are certified by a domain expert, ensuring that the ground truth used to assess the predictions is reliable.
"},{"location":"PROJ-TPNL/#improvement-of-the-deep-learning-network","title":"Improvement of the Deep Learning Network","text":"With the assessment made by the domain expert, reliable scores are obtained. In addition, as predictions are marked as correct or incorrect, with addition of missing thermal panel installations on the certified tiles, it was possible to create an extension to the ground truth. Indeed, driven by the corrections of the domain expert, new annotations are made on the certified tiles, including true positives and false negatives.
These annotations are made by the STDL on the images used to produce the predictions. The predictions themselves are not reliable enough to be directly translated into labels, and the false negatives have to be added anyway.
Annotations created on the Geneva area driven by the assessment of the domain expert - The labels are in white and the certified tiles in redIn the case of Geneva, the ground truth extension is made on the same images used to produce the predictions. As the number of certified tiles is low, a strategy is tested in order to enlarge the ground truth extension. The idea consists of looking along the time dimension: in Switzerland, aerial images are acquired on a regular basis, so a history of aerial images is available.
The time range from 2000 to 2020 is then considered in terms of the available images. For each set of images, the annotations created on the 2020 image set are transferred to the older images. This process is not straightforward, as each annotation has to be checked to certify that the thermal panel installation is present on the older images. In addition, each tile has to be checked individually to verify that no older thermal panel installation was there and destroyed before 2020.
Illustration of the propagation of the ground truth along the time dimension - The image on the right illustrates the limit of the processBy doing this exploration along the time dimension, it was possible to increase the ground truth extracted from the assessment procedure made by the domain expert. From only 41 tiles and 64 annotations extracted using the initial test zone on the year 2020, 394 tiles and 623 annotations are obtained by considering the 2000 to 2020 time range for aerial images.
Considering the time dimension allows better leveraging of the assessment made by the domain expert, although the procedure is time-consuming. One has to keep in mind that such a process is not ideal, as the same examples are simply repeated. It has some value nonetheless, as it shows the same examples under different conditions of luminosity and orientation, which can improve the detection ability of the deep learning model.
With this new ground truth, it was possible to re-train the initial network, using both the initial FHNW ground truth and the annotations made on Geneva. The following results are obtained, shown using the simple representation:
Scores obtained on Geneva with consideration of the new annotations certified by the domain expert - Simple representationThis plot shows the results on the test set limited to the Geneva test area. Again, the test set contains 15% of the ground truth, and limiting it to the Geneva area leaves only a few tens of tiles. This number of tiles is too low to draw firm conclusions from the encouraging results obtained with the extended ground truth, all the more so given the lack of stability already observed in the previous results.
"},{"location":"PROJ-TPNL/#conclusion","title":"Conclusion","text":"It is clear that the initial deep learning model, trained with the FHNW ground truth is not satisfying for a real-world usage by domain experts. Its ability to produce an inventory is not optimal, and the amount of false positives make the produced geographical layer difficult to use.
Nevertheless, reliable scores are obtained and can be trusted on the Geneva area thanks to the domain expert assessment. In addition, the assessment made by the domain expert, as it also included the false negatives (at least on the considered tiles), allowed the ground truth to be extended. Extending the ground truth along the time dimension takes advantage of the domain expert\u2019s work as much as possible, leading to more certified tiles.
The new training clearly improved the situation on the Geneva area: the inventory capacity of the deep learning model went from around 60% to around 80%, and the number of false positives is drastically reduced. These are encouraging results, but the small number of new tiles and the multiplication of the same examples along the time dimension call for a certain caution, especially given the instability of the results.
"},{"location":"PROJ-TPNL/#case-of-neuchatel","title":"Case of Neuch\u00e2tel","text":"The case of Neuch\u00e2tel is clearly more complex than the case of Geneva. In this case, the database of the domain expert is considered in order to try to link the predictions with the entries of the existing database. This choice is made to demonstrate the ability to integrate data science technology in existing pipeline, in order to avoid creating disruptive effect.
"},{"location":"PROJ-TPNL/#processing-and-linkage-with-the-domain-expert-database","title":"Processing and Linkage with the Domain Expert Database","text":"In the first stage, the domain expert database is analyzed in order to determine the best solution to link the prediction made by the deep learning model and the entries of the database.
The database itself is a simple Excel sheet, with each line corresponding to a subsidy request that accompanies the construction permit. Subsidies are provided by the authorities to promote the deployment of renewable energy. This is also one reason why authorities need to track the construction of thermal panel installations.
The major issue with the database is the localization of the thermal panel installations. Over the years, the database being quite old, different ways of localizing the installations were used. Three different localization systems are available: postal addresses, geographical coordinates and the EGID (federal building identifier). Unfortunately, these standards are mixed, and entries are localized in different ways: sometimes only one localization is available, sometimes two or three. In some cases, the different pieces of localization information are not consistent, which leads to contradictions in the installation position.
For some entries, the localization information is also incorrect or only approximate, which can make it difficult to associate a geo-referenced prediction with an entry of the database.
For these reasons, much effort is put into the pre-processing of the database to make the link between predictions and entries as reliable as possible. The RegBL (federal register of buildings and dwellings) is used to assess the EGIDs and the postal addresses and to track down contradictions. In addition, the postal addresses of the State of Neuch\u00e2tel are also considered to match addresses with geographical positions, for the same reason.
By doing this, many potential positions are extracted for each entry of the database. This allows assessing the contradictions in order to retain the most probable and reliable localization for each entry. Of course, in many cases the assessment is quite weak, as the amount of localization information is low (this is especially the case for older installations, the newer ones being localized much more reliably using the EGID).
At the end of this complex and time-consuming task, almost all entries of the database are associated with a geographical position. This allows matching the predictions, which are geographically localized, to the most probable entry of the database, as sketched below. This process is important, as it provides the domain expert not only with a geographical layer of the thermal panel installations, but also with the link to his pre-existing database. This allows putting predictions and database into perspective, to track the construction and destruction of installations along the time dimension.
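A minimal sketch of this linkage with GeoPandas; the file names and the 50 m search radius are assumptions, not the project's actual parameters.

```python
import geopandas as gpd

# geo-referenced predictions and pre-processed database entries, each
# entry reduced to its most probable position (both in EPSG:2056)
predictions = gpd.read_file("predictions.shp")        # hypothetical file
entries = gpd.read_file("database_entries.gpkg")      # hypothetical file

# attach each prediction to the nearest database entry, keeping the
# distance so that implausible links can be discarded afterwards
linked = gpd.sjoin_nearest(predictions, entries,
                           how="left", distance_col="dist_m")
linked = linked[linked["dist_m"] <= 50]  # assumed 50 m search radius
```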
"},{"location":"PROJ-TPNL/#assessment-by-the-domain-expert_1","title":"Assessment by the Domain Expert","text":"After pre-processing of the domain expert database, a region of Neuch\u00e2tel state is defined. A tiling strategy is made to translate the defined area in tiles of the appropriated size according to the initial training of the deep learning model. Predictions are then made on each of the tiles. Again, the optimal threshold is selected according to the initial training to filter the predictions made on the test area.
At this stage, the procedure differs from the case of Geneva. Here, tiles are not filtered based on whether or not they contain predictions. Instead, the pre-processed database is considered, and the predictions are linked to the optimal entry according to its best localization. As a result, a set of predictions linked to specific entries of the database is obtained; the other predictions are simply discarded for this specific assessment procedure.
In order to serve the interests of the domain expert as much as possible, a specific assessment procedure is set up. It is designed both to assess the predictions and to help the domain expert correct the bad localizations of the thermal panel installations in his database. The chosen approach is based on a dictionary of GeoTIFF images on which the predictions are shown and on which additional information is specified, to help the domain expert assess the localization provided by the database.
Illustration of one page of the dictionary corresponding to one database entry - For each entry, such an image is provided, showing information on the entry, its localization consistency and the prediction made by the model - Each image is a geo-referenced TIFFThe dictionary is made of one GeoTIFF per prediction that is linked with a unique entry of the database. In addition to the prediction geometry drawn on the image, basic information on the linked database entry is provided. The optimal localization (among postal addresses, coordinates or EGID) used to link the prediction and the entry of the database is also indicated, to help the domain expert understand the link. Information about the estimated quality of the localization of the thermal panel installation is provided as well.
This quality indicator is based on the consistency of the multiple pieces of location information (postal address, coordinates and EGID): the more consistent they are, the better the localization is considered, as illustrated by the sketch below. In case of a potentially bad localization, the domain expert is invited to check the entry of the database to correct the position.
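One possible way to derive such an indicator is sketched below: the available positions of an entry are compared pairwise, and the largest distance serves as a crude inconsistency measure. This is an assumption about the mechanism, not the project's actual implementation.

```python
from itertools import combinations

def localization_inconsistency(positions: dict) -> float:
    """Crude consistency indicator for one database entry.

    `positions` maps each available localization source (e.g. "address",
    "coordinates", "egid") to a planar (x, y) position in metres. The
    returned value is the largest pairwise distance: the smaller it is,
    the more consistent (and trustworthy) the localization.
    """
    if len(positions) < 2:
        return float("inf")  # a single source cannot be cross-checked
    return max(
        ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
        for (xa, ya), (xb, yb) in combinations(positions.values(), 2)
    )
```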
In parallel, a simple Excel file is set up and filled in by the domain expert along the procedure. It allows recording the corrected positions, when required, and indicating whether the prediction is correct and correctly linked to the database entry. This sets up a win-win strategy, where incorrectly located installations are fixed on the database side while the predictions are assessed for correct localization.
The procedure for the domain expert then consists only of parsing a sequence of images on which all the information is shown and of filling in columns in the assessment Excel sheet. This allows assessing the predictions quickly and efficiently while correcting the inconsistencies in the database.
"},{"location":"PROJ-TPNL/#results_1","title":"Results","text":"Thanks to the assessment procedure, part of the predictions are certified by the domain expert. This allows to compute scores on the capacity of the initial deep learning model to compute inventory of thermal panel installations. Unfortunately, this assessment procedure does not allow the computation of the formal scores, as the false negative are not considered. This is the main drawback coming from the fact that we work in parallel with the domain expert database.
Of the 354 predictions linked to the database, 316 correspond to correctly localized entries. On these 316 correct entries, the domain expert reported 255 visible installations. This shows that many installations, present in the database through an entry, are not visible in reality: one can deduce that 61 installations are reported in the database through paperwork but cannot be found in the real world. The explanation is probably complex, but this shows how difficult it is to keep a database of installations up to date with reality.
Without a formal historical analysis, it is not possible to determine what happened to these missing installations. For some of them, one has to consider the natural life cycle of such installations: thermal panels have a limited lifetime and need to be replaced or decommissioned. It is also possible that, for some of them, the construction permit was requested without leading to the actual construction of a thermal panel installation; this case is expected to be less common.
Coming back to the score of the initial deep learning model: of the 255 visible installations, the domain expert determined that 204 are correctly detected by the model. This leads to an inventory capacity of 0.8, which remains within the initial model scores. It is interesting to observe that the initial model scores seem to hold in the case of Neuch\u00e2tel but not in the previous case of Geneva, where the inventory capacity dropped to 0.6.
"},{"location":"PROJ-TPNL/#improvement-of-the-deep-learning-network_1","title":"Improvement of the Deep Learning Network","text":"With the assessment made by the domain expert, despite false negatives are not considered, it was possible to increase the ground truth with new annotation on the test area of Neuch\u00e2tel.
The procedure starts by isolating all the predictions that are marked as correct (true positives) by the domain expert. A tiling system is then set up to cover the entire test area, with sizes fitting the initial choices. The certified true positives are then manually processed to create proper annotations, as the predictions are not reliable enough. The certifications made by the domain expert are sufficiently clear for a data scientist to do this task autonomously.
The last stage consists of validating the tiles containing a new annotation. This part is the most complex one, as the data scientist has to work autonomously. A tile containing a new annotation can only be validated, and enter the ground truth, if no other ambiguous element appears on it. If any ambiguity arises for a tile, it is dropped and not considered for the ground truth. In the case of Neuch\u00e2tel, a few tiles were removed for this reason.
With this procedure, 272 new annotations are added to the ground truth on 254 tiles. These annotations, as for Geneva, are certified by a domain expert, providing a reliable ground truth. With this new set of annotations, and considering the annotations made in the case of Geneva, it is possible to conduct a new training of the deep learning model. For this last training, the full ground truth is considered, with the FHNW annotations and those coming from the domain experts of Geneva and Neuch\u00e2tel.
The following plot gives a simple representation of the overall results:
Score obtained using all the available ground truth, including FHNW, Geneva and Neuch\u00e2tel - Simple representationOn the test set, an F1 score of 0.82 is obtained, which is slightly worse than for the initial training (0.85). Overall, one can also see that the inventory capacity decreases while the number of false positives is reduced. Again, one can see here the instability of the results, showing that the data used are not sufficient, or not well suited enough, for such a task.
The following plots show the simple representation of the scores restricted to the Geneva and Neuch\u00e2tel areas only:
Score obtained restricted to Geneva (test set) - Simple representationScore obtained restricted to Neuch\u00e2tel (test set) - Simple representation
One has to take into account that restricting the scores to such areas leads to very few predictions, and thus to poor statistics. It is nevertheless clear that the results on the Neuch\u00e2tel restriction demonstrate the instability observed all along the project. On Neuch\u00e2tel, choosing a different threshold could lead to a better inventory capacity, but the fact that the threshold needs to be adapted to the situation shows that the model was not able to generalize.
It is most likely that the nature of the objects, their similarity with other objects and the resolution of the images play a central role in this lack of generalization. As a conclusion, detecting thermal panels requires a higher resolution, so that the model can extract more reliable features from the objects themselves instead of relying only on their context.
"},{"location":"PROJ-TPNL/#conclusion_1","title":"Conclusion","text":"In the case of Neuch\u00e2tel, the procedure is more complex, as the database is considered. The work on the database is time-consuming and the linkage of the predictions with the entries of the database is not straightforward, mainly due to the inconsistencies on thermal panel installation localization.
In addition, considering the database makes it the main point of view from which the predictions are analyzed, assessed and understood. This offers a very interesting perspective, as it allows assessing the synchronization between the database and the current state of thermal panel deployment. Nevertheless, such a point of view also introduces drawbacks, as it does not allow directly assessing the false negatives, and only part of the false positives. This leads to intermediate scores, more focused on the database-reality synchronization than on the performance of the deep learning model.
It is nonetheless clearly demonstrated that a deep learning model can be interfaced with an existing database to ensure process continuity when introducing new technologies into territorial management. It shows that new methods can be introduced without abandoning the previous processes, which is always complicated and undesired.
On the initial deep learning model assessment, with an inventory capacity (recall) of around 0.85, one can observe a difference between Neuch\u00e2tel and Geneva: in Geneva, the recall dropped to around 0.6, while it stayed around 0.8 in the Neuch\u00e2tel case. A possible explanation is the similarity between Aargau (used to train the initial deep learning model) and Neuch\u00e2tel in terms of geography, the case of Geneva being more urban than the other two. This confirms the instability already observed and seems to indicate that thermal panels remain complex objects to detect at this stage, considering the available data.
"},{"location":"PROJ-TPNL/#conclusion-and-perspectives","title":"Conclusion and Perspectives","text":"As a main conclusion, this project, performed in two stage with Geneva and Neuch\u00e2tel states, is a complex task. The nature of the object of interest is the main source of difficulty.
The currently available aerial images make the detection of such objects possible, but the resolution of the images (GSD) makes the task very difficult. Indeed, as mentioned, the thermal panel installations visible on the images are at the limit of the resolution. This forces the deep learning model to learn more from the context than from the object features themselves.
To add complexity, thermal panels appear very similar to electrical panels on images, leading to a major source of confusion. Since the deep learning model relies more on context than on object features, electrical panels are reported as thermal ones, reducing the efficiency of the inventory and producing a large number of false positives.
Despite this, interesting results are obtained, and they do not lead to the conclusion that inventorying such objects is currently impossible. It remains very challenging, but data science can already help in the tracking and surveillance of thermal panel installations.
The collaboration with the domain experts is a necessity here. Such installations, especially at this image resolution, are extremely difficult to confirm as such (mainly due to the confusion with electrical panels and other roof elements). Even for the domain expert, determining whether a prediction is a true positive is challenging and time-consuming. Without the help of domain experts, data scientists are not able to tackle such problems.
Another positive outcome is the demonstration that data science can be interfaced smoothly with existing processes. This is shown in the Neuch\u00e2tel case, where the predictions can be directly linked to the entries of the pre-existing domain expert database. This eases the domain expert\u2019s assessment procedure and also helps assessing the synchronization between the database and reality.
As a final word, the obtained deep learning model is not formally ready to enter territorial management. It is demonstrated that the nature of the object and the available data make the model unstable from one situation to another. This shows that the currently available data are not sufficient to produce a fully working prototype able to satisfy the specifications of the domain experts. Nevertheless, such a model can already perform pre-processing to ease the work of the domain experts in the complex task of tracking the deployment of thermal energy generators on the Swiss territory.
"},{"location":"PROJ-TREEDET/","title":"Tree Detection from Point Clouds over the Canton of Geneva","text":"Alessandro Cerioni (Canton of Geneva) - Flann Chambers (University of Geneva) - Gilles Gay des Combes (CJBG - City of Geneva and University of Geneva) - Adrian Meyer (FHNW) - Roxane Pott (swisstopo)
Proposed by the Canton of Geneva - PROJ-TREEDET May 2021 to March 2022 - Published on April 22, 2022
Abstract: Trees are essential assets, in urban contexts among others. For several years, the Canton of Geneva has maintained a digital inventory of isolated (or \"urban\") trees. This project aimed at designing a methodology to automatically update Geneva's tree inventory, using high-density LiDAR data and off-the-shelf software. Eventually, only the sub-task of detecting and geolocating trees was explored. Comparisons against ground truth data show that the task can be more or less tricky, depending on how sparse or dense the trees are. In mixed contexts, we managed to reach an accuracy of around 60%, which unfortunately is not high enough to foresee a fully unsupervised process. Still, as discussed in the concluding section, there may be room for improvement.
"},{"location":"PROJ-TREEDET/#1-introduction","title":"1. Introduction","text":""},{"location":"PROJ-TREEDET/#11-context","title":"1.1 Context","text":"Human societies benefits from the presence of trees in cities and their surroundings. More specifically, as far as urban contexts are concerned, trees deliver many ecosystem services such as:
Moreover, they play an important role in supporting biodiversity, by offering resources and shelter to numerous animal, plant and fungus species.
The quality and quantity of such benefits depend on various parameters, such as the height, the age, the leaf area and the species diversity within a given population of trees. Therefore, the preservation and development of a healthy and functional tree population is one of the key elements of public policies aiming at increasing resilience against climate change.
For these reasons, the Canton of Geneva has set the ambitious goal of increasing its canopy cover (= ratio between the area covered by foliage and the total area) from 21% (as estimated in 2019) to 30% by 2050. In order to reach this goal, the concerned authorities (i.e. the Office cantonal de l\u2019agriculture et de la nature) need detailed data and tools to keep track of the cantonal tree population and drive its development.
The Inventaire Cantonal des Arbres Isol\u00e9s (ICA) is the most extensive and detailed source of data on isolated trees (= trees that do not grow in forests) within the Canton of Geneva. This dataset is maintained by a joint effort of several public administrations (green spaces departments of various municipalities, the Office cantonal de l\u2019agriculture et de la nature, the Geneva Conservatory and Botanical Garden, etc.). For each tree, several attributes are provided: geographical coordinates, species, height, plantation date, trunk diameter, crown diameter, etc.
To date, the ICA includes data about more than 237\u00a0000 trees. However, it comes with a host of known limitations:
In light of Geneva's ambitions in terms of canopy growth, these observations call for a more efficient methodology to improve the exhaustivity and veracity of the ICA. Over the last few years, several joint projects of the Canton, the City and the University of Geneva explored the potential of using LiDAR point clouds and tailored software to characterize trees in a semi-automatic way, following practices that are already established in forestry. Yet, forest and urban settings are quite different from each other: forests exhibit a higher tree density, which can hinder tree detection, but a lower heterogeneity in terms of species and morphology, which can facilitate it. Hence, the task of automatic detection is likely to be harder in urban contexts than in forests.
The study reported in this page, proposed by the Office cantonal de l\u2019agriculture et de la nature (OCAN) and carried out by the STDL, represents a further, yet modest, step towards the semi-automatic digitalisation of urban trees.
"},{"location":"PROJ-TREEDET/#12-objectives","title":"1.2 Objectives","text":"The objectives of this project was fixed by the OCAN domain experts and, in one sentence, amount to designing a robust and reproducible semi-automatic methodology allowing one to \"know everything\" about each and every isolated tree of the Canton of Geneva, which means:
Regarding quality, the following requirements were fixed:
| Property | Expected precision |
| --- | --- |
| Trunk geolocation | 1 m |
| Top geolocation | 1 m |
| Height | 2 m |
| Trunk diameter at 1 m height | 10 cm |
| Crown diameter | 1 m |
| Canopy area | 1 m\u00b2 |
| Canopy volume | 1 m\u00b3 |
In spite of such thorough and ambitious objectives, the time span of this project was not long enough to address them all. As a matter of fact, the STDL team only managed to tackle tree detection and trunk geolocation.
"},{"location":"PROJ-TREEDET/#13-methodology","title":"1.3 Methodology","text":"As shown in Figure 1.1 here below, algorithms and software exist, which can detect individual trees from point clouds.
Figure 1.1: The two panels represent a sample of a point cloud before (top panel) and after (bottom) tree detection.
Not only do such tools take point clouds as input data, but the values of a number of parameters also have to be chosen by the users. The quality of the results depends both on the input data and on the input parameters. The application of some pre-processing to the input point cloud has an impact, too. Therefore, it becomes clear that in order to find the optimal configuration for a given context, one should be able to measure the quality of the results as a function of the chosen parameters as well as of the pre-processing operations. To this end, the STDL team called for the acquisition of ground truth data. Further details about input data (point cloud and ground truth), software and methodology will be provided shortly.
"},{"location":"PROJ-TREEDET/#14-input-data","title":"1.4 Input data","text":""},{"location":"PROJ-TREEDET/#141-lidar-data","title":"1.4.1 LiDAR data","text":"A high-density point cloud dataset was produced by the Flotron Ingenieure company, through Airborne Laser Scanning (ALS, also commonly known by the acronym LiDAR - Light Detection And Ranging). Thanks to a lateral overlap of flight lines of ~80%, more than 200 pts/m\u00b2 were collected, quite a high density when compared to more conventional acquisitions (30 \u2013 40 pts/m\u00b2). Flotron Ingenieure took care of the point cloud classification, too.
The following table summarizes the main features of the dataset:
| LIDAR 2021 - OCAN, Flotron Ingenieure | |
| --- | --- |
| Coverage | Municipalities of Ch\u00eane-Bourg and Th\u00f4nex (GE) |
| Date of acquisition | March 10, 2021 |
| Density | > 200 pts/m\u00b2 |
| Planimetric precision | 20 mm |
| Altimetric precision | 50 mm |
| Tiles | 200 tiles of 200 m x 200 m |
| Format | LAS 1.2 |
| Classes | 0 - Unclassified; 2 - Ground; 4 - Medium vegetation (0.5 - 3 m); 5 - High vegetation (> 3 m); 6 - Building; 7 - Low points; 10 - Error points; 13 - Bridges; 16 - Noise / Vegetation < 0.5 m |
Figs.\u00a01.2 and 1.3 represent the coverage of the dataset and a sample, respectively.
Figure 1.2: Coverage and tiling of the 2021 high-density point cloud dataset.
Figure 1.3: A sample of the 2021 high-density point cloud. Colors correspond to different classes: green = vegetation (classes 4 and 5), orange = buildings (class 6), grey = ground or unclassified points (class 2 and 0, respectively).
"},{"location":"PROJ-TREEDET/#142-test-sectors-and-ground-truth-data","title":"1.4.2 Test sectors and ground truth data","text":"In order to be able to assess the exhaustivity and quality of our results, we needed reference (or \"ground truth\") data to compare with. Following the advice of domain experts, it was decided to acquire ground truth data regarding trees within three test sectors, which represent three different types of contexts: [1] alignment of trees, [2] park, [3] a mix of [1] and [2]. Of course, these types can also be found elsewhere within the Canton of Geneva.
Ground truth data was acquired through surveys conducted by geometers, who recorded
for every tree having a trunk diameter larger than 10 cm.
Details about the three test sectors are provided in the following, where statistics on species, height, age and crown diameter stem from the ICA.
"},{"location":"PROJ-TREEDET/#avenue-de-bel-air-chene-bourg-ge","title":"Avenue de Bel-Air (Ch\u00eane-Bourg, GE)","text":"Property Value Type Alignment of trees Trees 135 individuals Species monospecific (Tilia tomentosa) Height range 6 - 15 m Age range 17 - 28 yo Crown diameters 3 - 10 m Comments Well separated trees, heights and morphologies are relatively homogenous, no underlying vegetation (bushes) around the trunks.
Figure 1.4: "Avenue de Bel-Air" test sector in Ch\u00eane-Bourg (GE). Orange dots represent ground truth trees as recorded by the geometers.
"},{"location":"PROJ-TREEDET/#parc-floraire-chene-bourg-ge","title":"Parc Floraire (Ch\u00eane-Bourg, GE)","text":"Property Value Type Park with ornemental trees Trees 95 individuals Species 65 species Height range 1.5 - 28 m Age range Unknown Crown diameters 1 - 23 m Comments Many ornemental species of all sizes and shapes, most of them not well separated. Very heterogenous vegetation structure.
Figure 1.5: "Parc Floraire" test sector in Chêne-Bourg (GE). Orange dots represent ground truth trees as recorded by geometers.
"},{"location":"PROJ-TREEDET/#adrien-jeandin-thonex-ge","title":"Adrien-Jeandin (Th\u00f4nex, GE)","text":"Property Value Type Mixed (park, alignment of tree, tree hedges, etc.) Trees 362 individuals Species 43 species Height range 1 - 34 m Age range Unknown Crown diameters 1 - 21 m Comments Mix of different vegetation structures, such as homogenous tree alignments, dense tree hedges and park with a lot of underlying vegetation under big trees.Figure 1.6: \"Adrien-Jeandin\" test sector in Th\u00f4nex (GE). Orange dots represents ground truth trees as recorded by the geometers.
"},{"location":"PROJ-TREEDET/#15-off-the-shelf-software","title":"1.5 Off-the-shelf software","text":"Two off-the-shelf software products were used to detect trees from LiDAR data, namely TerraScan and the Digital Forestry Toolbox (DFT). The following table summarizes the main similarities and differences between the two:
| Feature | Terrascan | DFT |
| --- | --- | --- |
| Licence | Proprietary (*) | Open Source (GPL-3.0) |
| Price | See here | Free |
| Standalone | No: requires MicroStation or Spatix | No: requires Octave or MATLAB |
| Graphical User Interface | Yes | No |
| In-app point cloud visualization | Yes (via MicroStation or Spatix) | No (**) |
| Scriptable | Partly (via macros) | Yes |
| Hackable | No | Yes |
(*) Unfortunately, we must acknowledge that using network licenses turned out to be quite problematic. Weeks of unexpected downtime were experienced, due to puzzling issues related to the interplay between the self-hosted license server, firewalls, VPN and end-devices. (**) We used the excellent Potree Free and Open Source software for visualization.
The following sections are devoted to brief descriptions of these two tools; further details will be provided in Section 4 and Section 5.
"},{"location":"PROJ-TREEDET/#151-terrascan","title":"1.5.1 Terrascan","text":"Terrascan is a proprietary software, developed and commercialized by Terrasolid, a MicroStation and Spatix plugin which is capable of performing several tasks on point clouds, including visualisation, classification. As far as tree detection is concerned, Terrascan offers multiple options to
Two methods are provided to group (one may also say "to segment") points into individual trees: the highest point (aka "watershed") method and the trunk method.
For further details on these two methods, we refer the reader to the official documentation.
"},{"location":"PROJ-TREEDET/#152-digital-forestry-toolbox-dft","title":"1.5.2 Digital Forestry Toolbox (DFT)","text":"The Digital Forestry Toolbox (DFT) is a
collection of tools and tutorials for Matlab/Octave designed to help process and analyze remote sensing data related to forests (source: official website)
developed and maintained by Matthew Parkan, released under an Open Source license (GPL-3.0).
The DFT implements algorithms allowing one to perform, among other tasks, tree top detection and tree stem detection.
We refer the reader to the official documentation for further information.
"},{"location":"PROJ-TREEDET/#2-method","title":"2. Method","text":"As already stated, in spite of the thorough and ambitious objectives of this project (cf. here), only the
sub-tasks could be tackled given the resources (time, humans) which were allocated to the STDL.
The method we followed goes through several steps, which are documented below.
"},{"location":"PROJ-TREEDET/#21-pre-processing-point-cloud-reclassification-and-cleaning","title":"2.1 Pre-processing: point cloud reclassification and cleaning","text":"[1] In some cases, points corresponding to trunks may be misclassified and lay in class 0 \u2013 Unclassified instead of class 4 \u2013 Medium vegetation. As the segmentation process only takes vegetation classes (namely classes 4 and 5) into account, the lack of trunk points can make some trees \"invisibles\".
[2] We suspected that the standard classification of vegetation in LiDAR point clouds could be too basic for the task at hand. Indeed, vegetation points found at less (more) than 3 m above the ground are classified as 4 \u2013 Medium Vegetation (5 \u2013 High Vegetation). This may cause one potential issue: all the points of a given tree that are located at up to 3 meters above the ground (think about the trunk!) belong to a class (namely class no.\u00a04) which can also be populated by bushes and hedges. The \"contamination\" by bushes and hedges may spoil the segmentation process, especially in situations where dense low vegetation exists around higher trees. Indeed, it was acknowledged that in such situations the segmentation algorithm fails to properly identify trunk locations and distinguish one tree from another.
Issues [1] and [2] can be solved, or at least mitigated, by reclassifying and cleaning the input point cloud, respectively. Figures 2.1 and 2.2 show how tree grouping (or "segmentation") yields better results if pre-processed point clouds are used.
Figure 2.1: Tree grouping (or \"segmentation\") applied to the original (top panel) vs pre-processed (bottom) point cloud. Without pre-processing, two trees connected by a hedge are segmented as one single individual. Therefore, only one detection is made (green circle slightly above the ground). With pre-processing, we get rid of the hedge and recover the lowest trunk points belonging to the tree on the left. Eventually, both trees are properly segmented and we end up having two detections (green circles).
Figure 2.2: Tree grouping (or \"segmentation\") applied to the original (left panel) vs reclassified (right) point cloud. Without pre-processing, segmentation yields a spurious detection (= false positive, red circle slightly above the ground), resulting from the combination of a pole and a hedge. With pre-processing, we get rid of most of the points belonging to the hedge and the pole; no false positive shows up.
"},{"location":"PROJ-TREEDET/#211-reclassification-with-terrascan-and-fme-desktop","title":"2.1.1 Reclassification with Terrascan and FME Desktop","text":"The reclassification step aims at recovering trunk points which might be misclassified and hence found in some class other than class 4 \u2013 Medium Vegetation (e.g. class 0 - Unclassified). It was carried out with Terrascan using the Classify by normal vectors tool, which
Finally, during the cleaning process with FME Desktop (cf. Section 2.1.2 below), these points are reclassified into class 4.
The outcome of this reclassification step is shown in Figure\u00a02.3.
Figure 2.3: Outcome of reclassification. In the upper picture, the trunk of the tree on the left is partially misclassified, while the trunk of the tree in the middle is completely misclassified. After reclassification, almost all the points belonging to trunks are back in class 4.
Let us note that the reclassification process may also recover some unwanted objects exhibiting linear features similar to trees (poles, power lines, etc.). However, such spurious objects can be at least partly filtered out by the cleaning step described below.
"},{"location":"PROJ-TREEDET/#212-cleaning-point-clouds-with-fme-desktop","title":"2.1.2 Cleaning point clouds with FME Desktop","text":"The cleaning step aims to filter as many \"non-trunk\" points as possible out of class 4 \u2013 Medium Vegetation, in order to isolate trees from other types of vegetation. Vegetation is considered as part of a tree if higher than 3 m.
Cleaning consists of two steps:
Note that, in case the point cloud is reclassified in order to recover missing trunks, the cleaning step also allows one to get rid of unwanted linear objects (poles, electric lines, etc.) that have been recovered during the reclassification. The class containing reclassified points (class 10) is simply processed together with class 4 and receives the same treatment. Eventually, reclassified points that are kept (discarded) by the cleaning process are integrated into class 4 (3).
Figure 2.4: Outcome of the cleaning process. Red points correspond to the \"cleaned\" points that were moved to class 3.
Figure 2.5: Outcome of the cleaning process. Red points correspond to the \"cleaned\" points that were moved to class 3. Hedges under trees escape the cleaning.
"},{"location":"PROJ-TREEDET/#213-fme-files-and-documentation-of-pre-processing-steps","title":"2.1.3 FME files and documentation of pre-processing steps","text":"More detailed information about the reclassification and cleaning of the point cloud can be found here.
FME files can be downloaded by following these links:
Further information on the generation of a Canopy Cover Layer can be found here.
"},{"location":"PROJ-TREEDET/#22-running-terrascan","title":"2.2 Running Terrascan","text":"Terrascan offers multiple ways to detect trees from point clouds. In this project, we focused on the fully automatic segmentation, which is available through the \"Assign Groups\" command.
As already said (cf. here), two methods are available: the highest point (aka "watershed") method and the trunk method. In what follows, we introduce the reader to the various parameters involved in these methods.
"},{"location":"PROJ-TREEDET/#221-watershed-method-parameters","title":"2.2.1 Watershed method parameters","text":""},{"location":"PROJ-TREEDET/#group-planar-surfaces","title":"Group planar surfaces","text":"Quoting the official documentation,
If on, points that fit to planes are grouped. Points fitting to the same plane get the same group number.
"},{"location":"PROJ-TREEDET/#min-height","title":"Min height","text":"This parameter defines a minimum threshold on the distance from the ground that the highest of a group of points must have, in order for the group to be considered as a tree. The default value is 4 meters. The Inventaire Cantonal des Arbres Isol\u00e9s includes trees which are at least 3 m high. This parameter ranged from 2 to 6 m in our tests.
Figure 2.6: Cross-section view of two detected trees. The left tree would not be detected if the parameter \"Min height\" were larger than 3.5 m.
"},{"location":"PROJ-TREEDET/#require","title":"Require","text":"This parameter defines the minimum number of points which are required to form a group (i.e. a tree). The default value is 20 points, which is very low in light of the high density of the dataset we used. Probably, the default value is meant to be used with point clouds having a one order of magnitude smaller density.
In our analysis, we tested the following values: 20 (default), 50, 200, 1000, 2000, 4000, 6000.
"},{"location":"PROJ-TREEDET/#222-trunk-method-parameters","title":"2.2.2 Trunk method parameters","text":""},{"location":"PROJ-TREEDET/#group-planar-surfaces_1","title":"Group planar surfaces","text":"See here.
"},{"location":"PROJ-TREEDET/#min-height_1","title":"Min Height","text":"Same role as in the watershed method, see here.
"},{"location":"PROJ-TREEDET/#max-diameter","title":"Max diameter","text":"This parameter defines the maximum diameter (in meters) which a group of points identified as trunk can reach. Default value is 0.6 meters. Knowing that
we used the following values: 0.20, 0.30, 0.40, 0.60 (default), 0.80, 1.00, 1.50 meters.
"},{"location":"PROJ-TREEDET/#min-trunk","title":"Min trunk","text":"This parameter defines a minimum threshold on the length of tree trunks. Default value is 2 m. We tested the following values: 0.50, 1.00, 1.50, 2.00 (default), 2.50, 3.00, 4.00, 5.00 meters.
"},{"location":"PROJ-TREEDET/#group-by-density","title":"Group by density","text":"Quoting the official documentation,
If on, points are grouped based on their distance to each other. Close-by points get the same group number.
"},{"location":"PROJ-TREEDET/#gap","title":"Gap","text":"Quoting the official documentation,
Distance between consecutive groups:
Automatic: the software decides what points belong to one group or to another. This is recommended for objects with variable gaps, such as moving objects on a road.
User fixed: the user can define a fixed distance value in the text field. This is suited for fixed objects with large distances in between, such as powerline towers.
We did not attempt the optimization of this parameter but kept the default value (Auto).
"},{"location":"PROJ-TREEDET/#223-visualizing-results","title":"2.2.3 Visualizing results","text":"Terrascan allows the user to visualize the outcome of the tree segmentation straight from within the Graphical User Interface. Points belonging to the same group (i.e. to the same tree) are assigned the same random color, which allows the user to perform intuitive, quick, qualitative in-app assessments. An example is provided in Figure 2.7.
Figure 2.7: Three examples of tree segmentations. From a qualitative point of view, we can acknowledge that the leftmost (rightmost) example is affected by undersegmentation (oversegmentation). The example in the middle seems to be a good compromise.
"},{"location":"PROJ-TREEDET/#224-exporting-results","title":"2.2.4 Exporting results","text":"As already said, Terrascan takes point clouds as input data and can run algorithms which form group out of these points, each group corresponding to an individual tree. A host of \"features\" (or \"measurements\"/ \"attributes\"/...) are generated for each group, which the user can export to text files using the \"Write group info\" command. The set of exported features can be customized through a dedicated configuration panel which can be found within the software settings (\"File formats / User group formats\").
The list and documentation of all the exportable features can be found here. Let us note that the two methods do not export exactly the same set of features. The following table summarizes the features which the watershed and trunk methods can export:
| Feature | Watershed Method | Trunk Method |
| --- | --- | --- |
| Group ID | Yes | Yes |
| Point Count | Yes | Yes |
| Average XY Coordinates | Yes | Yes |
| Ground Z at Avg. XY | Yes | Yes |
| Trunk XY | No | Yes |
| Ground Z at Trunk XY | No | Yes |
| Trunk Diameter | See below | See below |
| Canopy Width | Yes | Yes |
| Biggest Distance above Ground (Max. Height) | Yes | Yes |
| Smallest Distance above Ground | Yes | Yes |
| Length | Yes | Yes |
| Width | Yes | Yes |
| Height | Yes | Yes |
"},{"location":"PROJ-TREEDET/#225-trunk-diameters","title":"2.2.5 Trunk Diameters","text":"Terrascan integrates a functionality allowing users to measure trunk diameters (see Figure 2.8).
Figure 2.8: Screenshots of the trunk diameter measurement function.
Let us note that measuring trunk diameters may or may not be feasible, depending on the number of points which sample a given trunk.

We performed some rapid experiments, which showed that some diameters could actually be estimated, given the high density of the point cloud we used (cf. here). Still, we did not analyze the reliability of such estimations against reference/ground truth data.
"},{"location":"PROJ-TREEDET/#23-running-dft","title":"2.3 Running DFT","text":"As already said, DFT consists of a collection of functions which can be run either with Octave or MATLAB. The former software was used in the frame of this context. A few custom Octave scripts were written to automatize the exploration of the parameter space.
Our preliminary, warm-up tests showed that we could not obtain satisfactory results using the "tree top detection method" (cf. here): the F1-score topped out at around 40%. Therefore, we devoted our efforts to exploring the parameter space of the other available method, namely the "tree stem detection method" (cf. this tutorial). In the following, we provide a brief description of the various parameters involved in this detection method.
"},{"location":"PROJ-TREEDET/#232-parameters-concerned-by-the-tree-stem-detection-method","title":"2.3.2 Parameters concerned by the tree stem detection method","text":"Quoting the official tutorial,
The stem detection algorithm uses the planimetric coordinates and height of the points above ground as an input.
To compute the height, DFT provides a function called elevationModels, which takes the classified 3D point cloud as input, as well as some parameters. Regarding these parameters, we stuck to the values suggested by the official tutorial, except for the cellSize parameter (= size of the raster cells), which was set to 0.8 m, and the searchRadius parameter, which was set to 10 m.

Once each point is assigned a height above the ground, the actual tree stem detection algorithm can be invoked (treeStems DFT function, cf. DFT Tree Stem Detection Tutorial / Step 4 - Detect the stems), which takes a host of parameters. While referring the reader to the official tutorial for the definition of these parameters, we provide the list of values we used (unit = meters):
| Parameter | Value |
| --- | --- |
| cellSize | 0.9 |
| bandWidth | 0.7 |
| verticalStep | 0.15 |
| searchRadius | from 1 to 6, step = 0.5 |
| minLength | from 1 to 6, step = 0.5 |

searchRadius (minLength) was fixed to 4 meters while minLength (searchRadius) was let vary between 1 and 6 meters.
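The sweep itself was driven by the custom Octave scripts mentioned above, which are not reproduced here. As a rough illustration only, the following Python sketch enumerates the parameter combinations just described; the dictionary keys mirror the DFT parameter names, and any wrapper that would actually invoke Octave is purely hypothetical:

```python
# Grid actually explored (values in meters): one parameter fixed to 4 m
# while the other varies from 1 to 6 m in steps of 0.5 m.
sweep_values = [1 + 0.5 * i for i in range(11)]  # 1.0, 1.5, ..., 6.0

runs = []
for value in sweep_values:
    runs.append({"searchRadius": value, "minLength": 4.0})
    runs.append({"searchRadius": 4.0, "minLength": value})

for params in runs:
    # Each combination would be handed to an Octave run of treeStems
    # (hypothetical glue code, not shown here).
    print(params)
```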
DFT does not include any specific Graphical User Interface. Still, users can rely on Octave/MATLAB to generate plots, which is useful especially when performing analyses interactively. In our case, DFT was used in a non-interactive way and visualization was delayed until the assessment step, which we describe in Section 2.4.
"},{"location":"PROJ-TREEDET/#234-exporting-results","title":"2.3.4 Exporting results","text":"Thanks to the vast Octave/MATLAB ecosystem, DFT results can be output to disk in several ways and using data formats. More specifically, we used the ESRI Shapefile file format to export the average (x, y) coordinates of the detected stems/peaks.
"},{"location":"PROJ-TREEDET/#235-trunk-diameters","title":"2.3.5 Trunk diameters","text":"This feature is missing in DFT.
"},{"location":"PROJ-TREEDET/#24-post-processing-assessment-algorithm-and-metrics-computation","title":"2.4 Post-processing: assessment algorithm and metrics computation","text":"As already said, the STDL used a couple of third-party tools, namely TerraScan and the Digital Forestry Toolbox (DFT), in order to detect trees from point clouds. Both tools can output
one (X, Y, Z) triplet per detected tree, where the X, Y and Z (optional) coordinates are
computed either as the centroid of all the points which get associated to a given tree, or - under some conditions - as the centroid of the trunk only;
As the ground truth data the STDL was provided with take the form of one (X', Y') pair per tree, with Z' implicitly equal to 1 meter above the ground, the comparison between detections and ground truth trees could only be performed on the common ground of 2D space. In other words, we could not assess the 3D point clouds segmentations obtained by either TerraScan or DFT against reference/ground truth segmentations in the 3D space.
The problem which needed to be solved amounts to finding matching and unmatched items between two sets of 2D points: the detections on one side and the ground truth trees on the other.
In order to fulfill the requirement of a 1 meter accuracy which was set by the beneficiaries of this project, the following matching rule was adopted:
a detection (D) matches a ground truth tree (GT) (and vice versa) if and only if the Cartesian distance between D and GT is less than or equal to 1 meter
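In code, this rule boils down to a single distance test. A minimal sketch (function and variable names are ours, not the actual implementation):

```python
import math

def matches(detection, ground_truth, tolerance=1.0):
    """Return True if a detection (x, y) and a ground truth tree (x, y)
    lie within `tolerance` meters of each other (Cartesian distance)."""
    dx = detection[0] - ground_truth[0]
    dy = detection[1] - ground_truth[1]
    return math.hypot(dx, dy) <= tolerance

print(matches((0.0, 0.0), (0.5, 0.5)))  # True: distance ~0.71 m
```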
Figure 2.9 shows how such a rule would allow one to tag detections and ground truth trees as True Positives (TP), False Positives (FP) or False Negatives (FN) in the most trivial case.
Figure 2.9: Tagging as True Positive (TP), False Positive (FP), False Negative (FN) ground truth and detected trees in the most trivial case.
Actually, far less trivial cases can arise, such as the one illustrated in Figure 2.10.
Figure 2.10: Only one detection can exist for two candidate ground truth trees, or else two detections can exist for only one candidate ground truth tree.
The STDL designed and implemented an algorithm which produces relevant TP, FP, FN tags and counts even in such complex cases. For instance, in a setting like the one in the image above, one would expect the algorithm to count 2 TPs, 1 FP, 1 FN.
Details are provided below.
"},{"location":"PROJ-TREEDET/#241-the-tagging-and-counting-algorithm","title":"2.4.1 The tagging and counting algorithm","text":""},{"location":"PROJ-TREEDET/#1st-step-geohash-detections-and-ground-truth-trees","title":"1st step: geohash detections and ground truth trees","text":"In order to keep track of the various detections and ground truth trees all along the execution of the assessment algorithm, each item is given a unique identifier, computed as the geohash of its coordinates, using the pygeohash
Python module. Such identifier is not only unique (as far as a sufficiently high precision is used), but also stable across subsequent executions. The latter property allows analysts to \"synchronise\" the concerned objects between the output of the (Python) code and the views generated with GIS tools such as QGIS, which turns out to be quite useful especially at development and debugging time.
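A minimal sketch of the identifier computation (the coordinates are made up; note that geohashing expects WGS84 latitude/longitude, so projected coordinates would first have to be reprojected, which is not shown here):

```python
import pygeohash as pgh

# Hypothetical tree position in WGS84 latitude/longitude.
tree_lat, tree_lon = 46.19600, 6.19913

# High precision keeps the identifier unique; identical coordinates
# always yield the same string, hence stability across runs.
identifier = pgh.encode(tree_lat, tree_lon, precision=12)
print(identifier)
```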
As a 2nd step, each detection is converted to a circle of 1 m radius, centered on the detection itself. This operation can be accomplished by generating a 1 m buffer around each detection. For the sake of precision, this method was used, which generates a polygonal surface approximating the intended circle.
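With shapely, for instance, such a polygonal circle can be obtained as follows (the coordinates are made up, and whether the actual code relied on shapely's buffer is an assumption):

```python
from shapely.geometry import Point

detection = Point(2507500.0, 1118250.0)  # hypothetical (x, y), in meters
circle = detection.buffer(1.0)  # polygonal approximation of a 1 m circle

# The area is slightly below pi, since the circle is approximated
# by a polygon with a finite number of segments.
print(circle.area)
```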
"},{"location":"PROJ-TREEDET/#3rd-step-perform-left-and-right-outer-spatial-joins","title":"3rd step: perform left and right outer spatial joins","text":"As a 3rd step, the following two spatial joins are computed:
left outer join between the circles generated at the previous step and ground truth trees;
right outer join between the same two operands.
In both cases, the \"intersects\" operation is used (cf.\u00a0this page for more technical details).
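A sketch of these two joins using geopandas (the toy geometries are made up, and the actual column or index names may differ):

```python
import geopandas as gpd
from shapely.geometry import Point

# Tiny illustrative datasets: two buffered detections, two ground truth trees.
detections = gpd.GeoDataFrame(geometry=[Point(0, 0).buffer(1.0), Point(10, 0).buffer(1.0)])
ground_truth = gpd.GeoDataFrame(geometry=[Point(0.5, 0), Point(20, 0)])

# Left outer join: every buffered detection, with matching trees if any.
left = gpd.sjoin(detections, ground_truth, how="left", predicate="intersects")
# Right outer join: every ground truth tree, with matching detections if any.
right = gpd.sjoin(detections, ground_truth, how="right", predicate="intersects")

print(left)   # the detection at (10, 0) has no match -> trivial FP candidate
print(right)  # the tree at (20, 0) has no match -> trivial FN candidate
```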
"},{"location":"PROJ-TREEDET/#4th-step-tag-trivial-false-positives-and-false-negatives","title":"4th step: tag trivial False Positives and False Negatives","text":"All those detections output by the left outer join for which no right attribute exists (in particular, we focus on the right geohash) can trivially be tagged as FPs. As a matter of fact, this means that the 1 m circular buffer surrounding the detection does not intersect any ground truth tree; in other words, that no ground truth tree can be found within 1 m from the detection. The same reasoning leads to trivially tagging as FNs all those ground truth trees output by the right outer join for which no left attribute exists. These cases correspond to the two rightmost items in Fig.\u00a06.1.
For reasons which will be clarified below, the algorithm does not actually tag items as either FPs or FNs; instead, it assigns each item fractional TP, FP and FN "charges". Here's how trivial FPs (first table) and trivial FNs (second table) are charged:

| TP charge | FP charge |
| --- | --- |
| 0 | 1 |

| TP charge | FN charge |
| --- | --- |
| 0 | 1 |
"},{"location":"PROJ-TREEDET/#5th-step-tag-non-trivial-false-positives-and-false-negatives","title":"5th step: tag non-trivial False Positives and False Negatives","text":"The left outer spatial join performed at step 3 establishes relations between each detection and those ground truth trees which are located no further than 1 meter, as shown in Figure 2.11.
Figure 2.11: The spatial join between buffered detections and ground truth trees establishes relations between groups of items of these two populations. In the sample setting depicted in this picture, two unrelated groups can be found.
The example above shows 4 relations, which can be split (see the red dashed line) into two unrelated, independent groups: {D1 - GT1, D1 - GT2} and {D2 - GT3, D3 - GT3}.
In order to generate this kind of groups in a programmatic way, the algorithm first builds a graph out of the relations established by the left outer spatial join, then it extracts the connected components of such a graph (cf.\u00a0this page).
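Assuming networkx is the graph library in question (the text only points to an external page), the groups of Figure 2.11 can be extracted as follows:

```python
import networkx as nx

# Relations established by the spatial join (detection ID, ground truth ID);
# the labels follow the example of Figure 2.11.
relations = [("D1", "GT1"), ("D1", "GT2"), ("D2", "GT3"), ("D3", "GT3")]

graph = nx.Graph()
graph.add_edges_from(relations)

groups = list(nx.connected_components(graph))
print(groups)  # e.g. [{'D1', 'GT1', 'GT2'}, {'D2', 'D3', 'GT3'}]
```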
The tagging and counting of TPs, FPs, FNs is performed on a per-group basis, according to the following strategy:
if a group contains more ground truth than detected trees, then the group is assigned an excess \"FN charge\", equal to the difference between the number of ground truth trees and detected trees. This excess charge is then divided by the number of ground truth trees and the result assigned to each of them. For instance, the {D1 - GT1, D1 - GT2} group in the image here above would be assigned an FN charge equal to 1; then, each ground truth tree would be assigned an FN charge equal to 1/2.
Similarly, if a group contains more detected trees than ground truth trees, then the group is assigned an excess FP charge, equal to the difference between the number of detected trees and ground truth trees. This excess charge is then divided by the number of detections and the result assigned to each of them. For instance, the {D2 - GT3, D3 - GT3} group in the image above would be assigned an excess FP charge equal to 1; then, each detection would be assigned an FP charge equal to 1/2.
If the number of ground truth trees is the same as the number of detections, no excess FN/FP charge is assigned to the group.
Concerning the assignment of TP charges, the per-group budget is established as the minimum between the number of ground truth trees and the number of detections, then equally split among the items of each of the two populations. In the example above, both groups would be assigned TP charge = 1.
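The following sketch condenses this per-group strategy (the labels follow the example above; the helper is ours, not the actual STDL code). Note that it also reproduces the trivial cases of step 4: a group with one detection and no tree yields FP charge 1, and vice versa.

```python
def assign_charges(group):
    """Assign TP/FP/FN charges to one connected group of item IDs;
    detections are assumed to start with "D", ground truth trees with "GT"."""
    detections = [item for item in group if item.startswith("D")]
    trees = [item for item in group if item.startswith("GT")]

    charges = {item: {"TP": 0.0, "FP": 0.0, "FN": 0.0} for item in group}

    # TP budget: the minimum of the two population sizes,
    # equally split within each population.
    tp_budget = min(len(detections), len(trees))
    for d in detections:
        charges[d]["TP"] = tp_budget / len(detections)
    for t in trees:
        charges[t]["TP"] = tp_budget / len(trees)

    # Excess charge: FP if detections are in excess, FN if trees are.
    if len(detections) > len(trees):
        for d in detections:
            charges[d]["FP"] = (len(detections) - len(trees)) / len(detections)
    elif len(trees) > len(detections):
        for t in trees:
            charges[t]["FN"] = (len(trees) - len(detections)) / len(trees)

    return charges

print(assign_charges({"D1", "GT1", "GT2"}))  # GT1, GT2: TP=1/2, FN=1/2; D1: TP=1
print(assign_charges({"D2", "D3", "GT3"}))   # D2, D3: TP=1/2, FP=1/2; GT3: TP=1
```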
Wrapping things up, here are the charges which the algorithm would assign to the various items of the example above:
| Item | TP charge | FP charge | Total charge |
| --- | --- | --- | --- |
| D1 | 1 | 0 | 1 |
| D2 | 1/2 | 1/2 | 1 |
| D3 | 1/2 | 1/2 | 1 |
| Sum | 2 | 1 | 3 |

| Item | TP charge | FN charge | Total charge |
| --- | --- | --- | --- |
| GT1 | 1/2 | 1/2 | 1 |
| GT2 | 1/2 | 1/2 | 1 |
| GT3 | 1 | 0 | 1 |
| Sum | 2 | 1 | 3 |
Let us note that TP, FP and FN counts are extensive properties, out of which we can compute standard metrics such as precision, recall and the F1-score, which are intensive instead. While referring the reader to this paragraph for the definition of these metrics, let us state the interpretation which holds in the present use case:
Typically, one cannot optimize both precision and recall with the same set of parameter values. Instead, the two can exhibit opposite trends as a function of a given parameter (e.g. precision increases while recall decreases). In such cases, the F1-score exhibits a maximum as a function of that parameter and can be optimized.
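For reference, a small helper computing these intensive metrics from (possibly fractional) charge sums might look as follows:

```python
def metrics(tp, fp, fn):
    """Compute precision, recall and F1-score from (possibly fractional)
    TP, FP and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Charge sums from the example tables above.
print(metrics(tp=2.0, fp=1.0, fn=1.0))  # (0.667, 0.667, 0.667)
```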
"},{"location":"PROJ-TREEDET/#3-results-and-discussion","title":"3. Results and discussion","text":"Figure 3.1 shows some of the tree detection trials we performed, using Terrascan and DFT. Each trial corresponds to a different set of parameters and is represented either by gray dots or colored diamonds in a precision-recall plot (see the image caption for further details).
Figure 3.1: Precision vs. Recall of a subset of the tree detections we attempted, using different parameters in Terrascan and DFT. Colored diamonds represent the starting point (red) as well as our \"last stops\" in the parameter space, with (yellow, green) and without (orange) pre-processing. All the three test sectors are here combined.
Let us note that:
More detailed comments follow, concerning the best trials made with Terrascan and DFT.
"},{"location":"PROJ-TREEDET/#31-the-best-trial-made-with-terrascan","title":"3.1 The best trial made with Terrascan","text":"Among the trials we ran with Terrascan, the one which yielded the best F1-score was obtained using the following parameters:
| Parameter | Value |
| --- | --- |
| Method / Algorithm | Trunk |
| Classes | 4+5, cleaned and reclassified |
| Group planar surfaces | Off |
| Min height | 3.00 m |
| Max diameter | 0.40 m |
| Min trunk | 3.00 m |
| Group by density | On |
| Gap | Auto |
| Require | 1500 pts |
This trial corresponds to the green diamond shown in Figure 3.1.
Figure 3.2: Test sectors as segmented by the best trial made with Terrascan.
Figure 3.2 provides a view of the outcome on the three test sectors. Metrics read as follows:
| Sector | TP | FP | FN | Detectable (TP+FN) | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Global | 323 | 137 | 234 | 557 | 70.2% | 58.0% | 63.5% |
| Adrien-Jeandin | 177 | 69 | 160 | 337 | 72.0% | 52.5% | 60.7% |
| Bel-Air | 114 | 15 | 11 | 125 | 88.4% | 91.2% | 89.8% |
| Floraire | 32 | 53 | 63 | 89 | 37.6% | 33.7% | 35.6% |
Figure 3.3 provides a graphical representation of the same findings, with the addition of the metrics we computed before cleaning and reclassifying the LiDAR point cloud.
Figure 3.3: Cleaning and reclassifying the point cloud has a positive influence on precision and recall, although modest.
Our results confirm that the difficulty of the tree detection task depends on the sector at hand. Without any surprise, we acknowledge that the "Avenue de Bel-Air" sector is the easiest to process and "Parc Floraire" the hardest.
Cleaning and reclassification have a beneficial impact on precision and recall for all sectors as well as for the global context (TOT). While for BEL mainly recall profited from pre-processing, ADR and FLO showed a stronger increase in precision. For the global context, both precision and recall could be increased slightly.
Figure 3.4: The F1-score attained by our best Terrascan trial.
Figure 3.4 shows how our best Terrascan trial performed in terms of F1-score: globally, on a per-sector basis; with and without pre-processing.
We can notice that pre-processing slightly improves the F1-score for the global context as well as for the individual sectors. The largest impact was observed for the Bel-Air sector, especially when pre-processing included reclassification.
"},{"location":"PROJ-TREEDET/#32-the-best-trial-made-with-dft","title":"3.2 The best trial made with DFT","text":"The DFT trial yielding the highest global F1-score was obtained using the stem detection method and the following parameters:
| Parameter | Value |
| --- | --- |
| Method / Algorithm | Stem detection |
| Classes | 4+5, cleaned and reclassified |
| Search radius | 4.00 |
| Minimum length | 4.00 |
Here's a summary of the resulting metrics:
| Sector | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| Adrien-Jeandin | 75.4% | 36.5% | 49.2% |
| Bel-Air | 88.0% | 82.4% | 85.1% |
| Floraire | 47.9% | 36.8% | 41.7% |
| Global | 74.0% | 46.6% | 57.2% |
Similar comments to those formulated here apply: the \"Avenue de Bel-Air\" sector remains the easiest to process; \"Parc Floraire\" the hardest. However, here we acknowledge a bigger gap between the global F1-score and the F1-score related to the \"Adrien-Jeandin\" test sector.
Figure 3.5 shows how our best DFT trial performed in terms of F1-score: globally, on a per-sector basis; with and without pre-processing. We can notice that the impact of point cloud reclassification can be slightly positive or negative depending on the test sector.
Figure 3.5: The F1-score attained by our best DFT trial.
"},{"location":"PROJ-TREEDET/#33-comparison-terrascan-vs-dft","title":"3.3 Comparison: Terrascan vs. DFT","text":"Figure 3.6: Comparison of Terrascan and DFT in terms of F1-score.
The comparison of the best Terrascan trial vs. the best DFT trial in terms of F1-score shows that there is no clear winner (see Figure 3.6). Still, we can notice that:
In addition to applying our method to the 2021 high-density (HD) LiDAR dataset, we also tried using two other datasets exhibiting a far more standard point density (20-30 pts/m²):
The goal was twofold:
Concerning the 1st point, lower point densities make the "trunk method" unreliable, if not completely unusable. In Figure 3.7, we report results obtained with the watershed method, along with results related to the best performing trials obtained with the 2021 HD dataset. The scores we obtained with the standard-density (SD) datasets are far below the best we obtained with the HD dataset, confirming the interest of high-density acquisitions.
Figure 3.7: Comparison of F1-scores of the best performing trials. Parameters were optimized for each model individually.
Concerning the 2nd point, without any surprise we confirmed that parameters must be re-optimized for SD datasets. The usage of the set of parameters which were optimized on the basis of the HD dataset yielded poor results, as shown in Figure 3.8.
Figure 3.8: Using the parameters which were optimized for the high-density dataset leads to poor results (strong under-segmentation) on SD datasets. In accordance with the TS documentation we can see that the trunk method is unusable for lower and medium density datasets.
The watershed algorithm produces a more realistic segmentation pattern on the SD dataset, but still cannot reach the performance levels of the trunk or watershed methods on the HD dataset. After optimizing parameters, we could nevertheless obtain quite decent results (see Figure 3.9).
Figure 3.9: After a dataset-specific parameter optimization, convincing results can be achieved on the medium-density 2019 dataset (Terrascan's watershed method was used).
"},{"location":"PROJ-TREEDET/#35-tree-detection-over-the-full-2021-high-density-lidar-dataset","title":"3.5 Tree detection over the full 2021 high-density LiDAR dataset","text":"Clearly, from a computational point of view processing large point cloud dataset is not the same as processing small datasets. Given the extremely high density of the 2021 LiDAR datasets, we wanted to check whether and how Terrascan could handle such a resource-intensive task. Thanks to Terrascan's macro actions, one can split the task into a set of smaller sub-tasks, each sub-task dealing with a \"tile\" of the full dataset. Additionally, Terrascan integrates quite a smart feature, which automatically merges groups of points (i.e. trees) spanning multiple tiles.
Figure 3.10 provides a static view of the results we obtained, using the parameters which globally performed the best on the three sectors. We refer the reader to this Potree viewer (kindly hosted by the G\u00e9oportail du SITN) for an interactive view.
Figure 3.10: Result of the application of the best performing Terrascan parameters to the full dataset.
"},{"location":"PROJ-TREEDET/#4-conclusion-and-outlook","title":"4. Conclusion and outlook","text":"Despite all the efforts documented here above, the results we obtained are not as satisfactory as expected. Indeed, the metrics we managed to attain all sectors combined indicate that tree detections are neither reliable (low precision) nor exhaustive (low recall). Still, we think that results may be improved by further developing some ideas, which we sketch in the following.
"},{"location":"PROJ-TREEDET/#41-further-the-dft-parameter-space-exploration","title":"4.1 Further the DFT parameter space exploration","text":"We devoted much more time to exploring Terrascan's parameter space than DFT's. Indeed, as already stated here, we only explored the two parameters searchRadius
and minLenght
. Other parameters such as cellSize
, bandwidth
and verticalStep
were not explored at all (we kept default values). We think it is definitely worth exploring these other parameters, too.
Moreover,
We showed that the algorithms implemented by TerraScan and DFT yield much better results in sparse contexts (e.g. the "Avenue de Bel-Air" test sector) than in dense ones (e.g. the "Parc Floraire" test sector). This means that precision may be improved (at the expense of recall, though) if one could restrict tree detection to sparse contexts only, either as a pre- or post-processing step. We can think of at least a couple of methods which would allow one to (semi-)automatically tell sparse from dense contexts:
intrinsic method: after segmenting the point cloud into individual trees, one could analyze how close (far) each individual is to (from) the nearest neighbor and estimate the density of trees on some 2D or 3D grid;
extrinsic method: territorial data exist (see for instance the dataset \u00a0\"Carte de couverture du sol selon classification OTEMO\" distributed by the SITG), providing information about urban planning and land use (e.g.\u00a0roads, parks, sidewalks, etc.). These data may be analyzed in order to extract hints on how likely it is for a tree to be in close proximity with another, according to its position.
Detections coming from two or more independent trials (obtained with different software or else with the same software but different parameters) could be combined in order to improve either precision or recall:
recall would be improved (i.e.\u00a0the number of false negatives would be reduced) if detections coming from multiple trials were merged. In order to prevent double counting, two or more detections coming from two or more sources could be counted as just one if they were found within a given distance from each other. The algorithm would follow along similar lines as the ones which led us to the \"tagging and counting algorithm\" presented here above;
precision would be improved (i.e.\u00a0the number of false positives would be reduced) if we considered only those detections for which a consensus could be established among two or more trials, and discarded the rest. A distance-based criterion could be used to establish such consensus, along similar lines as those leading to our \"tagging and counting algorithm\".
Generic (i.e. not tailored for tree detection) clustering algorithms exist, such as DBSCAN (\"Density-Based Spatial Clustering of Applications with Noise\", see e.g. here), which could be used to segment a LiDAR point cloud into individual trees. We think it would be worth giving these algorithms a try!
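Purely as an illustration of the idea, here is how such a clustering could be prototyped with scikit-learn's DBSCAN (the point cloud and the parameter values are made up; in a real run, classes 4 and 5 of the LAS tiles would be loaded instead):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for vegetation points (x, y, z), in meters.
points = np.random.rand(1000, 3) * [50.0, 50.0, 20.0]

# eps and min_samples would have to be tuned to the point density,
# exactly like the Terrascan/DFT parameters discussed above.
labels = DBSCAN(eps=1.0, min_samples=20).fit_predict(points)
print(len(set(labels) - {-1}), "clusters;", (labels == -1).sum(), "noise points")
```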
"},{"location":"PROJ-TREEDET/#45-use-machine-learning","title":"4.5 Use Machine Learning","text":"The segmentation algorithms we used in this project do not rely on Machine Learning. Yet, alternative/complementary approaches might me investigated, in which a point cloud segmentation model would be first trained on reference data, then used to infer tree segmentations within a given area of interest. For instance, it would be tempting to test this Deep Learning model published by ESRI and usable with their ArcGIS Pro software. It would be also worth deep diving into this research paper and try replicating the proposed methodology. Regarding training data, we could generate a ground truth dataset by
The work documented here was the object of a Forum SITG which took place online on March 29, 2022. Videos and presentation materials can be found here.
"},{"location":"PROJ-TREEDET/#6-acknowledgements","title":"6. Acknowledgements","text":"This project was made possible thanks to a tight collaboration between the STDL team and some experts of the Canton of Neuch\u00e2tel (NE), the Canton of Geneva (GE), the Conservatoire et Jardin botaniques de la Ville de Gen\u00e8ve (CJBG) and the University of Geneva (UNIGE). The STDL team acknowledges key contributions from Marc Riedo (SITN, NE), Bertrand Favre (OCAN, GE), Nicolas Wyler (CJBG) and Gregory Giuliani (UNIGE). We also wish to warmly thank Matthew Parkan for developing, maintaining and advising us on the Digital Forestry Toolbox.
"},{"location":"TASK-4RAS/","title":"TASK-4RAS - HR, NH","text":"Schedule : September 2020 to February 2021 (initially planned from August 2021 February 2022)
This document describes the state of an ongoing task (DIFF) and is subject to daily revision and evolution
"},{"location":"TASK-4RAS/#context","title":"Context","text":"The 4D platform developed at EPFL with the collaboration of Cadastre Suisse is able to ingest both large scale point-based and vector-based models. During the previous development, the possibility to have this different type of data in a single framework lead to interesting results, showing the interest to have the possibility to put this different type of data into perspectives.
Illustrations of mixed models in the 4D platform : INTERLIS, Mesh and LIDAR - Data : SITN
Taking point-based and vector-based models into account allows covering almost all types of data that are traditionally considered for land registering.
The only type of data that is currently missing is two-dimensional rasters. Indeed, due to their nature, images are more complicated to put in perspective with other, three-dimensional data. The goal of this task is then to address the management of rasters by the platform, in order to be able to ingest, store and broadcast any type of data with the 4D platform.
"},{"location":"TASK-4RAS/#specifications","title":"Specifications","text":"In order to address this task, a step-by-step approach is defined. In the first place, a set of data has to be gathered from the STDL partners :
Gathering a dataset of geo-referenced ortho-photography of a chosen place of reasonable size
The dataset has to provide ortho-photography for at least two different times
The format of the dataset has to be analyzed in order to be able to extract the image pixels with their position (CH1903+)
As the platform indexation formalism is not straightforward, the images are treated as point-based models, each pixel being one colored point of the model. This will provide a way to start analyzing and understanding the indexation formalism while obtaining first results on image integration (a sketch of the pixel-to-point transformation is given after the following list) :
Transform images into simple point-based models (each pixel being one point)
Injection of the point-based model in an experimental instance of the platform
Understanding the indexation formalism for point-based models and, subsequently, its adaptation for the vector-based models
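As an illustration of the first item above, the following Python sketch converts a geo-referenced ortho-photo into one colored point per pixel (rasterio usage is an assumption; the file name and the constant elevation are made up):

```python
import rasterio
import numpy as np

# Hypothetical geo-referenced ortho-photo; the transform maps pixel
# indices to map coordinates (e.g. CH1903+).
with rasterio.open("orthophoto.tif") as src:
    bands = src.read()  # shape: (bands, rows, cols)
    rows, cols = np.indices(bands.shape[1:])
    xs, ys = rasterio.transform.xy(src.transform, rows.ravel(), cols.ravel())

points = []
for x, y, r, g, b in zip(xs, ys, *[bands[i].ravel() for i in range(3)]):
    points.append((x, y, 0.0, int(r), int(g), int(b)))  # one colored point per pixel
```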
Once the indexation formalism is understood for point-based models, the following adaptation will be performed :
At this point, a first reporting is required :
Is there an advantage to adding rasters to such a platform, in perspective of the other types of models (points, vectors, meshes) ?
How does the adaptation of the point-based indexation perform for images ?
How does taking advantage of color accumulation enrich the image integration ?
What is the cost of rendering the image with the adaptation of the point-based indexation ?
Based on the formulated answers, the following strategic choice has to be discussed :
Depending on the answer, a new set of specifications will be decided (if this direction is favored).
Depending on the remaining time and on the obtained results, the question of time management in the platform will be addressed. Currently, time is treated linearly in the platform; a multi-scale approach, as for the spatial dimensions, could be interesting. The specifications will be decided as the previous points are fulfilled.
"},{"location":"TASK-4RAS/#resources","title":"Resources","text":"List of the resources initially linked to the task :
Other resources will be provided according to requirements.
"},{"location":"TASK-DIFF/","title":"AUTOMATIC DETECTION OF CHANGES IN THE ENVIRONMENT","text":"Nils Hamel (UNIGE)
Project scheduled in the STDL research roadmap - TASK-DIFF September 2020 to November 2020 - Published on December 7, 2020
Abstract : Developed at EPFL with the collaboration of Cadastre Suisse to handle large-scale geographical models of different natures, the STDL 4D platform offers a robust and efficient indexation methodology allowing storage of, and access to, large-scale models to be managed. In addition to spatial indexation, the platform also includes time as part of the indexation, allowing any area to be described by models in both the spatial and temporal dimensions. In this development project, the notion of model temporal derivative is explored and proof-of-concepts are implemented in the platform. The goal is to demonstrate that, in addition to their formal content, models coming in different temporal versions can be derived along the time dimension to compute difference models. Such proof-of-concepts are developed for both point cloud and vectorial models, demonstrating that the indexation formalism of the platform considerably eases the computation of difference models. This research project demonstrates that the time dimension can be fully exploited in order to access the data it holds.
"},{"location":"TASK-DIFF/#task-context-difference-detection","title":"Task Context : Difference Detection","text":"As the implemented indexation formalism is based on equivalences classes defined on space and time, a natural discretization along all the four dimensions is obtained. In the field of difference detection, it allowed implementing simple logical operators on the four-dimensional space. The OR, AND and XOR operators were then implemented allowing the platform to compute, in real time, convolutions to compare models with each others across the time.
The implementation of these operators was simple thanks to the natural spatio-temporal discretization obtained from the indexation formalism. Nevertheless, two major drawbacks appeared : the first one is that such operators only work for point-based models. Computing and rendering differences and similarities between any type of data is not possible with such formal operators.
The second drawback comes from the nature of point-based capturing devices. Indeed, taking the example of a building, even without any change to its structure, two digitization campaigns can lead to disparities due only to measurement sampling. The XOR operator is the natural choice to detect and render differences, but it is very sensitive to sampling disparities. Computing the XOR convolution between two point-based models leads the rendering to be dominated by sampling variations rather than by the desired structural differences.
This drawback was partially solved by considering the AND operator. Indeed, the AND operator shows only the structural elements that remain constant between two different positions in time and is insensitive to sampling disparities. As shown in the following images, the AND operator shows differences as black spots (missing parts) :
AND convolution between two LIDAR models : Geneva 2005 and 2009 - Data : SITG

As one can see, AND convolutions allow detecting, through the black spots, large areas of structural changes between the two times and also, with more care, allow guessing smaller differences. Nevertheless, reading and interpreting such a representation remains complex for users.
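Because the indexation discretizes space and time into equivalence classes, these operators reduce to plain set operations on cell indices. A toy sketch (the cell tuples are made up for illustration):

```python
# Toy illustration: two temporal versions of a model, each reduced to the
# set of spatial index cells it occupies.
cells_2005 = {(12, 7, 3), (12, 8, 3), (13, 8, 3)}
cells_2009 = {(12, 7, 3), (13, 8, 3), (14, 8, 3)}

and_cells = cells_2005 & cells_2009  # stable structures (AND convolution)
xor_cells = cells_2005 ^ cells_2009  # differences, sampling-sensitive (XOR)
or_cells = cells_2005 | cells_2009   # union of both times (OR)
print(and_cells, xor_cells, or_cells)
```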
The goal of this task is then to tackle these two drawbacks: allowing the platform to detect changes not only for point-based models but also for vector-based ones, and implementing a variation of the XOR operator for point-based models that efficiently highlights structural evolution. The task then consists in the implementation, testing and validation of a difference detection algorithm suitable for any type of model, and in a formal analysis of the best rendering techniques.
"},{"location":"TASK-DIFF/#methodology","title":"Methodology","text":"A step by step methodology is defined to address the problem of difference detection in the platform. In a first phase, the algorithm will be developed and validated on vector-based models as follows :
Obtaining a large scale vector-based model on which synthetic variation are introduced
Development of the algorithm using the synthetic variations model
Testing and validation of the algorithm (using the known synthetic variations)
First conclusion
In a second phase, true land register data will be used to formally detect real evolutions of the territory :
Obtaining true land register vector-based models (INTERLIS) at different times
Analysis of the difference detection algorithm on true land register vector-based models
Second conclusion
In a third phase, the algorithm will be validated and adapted to work on point-based models :
Obtaining true land register point-based models (LAS) at different position in time
Verifying the performances of the vector-based detection algorithm on point-based data
Adaptation of the algorithm for point-based models
Analysis of the difference detection algorithm on true land register point-based models
Comparison of the detected differences on point-based models and on their corresponding land register vector-based models (INTERLIS)
Third conclusion
In addition, the development of difference detection algorithm has to be conducted keeping in mind the possible future evolutions of the platform such as addition of layers (separation of data), implementation of a multi-scale approach of the time dimension and addition of raster data in the platform.
"},{"location":"TASK-DIFF/#first-phase-synthetic-variations","title":"First Phase : Synthetic Variations","text":"In order to implements the vector-based difference detection algorithm, sets of data are considered as base on which synthetic differences are applied to simulate the evolution of the territory. This approach allows focusing on well controlled data to formally benchmark the results of the implemented algorithm. Experiments are conducted using these data to formally evaluate the performance of the developed algorithm.
"},{"location":"TASK-DIFF/#selected-resources-and-models","title":"Selected Resources and Models","text":""},{"location":"TASK-DIFF/#vector-models-line-based","title":"Vector Models : Line-based","text":"In this first phase, line-based data are gathered from openstreetmap in order to create simple models used during the implementation and validation of the detection algorithm. A first set of vector-based models are considered made only of lines. Three sets are created each with a different scale, from city to the whole Switzerland.
The line-based sets of data are extracted from openstreetmap shapefiles and the elevations are restored using the SRTM geotiff data. The EGM96-5 geoid model is then used to convert the elevation from MSL to ellipsoid heights. The following images give an illustration of these sets of data :
Line-based data-sets : Switzerland - Data : OSMThe following table gives a summary of the models sizes and primitives count :
Model Size (UV3) Primitive Count Frauenfeld 5.0 Mio 93.3 K-Lines Neuch\u00e2tel 33.1 Mio 620.2 K-Lines Switzerland 1.3 Gio 25.0 M-LinesIn order to simulate evolution of the territory in time, synthetic variations are added to these models. A script is developed and used to insert controlled variations on selected primitives. The script works by randomly selecting a user-defined amount of primitives of a model and by adding a variation on one of its vertex position using a user-specified amplitude. The variation is applied on the three dimensions of space.
"},{"location":"TASK-DIFF/#vector-models-triangle-based","title":"Vector Models : Triangle-based","text":"A second set of triangle-based models is also considered for implementing and validating the difference detection algorithm. The selected model is a mesh model of the Swiss buildings provided by swisstopo. It comes aligned in the CH1903+ frame with elevations. It is simply converted into the WGS84 frame using again the EGM96-5 geoid model :
Triangle-based data-sets : Switzerland - Data : swisstopoThe following table gives a summary of the models sizes and primitives count :
Model Size (UV3) Primitive Count Frauenfeld 116.9 Mio 1.4 M-Triangles Neuch\u00e2tel 842.2 Mio 10.5 M-Triangles Switzerland 30.5 Gio 390.6 M-TrianglesThese models are very interesting for difference detection as the ratio between primitive size and model amplitude is very low. It means that all the primitives are small according to the model coverage, especially for the Switzerland one.
The developed script for line-based models is also used here to add synthetic variations to the models primitives in order to simulate an evolution of the territory.
"},{"location":"TASK-DIFF/#models-statistical-analysis","title":"Models : Statistical Analysis","text":"Before using the models in the following developments, a statistical analysis is performed on the two Switzerland models, line and triangle-based. Each primitive of these two models are considered and their edges size are computed to deduce their distribution :
Statistical analysis : Models primitive edge size distribution, in meters, for the Switzerland models : line-based (left) and triangle-based (right)One can see that the line-based model comes with a much more broad distribution of the primitives size. Most of the model is made from lines between zero and twenty meters. In the case of the triangle-based models, the primitives are much smaller. As most of them are less than ten meters, a significant fraction of primitives is below one meter.
"},{"location":"TASK-DIFF/#implementation-of-the-algorithm","title":"Implementation of the Algorithm","text":"In order to compare two models at two different positions in time to detect differences, the solution is of course to search for each primitive of the primary time if it has a corresponding one in the secondary time. In such case, the primitives can be concluded as static in time and only the primitives that have no correspondence will be highlighted as differences.
A first approach was initially tested : a vertex-based comparison. As every primitive (points, lines and triangles) is supported by vertexes, it can be seen as a common denominator on which comparison can take place. Unfortunately, it is not a relevant approach as it leads to an asymmetric detection algorithm. To illustrate the issue, the following image shows the situation of a group of line-based primitives at two different times with an evolution on one of the primitive vertex :
Asymmetric approach : The variation is detected only when comparing backward in timeWhen the comparison occurs between the second time and the first one, the modified vertex correspondence is not found, and the vertex can be highlighted as a difference. The asymmetry appears as the first time is compared to the second one. In this case, despite the primitive vertex changed, the vertex-based approach is able to find another vertex, part of another primitive, and interprets it as a vertex identity, leading the modified primitive to be considered as static.
In order to obtain a fully symmetric algorithm, that does not depend on the way models are compared in times, a primitive-attached approach is considered. The implemented algorithm then treats the correspondence problem from the whole primitive point of view, by checking that the whole primitive can be found in the other model to which it is compared to. This allows to highlight any primitive showing a modification, regardless of the way models are compared and the nature of the modification.
In addition to highlighting the primitives that changed through time, the implemented algorithm also renders the primitives that have not changed. The primitives are then shown by modulating their color to emphasize the modifications by keeping their original color for the modified one, while the static primitives are shown in dark gray. This allows to not only show the modifications but also to keep the context of the modifications, helping the user to fully understand the nature of the territory evolution.
In addition to color modulation, a variation of difference rendering is analyzed. In addition to color modulation, a visual and artificial marker is added to ease their search. The visual marker is a simple line emanating from the primitive and goes straight up with a size of 512 meters. Such markers are introduced to ease the detection of small primitives that can be difficult to spot according to large point of views.
Additional developments were required for triangle-based models : indeed, such models need to be subjected to a light source during rendering for the user to understand the model (face shading). The previously implemented lighting model is then modified to take into account color modulation in order to correctly render the triangle that are highlighted. Moreover, the lighting model was modified to light both face of the triangles in order to light them regardless of the point of view.
In addition, as mesh models are made of triangles, primitives can hide themselves. It can then be difficult for the user to spot the highlighted primitives as they can be hidden by others. An option was added to the rendering client allowing the user to ask the rendering of triangles as line-loops or points in order to make them transparent. Finally, an option allowing the user to enable or disable the render face culling was added for him to be able to see the primitive from backward.
"},{"location":"TASK-DIFF/#results-and-experiments","title":"Results and Experiments","text":"With the implemented algorithm, a series of experiments are conducted in order to validate its results and to analyze the efficiency of the difference detection and rendering from a user point of view. In addition, experiments are also conducted to quantify the efficiency of the difference detection for automated processes.
"},{"location":"TASK-DIFF/#difference-detection-overview","title":"Difference Detection : Overview","text":"Considering the selected data-sets, each original model is injected at a given time and synthetic variations are added to a copy of it to create a second model injected at another time. The synthetic variations are randomly added to a small amount of primitives of the original model and are of the order of one meter. On the following examples, the detection is operated considering the original model as primary and the modified one as secondary.
The following images show examples of how the detection algorithm allows to highlight the detected differences while keeping the rest of the model using a darker color in case of line-based models :
Example of difference detection on line-based Frauenfeld (left) and Neuch\u00e2tel (right) models - Data : OSMOne can see how the modified primitives are highlighted while keeping the context of the modifications. The highlighted primitive is the one belonging to the primary time. Comparing the models in the other way around would lead the secondary model primitives to be highlighted.
Considering the Frauenfeld example, the following images show the situation at the primary time (original model) and at the secondary time (model with synthetic variations) :
Primary model (left) and secondary one (right) showing the formal situations - The modified primitive is circled in red - Data : OSM
As a result, the user can choose which differences are highlighted through the choice of the primary model, and can also switch back and forth between the models themselves through the platform interface.
Of course, the readability of the difference detection models depends on the size of the modified primitives and on the scale at which the user looks at the model. If the user adopts a large-scale point of view, the differences, even highlighted, can become difficult to spot. This issue worsens when triangle-based models are considered : in addition to primitive size, triangles also bring occlusions.
The visual markers added to the highlighted primitives can considerably ease the user's search for differences. The following images give an example of difference detection without and with the visual markers added by the algorithm :
Example of highlighted primitives without (left) and with (right) visual markers - Data : OSM
Considering the triangle-based models, difference detection is made more complicated by at least three aspects. The first is that 3D vector models are more complex than 2D ones : primitives (triangles) are more densely packed in the same regions of space in order to correctly model the buildings. The second is that triangles are solid primitives that introduce occlusions in the rendering, hiding other primitives. The last is that such models can contain very small primitives modeling the details of the buildings; these primitives can be difficult to see, even when highlighted.
The following images show an example of highlighted triangles on the Frauenfeld model :
Example of highlighted primitive on the Frauenfeld building model - Data : swisstopo
On the right image above, the highlighted triangle is underneath the roof of the house, forcing the user to adopt an unconventional point of view (from above the house) to see it. In addition, some primitives can be defined fully inside a volume closed by triangles, making them impossible to see without going inside the volume or playing with the triangle rendering mode.
In such a context, the visual markers become very important for such models, which come with large amounts of occlusion and small primitives :
Example of highlighted primitives without (left) and with (right) visual markers - Data : swisstopo
For triangle-based models, the markers appear mandatory for the user to locate the detected differences in a reasonable amount of time.
"},{"location":"TASK-DIFF/#difference-detection-user-based-experiments","title":"Difference Detection : User-Based Experiments","text":"In any case, for both line and triangle-based models, the difference detection algorithm is only able to highlight visible primitives. Depending on the point of view of the user, part of the primitives are not provided by the platform because of their small size. Indeed, the whole point of the platform is to allow the user to browse through arbitrary large models, which implies to provided only the relevant primitives according to its point of view.
As a result, the detection algorithm will not be able to highlight variations whose primitives are not returned by the platform's queries. The user then has to reduce his point of view, zooming in on the small primitives to make them appear and thus allow the algorithm to highlight them.
In order to show this limitation, an experiment is performed. For each model, a copy is made on which eight synthetic differences are randomly introduced. The variations are of the order of one meter. The models and their modified copies are injected in the platform. The rule is the following : the user runs the detection algorithm on each model and its modified copy and has five minutes to detect the eight differences. Each time a difference is seen by the user, the detection time is recorded. The user is allowed to use the platform however he wants. In each case, the experiment is repeated five times to get a mean detection rate.
These measures depend on the user and are difficult to interpret without a reference. In order to provide such a reference, the following additional experiment is conducted : each model and its modified copy are submitted to a naive automated detection process. This process parses each primitive of the original model and searches the modified copy for it. If the primitive is not found, the process triggers a difference detection. This process is called naive as it simply implements two nested loops, which is the simplest searching algorithm implementation. The process is written in C with full code optimization and executed by a single thread.
Starting with the line-based models, the following figures show the difference detection rates over time. For each of the three models, the left plots show the rate without visual markers, the middle ones with visual markers and the right ones the naive process detection rate :
Frauenfeld : The black curve shows the mean detection rate while the blue (left, middle) and red (right) area gives the worst and best rates
Left : without visual markers - Middle : with visual markers - Right : automated process
Canton of Neuch\u00e2tel : The black curve shows the mean detection rate while the blue (left, middle) and red (right) area gives the worst and best rates
Left : without visual markers - Middle : with visual markers - Right : automated process
Switzerland : The black curve shows the mean detection rate while the blue (left, middle) and red (right) area gives the worst and best rates
Left : without visual markers - Middle : with visual markers - Right : automated process
As expected, the larger the model, the more difficult it is for the user to find the highlighted differences, with or without visual markers. For a city, differences, even of the order of one meter, are quickly spotted. As the model gets larger, it takes more time for the user to find the differences. On a model covering a whole canton (Neuch\u00e2tel), most of the differences are detected in a reasonable amount of time despite their small size relative to the overall model. On the Swiss model, things get more complicated : merely looking at each part of the country within five minutes is already challenging, leading to a lower detection rate, even with the visual markers.
These results are consistent with the statistical analysis made on the line-based Switzerland model. Detection on a city, or even a whole canton, leads the user to adopt a point of view close enough for most of the primitives to appear. On the Switzerland model, the user is forced to adopt a larger point of view, leading a significant proportion of primitives to stay hidden.
These results also show that adding visual markers to the highlighted primitives increases the user detection rate, meaning that the markers lead to a more suitable rendering from the user experience point of view.
Considering the user results and the naive detection process, one can see that the user obtains at least similar results and most of the time outperforms the automated process. This demonstrates how the platform's implementation and data broadcasting strategy provide an efficient way to access models and composite models, here in the context of difference detection.
The following figures show the experiment results for the triangle-based models; the experiments were not performed on the whole Switzerland model due to limited rendering capabilities :
Frauenfeld : The black curve shows the mean detection rate while the blue (left, middle) and red (right) area gives the worst and best rates
Left : without visual markers - Middle : with visual markers - Right : automated process
Canton of Neuch\u00e2tel : The black curve shows the mean detection rate while the blue (left, middle) and red (right) area gives the worst and best rates
Left : without visual markers - Middle : with visual markers - Right : automated process
Similar conclusions apply to the triangle-based models : the larger the model, the more difficult the difference detection. These results also confirm that adding visual markers on top of primitive highlighting significantly helps the user, particularly for triangle-based models.
The results obtained on triangle-based models are lower than for line-based models. A first explanation is the greater number of primitives, which leads the user to spend more time at each successive point of view. The occlusion problem also seems to play a role, but to a lesser extent, as the visual markers seem to largely solve it. The main explanation is to be found in the statistical analysis of the triangle-based models : a large proportion of their primitives are very small (less than a meter), so they are rendered only when the user adopts a close point of view, making detection much harder within such a short amount of time.
The triangle-based models being larger than the line-based ones, the results of the naive process are very poor. As in the line-based experiments, the user outperforms this automated process, by an even larger margin.
"},{"location":"TASK-DIFF/#difference-detection-process-based-experiments","title":"Difference Detection : Process-Based Experiments","text":"In the previous experiments, the user ability to find the differences on the data-sets, using synthetic variations, was benchmark in perspective of the results provided by a naive automated process. The user performs quite well using the platform, but start to struggle as the data-sets get bigger according to the sizes of their primitives.
In this second set of experiments, the platform is used through an automated process instead of a user. The process has the same task as the user, that is, finding the eight synthetic differences introduced in the model copies. The process starts with a list of indexes (the discretization cells of the platform), queries the corresponding data from the platform, and searches for differences in each cell. The process thus implements a systematic difference detection covering the whole model.
In order for the process to work, it requires an input index list. To create it, the primitive injection condition of the platform is used to determine the maximal depth of these indexes. The following formula gives the injection condition of poly-vertex (line and triangle) primitives according to the platform scale; in other words, it gives the shallowest scale at which a primitive is considered through queries, according to its size :

s = ceil( log2( pi R / e ) )

where s gives the shallowest scale, R is the WGS84 major semi-axis and e is the largest distance, in meters, between the primitive's first vertex and its other vertices. For example, choosing s = 26 allows the index to reach any primitive that is greater than ~30 cm over the whole model covered by the index.
The scale 26 is then chosen as the deepest search scale in the following experiments. This value can be adapted according to the primitive sizes and to the nature of the detection process. The larger it is, the more data are broadcast by the platform, increasing the processing time.
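The relation is easy to evaluate; assuming the reading of the formula above, the following sketch computes the shallowest scale for a given primitive extent, and conversely the smallest extent reachable at a given scale :

```c
#include <math.h>
#include <stdio.h>

#define WGS84_A 6378137.0 /* WGS84 major semi-axis, in meters */

/* Shallowest scale at which a primitive of extent e (meters) is injected. */
int shallowest_scale(double e) {
    return (int)ceil(log2(M_PI * WGS84_A / e));
}

/* Smallest primitive extent (meters) reachable at scale s. */
double reachable_extent(int s) {
    return M_PI * WGS84_A / pow(2.0, (double)s);
}

int main(void) {
    printf("scale for 0.30 m : %d\n", shallowest_scale(0.30)); /* 26 */
    printf("extent at s = 26 : %.2f m\n", reachable_extent(26)); /* ~0.30 */
    printf("extent at s = 27 : %.2f m\n", reachable_extent(27)); /* ~0.15 */
    return 0;
}
```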
In order to compare the user-based experiments, the naive automated approach and this process-based exhaustive search, the same protocol is considered. The process addresses queries to the platform, based on the index list, and saves the detection time of each difference. The detection rate is plotted in the same way as for the previous experiments. Again, eight synthetic differences are randomly introduced, and the experiment is repeated five times for the line-based model and only two times for the triangle-based model.
As the scale 26 is chosen as the deepest search scale, the index list can be built in different ways. Indeed, as a query is made of a spatial index, which points at the desired cell, and an additional depth (span), which specifies the density of data, the only constraint that maintains the deepest search scale at 26 is the following :

index scale + span = 26

where the two left-hand-side terms are the spatial index scale and the span value. In these experiments, a first list of indexes is built using a span of 9 and a second with a span of 10. As the deepest scale is kept constant, increasing the span reduces the index list size, but the queried cells contain more data to analyze.
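Schematically, the exhaustive search is a loop over the precomputed index list, querying each cell at both times and running the per-cell detection. The following sketch illustrates this structure; query_cell() and detect_in_cell() are hypothetical stand-ins for the platform's actual client interface :

```c
#include <stddef.h>

/* Hypothetical stand-ins for the platform client. */
typedef struct { const char *index; int span; double time; } cell_t;

static cell_t query_cell(const char *index, int span, double time) {
    cell_t c = { index, span, time }; /* placeholder for a platform query */
    return c;
}

static int detect_in_cell(cell_t a, cell_t b) {
    (void)a; (void)b; /* placeholder for the per-cell difference detection */
    return 0;
}

/* Systematic detection : each entry of the index list is queried at both
 * times with the same span; the constraint index scale + span = 26 is
 * assumed to hold for every entry of the list. */
int exhaustive_search(const char **index_list, size_t n, int span,
                      double t1, double t2) {
    int differences = 0;
    for (size_t i = 0; i < n; i++)
        differences += detect_in_cell(query_cell(index_list[i], span, t1),
                                      query_cell(index_list[i], span, t2));
    return differences;
}
```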
The following figures show the mean detection rate for the Switzerland line-based model with the deepest scale at 26 and the span at 9 and 10. The plots are scaled in the same way as for the user-based experiments :
Switzerland : The black curve shows the mean detection rate while the blue area gives the worst and best rates - Span at 9 (left) and 10 (right)
One can see that the detection rate on such a model is much better than with the user-based or naive approaches. In a matter of five minutes, with the span set to 10, the eight differences can be detected and reported. The full detection process took ~5 minutes with the span set to 10 and ~8 minutes with the span set to 9. This shows how the platform can be used by automated processes as an efficient data provider. In addition, as the data are queried by the automated process, the geometry of the detected primitives is directly available, allowing all sorts of subsequent processes to take place.
As the deepest scale was set to 26, in one of the five measurement sessions one of the eight differences was not detected at all. It means that the primitive on which a synthetic variation was introduced is smaller than 30 cm and was therefore not reached by any index. This shows the importance of defining the spatial indexes and spans according to the process's needs. For example, increasing the deepest scale to 27 would allow reaching primitives down to ~15 cm over the whole of Switzerland, and so on.
The following figures show the mean detection rate for the Switzerland triangle-based model. In this case, only two measurement sessions were made to limit the time spent on this analysis :
Switzerland : The black curve shows the mean detection rate while the blue area gives the worst and best rates - Span at 9 (left) and 10 (right)
The conclusions remain, but the rate is slower in this case, as the model contains many more primitives than the line-based one. The full detection process took ~15 minutes with the span set to 10 and ~20 minutes with the span set to 9. Again, in one of the two measurement sessions, one difference was not detected due to the size of the primitive. Nevertheless, these results show how the platform, seen as a process data provider, makes it possible to outperform user-based and classic detection approaches.
Such a process-based strategy can be carried out in many ways, depending on the needs. For example, the index list can be limited to a specific area or set to focus on spread-out, predefined locations (for example the intersections of the Swiss hectometric grid). The following image gives a simple example of how the detected differences can be leveraged. As the geometry of the differences is known by the process, a summary of the differences can be provided through a simple map :
Example of a differences map based on the results of the detection process - Data : SRTM
The eight synthetic differences are clearly presented, allowing a user to analyze them in more detail, for example through the platform interface. This map was created by detecting the eight differences on the line-based Switzerland model in about 5 minutes with the span set to 10.
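Since the geometry of each detected difference is available to the process, producing such a summary can be as simple as exporting the difference positions. A minimal sketch writing a GeoJSON point collection, as one possible illustration (the format choice is an assumption, not the tool used for the map above) :

```c
#include <stdio.h>
#include <stddef.h>

/* Write detected difference positions (longitude/latitude, WGS84) as a
 * GeoJSON FeatureCollection of points, displayable on any web map. */
int write_difference_map(const char *path,
                         const double (*lonlat)[2], size_t n) {
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    fprintf(f, "{\"type\":\"FeatureCollection\",\"features\":[");
    for (size_t i = 0; i < n; i++)
        fprintf(f, "%s{\"type\":\"Feature\",\"geometry\":{\"type\":\"Point\","
                   "\"coordinates\":[%.8f,%.8f]},\"properties\":{}}",
                i ? "," : "", lonlat[i][0], lonlat[i][1]);
    fprintf(f, "]}");
    return fclose(f);
}
```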
"},{"location":"TASK-DIFF/#conclusion-first-phase","title":"Conclusion : First Phase","text":"During this first phase, the difference detection algorithm was developed and validated on both line-based and triangle-based data. An efficient algorithm is then implemented in the platform allowing emphasizing differences between models at different temporal positions. The algorithm is able to perform the detection on the fly with good performances allowing the users to dynamically browse the data to detect and analyze the territory evolutions.
The performance of the detection algorithm also makes the platform suitable for automated detection processes, acting as a data provider that answers large numbers of queries efficiently and remotely.
Two variations of the difference detection algorithm are implemented. The first consists in highlighting the primitives that are subject to modification over time. This variation is suitable for automated processes, which can rely on simple search methods to list the differences.
For users, this first variation can make visual detection of the differences more difficult, especially when the highlighted primitives are small or hidden by others. For this reason, visual markers were added on top of the highlighted primitives so that they can be seen from far away, regardless of primitive size. The measurement sessions made during the user-based experiments showed a clear improvement of the detection rate when using the visual markers. This was especially true for triangle-based models, where the primitives bring occlusions.
The user-based experiments showed that, using the platform interface, a human can significantly outperform the results of a naive automated process operating on the models themselves. The experiments showed that the user is able to efficiently search through space and time and find the evolutions of the territory appearing in the data.
Of course, as the model size and complexity increase, the user-driven interface starts to show its limits. In such cases, the process-based experiments showed that automated processes can take over these more complicated searches, through methods that perform exhaustive detection over wide models in a matter of minutes.
At this point, the developments and validations of the algorithm, and its variations, were conducted on synthetic modifications introduced in models using controlled procedures. The next phase focuses on formal data extracted from land registers.
"},{"location":"TASK-DIFF/#second-phase-true-variations","title":"Second Phase : True Variations","text":"In this second phase, also dedicated to vector-based models, the focus is set on applying the developed difference detection algorithm on true land register models. Two sets of data are considered in order to address short-term and long-term difference detection.
"},{"location":"TASK-DIFF/#selected-resources-and-models_1","title":"Selected Resources and Models","text":"In both cases, short-term and long-term, INTERLIS data are considered. A selection of tables in different topics is performed to extract the most interesting geometries of the land registering. For all models, the following colors are used to distinguish the extracted layers :
INTERLIS selected topics and tables colors - Official French and German designations
The layers are chosen according to their geometric content. The color assignment is arbitrary and does not correspond to any official colorization standard.
"},{"location":"TASK-DIFF/#short-term-difference-detection-thurgau","title":"Short-Term Difference Detection : Thurgau","text":"For the short-term application of the difference detection algorithm, the case of the Thurgau canton is considered. Two set of INTERLIS data are considered that are very close in time, of the order of days. The selected layers are extracted from the source files before to be converted to the WGS84 frame using the EGM95-6 geoid model. The heights are restored using the SRTM topographic model. The following images give an illustration of the considered data :
Canton of Thurgau (left) and close view of Frauenfeld (right) - Data : Kanton Thurgau
Two INTERLIS models are considered, with times 2020-10-13 and 2020-10-17 corresponding to the models' gathering times. The following table gives the models' sizes and primitive counts :
| Model | Size (UV3) | Primitive Count |
| --- | --- | --- |
| Thurgau 2020-10-13 | 203.7 Mio | 3.8 M-Lines |
| Thurgau 2020-10-17 | 203.8 Mio | 3.8 M-Lines |

As the two models are very close in time, they are very similar in size and content, the number of corrections made during the considered time range being small.
"},{"location":"TASK-DIFF/#long-term-difference-detection-geneva","title":"Long-Term Difference Detection : Geneva","text":"For the long-term difference detection analysis, the Geneva case is selected as the canton of Geneva keeps a copy of each land register model for each month from at least 2009. This allows to compare INTERLIS models that are further away from each other from a temporal point of view. The selected layers are extracted and converted to the WGS84 coordinates system using the EGM96-6 geoid model. Again, the SRTM model is used to restore the heights. The following images give an illustration of the selected models :
Canton of Geneva in 2019-04 (left) and close view of Geneva in 2013-04 (right) - Data : SITG
The selected models are not chosen randomly along the time dimension. Models that correspond to the Geneva LIDAR campaigns are selected, as they are used in the next phase. In addition, as the LIDAR campaigns are well spread along the time dimension, the selected models are far apart in time, by at least two years. The following table summarizes the models' sizes and primitive counts :
| Model | Size (UV3) | Primitive Count |
| --- | --- | --- |
| Geneva 2009-10 (MN03) | 550.2 Mio | 10.3 M-Lines |
| Geneva 2013-04 | 407.0 Mio | 7.6 M-Lines |
| Geneva 2017-04 | 599.6 Mio | 11.2 M-Lines |
| Geneva 2019-04 | 532.6 Mio | 9.9 M-Lines |

As the temporal gaps between the models are much larger than for the Thurgau models, the sizes and primitive counts show larger variations across time, indicating that numerous differences should be detected on these data.
"},{"location":"TASK-DIFF/#models-statistical-analysis_1","title":"Models : Statistical Analysis","text":"As in the first phase, a statistical analysis of the Thurgau and Geneva models is conducted. The following figures show the line length distribution of the two Thurgau models :
Statistical analysis : Primitive size distribution, in meters, for the Thurgau 2020-10-13 (left) and 2020-10-17 (right)
As expected, since the models are very similar, their distributions are almost identical. In both cases, the distribution is centered around two meters and is mostly contained within the [0,5] range. The following figures show the same statistical analysis for the Geneva models, more spread along the time dimension :
Statistical analysis : Primitive size distribution, in meters, for the Geneva 2009-10 (top-left), 2013-04 (top-right), 2017-04 (bottom-left) and 2019-04 (bottom-right)
One can see that the distribution varies more from one time to another. In addition, compared with the Thurgau models, the Geneva models tend to have smaller primitives, mostly distributed in the [0,1] range, with a narrower distribution.
"},{"location":"TASK-DIFF/#results-and-analysis","title":"Results and Analysis","text":""},{"location":"TASK-DIFF/#short-term-thurgau","title":"Short-Term : Thurgau","text":"In the case of Thurgau data, the models are only separated in time by a few days. It follows that only a small amount of differences is expected. As an introduction, the following images show the overall situation of the difference detection between the two models. The differences are highlighted by keeping the primitive original color while identities are shown in dark gray to allow context conservation :
Overall view of difference detection : Thurgau (right) and Amriswil (left)
As expected, since the two models are very close in time, only a limited number of differences is detected. Such a situation gives a clear view and understanding of each difference.
In order to analyze the results of the difference detection algorithm on real cases, selected differences, found using the algorithm itself, are studied in more detail to emphasize the algorithm's ability to detect differences and make them understandable for the user. As a first example, the case of the Bielackerstrasse in Amriswil is considered and illustrated by the following images :
Example of difference detection : Bielackerstrasse in Amriswil - 2020-10-17 (right) and 2020-10-13 (left) as primary time
In this case, new buildings are added to the official land register. As the 2020-10-17 time is selected as primary, the highlighted elements correspond to the footprints of the added buildings. When the 2020-10-13 time is set as primary, since it does not contain the building footprints, the highlighted elements only correspond to the elements re-measured for land register correction. This illustrates the asymmetry of the difference detection algorithm, which only highlights primitives of the primary time.
In addition, by keeping the color of the highlighted primitives, the difference detection algorithm makes it immediately visible that three layers of the land register have been affected by the modification (German : Einzelobjekte, Flaechenelement Geometrie; Bodenbedeckung, BoFlaeche Geometrie; Einzelobjekte, Linienelement). The following images show the respective situations of the 2020-10-13 and 2020-10-17 models :
Situation of Bielackerstrasse in Amriswil - 2020-10-17 (right) and 2020-10-13 (left)
This confirms the analysis deduced from the difference detection algorithm : a group of new buildings was added to the land register. In this example, had the inner road not been re-measured, at least in some portion, the difference detection with 2020-10-13 as primary time would have shown nothing.
To illustrate the asymmetry of the algorithm more clearly, the example of Mammern is considered. In the following images, the result of the difference detection is illustrated with both times chosen successively as primary :
Example of difference detection : Mammern - 2020-10-17 (right) and 2020-10-13 (left) as primary time
In this specific example, one can see that when the 2020-10-17 time, the most recent one, is chosen as primary, nothing is highlighted by the detection algorithm. But when the 2020-10-13 time is set as primary, a specific element appears highlighted, showing an evolution of the land register. This example illustrates the deletion of a sequence of primitives of the property layer (German : Liegenschaften, ProjLiegenschaft Geometrie) of the land register, which then only appears when the oldest time is set as primary. The following images show the situation at both times :
Situation of Mammern - 2020-10-17 (right) and 2020-10-13 (left)
This example shows the opposite of the previous situation : elements were deleted from the land register instead of added.
As a last example, an in-between situation is selected. The case of the Trungerstrasse in M\u00fcnchwilen is considered and illustrated by the following images showing both times as primary :
Example of difference detection : Trungerstrasse in M\u00fcnchwilen - 2020-10-17 (right) and 2020-10-13 (left) as primary time
This situation is in between the two previous ones, as nothing really appeared and nothing really disappeared from the land register. A modification was made to the situation of this specific property, and it therefore appears no matter which of the two times is selected as primary. The following images show the formal situation of the land register for the two times :
Situation of Trungerstrasse in M\u00fcnchwilen - 2020-10-17 (right) and 2020-10-13 (left)
One can see that the corrections made are around the indicated house, such as the access road and the rear delimitation. For this type of situation, the algorithm recovers some kind of symmetry, as the choice of the primary time is not relevant to detect the difference.
To conclude this short-term difference detection analysis, the efficiency of visual markers is illustrated in the region of Romanshorn and Amriswil in the following images, which show the difference detection rendering without and with the visual markers :
Illustration of difference detection without (right) and with (left) visual markers - 2020-10-17 as primary time for both images
One can see that, for small highlighted primitives, the visual markers ease the viewing of differences for the user. Of course, when the highlighted primitives are big enough, or when the point of view is very close to the model, the benefit of the visual markers decreases.
"},{"location":"TASK-DIFF/#long-term-geneva","title":"Long-Term : Geneva","text":"Considering the Geneva land register, the compared model are much more spread along the time dimension, leading to a much richer difference model. Starting with the 2019-04 and 2017-04 models, the following images gives an overview of the detected differences on the whole canton :
Overall view of difference detection between Geneva 2019-04 and 2017-04 models with 2019-04 as primary
In this example, a much larger number of differences is detected, as the models are separated by two years. As a first observation, large portions of the model seem to have entirely moved between the two dates. Three of these zones are clearly visible in the images above, as all their content is highlighted by the difference detection algorithm : the upper half of the Geneva commune, the Carouge commune and the left half of the Plan-les-Ouates commune; more can be seen by looking closely.
These zones have been subjected to corrections during the time interval separating the two models. These corrections mainly come from the FINELTRA [1] adjustment used to ensure conversion between the old Swiss coordinate system MN03 and the current MN95 standard. As these corrections operate on every coordinate, the whole area is modified by a few centimeters. Under these conditions, the whole area is highlighted by the difference detection algorithm, as illustrated by the following image of the Carouge commune :
Closer view of the Carouge 2019-04 and 2017-04 differences with 2019-04 as primary
In this closer view, one can see that almost all the primitives of this specific commune have been corrected. Some exceptions remain : the train tracks, for example, appear static between the two models. Looking more closely, one can also observe that some primitives were not affected by the correction.
Looking at the areas that have not been corrected through the FINELTRA triangular model, one can see that a lot of modifications appear. For example, the following two images give the differences for the historical part of Geneva and the Verbois dam :
Closer view of the Historical city (left) and Verbois dam (right) 2019-04 and 2017-04 differences with 2019-04 as primary
One can see that, although very few elements truly changed, a lot of primitives are highlighted as differences. This can be explained by constant correction work based on in-situ measurements. Other factors can also explain this large amount of differences, such as scripts used to correct the data and bring them to the expected Swiss standards.
In such a context, detecting real changes of the territory is much more complicated, as large amounts of detected differences are due to corrections of the model itself, without any underlying true modification of the territory. Nevertheless, differences that correspond to a true territory modification can be found. The following images show an example on the Chemin du Signal in Bernex :
Differences on Chemin du Signal in Bernex with 2019-04 (left) and 2017-04 (right) as primary
These differences can be detected by the user in the difference model, as they appear more clearly due to an accumulation of highlighted primitives. Indeed, in case of simple corrections, the highlighted primitives appear more isolated. The following images give the formal situation for the two times :
Situation of Chemin du Signal in Bernex in 2019-04 (left) and 2017-04 (right)
In this example, one can see that, with either time as primary, the territory evolution can be seen by the user, as the highlighted primitives are more consistent. Nevertheless, territory changes are more difficult to list in such a case than in the previous short-term analysis. The following images give two examples of visible territory changes in the difference model :
La Gradelle (left) and Puplinge (right) 2019-04 and 2017-04 differences with 2019-04 as primary
In the left image, a clear block of buildings stands out as more highlighted than the rest of the difference model and corresponds to new buildings. To its right, a smaller block can be seen that also corresponds to new buildings. In the right image, a clear block of new buildings is also visible as more highlighted. In such cases, the user has to spend more effort to detect the differences that correspond to true changes in the territory, the difference model primarily showing land register modifications rather than the proper territory evolution.
Considering the 2013-04 model, similar observations apply, with stronger effects due to the larger temporal gap. The difference models are dominated by corrections made to the model rather than proper territory changes. Comparing the 2017-04 and 2013-04 models leads to even more difficult detection of these true modifications, as the corrections widely dominate the difference models.
The case of the 2009-10 model is made even worse by its coordinate system, as it is expressed in the old MN03 frame. This model is very difficult to compare with the three others, expressed in the MN95 frame, as all its primitives are highlighted in the difference models due to the conversion performed between the MN03 and MN95 frames. Comparing the 2009-10 model with the 2013-04 one leads to no primitive being detected as identity, leaving only differences.
"},{"location":"TASK-DIFF/#conclusion-second-phase","title":"Conclusion : Second Phase","text":"Two cases have been addressed in this phase showing each specific interesting application of the difference detection applied on land register data through the INTERLIS format. Indeed, short and long term differences emphasize two different points of view according to the analysis of the land register and its evolution in time.
In the first place, the short-term application clearly showed how difference detection and its representation open a new point of view on the evolution of the land register, as it allows focusing on clear and well-identified modifications. As the compared models are close in time, one is able to produce difference models that clearly show, modification by modification, what happened between the two compared situations, allowing each evolution to be examined in order to fully understand the modification.
It follows that this short-term difference detection can provide a useful approach for users of the land register who are more interested in the evolution of the model than in the model itself. The difference models can provide users with a clear and simple view of what to search for and analyze to understand the evolution of such complex models. In some way, the differences on land register models can be seen as an additional layer proposed to the user, giving access to information that is not easy to extract from the models themselves.
The case of Geneva, illustrating long-term difference detection, showed another interesting point of view. In the first place, one has to understand that land register models are complex, living models, not only affected by the transcription of the real-world situation across time.
Indeed, on the Geneva models, a large number of differences is detected even over a relatively short period of time (two years). In addition to the regular updates following the territory's evolution, a large number of corrections is made to keep the model in the correct reference frame. The Swiss federal system also adds complexity, as all cantons have to align themselves with a common set of expectations.
In such a case, difference detection turned out to be an interesting tool to understand and follow the corrections made to the model in addition to the regular updates. In the Geneva case, we illustrated this by detecting, in the difference model, the correction of the coordinate frame over large pieces of the territory. This shows how difference detection can be seen as a service that helps keep track of the life of the model by detecting and checking these types of modifications.
As a result, difference detection can be a tool for users of the land register, but also for the land register authorities themselves. The difference models can be used to check and audit the evolution of the models, helping with the required follow-up on the applied corrections and updates.
"},{"location":"TASK-DIFF/#third-phase-point-based-models","title":"Third Phase : Point-Based Models","text":"In this third and last phase, the developed algorithm for difference detection on vector models is tested on point-based ones. As mentioned in the introduction, the platform was already implementing logical operators allowing comparing point-based models across time. As illustrated in the introduction, only the AND operator allowed emphasizing differences, but rendering them as missing part of the composite models. It was then difficult for the user to determine and analyze those differences.
The goal of this last phase is to determine to what extent the developed algorithm is able to improve the initial results of the point-based logical operators, and how it can be adapted to provide better detection of differences.
"},{"location":"TASK-DIFF/#selected-resources-and-models_2","title":"Selected Resources and Models","text":""},{"location":"TASK-DIFF/#point-based-models-lidar","title":"Point-Based Models : LIDAR","text":"Smaller data-sets are considered as point-based models are usually much larger. The city of Geneva is chosen as an example. Four identical chunks of LIDAR data are considered covering the railway station and its surroundings. The four models correspond to the digitization campaigns of 2005, 2009, 2013 and 2017. The data are converted from LAS to UV3 and brought to WGS84 using the EGM96-5 geoid model. The following images give an overview of the selected models :
Point-based data-sets : Geneva LIDAR of 2005 (left) and 2009 (right) - Data : SITG
The following table gives a summary of the models' sizes and primitive counts :
| Model | Size (UV3) | Primitive Count |
| --- | --- | --- |
| Geneva 2005 | 663.2 Mio | 24.8 M-Points |
| Geneva 2009 | 1.2 Gio | 46.7 M-Points |
| Geneva 2013 | 3.9 Gio | 4.2 G-Points |
| Geneva 2017 | 7.0 Gio | 7.5 G-Points |

The color of the models corresponds to the point classification. In addition, the density of the models increases considerably with time, from 1 point/m^2 (2005) to 25 points/m^2 (2017). This disparity of density is considered part of the sampling disparity, leading to a set of data that is very interesting for analyzing and benchmarking the difference detection algorithm.
"},{"location":"TASK-DIFF/#models-statistical-analysis_2","title":"Models : Statistical Analysis","text":"As for line and triangle-based models, a statistical analysis of the point-based models is performed. The analysis consists in computing an approximation of the nearest neighbor distance distribution of points. The following figure shows the distribution of the 2005 and 2009 models :
Statistical analysis : Nearest neighbor distribution approximation of the 2005 (left) and 2009 (right) models
The following figure shows the results for the 2013 and 2017 models :
Statistical analysis : Nearest neighbor distribution approximation of the 2013 (left) and 2017 (right) models
The nearest neighbor distribution tends toward zero as the year of acquisition increases, showing that modern models are significantly denser than older ones, which makes these models interesting for analyzing the difference detection algorithm.
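Such a distribution can be approximated by sampling a subset of the points and computing, for each sample, the distance to its nearest neighbor among the other points; the distances are then accumulated into a histogram. The following brute-force sketch illustrates this idea (the sampling strategy is an assumption, not the exact procedure used for the figures above) :

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, y, z; } point_t;

/* For n_sample randomly chosen points of the cloud, compute the distance to
 * the nearest other point (brute force). The resulting values can then be
 * binned into a histogram to approximate the distribution. */
void nn_distance_sample(const point_t *cloud, size_t n,
                        double *out, size_t n_sample) {
    for (size_t s = 0; s < n_sample; s++) {
        size_t i = (size_t)rand() % n; /* random sample index */
        double best = INFINITY;
        for (size_t j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = cloud[j].x - cloud[i].x;
            double dy = cloud[j].y - cloud[i].y;
            double dz = cloud[j].z - cloud[i].z;
            double d = sqrt(dx * dx + dy * dy + dz * dz);
            if (d < best) best = d;
        }
        out[s] = best;
    }
}
```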
"},{"location":"TASK-DIFF/#differences-detection-algorithm-direct-application-on-point-based-models","title":"Differences Detection Algorithm : Direct Application on Point-Based Models","text":"In order to determine the performances of the difference detection algorithm on the selected point-based models, the algorithm is simply applied without any adaptation on the data-sets and the results are analyzed. The following images give an overview of the obtained results comparing the 2005 and 2009 models :
Application of the difference detection algorithm on point-based models : Geneva model of 2005 and 2009 with 2005 as primary (left) and inversely (right) - Data SITG
One can see that the obtained results are very similar to those obtained with the previously implemented XOR logical operator. The only difference is that the identical points are shown (in dark gray) along with the highlighted points (showing the differences). The same conclusion applies : the obtained composite model is difficult to read, as it is dominated by sampling disparities. By looking carefully at the model, one can end up detecting large modifications by searching for accumulations of highlighted points. In addition, taking one model or the other as primary does not really help, as shown in the images above. The same conclusion applies even when the two compared models come with a similar point density, as with the 2013 and 2017 models :
Application of the difference detection algorithm on point-based models : Geneva model of 2013 and 2017 with 2013 as primary (left) and inversely (right) - Data SITG
One can nevertheless observe that choosing the less dense model as primary leads to slightly clearer results for difference detection, but they remain very hard to interpret for a user, and even more so for automated processes.
In addition, the performance of the algorithm is very poor, as point-based models are much denser in terms of primitives than line or triangle-based models. These reasons lead to the conclusion that the algorithm cannot be used directly on point-based models and needs a more specific approach.
"},{"location":"TASK-DIFF/#differences-detection-algorithm-adaptation-for-point-based-models","title":"Differences Detection Algorithm : Adaptation for Point-Based Models","text":"In order to adapt the difference detection algorithm for point-based models, two aspects have to be addressed : the efficiency of the detection and the reduction of the sampling disparities over-representation, which are both server-side operations.
The problem of efficiency can be solved quite easily if the adaptation of the difference detection algorithm goes in the direction of logical operators, for which an efficient methodology is already implemented. Solving the sampling disparity over-representation is more complicated.
The adopted solution is inspired by a simple observation : the shallower (in density of cells) the queries are, the clearer the obtained representation is. This can be illustrated by the following images, showing the 2005 model compared with the 2009 one with depth equal to 7, 6 and 5, from left to right :
Example of decreasing query depth on the comparison of 2005 and 2009 models - Data SITG
This is expected, as the sampling disparities can only appear at scales corresponding to the nearest neighbor distribution. Nevertheless, as the depth decreases, the models become less and less dense. The gain in difference readability is then offset by the lack of density, making the structures, and thus their modifications, more difficult to identify. The goal of the algorithm adaptation is to keep both readability and density.
To achieve this goal, the implementation of the previous XOR operator is considered as a base, mostly for its efficiency. As the XOR simply detects whether a cell of the space-time discretization at a given time is in a different state than its counterpart at another time, it can be modified to introduce a scale delay mechanism that applies the detection only at low-valued scales, broadcasting the results to the daughter cells. This preserves the density while performing the detection only at scales shallow enough to prevent sampling disparities from becoming dominant.
The question is how to set the scale delay according to the scale itself. Indeed, with large points of view, the delay is not necessary, as the model is viewed from far away. The need for the scale delay appears as the point of view is reduced : the more it is reduced, the larger the scale delay needs to be. A scale-attached delay is then defined, associating a specific value with each depth.
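The following C sketch illustrates the principle of such a scale-delayed XOR; the per-depth delay values and the cell-state accessor are hypothetical simplifications of the platform's internals, chosen only to make the mechanism concrete :

```c
#include <stdbool.h>

/* Hypothetical scale-attached delay : how many scales above the queried one
 * the XOR comparison is actually performed, growing with the query depth. */
static int scale_delay(int depth) {
    if (depth < 10) return 0; /* large points of view : no delay needed */
    if (depth < 20) return 2;
    return 4;                 /* close points of view : larger delay */
}

/* Placeholder for the platform's cell lookup : whether the discretization
 * cell 'levels' scales above 'cell' is occupied at time 't'. */
static bool ancestor_occupied(unsigned long long cell, int levels, double t) {
    (void)cell; (void)levels; (void)t;
    return false;
}

/* Scale-delayed XOR : compare the ancestor cell at a shallower scale and
 * broadcast the result to the queried cell. The cell keeps its points
 * (density preserved) and is flagged as a difference only if the ancestor
 * states differ between the two times. */
bool cell_is_difference(unsigned long long cell, int depth,
                        double t1, double t2) {
    int delay = scale_delay(depth);
    return ancestor_occupied(cell, delay, t1)
        != ancestor_occupied(cell, delay, t2);
}
```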
"},{"location":"TASK-DIFF/#results-and-experiments_1","title":"Results and Experiments","text":"The adaptation of the difference detection algorithm for point-based models is analyzed using the selected data-sets. An overview of its result is presented before a more formal analysis is made using difference detection made on line-based official land register data to be compared with the differences on point-based models.
"},{"location":"TASK-DIFF/#differences-detection-overview","title":"Differences Detection : Overview","text":"Considering the two first models, from 2005 and 2009 campaigns, the following images shows the results of the initial version of the difference detection algorithm (similar to XOR operator) and its adapted version implementing the scale delay :
Differences detection on 2005 and 2009 models with 2005 as primary - Left : without scale delay - Right : with scale delay - Data SITG
One can see how the scale delay drastically reduces the effect of sampling disparities when comparing two point-based models. The effect is even more obvious when the 2009 model is set as primary for difference detection :
Differences detection on 2005 and 2009 models with 2009 as primary - Left : without scale delay - Right : with scale delay - Data SITG
This improvement becomes clearer as the point of view is reduced. The following images show the initial algorithm and the scale delay algorithm on a specific area of the city, with 2005 as the primary model :
Differences detection on 2005 and 2009 models with 2005 as primary - Left : without scale delay - Right : with scale delay - Data SITG
Inverting the model roles and making the 2009 model primary for difference detection leads to similar results :
Differences detection on 2005 and 2009 models with 2009 as primary - Left : without scale delay - Right : with scale delay - Data SITG
Considering the denser models of the 2013 and 2017 campaigns, the introduction of the scale delay also leads to a better understanding of the differences, as shown in the following images :
Differences detection on 2013 and 2017 models with scale delay - Left : 2013 as primary - Right : 2017 as primary - Data SITG
Nevertheless, one can see that the scale delay is not able to get rid of sampling disparities entirely. The right image above, comparing the 2017 model to the 2013 one, shows sampling disparities being highlighted as differences on the wall of the building in the background. This does not affect readability too much, but still makes the model a bit more complicated to understand.
In addition, the content of the models themselves plays an important role in how differences are detected through this approach. For example, focusing on a specific building, the obtained highlighted differences :
Differences detection on 2013 and 2017 models with scale delay with 2013 (left) and 2017 (right) as primary - Data SITG
could lead the user to consider the building wall as a difference. Looking at the formal situation in both the 2013 and 2017 models :
Structural situation in 2013 (left) and 2017 (right) - Data SITG
One can see that the detected difference comes from the wall missing in the 2013 model, and not from a formal evolution of the building. This example illustrates that sampling disparity is not the only factor that can reduce the readability of the model for the user.
"},{"location":"TASK-DIFF/#differences-detection-comparison-with-land-register-differences","title":"Differences Detection : Comparison with Land Register Differences","text":"As the algorithm is already tested for land register models, one can use its results on these data in order to put them into perspective of the detected differences on point cloud. As the methodology is not the same for vector-based and point-based models, it is interesting to see the coherence and deviations of both approaches.
One important thing to underline is that difference detection on land register models does not detect changes in the environment directly, but detects revisions of the land register itself, as discussed in the previous phase. Of course, land register models evolve with the environment, but they also come with a large amount of modifications that only represent corrections of the model and not formal changes in the environment. This reinforces the interest of putting point-based difference detection side by side with that of the land register models.
In the previous phase, the land register models of Geneva were selected to be the closest to the LIDAR campaigns. It follows that these models can be directly used here, each corresponding to one of the compared point-based models of this phase.
As a first example, the following case is studied : Rue de Bourgogne and Rue de Lyon. Looking at the following images, giving the situation in 2013-04 and 2017-04 through the LIDAR models, one can see that an industrial building was partially demolished.
Structural situation in 2013 (left) and 2017 (right) - Data SITG
The following images show the differences computed on both the point-based and line-based models :
Difference models between 2013 and 2017 of LIDAR (left) and INTERLIS (right), with 2013 as primary - Data SITG
One can clearly see that the difference detection on the LIDAR models correctly emphasized a true structural difference between the two times. The situation is much less clear on the land register model. Indeed, as the time separating the two models is quite long, four years in this case, a large amount of corrections dominates the difference model, making the change in the building situation difficult to interpret. The following images give the situation of the land register model in 2013 and 2017 that leads to the difference model above :
Land register situation in 2013 (left) and 2017 (right) - Data SITG
Looking at the land register models, one can also see that such a large-scale modification of the building situation does not appear clearly. Indeed, it takes some effort to detect minor changes between the two models, and they do not give a clear indication of the modification. This shows how the LIDAR and its differences can help detect and analyze differences in complement to the land register itself.
Considering the second example, Avenue de France and Avenue Blanc, the following images give the structural situation at the two times as captured by the LIDAR campaigns :
Structural situation in 2013 (left) and 2017 (right) - Data SITG
One can clearly see that the two 2013 buildings were demolished and replaced by a parking lot in 2017. The differences detected on the LIDAR and land register models are presented in the following images :
Difference models between 2013 and 2017 of LIDAR (left) and INTERLIS (right), with 2013 as primary - Data SITG
Again, although the differences are clearly and correctly highlighted in the LIDAR difference model, the situation remains unclear in the difference model of the land register. One can observe that the land register was heavily corrected between the two dates, making the modification and its nature difficult to understand. Looking at the respective land register models :
Land register situation in 2013 (left) and 2017 (right) - Data SITG
the modification appears a bit more clearly. One can clearly see the disappearance of the two 2013 buildings from the land register, replaced by a large empty area. Again, difference detection on LIDAR seems clearly more relevant for detecting and analyzing structural differences than the land register itself.
An interesting example is provided by the situation just east of the Basilique Notre-Dame. The two situations as captured by the LIDAR campaigns are presented in the following images :
Structural situation in 2013 (left) and 2017 (right) - Data SITG
One can observe two structures mounted on top of two building roofs in the 2013 situation. These structures are used to ease work that has to be performed on the roofs. They are no longer present in the 2017 situation. The following images give the difference detection models for the LIDAR and the land register :
Difference models between 2013 and 2017 of LIDAR (left) and INTERLIS (right), with 2013 as primary - Data SITG
In such a case, as the structural modification between 2013 and 2017 occurs on top of the buildings, their footprints are not affected and the differences have no chance of appearing in the land register models, even when looking at them individually, as in the following images :
Land register situation in 2013 (left) and 2017 (right) - Data SITG
This is another example where the LIDAR difference detection leads to more, and clearer, information on the structural modifications that appeared in Geneva between the two times.
"},{"location":"TASK-DIFF/#conclusion-third-phase","title":"Conclusion : Third Phase","text":"The main element of this third phase conclusion is that difference detection on point-based models is less straightforward than for other models. Indeed, applied naively, the algorithm is dominated by the sampling disparities of the compared models. This illustrate that point-based models, being a close mirror of the true territory state, have a large information density that is more difficult to reach, especially from their evolution point of view.
Nevertheless, we showed that the algorithm can be adapted, with relatively simple adjustments, to perform well on the point-based difference detection problem. The implemented algorithm is able to track and represent the differences appearing between the models in a way that is useful and comprehensible for users. The proposed examples showed that the difference models are able to guide the user toward interesting structural changes in the territory, with a clear view of the third dimension.
Of course, the highlighted differences in point-based models are more complex and require a trained user, able to correctly interpret the details of the highlighted parts of the model. Trees are a good example : as trees re-grow each year, they will always appear as differences in the compared models. A user only interested in building changes has to be aware of this and able to separate the relevant differences from the others.
Following the comparison between the LIDAR and land register (INTERLIS) difference models, a very surprising conclusion appears. At first, one could assume that the land register is the proper way to detect changes, which could then be analyzed in more detail in the point-based difference models. It turns out that the opposite is true. Several reasons explain this surprising situation.
In the first place, LIDAR campaigns are available only with large temporal gaps between them, at least two or three years. This lets the land register models accumulate large amounts of updates and corrections, so the difference model over this temporal gap is filled with much more than structural modifications. In addition, the LIDAR models come with the third dimension, where the land register models are flat. The third dimension carries large amounts of differences that cannot be seen in the land register.
To some extent, the land register and its evolution reflect the way the territory is surveyed, not the formal evolution of the territory. By contrast, as LIDAR models are structural snapshots of a territory's situation, analyzing their differences across time leads to better tracking of the formal modifications of the real world.
"},{"location":"TASK-DIFF/#conclusion","title":"Conclusion","text":""},{"location":"TASK-DIFF/#first-phase","title":"First Phase","text":"In the first phase, the difference detection algorithm was implemented for vector models and tested using synthetic differences on selected models. The results showed the interest of the obtained differences models to emphasize evolution of models from both user and process points of view. It was demonstrated that the information between models exists and can be extracted and represented in a relevant way for both users and processes.
"},{"location":"TASK-DIFF/#second-phase","title":"Second Phase","text":"In the second phase, the difference detection algorithm was tested on the Swiss land register models on which the results obtained during the first phase were confirmed. The differences models are able to provide both user and process a clear and understandable view of the modification brought to the models.
In addition, through the short- and long-term perspectives, it was possible to demonstrate how the difference detection algorithm is able to provide different points of view on model evolution. From a short-term perspective, the difference models provide a clear and individual view of each modification, while the long-term perspective allows seeing the large-scale evolution and transformation of the models. It follows that difference models can be used as a tool by the various actors using or working with the land register models.
"},{"location":"TASK-DIFF/#third-phase","title":"Third Phase","text":"In the third phase, the difference detection algorithm, developed on vector models, was applied to point-based models, showing that a direct application on these models leads to the same issue as with logical operators: the difference models are dominated by sampling disparities, making them difficult to read. The scale-delay solution brought to the algorithm allowed producing much clearer difference models for point-based data, generalizing difference detection to any type of model.
In addition to these results, the comparison of difference models on the land register and on the corresponding LIDAR point-based models showed an interesting result: for structural changes, the point-based models lead to much more informative results through the highlighted differences. Indeed, land register models, considered from a long-term perspective, are dominated by a large amount of corrections and adjustments in addition to territory evolution updates, making structural changes hard to detect and understand. From this point of view, the difference models are clearer with point-based models.
In addition, as point-based models such as LIDAR come with the third dimension, a large amount of structural differences can only be seen through such data, since many structural changes occur along the third dimension. It then follows that difference detection applied to point-based models offers a very interesting point of view for the survey of structural changes in the territory.
"},{"location":"TASK-DIFF/#synthesis","title":"Synthesis","text":"As a synthesis, it is clear that models carry a large amount of richness in themselves, which is already a challenge to exploit, but it is also clear that a large amount of information can be found between the versions of the models. The difference detection algorithm brings a first tool that demonstrates the ability to reach and begin exploiting this information.
More than the content of the models itself, the understanding of the evolution of this content is a major topic, especially in the field of geodata, as these data represent, and transcribe, the evolution of the surveyed territory. It then appears clear that being able to reach and exploit the information contained in-between the models is a major advantage, as it allows understanding these models for what they are: four-dimensional objects.
"},{"location":"TASK-DIFF/#perspectives","title":"Perspectives","text":"Many perspectives open up following the implementation and analysis of the difference detection. Several of them, mostly technical, are presented here as a final section.
In the first place, as rasters are entering the set of data that can be injected into the platform, an evolution of the difference detection could be applied to the platform, taking advantage of the progress of machine learning. The possibility of detecting differences in images could open very interesting perspectives through the data communication features of the platform.
Another perspective could be to allow the platform to separate the data into formal layers, the separation currently being ensured only by type and time. Splitting data into layers would allow applying difference detection in a much more controlled manner, leading to difference models focused on very specific elements of the temporal evolution of a model.
The addition of layers could also be the starting point for the notion of a data convolution micro-language. Currently, data communication and difference detection only apply through the specification of two distinct, parallel navigation times. Users, or processes, have to specify each of the two time positions in order to obtain the mix of difference models they need.
An interesting evolution would be to replace these two navigation times by a small and simple micro-language allowing the user to compare more than two times in a more complex manner. This could also benefit from data separation through layers. Such a micro-language could allow comparing two, three or more models, or layers, and would also open access to mixed difference models, such as comparing the difference detection between point-based and vector-based models, which would then be a comparison of a comparison; a purely illustrative sketch is given below.
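To fix ideas, such a micro-language could look like the following sketch. The syntax and identifiers are entirely hypothetical: no such language exists in the platform today.

```
# compare two epochs of a single layer
diff(lidar @ 2013, lidar @ 2017)

# a "comparison of a comparison", mixing two difference models
diff(
    diff(lidar @ 2013, lidar @ 2017),
    diff(register @ 2013, register @ 2017)
)
```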
"},{"location":"TASK-DIFF/#reproduction-resources","title":"Reproduction Resources","text":"To reproduce the presented experiments, the STDL 4D framework has to be used and can be found here :
You can follow the instructions on the README to both compile and use the framework.
Only part of the considered datasets is publicly available. For the OpenStreetMap datasets, you can download them from the following source:
For the Swiss 3D buildings model, you can contact swisstopo:
For the land register datasets of Geneva and Thurgau, you can contact the SITG and the Thurgau Kanton:
INTERLIS land register, Thurgau Kanton
INTERLIS land register, SITG (Geneva)
The point-based models of Geneva can be downloaded from the SITG online extractor:
To extract and convert the data from planimetric shapefiles, the following code is used:
where the README gives all the information needed. In the case of shapefiles containing 3D models, please ask the STDL for advice and tools.
To extract and convert the data from INTERLIS and LAS, the following codes are used:
INTERLIS to UV3 (dalai-suite), STDL/EPFL
LAS to UV3 (dalai-suite), STDL/EPFL
where the README gives all the information needed.
For the conversion of 3D geographical coordinates and the restoration of heights, we used two STDL internal tools. You can contact the STDL to obtain the tools and support in this direction:
ptolemee-suite: 3D coordinate conversion tool (EPSG:2056 to WGS84)
height-from-geotiff: restoring geographical heights using topographic GeoTIFF (SRTM)
You can contact the STDL with any question regarding the reproduction of the presented results.
"},{"location":"TASK-DIFF/#auxiliary-developments-corrections","title":"Auxiliary Developments & Corrections","text":"In addition to the main developments, some additional scripts and corrections have been produced to solve auxiliary problems or to improve the code in line with the features developed during this task. The auxiliary developments are summarized here:
Correction of the socket read function to improve server-client connectivity.
Creation of scripts allowing the insertion of synthetic modifications (random displacements of vertex coordinates) into UV3 models.
Creation of a script to convert CSV exports from shapefiles to the UV3 format. The script code is available here.
Addition of the exportation of temporary addresses (space-time index) in the platform 3D interface.
Correction of the cell enumeration process in the platform 3D interface (wrong depth limit implementation).
Creation of a script allowing the segmentation of UV3 models according to a geographical bounding box.
Creation of C codes to perform statistical analysis of point-, line- and triangle-based models: computation of edge size and nearest neighbor distributions.
Creation of a C code allowing the enumeration of non-empty cell indices over the Switzerland models injected into the platform.
Creation of a C code allowing the automation of difference detection based on an index list, searching in the data queried from the platform.
Development of various scripts for the creation of plots and figures.
"},{"location":"TASK-IDET/","title":"Object Detection Framework","text":"Alessandro Cerioni, Etat de Geneve - Cl\u00e9mence Herny, Exolabs - Adrian F. Meyer, FHNW - Gwena\u00eblle Salamin, Exolabs
Published on November 22, 2021 Updated on December 12, 2023
Abstract: The STDL develops a framework allowing users to train and use deep learning models to detect objects from aerial images. While relying on a general purpose third-party open source library, the STDL's framework implements an opinionated workflow, targeting georeferenced aerial images and labels. After a brief introduction to object detection, this article provides detailed information about this framework. References to successful applications are provided along with concluding remarks.
"},{"location":"TASK-IDET/#introduction","title":"Introduction","text":"Object detection is a computer vision task which aims at detecting instances of objects of some target classes (e.g. buildings, swimming pools, solar panels, ...) in digital images and videos.
According to the commonly adopted terminology, a distinction is made between the following tasks:
This distinction is well illustrated by the bottom half of the following image:
Object Detection vs Instance Segmentation. Image credit: Waleed Abdulla.
Significant progress has been made over the past decades in the domain of object detection and instance segmentation (see e.g. this review paper). Applications of object detection methods are nowadays also popular in consumer products: for instance, some cars are already capable of detecting and reading speed limit signs; social media applications integrate photo and video effects based on face and pose detection. All these applications usually rely on deep learning methods, which are the subset of machine learning methods leveraging deep neural networks. While referring the reader to other sources for further information on these methods (see e.g. these lecture notes), we wish to highlight a key point in all these learning-based approaches: no rigid, static, human-engineered rule is given to the machine to accomplish the task. Instead, the machine is provided with a collection of input-output pairs, where the output represents the outcome of a properly solved task. As far as object detection is concerned, we provide deep learning algorithms with a set of images accompanied by reference annotations (\"ground truth labels\"), which the machine is expected to reproduce. Things become particularly interesting when the machine learns how to generate acceptable detections/segmentations on previously unseen images; such a crucial ability is referred to as \"generalization\".
A generic framework is being developed within the STDL, allowing the usage of state-of-the-art machine learning methods to detect objects from aerial images. Among other possible applications, such a framework allows one to leverage aerial images to provide valuable hints towards the update of cadastral information.
At its core, the STDL's object detection framework is powered by Detectron2, a Python library developed by the Facebook Artificial Intelligence Research group and released under the Apache 2.0 open-source license. Detectron2 features built-in methods to train models performing various tasks, object detection and instance segmentation to name a few. Our framework includes pre- and post-processing scripts allowing the use of Detectron2 with georeferenced images and labels.
The workflow goes through the steps described here-below.
"},{"location":"TASK-IDET/#workflow","title":"Workflow","text":""},{"location":"TASK-IDET/#1-tileset-generation","title":"1. Tileset generation","text":"Typically, aerial coverages are made accessible through web services, publicly or privately. While hiding the server-side tiling and file-based structure from the user, these web services can efficiently generate raster images on-demand, depending on the parameters sent by the requesting client. These parameters include:
GIS tools such as QGIS and ArcGIS Pro, as well as Web Applications powered by Web Mapping clients such as Leaflet, OpenLayers, MapLibre GL, etc., actually rely on this mechanism to let end users navigate through tons of bits in a quite seamless, fluent, reactive way. As a matter of fact, zooming in and out in such 2D scenes amounts to fetching and visualizing different images depending on the zoom level, instead of \"simply\" increasing/decreasing the size of the various image pixels as displayed on screen.
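For illustration, here is a minimal sketch of the kind of GetMap request a WMS service accepts; the endpoint URL and layer name are placeholders, not actual services used by the framework:

```python
# Hedged illustration of an on-demand raster request against a WMS endpoint.
# The endpoint and layer name are placeholders.
import requests

params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "aerial_imagery",                 # placeholder layer name
    "CRS": "EPSG:2056",                         # Swiss LV95 coordinates
    "BBOX": "2500000,1117700,2500256,1117956",  # extent of the requested tile
    "WIDTH": "256",
    "HEIGHT": "256",
    "FORMAT": "image/png",
}
response = requests.get("https://example.org/wms", params=params, timeout=30)
response.raise_for_status()
with open("tile.png", "wb") as fp:
    fp.write(response.content)
```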
Through this 1st step, several requests are issued against a web service in order to generate a consistent set of tiled images (\"tileset\") covering the area of interest (AoI), namely the area over which the user intends to train a detection model and/or to perform the actual object detection. Connectors for the following web services have been developed so far:
Except when using the XYZ connector, our framework is agnostic with respect to the tiling scheme. The user just has to provide an input file compliant with some requirements. We refer the user to the code documentation for detailed information.
Concerning the AoI and its extent, the following scenarios are supported:
In the case of scenarios no. 1 and 3, ground truth labels are necessary. Provided by the user as polygons in some geographic coordinate system, these polygons are then mapped onto each image coordinate system - the latter ranging from (0, 0) to (<image width in pixels> - 1, <image height in pixels> - 1) - in order to generate ground truth segmented images. Such a mapping is achieved by applying an affine transformation and encoded using the COCO format, which is natively supported by Detectron2. Labels can optionally be provided in the case of inference-only scenarios as well, should the user be willing to check non-ground truth labels against detections and vice versa.
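As an illustration of such a mapping, here is a minimal sketch of a geo-to-pixel affine transformation; the function name and tile parameters are assumptions made for the example, not the framework's actual code:

```python
# Minimal sketch of the geo-to-pixel affine mapping described above.
from shapely.affinity import affine_transform
from shapely.geometry import Polygon

def geo_to_pixel(polygon, tile_bounds, width_px, height_px):
    """Map a polygon from geographic to image coordinates.

    tile_bounds: (xmin, ymin, xmax, ymax) of the tile in the geographic CRS.
    The y axis is flipped, as image rows grow downwards.
    """
    xmin, ymin, xmax, ymax = tile_bounds
    sx = (width_px - 1) / (xmax - xmin)   # pixels per geographic unit along x
    sy = (height_px - 1) / (ymax - ymin)  # pixels per geographic unit along y
    # affine_transform takes [a, b, d, e, xoff, yoff] such that
    # x' = a*x + b*y + xoff and y' = d*x + e*y + yoff
    return affine_transform(polygon, [sx, 0, 0, -sy, -sx * xmin, sy * ymax])

label = Polygon([(2500100, 1117800), (2500150, 1117800), (2500150, 1117850)])
print(geo_to_pixel(label, (2500000, 1117700, 2500256, 1117956), 256, 256))
```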
As mentioned above, machine learning models are valuable as far as they do not \"overfit\" to the training data; in other words, as far as they generalize well to new, unseen data. One of the techniques which are commonly used in order to prevent machine learning algorithms from overfitting is the \"train, validation, test split\". While referring the interested reader to this Wikipedia page for further details, let us note that a 70%-15%-15% split is currently hard-coded in our framework.
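As a minimal sketch, such a 70%-15%-15% split could look as follows, assuming tiles are identified by simple ids (the framework's actual implementation may differ, e.g. in seeding or selection logic):

```python
# Illustrative 70%-15%-15% split of tile ids into trn/val/tst datasets.
import random

def split_tiles(tile_ids, seed=42):
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n_trn = int(0.70 * len(ids))
    n_val = int(0.15 * len(ids))
    return {
        "trn": ids[:n_trn],
        "val": ids[n_trn:n_trn + n_val],
        "tst": ids[n_trn + n_val:],
    }

datasets = split_tiles(range(100))
print({k: len(v) for k, v in datasets.items()})  # {'trn': 70, 'val': 15, 'tst': 15}
```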
Various independent COCO tilesets are generated, depending on the scenario:
in training-only scenarios, three COCO tilesets are generated: a training tileset (trn); a validation tileset (val); a test tileset (tst). For the time being, training, validation and test tiles are chosen exclusively among the tiles within the AoI which include one or more ground truth labels.
In inference-only scenarios, a single COCO tileset labeled as \"other\" is generated (oth).
In training AND inference scenarios, the full collection of tilesets is generated: trn, val, tst, oth.
The 1st step provides a collection of tiled images, sharing the same size and resolution, plus the corresponding COCO files (trn + val + tst and/or oth, depending on the scenario).
The 2nd step performs the actual training of a predictive model, iterating over the training dataset. As already mentioned, we delegate this crucial part of the process to the Detectron2 library; support for other libraries may be implemented in the future, if suitable. Detectron2 comes with a large collection of pre-trained models tailored for various tasks. In particular, as far as instance segmentation is concerned, pre-trained models can be selected from this list.
In our workflow, we set up Detectron2 in such a way that inference is made on the validation dataset every N training iterations, N being a user-defined parameter. By doing this, we can monitor both the training and validation losses all along the iterative learning and decide when to stop. Typically, learning is stopped when the validation loss reaches a minimum (see e.g. this article for further information on early stopping). As training and validation loss curves are somewhat noisy, these curves can be smoothed on the fly in order to reveal steady trends. Other metrics may be tracked and used to decide when to stop. For now, within our framework, (early) stopping is done manually and is left to the user; it will be made automatic in the future, following some suitable criterion.
Training and validation losses in a sample object detection task. In this case, one could stop the training after the first ~1400 iterations. Note that, in this example, the validation loss is evaluated every 200 iterations.
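As a hedged illustration, the evaluation period just mentioned maps to a standard Detectron2 configuration key; the dataset names and values below are assumptions, not the framework's actual settings:

```python
# Minimal sketch of a Detectron2 configuration with periodic evaluation.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("trn",)  # assumed dataset registration names
cfg.DATASETS.TEST = ("val",)
cfg.TEST.EVAL_PERIOD = 200     # evaluate on the validation set every 200 iterations
cfg.SOLVER.MAX_ITER = 3000
# Note: EVAL_PERIOD triggers COCO-style evaluation; tracking an actual
# validation loss requires a custom hook, which is beyond this sketch.
```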
Let us note that the learning process is regulated by several parameters, usually called \"hyperparameters\" in order to distinguish them from the learned \"parameters\", the latter being - in our deep learning context - the coefficients of the many neurons populating the various layers of the deep neural network. In successful scenarios, the iterative learning process does actually lower the validation loss until a minimum value is reached. Yet, such a minimum is likely to be a \"local\" one (i.e. relative to a given set of hyperparameters); indeed, the global minimum may be found along a different trajectory, corresponding to a different set of hyperparameters. Actually, even finding the global minimum of the validation loss might not be as relevant as checking how different models compare with each other on the common ground of more meaningful \"business metrics\". Our code does not implement any automatic hyperparameter tuning; it just outputs business metrics, as explained here-below.
"},{"location":"TASK-IDET/#3-detection","title":"3. Detection","text":"The model trained at the preceding step can be used to perform the actual object detection or instance segmentation over the various tilesets concerned by a given study:
Depending on the configuration, Detectron2 can perform either object detection and instance segmentation at once, or object detection only. In both cases, every detection is accompanied by the following information:
In the case of object detection only, a bounding box is output as a list of vertices relative to the image coordinate system. In the case of instance segmentation, detections are also output as binary masks, one per input tile/image, in which pixels belonging to target objects are encoded with ones whereas background pixels are encoded with zeros. Our code can then generate a vector layer out of these binary masks. Optionally, polygons can be simplified using the Ramer-Douglas-Peucker algorithm (RDP).
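Here is an illustrative sketch of such a mask-to-polygon conversion using rasterio and shapely; it is not the framework's actual code, and the toy mask is fabricated for the example:

```python
# Vectorize a binary detection mask, then simplify the polygons with
# the Douglas-Peucker algorithm (shapely's simplify()).
import numpy as np
from rasterio import features
from shapely.geometry import shape

mask = np.zeros((256, 256), dtype=np.uint8)
mask[50:120, 60:200] = 1  # toy detection mask

polygons = [
    shape(geom)
    for geom, value in features.shapes(mask)
    if value == 1  # keep object pixels, drop the background shape
]
# tolerance is expressed in pixel units here
simplified = [p.simplify(tolerance=2.0) for p in polygons]
```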
"},{"location":"TASK-IDET/#4-assessment","title":"4. Assessment","text":"Results are assessed by matching detections against ground truth labels. For a detection and a ground truth label to be matched with each other, the intersection over union (IoU) between the two polygons must be greater than a user-defined threshold (default value = 0.25). Let us remind that the intersection over union is defined as follows:
\\[\\mbox{IoU} = \\frac{\\mbox{Area}({\\mbox{label} \\cap \\mbox{detection}})}{\\mbox{Area}({\\mbox{label} \\cup \\mbox{detection}})}\\]
If multiple detections and ground truth labels intersect, the detection which exhibits the largest IoU is tagged as true positive, the other detections as false positives.
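As a quick worked example, the IoU of two overlapping squares can be computed with shapely:

```python
# IoU between a toy label and a toy detection, both axis-aligned squares.
from shapely.geometry import box

label = box(0, 0, 10, 10)
detection = box(5, 5, 15, 15)

iou = label.intersection(detection).area / label.union(detection).area
print(f"IoU = {iou:.3f}")  # 25 / 175 = 0.143 -> no match at threshold 0.25
```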
Detections are then tagged according to the following criteria:
The reader may wonder why there are no true negatives (TN) in the list. Actually, all the pixels which are not associated with any target class can be considered as \"true negatives\". Yet, as far as object detection and instance segmentation are concerned, we do not need to group leftover pixels into \"dummy objects\". Should the user need to model such a scenario, one idea might consist in introducing a dummy class (e.g. \"background\" or \"other\").
Metrics are calculated on a class-by-class basis, in order to take into account possible imbalances between classes. Detections of the wrong class are counted either as FN, i.e. missed objects, or as FP, i.e. detections not matching any object, depending on the target class for which the computation is made.
Precision and recall by class are used here:
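They are computed from the TP, FP and FN counts introduced above:
\\[\\mbox{precision} = \\frac{\\mbox{TP}}{\\mbox{TP} + \\mbox{FP}}, \\qquad \\mbox{recall} = \\frac{\\mbox{TP}}{\\mbox{TP} + \\mbox{FN}}\\]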
While referring the reader to this page for further information on these metrics, let us note that:
Each metric can be aggregated to keep only one value per dataset, rather than one per class.
As already mentioned, each detection is assigned a confidence score, ranging from 0 to 1. By filtering out all the detections exhibiting a score smaller than some cut-off/threshold value, one ends up with fewer or more detections to compare against ground truth data; the higher the threshold, the smaller the number of detections and the better their quality in terms of confidence score. By sampling the threshold from a minimum user-defined value up to a maximum value (e.g. 0.95) and counting TPs, FPs and FNs at each sampling step, meaningful curves are obtained, representing counts and metrics like precision and recall as a function of the threshold. Typically, precision (recall) is monotonically increasing (decreasing) as a function of the threshold. As such, neither the precision nor the recall alone can be used to determine the optimal value of the threshold. This is why precision and recall are customarily aggregated into a third metric which, computed as a function of the threshold, can be concave or, at least, can exhibit local maxima. This metric is named the \"\\(F_1\\) score\" and is defined as follows:
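\\[F_1 = 2 \\cdot \\frac{\\mbox{precision} \\cdot \\mbox{recall}}{\\mbox{precision} + \\mbox{recall}}\\]
i.e. the harmonic mean of precision and recall.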
Different models can then be compared with each other in terms of \\(F_1\\) scores; the best model can be selected as the one exhibiting the maximum \\(F_1\\) score over the validation dataset. Finally, the test dataset can be used to assess the selected model and provide the end user with an objective measure of its reliability.
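To make the threshold sweep concrete, here is a toy sketch; the scores, matches and label count are fabricated, and the framework's actual outputs differ in format:

```python
# Toy threshold sweep: detections with confidence scores, each either matched
# (True) or unmatched (False) against the ground truth labels.
import numpy as np

scores = np.array([0.97, 0.91, 0.88, 0.75, 0.60, 0.42, 0.30])
matched = np.array([True, True, False, True, False, True, False])
n_labels = 5  # total number of ground truth labels

for thr in np.arange(0.05, 1.00, 0.05):
    keep = scores >= thr
    tp = int(np.sum(matched & keep))
    fp = int(np.sum(~matched & keep))
    fn = n_labels - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"thr={thr:.2f}  P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")
```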
Other approaches exist, allowing one to summarize metrics and eventually come up with threshold-independent scores. One of these approaches consists in computing the \"Area Under the ROC curve\" (AUC, cf. this page).
"},{"location":"TASK-IDET/#5-iterate-until-results-are-satisfactory","title":"5. Iterate until results are satisfactory","text":"Several training sessions can be executed, using different values of the various hyperparameters involved in the process. As a matter of fact, reviewing and improving ground truth data is also part of the hyperparameter tuning (cf. \"From Model-centric to Data-centric Artificial Intelligence\"). By keeping track of the above-mentioned metrics across multiple realizations, an optimal model should eventually be found (at least, a local optimum).
The exploration of the hyperparameter space is a tedious task, which consumes time as well as human and computing resources. It can be performed in a more or less systematic/heuristic way, depending on the experience of the operator as well as on the features offered by the code. Typically, a partial exploration is enough to obtain acceptable results. Within the STDL team, it is customary to first perform some iterations until \"decent scores\" are obtained, then to involve beneficiaries and domain experts in the continuous evaluation and improvement of results, until satisfactory results are obtained. These exchanges between data scientists and domain experts are also key to raising both communities' awareness of the virtues and flaws of machine learning approaches.
"},{"location":"TASK-IDET/#use-cases","title":"Use cases","text":"Here is a list of the successful applications of the framework described in this article:
The STDL's object detection framework is still under development and receives updates as new use cases emerge. The source code can be found here.
"}]} \ No newline at end of file