diff --git a/PROJ-SOILS/index.html b/PROJ-SOILS/index.html index f556a5d..09a782b 100644 --- a/PROJ-SOILS/index.html +++ b/PROJ-SOILS/index.html @@ -629,7 +629,7 @@
OFS has also created a deep learning model prototype to automatically segment land cover types. However, unlike the models of IGN and HEIG-VD, it works in two steps:
The Methodology section describes the infrastructure used to run the models and to reproduce the project. Furthermore, it describes the evaluation and fine-tuning approaches in detail.
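The evaluation approach mentioned here relies on the Matthews correlation coefficient (MCC) and the Intersection over Union (IoU), the two metrics used throughout this report. The following is a minimal, self-contained sketch of these standard definitions, for illustration only; it is not the project's actual evaluation code:

```python
import math

def iou(pred, target, cls):
    """Intersection over Union for one land-cover class."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else float("nan")

def mcc(pred, target):
    """Binary Matthews correlation coefficient, e.g. soil (1) vs. non-soil (0)."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For multi-class segmentation, the per-class IoU values are typically averaged, while the MCC generalizes through the full confusion matrix.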
@@ -1042,7 +1042,7 @@OFS -The OFS model OFS_ADELE2(+SAM) performs similarly to the best-performing IGN model, its outputs are not prone to square artefacts, and the inferences are very clean due to its usage of the SAM model. The downside of the OFS model is that it is specifically adapted for the Statistique suisse de la superficie10 and thus cannot be retrained on a different dataset.
+The OFS model OFS_ADELE2(+SAM) performs similarly to the best-performing IGN model, its outputs are not prone to square artefacts, and the inferences are very clean due to its usage of the SAM model. The goal of the evaluation phase was to identify the most promising model for further steps in the project. Based on the results of the evaluation, the HEIG-VD model was chosen. It performed best in masked Extent 1 and in Extent 2, and it performed best in the qualitative assessment. Additionally, the model needs only aerial imagery with the three RGB channels, which allows for easier reproducibility. The model weights and source code of the HEIG-VD model were kindly shared with us, which enabled us to fine-tune the model to adapt it to the specifics of this project. However, the premise of choosing the HEIG-VD model was that we would be able to mitigate the square artefacts to an acceptable degree.
The following key points can be extracted from the fine-tuning results:
@@ -1203,7 +1203,7 @@Unknown. Arealstatistik Schweiz. Erhebung der Bodennutzung und der Bodenbedeckung. (Ausgabe 2019 / 2020). Number 9406112. Bundesamt für Statistik (BFS), Neuchâtel, September 2019. Backup Publisher: Bundesamt für Statistik (BFS). URL: https://dam-api.bfs.admin.ch/hub/api/dam/assets/9406112/master. ↩↩
+Unknown. Arealstatistik Schweiz. Erhebung der Bodennutzung und der Bodenbedeckung. (Ausgabe 2019 / 2020). Number 9406112. Bundesamt für Statistik (BFS), Neuchâtel, September 2019. Backup Publisher: Bundesamt für Statistik (BFS). URL: https://dam-api.bfs.admin.ch/hub/api/dam/assets/9406112/master. ↩
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. March 2022. arXiv:2201.03545 [cs]. URL: http://arxiv.org/abs/2201.03545 (visited on 2024-03-21), doi:10.48550/arXiv.2201.03545. ↩
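The square artefacts discussed in the report were mitigated by fine-tuning on a mixed-resolution dataset. Lower-resolution training tiles can be simulated from high-resolution imagery, for example by average pooling; the function below is a hypothetical illustration of that idea, not the project's actual preprocessing:

```python
def downsample(tile, factor):
    """Average-pool a 2D tile by an integer factor to mimic lower-resolution imagery."""
    h, w = len(tile), len(tile[0])
    if h % factor or w % factor:
        raise ValueError("tile dimensions must be divisible by the pooling factor")
    return [
        [
            # mean of each factor x factor block
            sum(tile[i + di][j + dj] for di in range(factor) for dj in range(factor))
            / (factor * factor)
            for j in range(0, w, factor)
        ]
        for i in range(0, h, factor)
    ]
```

Mixing such downsampled tiles with native-resolution ones during fine-tuning is one way to expose an attention-based model to both scales.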
diff --git a/search/search_index.json b/search/search_index.json index baef652..f2dc661 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Swiss Territorial Data Lab - STDL \u00b6 The STDL aims to promote collective innovation around the Swiss territory and its digital copy. It mainly explores the possibilities provided by data science to improve official land registering. A multidisciplinary team composed of cantonal, federal and academic partners is reinforced by engineers specialized in geographical data science to tackle the challenges around the management of territorial datasets. The developed STDL platform codes and documentation are published under open licenses to allow partners and Swiss territory management actors to leverage the developed technologies. Exploratory Projects \u00b6 Exploratory projects in the field of Swiss territorial data are conducted at the demand of institutions or actors of the Swiss territory. The exploratory projects are conducted under the supervision of the principal in order to closely analyze how the specifications are answered throughout the project. Exploratory projects aim to provide proofs of concept and expertise in the application of technologies to Swiss territorial data. Automatic Soil Segmentation April 2024 Nicolas Beglinger (swisstopo) - Clotilde Marmy (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Canton of Fribourg - PROJ-SOILS This project focuses on developing an automated methodology to distinguish areas covered by pedological soil from areas comprised of non-soil. The goal is to generate high-resolution maps (10cm) to aid in the location and assessment of polluted soils.
Towards this end, we utilize deep learning models to classify land cover types using raw, raster-based aerial imagery and digital elevation models (DEMs). Specifically, we assess models developed by the Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN), the Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD), and the Office F\u00e9d\u00e9ral de la Statistique (OFS). The performance of the models is evaluated with the Matthews correlation coefficient (MCC) and the Intersection over Union (IoU), as well as with qualitative assessments conducted by the beneficiaries of the project. In addition to testing pre-existing models, we fine-tuned the model developed by the HEIG-VD on a dataset specifically created for this project. The fine-tuning aimed to optimize the model's performance on the specific use case and to adapt it to the characteristics of the dataset: higher resolution imagery, different vegetation appearances due to seasonal differences, and a unique classification scheme. Fine-tuning with a mixed-resolution dataset improved the model's performance on lower-resolution imagery, which is proposed as a solution to the square artefacts that are common in inferences of attention-based models. Reaching an MCC score of 0.983, the findings demonstrate promising performance. The derived model produces satisfactory results, which have to be evaluated in a broader context before being published by the beneficiaries. Lastly, this report sheds light on potential improvements and highlights considerations for future work.
Full article Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control April 2024 Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR The acquisition of LiDAR data has become standard practice at national and cantonal levels in Switzerland in recent years. In 2024, swisstopo will complete a comprehensive six-year campaign covering the whole Swiss territory. The produced point clouds are classified post-acquisition, i.e. each point is attributed to a certain category, such as \"building\" or \"vegetation\". Despite the global control performed by providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, extensive time is invested by swisstopo in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as a reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted into voxels, summarizing the distribution of classes and comparable one-to-one. This method highlights zones of change by clustering the concerned voxels. Experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method. Full article Automatic detection and observation of mineral extraction sites in Switzerland January 2024 Cl\u00e9mence Herny (ExoLabs) - Shanci Li (Uzufly) - Alessandro Cerioni (Etat de Gen\u00e8ve) - Roxane Pott (Swisstopo) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY The study of the evolution of mineral extraction sites (MES) is essential for the management of mineral resources and the assessment of their environmental impact.
In this context, swisstopo has solicited the STDL to automate the vectorisation of MES over the years. This tedious task was previously carried out manually and was not regularly updated. Automatic object detection using a deep learning method was applied to SWISSIMAGE RGB orthophotos with a spatial resolution of 1.6 m/px. The trained model proved its ability to accurately detect MES, achieving an f1-score of 82%. Detection by inference was performed on images from 1999 to 2021, enabling us to track the evolution of potential MES over several years. Although the results are satisfactory, a careful examination of the detections must be carried out by experts to validate them as true MES. Despite this remaining manual work, the process is faster than a full manual vectorisation and can be used in the future to keep MES information up-to-date. Full article Dieback of beech trees: methodology for determining the health state of beech trees from airborne images and LiDAR point clouds August 2023 Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Republic and Canton of Jura - PROJ-HETRES Beech trees are sensitive to drought and repeated episodes can cause dieback. This issue affects the Jura forests, requiring the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors reaches an overall accuracy of up to 0.79, as does the model trained only with descriptors derived from airborne imagery.
When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors. Full article Using spatio-temporal neighbor data information to detect changes in land use and land cover April 2023 Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo) Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS From 2020 on, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To help lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point. The methodology focused on change detection, finding as many unchanged tiles as possible while missing as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. While deactivating time and using fewer neighbors decrease the balanced accuracy by 0.2%, deactivating space causes a 1% decrease. Furthermore, the performances of XGBoost, random forest (RF), fully convolutional network (FCN) and recurrent convolutional neural network (RCNN) are compared by means of a custom metric, established with the help of the interpretation team. For the spatio-temporal module, FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression reaches a custom metric of 0.249. Then, FCN and RF are tested to combine the best performing model with the model trained by OFS on image data only.
When using spatio-temporal neighbors and image data as inputs, the final integration module achieves 0.438 for the custom metric, against 0.374 when only the image data is used. It was concluded that spatio-temporal neighbors could lighten the process of tile interpretation. Full article Classification of road surfaces March 2023 Gwena\u00eblle Salamin (swisstopo) - Cl\u00e9mence Herny (Exolabs) - Roxane Pott (swisstopo) - Alessandro Cerioni (Canton of Geneva) Proposed by the Federal Office of Topography swisstopo - PROJ-ROADSURF The Swiss road network extends over 83\u2019274 km. Information about the type of road surface is useful not only for the Swiss Federal Roads Office and engineering companies, but also for cyclists and hikers. Currently, the data creation and update is entirely done manually at the Swiss Federal Office of Topography. This is a time-consuming and methodical task, potentially suitable for automation by data science methods. The goal of this project is to classify Swiss roads according to their surface type, natural or artificial. We first searched for statistical differences between these two classes, in order to then perform supervised classification based on machine-learning methods. As we could not find any discriminant feature, we used deep learning methods. Full article Tree Detection from Point Clouds for the Canton of Geneva March 2022 Alessandro Cerioni (Canton of Geneva) - Flann Chambers (University of Geneva) - Gilles Gay des Combes (CJBG - City of Geneva and University of Geneva) - Adrian Meyer (FHNW) - Roxane Pott (swisstopo) Proposed by the Canton of Geneva - PROJ-TREEDET Trees are essential assets, in urban contexts among others. For several years, the Canton of Geneva has maintained a digital inventory of isolated (or \"urban\") trees. This project aimed at designing a methodology to automatically update Geneva's tree inventory, using high-density LiDAR data and off-the-shelf software.
In the end, only the sub-task of detecting and geolocating trees was explored. Comparisons against ground truth data show that the task can be more or less tricky depending on how sparse or dense trees are. In mixed contexts, we managed to reach an accuracy of around 60%, which unfortunately is not high enough to foresee a fully unsupervised process. Still, as discussed in the concluding section, there may be room for improvement. Full article Detection of thermal panels on canton territory to follow renewable energy deployment February 2022 Nils Hamel (UNIGE) - Huriel Reichel (FHNW) Project in collaboration with Geneva and Neuch\u00e2tel States - TASK-TPNL The deployment of renewable energy has become a major stake in facing our societies' challenges. This requires authorities and domain experts to promote and to demonstrate the deployment of such energy solutions. In the case of thermal panels, policymakers ask domain experts to certify, year after year, the amount of deployed surface. Faced with this challenge, this project aims to determine to which extent data science can ease the survey of thermal panel installation deployment and how the work of domain experts can be eased. Full article Automatic detection of quarries and the lithology below them in Switzerland January 2022 Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY Mining is an important economic activity in Switzerland and therefore it is monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although carried out with very high quality, this approach unfortunately does not keep up with the constantly changing and updating nature of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries throughout the whole country. The training was done using SWISSIMAGE with 10cm spatial resolution and the Deep Learning Framework from the STDL.
Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. Interaction with the domain expert was very relevant to the final results; in addition to his positive appreciation, an f1-score of 85% was obtained, which, given the peculiar characteristics of quarries, can be considered an optimal result. Full article Updating the \u00abCultivable Area\u00bb Layer of the Agricultural Office, Canton of Thurgau June 2021 Adrian Meyer (FHNW) - Pascal Salath\u00e9 (FHNW) Proposed by the Canton of Thurgau - PROJ-TGLN The Cultivable agricultural area layer (\"LN, Landwirtschaftliche Nutzfl\u00e4che\") is a GIS vector product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of direct subsidy contributions to farms. The canton of Thurgau requested a spatial vector layer indicating the locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area, since areas used for silage bale storage are not eligible for subsidies. Having detections of such objects readily available greatly reduces the workload of the responsible official by directing the monitoring process to the relevant hotspots. Ultimately, public economic damage resulting from the payout of unjustified subsidy contributions can be prevented. Full article Swimming Pool Detection for the Canton of Thurgau April 2021 Adrian Meyer (FHNW) - Alessandro Cerioni (Canton of Geneva) Proposed by the Canton of Thurgau - PROJ-TGPOOL The Canton of Thurgau entrusted the STDL with the task of producing swimming pool detections over the cantonal area. Of particular interest was leveraging the ground truth annotation data from the Canton of Geneva to generate a predictive model in Thurgau while using the publicly available SWISSIMAGE aerial imagery datasets provided by swisstopo.
The STDL object detection framework produced highly accurate predictions of swimming pools in Thurgau and thereby proved transferability from one canton to another without having to manually redigitize annotations. These promising detections showcase the highly useful potential of this approach by greatly reducing the need for repetitive manual labour. Full article Completion of the federal register of buildings and dwellings February 2021 Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Proposed by the Federal Statistical Office - TASK-REGBL The Swiss Federal Statistical Office is in charge of the national Register of Buildings and Dwellings (RBD), which keeps track of every existing building in Switzerland. Currently, the register is being completed with buildings in addition to regular dwellings to offer a reliable and official source of information. The completion of the register introduced issues due to missing information and the difficulty of collecting it. The construction year of buildings is one piece of information missing for a large number of register entries. The Statistical Office mandated the STDL to investigate the possibility of using the Swiss National Maps to extract this missing information through an automated process. Research was conducted in this direction with the development of a proof-of-concept and a reliable methodology to assess the obtained results. Full article Swimming Pool Detection from Aerial Images over the Canton of Geneva January 2021 Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW) Proposed by the Canton of Geneva - PROJ-GEPOOL Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to potentially detect unregistered swimming pools over the Canton of Geneva.
Despite the success of our approach, we will argue that domain expertise still remains key to post-process detections in order to tell objects which are subject to registration from those which aren't. Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date. Full article Difference models applied to the land register November 2020 Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Project scheduled in the STDL research roadmap - TASK-DTRK Being able to track modifications in the evolution of geographical datasets is one important aspect of territory management, as a large amount of information can be extracted from difference models. Difference detection can also be a tool used to assess the evolution of a geographical model through time. In this research project, we apply difference detection on INTERLIS models of the official Swiss land registers in order to emphasize and follow their evolution and to demonstrate that changes in reference frames can be detected and assessed. Full article Research Developments \u00b6 Research developments are conducted alongside the research projects to provide a framework of tools and expertise around Swiss territorial data and related technologies. The research developments are conducted according to the research plan established by the data scientists and validated by the steering committee. OBJECT DETECTION FRAMEWORK November 2021 Alessandro Cerioni (Canton of Geneva) - Cl\u00e9mence Herny (Exolabs) - Adrian Meyer (FHNW) - Gwena\u00eblle Salamin (Exolabs) Project scheduled in the STDL research roadmap - TASK-IDET This strategic component of the STDL consists of the automated analysis of geospatial images using deep learning while providing practical applications for specific use cases. The overall goal is the extraction of vectorized semantic information from remote sensing data.
The involved case studies revolve around concrete object detection use cases deploying modern machine learning methods and utilizing a multitude of available datasets. The goal is to arrive at a prototypical platform for object detection which is highly useful not only for cadastre specialists and authorities but also for stakeholders at various contact points in society. Full article AUTOMATIC DETECTION OF CHANGES IN THE ENVIRONMENT November 2020 Nils Hamel (UNIGE) Project scheduled in the STDL research roadmap - TASK-DIFF Developed at EPFL with the collaboration of Cadastre Suisse to handle large-scale geographical models of different natures, the STDL 4D platform offers a robust and efficient indexation methodology allowing the management of storage and access for large-scale models. In addition to spatial indexation, the platform also includes time as part of the indexation, allowing any area to be described by models in both spatial and temporal dimensions. In this development project, the notion of model temporal derivatives is explored and proof-of-concepts are implemented in the platform. The goal is to demonstrate that, in addition to their formal content, models coming in different temporal versions can be derived along the time dimension to compute difference models. Such a proof-of-concept is developed for both point cloud and vectorial models, demonstrating that the indexation formalism of the platform considerably eases the computation of difference models. This research project demonstrates that the time dimension can be fully exploited in order to access the data it holds. Full article Steering Committee \u00b6 The steering committee of the Swiss Territorial Data Lab is composed of Swiss public administrations bringing their expertise and competences to guide the conducted projects and developments. Members of the STDL steering committee Submitting a project \u00b6 To submit a project to the STDL, simply fill in this form .
To contact the STDL, please write an email to info@stdl.ch . We will reply as soon as possible!","title":"Homepage"},{"location":"#swiss-territorial-data-lab-stdl","text":"The STDL aims to promote collective innovation around the Swiss territory and its digital copy. It mainly explores the possibilities provided by data science to improve official land registering. A multidisciplinary team composed of cantonal, federal and academic partners is reinforced by engineers specialized in geographical data science to tackle the challenges around the management of territorial data-sets. The developed STDL platform codes and documentation are published under open licenses to allow partners and Swiss territory management actors to leverage the developed technologies.","title":"Swiss Territorial Data Lab - STDL"},{"location":"#exploratory-projects","text":"Exploratory projects in the field of the Swiss territorial data are conducted at the demand of institutions or actors of the Swiss territory. The exploratory projects are conducted with the supervision of the principal in order to closely analyze the answers to the specifications along the project. The goal of exploratory project aims to provide proof-of-concept and expertise in the application of technologies to Swiss territorial data. Automatic Soil Segmentation April 2024 Nicolas Beglinger (swisstopo) - Clotilde Marmy (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Canton of Fribourg - PROJ-SOILS This project focuses on developing an automated methodology to distinguish areas covered by pedological soil from areas comprised of non-soil. The goal is to generate high-resolution maps (10cm) to aid in the location and assessment of polluted soils. Towards this end, we utilize deep learning models to classify land cover types using raw, raster-based aerial imagery and digital elevation models (DEMs). 
Specifically, we assess models developed by the Institut National de l\u2019Information G\u00e9ographique et Foresti\u00e8re (IGN), the Haute Ecole d'Ing\u00e9nierie et de Gestion du Canton de Vaud (HEIG-VD), and the Office F\u00e9d\u00e9ral de la Statistique (OFS). The performance of the models is evaluated with the Matthew's correlation coefficient (MCC) and the Intersection over Union (IoU), as well as with qualitatifve assessments conducted by the beneficiaries of the project. In addition to testing pre-existing models, we fine-tuned the model developed by the HEIG-VD on a dataset specifically created for this project. The fine-tuning aimed to optimize the model performance on the specific use-case and to adapt it to the characteristics of the dataset: higher resolution imagery, different vegetation appearances due to seasonal differences, and a unique classification scheme. Fine-tuning with a mixed-resolution dataset improved the model performance of its application on lower-resolution imagery, which is proposed to be a solution to square artefacts that are common in inferences of attention-based models. Reaching an MCC score of 0.983, the findings demonstrate promising performance. The derived model produces satisfactory results, which have to be evaluated in a broader context before being published by the beneficiaries. Lastly, this report sheds light on potential improvements and highlights considerations for future work. Full article Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control April 2024 Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR The acquisition of LiDAR data has become standard practice at national and cantonal levels during the recent years in Switzerland. 
In 2024, swisstopo will complete a comprehensive campaign of 6 years covering the whole Swiss territory. The produced point clouds are classified post-acquisition, i.e. each point is attributed to a certain category, such as \"building\" or \"vegetation\". Despite the global control performed by providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, extensive time is invested by swisstopo in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted in voxels, summarizing the distribution of classes and comparable one-to-one. This method highlights zones of change by clustering the concerned voxels. Experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method. Full article Automatic detection and observation of mineral extraction sites in Switzerland January 2024 Cl\u00e9mence Herny (ExoLabs) - Shanci Li (Uzufly) - Alessandro Cerioni (Etat de Gen\u00e8ve) - Roxane Pott (Swisstopo) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY The study of the evolution of mineral extraction sites (MES) is primordial for the management of mineral resources and the assessment of their environmental impact. In this context, swisstopo has solicited the STDL to automate the vectorisation of MES over the years. This tedious task was previously carried out manually and was not regularly updated. Automatic object detection using a deep learning method was applied to SWISSIMAGE RGB orthophotos with a spatial resolution of 1.6 m px -1 . The trained model proved its ability to accurately detect MES, achieving a f1-score of 82%. 
Detection by inference was performed on images from 1999 to 2021, enabling us to track the evolution of potential MES over several years. Although the results are satisfactory, a careful examination of the detections must be carried out by experts to validate them as true MES. Despite this remaining manual work involved, the process is faster than a full manual vectorisation and can be used in the future to keep MES information up-to-date. Full article Dieback of beech trees: methodology for determining the health state of beech trees from airborne images and LiDAR point clouds August 2023 Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Republic and Canton of Jura - PROJ-HETRES Beech trees are sensitive to drought and repeated episodes can cause dieback. This issue affects the Jura forests requiring the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors has an overall accuracy up to 0.79, as well as the model trained only with descriptors derived from airborne imagery. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors. 
Full article Using spatio-temporal neighbor data information to detect changes in land use and land cover April 2023 Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo) Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS From 2020 on, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To help and lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point. The methodolgy focused on change detection, by finding as many unchanged tiles as possible and miss as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. While time deactivation and less-neighbors have a 0.2% decrease on the balanced accuracy, the space deactivation causes 1% decrease. Furthermore, XGBoost, random forest (RF), fully convolutional network (FCN) and recurrent convolutional neural network (RCNN) performance are compared by the means of a custom metric, established with the help of the interpretation team. For the spatial-temporal module, FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression indicates a custom metrics of 0.249. Then, FCN and RF are tested to combine the best performing model with the model trained by OFS on image data only. When using temporal-spatial neighors and image data as inputs, the final integration module achieves 0.438 in custom metric, against 0.374 when only the the image data is used.It was conclude that temporal-spatial neighbors showed that they could light the process of tile interpretation. 
Full article Classification of road surfaces March 2023 Gwena\u00eblle Salamin (swisstopo) - Cl\u00e9mence Herny (Exolabs) - Roxane Pott (swisstopo) - Alessandro Cerioni (Canton of Geneva) Proposed by the Federal Office of Topography swisstopo - PROJ-ROADSURF The Swiss road network extends over 83\u2019274 km. Information about the type of road surface is useful not only for the Swiss Federal Roads Office and engineering companies, but also for cyclists and hikers. Currently, the data creation and update is done entirely manually at the Swiss Federal Office of Topography. This is a time-consuming and methodical task, potentially suitable for automation with data science methods. The goal of this project was to classify Swiss roads according to their surface type, natural or artificial. We first searched for statistical differences between these two classes, in order to then perform supervised classification based on machine-learning methods. As we could not find any discriminant feature, we used deep learning methods. Full article Tree Detection from Point Clouds for the Canton of Geneva March 2022 Alessandro Cerioni (Canton of Geneva) - Flann Chambers (University of Geneva) - Gilles Gay des Combes (CJBG - City of Geneva and University of Geneva) - Adrian Meyer (FHNW) - Roxane Pott (swisstopo) Proposed by the Canton of Geneva - PROJ-TREEDET Trees are essential assets, in urban contexts among others. For several years, the Canton of Geneva has maintained a digital inventory of isolated (or \"urban\") trees. This project aimed at designing a methodology to automatically update Geneva's tree inventory, using high-density LiDAR data and off-the-shelf software. Eventually, only the sub-task of detecting and geolocating trees was explored. Comparisons against ground truth data show that the task can be more or less tricky depending on how sparse or dense trees are. 
In mixed contexts, we managed to reach an accuracy of around 60%, which unfortunately is not high enough to foresee a fully unsupervised process. Still, as discussed in the concluding section, there may be room for improvement. Full article Detection of thermal panels on canton territory to follow renewable energy deployment February 2022 Nils Hamel (UNIGE) - Huriel Reichel (FHNW) Project in collaboration with Geneva and Neuch\u00e2tel States - TASK-TPNL The deployment of renewable energy has become a major stake in the face of the challenges confronting our societies. This requires authorities and domain experts to promote and demonstrate the deployment of such energy solutions. In the case of thermal panels, policy makers ask domain experts to certify, year after year, the amount of surface deployed. Faced with this challenge, this project aims to determine to what extent data science can ease the survey of thermal panel installations and lighten the work of the domain experts. Full article Automatic detection of quarries and the lithology below them in Switzerland January 2022 Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Proposed by the Federal Office of Topography swisstopo - TASK-DQRY Mining is an important economic activity in Switzerland and is therefore monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although of very high quality, this work unfortunately cannot keep up with the constantly changing nature of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries throughout the whole country. The training was done using SWISSIMAGE with 10cm spatial resolution and the Deep Learning Framework from the STDL. Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. 
Interaction with the domain expert was very relevant for the final results; in addition to his positive appreciation, an F1 score of 85% was obtained in the end, which, given the peculiar characteristics of quarries, can be considered an excellent result. Full article Updating the \u00abCultivable Area\u00bb Layer of the Agricultural Office, Canton of Thurgau June 2021 Adrian Meyer (FHNW) - Pascal Salath\u00e9 (FHNW) Proposed by the Canton of Thurgau - PROJ-TGLN The Cultivable agricultural area layer (\"LN, Landwirtschaftliche Nutzfl\u00e4che\") is a GIS vector product maintained by the cantonal agricultural offices and serves as the key calculation index for the receipt of direct subsidy contributions to farms. The Canton of Thurgau requested a spatial vector layer indicating the locations and area consumption extent of the largest silage bale deposits intersecting with the known LN area, since areas used for silage bale storage are not eligible for subsidies. Having detections of such objects readily available greatly reduces the workload of the responsible official by directing the monitoring process to the relevant hotspots. Ultimately, economic damage to the public that would result from the payout of unjustified subsidy contributions can be prevented. Full article Swimming Pool Detection for the Canton of Thurgau April 2021 Adrian Meyer (FHNW) - Alessandro Cerioni (Canton of Geneva) Proposed by the Canton of Thurgau - PROJ-TGPOOL The Canton of Thurgau entrusted the STDL with the task of producing swimming pool detections over the cantonal area. Of specific interest was leveraging the ground truth annotation data from the Canton of Geneva to generate a predictive model in Thurgau while using the publicly available SWISSIMAGE aerial imagery datasets provided by swisstopo. The STDL object detection framework produced highly accurate predictions of swimming pools in Thurgau and thereby proved transferability from one canton to another without having to manually redigitize annotations. 
These promising detections showcase the highly useful potential of this approach by greatly reducing the need for repetitive manual labour. Full article Completion of the federal register of buildings and dwellings February 2021 Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Proposed by the Federal Statistical Office - TASK-REGBL The Swiss Federal Statistical Office is in charge of the national Register of Buildings and Dwellings (RBD), which keeps track of every existing building in Switzerland. Currently, the register is being completed with buildings in addition to regular dwellings to offer a reliable and official source of information. The completion of the register introduced issues due to missing information that is difficult to collect. The construction year of buildings is missing for a large number of register entries. The Statistical Office mandated the STDL to investigate the possibility of using the Swiss National Maps to extract this missing information through an automated process. Research was conducted in this direction, with the development of a proof-of-concept and a reliable methodology to assess the obtained results. Full article Swimming Pool Detection from Aerial Images over the Canton of Geneva January 2021 Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW) Proposed by the Canton of Geneva - PROJ-GEPOOL Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to potentially detect unregistered swimming pools over the Canton of Geneva. Despite the success of our approach, we will argue that domain expertise still remains key to post-process detections in order to tell objects which are subject to registration from those which aren't. 
Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date. Full article Difference models applied to the land register November 2020 Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Project scheduled in the STDL research roadmap - TASK-DTRK Being able to track modifications in the evolution of geographical datasets is one important aspect of territory management, as a large amount of information can be extracted from difference models. Difference detection can also be used as a tool to assess the evolution of a geographical model through time. In this research project, we apply difference detection to INTERLIS models of the official Swiss land register in order to highlight and follow its evolution and to demonstrate that changes in reference frames can be detected and assessed. Full article","title":"Exploratory Projects"},{"location":"#research-developments","text":"Research developments are conducted alongside the research projects to provide a framework of tools and expertise around Swiss territorial data and related technologies. The research developments are conducted according to the research plan established by the data scientists and validated by the steering committee. OBJECT DETECTION FRAMEWORK November 2021 Alessandro Cerioni (Canton of Geneva) - Cl\u00e9mence Herny (Exolabs) - Adrian Meyer (FHNW) - Gwena\u00eblle Salamin (Exolabs) Project scheduled in the STDL research roadmap - TASK-IDET This strategic component of the STDL consists of the automated analysis of geospatial images using deep learning while providing practical applications for specific use cases. The overall goal is the extraction of vectorized semantic information from remote sensing data. The involved case studies revolve around concrete object detection use cases deploying modern machine learning methods and utilizing a multitude of available datasets. 
The goal is to arrive at a prototypical platform for object detection which is highly useful not only for cadastre specialists and authorities but also for stakeholders at various contact points in society. Full article AUTOMATIC DETECTION OF CHANGES IN THE ENVIRONMENT November 2020 Nils Hamel (UNIGE) Project scheduled in the STDL research roadmap - TASK-DIFF Developed at EPFL with the collaboration of Cadastre Suisse to handle large-scale geographical models of different natures, the STDL 4D platform offers a robust and efficient indexation methodology to manage storage of and access to large-scale models. In addition to spatial indexation, the platform also includes time as part of the indexation, allowing any area to be described by models in both spatial and temporal dimensions. In this development project, the notion of model temporal derivative is explored and proof-of-concepts are implemented in the platform. The goal is to demonstrate that, in addition to their formal content, models coming with different temporal versions can be derived along the time dimension to compute difference models. Such a proof-of-concept is developed for both point cloud and vectorial models, demonstrating that the indexation formalism of the platform considerably eases the computation of difference models. This research project demonstrates that the time dimension can be fully exploited in order to access the data it holds. Full article","title":"Research Developments"},{"location":"#steering-committee","text":"The steering committee of the Swiss Territorial Data Lab is composed of Swiss public administrations bringing their expertise and competences to guide the conducted projects and developments. Members of the STDL steering committee","title":"Steering Committee"},{"location":"#submitting-a-project","text":"To submit a project to the STDL, simply fill in this form . To contact the STDL, please write an email to info@stdl.ch . 
We will reply as soon as possible!","title":"Submitting a project"},{"location":"PROJ-DQRY/","text":"Automatic Detection of Quarries and the Lithology below them in Switzerland \u00b6 Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Supervision : Nils Hamel (UNIGE) - Raphael Rollier (swisstopo) Proposed by swisstopo - PROJ-DQRY June 2021 to January 2022 - Published on January 30th, 2022 Abstract : Mining is an important economic activity in Switzerland and is therefore monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although of very high quality, this work unfortunately cannot keep up with the constantly changing nature of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries throughout the whole country. The training was done using SWISSIMAGE with 10cm spatial resolution and the Deep Learning Framework from the STDL. Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. Interaction with the domain expert was very relevant for the final results; in addition to his positive appreciation, an F1 score of 85% was obtained in the end, which, given the peculiar characteristics of quarries, can be considered an excellent result. 1 - Introduction \u00b6 Mining is an important economic activity worldwide, and this is also the case in Switzerland. The Federal Office of Topography (swisstopo) is responsible for monitoring the presence of quarries and also the materials being exploited. This is extremely relevant for planning the demand and shortage of exploited materials and also their transportation through the country. As this is of federal importance, the mapping of these features is already being done. Although this work is very detailed and accurate, quarries have a very characteristic updating pattern. 
Quarries can appear and disappear in a matter of a few months, especially when they are relatively small, as is the case in Switzerland. Therefore it is in the interest of swisstopo to automate the detection of quarries in a way that is also reproducible in time. A strategy often offered by the Swiss Territorial Data Lab is the automatic detection of objects in aerial imagery through deep learning, following our Object Detection Framework . In this case it is fully applicable, as quarries in Switzerland are relatively small, so high-resolution imagery is required, something our neural network has proven to handle well in past projects. This high-resolution imagery is available through SWISSIMAGE , aerial images from swisstopo that cover almost the whole country with a 10cm pixel size (GSD). Nevertheless, in order to train our neural network, and as is usually the case in deep learning, many labelled images are required. These data work as ground truth so that the neural network \"learns\" which objects should be detected and which should not. For this purpose, the work of the topographic landscape model ( TLM ) team of swisstopo has been of extreme importance. Among other surface features, quarries have been mapped all over Switzerland at a highly detailed scale. Despite the high quality and precision of the TLM labels, quarries are constantly changing, appearing and disappearing, and therefore the labels are not always synchronized with the images from SWISSIMAGE. This lack of synchronization between these datasets can be seen in Figure 1, where the left shows the year of mapping of the TLM and the right the year of the SWISSIMAGE flights. Figure 1 : Comparison of TLM (left) and SWISSIMAGE (right) temporality. For this reason, two rounds of interaction with the domain expert were necessary. 
In order to have a ground truth fully synchronized with SWISSIMAGE, we required two stages of training: one making use of the TLM data and a second one using the manual correction of the labels predicted in the first iteration. It is of crucial importance to state that this correction needed to be made by the domain expert, so that he could carefully check each detection in pre-defined tiles. With that in hand, we could proceed with a more trustworthy training. As stated, swisstopo is also interested in identifying the material exploited by every quarry. For that purpose, the use of the GeoCover dataset from swisstopo was recommended as well. This dataset is a vector layer of the geological cover of the whole of Switzerland, which challenged us to cross the detector predictions with such vector information. In summary, the challenge of the STDL was to investigate to what extent it is possible to automatically detect quarries in aerial imagery using deep learning, considering their high update rate. 2 - Methodology \u00b6 First of all, the \"area of interest\" must be identified. This is where the detection and training take place. In this case, a polygon covering the whole of Switzerland was used. After that, the area of interest is divided into several tiles of fixed size, which defines the slicing of SWISSIMAGE (served as WMS). For this study, tiles of different sizes were tested, with 500x500 m tiles chosen for final use. Next, the resolution of the images must be defined; again, after several tests, 512x512 pixels was chosen. For validation purposes the data is then split into training, validation and testing sets. The training dataset is used by the network for its learning; the validation set is kept completely apart from training and used only to check results; and the test set is used for cross-validation. 70% of the data was used for training, 15% for validation and 15% for testing. 
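As an illustration, the tiling and data split described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the sample extent are our own inventions, not part of the STDL framework, which handles this through its own configuration.

```python
import random

def make_tiles(xmin, ymin, xmax, ymax, size=500):
    """Slice a bounding box (coordinates in metres) into size x size tiles."""
    tiles = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            tiles.append((x, y, x + size, y + size))
            x += size
        y += size
    return tiles

def split_tiles(tiles, seed=42):
    """Shuffle deterministically, then split 70/15/15 into training, validation and test."""
    shuffled = tiles[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_trn = int(0.70 * n)
    n_val = int(0.15 * n)
    return shuffled[:n_trn], shuffled[n_trn:n_trn + n_val], shuffled[n_trn + n_val:]
```

For a toy 2x2 km extent with 500 m tiles, `make_tiles(0, 0, 2000, 2000)` yields 16 tiles, and `split_tiles` divides them 11/2/3.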
Regarding the labels, the ones from the TLM were manually checked, so that a group of approximately 250 labels fully synchronized with SWISSIMAGE was found and recorded. The first round of training then goes through the same framework as former STDL projects. We make use of a region-based convolutional neural network with a ResNet-50 backbone provided by Detectron2 . A deeper explanation of the network functionality can be found here and here . Even with different parameter sets, it was observed that the predictions included too many false positives, which mainly consisted of snow. Most probably the reflectance of snow is similar to that of quarries, and this needed special treatment. For this purpose, a filtering of the results was used. First of all, the features were filtered based on the score values (0.9) and then by elevation, using the SRTM digital elevation model. As snow is usually not present below around 1155 m, this elevation was used as a threshold. Finally, an area threshold is also applied (discarding the smallest predictions) and predictions are merged. A more detailed description of how to operate this first filter can be seen here . Once several tests were performed, the new predictions were sent back to the domain experts for detailed revision following a strict protocol. This mainly included the removal of false positives and the inclusion of false negatives. It was performed by 4 different experts from swisstopo in 4 regions with the same number of tiles to be analyzed. It is important to state again the importance of domain expertise in this step, as a very careful and manual evaluation of what is and what is not a quarry must be made. Once the predictions were corrected, a new session of training was performed using different parameters. Once again, the same resolution and tile size were used as in the first iteration (500x500 m tiles with 512x512 pixels of resolution), although this time a new filtering was developed. 
It was very similar to the first one, but applied in a different order, yielding more aesthetically pleasing predictions in the end, something the domain expert also cared about. This procedure is summarized in figure 2. Figure 2 : Methodology applied for the detection of quarries and new training sessions. In the end, in order to also include the geological information of the detected quarries, a third layer resulting from the intersection of the predictions and the GeoCover labels is created. This was done in such a way that the final user can click to obtain both the information on the quarry (when not a pure prediction) and the geology/lithology of this part of the quarry. As a result, each resulting intersection polygon contains information from both the quarry and GeoCover. In order to evaluate the obtained results, the F1 score was computed and the final predictions were compared to the labels corrected by the domain experts. This was done visually, by plotting the centroid of each detected quarry and by a heat-map, allowing one to detect the spatial pattern of the detections. The heat-map was computed using a 10'000 m radius and a 100 m pixel size. 3 - Results & Discussion \u00b6 In the first iteration, when the neural network was trained with some labels of the TLM vector data, an optimal F1 score of approximately 0.78 was obtained. Figure 3 shows the behavior of the precision, recall and F1 score for the final model selected. Figure 3 : Precision, Recall and F1 score of the first iteration (using TLM data). With the predictions resulting from the correction by the domain experts, there was an outstanding improvement in the F1 score, which reached approximately 0.85 at its optimum, as seen in figure 4. A total of 1265 quarries were found in Switzerland after filtering. Figure 4 : Precision, Recall and F1 score of the second iteration (using data corrected by the domain expert). 
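For reference, the precision, recall and F1 scores reported above are related as in the sketch below; the counts used in the example are purely illustrative and are not the project's actual confusion matrix.

```python
def scores(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative balanced case: precision = recall = F1 = 0.85
p, r, f1 = scores(tp=85, fp=15, fn=15)
```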
Figure 5 shows some examples of detected quarries, giving a notion of the quality of the shape of the detections and how well they match the real-world quarries. Examples of false positives and false negatives, unfortunately still present in the detections, are also shown. This is also an interesting demonstration of how some objects that look very similar to quarries from the point of view of non-experts may influence the results. These examples of errors are also an interesting indication of the importance of domain expertise in evaluating machine-made results. Figure 5 : Examples of detected quarries, with true positive, false negative and false positive. To check the validity of the new predictions, their centroids were plotted together with the centroids of the corrected labels, so that their spatial patterns could be compared. Figure 6 shows this plot. Figure 6 : Disposition of the centroids of assessed predictions and final predictions. One can see that despite some slight differences, the overall pattern of the predictions is very similar. A very similar result can be seen with the computed heat-map of these points, shown in figure 7. Figure 7 : Heatmap of assessed predictions and final predictions. There is a small area in the west of the country where there were fewer detections than desired, and in general there were more predictions than before. The objective of the heat-map is more to give a general view of the results than an exact comparison, as a point is created for every feature, and the new filter tended to smooth the results and merge many features into a single one. In the end, the results were also intersected with GeoCover, which provides a detailed lithology of the Swiss soil; an example of the results can be seen below using the QGIS software. Figure 8 : Intersection of predictions with GeoCover seen in QGIS. 
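The intersection with GeoCover is, in essence, a polygon overlay: each quarry prediction is cut against the geological units it touches, and each resulting part carries both attributes. As a toy illustration with axis-aligned rectangles only (the real quarry and GeoCover geometries are arbitrary polygons and would be processed with a GIS library; the rectangles and lithology names below are hypothetical):

```python
def rect_intersection(a, b):
    """Intersection of two (xmin, ymin, xmax, ymax) rectangles, or None if disjoint."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)

# Hypothetical quarry prediction and two geological units
quarry = (0, 0, 100, 60)
geocover = {"limestone": (-50, -50, 40, 200), "gravel": (40, -50, 300, 200)}

# Each non-empty intersection part keeps the lithology attribute alongside the quarry
parts = {lith: rect_intersection(quarry, geom)
         for lith, geom in geocover.items()
         if rect_intersection(quarry, geom) is not None}
```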
Finally and most importantly, the domain expert was highly satisfied with this work, due to the support it can give to swisstopo and the TLM team in mapping future quarries. The domain expert also showed interest in pursuing the work by investigating the temporal pattern of quarries and estimating the volume of material in each quarry. 4 - Conclusion \u00b6 Through this collaboration with swisstopo, we managed to demonstrate that data science is able to provide relevant and efficient tools to ease complex and time-consuming tasks. With the produced inventory of the quarries on the whole Swiss territory, we were able to provide a quasi-exhaustive view of the situation to the domain expert, giving him a better view of the exploitation sites. This is a major step forward compared to the previous situation. Indeed, before this project, the only solution available to the domain expert was to gather all the federal and cantonal data through a non-standardized and time-consuming process, in the hope of obtaining the beginnings of an inventory, with temporality issues. With the developed prototype, within hours, the entire SWISSIMAGE dataset can be processed and turned into a full-scale inventory, guiding the domain expert directly toward his interests. The resulting geographical layer can then be seen as the result of this demonstrator, able to turn the aerial images into a simple polygonal layer representing the quarries, with few false positives and false negatives, providing the view required for the domain expert's understanding of the Swiss situation. With such a result, it is possible to combine it with all the other existing data, with GeoCover in the first place. This lithology model of the Swiss soil can be intersected with the produced quarries layer in order to create a secondary geographical layer merging both quarry locations and quarry soil types, leading to a powerful analysis tool for the domain expert. 
The produced demonstrator shows that it is possible, in hours, to derive a simple and reliable geographical layer from a simple set of orthomosaics. The STDL was thus able to prove that the process can be repeated along the time dimension, for future and past images, opening the way to building and rebuilding the history and evolution of the quarries. With such a process, it will be possible to compute statistical quantities over the long term to capture the evolution and the resources, leading to a more reliable strategic understanding of Swiss resources and sovereignty.","title":"Automatic Detection of Quarries and the Lithology below them in Switzerland"},{"location":"PROJ-DQRY/#automatic-detection-of-quarries-and-the-lithology-below-them-in-switzerland","text":"Huriel Reichel (FHNW) - Nils Hamel (UNIGE) Supervision : Nils Hamel (UNIGE) - Raphael Rollier (swisstopo) Proposed by swisstopo - PROJ-DQRY June 2021 to January 2022 - Published on January 30th, 2022 Abstract : Mining is an important economic activity in Switzerland and is therefore monitored by the Confederation through swisstopo. Until now, the identification of quarries has been done manually; although of very high quality, this work unfortunately cannot keep up with the constantly changing nature of these features. For this reason, swisstopo contacted the STDL to automatically detect quarries throughout the whole country. The training was done using SWISSIMAGE with 10cm spatial resolution and the Deep Learning Framework from the STDL. Moreover, there were two iteration steps with the domain expert, which included the manual correction of detections for new training. 
Interaction with the domain expert was very relevant for the final results; in addition to his positive appreciation, an F1 score of 85% was obtained in the end, which, given the peculiar characteristics of quarries, can be considered an excellent result.","title":"Automatic Detection of Quarries and the Lithology below them in Switzerland"},{"location":"PROJ-DQRY/#1-introduction","text":"Mining is an important economic activity worldwide, and this is also the case in Switzerland. The Federal Office of Topography (swisstopo) is responsible for monitoring the presence of quarries and also the materials being exploited. This is extremely relevant for planning the demand and shortage of exploited materials and also their transportation through the country. As this is of federal importance, the mapping of these features is already being done. Although this work is very detailed and accurate, quarries have a very characteristic updating pattern. Quarries can appear and disappear in a matter of a few months, especially when they are relatively small, as is the case in Switzerland. Therefore it is in the interest of swisstopo to automate the detection of quarries in a way that is also reproducible in time. A strategy often offered by the Swiss Territorial Data Lab is the automatic detection of objects in aerial imagery through deep learning, following our Object Detection Framework . In this case it is fully applicable, as quarries in Switzerland are relatively small, so high-resolution imagery is required, something our neural network has proven to handle well in past projects. This high-resolution imagery is available through SWISSIMAGE , aerial images from swisstopo that cover almost the whole country with a 10cm pixel size (GSD). Nevertheless, in order to train our neural network, and as is usually the case in deep learning, many labelled images are required. 
These data work as ground truth so that the neural network \"learns\" which objects should be detected and which should not. For this purpose, the work of the topographic landscape model ( TLM ) team of swisstopo has been of extreme importance. Among other surface features, quarries have been mapped all over Switzerland at a highly detailed scale. Despite the high quality and precision of the TLM labels, quarries are constantly changing, appearing and disappearing, and therefore the labels are not always synchronized with the images from SWISSIMAGE. This lack of synchronization between these datasets can be seen in Figure 1, where the left shows the year of mapping of the TLM and the right the year of the SWISSIMAGE flights. Figure 1 : Comparison of TLM (left) and SWISSIMAGE (right) temporality. For this reason, two rounds of interaction with the domain expert were necessary. In order to have a ground truth fully synchronized with SWISSIMAGE, we required two stages of training: one making use of the TLM data and a second one using the manual correction of the labels predicted in the first iteration. It is of crucial importance to state that this correction needed to be made by the domain expert, so that he could carefully check each detection in pre-defined tiles. With that in hand, we could proceed with a more trustworthy training. As stated, swisstopo is also interested in identifying the material exploited by every quarry. For that purpose, the use of the GeoCover dataset from swisstopo was recommended as well. This dataset is a vector layer of the geological cover of the whole of Switzerland, which challenged us to cross the detector predictions with such vector information. 
In summary, the challenge of the STDL was to investigate to what extent it is possible to automatically detect quarries in aerial imagery using deep learning, considering their high update rate.","title":"1 - Introduction"},{"location":"PROJ-DQRY/#2-methodology","text":"First of all, the \"area of interest\" must be identified. This is where the detection and training take place. In this case, a polygon covering the whole of Switzerland was used. After that, the area of interest is divided into several tiles of fixed size, which defines the slicing of SWISSIMAGE (served as WMS). For this study, tiles of different sizes were tested, with 500x500 m tiles chosen for final use. Next, the resolution of the images must be defined; again, after several tests, 512x512 pixels was chosen. For validation purposes the data is then split into training, validation and testing sets. The training dataset is used by the network for its learning; the validation set is kept completely apart from training and used only to check results; and the test set is used for cross-validation. 70% of the data was used for training, 15% for validation and 15% for testing. Regarding the labels, the ones from the TLM were manually checked, so that a group of approximately 250 labels fully synchronized with SWISSIMAGE was found and recorded. The first round of training then goes through the same framework as former STDL projects. We make use of a region-based convolutional neural network with a ResNet-50 backbone provided by Detectron2 . A deeper explanation of the network functionality can be found here and here . Even with different parameter sets, it was observed that the predictions included too many false positives, which mainly consisted of snow. Most probably the reflectance of snow is similar to that of quarries, and this needed special treatment. For this purpose, a filtering of the results was used. 
First of all, the features were filtered based on their score values (threshold of 0.9) and then by elevation, using the SRTM digital elevation model. As snow usually does not persist below around 1155 m, this was used as the threshold. Finally an area threshold is also applied (discarding the smallest prediction areas) and predictions are merged. A more detailed description of how to operate this first filter can be seen here . Once several tests had been performed, the new predictions were sent back to the domain experts for detailed revision following a strict protocol. This mainly included the removal of false positives and the inclusion of false negatives. It was performed by 4 different experts from swisstopo in 4 regions with the same number of tiles to be analyzed. It is important to state again the importance of domain expertise in this step, as a very careful and manual evaluation of what is and what is not a quarry must be made. Once the predictions were corrected, a new training session was performed using different parameters. Once again, the same resolution and tile size were used as in the first iteration (500x500 m tiles with 512x512 pixels of resolution), although this time a new filtering procedure was developed. It is very similar to the first one, but applies the steps in a different order, yielding more aesthetically pleasing predictions in the end, something the domain expert also cared about. This procedure is summarized in figure 2. Figure 2 : Methodology applied for the detection of quarries and new training sessions. In the end, in order to also include the geological information of the detected quarries, a third layer resulting from the intersection of the predictions and the GeoCover labels is created. This was done so that the final user can click to obtain both the information on the quarry (when not a pure prediction) and the information on the geology/lithology of this part of the quarry. As a result, each resulting intersection polygon contains information from both the quarry and GeoCover.
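The filtering steps described above (score, elevation, area) can be sketched with plain Python; the threshold values for score and elevation follow the text, while the area threshold and the feature attributes are hypothetical placeholders, and the actual project applies these filters to georeferenced polygons with the STDL tooling:

```python
# Minimal sketch of the prediction-filtering logic (assumed attributes).
SCORE_MIN = 0.9          # score threshold from the text
ELEVATION_MAX = 1155.0   # m; snow is rarely present below this altitude
AREA_MIN = 2000.0        # m^2; hypothetical value for illustration

def keep_prediction(pred: dict) -> bool:
    """Return True if a predicted polygon passes all three filters."""
    return (
        pred["score"] >= SCORE_MIN
        and pred["elevation"] <= ELEVATION_MAX
        and pred["area"] >= AREA_MIN
    )

predictions = [
    {"id": 1, "score": 0.97, "elevation": 450.0, "area": 15000.0},   # quarry
    {"id": 2, "score": 0.95, "elevation": 2100.0, "area": 30000.0},  # snow field
    {"id": 3, "score": 0.55, "elevation": 600.0, "area": 8000.0},    # low score
]
kept = [p for p in predictions if keep_prediction(p)]
print([p["id"] for p in kept])  # only prediction 1 survives
```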
In order to evaluate the obtained results, the F1 score was computed and the final predictions were compared to the labels corrected by the domain experts. This was done visually by plotting the centroid of each detected quarry and by a heat-map, allowing one to inspect the spatial pattern of the detections. The heat-map was computed using a 10'000 m radius and a 100 m pixel size.","title":"2 - Methodology"},{"location":"PROJ-DQRY/#3-results-discussion","text":"In the first iteration, when the neural network was trained with labels from the TLM vector data, an optimal F1 score of approximately 0.78 was obtained. Figure 3 shows the behavior of the precision, recall and F1 score for the final model selected. Figure 3 : Precision, Recall and F1 score of the first iteration (using TLM data). With the predictions resulting from the correction by the domain experts, there was an outstanding improvement in the F1 score, which reached approximately 0.85 at its optimum, as seen in figure 4. A total of 1265 quarries were found in Switzerland after filtering. Figure 4 : Precision, Recall and F1 score of the second iteration (using data corrected by the domain expert). Figure 5 shows some examples of detected quarries, from which one can get a notion of the quality of the shape of the detections and how well they delineate real-world quarries. Examples of false positives and false negatives, unfortunately still present in the detections, are also shown. This is also an interesting demonstration of how some objects that look very similar to quarries, from the point of view of non-experts, may influence the results. These examples of errors are also an interesting indication of the importance of domain expertise in evaluating machine-made results. Figure 5 : Examples of detected quarries, with true positive, false negative and false positive.
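The centroid extraction used for the spatial-pattern comparison can be sketched as follows; this is a stdlib-only illustration using the shoelace formula, whereas the project itself works with georeferenced polygons in GIS tooling:

```python
# Sketch of extracting a polygon centroid for the spatial-pattern comparison.
def polygon_centroid(ring):
    """Centroid of a simple polygon given as [(x, y), ...], closed or not."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])          # close the ring
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0   # shoelace term
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a))

# Unit square -> centroid at (0.5, 0.5)
print(polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)]))
```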
To check the validity of the newly generated predictions, their centroids were plotted along with the centroids of the corrected labels, so that their spatial patterns could be compared. Figure 6 shows this plot. Figure 6 : Disposition of the centroids of assessed predictions and final predictions. One can see that despite some slight differences, the overall pattern of the predictions is very similar. A very similar result can be seen in the computed heat-map of these points, shown in figure 7. Figure 7 : Heatmap of assessed predictions and final predictions. There is a small area in the west of the country where there were fewer detections than desired, and in general there were more predictions than before. The objective of the heat-map is more to give a general view of the results than an exact comparison, as a point is created for every feature and the new filter tended to smooth the results and join many features into a single one. Finally, the results were also intersected with GeoCover, which provides the detailed lithology of Swiss soil; an example of the results can be seen below using the QGIS software. Figure 8 : Intersection of predictions with GeoCover seen in QGIS. Finally and most importantly, the domain expert was highly satisfied with this work, due to the support it can give to swisstopo and the TLM team in mapping future quarries. The domain expert also expressed interest in pursuing the work by investigating the temporal pattern of quarries and estimating the volume of material in each quarry.","title":"3 - Results & Discussion"},{"location":"PROJ-DQRY/#4-conclusion","text":"Through this collaboration with swisstopo, we managed to demonstrate that data science is able to provide relevant and efficient tools to ease complex and time-consuming tasks.
With the produced inventory of the quarries over the whole Swiss territory, we were able to provide a quasi-exhaustive view of the situation to the domain expert, giving him a better overview of the exploitation sites. This is important and a major step forward compared to the previous situation. Indeed, before this project, the only solution available to the domain expert was to gather all the federal and cantonal data, through a non-standardized and time-consuming process, in the hope of obtaining the beginnings of an inventory, with temporality issues. With the developed prototype, the entire SWISSIMAGE dataset can be processed within hours and turned into a full-scale inventory, guiding the domain expert directly toward his interests. The resulting geographical layer can then be seen as the product of this demonstrator, able to turn aerial images into a simple polygonal layer representing the quarries, with few false positives and false negatives, providing the view the domain expert requires to understand the Swiss situation. With such a result, it is possible to combine it with all the other existing data, with GeoCover in the first place. This lithology model of the Swiss soil can be intersected with the produced quarries layer in order to create a secondary geographical layer merging both quarry location and quarry soil type, leading to a powerful analysis tool for the domain expert. The produced demonstrator shows that it is possible, in hours, to deduce a simple and reliable geographical layer from a simple set of orthomosaics. The STDL was then able to prove the possibility of repeating the process along the time dimension, for future and past images, opening the way to building and rebuilding the history and evolution of the quarries.
With such a process, it will be possible to compute statistical quantities over the long term to capture the evolution of the resources, leading to a more reliable strategic understanding of Swiss resources and sovereignty.","title":"4 - Conclusion"},{"location":"PROJ-DQRY-TM/","text":"Automatic detection and observation of mineral extraction sites in Switzerland \u00b6 Cl\u00e9mence Herny (Exolabs), Shanci Li (Uzufly), Alessandro Cerioni (\u00c9tat de Gen\u00e8ve), Roxane Pott (swisstopo) Proposed by swisstopo - PROJ-DQRY-TM October 2022 to February 2023 - Published on January 2024 Abstract : Studying the evolution of mineral extraction sites (MES) is of primary importance for assessing the availability of mineral resources, managing MES and evaluating the impact of mining activity on the environment. In Switzerland, MES are inventoried at local level by the cantons and at federal level by swisstopo. The latter performs manual vectorisation of MES boundaries. Unfortunately, although the data is of high quality, it is not regularly updated. To automate this tedious task and to better observe the evolution of MES, swisstopo has solicited the STDL to carry out automatic detection of MES in Switzerland over the years. We performed instance segmentation using a deep learning method to automatically detect MES in RGB aerial images with a spatial resolution of 1.6 m px -1 . The detection model was trained with 266 labels and orthophotos from the SWISSIMAGE RGB mosaic published in 2020. The selected trained model achieved an f1-score of 82% on the validation dataset. The model was then used to detect potential MES by inference in SWISSIMAGE RGB orthophotos from 1999 to 2021. The model shows good ability to detect potential MES, with about 82% of labels detected for the 2020 SWISSIMAGE mosaic. The detections obtained with SWISSIMAGE orthophotos acquired in different years can be tracked to observe their temporal evolution.
The framework developed can perform detection over an area of interest (about a third of Switzerland at most) in just a few hours, which is a major advantage over manual mapping. We acknowledge that there are some missed and false detections in the final product, and the results need to be reviewed and validated by domain experts before being analysed and interpreted. The results can be used to compute statistics over time and to update the picture of MES evolution with future image acquisitions. 1. Introduction \u00b6 1.1 Context \u00b6 Mineral extraction constitutes a strategic activity worldwide, including in Switzerland. Demand for mineral resources has grown significantly in recent decades 1 , mainly due to the rapid increase in the production of batteries and electronic chips, or building construction, for example. As a result, the exploitation of some resources, such as rare earth elements, lithium, or sand, is putting pressure on their availability. Being able to observe the development of mineral extraction sites (MES) is of primary importance for adapting mining strategy and anticipating demand and shortage. Mining also has a strong environmental and societal impact 2 3 . It implies the extraction of rocks and minerals from water ponds, cliffs, and quarries. The surface affected, initially natural areas, can reach up to thousands of square kilometres 1 . The extraction of some minerals can lead to soil and water pollution and involves polluting truck transport. The economic and political interest of some resources might override land protection, and conflicts are gradually intensifying 2 . MES are dynamic features that can evolve according to singular patterns, especially if they are small, as is the case in Switzerland. A site can expand horizontally and vertically or be filled in to restore the site 4 2 3 5 . Changes can happen quickly, in a couple of months. As a result, keeping the MES inventory up to date can be challenging.
There is a significant demand for effective observation of MES development worldwide. The majority of MES mapping is performed manually by visual inspection of images 1 . Alternatively, recent improvements in the availability of high spatial and temporal resolution space/airborne imagery and in computational methods have encouraged the development of automated image processing. Supervised classification of spectral images is an effective method but requires a complex workflow 6 4 2 . More recently, a few studies have implemented deep learning algorithms to train models to detect extraction sites in images and have shown high levels of accuracy 3 . In Switzerland, MES management is historically regulated at the cantonal level using GIS data, including information about the MES location, extent, and extracted materials, among others. At the federal level, swisstopo and the Federal Office of Statistics (FSO) observe the development of MES. swisstopo has carried out a detailed manual delineation of MES based on the SWISSIMAGE dataset over Switzerland. With the aim of speeding up and improving the process of MES mapping in Switzerland, we developed a method for automating MES detection over the years. Ultimately, the goal is to keep the database up to date as new images are acquired. The results can be statistically processed to better assess the MES evolution over time in Switzerland. 1.2. Approach \u00b6 The STDL has developed a framework named object-detector to automatically detect objects in a georeferenced imagery dataset based on deep learning methods. The framework can be adapted to detect MES (also referred to as quarries in this project) in Switzerland. A project to automatically detect MES in Switzerland 7 was carried out by the STDL in 2021 ( detector-interface framework). Detections of potential MES obtained by automatic detection on the 2020 SWISSIMAGE mosaic have already been delivered to swisstopo (layer 2021_10_STDL_QC1 ). The method has proven its efficiency in detecting MES.
The numerical model trained with the object detector achieved an f1-score of 82% and detected about 1200 potential MES over Switzerland. In this project, we aim to continue this work and extend it to a second objective, that of observing MES evolution over time. The main challenge is to prove the algorithm's reliability for detecting objects in a multi-year image dataset acquired with different sensors. The project workflow is synthesised in Figure 1. First, a deep learning algorithm is trained using a manually mapped MES dataset that serves as ground truth (GT). After evaluating the performance of the trained models, the selected one was used to perform inference detection for a given year's dataset and area of interest (AoI). The results were filtered to discard irrelevant detections. The operation was repeated over several years. Finally, each potential MES detected was tracked over the years to observe its evolution. Figure 1: Workflow diagram for automatic MES detection. In this report, we first describe the data used, including the image description and the definition of the AoI. Then we explain the model training, evaluation and object detection procedure. Next, we present the results of potential MES detection and the MES tracking strategy. Finally, we provide conclusions and perspectives. 2. Data \u00b6 2.1 Images and area of interest \u00b6 Automatic detection of potential MES over the years in Switzerland was performed with aerial orthophotos from the swisstopo product SWISSIMAGE Journey . Images are georeferenced RGB TIF tiles with a size of 256 x 256 pixels (1 km 2 ).
Product Year Coordinate system Spatial resolution SWISSIMAGE 10 cm 2017 - current CH1903+/MN95 (EPSG:2056) 0.10 m ( \(\sigma\) \(\pm\) 0.15 m) - 0.25 m SWISSIMAGE 25 cm 2005 - 2016 MN03 (2005 - 2007) and MN95 (since 2008) 0.25 m ( \(\sigma\) \(\pm\) 0.25 m) - 0.50 m ( \(\sigma\) \(\pm\) 3.00 - 5.00 m) SWISSIMAGE 50 cm 1998 - 2004 MN03 0.50 m ( \(\sigma\) \(\pm\) 0.50 m) Table 1: SWISSIMAGE products characteristics. Several SWISSIMAGE products exist, produced with different instrumentation (Table 1). SWISSIMAGE mosaics are built and published yearly. The year of the mosaic corresponds to the last year of the dataset publication, and the most recent orthophoto datasets available are used to complete the mosaic. For example, the 2020 SWISSIMAGE mosaic is a combination of 2020, 2019 and 2018 image acquisitions. The 1998 mosaic release corresponds to the year of transition from black and white images ( SWISSIMAGE HIST ) to RGB images. For this study, only RGB data from 1999 to 2021 were considered. Figure 2: Acquisition footprints of SWISSIMAGE aerial orthophotos for the years 2016 to 2021. The SWISSIMAGE Journey mosaic in the background is the 2020 release. The acquisition footprints of yearly acquired orthophotos were used as AoIs to perform MES detection through time. Over the years, the footprints may spatially overlap (Fig. 2). Since 2017, the geometry of the acquisition footprints has been quasi-constant, dividing Switzerland into three more or less equal areas and ensuring that the orthophotos are updated every three years. For the years before 2017, the acquisition footprints were not systematic and do not guarantee a periodic update of the orthophotos. The acquisition footprints may also not be spatially contiguous. Figure 3: Illustration of the combination of SWISSIMAGE images and FSO images for the 2007 SWISSIMAGE mosaic. (a) Overview of the 2007 SWISSIMAGE mosaic.
The red polygon corresponds to the provided SWISSIMAGE acquisition footprint for 2007. The orange polygon corresponds to the surface covered by the new SWISSIMAGE for 2007. The remaining area of the red polygon corresponds to the FSO image dataset acquired in 2007. The black box indicates the panel (b) location, and the white box indicates the panel (c) location. (b) Side-by-side comparison of image composition in the 2006 and 2007 SWISSIMAGE mosaics. (c) Examples of detection polygons (white polygons) obtained by inference on the 2007 SWISSIMAGE dataset (red box) and FSO images 2007 (outlined by black box). The SWISSIMAGE Journey mosaics of 2005, 2006, and 2007 present a particularity, as they are composed not only of 25 cm resolution SWISSIMAGE but also of orthophotos acquired for the FSO. These are TIFF RGB orthophotos with a spatial resolution of 50 cm px -1 (coordinate system: CH1903/LV03 (EPSG:21781)) that have been integrated into the SWISSIMAGE Journey products. However, these images were discarded (modification of the footprint shape) from our dataset because they were causing issues in the automatic MES detection, producing odd segmented detection shapes (Fig. 3). This is probably due to the different stretching of pixel colour between datasets. It also has to be noted that some images are currently missing (about 88 tiles at zoom level 16) from the 2020 SWISSIMAGE dataset. 2.2 Image fetching \u00b6 Pre-rendered SWISSIMAGE tiles (256 x 256 px, 1 km 2 ) are downloaded using the Web Map Tile Service (WMTS) wmts.geo.admin.ch via an XYZ connector. Tiles are served on a cartesian coordinate grid using a Web Mercator Quad projection and the coordinate reference system EPSG:3857. The position of a tile on the grid is defined by its x and y coordinates, and the pixel resolution of the image is defined by z , its zoom level. Changing the zoom level by one step changes the resolution by a factor of 2 (Fig. 4).
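The zoom-to-resolution relation can be sketched as a one-line function; it is anchored on the approximate values stated in the text (about 1.6 m px -1 at zoom level 16), and the exact ground resolution of Web Mercator tiles additionally varies with latitude:

```python
# Sketch: each XYZ zoom step halves the ground pixel size.
# ref_zoom/ref_res are taken from the text (z16 ~ 1.6 m/px); approximate only.
def pixel_resolution_m(zoom: int, ref_zoom: int = 16, ref_res: float = 1.6) -> float:
    """Approximate ground resolution (m/px) at a given XYZ zoom level."""
    return ref_res * 2 ** (ref_zoom - zoom)

print(pixel_resolution_m(17))  # ~0.8 m/px
print(pixel_resolution_m(15))  # ~3.2 m/px
```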
For instance, a zoom level of 17 corresponds to a resolution of 0.8 m px -1 and a zoom level of 16 to a resolution of 1.6 m px -1 . Figure 4: Examples of tile geometry at zoom level 16 (z16, black polygons) and at zoom level 17 (z17, blue polygons). The number of tiles for each zoom level is indicated in square brackets. Only tiles intersecting swissTLM3D labels (tlm-hr-trn-topo, yellow polygons) are selected for model training. Note that in the previous project carried out by Reichel and Hamel (2021) 7 , the tiling method adopted was slightly different from the one adopted for this project. Tiles of custom size and resolution were built. A sensitivity analysis of these two parameters was conducted and led to the choice of tiles with a size of about 500 m and a pixel resolution of about 1 m (beyond this, performance was not significantly improved). 2.3 Ground truth \u00b6 The MES labels originate from the Swiss Topographic Landscape Model 3D ( swissTLM3D ) produced by swisstopo . swissTLM3D is a large-scale topographic landscape model of Switzerland, including manually drawn and georeferenced vectors of objects of interest at high resolution, including MES features. Domain experts from swisstopo have carried out extensive work to review the labeled MES and to synchronise them with the 2020 SWISSIMAGE mosaic in order to improve the quality of the labeled dataset. A total of 266 labels are available. The mapped MES reveal the diversity of MES characteristics, such as the presence or absence of buildings/infrastructure, trucks, water ponds, and vegetation (Fig. 5). Figure 5: Examples of MES mapped in swissTLM3D and synchronised to the 2020 SWISSIMAGE mosaic. These labels are used as the ground truth (GT), i.e. the reference dataset indicating the presence of a MES in an image. The GT is used both as input to train the model to detect MES and to evaluate the model performance. 3.
Automatic detection methodology \u00b6 3.1 Deep learning algorithm for object detection \u00b6 Training and inference detection of potential MES in SWISSIMAGE were performed with the object detector framework. This project is based on the open source detectron2 framework 8 implemented with PyTorch by the Facebook Artificial Intelligence Research group (FAIR). Instance segmentation (delineation of objects) was performed with a Mask R-CNN deep learning algorithm 9 . It is based on a Region-based Convolutional Neural Network (R-CNN) with a pre-trained ResNet-50 backbone (50-layer deep residual network). Images were annotated with custom COCO objects based on the labels (class 'Quarry'). The model is trained with this dataset to later perform inference detection on images. If an object is detected by the algorithm, a pixel mask is produced with a confidence score (0 to 1) attributed to the detection (Fig. 6). Figure 6: Example of a detection mask. The pink rectangle corresponds to the bounding box of the object; the object is segmented by the pink polygons associated with the detection class ('Quarry') and a confidence score. The object detector framework converts detection masks to georeferenced polygons that can be used in GIS software. The implementation of the Ramer-Douglas-Peucker ( RDP ) algorithm allows the simplification of the derived polygons by discarding non-essential points based on a smoothing parameter. This considerably reduces the amount of data to be stored and prevents potential memory saturation while deriving detection polygons over large areas, as is the case for this study. 3.2 Model training \u00b6 Orthophotos from the 2020 SWISSIMAGE mosaic, for which the GT has been defined, were chosen for model training. Tiles intersecting labels were selected and split randomly into three datasets: the training dataset (70%), the validation dataset (15%), and the test dataset (15%).
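The random 70/15/15 split described above can be sketched with the standard library; the project performs this step inside the STDL object-detector framework, so this is only an illustrative stand-in:

```python
# Sketch of a random 70/15/15 split of tile identifiers (assumed helper).
import random

def split_tiles(tile_ids, seed=42):
    """Shuffle tile ids and split them into train/validation/test subsets."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)   # deterministic shuffle for the sketch
    n = len(ids)
    n_trn = int(0.70 * n)
    n_val = int(0.15 * n)
    return ids[:n_trn], ids[n_trn:n_trn + n_val], ids[n_trn + n_val:]

trn, val, tst = split_tiles(range(100))
print(len(trn), len(val), len(tst))  # 70 15 15
```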
The addition of empty tiles (no annotation), to confront the model with landscapes not containing the target object, was tested ( Appendix A.1 ) but did not improve model performance enough to be adopted. Figure 7: Training curves obtained at zoom level 16 on the 2020 SWISSIMAGE mosaic. The curves were obtained for the trained model 'replicate 3'. (a) Learning rate as a function of iteration. The step was defined every 500 iterations. The initial learning rate was 5.0 x 10 -3 with a weight and bias decay of 1.0 x 10 -4 . (b) Total loss as a function of iteration. Raw measurement (light red) and smoothed curve (0.6 factor, solid red) are superposed. (c) Validation loss as a function of iteration. Raw measurement (light red) and smoothed curve (0.6 factor, solid red) are superposed. The vertical dashed black lines indicate the iteration minimising the validation loss curve, i.e. 3000. Models were trained with two images per batch ( Appendix A.2 ), a learning rate of 5 x 10 -3 , and a learning rate decay of 1 x 10 -4 every 500 steps (Fig. 7 (a)). For the given model, parameters and a zoom level of 16 ( Section 3.3.3 ), the training is performed over 7000 iterations and lasts about 1 hour on a 16 GiB GPU (NVIDIA Tesla T4) machine compatible with CUDA . The total loss (train and validation) curve decreases until reaching a quasi-steady state around 6000 iterations (Fig. 7 (b)). The optimal detection model corresponds to the one minimising the validation loss curve. This minimum is reached between 2000 and 3000 iterations (Fig. 7 (c)). 3.3 Metrics \u00b6 The model performance and detection reliability were assessed by comparing the results to the GT. A detection produced by the model can be either (1) a True Positive (TP), i.e. the detection is real (spatially intersecting the GT); (2) a False Positive (FP), i.e. the detection is not real (not spatially intersecting the GT); or (3) a False Negative (FN), i.e.
the labeled object is not detected by the algorithm (Fig. 8). Tagging the detections (Fig. 9(a)) allows the calculation of several metrics (Fig. 9(b)), such as: Figure 8: Examples of different detection cases. The label is represented by a yellow polygon and the detection by a red polygon. (a) True Positive (TP) detection intersecting the GT, (b) a potential True Positive (TP?) detection with no GT, (c) False Negative (FN) case with no detection while GT exists, (d) False Positive (FP) detection of an object that is not a MES. the recall , translating the proportion of labeled objects detected by the model: \[recall = \frac{TP}{(TP + FN)}\] the precision , translating the proportion of TP among all the detections: \[precision = \frac{TP}{(TP + FP)}\] the f1-score , the harmonic mean of the precision and the recall: \[f1 = 2 \times \frac{recall \times precision}{recall + precision}\] Figure 9: Evaluation of the trained model performance obtained at zoom level 16 for the trained model 'replicate 3' (Table 2). (a) Number of TP (blue), FN (red), and FP (green) as a function of detection score threshold for the validation dataset. (b) Metrics values, precision (blue), recall (red), and f1-score (green) as a function of the detection score threshold for the validation dataset. The maximum f1-score value is 82%. 4. Automatic detection model analysis \u00b6 4.1. Model performance and replicability \u00b6 Trained models reached f1-scores of about 80% with a standard deviation of 2% (Table 2). The performances are similar to those of the model trained by Reichel and Hamel (2021) 7 . model precision recall f1 replicate 1 0.84 0.79 0.82 replicate 2 0.77 0.76 0.76 replicate 3 0.83 0.81 0.82 replicate 4 0.89 0.77 0.82 replicate 5 0.78 0.82 0.80 Table 2: Metrics values computed on the validation dataset for model replicates trained on the 2020 SWISSIMAGE mosaic at zoom level 16.
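The three metrics defined above can be written out directly from the TP/FP/FN counts; the counts below are illustrative values chosen to land near the replicate 3 figures, not numbers reported by the project:

```python
# The precision / recall / f1 formulas from the text, written out.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)  # harmonic mean
    return precision, recall, f1

# Illustrative counts only (82 TP, 17 FP, 19 FN).
p, r, f1 = precision_recall_f1(82, 17, 19)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.83 0.81 0.82
```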
Some variability is expected, as the deep learning algorithm displays some random behavior, but it is supposed to be negligible. However, the observed model variability is enough to affect final results, which might slightly change when using different models trained with the same input parameters (Fig. 10). Figure 10: Detection polygons obtained for the different trained model replicates (Table 2), highlighting the variability of the results. The labels correspond to orange polygons. The number in square brackets corresponds to the number of polygons. The inference detections have been performed on a subset of 2000 tiles of the 2020 SWISSIMAGE at zoom level 16. Detections have been filtered according to the parameters defined in Section 5.1. To reduce the variability of the trained models, the random seeds of both detectron2 and python have been fixed. Neither of these attempts has been successful, and the variability remains. The nondeterministic behavior of detectron2 has been recognised ( issue 1 , issue 2 ), but no suitable solution has been provided yet. Further investigation of the model performance and consistency should be performed in the future. To mitigate the variability of the results between model replicates, we could consider combining the results of several model replicates to remove FP while preserving the TP and potential TP detections. The choice and number of models used should be evaluated. This method is tedious, as it requires inference detections from several models, which can be time-consuming and computationally intensive. 4.2 Sensitivity to the zoom level \u00b6 Image resolution depends on the zoom level ( Section 2.2 ). To select the most suitable zoom level for MES detection, we performed a sensitivity analysis of trained model performance. Increasing the zoom level increases the value of the metrics following a global linear trend (Fig. 11). Figure 11: Metrics values (precision, recall and f1) as a function of zoom level for the validation dataset.
The results of the replicates performed at each zoom level are included (Table A1). Models trained at a higher zoom level performed better. However, a higher zoom level implies smaller tiles and thus a larger number of tiles to fill the AoI. For a typical AoI, i.e. up to a third of Switzerland, this can lead to a large number of tiles to be stored and processed, leading to potential RAM and/or disk space saturation. For the 2019 AoI, 89'290 tiles are required at zoom level 16 while 354'867 tiles are required at zoom level 17, taking respectively 3 hours and 11 hours to process on a 30 GiB RAM machine with a 16 GiB GPU. Visual comparison of inference detections reveals no significant improvement in object detection quality from zoom level 16 to zoom level 17. Both zoom levels present a similar proportion of detections intersecting labels (82% and 79% for zoom levels 16 and 17 respectively). On the other hand, the quality of object detection at zoom level 15 was degraded. Indeed, detection scores were lower, with only tens of detection scores above 0.95, against about 400 at zoom level 16, and about 64% of detections intersecting labels. 4.3 Model choice \u00b6 Based on the tests performed, we selected the 'replicate 3' model, obtained at zoom level 16 (Tables 2 and A1), to perform inference detection. Models trained at zoom level 16 (1.6 m px -1 pixel resolution) have shown satisfying results in accurately detecting MES contours and limiting the number of FP with high detection scores (Fig. 11). This represents a good trade-off between result reliability (f1-score between 76% and 82% on the validation dataset) and computational resources. Then, among all the replicates performed at zoom level 16, we selected the trained model 'replicate 3' (Table 2) because it combines the highest metrics values (for the validation dataset but also the train and test datasets), close precision and recall values, and a rather low number of low-score detections.
5. Automatic detection of MES \u00b6 5.1 Detection post-processing \u00b6 Detection by inference was performed over the AoIs with a threshold detection score of 0.3 (Fig. 12). This low score threshold results in a large number of detections. Several detections may overlap, potentially segmenting a single object. In addition, a detection might be split across multiple tiles. To improve the pertinence and the aesthetics of the raw detection polygons, a post-processing procedure was applied. First, a large proportion of FP occurred in mountainous areas (rock outcrops and snow, Fig. 12(a)). We assumed MES are not present (or at least sparse) above a given altitude. An elevation filter was applied using a Switzerland Digital Elevation Model (about 25 m px -1 ) derived from the SRTM instrument ( USGS - SRTM ). The maximum elevation of the labeled MES is about 1100 m. Second, detection aggregation was applied: - polygons were clustered ( K-means ) according to their centroid position. The method involves setting a predefined number k of clusters. Manual tests performed by Reichel and Hamel (2021) 7 concluded in setting k equal to the number of detections divided by three. The highest detection score was assigned to the clustered detection. This method preserves the final integrity of detection polygons by retaining detections that potentially have a low confidence score but belong to a cluster with a higher confidence score, improving the final segmentation of the detected object. The value of the threshold score must be kept relatively low ( i.e. 0.3) when performing the detection, to prevent removing too many polygons that could potentially be part of the detected object. We acknowledge that determining the optimal number of clusters by clustering validation indices rather than manual adjustment would be more robust. In addition, exploring other clustering methods, such as DBSCAN , based on local density, can be considered in the future. - score filtering was applied.
- Spatially close polygons were assumed to belong to the same MES and were merged according to a distance threshold. The average score of the merged detection polygons was then computed.
Finally, we assumed that a MES covers a minimal area: detections with an area smaller than a given threshold were filtered out. The minimum MES area in the GT is 2270 m².
Figure 12: MES detection filtering. (a) Overview of the automatic detection of MES obtained with 2020 SWISSIMAGE at zoom level 16. Transparent red polygons (with associated confidence scores in white) correspond to the raw object detection output, and the red line polygons (with associated confidence scores in red) correspond to the final filtered detections. The black box outlines the location of the zooms in panels (b) and (c). Note the large number of detections in the mountains (right part of the image). (b) Zoom on several raw detection polygons of a single object with their respective confidence scores. (c) Zoom on the filtered detection polygon of a single object with the resulting score.
The sensitivity of the detections to these filters was investigated (Table 3). The quantitative evaluation of the relevance of a filter combination is tricky, as potential MES presence is obtained by inference and the GT provided by swissTLM3D constitutes an incomplete portion of the MES in Switzerland (2020). As an indication, we computed the number of spatial intersections between the ground truth and the detections obtained with the 2020 SWISSIMAGE mosaic. Filter combination number 3 was adopted, allowing the detection of about 82% of the GT with a relatively limited number of FP detections compared to filter combinations 1 and 2 (from visual inspection).
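The post-processing chain described above (elevation filter, centroid clustering with score propagation, score filter, proximity merge, minimum-area filter) can be sketched in pure Python. This is an illustrative sketch only: axis-aligned boxes stand in for the detection polygons, and `Det`, `postprocess`, the tiny K-means, and the threshold defaults are all assumptions, not the project's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Det:
    box: tuple   # (xmin, ymin, xmax, ymax) in metres
    score: float
    elev: float  # terrain elevation at the detection, metres

def centroid(b):
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

def box_dist(a, b):
    # distance between two axis-aligned boxes (0 if they touch or overlap)
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def kmeans(points, k, iters=20):
    # deterministic Lloyd iteration, seeded with evenly spaced points
    cents = [points[i] for i in range(0, len(points), max(1, len(points) // k))][:k]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(len(cents)),
                      key=lambda c: (p[0] - cents[c][0]) ** 2 + (p[1] - cents[c][1]) ** 2)
                  for p in points]
        for c in range(len(cents)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                cents[c] = (sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members))
    return assign

def postprocess(dets, elev_max=1200.0, score_min=0.95,
                merge_dist=10.0, area_min=5000.0):
    dets = [d for d in dets if d.elev <= elev_max]       # 1. elevation filter
    k = max(1, len(dets) // 3)                           # 2. cluster centroids (k = n/3)
    assign = kmeans([centroid(d.box) for d in dets], k)
    best = {}
    for d, a in zip(dets, assign):                       #    propagate the cluster's max score
        best[a] = max(best.get(a, 0.0), d.score)
    for d, a in zip(dets, assign):
        d.score = best[a]
    dets = [d for d in dets if d.score >= score_min]     # 3. score filter
    parent = list(range(len(dets)))                      # 4. merge close boxes (union-find)
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            if box_dist(dets[i].box, dets[j].box) <= merge_dist:
                parent[find(i)] = find(j)
    groups = {}
    for i, d in enumerate(dets):
        groups.setdefault(find(i), []).append(d)
    merged = []
    for g in groups.values():
        xs = [d.box[0] for d in g] + [d.box[2] for d in g]
        ys = [d.box[1] for d in g] + [d.box[3] for d in g]
        box = (min(xs), min(ys), max(xs), max(ys))
        score = sum(d.score for d in g) / len(g)         #    averaged score of merged parts
        merged.append(Det(box, score, g[0].elev))
    return [d for d in merged                            # 5. minimum-area filter
            if (d.box[2] - d.box[0]) * (d.box[3] - d.box[1]) >= area_min]
```

The `k = n // 3` heuristic follows the manual tuning of Reichel and Hamel cited above; a real implementation would operate on the actual detection polygons rather than bounding boxes.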
| filter combination | score threshold | area threshold (m²) | elevation threshold (m) | distance threshold (m) | number of detections | label detection (%) |
|---|---|---|---|---|---|---|
| 1 | 0.95 | 2000 | 1100 | 10 | 1745 | 85.1 |
| 2 | 0.95 | 2000 | 1200 | 10 | 1862 | 86.6 |
| 3 | 0.95 | 5000 | 1200 | 10 | 1347 | 82.1 |
| 4 | 0.96 | 2000 | 1100 | 10 | 1331 | 81.3 |
| 5 | 0.96 | 2000 | 1200 | 8 | 1445 | 78.7 |
| 6 | 0.96 | 5000 | 1200 | 10 | 1004 | 74.3 |

Table 3: Threshold values of the filtering parameters with the resulting number of detections and the proportion of intersected swissTLM3D labels. The detections were obtained on the 2020 SWISSIMAGE mosaic. We acknowledge that, for the selected filter combination, the area threshold is higher than the smallest area of the GT polygons; however, reducing the area threshold significantly increases the presence of FP. Thirteen labels display an area below 5000 m².

5.2 Inference detections

The trained model was used to perform inference detection on SWISSIMAGE orthophotos from 1999 to 2021. The model shows good capabilities in detecting MES in orthophotos from different years (Fig. 13), despite being trained on the 2020 SWISSIMAGE mosaic only. It also demonstrates the capability to detect potential MES that have not been mapped yet but are strong candidates. However, the model misses some labeled or potential MES (FN, Fig. 8). Moreover, when the model processes FSO images, which have a different colour stretching, it fails to correctly detect potential MES (Fig. 3). This reveals that images must have characteristics close to those of the training dataset for a deep learning model to deliver optimal results.
Figure 13: Examples of objects segmented by detection polygons in orthophotos of different years. The yellow polygon in the year 2020 panel of object ID 3761 corresponds to the label. The other coloured polygons correspond to the algorithm detections.
Finally, we acknowledge that a significant number of FP detections can still be observed in our filtered detection dataset (Figs. 8 and 14).
The main sources of FP are the presence of large rock outcrops, mountainous areas without vegetation, snow, river sand beds, brownish-coloured fields, or construction areas. MES present a large variety of features (buildings, water ponds, trucks, vegetation) (Fig. 5), which can be a source of confusion for the algorithm, and sometimes even for the human eye. Therefore, the robustness of the GT is crucial for reliable detection, and the algorithm's results should be interpreted with care.
Figure 14: Examples of FP detections. (a) Snow patches (2019); (b) River sand beds and gullies (2019); (c) Brownish field (2020); (d) Vineyards (2005); (e) Airport tarmac (2020); (f) Construction site (2008).
The detections produced by the algorithm are potential MES, but the final results must be reviewed by experts in the field to discard the remaining FP detections and correct FN before any processing or interpretation.

6. Observation of MES evolution

6.1 Object tracking strategy

Switzerland has been covered by the RGB SWISSIMAGE product for more than 20 years (1999 to present), allowing changes to be detected (Fig. 13).
Figure 15: Strategy for MES tracking over time. ID assignment to detections: spatially intersecting polygons share the same ID, allowing a MES to be tracked in a multi-year dataset.
We assumed that detection polygons that overlap from one year to another describe a single object (Fig. 15). Overlapping detections and unique detections (which do not overlap with polygons from other years) in the multi-year dataset were assigned a unique object identifier (ID). A new object ID in the timeline indicates:
- the first occurrence of the object in the dataset of the first year available for the area (it does not mean that the object was not present before), or
- the creation of a potential new MES.
The disappearance of an object ID indicates its potential refill. Therefore, the chronology of a MES (creation, evolution, and filling) can be constrained.
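The ID assignment strategy above can be sketched as a union-find over spatially overlapping detections, so that chains of year-to-year overlaps share one object ID. This is a minimal illustration: `assign_object_ids` and `likely_false_positives` are hypothetical names, and axis-aligned boxes stand in for the detection polygons:

```python
from collections import Counter

def assign_object_ids(dets):
    """dets: list of (year, (xmin, ymin, xmax, ymax)).
    Returns one object ID per detection; detections whose boxes overlap
    (directly or through a chain of overlaps) share the same ID."""
    def overlaps(a, b):
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
    parent = list(range(len(dets)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            if overlaps(dets[i][1], dets[j][1]):
                parent[find(i)] = find(j)
    roots, ids = {}, []
    for i in range(len(dets)):
        ids.append(roots.setdefault(find(i), len(roots)))
    return ids

def likely_false_positives(ids):
    """IDs with a single occurrence in the multi-year dataset; as noted in
    Section 6.2, such one-off detections are more likely FP."""
    counts = Counter(ids)
    return {oid for oid, n in counts.items() if n == 1}
```

A real implementation would intersect the actual polygons (e.g. with a spatial index) instead of comparing every pair of boxes.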
6.2 Evolution of MES over the years

Figures 13 and 16 illustrate the ability of the trained model to detect and track a single object in a multi-year dataset. The detection over the years appears reliable and consistent, although an object detection may be absent from the dataset of a given year (e.g. due to shadows or colour changes in the surroundings). Remember that the image coverage of a given area is not renewed every year. Characteristics of the potential MES, such as surface evolution (extension or retreat), can be quantified. For example, the surfaces of object IDs 239 and 3861 have more than doubled in about 20 years. Tracking object IDs along with image visualisation allows observation of the opening and closing of potential MES, such as object IDs 31, 44, and 229.
Figure 16: Detection area (m²) as a function of years for several object IDs. Figure 13 provides the visualisation of the selected object IDs. Each point corresponds to an object ID occurrence in the corresponding year dataset.
The presence of an object in the datasets of several years strengthens the likelihood that the detected object is an actual MES. On the other hand, an object detected only once is more likely a FP detection.

7. Conclusion and perspectives

The project demonstrated the ability to automatically, quickly (a matter of hours for one AoI), and reliably detect potential MES in orthophotos of Switzerland with an automatic detection algorithm (deep learning). The selected trained model achieved an f1-score of 82% on the validation dataset. The final detection polygons accurately delineate the potential MES. We can track a single MES through multiple years, emphasising the robustness of the method for detecting objects in multi-year datasets despite the detection model being trained on a single dataset (2020 SWISSIMAGE mosaic).
However, an image colour stretching different from that used to train the model can significantly affect the model's ability to provide reliable detections, as was the case with the FSO images. Although the performance of the trained model is satisfactory, FP and FN are present in the datasets. They are mainly due to confusion of the algorithm between MES and rock outcrops, river sandbeds, or construction sites. A manual verification of the relevance of the detections by experts in the field is necessary before processing and interpreting the data. Revising all the detections from 1999 to 2021 is a time-consuming effort but is necessary to guarantee detection reliability. Despite the required manual checks, the provided framework and detection results constitute a valuable contribution that can greatly assist the inventory and the observation of MES evolution in Switzerland. It provides state-wide detection in a matter of hours, which is a considerable time saving compared with manual mapping. It also enables MES detection with a standardised method, independent of the information or method adopted by the cantons. Further model improvements could be considered, such as increasing the metrics by improving the GT quality, improving the model learning strategy, mitigating the model learning variability, or testing supervised clustering methods to find relevant detections. This work can be used to compute statistics to study the long-term evolution of MES in Switzerland and to better manage resources and land use in the future. MES detections can be combined with other data, such as the geological layer, to identify the minerals/rocks exploited, and a high-resolution DEM (swissALTI3D) to infer elevation changes and observe the excavation or filling of MES 5. So far, only RGB SWISSIMAGE orthophotos from 1999 to 2021 have been processed. Prior to 1999, black and white orthophotos exist, but the model trained on RGB images cannot be applied reliably to black and white images.
Image colourisation tests (with the help of a deep learning algorithm [@farella_colour_2022]) were performed and provided encouraging detection results. This avenue needs to be explored. Finally, automatic detection of MES is rare 1 3, and most studies perform manual mapping. Therefore, the framework could be extended to other datasets and/or other countries, providing a valuable asset to the community. A global mapping of MES has been completed, with over 21'000 polygons 1, and can be used as a GT database to train an automatic detection model.

Code availability

The codes are stored and available on the STDL's GitHub repository: proj-dqry: mineral extraction site framework; object-detector: object detector framework.

Acknowledgements

This project was made possible thanks to a close collaboration between the STDL team and swisstopo. In particular, the STDL team acknowledges the key contribution of Thomas Galfetti (swisstopo). This project has been funded by "Strategie Suisse pour la Géoinformation".

Appendix

A.1 Influence of the addition of empty tiles on model performance

By selecting only tiles intersecting labels, the detection model is mainly confronted with the presence of the targeted object. The addition of non-label-intersecting tiles, i.e. empty tiles, provides landscape diversity that might help improve the object detection performance. To evaluate the influence of adding empty tiles on model performance, empty tiles (not intersecting labels) were chosen randomly within the Swiss boundaries and added to the tile dataset used for model training (Fig. A1). Empty tiles were added (1) to the whole dataset, split as for the initial dataset (training: 70%, test: 15%, and validation: 15%), and (2) only to the training dataset. A visual inspection must be performed to prevent a potential unlabeled MES from being present in the images and disturbing the algorithm learning.
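The random selection of empty tiles described above can be sketched as follows; `sample_empty_tiles` is a hypothetical helper operating on tile grid indices, not the project's actual code:

```python
import random

def sample_empty_tiles(all_tiles, label_tiles, n, seed=42):
    """Randomly pick n tiles that do not intersect any label.
    all_tiles / label_tiles are sets of (x, y) tile indices; sorting the
    complement before sampling makes the draw reproducible for a seed."""
    empty = sorted(set(all_tiles) - set(label_tiles))
    rng = random.Random(seed)
    return rng.sample(empty, n)
```

As noted in the text, the sampled tiles would still need a visual check to make sure no unlabeled MES slipped into the "empty" set.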
Figure A1: View of tiles (black) intersecting labels (yellow) and randomly selected empty tiles (red) in Switzerland. This case corresponds to the addition of 35% empty tiles.
Figure A2 reveals that adding empty tiles to the dataset does not significantly influence the metrics values. The numbers of TP, FP, and FN do not show significant variation. However, when performing an inference detection test on a subset of 2000 tiles of an AoI, the number of raw (unfiltered) detections decreases as the number of empty tiles increases. Nevertheless, visual inspection of the final detections after applying the filters does not show significant improvement compared to a model trained without empty tiles.
Figure A2: Influence of the addition of empty tiles (relative to the number of tiles intersecting labels) on training performance for zoom levels 16 and 17, with (a) the f1-score as a function of the percentage of added empty tiles and (b) the number of detections, normalised by the number of sampled tiles (2000), as a function of added empty tiles. Empty tiles were added only to the train dataset for the 5% and 30% cases and to all datasets for the 9%, 35%, 70%, and 140% cases.
A considered solution to improve the results could be to specifically select tiles in which FP occurred and include them in the training dataset as empty tiles. This way, the model would be trained with relevant confounding features, such as snow patches, river sandbeds, or gullies, that are not labeled as GT.

A.2 Sensitivity of the model to the number of images per batch

During the learning phase, the trained model is updated after each batch of samples is processed. Adding more samples, i.e. in our case images, to the batch can influence the model learning capacity. We investigated the role of adding more images per batch for datasets with and without the addition of a portion of empty tiles.
Adding more images per batch speeds up the model learning (Table A1): the minimum of the loss curve is reached after a smaller number of iterations.
Figure A3: Evolution of the metrics (precision, recall and f1-score) with the number of images per batch during model training. Results were obtained on a dataset without the addition of empty tiles (red) and with the addition of 23% empty tiles to the training dataset.
Figure A3 reveals that the metrics values remain within a constant range when adding extra images to the batch, in all cases (with or without empty tiles). A potential effect of adding more images per batch is a reduction of the metrics variability between replicates of trained models, as the range of metrics values is smaller for 8 images per batch than for 2. However, this observation has to be taken with caution, as fewer replicates were performed with 8 images per batch than with 2 or 4. Further investigation would provide stronger insight into this effect.

A.3 Evaluation of trained models

Table A1 sums up the metrics values obtained for all the configurations tested in the project.
| zoom level | model | empty tiles (%) | images per batch | optimum iteration | precision | recall | f1 |
|---|---|---|---|---|---|---|---|
| 15 | replicate 1 | 0 | 2 | 1000 | 0.727 | 0.810 | 0.766 |
| 16 | replicate 1 | 0 | 2 | 2000 | 0.842 | 0.793 | 0.817 |
| 16 | replicate 2 | 0 | 2 | 2000 | 0.767 | 0.760 | 0.763 |
| 16 | replicate 3 | 0 | 2 | 3000 | 0.831 | 0.810 | 0.820 |
| 16 | replicate 4 | 0 | 2 | 2000 | 0.886 | 0.769 | 0.826 |
| 16 | replicate 5 | 0 | 2 | 2000 | 0.780 | 0.818 | 0.798 |
| 16 | replicate 6 | 0 | 2 | 3000 | 0.781 | 0.826 | 0.803 |
| 16 | replicate 7 | 0 | 4 | 1000 | 0.748 | 0.860 | 0.800 |
| 16 | replicate 8 | 0 | 4 | 1000 | 0.779 | 0.785 | 0.782 |
| 16 | replicate 9 | 0 | 8 | 1500 | 0.800 | 0.793 | 0.797 |
| 16 | replicate 10 | 0 | 4 | 1000 | 0.796 | 0.744 | 0.769 |
| 16 | replicate 11 | 0 | 8 | 1000 | 0.802 | 0.769 | 0.785 |
| 16 | ET-250_allDS_1 | 34.2 | 2 | 2000 | 0.723 | 0.770 | 0.746 |
| 16 | ET-250_allDS_2 | 34.2 | 2 | 3000 | 0.748 | 0.803 | 0.775 |
| 16 | ET-1000_allDS_1 | 73.8 | 2 | 6000 | 0.782 | 0.815 | 0.798 |
| 16 | ET-1000_allDS_2 | 69.8 | 2 | 6000 | 0.786 | 0.767 | 0.776 |
| 16 | ET-1000_allDS_3 | 70.9 | 2 | 6000 | 0.777 | 0.810 | 0.793 |
| 16 | ET-1000_allDS_4 | 73.8 | 2 | 6000 | 0.768 | 0.807 | 0.787 |
| 16 | ET-2000_allDS_1 | 143.2 | 2 | 6000 | 0.761 | 0.748 | 0.754 |
| 16 | ET-80_trnDS_1 | 5.4 | 2 | 2000 | 0.814 | 0.793 | 0.803 |
| 16 | ET-80_trnDS_2 | 5.4 | 2 | 2000 | 0.835 | 0.752 | 0.791 |
| 16 | ET-80_trnDS_3 | 5.4 | 2 | 2000 | 0.764 | 0.802 | 0.782 |
| 16 | ET-400_trnDS_1 | 29.5 | 2 | 6000 | 0.817 | 0.777 | 0.797 |
| 16 | ET-400_trnDS_2 | 29.5 | 2 | 5000 | 0.848 | 0.785 | 0.815 |
| 16 | ET-400_trnDS_3 | 29.5 | 2 | 4000 | 0.758 | 0.802 | 0.779 |
| 16 | ET-400_trnDS_4 | 29.5 | 4 | 2000 | 0.798 | 0.818 | 0.808 |
| 16 | ET-400_trnDS_5 | 29.5 | 4 | 1000 | 0.825 | 0.777 | 0.800 |
| 16 | ET-1000_trnDS_1 | 0 | 2 | 4000 | 0.758 | 0.802 | 0.779 |
| 17 | replicate 1 | 0 | 2 | 5000 | 0.819 | 0.853 | 0.835 |
| 17 | replicate 2 | 0 | 2 | 5000 | 0.803 | 0.891 | 0.845 |
| 17 | replicate 3 | 0 | 2 | 5000 | 0.872 | 0.813 | 0.841 |
| 17 | ET-250_allDS_1 | 16.8 | 2 | 3000 | 0.801 | 0.794 | 0.797 |
| 17 | ET-1000_allDS_1 | 72.2 | 2 | 7000 | 0.743 | 0.765 | 0.754 |
| 18 | replicate 1 | 0 | 2 | 10000 | 0.864 | 0.855 | 0.859 |

Table A1: Metrics values computed on the validation dataset for all the models trained on the 2020 SWISSIMAGE Journey mosaic.

Victor Maus, Stefan Giljum, Jakob Gutschlhofer, Dieison M. Da Silva, Michael Probst, Sidnei L. B. Gass, Sebastian Luckeneder, Mirko Lieber, and Ian McCallum. A global-scale data set of mining areas.
Scientific Data, 7(1):289, September 2020. URL: https://www.nature.com/articles/s41597-020-00624-w, doi:10.1038/s41597-020-00624-w.
Vicenç Carabassa, Pau Montero, Marc Crespo, Joan-Cristian Padró, Xavier Pons, Jaume Balagué, Lluís Brotons, and Josep Maria Alcañiz. Unmanned aerial system protocol for quarry restoration and mineral extraction monitoring. Journal of Environmental Management, 270:110717, September 2020. URL: https://linkinghub.elsevier.com/retrieve/pii/S0301479720306496, doi:10.1016/j.jenvman.2020.110717.
Chunsheng Wang, Lili Chang, Lingran Zhao, and Ruiqing Niu. Automatic Identification and Dynamic Monitoring of Open-Pit Mines Based on Improved Mask R-CNN and Transfer Learning. Remote Sensing, 12(21):3474, January 2020. URL: https://www.mdpi.com/2072-4292/12/21/3474, doi:10.3390/rs12213474.
Haoteng Zhao, Yong Ma, Fu Chen, Jianbo Liu, Liyuan Jiang, Wutao Yao, and Jin Yang. Monitoring Quarry Area with Landsat Long Time-Series for Socioeconomic Study. Remote Sensing, 10(4):517, April 2018. URL: https://www.mdpi.com/2072-4292/10/4/517, doi:10.3390/rs10040517.
Valentin Tertius Bickel and Andrea Manconi. Decadal Surface Changes and Displacements in Switzerland. Journal of Geovisualization and Spatial Analysis, 6(2):24, December 2022. URL: https://link.springer.com/10.1007/s41651-022-00119-9, doi:10.1007/s41651-022-00119-9.
George P. Petropoulos, Panagiotis Partsinevelos, and Zinovia Mitraka. Change detection of surface mining activity and reclamation based on a machine learning approach of multi-temporal Landsat TM imagery. Geocarto International, 28(4):323–342, July 2013. URL: http://www.tandfonline.com/doi/abs/10.1080/10106049.2012.706648, doi:10.1080/10106049.2012.706648.
Huriel Reichel and Nils Hamel. Automatic Detection of Quarries and the Lithology below them in Switzerland. 2022.
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. 2019. URL: https://github.com/facebookresearch/detectron2.
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. January 2018. arXiv:1703.06870 [cs]. URL: http://arxiv.org/abs/1703.06870, doi:10.48550/arXiv.1703.06870.

Automatic detection and observation of mineral extraction sites in Switzerland

Clémence Herny (Exolabs), Shanci Li (Uzufly), Alessandro Cerioni (État de Genève), Roxane Pott (swisstopo)
Proposed by swisstopo - PROJ-DQRY-TM
October 2022 to February 2023 - Published in January 2024

Abstract: Studying the evolution of mineral extraction sites (MES) is of primary importance for assessing the availability of mineral resources, managing MES, and evaluating the impact of mining activity on the environment. In Switzerland, MES are inventoried at the local level by the cantons and at the federal level by swisstopo. The latter performs manual vectorisation of MES boundaries. Unfortunately, although the data is of high quality, it is not regularly updated. To automate this tedious task and to better observe the evolution of MES, swisstopo solicited the STDL to carry out an automatic detection of MES in Switzerland over the years. We performed instance segmentation using a deep learning method to automatically detect MES in RGB aerial images with a spatial resolution of 1.6 m px⁻¹. The detection model was trained with 266 labels and orthophotos from the SWISSIMAGE RGB mosaic published in 2020.
The selected trained model achieved an f1-score of 82% on the validation dataset. The model was used to detect by inference potential MES in SWISSIMAGE RGB orthophotos from 1999 to 2021. The model shows a good ability to detect potential MES, with about 82% of the labels detected for the 2020 SWISSIMAGE mosaic. The detections obtained with SWISSIMAGE orthophotos acquired in different years can be tracked to observe their temporal evolution. The framework developed can perform detection over an area of interest (at most about a third of Switzerland) in just a few hours, which is a major advantage over manual mapping. We acknowledge that there are some missed and false detections in the final product, and the results need to be reviewed and validated by domain experts before being analysed and interpreted. The results can be used to compute statistics over time and to update the MES evolution with future image acquisitions.

1. Introduction

1.1 Context

Mineral extraction constitutes a strategic activity worldwide, including in Switzerland. Demand for mineral resources has grown significantly in recent decades 1, mainly due to the rapid increase in the production of batteries and electronic chips, or building construction, for example. As a result, the exploitation of some resources, such as rare earth elements, lithium, or sand, is putting pressure on their availability. Being able to observe the development of mineral extraction sites (MES) is of primary importance for adapting mining strategies and anticipating demand and shortages. Mining also has strong environmental and societal impacts 2 3. It implies the extraction of rocks and minerals from water ponds, cliffs, and quarries. The surface affected, initially natural areas, can reach up to thousands of square kilometres 1.
The extraction of some minerals can lead to soil and water pollution and involves polluting truck transport. The economic and political interests of some resources might overwhelm land protection, and conflicts are gradually intensifying 2. MES are dynamic features that can evolve according to singular patterns, especially if they are small, as is the case in Switzerland. A site can expand horizontally and vertically or be filled to restore the site 4 2 3 5. Changes can happen quickly, in a couple of months. As a result, keeping the MES inventory up to date can be challenging. There is a significant demand for effective observation of MES development worldwide. The majority of MES mapping is performed manually by visual inspection of images 1. Alternatively, recent improvements in the availability of high spatial and temporal resolution space/airborne imagery and in computational methods have encouraged the development of automated image processing. Supervised classification of spectral images is an effective method but requires a complex workflow 6 4 2. More recently, a few studies have implemented deep learning algorithms to train models to detect extraction sites in images and have shown high levels of accuracy 3. In Switzerland, MES management is historically regulated at the cantonal level using GIS data, including information about the MES location, extent, and extracted materials, among others. At the federal level, swisstopo and the Federal Office of Statistics (FSO) observe the development of MES. swisstopo has carried out a detailed manual delineation of MES over Switzerland based on the SWISSIMAGE dataset. With the aim of speeding up and improving the process of MES mapping in Switzerland, we developed a method for automating MES detection over the years. Ultimately, the goal is to keep the database up to date as new images are acquired.
The results can be statistically processed to better assess the MES evolution over time in Switzerland.

1.2 Approach

The STDL has developed a framework named object-detector to automatically detect objects in a georeferenced imagery dataset based on deep learning methods. The framework can be adapted to detect MES (also referred to as quarries in the project) in Switzerland. A project to automatically detect MES in Switzerland 7 was carried out by the STDL in 2021 (detector-interface framework). Detections of potential MES obtained by automatic detection on the 2020 SWISSIMAGE mosaic have already been delivered to swisstopo (layer 2021_10_STDL_QC1). The method has proven its efficiency in detecting MES: the numerical model trained with the object detector achieved an f1-score of 82% and detected about 1200 potential MES over Switzerland. In this project, we aim to continue this work and extend it to a second objective, that of observing MES evolution over time. The main challenge is to prove the algorithm's reliability for detecting objects in a multi-year dataset of images acquired with different sensors. The project workflow is synthesised in Figure 1. First, a deep learning algorithm is trained using a manually mapped MES dataset that serves as ground truth (GT). After evaluating the performance of the trained models, the selected one was used to perform inference detection for a given year dataset and area of interest (AoI). The results were filtered to discard irrelevant detections. The operation was repeated over several years. Finally, each potential MES detected was tracked over the years to observe its evolution.
Figure 1: Workflow diagram for automatic MES detection.
In this report, we first describe the data used, including the image description and the definition of the AoI. Then we explain the model training, evaluation, and object detection procedures.
Next, we present the results of potential MES detection and the MES tracking strategy. Finally, we provide conclusions and perspectives.

2. Data

2.1 Images and area of interest

Automatic detection of potential MES over the years in Switzerland was performed with aerial orthophotos from the swisstopo product SWISSIMAGE Journey. The images are georeferenced RGB TIFF tiles with a size of 256 x 256 pixels (1 km²).

| Product | Year | Coordinate system | Spatial resolution |
|---|---|---|---|
| SWISSIMAGE 10 cm | 2017 - current | CH1903+/MN95 (EPSG:2056) | 0.10 m (σ ± 0.15 m) - 0.25 m |
| SWISSIMAGE 25 cm | 2005 - 2016 | MN03 (2005 - 2007) and MN95 (since 2008) | 0.25 m (σ ± 0.25 m) - 0.50 m (σ ± 3.00 - 5.00 m) |
| SWISSIMAGE 50 cm | 1998 - 2004 | MN03 | 0.50 m (σ ± 0.50 m) |

Table 1: Characteristics of the SWISSIMAGE products.

Several SWISSIMAGE products exist, produced with different instrumentation (Table 1). SWISSIMAGE mosaics are built and published yearly. The year of a mosaic corresponds to the last year of dataset publication, and the most recent available orthophoto datasets are used to complete the mosaic. For example, the 2020 SWISSIMAGE mosaic is a combination of images acquired in 2020, 2019, and 2018. The 1998 release corresponds to a year of transition from black and white images (SWISSIMAGE HIST) to RGB images. For this study, only RGB data from 1999 to 2021 were considered.
Figure 2: Acquisition footprints of SWISSIMAGE aerial orthophotos for the years 2016 to 2021. The SWISSIMAGE Journey mosaic in the background is the 2020 release.
The acquisition footprints of the yearly acquired orthophotos were used as AoIs to perform MES detection through time. Over the years, the footprints may spatially overlap (Fig. 2).
Since 2017, the geometry of the acquisition footprints has been quasi-constant, dividing Switzerland into three more or less equal areas and ensuring that the orthophotos are updated every three years. Before 2017, the acquisition footprints were not systematic and did not guarantee a periodic update of the orthophotos. The acquisition footprints may also not be spatially contiguous.
Figure 3: Illustration of the combination of SWISSIMAGE images and FSO images in the 2007 SWISSIMAGE mosaic. (a) Overview of the 2007 SWISSIMAGE mosaic. The red polygon corresponds to the provided SWISSIMAGE acquisition footprint for 2007. The orange polygon corresponds to the surface covered by the new SWISSIMAGE images for 2007. The remaining area of the red polygon corresponds to the FSO image dataset acquired in 2007. The black box indicates the location of panel (b), and the white box indicates the location of panel (c). (b) Side-by-side comparison of the image composition in the 2006 and 2007 SWISSIMAGE mosaics. (c) Examples of detection polygons (white polygons) obtained by inference on the 2007 SWISSIMAGE dataset (red box) and on the 2007 FSO images (outlined by the black box).
The SWISSIMAGE Journey mosaics of 2005, 2006, and 2007 present a particularity: they are composed not only of 25 cm resolution SWISSIMAGE orthophotos but also of orthophotos acquired for the FSO. These are RGB TIFF orthophotos with a spatial resolution of 50 cm px⁻¹ (coordinate system: CH1903/LV03 (EPSG:21781)) that have been integrated into the SWISSIMAGE Journey products. However, these images were discarded from our dataset (by modifying the footprint shape) because they caused issues in the automatic MES detection, producing oddly segmented detection shapes (Fig. 3). This is probably due to the different stretching of pixel colours between the datasets.
It also has to be noted that about 88 tiles (at zoom level 16) are currently missing from the 2020 SWISSIMAGE dataset.

2.2 Image fetching

Pre-rendered SWISSIMAGE tiles (256 x 256 px, 1 km²) are downloaded from the Web Map Tile Service (WMTS) wmts.geo.admin.ch via an XYZ connector. Tiles are served on a cartesian coordinate grid using the Web Mercator Quad projection and the EPSG:3857 coordinate reference system. The position of a tile on the grid is defined by its x and y coordinates, and the pixel resolution of the image is defined by z, its zoom level. Each zoom level step changes the resolution by a factor of 2 (Fig. 4): a zoom level of 17 corresponds to a resolution of 0.8 m px⁻¹ and a zoom level of 16 to a resolution of 1.6 m px⁻¹.
Figure 4: Examples of tile geometries at zoom level 16 (z16, black polygons) and at zoom level 17 (z17, blue polygons). The number of tiles for each zoom level is indicated in square brackets. Only tiles intersecting the swissTLM3D labels (tlm-hr-trn-topo, yellow polygons) are selected for model training.
Note that in the previous project carried out by Reichel and Hamel (2021) 7, the tiling method was slightly different from the one adopted here: tiles of custom size and resolution were built. A sensitivity analysis of these two parameters led to the choice of tiles with a size of about 500 m and a pixel resolution of about 1 m (beyond these values, the performance was not significantly improved).

2.3 Ground truth

The MES labels originate from the Swiss Topographic Landscape Model 3D (swissTLM3D) produced by swisstopo. swissTLM3D is a large-scale topographic landscape model of Switzerland, including manually drawn and georeferenced vectors of objects of interest at high resolution, among which are MES features.
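The XYZ addressing described in Section 2.2 follows the standard slippy-map convention, which can be sketched as follows; `lonlat_to_tile` and `tile_bounds` are hypothetical helpers, not part of the object-detector code:

```python
import math

def lonlat_to_tile(lon, lat, z):
    """Web Mercator (slippy-map) tile indices for a WGS84 point at zoom z."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_bounds(x, y, z):
    """(lon_min, lat_min, lon_max, lat_max) of tile (x, y) at zoom z."""
    n = 2 ** z
    def lon(xi):
        return xi / n * 360.0 - 180.0
    def lat(yi):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yi / n))))
    # y grows southwards, so y + 1 gives the southern edge
    return lon(x), lat(y + 1), lon(x + 1), lat(y)
```

A fetcher would then build the request URL from `z/x/y` for each tile intersecting the AoI.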
Domain experts from swisstopo have carried out extensive work to review the labeled MES and to synchronise them with the 2020 SWISSIMAGE mosaic, improving the quality of the labeled dataset. A total of 266 labels are available. The mapped MES reveal the diversity of MES characteristics, such as the presence or absence of buildings/infrastructure, trucks, water ponds, and vegetation (Fig. 5).
Figure 5: Examples of MES mapped in swissTLM3D and synchronised with the 2020 SWISSIMAGE mosaic.
These labels are used as the ground truth (GT), i.e. the reference dataset indicating the presence of a MES in an image. The GT is used both as input to train the model to detect MES and to evaluate the model performance.

3. Automatic detection methodology

3.1 Deep learning algorithm for object detection

Training and inference detection of potential MES in SWISSIMAGE were performed with the object-detector framework. The project is based on the open source detectron2 framework 8, implemented with PyTorch by the Facebook Artificial Intelligence Research (FAIR) group. Instance segmentation (delineation of objects) was performed with a Mask R-CNN deep learning algorithm 9, a region-based convolutional neural network (CNN) with a pre-trained ResNet-50 backbone (a 50-layer deep residual network). Images were annotated with custom COCO objects based on the labels (class 'Quarry'). The model is trained with this dataset to later perform inference detection on images. If an object is detected by the algorithm, a pixel mask is produced and a confidence score (0 to 1) is attributed to the detection (Fig. 6).
Figure 6: Example of a detection mask. The pink rectangle corresponds to the bounding box of the object; the object is segmented by the pink polygon, associated with the detection class ('Quarry') and a confidence score.
The object detector framework converts detection masks to georeferenced polygons that can be used in GIS software. The implementation of the Ramer-Douglas-Peucker ( RDP ) algorithm allows simplifying the derived polygons by discarding non-essential points based on a smoothing parameter. This considerably reduces the amount of data to be stored and prevents potential memory saturation when deriving detection polygons over large areas, as is the case in this study.","title":"3.1 Deep learning algorithm for object detection"},{"location":"PROJ-DQRY-TM/#32-model-training","text":"Orthophotos from the 2020 SWISSIMAGE mosaic, for which the GT has been defined, were chosen for model training. Tiles intersecting labels were selected and split randomly into three datasets: the training dataset (70%), the validation dataset (15%), and the test dataset (15%). The addition of empty tiles (no annotation), confronting the model with landscapes that do not contain the target object, was tested ( Appendix A.1 ) but did not improve the model performance enough to be adopted. Figure 7: Training curves obtained at zoom level 16 on the 2020 SWISSIMAGE mosaic. The curves were obtained for the trained model 'replicate 3'. (a) Learning rate as a function of iteration. The step was defined every 500 iterations. The initial learning rate was 5.0 x 10 -3 with a weight and bias decay of 1.0 x 10 -4 . (b) Total loss as a function of iteration. Raw measurements (light red) and the smoothed curve (0.6 factor, solid red) are superposed. (c) Validation loss as a function of iteration. Raw measurements (light red) and the smoothed curve (0.6 factor, solid red) are superposed. The vertical dashed black lines indicate the iteration minimising the validation loss curve, i.e. 3000. Models were trained with two images per batch ( Appendix A.2 ), a learning rate of 5 x 10 -3 , and a learning rate decay of 1 x 10 -4 every 500 steps (Fig. 7 (a)).
For the given model, parameters, and a zoom level of 16 ( Section 3.3.3 ), the training is performed over 7000 iterations and lasts about 1 hour on a 16 GiB GPU (NVIDIA Tesla T4) machine compatible with CUDA . The total loss curves (training and validation) decrease until reaching a quasi-steady state around 6000 iterations (Fig. 7 (b)). The optimal detection model corresponds to the one minimising the validation loss curve. This minimum is reached between 2000 and 3000 iterations (Fig. 7 (c)).","title":"3.2 Model training"},{"location":"PROJ-DQRY-TM/#33-metrics","text":"The model performance and detection reliability were assessed by comparing the results to the GT. A detection made by the model can be either (1) a True Positive (TP), i.e. the detection is real (spatially intersecting the GT); (2) a False Positive (FP), i.e. the detection is not real (not spatially intersecting the GT); or (3) a False Negative (FN), i.e. the labeled object is not detected by the algorithm (Fig. 8). Tagging the detections (Fig. 9(a)) allows calculating several metrics (Fig. 9(b)) such as: Figure 8: Examples of different detection cases. The label is represented with a yellow polygon and the detection with a red polygon. (a) True Positive (TP) detection intersecting the GT, (b) a potential True Positive (TP?) detection with no GT, (c) False Negative (FN) case with no detection while GT exists, (d) False Positive (FP) detection of an object that is not a MES. the recall , quantifying the proportion of labeled objects detected by the model: \[recall = \frac{TP}{(TP + FN)}\] the precision , quantifying the proportion of TP among all the detections: \[precision = \frac{TP}{(TP + FP)}\] the f1-score , the harmonic average of the precision and the recall: \[f1 = 2 \times \frac{recall \times precision}{recall + precision}\] Figure 9: Evaluation of the trained model performance obtained at zoom level 16 for the trained model 'replicate 3' (Table 2).
(a) Number of TP (blue), FN (red), and FP (green) as a function of the detection score threshold for the validation dataset. (b) Metrics values: precision (blue), recall (red), and f1-score (green) as a function of the detection score threshold for the validation dataset. The maximum f1-score value is 82%.","title":"3.3 Metrics"},{"location":"PROJ-DQRY-TM/#4-automatic-detection-model-analysis","text":"","title":"4. Automatic detection model analysis"},{"location":"PROJ-DQRY-TM/#41-model-performance-and-replicability","text":"Trained models reached f1-scores of about 80% with a standard deviation of 2% (Table 2). The performances are similar to those of the model trained by Reichel and Hamel (2021) 7 . model precision recall f1 replicate 1 0.84 0.79 0.82 replicate 2 0.77 0.76 0.76 replicate 3 0.83 0.81 0.82 replicate 4 0.89 0.77 0.82 replicate 5 0.78 0.82 0.80 Table 2: Metrics values computed for the validation dataset for trained model replicates with the 2020 SWISSIMAGE mosaic at zoom level 16. Some variability is expected, as the deep learning algorithm displays random behavior, but it was assumed to be negligible. However, the observed model variability is large enough to affect the final results, which may change slightly from one trained model to another despite identical input parameters (Fig. 10). Figure 10: Detection polygons obtained for the different trained model replicates (Table 2), highlighting the variability of the results. The labels correspond to orange polygons. The numbers in square brackets correspond to the number of polygons. The inference detections have been performed on a subset of 2000 tiles of the 2020 SWISSIMAGE at zoom level 16. Detections have been filtered according to the parameters defined in Section 5.1. To reduce the variability of the trained models, the random seeds of both detectron2 and Python were fixed. Neither of these attempts was successful, and the variability remains.
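A typical seed-fixing attempt looks like the sketch below; it is a best-effort helper, and, as observed in the project, this kind of seeding does not make detectron2 training fully deterministic (some GPU kernels remain nondeterministic):

```python
import os
import random

def fix_seeds(seed: int = 42) -> None:
    """Best-effort seed fixing. Note: this did NOT remove the replicate
    variability in practice; some CUDA operations stay nondeterministic."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # third-party RNGs, seeded only when the libraries are available
        import numpy as np
        np.random.seed(seed)
        import torch
        torch.manual_seed(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

fix_seeds(42)
first = random.random()
fix_seeds(42)
print(random.random() == first)  # Python's RNG is reproducible; GPU ops may not be
```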
The nondeterministic behavior of detectron2 has been recognised ( issue 1 , issue 2 ), but no suitable solution has been provided yet. Further investigation of the model performance and consistency should be performed in the future. To mitigate the variability of the results between model replicates, we could consider combining the results of several model replicates in the future to remove FP while preserving TP and potential TP detections. The choice and number of models used should be evaluated. This method is tedious as it requires inference detection from several models, which can be time-consuming and computationally intensive.","title":"4.1. Model performance and replicability"},{"location":"PROJ-DQRY-TM/#42-sensitivity-to-the-zoom-level","text":"Image resolution depends on the zoom level ( Section 2.2 ). To select the most suitable zoom level for MES detection, we performed a sensitivity analysis of the trained model performance. Increasing the zoom level increases the metrics values following a globally linear trend (Fig. 11). Figure 11: Metrics values (precision, recall and f1) as a function of zoom level for the validation dataset. The results of the replicates performed at each zoom level are included (Table A1). Models trained at a higher zoom level performed better. However, a higher zoom level implies smaller tiles and thus a larger number of tiles to cover the AoI. For a typical AoI, i.e. up to a third of Switzerland, this can lead to a large number of tiles to be stored and processed, leading to potential RAM and/or disk space saturation. For the 2019 AoI, 89'290 tiles are required at zoom level 16 while 354'867 tiles are required at zoom level 17, taking respectively 3 hours and 11 hours to process on a 30 GiB RAM machine with a 16 GiB GPU. Visual comparison of inference detections reveals no significant improvement in object detection quality from zoom level 16 to zoom level 17.
Both zoom levels present a similar proportion of detections intersecting labels (82% and 79% for zoom level 16 and zoom level 17 respectively). On the other hand, the quality of object detection at zoom level 15 was degraded. Indeed, detection scores were lower, with only a few tens of detections scoring above 0.95 (compared to about 400 at zoom level 16) and only about 64% of detections intersecting labels.","title":"4.2 Sensitivity to the zoom level"},{"location":"PROJ-DQRY-TM/#43-model-choice","text":"Based on the tests performed, we selected the 'replicate 3' model, trained at zoom level 16 (Tables 2 and A1), to perform inference detection. Models trained at zoom level 16 (1.6 m px -1 pixel resolution) have shown satisfying results in accurately detecting MES contours and limiting the number of FP with high detection scores (Fig. 11). This zoom level represents a good trade-off between result reliability (f1-score between 76% and 82% on the validation dataset) and computational resources. Among all the replicates performed at zoom level 16, we selected the trained model 'replicate 3' (Table 2) because it combines the highest metrics values (on the validation dataset as well as the train and test datasets), close precision and recall values, and a rather low number of low-score detections.","title":"4.3 Model choice"},{"location":"PROJ-DQRY-TM/#5-automatic-detection-of-mes","text":"","title":"5. Automatic detection of MES"},{"location":"PROJ-DQRY-TM/#51-detection-post-processing","text":"Detection by inference was performed over AoIs with a threshold detection score of 0.3 (Fig. 12). This low score threshold results in a large number of detections. Several detections may overlap, potentially segmenting a single object. In addition, a detection might be split across multiple tiles. To improve the pertinence and the aesthetics of the raw detection polygons, a post-processing procedure was applied. First, a large proportion of FP occurred in mountainous areas (rock outcrops and snow, Fig.
12(a)). We assumed MES are not present (or at least sparse) above a given altitude. An elevation filter was applied using a digital elevation model of Switzerland (about 25 m px -1 ) derived from the SRTM instrument ( USGS - SRTM ). The maximum elevation of the labeled MES is about 1100 m. Second, detection aggregation was applied: - polygons were clustered ( K-means ) according to their centroid position. The method involves setting a predefined number k of clusters. Manual tests performed by Reichel and Hamel (2021) 7 concluded that k should be set to the number of detections divided by three. The highest detection score was assigned to the clustered detection. This method preserves the final integrity of the detection polygons by retaining detections that may have a low confidence score but belong to a cluster with a higher confidence score, improving the final segmentation of the detected object. The threshold score must be kept relatively low ( i.e. 0.3) when performing the detection to prevent removing too many polygons that could potentially be part of the detected object. We acknowledge that determining the optimal number of clusters with clustering validation indices rather than manual adjustment would be more robust. In addition, exploring other clustering methods based on local density, such as DBSCAN , can be considered in the future. - score filtering was applied. - spatially close polygons were assumed to belong to the same MES and were merged according to a distance threshold. The averaged score of the merged detection polygons was ultimately computed. Finally, we assumed that a MES covers a minimal area. Detections with an area smaller than a given threshold were filtered out. The minimum MES area in the GT is 2270 m 2 . Figure 12: MES detection filtering. (a) Overview of the automatic detection of MES obtained with 2020 SWISSIMAGE at zoom level 16.
Transparent red polygons (with associated confidence scores in white) correspond to the raw object detection output, and the red line polygons (with associated confidence scores in red) correspond to the final filtered detections. The black box outlines the location of the zooms in panels (b) and (c). Note the large number of detections in the mountains (right area of the image). (b) Zoom on several raw detection polygons of a single object with their respective confidence scores. (c) Zoom on the filtered detection polygon of a single object with the resulting score. The sensitivity of the detections to these filters was investigated (Table 3). The quantitative evaluation of the relevance of a filter combination is tricky, as the detections are obtained by inference and the GT provided by swissTLM3D constitutes only an incomplete portion of the MES in Switzerland (2020). As an indication, we computed the number of spatial intersections between the ground truth and the detections obtained with the 2020 SWISSIMAGE mosaic. Filter combination number 3 was adopted, allowing the detection of about 82% of the GT with a relatively limited number of FP detections compared to filter combinations 1 and 2 (from visual inspection). filters combination score threshold area threshold (m 2 ) elevation threshold (m) distance threshold (m) number of detections label detection (%) 1 0.95 2000 1100 10 1745 85.1 2 0.95 2000 1200 10 1862 86.6 3 0.95 5000 1200 10 1347 82.1 4 0.96 2000 1100 10 1331 81.3 5 0.96 2000 1200 8 1445 78.7 6 0.96 5000 1200 10 1004 74.3 Table 3: Threshold values of the filtering parameters with their respective number of detections and intersection proportion with swissTLM3D labels. The detections have been obtained for the 2020 SWISSIMAGE mosaic. We acknowledge that for the selected filter combination, the area threshold value is higher than the smallest area value of the GT polygons. However, reducing the area value significantly increases the presence of FP.
Thirteen labels display an area below 5000 m 2 .","title":"5.1 Detection post-processing"},{"location":"PROJ-DQRY-TM/#52-inference-detections","text":"The trained model was used to perform inference detection on SWISSIMAGE orthophotos from 1999 to 2021. The automatic detection model shows good capabilities to detect MES in orthophotos from different years (Fig. 13), despite being trained on the 2020 SWISSIMAGE mosaic. The model also demonstrates the ability to detect potential MES that have not been mapped yet but are strong candidates. However, the model misses some labeled or potential MES (FN, Fig. 8). Moreover, when the model processes FSO images, which have a different colour stretching, it fails to correctly detect potential MES (Fig. 3). This reveals that images must have characteristics close to those of the training dataset for a deep learning model to give optimal results. Figure 13: Examples of detected objects segmented by polygons in orthophotos from different years. The yellow polygon in the year 2020 panel of object ID 3761 corresponds to the label. The other coloured polygons correspond to the algorithm's detections. We also acknowledge that a significant number of FP detections can still be observed in our filtered detection dataset (Figs. 8 and 14). The main sources of FP are the presence of large rock outcrops, mountainous areas without vegetation, snow, river sand beds, brownish-coloured fields, or construction areas. MES present a large variety of features (buildings, water ponds, trucks, vegetation) (Fig. 5) which can be a source of confusion for the algorithm, and sometimes even for the human eye. Therefore, the robustness of the GT is crucial for reliable detection. The algorithm's results should be interpreted carefully. Figure 14: Examples of FP detections. (a) Snow patches (2019); (b) River sand beds and gullies (2019); (c) Brownish field (2020); (d) Vineyards (2005); (e) Airport tarmac (2020); (f) Construction site (2008).
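As a simplified sketch of the threshold filters of Section 5.1 (filter combination 3 of Table 3), omitting the clustering and merging steps; the detection dictionary structure below is hypothetical, not the project's actual data model:

```python
def filter_detections(detections, score_min=0.95, elev_max=1200.0, area_min=5000.0):
    """Keep detections scoring above `score_min`, located below `elev_max`
    metres, and covering more than `area_min` m2 (filter combination 3).
    `detections` is assumed to be a list of dicts with 'score', 'elevation'
    and 'area' keys (hypothetical structure for illustration)."""
    return [d for d in detections
            if d["score"] >= score_min
            and d["elevation"] <= elev_max
            and d["area"] >= area_min]

raw = [
    {"score": 0.97, "elevation": 450.0, "area": 12000.0},   # kept
    {"score": 0.97, "elevation": 2100.0, "area": 12000.0},  # mountain FP, dropped
    {"score": 0.40, "elevation": 450.0, "area": 12000.0},   # low score, dropped
    {"score": 0.97, "elevation": 450.0, "area": 800.0},     # too small, dropped
]
print(len(filter_detections(raw)))  # -> 1
```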
The detections produced by the algorithm are potential MES, and the final results must be reviewed by experts in the field to discard remaining FP detections and correct FN before any processing or interpretation.","title":"5.2 Inference detections"},{"location":"PROJ-DQRY-TM/#6-observation-of-mes-evolution","text":"","title":"6. Observation of MES evolution"},{"location":"PROJ-DQRY-TM/#61-object-tracking-strategy","text":"Switzerland has been covered by the RGB SWISSIMAGE product for more than 20 years (1999 to present), allowing changes to be detected (Fig. 13). Figure 15: Strategy for MES tracking over time by assigning IDs to detections. Spatially intersecting polygons share the same ID, allowing the MES to be tracked in a multi-year dataset. We assumed that detection polygons that overlap from one year to another describe a single object (Fig. 15). Overlapping detections and unique detections (which do not overlap with polygons from other years) in the multi-year dataset were assigned a unique object identifier (ID). A new object ID in the timeline indicates: - the first occurrence of the object detected in the dataset of the first year available for the area (it does not mean that the object was not present before), - the creation of a potential new MES. The disappearance of an object ID indicates its potential refill. Therefore, the chronology of MES (creation, evolution, and filling) can be constrained.","title":"6.1 Object tracking strategy"},{"location":"PROJ-DQRY-TM/#62-evolution-of-mes-over-years","text":"Figures 13 and 16 illustrate the ability of the trained model to detect and track a single object in a multi-year dataset. The detection over the years appears reliable and consistent, although an object detection may be absent from a given year's dataset ( e.g. due to shadows or colour changes in the surroundings). Remember that the image coverage of a given area is not renewed every year.
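The ID-assignment strategy of Section 6.1 can be sketched with axis-aligned bounding boxes standing in for the detection polygons; this is a hypothetical simplification (the project works with full polygons):

```python
def boxes_overlap(a, b):
    """Axis-aligned bounding boxes given as (xmin, ymin, xmax, ymax)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def assign_ids(detections_by_year):
    """Assign a shared object ID to detections that spatially overlap across
    years; a detection with no overlap gets a fresh ID (illustrative sketch)."""
    next_id, tracked, ids = 0, [], {}  # tracked: list of (box, object_id)
    for year in sorted(detections_by_year):
        for i, box in enumerate(detections_by_year[year]):
            match = next((oid for tb, oid in tracked if boxes_overlap(box, tb)), None)
            if match is None:  # first occurrence or potential new MES
                match, next_id = next_id, next_id + 1
            tracked.append((box, match))
            ids[(year, i)] = match
    return ids

dets = {
    2019: [(0, 0, 10, 10), (50, 50, 60, 60)],
    2020: [(5, 5, 15, 15)],  # overlaps the first 2019 box -> same ID
}
ids = assign_ids(dets)
print(ids[(2019, 0)] == ids[(2020, 0)])  # -> True
```

A disappearing ID (no overlapping detection in later years) would then hint at a potential refill, as described above.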
Characteristics of the potential MES, such as surface evolution (extension or retreat), can be quantified. For example, the surfaces of object IDs 239 and 3861 have more than doubled in about 20 years. Tracking object IDs along with image visualisation allows observation of the opening and closing of potential MES, such as object IDs 31, 44, and 229. Figure 16: Detection area (m 2 ) as a function of years for several object IDs. Figure 13 provides the visualisation of the selected object IDs. Each point corresponds to an occurrence of the object ID in the corresponding year's dataset. The presence of an object in several yearly datasets strengthens the likelihood that the detected object is an actual MES. On the other hand, an object detected in only one year is more likely a FP detection.","title":"6.2 Evolution of MES over years"},{"location":"PROJ-DQRY-TM/#7-conclusion-and-perspectives","text":"The project demonstrated the ability to automatically, quickly (a matter of hours for one AoI), and reliably detect potential MES in orthophotos of Switzerland with an automatic detection algorithm (deep learning). The selected trained model achieved an f1-score of 82% on the validation dataset. The final detection polygons accurately delineate the potential MES. We can track a single MES through multiple years, emphasising the robustness of the method to detect objects in multi-year datasets despite the detection model being trained on a single dataset (the 2020 SWISSIMAGE mosaic). However, image colour stretching different from that used to train the model can significantly affect the model's ability to provide reliable detections, as was the case with the FSO images. Although the performance of the trained model is satisfactory, FP and FN are present in the datasets. They are mainly due to the algorithm confusing MES with rock outcrops, river sandbeds or construction sites.
A manual verification of the relevance of the detections by experts in the field is necessary before processing and interpreting the data. Revising all the detections from 1999 to 2021 is a time-consuming effort but is necessary to guarantee detection reliability. Despite the required manual checks, the provided framework and detection results constitute a valuable contribution that can greatly assist the inventory and the observation of MES evolution in Switzerland. It provides state-wide detection in a matter of hours, which is a considerable time-saving compared with manual mapping. It also enables MES detection with a standardised method, independent of the information or method adopted by the cantons. Further model improvements could be considered, such as increasing the metrics by improving the GT quality, improving the model learning strategy, mitigating the model learning variability, or testing supervised clustering methods to find relevant detections. This work can be used to compute statistics for the long-term study of MES in Switzerland and to support better management of resources and land use in the future. MES detection can be combined with other data, such as the geologic layer, to identify the minerals/rocks exploited, and with a high-resolution DEM ( swissALTI3D ) to infer elevation changes and observe the excavation or filling of MES 5 . So far, only RGB SWISSIMAGE orthophotos from 1999 to 2021 have been processed. Prior to 1999, black and white orthophotos exist, but the model trained on RGB images cannot be reliably applied to black and white images. Image colourisation tests (with the help of a deep learning algorithm [@farella_colour_2022]) were performed and provided encouraging detection results. This avenue needs to be explored. Finally, automatic detection of MES is rare 1 3 , and most studies perform manual mapping. Therefore, the framework could be extended to other datasets and/or other countries to provide a valuable asset to the community.
A global mapping of MES has been completed with over 21'000 polygons 1 and can be used as a GT database to train an automatic detection model.","title":"7. Conclusion and perspectives"},{"location":"PROJ-DQRY-TM/#code-availability","text":"The code is stored and available on the STDL's GitHub repositories: proj-dqry : mineral extraction site framework object-detector : object detector framework","title":"Code availability"},{"location":"PROJ-DQRY-TM/#acknowledgements","text":"This project was made possible thanks to a close collaboration between the STDL team and swisstopo . In particular, the STDL team acknowledges the key contribution of Thomas Galfetti ( swisstopo ). This project has been funded by the \"Stratégie Suisse pour la Géoinformation\".","title":"Acknowledgements"},{"location":"PROJ-DQRY-TM/#appendix","text":"","title":"Appendix"},{"location":"PROJ-DQRY-TM/#a1-influence-of-empty-tiles-addition-to-model-performance","text":"By selecting only tiles intersecting labels, the detection model is mainly confronted with the presence of the targeted object. The addition of non-label-intersecting tiles, i.e. empty tiles, provides landscape diversity that might help improve the object detection performance. To evaluate the influence of adding empty tiles on model performance, empty tiles (not intersecting labels) were chosen randomly within the Switzerland boundaries and added to the tile dataset used for model training (Fig. A1). Empty tiles were added (1) to the whole dataset, split as for the initial dataset (training: 70%, test: 15%, and validation: 15%), and (2) only to the training dataset. A visual inspection must be performed to ensure that no unlabeled MES is present in the selected images, which would disturb the algorithm's learning. Figure A1: View of tiles (black) intersecting labels (yellow) and randomly selected empty tiles (red) in Switzerland. This case corresponds to the addition of 35% empty tiles.
Figure A1 reveals that adding empty tiles to the dataset does not significantly influence the metrics values. The numbers of TP, FP, and FN do not show significant variation. However, when performing an inference detection test on a subset of 2000 tiles of an AoI, it appears that the number of raw (unfiltered) detections decreases as the number of empty tiles increases. Nevertheless, visual inspection of the final detections after applying the filters does not show significant improvement compared to a model trained without empty tiles. Figure A1: Influence of the addition of empty tiles (relative to the number of tiles intersecting labels) on trained model performance for zoom levels 16 and 17, with (a) the f1-score as a function of the percentage of added empty tiles and (b) the number of detections, normalised by the number of tiles sampled (2000), as a function of the percentage of added empty tiles. Empty tiles have been added only to the train dataset for the 5% and 30% cases and to all datasets for the 9%, 35%, 70%, and 140% cases. A possible solution to improve the results could be to specifically select tiles for which FP occurred and include them in the training dataset as empty tiles. This way, the model could be trained with relevant confounding features such as snow patches, river sandbeds, or gullies not labeled as GT.","title":"A.1 Influence of empty tiles addition to model performance"},{"location":"PROJ-DQRY-TM/#a2-sensitivity-of-the-model-to-the-number-of-images-per-batch","text":"During the model learning phase, the trained model is updated after each batch of samples is processed. Adding more samples, i.e. in our case images, to the batch can influence the model learning capacity. We investigated the role of adding more images per batch for datasets with and without the addition of a portion of empty tiles to the learning dataset. Adding more images per batch speeds up the model learning (Table A1), and the minimum of the loss curve is reached after a smaller number of iterations.
Figure A2: Metrics (precision, recall and f1-score) evolution with the number of images per batch during the model training. Results have been obtained on a dataset without the addition of empty tiles (red) and with the addition of 23% of empty tiles to the training dataset. Figure A2 reveals that the metrics values remain roughly constant when adding extra images to the batch, in all cases (with or without empty tiles). A potential effect of adding more images to the batch is a reduction of the metrics variability between trained model replicates, as the range of metrics values is smaller for 8 images per batch than for 2 images per batch. However, this observation has to be taken carefully, as fewer replicates have been performed with 8 images per batch than with 2 or 4 images per batch. Further investigation would provide stronger insights into this effect.","title":"A.2 Sensitivity of the model to the number of images per batch"},{"location":"PROJ-DQRY-TM/#a3-evaluation-of-trained-models","text":"Table A1 sums up the metrics values obtained for all the configurations tested in the project.
zoom level model empty tiles (%) images per batch optimum iteration precision recall f1 15 replicate 1 0 2 1000 0.727 0.810 0.766 16 replicate 1 0 2 2000 0.842 0.793 0.817 16 replicate 2 0 2 2000 0.767 0.760 0.763 16 replicate 3 0 2 3000 0.831 0.810 0.820 16 replicate 4 0 2 2000 0.886 0.769 0.826 16 replicate 5 0 2 2000 0.780 0.818 0.798 16 replicate 6 0 2 3000 0.781 0.826 0.803 16 replicate 7 0 4 1000 0.748 0.860 0.800 16 replicate 8 0 4 1000 0.779 0.785 0.782 16 replicate 9 0 8 1500 0.800 0.793 0.797 16 replicate 10 0 4 1000 0.796 0.744 0.769 16 replicate 11 0 8 1000 0.802 0.769 0.785 16 ET-250_allDS_1 34.2 2 2000 0.723 0.770 0.746 16 ET-250_allDS_2 34.2 2 3000 0.748 0.803 0.775 16 ET-1000_allDS_1 73.8 2 6000 0.782 0.815 0.798 16 ET-1000_allDS_2 69.8 2 6000 0.786 0.767 0.776 16 ET-1000_allDS_3 70.9 2 6000 0.777 0.810 0.793 16 ET-1000_allDS_4 73.8 2 6000 0.768 0.807 0.787 16 ET-2000_allDS_1 143.2 2 6000 0.761 0.748 0.754 16 ET-80_trnDS_1 5.4 2 2000 0.814 0.793 0.803 16 ET-80_trnDS_2 5.4 2 2000 0.835 0.752 0.791 16 ET-80_trnDS_3 5.4 2 2000 0.764 0.802 0.782 16 ET-400_trnDS_1 29.5 2 6000 0.817 0.777 0.797 16 ET-400_trnDS_2 29.5 2 5000 0.848 0.785 0.815 16 ET-400_trnDS_3 29.5 2 4000 0.758 0.802 0.779 16 ET-400_trnDS_4 29.5 4 2000 0.798 0.818 0.808 16 ET-400_trnDS_5 29.5 4 1000 0.825 0.777 0.800 16 ET-1000_trnDS_1 0 2 4000 0.758 0.802 0.779 17 replicate 1 0 2 5000 0.819 0.853 0.835 17 replicate 1 0 2 5000 0.803 0.891 0.845 17 replicate 1 0 2 5000 0.872 0.813 0.841 17 ET-250_allDS_1 16.8 2 3000 0.801 0.794 0.797 17 ET-1000_allDS_1 72.2 2 7000 0.743 0.765 0.754 18 replicate 1 0 2 10000 0.864 0.855 0.859 Table A1: Metrics values computed for the validation dataset for all the trained models with the 2020 SWISSIMAGE Journey mosaic. Victor Maus, Stefan Giljum, Jakob Gutschlhofer, Dieison M. Da Silva, Michael Probst, Sidnei L. B. Gass, Sebastian Luckeneder, Mirko Lieber, and Ian McCallum. A global-scale data set of mining areas.
Scientific Data , 7(1):289, September 2020. URL: https://www.nature.com/articles/s41597-020-00624-w , doi:10.1038/s41597-020-00624-w . Vicenç Carabassa, Pau Montero, Marc Crespo, Joan-Cristian Padró, Xavier Pons, Jaume Balagué, Lluís Brotons, and Josep Maria Alcañiz. Unmanned aerial system protocol for quarry restoration and mineral extraction monitoring. Journal of Environmental Management , 270:110717, September 2020. URL: https://linkinghub.elsevier.com/retrieve/pii/S0301479720306496 , doi:10.1016/j.jenvman.2020.110717 . Chunsheng Wang, Lili Chang, Lingran Zhao, and Ruiqing Niu. Automatic Identification and Dynamic Monitoring of Open-Pit Mines Based on Improved Mask R-CNN and Transfer Learning. Remote Sensing , 12(21):3474, January 2020. URL: https://www.mdpi.com/2072-4292/12/21/3474 , doi:10.3390/rs12213474 . Haoteng Zhao, Yong Ma, Fu Chen, Jianbo Liu, Liyuan Jiang, Wutao Yao, and Jin Yang. Monitoring Quarry Area with Landsat Long Time-Series for Socioeconomic Study. Remote Sensing , 10(4):517, April 2018. URL: https://www.mdpi.com/2072-4292/10/4/517 , doi:10.3390/rs10040517 . Valentin Tertius Bickel and Andrea Manconi. Decadal Surface Changes and Displacements in Switzerland. Journal of Geovisualization and Spatial Analysis , 6(2):24, December 2022. URL: https://link.springer.com/10.1007/s41651-022-00119-9 , doi:10.1007/s41651-022-00119-9 . George P. Petropoulos, Panagiotis Partsinevelos, and Zinovia Mitraka. Change detection of surface mining activity and reclamation based on a machine learning approach of multi-temporal Landsat TM imagery. Geocarto International , 28(4):323–342, July 2013. URL: http://www.tandfonline.com/doi/abs/10.1080/10106049.2012.706648 , doi:10.1080/10106049.2012.706648 . Huriel Reichel and Nils Hamel. Automatic Detection of Quarries and the Lithology below them in Switzerland. 2022.
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. 2019. URL: https://github.com/facebookresearch/detectron2 . Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. January 2018. arXiv:1703.06870 [cs]. URL: http://arxiv.org/abs/1703.06870 , doi:10.48550/arXiv.1703.06870 .","title":"A.3 Evaluation of trained models"},{"location":"PROJ-DTRK/","text":"DIFFERENCE MODELS APPLIED ON LAND REGISTER ¶ Nils Hamel (UNIGE) - Huriel Reichel (swisstopo) Project scheduled in the STDL research roadmap - PROJ-DTRK September 2020 to November 2020 - Published on April 23, 2021 Abstract : Being able to track modifications in the evolution of geographical datasets is an important aspect of territory management, as a large amount of information can be extracted from difference models. Difference detection can also be a tool used to assess the evolution of a geographical model through time. In this research project, we apply difference detection to INTERLIS models of the official Swiss land registers in order to emphasize and follow their evolution and to demonstrate that changes in reference frames can be detected and assessed. Introduction ¶ Land register models are probably the most living of geographical models, as they are constantly updated to offer a rigorous and up-to-date view of the territory. The applied corrections are always the result of a complex process, involving different territory actors, until the decision is made to integrate them into the land register. In addition, land register models come with an additional constraint linked to political decisions.
Indeed, the land register models are the result of a political mission conducted under federal laws, making these models of high importance and requiring constant care. We show in this research project how the differences detection tool [1] of the STDL 4D framework can be used to emphasize and analyze these corrections along the time dimension. In addition to the constant updates of the models, changes in the reference frame can also lead to large-scale corrections of the land register models. These global corrections are then made even more complex by the federal laws that impose a high degree of correctness and accuracy. In the context of the introduction of the new reference frame DM.flex [2] for the Swiss land register, being able to assess the applied changes on the geographical model appears to be an important aspect. Indeed, changing the reference frame for the land register models is a long and complex technical process that can be error-prone. We also show in this research project how the difference detection algorithm can be helpful to assess and verify the performed corrections. Research Project Specifications ¶ In this research project, the difference detection algorithm implemented in the STDL 4D framework is applied on INTERLIS data containing the official land register models of different Swiss Cantons . As introduced, two main directions are considered for the difference detection algorithm : Demonstrating the ability to extract information in between land register models Demonstrating the ability of difference models to be used as an assessment tool Through the first direction, the difference detection algorithm is presented. Considering the difference models it allows computing, it is shown how such models are able to extract information in between the models in order to emphasize the ability to represent, and then, to verify the evolution of the land register models. 
The second direction focuses on demonstrating that difference models are a helpful representation of the large-scale corrections that can be applied to the land register during reference frame modification and how they can be used as a tool to assess the modifications and to help fulfil the complex task of the verification of the corrected models. Research Project Data ¶ For the first research direction, the land register models of the Thurgau Kanton are considered. They are selected in order to have a small temporal distance, allowing us to focus on a small number of well-defined differences : Thurgau Kanton , 2020-10-13 , INTERLIS Thurgau Kanton , 2020-10-17 , INTERLIS For the second direction, which focuses on more complex differences, the models of the Canton of Geneva land register are considered with a much larger temporal gap between them : Canton of Geneva , 2009-10 , INTERLIS Canton of Geneva , 2013-04 , INTERLIS Canton of Geneva , 2017-04 , INTERLIS Canton of Geneva , 2019-04 , INTERLIS Difference Models : A Temporal Derivative ¶ This first section focuses on short-term differences to show how difference models work and how they are able to represent the modifications extracted out of the two compared models. The following images give an illustration of the considered dataset, which are the land register models of Thurgau Kanton : Illustration of Thurgau Kanton INTERLIS models - Data : Kanton Thurgau The models are made of vector lines, well geo-referenced in the Swiss coordinate frame EPSG:2056 . The models are also made of different layers that are colored differently with the following correspondences : INTERLIS selected topics and tables colors - Official French and German designations These legends are used all along this research project. Considering two temporal versions of this geographical model, separated by a few days, one is able to extract difference models using the 4D framework algorithm. 
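Conceptually, the comparison step can be sketched as a set operation over geometric primitives. The following minimal Python sketch is an illustration only — the function name, the reduction of each model to hashable line segments, and the toy coordinates are assumptions for the example, not the actual multi-scale implementation of the 4D framework algorithm [1] :

```python
# Minimal sketch of a difference model between two temporal versions of a
# vector model. Each model is reduced to a set of line segments, each a
# tuple of two (E, N) vertices in EPSG:2056 coordinates.

def difference_model(model_a, model_b):
    """Classify each segment as 'unchanged' (rendered dark gray),
    'removed' (only in the older model) or 'added' (only in the newer,
    reference model)."""
    classes = {}
    for seg in model_a & model_b:
        classes[seg] = "unchanged"
    for seg in model_a - model_b:
        classes[seg] = "removed"
    for seg in model_b - model_a:
        classes[seg] = "added"
    return classes

# Toy example: one house footprint segment is added between the two dates.
old = {((2712000.0, 1268000.0), (2712010.0, 1268000.0))}
new = old | {((2712010.0, 1268000.0), (2712010.0, 1268008.0))}
diff = difference_model(old, new)
```

Because the classification depends only on set membership, the same three-way scheme scales from a single building up to a whole canton.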
As an example, one can consider this very specific view of the land register, focusing on a few houses : Close view of the Thurgau INTERLIS model in 2020-10-13 (left) and 2020-10-17 (right) - Data : Kanton Thurgau It is clear that most of the close view is identical for the two models, except for a couple of houses that were added to the land register model between these two temporal versions. By applying the difference detection algorithm, one is able to obtain a difference model comparing the two previous models. The following image gives an illustration of the obtained difference model, considering the most recent temporal version as reference : Difference model obtained comparing the two temporal versions - Data : Kanton Thurgau One can see how the difference algorithm is able to emphasize the differences and to represent them in a human-readable third model. The algorithm also displays the identical parts in dark gray to offer the context of the differences to the operator. Of course, at such a close view, difference detection can appear irrelevant, as one is clearly able to see that something changed on the selected example without any help. But difference models can be computed at any scale. Taking the example of the city of Amriswil : View of Amriswil model in 2020-10-13 (left) and 2020-10-17 (right) - Data : Kanton Thurgau It becomes more complicated to track down the differences that can appear between the two temporal versions. By computing their difference model , one is able to access a third model that eases the analysis of the evolution at the scale of the city itself, as illustrated on the following image : Difference model computed for the city of Amriswil - Data : Kanton Thurgau One can see how difference models can be used to track down modifications brought to the land register in a simple manner, while keeping the information of the unchanged elements between the two compared models. 
This demonstrates that the information that exists between models can be extracted and represented for further users or automated processes. In addition, such difference models can be computed at any scale, from small areas up to whole countries. Difference Models : An Assessment Tool ¶ In the previous section, the difference models were computed using two models separated by only a few days, containing only a small number of clear and simple modifications. This section focuses on detecting differences on larger models, separated by several years. In this case, the land register of the Canton of Geneva is considered : Illustration of the Geneva land register in 2017-04 (left) and 2019-04 (right) - Data : Canton of Geneva One can see that at such a scale, taking into account that the Canton of Geneva is one of the smallest in Switzerland, having a vision and a clear understanding of the modifications made between these two models is difficult by considering the two models separately. This is precisely where difference models can be useful to understand and analyze the evolution of the land register, along both the space and time dimensions. Large-Scale Analysis ¶ A first large-scale evaluation can be made on the overall models. A difference model can be computed considering the land register of Geneva in 2019 and 2017, as illustrated on the following image : Difference model on Geneva land register between 2019-04 and 2017-04 - Data : Canton of Geneva Two observations can already be made by looking at the difference model . In the first place, one can see that the amount of modifications brought to the land register in only two years is large. A large portion of the land register was subject to modifications or corrections, the unchanged parts being mostly limited to outside the populated areas. In the second place, one can observe large portions where differences seem to be accumulating over this period of time. 
Looking at them more closely leads to the conclusion that these zones were actually completely modified, as all elements are highlighted by the difference detection algorithm. The following image gives a closer view of such an area of differences accumulation : Focus on Carouge area of the 2019-04 and 2017-04 difference model - Data : Canton of Geneva Although the amount of modifications outside this specific zone is also high, it is clear that the pointed zone contains more of them. Looking at it more closely leads to the conclusion that everything changed. In order to understand these areas of differences accumulation, the land register experts of the Canton of Geneva ( SITG ) were questioned. They provided an explanation for these specific areas. Between 2017 and 2019 , these areas were subjected to a global correction in order to release the tension between the old reference frame LV03 [3] and the current one LV95 [4]. These corrections were made using the FINELTRA algorithm to shift the elements of the land register by amounts on the order of a few centimeters. The land register of Geneva provided the following illustration summarizing these reference frame corrections made between 2017 and 2019 on the Geneva territory : Reference frame corrections performed between 2017 and 2019 - Data : SITG Comparing this map from the land register with the computed model shows how difference detection can efficiently emphasize this type of correction, as the corrected zones on this previous image correspond to the difference accumulation areas on the computed difference model . Small-Scale Analysis ¶ One can also dive deep into the details of the difference models . As we saw in the large-scale analysis, two types of areas can be seen on the 2019-04-2017-04 difference model of Geneva : regular evolution with an accumulation of corrections, and areas on which global corrections were applied. 
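A short sketch makes clear why a reference frame correction of a few centimeters highlights nearly every element: any vertex displaced beyond the comparison tolerance counts as changed. The 1 cm tolerance and the function name below are assumptions for illustration, not the framework's actual comparison criterion :

```python
import math

# Hypothetical comparison tolerance in meters (assumed value, not the
# one used by the 4D framework).
TOLERANCE_M = 0.01

def vertex_changed(before, after, tol=TOLERANCE_M):
    """Return True when a vertex moved by more than tol meters between
    the two temporal versions (coordinates in EPSG:2056, meters)."""
    return math.hypot(after[0] - before[0], after[1] - before[1]) > tol

# A FINELTRA-like tension release of ~3 cm flags the vertex as changed,
# while an element left untouched by the correction remains unchanged.
corrected = vertex_changed((2500125.40, 1117532.80), (2500125.43, 1117532.80))
untouched = vertex_changed((2500125.40, 1117532.80), (2500125.40, 1117532.80))
```

Elements that were skipped by the global correction then stand out as the only gray parts of the difference model.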
The following images propose a close view of these two types of situation : Illustration of the two observed types of evolution of the land register - Data : Canton of Geneva On the left image above, one can observe the regular evolution of the land register, where modifications are brought to the model in order to follow the evolution of the territory. On the right image above, one can see a close view of an area subjected to a global correction (reference frame), leading to a difference model highlighting all the elements. Analyzing the right image above more closely leads the observer to conclude that not all the elements are actually highlighted by the difference detection algorithm. Indeed, some elements are rendered in gray on the difference model , indicating their lack of modification between the two compared times. The following image emphasizes the unchanged elements that can be observed : Unchanged elements in the land register after reference frame correction - Data : SITG These unchanged elements can be surprising, as they are found in an area that was subject to a global reference frame correction. This shows how difference models can be helpful to track down this type of event, in order to check whether these unchanged elements are expected or the result of a discrepancy in the land register evolution. Other examples can be found in this very same area of the city of Geneva. The following images give an illustration of two other close views where unchanged elements can be seen despite the reference frame correction : Unchanged elements in the land register after reference frame correction - Data : SITG On the left image above, one can observe that the unchanged elements are the railway tracks within the commune of Carouge . This is an interesting observation, as railway tracks can be considered as specific elements that can be subject to different legislation regarding the land register. 
But it is clear that railway tracks were not considered in the reference frame correction. On the right image above, one can see another example of unchanged elements that are more complicated to explain, as they are in the middle of other modified elements. This clearly demonstrates how difference models can be helpful for analyzing and assessing the evolution of the land register models. Such models are able to drive users or automated processes and lead them to focus on relevant aspects and to ask the right questions in the context of analyzing the evolution of the land register. Conclusion ¶ The presented difference models , computed based on two temporal versions of the land register and using the 4D framework algorithm, showed how differences can be emphasized for users and automated processes [1]. Difference models can be helpful to determine the amount and nature of changes that appear in the land register. Applying such an algorithm on the land register is especially relevant, as it is a living model that evolves jointly with the territory it describes. Two main applications can be considered using difference models applied on the land register. In the first place, the difference models can be used to assess and analyze the regular evolution of the territory. Indeed, updating the land register is not a simple task. Such modifications involve a whole chain of decisions and verifications, from surveyors to the highest land register authority, before being integrated into the model. Being able to assess and analyze the modifications in the land register through difference models could be an interesting strengthening of the overall process. The second application of difference models could be as an assessment tool for global corrections applied to the land register or parts of it. These modifications are often linked to the reference frame and its evolution. 
Being able to assess the corrections through the difference models could add a helpful tool in order to verify that the elements of the land register were correctly processed. In this direction, difference models could be used during the introduction of the DM.flex reference frame, both for analyzing its introduction and for demonstrating that difference models can be an interesting point of view. Reproduction Resources ¶ To reproduce the presented experiments, the STDL 4D framework has to be used and can be found here : STDL 4D framework (eratosthene-suite), STDL You can follow the instructions on the README to both compile and use the framework. Unfortunately, the used data are not currently public. In both cases, the land register INTERLIS datasets were provided to the STDL directly. You can contact both Thurgau Kanton and SITG : INTERLIS land register, Thurgau Kanton INTERLIS land register, SITG (Geneva) to query the data. In order to extract and convert the data from the INTERLIS models, the following code is used : INTERLIS to UV3 (dalai-suite), STDL/EPFL where the README gives all the information needed. For the 3D geographical coordinate conversion and height restoration, we used two STDL internal tools. You can contact the STDL to obtain the tools and support in this direction : ptolemee-suite : 3D coordinate conversion tool (EPSG:2056 to WGS84) height-from-geotiff : Restoring geographical heights using topographic GeoTIFF ( SRTM ) You can contact the STDL for any question regarding the reproduction of the presented results. References ¶ [1] Automatic Detection of Changes in the Environment, N. 
Hamel, STDL 2020 [2] DM.flex reference frame [3] LV03 Reference frame [4] LV95 Reference frame","title":"DIFFERENCE MODELS APPLIED ON LAND REGISTER"},{"location":"PROJ-GEPOOL/","text":"Swimming Pool Detection from Aerial Images over the Canton of Geneva ¶ Alessandro Cerioni (Canton of Geneva) - Adrian Meyer (FHNW) Proposed by the Canton of Geneva - PROJ-GEPOOL September 2020 to January 2021 - Published on May 18, 2021 Abstract : Object detection is one of the computer vision tasks which can benefit from Deep Learning methods. The STDL team managed to leverage state-of-the-art methods and already existing open datasets to first build a swimming pool detector, then to use it to potentially detect unregistered swimming pools over the Canton of Geneva. Despite the success of our approach, we will argue that domain expertise still remains key to post-process detections in order to tell objects which are subject to registration from those which aren't. 
Pairing semi-automatic Deep Learning methods with domain expertise turns out to pave the way to novel workflows allowing administrations to keep cadastral information up to date. Introduction \u00b6 The Canton of Geneva manages a register of swimming pools, counting - in principle - all and only those swimming pools that are in-ground or, at least, permanently fixed to the ground. The swimming pool register is part of a far more general cadastre, including several other classes of objects (cf. this page ). Typically, the swimming pool register is updated either by taking building/demolition permits into account, or by manually checking its multiple records (4000+ to date) against aerial images, which is quite a long and tedious task. Exploring the opportunity of leveraging Machine Learning to help domain experts in such an otherwise tedious task was one of the main motivations behind this study. As such, no prior requirements/expectations were set by the recipients. The study was autonomously conducted by the STDL team, using Open Source software and Open Data published by the Canton of Geneva. Domain experts were asked for feedback only at a later stage. In the following, details are provided regarding the various steps we followed. We refer the reader to this page for a thorough description of the generic STDL Object Detection Framework. Method \u00b6 Several steps are required to set the stage for object detection and eventually reach the goal of obtaining - ideally - even more than decent results. Despite the linear presentation that the reader will find here-below, multiple back-and-forths are actually required, especially through steps 2-4. 1. Data preparation \u00b6 As a very first step , one has to define the geographical region over which the study has to be conducted, the so-called " Area of Interest " (AoI).
In the case of this specific application, the AoI was chosen and obtained as the geometric subtraction between the following two polygons: the unary union of all the polygons of the Canton of Geneva's cadastral parcels dataset, published as Open Data by the SITG , cf. PARCELLES DE LA MENSURATION ; the polygon corresponding to Lake Geneva (" lac L\u00e9man " in French), included in the EMPRISE DU LAC LEMAN (Complet) open dataset, published by the SITG as well. The so-defined AoI covers both the known "ground-truth" labels and the regions over which hypothetical unknown objects are expected to be detected. The second step consists in downloading aerial images from a remote server, following an established tiling strategy. We adopted the so-called " Slippy Map " tiling scheme. Aerial images were fetched from a raster web service hosted by the SITG and powered by ESRI ArcGIS Server. More precisely, the following dataset was used: ORTHOPHOTOS AGGLO 2018 . According to our configuration, this second step produces a folder including one GeoTIFF image per tile, each image having a size of 256x256 pixels. In terms of resolution - or better, in terms of " Ground Sampling Distance " (GSD) - the combination of 256x256 pixel images and zoom level 18 Slippy Map Tiles yields a GSD of approximately 60 cm/pixel. The tests we performed at twice the resolution showed little gain in terms of predictive power, surely not enough to justify engaging 4x more resources (storage, CPU/GPU, ...).
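The tile indexing and the quoted GSD follow from the standard Slippy Map conventions. As an illustrative sketch (not the project's actual code), the tile indices containing a WGS84 point and the nominal equatorial GSD can be computed as follows:

```python
import math

TILE_PX = 256                 # pixels per tile side, as in this project
EQUATOR_M = 40_075_016.686    # WGS84 equatorial circumference, in metres

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Slippy Map (x, y) indices of the tile containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def nominal_gsd(zoom: int, tile_px: int = TILE_PX) -> float:
    """Nominal ground sampling distance at the equator, in m/pixel."""
    return EQUATOR_M / (2 ** zoom * tile_px)
```

At zoom level 18, `nominal_gsd(18)` yields ~0.6 m/pixel; each additional zoom level halves the GSD, hence the 4x increase in storage and computation mentioned above.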
The third step amounts to splitting the tiles covering the AoI (let's label them "AoI tiles") twice: first, tiles are partitioned into two subsets, according to whether they include ground-truth labels ( GT tiles) or not ( oth tiles): \(\mbox{AoI tiles} = (\mbox{GT tiles}) \cup (\mbox{oth tiles}),\; \mbox{with}\; (\mbox{GT tiles}) \cap (\mbox{oth tiles}) = \emptyset\) Then, ground-truth tiles are partitioned into three other subsets, namely the training ( trn ), validation ( val ) and test ( tst ) datasets: \(\mbox{GT tiles} = (\mbox{trn tiles}) \cup (\mbox{val tiles}) \cup (\mbox{tst tiles})\) with \(A \neq B \Rightarrow A \cap B = \emptyset, \quad \forall A, B \in \{\mbox{trn tiles}, \mbox{val tiles}, \mbox{tst tiles}, \mbox{oth tiles}\}\) We opted for the 70%-15%-15% dataset splitting strategy. Slippy Map Tiles at zoom level 18 covering the Area of Interest, partitioned into several subsets: ground-truth (GT = trn + val + tst), other (oth). Zoom over a portion of the previous image. Concerning ground-truth labels, the final results of this study rely on a curated subset of the public dataset including polygons corresponding to the Canton of Geneva's registered swimming pools, cf. PISCINES . Indeed, some "warming-up" iterations of this whole process allowed us to semi-automatically identify tiles where the swimming pool register was inconsistent with aerial images, and vice versa. By manually inspecting the tiles displaying inconsistency, we discarded those tiles for which the swimming pool register seemed to be wrong (at least through the eyes of a Data Scientist; in a further iteration, this data curation step should be performed together with domain experts). While not having the ambition to return a "100% ground-truth" training dataset, this data curation step yielded a substantial gain in terms of \(F_1\) score (from ~82% to ~90%, to be more precise).
2. Model training \u00b6 A predictive model was trained, stemming from one of the pre-trained models provided by Detectron2 . In particular, the "R50-FPN" baseline was used (cf. this page ), which implements a Mask R-CNN architecture leveraging a ResNet-50 backbone along with a Feature Pyramid Network (FPN). We refer the reader e.g. to this blog article for further information about this kind of Deep Learning method. Training a (Deep) Neural Network model means running an algorithm which iteratively adjusts the various parameters of the network (40+ million parameters in our case), in order to minimize the value of some "loss function". In addition to the model parameters (also called "weights"), multiple "hyper-parameters" exist, affecting the model and the way the optimization is performed. In theory, one should automate hyper-parameter tuning , in order to eventually single out the best setting among all the possible ones. In practice, the hyper-parameter space is never fully explored; at a minimum, a systematic search should be performed, in order to find a "sweet spot" among a finite, discrete collection of settings. In our case, no systematic hyper-parameter tuning was actually performed. Instead, a few hours were spent manually tuning the hyper-parameters, until a setting was found which the STDL team judged to be reasonably good (~90% \(F_1\) score on the test dataset, see details here-below). The optimal number of iterations was chosen so as to approximately minimize the loss on the validation dataset. 3. Prediction \u00b6 Each image resulting from the tiling of the AoI constitutes - let's say - the "basic unit of computation" of this analysis. Thus, the model optimized at the previous step was used to make predictions over: the oth images, meaning images covering no already known swimming pools; the trn , val and tst images, meaning images covering already known swimming pools.
The combination of predictions 1 and 2 covers the entire AoI and allows us to discover potential new objects as well as to check whether some of the known objects are outdated, respectively. Image by image, the model produces one segmentation mask per detected object, accompanied by a score ranging from a custom minimum value (5% in our setting) to 100%. The higher the score, the more confident the model is about a given prediction. Sample detections of swimming pools, accompanied by scores. Note that multiple detections can concern the same object, if the latter extends over multiple tiles. Let us note that not only swimming pools exhibiting "obvious" features (bluish color, rectangular shape, ...) were detected, but also: swimming pools covered by some tarp; empty swimming pools; etc. As a matter of fact, the training dataset was rich enough to also include samples of such somewhat tricky cases. 4. Prediction assessment \u00b6 As described here in more detail, in order to assess the reliability of the predictive model, predictions have to be post-processed so as to switch from the image coordinates - ranging from (0, 0) to (255, 255) in our case, where 256x256 pixel images were used - to geographical coordinates. This amounts to applying an affine transformation to the various predictions, yielding a vector layer which we can compare with ground-truth ( GT ) data by means of spatial joins: objects which are detected and can also be found in GT data are referred to as "true positives" (TPs); objects which are detected but cannot be found in GT data are referred to as "false positives" (FPs); GT objects which are not detected are referred to as "false negatives" (FNs). Example of a true positive (TP), a false positive (FP) and a false negative (FN). Note that both the TP and the FP object are detected twice, as they extend over multiple tiles.
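The switch from image to geographical coordinates described above boils down to an affine mapping per tile. A minimal sketch (with hypothetical tile bounds, not the framework's actual implementation) could look like:

```python
def pixel_to_geo(col: float, row: float,
                 tile_bounds: tuple[float, float, float, float],
                 tile_px: int = 256) -> tuple[float, float]:
    """Map image coordinates (col, row), with (0, 0) at the top-left corner,
    to geographical coordinates, given the tile's (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = tile_bounds
    x = xmin + (col / tile_px) * (xmax - xmin)
    y = ymax - (row / tile_px) * (ymax - ymin)  # image rows grow downwards
    return x, y
```

Applied to every vertex of a segmentation mask, this yields the vector layer that can then be spatially joined with the ground-truth data.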
The counting of TPs, FPs and FNs allows us to compute some standard metrics such as precision, recall and \(F_1\) score (cf. this Wikipedia page for further information). Actually, one count (hence one set of metrics) can be produced per choice of the minimum score that one is willing to accept. Choosing a threshold value (= thr ) means keeping all the predictions having a score >= thr and discarding the rest. Intuitively, a low threshold should yield few false negatives; a high threshold should yield few false positives. Such intuitions can be confirmed by the following diagram, which we obtained by sampling the values of thr by steps of 0.05 (= 5%), from 0.05 to 0.95. True positives (TPs), false negatives (FNs), and false positives (FPs) counted over the test dataset, as a function of the threshold on the score: for a given threshold, all and only the predictions exhibiting a bigger score are kept. Performance metrics computed over the test dataset as a function of the threshold on the score: for a given threshold, all and only the predictions exhibiting a bigger score are kept. The latter figure was obtained by evaluating the predictions of our best model over the test dataset. Inferior models exhibited a similar behavior, with a downward offset in terms of \(F_1\) score. In practice, upon iterating over multiple realizations (with different hyper-parameters, training data and so on), we aimed at maximizing the value of the \(F_1\) score on the validation dataset, and stopped when the \(F_1\) score went over 90%. As the ground-truth data we used turned out not to be 100% accurate, the responsibility for mismatching predictions has to be shared between the ground-truth data and the predictive model, at least in some cases. In a more ideal setting, ground-truth data would be 100% accurate and differences between a given metric (precision, recall, \(F_1\) score) and 100% should be imputed to the model.
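The per-threshold counts and metrics described above can be sketched as follows (illustrative only, assuming at most one detection per ground-truth object):

```python
def metrics_at_threshold(dets, n_gt, thr):
    """dets: (score, matches_gt) pairs, one per detection; n_gt: number of
    ground-truth objects. Returns (precision, recall, f1) at threshold thr."""
    tp = sum(1 for score, matched in dets if score >= thr and matched)
    fp = sum(1 for score, matched in dets if score >= thr and not matched)
    fn = n_gt - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def sweep(dets, n_gt):
    """Sample thr by steps of 0.05, from 0.05 to 0.95, as in the diagrams."""
    return {t / 100: metrics_at_threshold(dets, n_gt, t / 100)
            for t in range(5, 100, 5)}
```

Raising thr converts low-score TPs into FNs and drops low-score FPs, which is exactly the precision/recall trade-off visible in the figures.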
Domain experts feedback \u00b6 All the predictions having a score \(\geq\) 5% obtained by our best model were exported to Shapefile and shared with the experts in charge of the cadastre of the Canton of Geneva, who carried out a thorough evaluation. By checking predictions against the swimming pool register as well as aerial images, it was empirically found that the threshold on the minimum score (= thr ) should be set as high as 97%, in order not to have too many false positives to deal with. In spite of such a high threshold, 562 potentially new objects were detected (over 4652 objects which were known when this study started), of which: 128 items are objects other than swimming pools (let's say "actual false positives"); 211 items are swimming pools that are NOT subject to registration (temporary, above-ground, on top of a building, ...); 223 items are swimming pools that are subject to registration. These figures show that: on the one hand, the model performs quite well on the task it was trained for, in particular when an appropriate threshold is used; on the other hand, the meticulous review of results by domain experts remains essential. This said, automatic detections can surely be used to drive the domain experts' attention towards the areas which might require it. Examples of "actual false positives": a fountain (left) and a tunnel (right). Examples of detected swimming pools which are not subject to registration: placed on top of a building (left), inflatable hence temporary (right). Conclusion \u00b6 The analysis reported in this document confirms the opportunity of using state-of-the-art Deep Learning approaches to assist experts in some of their tasks, in this case that of keeping the cadastre up to date. Not only was the opportunity explored and actually confirmed, but valuable results were also produced, leading to the detection of previously unknown objects.
At the same time, our study also shows how essential domain expertise still remains, despite the usage of such advanced methods. As a concluding remark, let us note that our predictive model may be further improved. In particular, it may be rendered less prone to false positives, for instance by: leveraging 3D data ( e.g. point clouds), in order to potentially remove temporary, above-ground swimming pools from the set of detected objects; injecting into the training dataset those predictions which were classified by domain experts as other objects or temporary swimming pools; leveraging some other datasets, already available through the SITG portal : buildings , miscellaneous objects , etc.","title":" Swimming Pool Detection from Aerial Images over the Canton of Geneva "},{"location":"PROJ-HETRES/","text":"Dieback of beech trees: methodology for determining the health state of beech trees from airborne images and LiDAR point clouds \u00b6 Clotilde Marmy (ExoLabs) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Canton of Jura - PROJ-HETRES October 2022 to August 2023 - Published on November 13, 2023 All scripts are available on GitHub .
Abstract : Beech trees are sensitive to drought, and repeated episodes can cause dieback. This issue affects the Jura forests, requiring the development of new tools for forest management. In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km\u00b2) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors has an overall accuracy of up to 0.79, as does the model trained only with descriptors derived from airborne imagery. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors. 1. Introduction \u00b6 Since the drought episode of 2018, the Canton of Jura and other cantons have noticed dieback of beech trees in their forests 1 . In the Canton of Jura, this problem mainly concerns the Ajoie region, where 1000 hectares of deciduous trees are affected 2 . This is of concern for the productivity and management of the forest, as well as for the safety of walkers. In this context, the R\u00e9publique et Canton du Jura contacted the Swiss Territorial Data Lab to develop a new monitoring solution based on data science, airborne images and LiDAR point clouds. The dieback symptoms are observable in the mortality of branches, the transparency of the tree crown and the leaf mass partition 3 .
The vegetation health state influences the reflectance in images (airborne and satellite), which is often used as a monitoring tool, in particular in the form of vegetation indices: the Normalized Difference Vegetation Index (NDVI), a combination of the near-infrared and red bands quantifying vegetation health; the Vegetation Health Index (VHI), an index quantifying the decrease or increase of vegetation in comparison to a reference state. For instance, Brun et al. studied early-wilting in Central European forests with time series of the NDVI and estimated the area affected by early leaf-shedding 4 . Another technology used to monitor forests is light detection and ranging (LiDAR), as it penetrates the canopy and gives 3D information on trees and forest structures. Several forest and tree descriptors, such as the canopy cover 5 or the standard deviation of crown return intensity 6 , can be derived from the LiDAR point cloud to monitor the vegetation health state. In 5 , the study was conducted at tree level, whereas in 6 it was conducted at stand level. To work at tree level, it is necessary to segment individual trees in the LiDAR point cloud. In complex forests, e.g. with a dense understory near the tree stems, it is challenging to get correct segments without manual corrections. The aim of this project is to provide foresters with a map to help plan the felling of beech trees in the Ajoie forests. To do so, we developed a combined method using LiDAR point clouds and airborne and satellite multispectral images to determine the health state of beech trees. 2. Study area \u00b6 The study was conducted in two areas of interest in the Ajoie region (Fig. 1.A); one near Mi\u00e9court (Fig. 1.B), the other one near Beurnev\u00e9sin (Fig. 1.C). Altogether they cover 5 km 2 , 1.4 % of the Canton of Jura's forests 7 . The Mi\u00e9court sub-area is southwest- and south-oriented, whereas the Beurnev\u00e9sin sub-area is rather southeast- and south-oriented.
They are in the same altitude range (600-700 m) and 2 km away from each other, thus near the same weather station. Figure 1: The study area is composed of two areas of interest. 3. Data ¶ The project makes use of different data types: LiDAR point cloud, airborne and satellite imagery, and ground truth data. Table 1 gives an overview of the data and their characteristics. Data were acquired in late summer 2022 to have up-to-date and temporally correlated information on the health state of beech trees. Table 1: Overview of the data used in the project.
Data | Resolution | Acquisition time | Proprietor
LiDAR | 50-100 pts/m² | 08.2022 | République et Canton du Jura
Airborne images | 0.03 m | 08.2022 | République et Canton du Jura
Yearly variation of NDVI | 10 m | 06.2015-08.2022 | Bern University of Applied Sciences (HAFL) and the Federal Office for the Environment (BAFU)
Weekly vegetation health index | 10 m | 06.2015-08.2022 | ExoLabs
Ground truth | - (point data) | 08.-10.2022 | République et Canton du Jura
3.1 LiDAR point cloud ¶ The LiDAR dataset was acquired on the 16th of August 2022 and its point density is 50-100 pts/m². It is classified into the following classes: ground, low vegetation (2-10 m), middle vegetation (10-20 m) and high vegetation (20 m and above). It was delivered in the LAS format with reflectance values 8 in the intensity storage field. 3.2 Airborne images ¶ The airborne images have a ground resolution of 3 cm and were acquired simultaneously with the LiDAR dataset. The camera captured the RGB bands, as well as the near-infrared (NIR) one. The acquisition of images with a large overlap and oblique views allowed the production of a true orthoimage for a perfect match with the LiDAR point cloud and the ground truth data. 3.3 Satellite images ¶ The Sentinel-2 mission of the European Space Agency passes every 6 days over Switzerland and allows free temporal monitoring at a 10 m resolution.
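The height-based vegetation classes listed above can be illustrated with a small sketch; the thresholds follow the delivered classes, while the function name and the label for points below 2 m are illustrative assumptions (they are not official LAS classification codes).

```python
import numpy as np

# Illustrative sketch: assign the delivered vegetation classes from the
# height above ground (low: 2-10 m, middle: 10-20 m, high: >= 20 m).
# The "ground" label for points below 2 m is an assumption for this sketch.
def height_class(height_above_ground):
    """Return a vegetation class label per point."""
    bins = np.array([2.0, 10.0, 20.0])                    # class boundaries in metres
    labels = np.array(["ground", "low", "middle", "high"])
    idx = np.digitize(np.asarray(height_above_ground), bins)
    return labels[idx]

classes = height_class([0.5, 5.0, 15.0, 25.0])
```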
The archives are available back to the beginning of the beech tree dieback in 2018. 3.3.1 Yearly variation of NDVI ¶ The Bern University of Applied Sciences (HAFL) and the Federal Office for the Environment (BAFU) have developed web services for vegetation monitoring derived from Sentinel-2 images. For this project, the yearly variation of NDVI 9 between two successive years is used. It measures the decrease in vegetation activity between August of one year (e.g. 2018) and June of the following year (e.g. 2019). The decrease is derived from rasters made of the maximum values of the NDVI in June, July or August. The data are downloaded from the WCS service, which delivers "raw" indices: the NDVI values are not cut off at a minimal threshold. 3.3.2 VHI ¶ The Vegetation Health Index (VHI) was generated by ETHZ, WSL and ExoLabs within the SILVA project 10, which proposes several indices for forest monitoring. The VHI from 2016 to 2022 is used. It is computed mainly from Sentinel-2 images, but also from images of other satellite missions, in order to obtain a weekly index with no time gaps. 3.4 Ground truth ¶ The ground truth was collected between August and October 2022 by foresters. They assessed the health of the beech trees based on four criteria 3: mortality of branches; transparency of the tree crown; leaf mass partition; trunk condition and other health aspects. In addition, each tree was associated with its coordinates and pictures, as illustrated in Figure 1 and Figure 2 respectively. The foresters surveyed 75 healthy, 77 unhealthy and 56 dead trees. Tree locations were first identified in the field with a GPS-enabled tablet on which the 2022 SWISSIMAGE mosaic was displayed. Afterwards, the tree locations were precisely adjusted to the trunk locations by visually locating the corresponding stems in the LiDAR point cloud with the help of the pictures taken in the field.
The location and health status of a further 18 beech trees were added in July 2023. These 226 beeches (76 healthy, 77 unhealthy and 73 dead trees) surveyed at the two dates are defined as the ground truth for this project. Figure 2: Examples of the three health states: left, a healthy tree with a dense green tree crown; center, an unhealthy tree with dead twigs and scarce foliage; right, a completely dry dead tree. 4. Method ¶ The method developed is based on the processing of LiDAR point clouds and of airborne images. Ready-made vegetation indices derived from satellite imagery were also used. First, a segmentation of the trees in the LiDAR point cloud was carried out using the Digital-Forestry-Toolbox (DFT) 11. Then, descriptors of the health state of the beech trees were derived from each dataset. Boxplots and the corresponding t-tests were computed to evaluate the ability of the descriptors to differentiate the three health states. A t-test p-value below 0.01 indicates a significant difference between the means of two classes. Finally, the descriptors were used jointly with the ground truth to train a random forest (RF) algorithm, before inferring over the study area. Figure 3: Overview of the methodology, which processes the data into health descriptors for beech trees, before training and evaluating a random forest. 4.1 LiDAR processing ¶ At the beginning of the LiDAR processing, an exploration of the data motivated the segmentation and the computation of descriptors. 4.1.2 Data exploration ¶ In order to get an understanding of the information available at the tree level, we manually segmented three healthy, five unhealthy and three dead trees. More unhealthy trees were segmented to better represent the dieback symptoms. Vertical slices of each tree were extracted at rotating angles, providing visual information on the health state.
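The descriptor screening described above can be sketched as follows; this is a minimal example with fabricated values, using Welch's t-test from SciPy (the exact test variant used in the project is an assumption of this sketch).

```python
import numpy as np
from scipy.stats import ttest_ind

# Illustrative sketch of the descriptor screening: a two-sample t-test
# between two health classes, with p < 0.01 taken as a significant
# difference in means. The descriptor values below are fabricated.
rng = np.random.default_rng(42)
healthy = rng.normal(loc=0.75, scale=0.05, size=75)   # e.g. mean NDVI per tree
dead = rng.normal(loc=0.55, scale=0.08, size=56)

t_stat, p_value = ttest_ind(healthy, dead, equal_var=False)  # Welch's t-test
significant = p_value < 0.01
```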
4.1.3 Segmentation ¶ To be able to describe the health state of each tree, a segmentation of the forest was performed using the DFT. Parameters were tuned to find an appropriate segmentation. Two strategies for peak isolation were tested on the canopy height model (CHM): maxima smoothing: a height difference is set below which all local maxima are suppressed; local maxima within a search radius: the size of the dilation window for the identification of maxima depends on the height. Each peak isolation method was tested on a range of parameters and on different cell resolutions for the CHM computation. The detailed plan of the simulation is given in Appendix 1. The minimum tree height was set to 10 m. For computation time reasons, only 3 LiDAR tiles with 55 ground truth (GT) trees located on them were processed. To find the best segmentation, the locations of the GT trees were compared to the locations of the segment peaks. GT trees with a segment peak less than 4 m away were considered as True Positives (TP). The best segmentation was the one with the most TP. 4.1.4 Structural descriptors ¶ An alternative to the segmentation is to change paradigm and perform the analyses at the stand level. Meng et al. 6 derived structural descriptors for acacia dieback at the stand level based on LiDAR point clouds. By adapting their method to the present case, the following descriptors were derived from the LiDAR point cloud using the lidR package in R 12: canopy maximal height; scale and shape parameters of the Weibull density function fitted to the point distribution along the height; coefficient of variation of leaf area density (cvLAD), describing the distribution of the vertical structure of photosynthetic tissue along the height; Vertical Complexity Index (VCI), an entropy measure of the vertical distribution of vegetation; standard deviation of the canopy height model (sdCHM), reflecting canopy height variations;
canopy cover (CC) and its standard deviation (sdCC), reflecting foliage density and coverage; above-ground biomass height (AGH), reflecting the understory height up to 10 m. Descriptors 1 to 6 are directly taken over from Meng et al. All the descriptors were first computed for three grid resolutions: 10 m, 5 m and 2.5 m. Subsequently, the DFT segments were considered as an adaptive grid around the trees, with the assumption that it is still more natural than a regular grid. Then, the structural descriptors for the vertical point distribution (descriptors 1 to 4) were computed on each segment, whereas the descriptors for the horizontal point distribution (descriptors 5 to 7) were processed on the 2.5 m grid. A weight was applied to the value of the latter descriptors according to the area of the grid cells included in the footprint of the segments. Furthermore, the mean and standard deviation (sd) of the LiDAR reflectance were computed for the segment crowns to differentiate them by their reflectance. 4.2 Image processing ¶ For the image processing, an initial step was to compute the normalized difference vegetation index (NDVI) for each raster image. The NDVI is an index commonly used for the estimation of the health state of vegetation 5 13 14. \[ NDVI = \frac{NIR - R}{NIR + R} \] where NIR and R are the values of the pixel in the near-infrared and red bands respectively. To uncover potential distinctive features between the classes, boxplots and principal component analysis were used on the four image bands (RGB-NIR) and the NDVI. Firstly, we tested whether the raw pixel values allowed the distinction between classes at the pixel level. This method avoids the pitfall of segmenting the forest into trees. Secondly, we tested the same method, but with low-pass filters to reduce the noise in the data. Thirdly, we tried to find distinct statistical features at the tree level.
This approach allows decreasing the noise that can be present in high-resolution information. However, it requires a reasonably good segmentation of the trees. Finally, color filtering and edge detection were tested in order to highlight and extract the linear structure of the branches. Each treatment can be applied with or without a mask on the tree height. As only trees between 20 m and 40 m tall are affected by dieback, a mask based on the canopy height model (CHM) raster derived from the LiDAR point cloud was tested. Figure 4: Overview of the different possible data treatments for the statistical analysis. 4.2.1 Statistical tests on the original and filtered pixels ¶ The statistical tests were performed on the original and filtered pixels. Two low-pass filters were tested: a Gaussian filter with a sigma of 5; bilinear downsampling with scale factors of 1/3, 1/5 and 1/17, corresponding to resolutions of 9, 15 and 50 cm. In both the original and the filtered cases, the pixels of each GT tree were extracted from the images and sorted by class. Then, the corresponding NDVI was computed. Each pixel has 5 attributes corresponding to its values on the four bands (R, G, B, NIR) and its NDVI. First, the per-class boxplots of the attributes were produced to see if the distinction between classes was possible on one or several bands or on the NDVI. Then, a principal component analysis (PCA) was computed on the same values to see if a linear combination of them allowed the distinction of the classes. 4.2.2 Statistical tests at the tree level ¶ For the tests at the tree level, the GT trees were segmented by hand. For each tree, the statistics of the pixels were calculated over its polygon, on each band and for the NDVI. Then, the results were sorted by class. Each tree has five attributes per band or index, corresponding to the statistics of its pixels: minimum (min), maximum (max), mean, median and standard deviation (std).
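The per-tree statistics described above can be sketched as follows; a minimal example with fabricated pixel values.

```python
import numpy as np

# Sketch of the tree-level statistics: for each segmented tree, the pixels
# inside its polygon are summarised per band or index (min, max, mean,
# median, std). The pixel values below are fabricated.
def tree_statistics(pixel_values):
    v = np.asarray(pixel_values, dtype=float)
    return {
        "min": float(v.min()),
        "max": float(v.max()),
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "std": float(v.std()),
    }

stats = tree_statistics([10, 20, 30, 40, 50])
```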
As with the pixels, the per-class boxplots of the attributes were produced to see if the distinction between classes was possible. Then, the PCA was computed. 4.2.3 Extraction of branches ¶ One of the beneficiaries noted that the branches are clearly visible on the RGB images. Therefore, it may be possible to isolate them with color filtering based on the RGB bands. We calibrated an RGB filter through trial and error to produce a binary mask indicating the location of the branches. A sieve filter was used to reduce the noise due to the lighter parts of the foliage. Then, a binary dilation was performed on the mask to highlight the results. Otherwise, they would be too thin to be visible at a 1:5'000 scale. A mask based on the CHM was integrated into the results to limit the influence of the ground. The branches have a characteristic linear structure. In addition, the branches of dead trees tend to appear as very light lines on the dark forest ground and understory. Therefore, we assumed that we might detect the dead branches with edge detection. We used the Canny edge detector and tested the Python functions of the libraries openCV and skimage. 4.3 Satellite-based indices ¶ The yearly variation of NDVI and the VHI were used to take into account the historical variations of NDVI from 2015 to 2022. For the VHI, the mean for each year was computed over the months considered for the yearly variation of NDVI. The pertinence of using these indices was explored: the values for each tree in the ground truth were extracted and observed in boxplots per health class, per year or year pair, over the time span from 2015 to 2022. 4.4 Random Forest ¶ In R 12, the caret and randomForest packages were used to train the random forest and make predictions. First, the ground truth was split into a training and a test dataset, with each class being split 70 % into the training set and 30 % into the test set. Health classes with too few samples were completed with copies.
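The dataset preparation described above (a 70/30 split per class, with under-represented training classes completed by copies) can be sketched as follows; note that the project used R, while this illustration uses Python with fabricated labels.

```python
import numpy as np
from collections import Counter

# Sketch of the per-class 70/30 split and the oversampling by copies.
# The 226 labels match the ground truth counts; the ordering is fabricated.
rng = np.random.default_rng(5)
labels = np.array(["healthy"] * 76 + ["unhealthy"] * 77 + ["dead"] * 73)

train_idx, test_idx = [], []
for cls in np.unique(labels):
    idx = rng.permutation(np.flatnonzero(labels == cls))
    cut = int(round(0.7 * len(idx)))            # 70 % of this class to training
    train_idx.extend(idx[:cut])
    test_idx.extend(idx[cut:])

# Complete minority classes in the training set with copies.
counts = Counter(labels[train_idx])
target = max(counts.values())
balanced = list(train_idx)
for cls, n in counts.items():
    cls_idx = [i for i in train_idx if labels[i] == cls]
    balanced.extend(rng.choice(cls_idx, size=target - n).tolist())
```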
The optimization of the RF was performed on the number of trees to grow and on the number of randomly sampled descriptors to test at each split. In addition, 5-fold cross-validation was used to ensure the use of different parts of the dataset. The search space went from 100 to 1000 decision trees and from 4 to 10 descriptors, as the default value is the square root of the number of descriptors, i.e. 7. The RF was assessed using a custom metric, which is an adaptation of the false positive rate for the healthy class. It minimizes the number of false healthy detections and of dead trees predicted as unhealthy (false unhealthy). It is called the custom false positive rate (cFPR) in the text. It was preferred to have a model with more unhealthy predictions to control in the field than to miss unhealthy or dead trees. The cFPR goes from 0 (best) to 1 (worst). Table 2: Confusion matrix for the three health classes.
Prediction \ Ground truth | Healthy | Unhealthy | Dead
Healthy | A | B | C
Unhealthy | D | E | F
Dead | G | H | I
According to the confusion matrix in Table 2, the cFPR is computed as follows: \[ cFPR = \frac{B + C + F}{B + C + E + F + H + I} \] In addition, the overall accuracy (OA), i.e. the ratio of correct predictions over all predictions, and the sensitivity, which is, per class, the number of correct predictions divided by the number of samples of that class, are used. An ablation study was performed on the descriptors to assess the contribution of the different data sources to the final performance. An "important" descriptor is one having a strong influence on the increase in prediction errors when the values of the descriptor are randomly reallocated in the training set. After the optimization, predictions for each DFT segment were computed using the best model according to the cFPR.
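The cFPR defined above can be sketched as a small function; the confusion-matrix layout follows Table 2 (rows: predictions, columns: ground truth), and the example matrix is fabricated.

```python
import numpy as np

# Sketch of the custom false positive rate (cFPR): cells A..I follow
# Table 2, with rows = predictions and columns = ground truth, both
# ordered (healthy, unhealthy, dead).
def cfpr(cm):
    cm = np.asarray(cm, dtype=float)
    B, C, F = cm[0, 1], cm[0, 2], cm[1, 2]   # false healthy and false unhealthy
    E, H, I = cm[1, 1], cm[2, 1], cm[2, 2]
    return (B + C + F) / (B + C + E + F + H + I)

cm = [[20, 2, 1],
      [3, 15, 2],
      [0, 3, 14]]
score = cfpr(cm)    # 0 is best, 1 is worst
```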
The inferences were delivered as a thematic map with colors indicating the health state and the hue indicating the fraction of decision trees in the RF having voted for the class (vote fraction). The purpose is to give confidence information, with a high vote fraction indicating robust predictions. Furthermore, the ground truth was evaluated for quantity and quality by two means: removal of samples and its impact on the evaluation metrics; splitting of the training set into training subsets to evaluate on the original test set. Finally, after having developed the descriptors and the routine on high-quality data, we downgraded them to resolutions similar to those of the swisstopo products (LiDAR: 20 pts/m², orthoimage: 10 cm) and performed the optimization and prediction steps again. Indeed, the data acquisition was especially commissioned for this project and only covers the study area. If the method should be extended in the future, one would like to test whether a lower resolution, such as the one of the nationwide standard product SWISSIMAGE, could be sufficient. 5. Results and discussion ¶ In this section, the results obtained during the processing of each data source into descriptors are presented and discussed, followed by a section on the random forest results. 5.1 LiDAR processing ¶ For the LiDAR data, the reader will first discover the appearance of beech trees in the LiDAR point cloud according to their health state, as studied in the data exploration. Then, the segmentation results and the obtained LiDAR-based descriptors are presented. 5.1.2 Data exploration for 11 beech trees ¶ The vertical slices of the 11 beech trees provided visual information on the health state: branch shape, and a clearer horizontal and vertical point distribution. In Figure 5, one can appreciate the information shown by these vertical slices.
The linear structure of the dead branches, the denser foliage of the healthy tree and the already smaller tree crown of the dead tree are well recognizable. Figure 5: Slices for three trees with different health states. Vertical slices of each tree were extracted at rotating angles, providing visual information on the health state. Dead twigs and the density of the foliage are particularly distinctive. A deep learning image classifier could treat LiDAR point cloud slices as artificial images and learn from them before classifying any arbitrary slice from the LiDAR point cloud. However, the subject is not suited to transfer learning, because 200 samples are not enough to train a model to classify three new classes, especially via images without resemblance to the datasets used to pre-train deep learning models. 5.1.3 Segmentation ¶ Since the tree health classes were visually recognizable for the 11 trees, it was very interesting to individualize each tree in the LiDAR point cloud. After having searched for optimal parameters in the DFT, the best realization of each peak isolation method either slightly oversegmented or slightly undersegmented the forest. The forest has a complex structure with dominant and co-dominant trees, and with understory. A simple yet frequent example is the situation of a small pine growing in the shadow of a beech tree. It is difficult for an algorithm to differentiate between the points belonging to the pine and those belonging to the beech. Complex tree crowns (not spherical, with two maxima) especially lead to oversegmentation. The smoothing of maxima on a 0.5 m resolution CHM was identified as the best segmentation. Out of 55 GT trees, 52 were within a 4 m distance from the centroid of a segment. The total number of segments is 7347, which corresponds to 272 trees/ha. The report of a forest inventory in the Jura forests between 2003 and 2005 indicated a density of 286 trees/ha in high forest 7.
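The peak-to-GT matching used to score the segmentations (a GT tree counts as a true positive when a segment peak lies within 4 m) can be sketched as follows; the coordinates are fabricated, and the nearest-neighbour search via a k-d tree is an implementation choice of this sketch, not necessarily the one used in the project.

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch of the segmentation scoring: a ground-truth tree counts as a
# true positive when a segment peak lies within 4 m of it.
def count_true_positives(gt_xy, peak_xy, max_dist=4.0):
    tree = cKDTree(peak_xy)
    dist, _ = tree.query(gt_xy)          # nearest peak for each GT tree
    return int(np.sum(dist <= max_dist))

gt = np.array([[0.0, 0.0], [20.0, 5.0], [50.0, 50.0]])
peaks = np.array([[1.0, 1.0], [22.0, 8.0], [100.0, 100.0]])
tp = count_true_positives(gt, peaks)     # the first two GT trees are matched
```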
Since the ground truth is only made of point coordinates, it is difficult to quantitatively assess the correctness of the segments, i.e. the attribution of each point to the right segment. Therefore, the work at the tree level is only approximate. 5.1.4 Structural descriptors ¶ Nevertheless, the structural descriptors for each tree were computed from the segmented LiDAR point cloud. The t-tests between health classes for each descriptor at each resolution (10 m, 5 m, 2.5 m and per-tree grid) are given in Appendices 2, 3, 4 and 5. The number of significant descriptors per resolution is indicated to better understand the effect on the RF: at 10 m: 13; at 5 m: 17; at 2.5 m: 18; per tree: 15. The simulations at 5 m and at 2.5 m seemed a priori the most promising. In both configurations, the t-tests indicated significantly different distributions for: maximal height, between the three health states; sdCHM, between the three health states; cvLAD, healthy trees against the others; mean reflectance, healthy trees against the others; VCI, healthy trees against unhealthy trees; canopy cover, healthy trees against dead trees; standard deviation of the reflectance, dead trees against the others; sdCC, dead trees against the others. The maximal height and the sdCHM appear to be the descriptors best suited to separate the three health states. The other descriptors differentiate healthy trees from the others or dead trees from the others. Of the 11 LiDAR-based descriptors, 8 are significant at least for a comparison between two classes. 5.2 Image processing ¶ Boxplots and PCA are given to illustrate the results of the image processing exploration. As the masking of pixels below and above the affected height made no difference in the interpretation of the results, they are presented here with the height mask.
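The NDVI used throughout the following results (defined in the Method section) can be sketched per pixel as below; the small epsilon guard against division by zero is an assumption of this sketch, not part of the original formula.

```python
import numpy as np

# Minimal sketch of the per-pixel NDVI computation from the NIR and red
# bands. The 8-bit reflectance arrays below are fabricated.
def ndvi(nir, red, eps=1e-9):
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)   # eps avoids division by zero

nir = np.array([[200, 120], [90, 60]], dtype=np.uint8)
red = np.array([[50, 80], [70, 60]], dtype=np.uint8)
index = ndvi(nir, red)    # healthy vegetation tends towards high values
```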
5.2.1 Boxplots and PCA over the pixel values of the original images ¶ When the pixel values of the original images are compared per health class in boxplots (e.g. Fig. 6), the raw value of the pixel alone is not enough to clearly distinguish between the classes. Figure 6: Boxplots of the unfiltered pixel values on the different bands and the NDVI index by health class. The PCA in Figure 7 shows that it is not possible to distinguish the groups based on a linear combination of the raw pixel values of the bands and the NDVI. Figure 7: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI. 5.2.2 Boxplots and PCA over the pixel values of the filtered images ¶ A better separation of the different classes is noticeable after the application of a Gaussian filter. The most promising band for a separation of the healthy and dead classes is the NIR one. On the NDVI, the distinction between those two classes should also be possible, as illustrated in Figure 8. In all cases, no distinction between the healthy and unhealthy classes is possible. The separation between the healthy and dead trees on the NIR band would be around 130, and the slight overlap on the NDVI is between approx. 0.04 and approx. 0.07. Figure 8: Boxplots of the pixel values on the different bands and the NDVI by health class after a Gaussian filter with sigma = 5. As for the raw pixels, the overlap between the different classes is still very present in the PCA (Fig. 9). Figure 9: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI after a Gaussian filter with sigma = 5. The boxplots produced on the resampled images (Figure 10) give results similar to the ones with the Gaussian filter. The healthy and dead classes are separated on the NIR band around 130. The unhealthy class stays similar to the healthy one.
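The two low-pass treatments compared above can be sketched as follows; a minimal example on a fabricated single-band image, using SciPy's gaussian_filter and a first-order (bilinear) zoom as stand-ins for the project's exact implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

# Sketch of the two low-pass treatments: a Gaussian blur with sigma 5 and
# a bilinear downsampling by a factor 1/3 (3 cm -> ~9 cm), applied here
# to a fabricated single-band image.
band = np.random.default_rng(0).uniform(0, 255, size=(90, 90))

smoothed = gaussian_filter(band, sigma=5)      # Gaussian low-pass filter
downsampled = zoom(band, 1 / 3, order=1)       # bilinear resampling
```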
Figure 10: Boxplots of the pixel values on the different bands and the NDVI by health class after a downsampling filter with a factor of 1/3. According to the PCA in Figure 11, it seems indeed not possible to distinguish between the classes with only the information presented in this section. Figure 11: Distribution of the pixels in the space of the principal components based on the pixel values on the different bands and the NDVI after a downsampling filter with a factor of 1/3. When the factor for the resampling is decreased, i.e. when the size of the resulting pixels increases, the separation on the NIR band becomes stronger. With a factor of 1/17, the healthy and dead classes on the NDVI are almost entirely separated around the value of 0.04. 5.2.3 Boxplots and PCA over the tree statistics ¶ As an example of the per-tree statistics, the boxplots and PCA for the blue band are presented in Figures 12 to 14. On the mean and on the standard deviation, the healthy and dead classes are well differentiated on the blue band, as visible in Figure 12. The same is observed on the mean, median and minimum of the NDVI, as well as on the maximum, mean and median of the NIR band. However, no differentiation is possible on the red and green bands. Figure 12: Boxplots of the statistics values for each tree on the blue band by health class. In the PCA in Figure 13, the groups of the healthy and dead trees are quite well separated, mostly along the first component. Figure 13: Distribution of the trees in the space of the principal components based on their statistical values on the blue band. In Figure 14, the first principal component is influenced principally by the standard deviation of the blue band. The mean, the median and the maximum have an influence too. This is in accordance with the boxplots, where the standard deviation values presented the largest gap between classes. Figure 14: Influence of the statistics for the blue band on the first and second principal components.
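The per-tree PCA discussed above can be sketched as follows; a minimal example with fabricated statistics, using scikit-learn (the project's own analysis environment is not assumed here).

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of the PCA at the tree level: each tree carries five statistics
# of a band (min, max, mean, median, std) and is projected onto the first
# two principal components to inspect class separability. Data fabricated.
rng = np.random.default_rng(3)
trees = rng.uniform(0, 255, size=(200, 5))   # rows: trees, cols: statistics

pca = PCA(n_components=2)
scores = pca.fit_transform(trees)            # coordinates in PC space
explained = pca.explained_variance_ratio_
```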
The point clouds of the dead and healthy classes are also well separated on the PCA of the NIR band and of the NDVI. No separation is visible on the PCA of the green and red bands. 5.2.4 Extraction of branches ¶ Finally, the extraction of dead branches was performed. Use of an RGB filter ¶ The result of the RGB filter is displayed in Figure 15. It is important to include the binary CHM in the visualization. Otherwise, the ground can have a significant influence in certain zones and distract from the dead trees. Some interference can still be seen among the coniferous trees, which have a light color similar to that of dead trees. Figure 15: Results produced by the RGB filter for the detection and highlighting of dead branches over a zone with coniferous, healthy deciduous and dead deciduous trees. The parts in grey are the zones masked by the filter on the height. Use of the Canny edge detector ¶ Figure 16 presents the result for the blue band, which was the most promising one. The dead branches are well captured. However, there is a lot of noise around them due to the high contrasts in some parts of the foliage. The result is not usable as is. Using a stricter filter decreased the noise, but it also decreased the captured pixels of the branches. In addition, using a sieve filter or trying to combine the results with the ones of the RGB filter did not improve the situation. Figure 16: Test of the Canny edge detector from skimage over a dead tree using only the blue band. The parts in grey are the zones masked by the CHM filter on the height. The results for the other bands, the RGB images or the NDVI were not usable either. 5.2.5 Discussion ¶ The results at the tree level are the most promising ones. They are integrated into the random forest. Choosing to work at the tree level means that all the trees must be segmented with the DFT. This adds uncertainties to the results.
As explained in the dedicated section, the DFT has a tendency to over- or under-segment the trees. The procedures at the pixel level, whether on filtered or unfiltered images, were abandoned. For the branch detection, the results were compared with observations made in the field by a forest expert. He assessed the result as incorrect in several parts of the forest. Therefore, the dead branch detection was not integrated into the random forest. In addition, edge detection was maybe not the right choice for dead branches, and an approach more focused on the detection of straight lines or graphs may have been more appropriate. The chances of success of such methods are difficult to predict, as there can be a lot of variation in the form of the dead branches. 5.3 Vegetation indices from satellite imagery ¶ The t-tests used to evaluate the ability of the satellite indices to differentiate between health states are given in Appendices 6 and 7. In the following two subsections, only the significantly different tested groups are mentioned, to help understand the RF performance. 5.3.1 Yearly variation of NDVI ¶ The t-tests on the yearly variation of NDVI indicated significance between: all health states in 2018-2019: 2018 was an especially dry and hot year, whereas 2019 was within the seasonal normals. The recovery in 2019 may have differed according to the health classes; healthy and other trees in 2016-2017 and 2019-2020: healthy trees may respond differently to environmental factors than affected trees; healthy and dead trees in 2021-2022: this reflects a higher increase of the NDVI for the dead trees. Is the understory benefitting from a clearer forest structure?
5.3.2 Vegetation Health Index ¶ The t-tests on the VHI indicated significance between: dead and other trees in 2017; healthy and dead trees in 2018; healthy and other trees in 2019; unhealthy and other trees in 2021; dead and unhealthy trees in 2020 and 2022. Explanations similar to those for the NDVI may partly explain the significance obtained. In any case, it is encouraging that the VHI helps to differentiate the health classes thanks to their different evolution through the years. 5.4 Random Forest ¶ The results of the RF that are presented and discussed are: (1) the optimization and ablation study, (2) the ground truth analysis, (3) the predictions for the AOI and (4) the performance with downgraded data. 5.4.1 Optimization and ablation study ¶ In Table 3, the performances of the VHI and yearly variation of NDVI (yvNDVI) descriptors, using their values at the locations of the GT trees, are compared. The VHI (cFPR = 0.24, OA = 0.63) performed better than the yearly variation of NDVI (cFPR = 0.39, OA = 0.5). Both groups of descriptors are mostly derived from satellite data with the same resolution (10 m). A conceptual difference is that the VHI is a deviation from a long-term reference value, whereas the yearly variation of NDVI reflects the change between two years. For the latter, values can be high or low independently of the actual health state. For example, a succession of two bad years will show little to no difference in NDVI. Table 3: RF performance with satellite-based descriptors.
Descriptors | cFPR | OA
VHI | 0.24 | 0.63
yvNDVI | 0.39 | 0.5
Nonetheless, only the yearly variation of NDVI is used hereafter, as it is available free of charge. Regarding the LiDAR descriptors, the tested resolutions indicated that the 5 m resolution (cFPR = 0.2, OA = 0.65) performed best for the cFPR, but that the per-tree descriptors had the higher OA (cFPR = 0.33, OA = 0.67).
At the 5 m resolution, fewer affected trees are missed, but there are more errors in the classification, so more controls in the field would have to be done. The question of which grid resolution to use on the forest is a complex one, as the forest consists of trees of different sizes. Further, even if dieback affects some areas more severely than others, it is not a continuous phenomenon, and it is important to be able to clearly delimit each tree. However, a grid, like the 2.5 m one, can also hinder capturing the entirety of some trees, and the performance may decrease (LiDAR, 2.5 m, OA = 0.63). Table 4: RF performance with LiDAR-based descriptors at different resolutions.
Descriptors | cFPR | OA
LiDAR, 10 m | 0.3 | 0.6
LiDAR, 5 m | 0.2 | 0.65
LiDAR, 2.5 m | 0.28 | 0.63
LiDAR, per tree | 0.33 | 0.67
Consequently, the 5 m resolution descriptors are kept for the rest of the analysis, in accordance with the decision to reduce the number of missed dying trees. The ablation study performed on the descriptor sources is summarized in Table 5.A and Table 5.B. The two tables reflect the performance for two different partitions of the samples into training and test sets. Since the performance varies by several percent, it is impacted by the partition of the samples. Following those values, the best setups for each partition respectively are the full model (cFPR = 0.13, OA = 0.76) and the airborne-based model (cFPR = 0.11, OA = 0.79). One notices that not all the health classes are predicted with the same accuracy. The airborne-based model, as described in Section 5.2.3, is less sensitive to the healthy class, whereas the satellite-based model and the LiDAR-based model are more polarized towards the healthy and dead classes, with a low sensitivity in the unhealthy class. Table 5.A: Ablation study results, partition A of the dataset.
| Descriptor sources | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
| --- | --- | --- | --- | --- | --- |
| LiDAR | 0.2 | 0.65 | 0.65 | 0.61 | 0.71 |
| Airborne images | 0.18 | 0.63 | 0.43 | 0.61 | 0.94 |
| yvNDVI | 0.4 | 0.49 | 0.78 | 0.26 | 0.41 |
| LiDAR and yvNDVI | 0.23 | 0.7 | 0.74 | 0.61 | 0.76 |
| Airborne images and yvNDVI | 0.15 | 0.73 | 0.65 | 0.7 | 0.88 |
| LiDAR, airborne images and yvNDVI | 0.13 | 0.76 | 0.65 | 0.74 | 0.94 |

Table 5.B: Ablation study results, partition B of the dataset.

| Descriptor sources | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
| --- | --- | --- | --- | --- | --- |
| LiDAR | 0.19 | 0.71 | 0.76 | 0.5 | 0.88 |
| Airborne images | 0.11 | 0.79 | 0.62 | 0.8 | 1 |
| yvNDVI | 0.38 | 0.62 | 0.81 | 0.4 | 0.65 |
| LiDAR and yvNDVI | 0.27 | 0.74 | 0.86 | 0.5 | 0.88 |
| Airborne images and yvNDVI | 0.14 | 0.78 | 0.62 | 0.8 | 0.94 |
| LiDAR, airborne images and yvNDVI | 0.14 | 0.79 | 0.71 | 0.7 | 1 |

Even if the performance varies according to the dataset partition, the important descriptors remain quite similar between the two partitions, as displayed in Figure 17.A and Figure 17.B. The yearly difference of NDVI between 2018 and 2019 ( NDVI_diff_1918 ) is the most important descriptor; the standard deviation of the blue band ( b_std ) and the means of the NIR band and of the NDVI ( nir_mean and ndvi_mean ) stand out in both cases; from the LiDAR, the standard deviations of the canopy cover ( sdcc ) and of the LiDAR reflectance ( i_sd_seg ) are the most important descriptors. The importance magnitudes explain the better performance on partition B with the airborne-based model: for instance, b_std has a magnitude of 7.6 with partition B instead of 4.6. Figure 17.A: Important descriptors for the full model, dataset partition A. Figure 17.B: Important descriptors for the full model, dataset partition B. The most important descriptor of the full model turned out to be the yearly variation of NDVI between 2018 and 2019. 2018 was a year with a dry and hot summer, which stressed beech trees and probably contributed to forest damage 1 . This corroborates the ability of our RF method to monitor the response of trees to extreme drought events.
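The importance measure used here (the increase in prediction error when a descriptor's values are randomly reallocated) can be sketched generically. A minimal sketch with toy assumptions: the two descriptors and the threshold classifier below are illustrative stand-ins, not the project's actual random forest.

```python
import numpy as np

rng = np.random.default_rng(7)

def permutation_importance(predict, X, y, n_repeats=10):
    """Importance of each descriptor = mean increase in error rate
    when that column's values are randomly reallocated (shuffled)."""
    base_err = np.mean(predict(X) != y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errs.append(np.mean(predict(X_perm) != y))
        importances[j] = np.mean(errs) - base_err
    return importances

# Toy example: descriptor 0 drives the class, descriptor 1 is pure noise
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)  # stand-in classifier

imp = permutation_importance(predict, X, y)
```

Shuffling the informative column degrades the predictions, while shuffling the noise column leaves them unchanged, so `imp[0]` dominates `imp[1]` — the same logic behind the rankings in Figure 17.A/B.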
However, the 10 m resolution of the index and the varying adaptability of individual beech trees to drought may weaken the relationship between the current health status and the index. This may explain why the presence of this descriptor in the full model does not offer better performance than the airborne-based model in predicting the health state. Both the mean of the NIR band and the standard deviation of the blue band play an important role. The statistical study in Section 5.2.3 indicated that the models might confuse the healthy and unhealthy classes. On the one hand, airborne imagery only sees the top of the crown and may miss useful information on hidden parts. On the other hand, airborne imagery has a good ability to detect dead trees thanks to their different reflectance values in the NIR and blue bands. One argument that could explain the lower performance of the model based on LiDAR-based descriptors is the difficulty of finding the right scale for the analysis, as beech trees can show a wide range of crown diameters.

5.4.2 Ground truth analysis ¶ By progressively removing individual samples from the training set, the impact of individual beech trees on the performance was further analyzed. The performance variation is shown in Figure 18. The performance is rather stable in the sense that the sensitivities stay in a range of values similar to the initial one up to 40 samples removed, but with each removal, a slight instability in the metrics is visible. The size of the peaks indicates variations of 1 prediction for the dead class, but of up to 6 predictions for the unhealthy class and up to 7 for the healthy class. During the sample removal, some samples were always predicted correctly, whereas others were often misclassified, leading to the peaks in Figure 18. With the large number of descriptors in the full model, there is no straightforward outlier profile to identify. Figure 18: Evolution of the per-class sensitivity with removal of samples.
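The removal experiment can be scripted as a loop that drops training samples in a fixed random order, refits, and tracks per-class sensitivity on an unchanged test set. A minimal sketch under toy assumptions (synthetic descriptors and a nearest-centroid stand-in for the project's random forest):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the descriptor table: 3 health classes, 2 descriptors
X = np.vstack([rng.normal(m, 1.0, (40, 2)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 40)  # 0 healthy, 1 unhealthy, 2 dead

def predict_nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier (the project trains a random forest)."""
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1, 2)])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def sensitivity(y_true, y_pred, cls):
    mask = y_true == cls
    return float(np.mean(y_pred[mask] == cls))

# 50/50 split, then remove training samples in a random order and re-evaluate
X_tr, y_tr = X[::2], y[::2]
X_te, y_te = X[1::2], y[1::2]
order = rng.permutation(len(y_tr))
curves = []
for n_removed in range(0, 16, 5):  # 0, 5, 10, 15 samples removed
    keep = order[n_removed:]
    pred = predict_nearest_centroid(X_tr[keep], y_tr[keep], X_te)
    curves.append([sensitivity(y_te, pred, c) for c in (0, 1, 2)])
```

Plotting `curves` against the number of removed samples gives a figure of the same kind as Figure 18: a stable band with small per-removal jitter.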
In addition, the subsampling of the training set in Table 6 shows that the OA varies by at most 3 % depending on the subset used. This again indicates that the amount of ground truth allows a stable OA range to be reached, but that the characteristics of the samples do not allow a stable OA value. The sensitivity for the dead class is stable, whereas the sensitivities for the healthy and unhealthy classes vary.

Table 6: Performance according to different random seeds for the creation of the training subset.

| Training set subpartition | cFPR | OA | Sensitivity healthy | Sensitivity unhealthy | Sensitivity dead |
| --- | --- | --- | --- | --- | --- |
| Random seed = 2 | 0.13 | 0.76 | 0.61 | 0.83 | 0.88 |
| Random seed = 22 | 0.15 | 0.78 | 0.70 | 0.78 | 0.88 |
| Random seed = 222 | 0.18 | 0.75 | 0.65 | 0.74 | 0.88 |
| Random seed = 2222 | 0.13 | 0.76 | 0.65 | 0.78 | 0.88 |
| Random seed = 22222 | 0.10 | 0.78 | 0.65 | 0.83 | 0.88 |

5.4.3 Predictions ¶ The full model and the airborne-based model were used to infer the health state of trees in the study area (Fig. 19). As indicated in Table 7, with the full model, 35.1 % of the segments were predicted as healthy, 53 % as unhealthy and 11.9 % as dead. With the airborne-based model, 42.6 % of the segments were predicted as healthy, 46.2 % as unhealthy and 11.2 % as dead. The two models agree on 74.3 % of the predictions. Within the 25.6 % of diverging predictions, about 77.1 % are disagreements between healthy and unhealthy predictions. Finally, 1.5 % are critical disagreements (between the healthy and dead classes).

Table 7: Percentage of health states in the AOI.

| Model | Healthy [%] | Unhealthy [%] | Dead [%] |
| --- | --- | --- | --- |
| Full | 35.1 | 53 | 11.9 |
| Airborne-based | 42.6 | 46.2 | 11.2 |

Control by forestry experts reported that the predictions mostly correspond to the field situation and that a weak vote fraction often corresponds to false predictions. They confirmed that the map delivers useful information to help plan beech tree felling. The final model retained after the field visit is the full model. Figure 19: Extract of the predicted thematic health map.
Green is for healthy, yellow for unhealthy, and red for dead trees. Hues indicate the RF fraction of votes. The predictions can be compared with the true orthophoto in the background. The polygons approximating the tree crowns correspond to the delimitation of the segmented trees.

5.4.4 Downgraded data ¶ Finally, random forest models were trained and tested on downgraded data with partition A of the ground truth, for all descriptors together and by descriptor source. With this partition, the RF has a better cFPR for the full model (0.08 instead of 0.13), the airborne-based model (0.08 instead of 0.21) and the LiDAR-based model (0.28 instead of 0.31). The OA is also better (full model: 0.84 instead of 0.76, airborne-based model: 0.77 instead of 0.63), except in the case of the LiDAR-based model (0.63 instead of 0.66). This indicates that a resolution of 10 cm in the aerial imagery does not weaken the model and can even improve it. For the LiDAR point cloud, a reduction of the density by a factor of 5 did not change the performance much.

Table 7.A: Performance for RF trained and tested with partition A of the dataset on downgraded data.

| Simulation | cFPR | OA |
| --- | --- | --- |
| Full | 0.08 | 0.84 |
| Airborne-based | 0.08 | 0.77 |
| LiDAR-based | 0.28 | 0.63 |

Table 7.B: Performance for RF trained and tested with partition A of the dataset on original data.

| Simulation | cFPR | OA |
| --- | --- | --- |
| Full | 0.13 | 0.76 |
| Airborne-based | 0.21 | 0.63 |
| LiDAR-based | 0.31 | 0.66 |

When the important descriptors are compared between the original and downgraded models, one notices that the airborne descriptors gained in importance in the full model when the data are downgraded. The downgraded model showed sufficient accuracy for the objective of the project.

6 Conclusion and outlook ¶ The study has demonstrated the ability of a random forest algorithm to learn from structural descriptors derived from LiDAR point clouds and from vegetation reflectance in airborne and satellite images to predict the health state of beech trees.
Depending on the datasets used for training and testing, the optimized full model including all descriptors reached an OA of 0.76 or 0.79, with corresponding cFPR values of 0.13 and 0.14 respectively. These metrics are sufficient for the purpose of prioritizing beech tree felling. The produced map, with the predicted health state and the corresponding votes for the segments, delivers useful information for forest management. The cantonal foresters validated the outcomes of this proof of concept and explained how the location of affected beech trees, as individuals or as groups, is used to target high-priority areas. The full model highlighted the importance of the yearly variation of NDVI between a drought year (2018) and a normal year (2019). The airborne imagery showed a good ability to predict dead trees, whereas confusion remained between healthy and unhealthy trees. The quality of the LiDAR point cloud segmentation may explain the limited performance of the LiDAR-based model. Finally, the model trained and tested on downgraded data gave an OA of 0.84 and a cFPR of 0.08. In this model, the airborne-based descriptors gained in importance. It was concluded that a 10 cm resolution may help the model by reducing the noise in the image. Avenues for improving the results include making the ground truth more representative of the symptoms observed in the field and continuing research into descriptors for differentiating between healthy and unhealthy trees: For the image processing, suggestions are the integration of more statistics, like the skewness and kurtosis of the reflectance, as in Junttila et al. (2022) 15 . LiDAR-based descriptors had a limited impact on the final results. To make better use of them for an application on beech trees, further research would be needed.
Besides producing a cleaner segmentation and finding additional descriptors, this could consist in mixing the descriptors at different resolutions and, with the help of the importance analysis, estimating at which resolution each descriptor brings the most information to the classification. The results showed the important contribution of vegetation indices derived from satellite imagery reflecting the drought year of 2018. If available, using historical image data of higher resolution to derive more descriptors could help improve individual tree health assessment. Leaving further developments aside, the challenge is now the extension of the methodology to a larger area. Simultaneous data acquisition is necessary for an accurate analysis. It has been shown that the representativeness of the ground truth has to be improved to obtain better and more stable results. Thus, for an extension to further areas, we recommend collecting additional ground truth measurements. The health state of the trees showed some spatial autocorrelation that could have boosted our results and made them less representative of the whole forest; the samples should be more scattered across the forest. Furthermore, the required data are a true orthophoto and a LiDAR point cloud for the per-tree analysis. It should be possible to use an older LiDAR acquisition to produce a CHM and to forgo the LiDAR-based descriptors without degrading the performance of the model too much.

7 Appendices ¶ 7.1 Simulation plan for DFT parameter tuning ¶ Table 8: Parameter tuning for DFT.
| CHM cell size [m] | Maxima smoothing | Local maxima within search radius |
| --- | --- | --- |
| 0.50 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 0.50 | 0.3 | (1.7425 * h^0.5566)/2 |
| 0.50 | 0.5 | (1.2 + 0.16 * h)/2 |
| 1.00 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 1.00 | 0.3 | (1.7425 * h^0.5566)/2 |
| 1.00 | 0.5 | (1.2 + 0.16 * h)/2 |
| 1.50 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 1.50 | 0.3 | (1.7425 * h^0.5566)/2 |
| 1.50 | 0.5 | (1.2 + 0.16 * h)/2 |
| 2.00 | 0.1 | (3.09632 + 0.00895 * h^2)/2 |
| 2.00 | 0.3 | (1.7425 * h^0.5566)/2 |
| 2.00 | 0.5 | (1.2 + 0.16 * h)/2 |

7.2 t-tests ¶ t-tests were computed to evaluate the ability of descriptors to differentiate the three health states. A p-value below 0.01 indicates that there is a significant difference between the means of two classes.

7.2.1 t-tests on LiDAR-based descriptors at 10 m ¶ Table 9: t-tests on LiDAR-based descriptors at 10 m.

| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 0.002 | 1.12E-11 | 3.23E-04 |
| scale parameter | 0.005 | 0.014 | 0.964 |
| shape parameter | 0.037 | 0.002 | 0.269 |
| cvLAD | 0.001 | 2.22E-04 | 0.353 |
| VCI | 0.426 | 0.094 | 0.358 |
| mean reflectance | 4.13E-05 | 0.002 | 0.164 |
| sd of reflectance | 0.612 | 3.33E-06 | 9.21E-05 |
| canopy cover | 0.009 | 0.069 | 0.340 |
| sdCC | 0.002 | 0.056 | 0.324 |
| sdCHM | 0.316 | 0.262 | 0.892 |
| AGH | 0.569 | 0.055 | 0.120 |

7.2.2 t-tests on LiDAR-based descriptors at 5 m ¶ Table 10: t-tests on LiDAR-based descriptors at 5 m.

| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 0.001 | 4.67E-12 | 1.73E-04 |
| scale parameter | 0.072 | 0.831 | 0.204 |
| shape parameter | 0.142 | 0.654 | 0.361 |
| cvLAD | 9.14E-06 | 3.22E-05 | 0.667 |
| VCI | 0.006 | 0.104 | 0.485 |
| mean reflectance | 6.60E-05 | 2.10E-06 | 0.249 |
| sd of reflectance | 0.862 | 2.26E-08 | 9.24E-08 |
| canopy cover | 0.288 | 0.001 | 0.003 |
| sdCC | 1.42E-05 | 1.94E-11 | 0.001 |
| sdCHM | 0.004 | 1.94E-08 | 0.002 |
| AGH | 0.783 | 0.071 | 0.095 |

7.2.3 t-tests on LiDAR-based descriptors at 2.5 m ¶ Table 11: t-tests on LiDAR-based descriptors at 2.5 m.

| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 3.76E-04 | 7.28E-11 | 4.80E-04 |
| scale parameter | 0.449 | 0.283 | 5.60E-01 |
| shape parameter | 0.229 | 0.087 | 0.462 |
| cvLAD | 3.59E-04 | 1.06E-07 | 0.012 |
| VCI | 0.004 | 1.99E-05 | 0.072 |
| mean reflectance | 3.15E-04 | 5.27E-07 | 0.068 |
| sd of reflectance | 0.498 | 1.10E-10 | 4.66E-11 |
| canopy cover | 0.431 | 0.004 | 0.019 |
| sdCC | 0.014 | 1.94E-13 | 6.94E-09 |
| sdCHM | 0.003 | 5.56E-07 | 0.006 |
| AGH | 0.910 | 0.132 | 0.132 |

7.2.4 t-tests on LiDAR-based descriptors per tree ¶ Table 12: t-tests on LiDAR-based descriptors per tree.

| Descriptors | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| maximal height | 0.001 | 1.98E-11 | 2.61E-04 |
| scale parameter | 0.726 | 0.618 | 0.413 |
| shape parameter | 0.739 | 0.795 | 0.564 |
| cvLAD | 0.001 | 4.23E-04 | 0.526 |
| VCI | 0.145 | 0.312 | 0.763 |
| mean reflectance | 1.19E-04 | 0.001 | 0.949 |
| sd of reflectance | 0.674 | 3.70E-07 | 4.79E-07 |
| canopy cover | 0.431 | 0.005 | 0.023 |
| sdCC | 0.014 | 4.43E-13 | 1.10E-08 |
| sdCHM | 0.003 | 2.71E-07 | 0.004 |
| AGH | 0.910 | 0.090 | 0.087 |

7.2.5 t-tests on yearly variation of NDVI ¶ Table 13: t-tests on yearly variation of NDVI.

| Year | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| 2016 | 0.177 | 0.441 | 0.037 |
| 2017 | 0.079 | 2.20E-06 | 0.004 |
| 2018 | 0.093 | 1.57E-04 | 0.132 |
| 2019 | 0.003 | 0.001 | 0.816 |
| 2020 | 0.536 | 0.041 | 0.005 |
| 2021 | 0.002 | 0.894 | 0.003 |
| 2022 | 0.131 | 0.103 | 0.002 |

7.2.6 t-tests on VHI ¶ Table 14: t-tests on VHI.

| Period | healthy vs. unhealthy | healthy vs. dead | unhealthy vs. dead |
| --- | --- | --- | --- |
| 2015-2016 | 0.402 | 0.572 | 0.767 |
| 2016-2017 | 0.005 | 0.002 | 0.885 |
| 2017-2018 | 0.769 | 0.329 | 0.505 |
| 2018-2019 | 2.64E-05 | 3.98E-14 | 0.001 |
| 2019-2020 | 7.86E-06 | 9.55E-05 | 0.427 |
| 2020-2021 | 0.028 | 0.790 | 0.018 |
| 2021-2022 | 0.218 | 0.001 | 0.080 |

8 Sources and references ¶ Indications on software and hardware requirements, as well as the code used to perform the project, are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-hetres/tree/main. Other sources of information mentioned in this documentation are listed here:

OFEV et al. (éd.). La canicule et la sécheresse de l'été 2018. Impacts sur l'homme et l'environnement.
Technical Report 1909, Office fédéral de l'environnement, Berne, 2019.

Benoît Grandclement and Daniel Bachmann. 19h30 - En Suisse, la sécheresse qui sévit depuis plusieurs semaines frappe durement les arbres - Play RTS. February 2023. URL: https://www.rts.ch/play/tv/19h30/video/en-suisse-la-secheresse-qui-sevit-depuis-plusieurs-semaines-frappe-durement-les-arbres?urn=urn:rts:video:13829524 (visited on 2023-03-28).

Xavier Gauquelin, editor. Guide de gestion des forêts en crise sanitaire. Office National des Forêts, Institut pour le Développement Forestier, Paris, 2010. ISBN 978-2-84207-344-2.

Philipp Brun, Achilleas Psomas, Christian Ginzler, Wilfried Thuiller, Massimiliano Zappa, and Niklaus E. Zimmermann. Large-scale early-wilting response of Central European forests to the 2018 extreme drought. Global Change Biology, 26(12):7021–7035, 2020. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.15360 (visited on 2022-10-13), doi:10.1111/gcb.15360.

Run Yu, Youqing Luo, Quan Zhou, Xudong Zhang, Dewei Wu, and Lili Ren. A machine learning algorithm to detect pine wilt disease using UAV-based hyperspectral imagery and LiDAR data at the tree level. International Journal of Applied Earth Observation and Geoinformation, 101:102363, September 2021. URL: https://www.sciencedirect.com/science/article/pii/S0303243421000702 (visited on 2022-10-13), doi:10.1016/j.jag.2021.102363.

Pengyu Meng, Hong Wang, Shuhong Qin, Xiuneng Li, Zhenglin Song, Yicong Wang, Yi Yang, and Jay Gao. Health assessment of plantations based on LiDAR canopy spatial structure parameters. International Journal of Digital Earth, 15(1):712–729, December 2022. URL: https://www.tandfonline.com/doi/full/10.1080/17538947.2022.2059114 (visited on 2022-12-07), doi:10.1080/17538947.2022.2059114.
Patrice Eschmann, Pascal Kohler, Vincent Brahier, and Joël Theubet. La forêt jurassienne en chiffres, Résultats et interprétation de l'inventaire forestier cantonal 2003 - 2005. Technical Report, République et Canton du Jura, St-Ursanne, 2006. URL: https://www.jura.ch/Htdocs/Files/Departements/DEE/ENV/FOR/Documents/pdf/rapportinventfor0305.pdf.

Agnieszka Ptak. Amplitude vs Reflectance | LinkedIn. June 2020. URL: https://www.linkedin.com/pulse/amplitude-vs-reflectance-agnieszka-ptak/ (visited on 2023-08-11).

BFH-HAFL and BAFU. Waldmonitoring.ch: wcs_ndvi_diff_2016_2015, wcs_ndvi_diff_2017_2016, wcs_ndvi_diff_2018_2017, wcs_ndvi_diff_2019_2018, wcs_ndvi_diff_2020_2019, wcs_ndvi_diff_2021_2020, wcs_ndvi_diff_2022_2021. URL: https://geoserver.karten-werk.ch/wfs?request=GetCapabilities.

Reik Leiterer, Gillian Milani, Jan Dirk Wegner, and Christian Ginzler. ExoSilva - ein Multi-Sensor-Ansatz für ein räumlich und zeitlich hochaufgelöstes Monitoring des Waldzustandes. In Neue Fernerkundungstechnologien für die Umweltforschung und Praxis, 17–22. Swiss Federal Institute for Forest, Snow and Landscape Research WSL, April 2023. URL: https://www.dora.lib4ri.ch/wsl/islandora/object/wsl%3A33057 (visited on 2023-11-13), doi:10.55419/wsl:33057.

Matthew Parkan. Mparkan/Digital-Forestry-Toolbox: Initial release. April 2018. URL: https://zenodo.org/record/1213013 (visited on 2023-08-11), doi:10.5281/ZENODO.1213013.

R Core Team. R: A Language and Environment for Statistical Computing. 2023. URL: https://www.R-project.org/.

Olga Brovkina, Emil Cienciala, Peter Surový, and Přemysl Janata.
Unmanned aerial vehicles (UAV) for assessment of qualitative classification of Norway spruce in temperate forest stands. Geo-spatial Information Science, 21(1):12–20, January 2018. URL: https://www.tandfonline.com/doi/full/10.1080/10095020.2017.1416994 (visited on 2022-07-15), doi:10.1080/10095020.2017.1416994.

N.K. Gogoi, Bipul Deka, and L.C. Bora. Remote sensing and its use in detection and monitoring plant diseases: A review. Agricultural Reviews, December 2018. doi:10.18805/ag.R-1835.

Samuli Junttila, Roope Näsi, Niko Koivumäki, Mohammad Imangholiloo, Ninni Saarinen, Juha Raisio, Markus Holopainen, Hannu Hyyppä, Juha Hyyppä, Päivi Lyytikäinen-Saarenmaa, Mikko Vastaranta, and Eija Honkavaara. Multispectral Imagery Provides Benefits for Mapping Spruce Tree Decline Due to Bark Beetle Infestation When Acquired Late in the Season. Remote Sensing, 14(4):909, February 2022. URL: https://www.mdpi.com/2072-4292/14/4/909 (visited on 2023-10-27), doi:10.3390/rs14040909.

Dieback of beech trees: methodology for determining the health state of beech trees from airborne images and LiDAR point clouds ¶

Clotilde Marmy (ExoLabs) - Gwenaëlle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo)

Proposed by the Canton of Jura - PROJ-HETRES

October 2022 to August 2023 - Published on November 13, 2023

All scripts are available on GitHub.

Abstract: Beech trees are sensitive to drought, and repeated episodes can cause dieback. This issue affects the Jura forests, requiring the development of new tools for forest management.
In this project, descriptors for the health state of beech trees were derived from LiDAR point clouds, airborne images and satellite images to train a random forest predicting the health state per tree in a study area (5 km²) in Ajoie. A map with three classes was produced: healthy, unhealthy, dead. Metrics computed on the test dataset revealed that the model trained with all the descriptors has an overall accuracy of up to 0.79, as does the model trained only with descriptors derived from airborne imagery. When all the descriptors are used, the yearly difference of NDVI between 2018 and 2019, the standard deviation of the blue band, the mean of the NIR band, the mean of the NDVI, the standard deviation of the canopy cover and the LiDAR reflectance appear to be important descriptors.

1. Introduction ¶ Since the drought episode of 2018, the canton of Jura and other cantons have noticed dieback of the beech trees in their forests 1 . In the canton of Jura, this problem mainly concerns the Ajoie region, where 1000 hectares of deciduous trees are affected 2 . This is of concern for the productivity and management of the forest, as well as for the safety of walkers. In this context, the République et Canton du Jura contacted the Swiss Territorial Data Lab to develop a new monitoring solution based on data science, airborne images and LiDAR point clouds. The dieback symptoms are observable in the mortality of branches, the transparency of the tree crown and the leaf mass partition 3 .
The vegetation health state influences the reflectance in images (airborne and satellite), which is often used as a monitoring tool, in particular in the form of vegetation indices:

- Normalized Difference Vegetation Index (NDVI), a combination of the near-infrared and red bands quantifying vegetation health;
- Vegetation Health Index (VHI), an index quantifying the decrease or increase of vegetation in comparison to a reference state.

For instance, Brun et al. studied early wilting in Central European forests with time series of the NDVI and estimated the surface area concerned by early leaf-shedding 4 . Another technology used to monitor forests is light detection and ranging (LiDAR), as it penetrates the canopy and gives 3D information on trees and forest structures. Several forest and tree descriptors, such as the canopy cover 5 or the standard deviation of crown return intensity 6 , can be derived from the LiDAR point cloud to monitor the vegetation health state. In 5 , the study was conducted at the tree level, whereas in 6 the stand level was studied. To work at the tree level, it is necessary to segment individual trees in the LiDAR point cloud. In complex forests, e.g. with a dense understory near tree stems, it is challenging to get correct segments without manual corrections. The aim of this project is to provide foresters with a map to help plan the felling of beech trees in the Ajoie forests. To do so, we developed a combined method using LiDAR point clouds and airborne and satellite multispectral images to determine the health state of beech trees.

2. Study area ¶ The study was conducted in two areas of interest in the Ajoie region (Fig. 1.A); one near Miécourt (Fig. 1.B), the other one near Beurnevésin (Fig. 1.C). Altogether they cover 5 km², 1.4 % of the Canton of Jura's forests 7 .
The Miécourt sub-area is south-west and south oriented, whereas the Beurnevésin sub-area is rather south-east and south oriented. They are in the same altitude range (600-700 m) and are 2 km away from each other, thus near the same weather station. Figure 1: The study area is composed of two areas of interest.

3. Data ¶ The project makes use of different data types: LiDAR point cloud, airborne and satellite imagery, and ground truth data. Table 1 gives an overview of the data and their characteristics. Data were acquired in late summer 2022 to have up-to-date and temporally correlated information on the health state of beech trees.

Table 1: Overview of the data used in the project.

| Data | Resolution | Acquisition time | Owner |
| --- | --- | --- | --- |
| LiDAR | 50-100 pts/m² | 08.2022 | République et Canton du Jura |
| Airborne images | 0.03 m | 08.2022 | République et Canton du Jura |
| Yearly variation of NDVI | 10 m | 06.2015-08.2022 | Bern University of Applied Sciences (HAFL) and the Federal Office for the Environment (BAFU) |
| Weekly vegetation health index | 10 m | 06.2015-08.2022 | ExoLabs |
| Ground truth | - (point data) | 08.-10.2022 | République et Canton du Jura |

3.1 LiDAR point cloud ¶ The LiDAR dataset was acquired on the 16th of August 2022 and its point density is 50-100 pts/m². It is classified into the following classes: ground, low vegetation (2-10 m), middle vegetation (10-20 m) and high vegetation (20 m and above). It was delivered in the LAS format and had reflectance values 8 in the intensity storage field.

3.2 Airborne images ¶ The airborne images have a ground resolution of 3 cm and were acquired simultaneously with the LiDAR dataset. The camera captured the RGB bands, as well as the near-infrared (NIR) one.
The acquisition of images with a lot of overlap and oblique views allowed the production of a true orthoimage for a perfect match with the LiDAR point cloud and the ground truth data.

3.3 Satellite images ¶ The Sentinel-2 mission of the European Space Agency passes every 6 days over Switzerland and allows free temporal monitoring at a 10 m resolution. The archives are available back to the beginning of the beech tree dieback in 2018.

3.4 Ground truth ¶ The ground truth was collected between August and October 2022 by foresters. They assessed the health of the beech trees based on four criteria 3 :

- mortality of branches;
- transparency of the tree crown;
- leaf mass partition;
- trunk condition and other health aspects.

In addition, each tree was associated with its coordinates and pictures, as illustrated in Figure 1 and Figure 2 respectively. The foresters surveyed 75 healthy, 77 unhealthy and 56 dead trees. Tree locations were first identified in the field with a GPS-enabled tablet on which the 2022 SWISSIMAGE mosaic was displayed. Afterwards, the tree locations were precisely adjusted to the trunk locations by visually locating the corresponding stems in the LiDAR point cloud with the help of the pictures taken in the field. The locations and health status of a further 18 beech trees were added in July 2023. These 226 beeches surveyed at the two dates - among which 76 healthy, 77 unhealthy and 73 dead trees - constitute the ground truth of this project. Figure 2: Examples of the three health states: left, a healthy tree with a dense green tree crown; center, an unhealthy tree with dead twigs and scarce foliage; right, a dead tree, completely dry.

4. Method ¶ The method developed is based on the processing of LiDAR point clouds and of airborne images.
Ready-made vegetation indices derived from satellite imagery were also used. First, a segmentation of the trees in the LiDAR point cloud was carried out using the Digital-Forestry-Toolbox (DFT) 11 . Then, descriptors for the health state of the beech trees were derived from each dataset. Boxplots and corresponding t-tests were computed to evaluate the ability of descriptors to differentiate the three health states. A p-value below 0.01 indicates that there is a significant difference between the means of two classes. Finally, the descriptors were used jointly with the ground truth to train a random forest (RF) algorithm, before inferring for the study area. Figure 3: Overview of the methodology, which processes the data into health descriptors for beech trees, before training and evaluating a random forest.

4.1 LiDAR processing ¶ At the beginning of the LiDAR processing, exploration of the data motivated the segmentation and descriptor computation.

4.2 Image processing ¶ For the image processing, an initial step was to compute the normalized difference vegetation index (NDVI) for each raster image, an index commonly used for the estimation of the health state of vegetation 5 13 14 :

\[ NDVI = \frac{NIR - R}{NIR + R} \]

where NIR and R are the values of the pixel in the near-infrared and red bands respectively. To uncover potential distinctive features between the classes, boxplots and principal component analysis were used on the images' four bands (RGB-NIR) and the NDVI. Firstly, we tested if the raw pixel values allowed the distinction between classes at the pixel level. This method avoids the pitfall of segmenting the forest into trees. Secondly, we tested the same method, but with a low-pass filter to reduce the noise in the data.
Thirdly, we tried to find distinct statistical features at the tree level. This approach allows decreasing the noise that can be present in high-resolution information. However, it necessitates a reasonably good segmentation of the trees. Finally, color filtering and edge detection were tested in order to highlight and extract the linear structure of the branches. Each treatment can be done with or without a mask on the tree height. As only trees between 20 m and 40 m tall are affected by dieback, a mask based on the Canopy Height Model (CHM) raster derived from the LiDAR point cloud was tested. Figure 4: Overview of different possible data treatments for the statistical analysis.

4.3 Satellite-based indices ¶ The yearly variation of NDVI and the VHI were used to take account of the historical variations of NDVI from 2015 to 2022. For the VHI, the mean for each year is computed over the months considered for the yearly variation of NDVI. The pertinence of using these indices was explored: the values for each tree in the ground truth were extracted and observed in boxplots per 2022 health class and per year pair over the time span from 2015 to 2022.

4.4 Random Forest ¶ In R 12 , the caret and randomForest packages were used to train the random forest and make predictions. First, the ground truth was split into training and test datasets, with each class being split 70 % into the training set and 30 % into the test set. Health classes with too few samples were completed with copies. Optimization of the RF was performed on the number of trees to grow and on the number of randomly sampled descriptors to test at each split. In addition, 5-fold cross-validation was used to ensure the use of different parts of the dataset.
The search parameter space was from 100 to 1000 decision trees and from 4 to 10 descriptors, as the default value is the square root of the number of descriptors, i.e. 7. RF was assessed using a custom metric, which is an adaptation of the false positive rate for the healthy class. It minimizes the number of false healthy detections and of dead trees predicted as unhealthy (false unhealthy). It is called custom false positive rate (cFPR) in the text. It was preferred to have a model predicting more unhealthy trees to check in the field than a model missing unhealthy or dead trees. The cFPR goes from 0 (best) to 1 (worst). Table 2: Confusion matrix for the three health classes. Ground truth Healthy Unhealthy Dead Prediction Healthy A B C Unhealthy D E F Dead G H I According to the confusion matrix in Table 2, the cFPR is computed as follows: \[\begin{align} cFPR = {(B+C+F) \over (B+C+E+F+H+I)}. \end{align}\] In addition, the overall accuracy (OA), i.e. the ratio of correct predictions over all the predictions, and the sensitivity, which is, per class, the number of correct predictions divided by the number of samples from that class, are used. An ablation study was performed on descriptors to assess the contribution of the different data sources to the final performance. An “important” descriptor is one having a strong influence on the increase in prediction errors when its values are randomly reallocated in the training set. After the optimization, predictions for each DFT segment were computed using the best model according to the cFPR. The inferences were delivered as a thematic map with colors indicating the health state and hue indicating the fraction of decision trees in the RF having voted for the class (vote fraction). The purpose is to give confidence information, with a high vote fraction indicating robust predictions. 
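As a minimal sketch (not the project's R code), the cFPR of Table 2 can be computed from a 3x3 confusion matrix as follows; the function name and matrix layout are assumptions matching the table above:

```python
import numpy as np

def custom_fpr(cm):
    """Custom false positive rate (cFPR) from a 3x3 confusion matrix.

    Rows are the predicted class and columns the ground truth, in the order
    (healthy, unhealthy, dead), so the entries read A..I row by row as in
    Table 2. The cFPR counts false healthy predictions (B, C) and dead trees
    predicted unhealthy (F) against all unhealthy or dead ground-truth
    samples (B + C + E + F + H + I).
    """
    cm = np.asarray(cm, dtype=float)
    B, C = cm[0, 1], cm[0, 2]
    E, F = cm[1, 1], cm[1, 2]
    H, I = cm[2, 1], cm[2, 2]
    return (B + C + F) / (B + C + E + F + H + I)

# A perfect classifier gives cFPR = 0
print(custom_fpr([[10, 0, 0], [0, 8, 0], [0, 0, 5]]))  # 0.0
```

Note that the denominator only counts ground-truth unhealthy and dead samples, so a model that over-predicts the unhealthy class is not penalized by this metric, which is the intended behavior.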
Furthermore, the ground truth was evaluated for quantity and quality by two means: (1) removal of samples and assessment of its impact on the metric evaluation; (2) splitting the training set into training subsets to evaluate on the original test set. Finally, after having developed the descriptors and the routine on high-quality data, we downgraded them to resolutions similar to the ones of the swisstopo products (LiDAR: 20 pt/m 2 , orthoimage: 10 cm) and performed the optimization and prediction steps again. Indeed, the data acquisition was especially commissioned for this project and only covers the study area. If the method is extended in the future, one would like to test whether a lower resolution, such as that of the standard nation-wide product SWISSIMAGE, could be sufficient.","title":"4.4 Random Forest"},{"location":"PROJ-HETRES/#5-results-and-discussion","text":"In this section, the results obtained during the processing of each data source into descriptors are presented and discussed, followed by a section on the random forest results.","title":"5 Results and discussion"},{"location":"PROJ-HETRES/#51-lidar-processing","text":"For the LiDAR data, the reader will first discover the appearance of beech trees in the LiDAR point cloud according to their health state as studied in the data exploration. Then, the segmentation results and the obtained LiDAR-based descriptors will be presented.","title":"5.1 LiDAR processing"},{"location":"PROJ-HETRES/#52-image-processing","text":"Boxplots and PCA are given to illustrate the results of the image processing exploration. As the masking of pixels below and above the affected height made no difference in the interpretation of the results, they are presented here with the height mask.","title":"5.2 Image processing"},{"location":"PROJ-HETRES/#53-vegetation-indices-from-satellite-imagery","text":"The t-tests used to evaluate the ability of satellite indices to differentiate between health states are given in Appendices 6 and 7 . 
In the following two subsections, only the tested groups showing significant differences are mentioned, to help understand the RF performance.","title":"5.3 Vegetation indices from satellite imagery"},{"location":"PROJ-HETRES/#54-random-forest","text":"The results of the RF that are presented and discussed are: (1) the optimization and ablation study, (2) the ground truth analysis, (3) the predictions for the AOI and (4) the performance with downgraded data.","title":"5.4 Random Forest"},{"location":"PROJ-HETRES/#6-conclusion-and-outlook","text":"The study has demonstrated the ability of a random forest algorithm to learn from structural descriptors derived from LiDAR point clouds and from vegetation reflectance in airborne and satellite images to predict the health state of beech trees. Depending on the datasets used for training and testing, the optimized full model including all descriptors reached an OA of 0.76 or of 0.79, with corresponding cFPR values of 0.13 and 0.14 respectively. These metrics are sufficient for the purpose of prioritizing beech tree felling. The produced map, with the predicted health state and the corresponding votes for the segments, delivers useful information for forest management. The cantonal foresters validated the outcomes of this proof-of-concept and explained how the locations of affected beech trees, as individuals or as groups, are used to target high-priority areas. The full model highlighted the importance of the yearly variation of NDVI between a drought year (2018) and a normal year (2019). The airborne imagery showed good ability to predict dead trees, whereas confusion remained between healthy and unhealthy trees. The quality of the LiDAR point cloud segmentation may explain the limited performance of the LiDAR-based model. Finally, the model trained and tested on downgraded data gave an OA of 0.84 and a cFPR of 0.08. In this model, the airborne-based descriptors gained in importance. 
It was concluded that a 10 cm resolution may help the model by reducing the noise in the image. Outlooks for improving results include improving the representativeness of the ground truth with respect to symptoms in the field and continuing research into descriptors for differentiating between healthy and unhealthy trees: For the image processing, suggestions are the integration of more statistics like the skewness and kurtosis of the reflectance as in Junttila et al. (2022) 15 . LiDAR-based descriptors had limited impact on the final results. To better leverage them for an application on beech trees, further research would be needed. Besides producing a cleaner segmentation and finding additional descriptors, it could consist of mixing the descriptors at the different resolutions and, with the help of the importance analysis, estimating at which resolution each descriptor brings the most information to the classification. The results showed the important contribution of vegetation indices derived from satellite imagery reflecting the drought year of 2018. If available, using historical image data of higher resolution to derive more descriptors could help improve individual tree health assessment. Possible further developments aside, the challenge is now the extension of the methodology to a larger area. The simultaneity of the data is necessary for an accurate analysis. It has been shown that the representativeness of the ground truth has to be improved to obtain better and more stable results. Thus, for an extension to further areas, we recommend collecting additional ground truth measurements. The health state of the trees showed some autocorrelation that could have boosted our results and made them less representative of the whole forest. The samples should be more scattered in the forest. Furthermore, the required data are a true orthophoto and a LiDAR point cloud for per-tree analysis. 
It should be possible to use an old LiDAR acquisition to produce a CHM and forgo LiDAR-based descriptors without degrading the performance of the model too much.","title":"6 Conclusion and outlook"},{"location":"PROJ-HETRES/#7-appendixes","text":"","title":"7 Appendixes"},{"location":"PROJ-HETRES/#71-simulation-plan-for-dft-parameter-tuning","text":"Table 8: Parameter tuning for DFT. CHM cell size [m] Maxima smoothing Local maxima within search radius 0.50 0.1 (3.09632 + 0.00895 * h^2)/2 0.3 (1.7425 * h^0.5566)/2 0.5 (1.2 + 0.16 * h)/2 1.00 0.1 (3.09632 + 0.00895 * h^2)/2 0.3 (1.7425 * h^0.5566)/2 0.5 (1.2 + 0.16 * h)/2 1.50 0.1 (3.09632 + 0.00895 * h^2)/2 0.3 (1.7425 * h^0.5566)/2 0.5 (1.2 + 0.16 * h)/2 2.00 0.1 (3.09632 + 0.00895 * h^2)/2 0.3 (1.7425 * h^0.5566)/2 0.5 (1.2 + 0.16 * h)/2","title":"7.1 Simulation plan for DFT parameter tuning"},{"location":"PROJ-HETRES/#72-t-tests","text":"t-tests were computed to evaluate the ability of descriptors to differentiate the three health states. A t-test p-value below 0.01 indicates that there is a significant difference between the means of two classes.","title":"7.2 t-tests"},{"location":"PROJ-HETRES/#8-sources-and-references","text":"Indications on software and hardware requirements, as well as the code used to perform the project, are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-hetres/tree/main. Other sources of information mentioned in this documentation are listed here: OFEV et al. (éd.). La canicule et la sécheresse de l'été 2018. Impacts sur l'homme et l'environnement. Technical Report 1909, Office fédéral de l'environnement, Berne, 2019. Benoît Grandclement and Daniel Bachmann. 19h30 - En Suisse, la sécheresse qui sévit depuis plusieurs semaines frappe durement les arbres - Play RTS. February 2023. 
URL: https://www.rts.ch/play/tv/19h30/video/en-suisse-la-secheresse-qui-sevit-depuis-plusieurs-semaines-frappe-durement-les-arbres?urn=urn:rts:video:13829524 (visited on 2023-03-28). Xavier Gauquelin, editor. Guide de gestion des forêts en crise sanitaire. Office National des Forêts, Institut pour le Développement Forestier, Paris, 2010. ISBN 978-2-84207-344-2. Philipp Brun, Achilleas Psomas, Christian Ginzler, Wilfried Thuiller, Massimiliano Zappa, and Niklaus E. Zimmermann. Large-scale early-wilting response of Central European forests to the 2018 extreme drought. Global Change Biology, 26(12):7021–7035, 2020. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.15360 (visited on 2022-10-13), doi:10.1111/gcb.15360. Run Yu, Youqing Luo, Quan Zhou, Xudong Zhang, Dewei Wu, and Lili Ren. A machine learning algorithm to detect pine wilt disease using UAV-based hyperspectral imagery and LiDAR data at the tree level. International Journal of Applied Earth Observation and Geoinformation, 101:102363, September 2021. URL: https://www.sciencedirect.com/science/article/pii/S0303243421000702 (visited on 2022-10-13), doi:10.1016/j.jag.2021.102363. Pengyu Meng, Hong Wang, Shuhong Qin, Xiuneng Li, Zhenglin Song, Yicong Wang, Yi Yang, and Jay Gao. Health assessment of plantations based on LiDAR canopy spatial structure parameters. International Journal of Digital Earth, 15(1):712–729, December 2022. URL: https://www.tandfonline.com/doi/full/10.1080/17538947.2022.2059114 (visited on 2022-12-07), doi:10.1080/17538947.2022.2059114. Patrice Eschmann, Pascal Kohler, Vincent Brahier, and Joël Theubet. La forêt jurassienne en chiffres, Résultats et interprétation de l'inventaire forestier cantonal 2003 - 2005. Technical Report, République et Canton du Jura, St-Ursanne, 2006. 
URL: https://www.jura.ch/Htdocs/Files/Departements/DEE/ENV/FOR/Documents/pdf/rapportinventfor0305.pdf?download=1. Agnieszka Ptak. Amplitude vs Reflectance | LinkedIn. June 2020. URL: https://www.linkedin.com/pulse/amplitude-vs-reflectance-agnieszka-ptak/ (visited on 2023-08-11). BFH-HAFL and BAFU. Waldmonitoring.ch: wcs_ndvi_diff_2016_2015, wcs_ndvi_diff_2017_2016, wcs_ndvi_diff_2018_2017, wcs_ndvi_diff_2019_2018, wcs_ndvi_diff_2020_2019, wcs_ndvi_diff_2021_2020, wcs_ndvi_diff_2022_2021. URL: https://geoserver.karten-werk.ch/wfs?request=GetCapabilities. Reik Leiterer, Gillian Milani, Jan Dirk Wegner, and Christian Ginzler. ExoSilva - ein Multi-Sensor-Ansatz für ein räumlich und zeitlich hochaufgelöstes Monitoring des Waldzustandes. In Neue Fernerkundungstechnologien für die Umweltforschung und Praxis, 17–22. Swiss Federal Institute for Forest, Snow and Landscape Research, WSL, April 2023. URL: https://www.dora.lib4ri.ch/wsl/islandora/object/wsl%3A33057 (visited on 2023-11-13), doi:10.55419/wsl:33057. Matthew Parkan. Mparkan/Digital-Forestry-Toolbox: Initial release. April 2018. URL: https://zenodo.org/record/1213013 (visited on 2023-08-11), doi:10.5281/ZENODO.1213013. R Core Team. R: A Language and Environment for Statistical Computing. 2023. URL: https://www.R-project.org/. Olga Brovkina, Emil Cienciala, Peter Surový, and Přemysl Janata. Unmanned aerial vehicles (UAV) for assessment of qualitative classification of Norway spruce in temperate forest stands. Geo-spatial Information Science, 21(1):12–20, January 2018. 
URL: https://www.tandfonline.com/doi/full/10.1080/10095020.2017.1416994 (visited on 2022-07-15), doi:10.1080/10095020.2017.1416994. N.K. Gogoi, Bipul Deka, and L.C. Bora. Remote sensing and its use in detection and monitoring plant diseases: A review. Agricultural Reviews, December 2018. doi:10.18805/ag.R-1835. Samuli Junttila, Roope Näsi, Niko Koivumäki, Mohammad Imangholiloo, Ninni Saarinen, Juha Raisio, Markus Holopainen, Hannu Hyyppä, Juha Hyyppä, Päivi Lyytikäinen-Saarenmaa, Mikko Vastaranta, and Eija Honkavaara. Multispectral Imagery Provides Benefits for Mapping Spruce Tree Decline Due to Bark Beetle Infestation When Acquired Late in the Season. Remote Sensing, 14(4):909, February 2022. URL: https://www.mdpi.com/2072-4292/14/4/909 (visited on 2023-10-27), doi:10.3390/rs14040909.","title":"8 Sources and references"},{"location":"PROJ-LANDSTATS/","text":"Using spatio-temporal neighbor data information to detect changes in land use and land cover ¶ Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo) Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS September 2022 to March 2023 - Published on April 2023 All scripts are available on GitHub . Abstract : From 2020 on, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point. The methodology focused on change detection, finding as many unchanged tiles as possible and missing as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. 
While deactivating the temporal information or using fewer neighbors decreases the balanced accuracy by 0.2%, deactivating the spatial information causes a 1% decrease. Furthermore, XGBoost, random forest (RF), fully convolutional network (FCN) and recurrent convolutional neural network (RCNN) performances are compared by means of a custom metric, established with the help of the interpretation team. For the spatial-temporal module, FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression reaches a custom metric of 0.249. Then, FCN and RF are tested to combine the best performing model with the model trained by OFS on image data only. When using temporal-spatial neighbors and image data as inputs, the final integration module achieves 0.438 on the custom metric, against 0.374 when only the image data is used. It was concluded that temporal-spatial neighbors could lighten the process of tile interpretation. 1. Introduction ¶ The introduction presents the background and the objectives of the project, but also introduces the input data and its specific features. 1.1 Background ¶ Since 1979, the Swiss Federal Statistical Office (FSO) has provided detailed and accurate information on the state and evolution of the land use and the land cover in Switzerland. It is a crucial tool for long-term spatial observation. With these statistics, it is possible to determine whether and to what extent changes in land cover and land use are consistent with the goals of Swiss spatial development policies ( FSO ). Figure 1: Visualization of the land cover and land use classification. Every few years, the FSO carries out a survey on aerial or satellite images all over Switzerland. A grid with sample points spaced 100 meters apart overlays the images, providing 4.1 million sample points on which the statistics are based. The classification of the hectare tile is assigned based on the center dot, as shown in Figure 1. 
Currently, a time series of four surveys is accessible, based on aerial images captured in the following years: 1979–1985 (1st survey, 1985) 1992–1997 (2nd survey, 1997) 2004–2009 (3rd survey, 2009) 2013–2018 (4th survey, 2018) The first two surveys of the land statistics in 1979 and 1992 were made by visual interpretation of analogue aerial photos using stereoscopes. Since the 2004 survey, the methodology has been deeply renewed, in particular through the use of digital aerial photographs, which are observed stereoscopically on workstations using specific photogrammetry software . A new nomenclature (NOAS04) was also introduced in 2004 which systematically distinguishes 46 land use categories and 27 land cover categories. A numerical label from this catalogue is assigned to each point by a team of trained interpreters. The 1979 and 1992 surveys have been revised according to the nomenclature NOAS04, so that all readings (1979, 1992, 2004, 2013) are comparable. On this page you will find the geodata of the Land Use Statistics at the hectare level since 1979, as well as documentation on the data and the methodology used to produce these data. Detailed information on basic categories and principal domains can be found in Appendix 1 . 1.2 Objectives ¶ It is known that manual interpretation work is time-consuming and expensive. However, in a feasibility study , machine learning techniques showed great potential to help speed up the interpretation, especially deep learning algorithms. According to the study, 50% of the estimated interpretation workload could be saved. Therefore, FSO is currently carrying out a project to assess the relevance of learning and mastering the use of artificial intelligence (AI) technologies to automate (even partially) the interpretation of aerial images for change detection and classification. The project is called Area Statistics Deep Learning (ADELE). 
FSO had already developed tools for change detection and multi-class classification using the image data. However, the current workflow does not exploit the spatial and temporal dependencies between different points in the surveys. The aim of this project is therefore to evaluate the potential of spatial-temporal neighbors in predicting whether or not points in the land statistics will change class. The methodology will focus on change detection, finding as many unchanged tiles as possible (automation capacity) and missing as few changed tiles as possible. The detailed objectives of this project are to: explore the internal transformation patterns of tile classification from a data analytics perspective build a prototype that performs change detection for tiles in the next survey help the domain experts to integrate the prototype within the OFS workflow 1.3 Input data ¶ The raw data delivered by the domain experts is a table with 4'163'496 records containing the interpretation results of both land cover and land use from survey 1 to survey 4. An example record is shown in Table 1 and gives the following information: RELI: 8-digit number composed of the EAST hectare number concatenated with the NORTH hectare number EAST: EAST coordinates (EPSG:2056) NORTH: NORTH coordinates (EPSG:2056) LUJ: Land Use label for survey J LCJ: Land Cover label for survey J training: value 0 or 1. A value of 1 means that the point can be included in the training or validation set Table 1: Example record of raw data delivered by the domain experts. RELI EAST NORTH LU4* LC4 LU3 LC3 LU2 LC2 LU1 LC1 training 74222228 2742200 1222800 242 21 242 21 242 21 242 21 0 75392541 2753900 1254100 301 41 301 41 301 41 301 41 0 73712628 2737100 1262800 223 46 223 46 223 46 223 46 0 * The shortened LC1/LU1 to LC4/LU4 will be used to simplify the notation of Land Cover/Use of survey 1 to survey 4 in the following documentation. 
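The RELI construction can be sketched as follows; `reli_from_coords` is a hypothetical helper, written to be consistent with the three example records of Table 1:

```python
def reli_from_coords(east, north):
    """Rebuild the 8-digit RELI from EPSG:2056 (LV95) coordinates.

    The RELI concatenates the EAST hectare number with the NORTH hectare
    number: the leading 2/1 of the LV95 coordinates and the two trailing
    metre digits are dropped, matching the example records above.
    """
    return ((east - 2_000_000) // 100) * 10_000 + (north - 1_000_000) // 100

print(reli_from_coords(2_742_200, 1_222_800))  # 74222228
```

Applied to the other two example rows, the helper reproduces 75392541 and 73712628 as well.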
For machine learning, training data quality has a strong influence on model performance. With the training label, domain experts from FSO selected data points that are more reliable and representative. These 348'474 tiles and their neighbors composed the training and testing dataset for the machine learning methodology. 2. Exploratory data analysis ¶ As suggested by the domain experts, exploratory data analysis (EDA) is important to understand the data statistics and find potential internal patterns of class transformation. The EDA is implemented from three different perspectives: distribution, quantity and probability. With the combination of the three, we find that certain trends do exist in the transformation of both land cover and land use classes. For the land cover, the main findings are: distribution: most of the surface of Switzerland is covered by vegetation or forest; bare land and water areas take up a considerable portion as well; artificial areas take up a small portion of the land cover probability: transformation between some classes never happened during the past four decades; all classes of land cover are most likely to keep their status rather than to change quantity: there are some clear patterns in quantitative changes Open Forest goes to Closed Forest Brush Meadows go to Shrubs Garden Plants go to Grass and Herb Vegetation Shrubs go to Closed Forest Cluster of Trees goes to Grass and Herb Vegetation For the land use, the main findings are: distribution: agricultural and forest areas are the main land uses; unused areas also stand out from the other classes probability: transformation between some classes never happened during the past four decades; on the contrary, construction sites, non-exploited urban areas and forest areas tend to change to other classes rather than remain unchanged quantity: most transformations happened inside the superclasses of Arable and Grassland and of Forest not Agricultural . 
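The quantity and probability statistics behind these findings can be sketched as transition counts and a row-normalised transition matrix; the snippet below uses toy, hypothetical label values rather than the real survey table:

```python
import pandas as pd

# Toy table in the same shape as the raw data (labels are placeholders)
df = pd.DataFrame({
    "LC1": [21, 21, 41, 44, 44, 31, 21, 41],
    "LC4": [21, 41, 41, 41, 44, 41, 21, 41],
})

# Quantity: number of tiles per (source class, destination class) pair
counts = pd.crosstab(df["LC1"], df["LC4"])

# Probability: row-normalised transition matrix from survey 1 to survey 4,
# ignoring possible intermediate changes in surveys 2 and 3
prob = pd.crosstab(df["LC1"], df["LC4"], normalize="index")

# Fraction of Open Forest (44) tiles ending as Closed Forest (41)
print(prob.loc[44, 41])  # 0.5
```

The diagonal of `prob` corresponds to the probability that a class does not change, and the off-diagonal row maxima to the most likely destination class when it does.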
Readers particularly interested in the change detection methods can go directly to Section 3 ; otherwise, readers are welcome to read the illustrated and detailed EDA given hereafter. 2.1 Distribution statistics ¶ Figure 2: Land cover distribution plot. Figure 3: Land use distribution plot. First, a glance at the overall distribution of land cover and land use is shown in Figures 2 and 3. The X-axis is the label of each class while the Y-axis is the number of tiles in log scale. The records of the four surveys are plotted in different colors chronologically. By observation, some trends can be found across the four surveys. Artificial areas only take up a small portion of the land cover (labels between 10 and 20), while most of the surface of Switzerland is covered by vegetation or forest (20 - 50). Bare land (50 - 60) and water areas (60 - 70) take up a considerable portion as well. For land use, it is obvious that the agricultural (200 - 250) and forest (300 - 310) areas are the main components while the unused areas (421) also stand out from the others. Most classes kept the same tendency during the past 40 years. There are 11 out of 27 land cover classes and 32 out of 46 land use classes which are continuously increasing or decreasing over time. For land use especially, 10 classes rise with time while 22 classes drop, which indicates that some transformation patterns cause a leakage from the dropping classes toward those 10 rising classes. We will dive into these patterns in the following sections. 2.2 Quantity statistics ¶ The data are explored in a quantitative way by three means: visualization of transformations between 2 surveys visualization of sequential transformation over time identifying patterns and the most frequent transformations in different periods. 2.2.1 Transformation visualization ¶ Figure 4: Land cover transformation from 1985 to 2018. 
The analysis of the transformation patterns from a quantitative perspective has been implemented in the interactive visualization in Figure 4. The nodes of the same color belong to a common superclass (principal domain). The size of a node represents the number of tiles for the class and the width of a link reflects the number of transformations in log scale. When hovering over these elements with the mouse, detailed information such as the class label code and the number of transformations is shown. Clicking on the legend enables selecting the superclasses in which the transformation should be analyzed. Pre-processing had been done for the transformation data. To simplify the graph and highlight the major transformations, links with a number of transformations below 0.1% of the total were removed from the graph. The filter avoids too many trivial links (580) connecting nearly all the nodes, leaving only the significant links (112). The process filtered 6.5% of the transformations in land cover and 11.5% in land use, which is acceptable considering it is a quantitative analysis focusing on the major transformations. 2.2.2 Sequential transformation visualization ¶ Figure 5: Land cover sequential transformation. In addition to the transformations between 2 surveys, the sequential transformations over time had also been visualized. Here, a similar filter is implemented as well to simplify the result, and only tiles that changed during the 4 surveys are visualized. In Figure 5, the box of a class in column 1985 (survey 1) is composed of different colors while the box of a class in column 2018 (survey 4) only has one color. This is because the color of a link encodes one kind of sequential transformation. The different colors of a class in the first column show the end status (classification) of the tiles in survey 4. There are some clear patterns to be found in the graph. For example, the red lines point out four diamond patterns in the graph. 
The diamond pattern with the edges in the same color illustrates the continuous trend that one class of tiles is transferred to the other class. In this figure, it is obvious that the Tree Clusters are degraded to Grass and Herb , while Grass and Herb are transferred to Consolidated Surfaces , showing the expansion of urban areas and the destruction of the natural environment. 2.2.3 Quantity statistics analysis ¶ Comparing the visualizations of different periods, a constant pattern has been spotted in both land cover and land use. For example in land cover, most transformations happened between the superclasses of Tree Vegetation and Brush Vegetation . Also, a visible bi-directional transformation between Grass and Herb Vegetation and Clusters of Trees is witnessed. Greenhouses, wetlands and reedy marshes hardly have edges linked to them over time, which illustrates that either they have a limited area or they hardly change. A similar property can also be captured in land use classes. Most transformations happened inside the superclasses of Arable and Grassland and Forest not Agricultural . Also, a visible transformation from Unused to Forest stands out from the others. Combining the findings above, it is clear that the transformations related to Forest and Vegetation are the main part of the story. The forest shrinks or expands over time, changing to shrubs and coming back later. The Arable and Grassland keeps changing based on the need for agriculture or animal husbandry during the survey year. Different kinds of forests interconvert with each other, which is a natural phenomenon. 2.3 Probability matrix ¶ The above analysis demonstrates the occurrence of transformations with quantitative statistics. However, the number of tiles for the different classes is not uniformly distributed, as shown in the distribution analysis. The largest class is thousands of times larger than the smallest one. 
Sometimes, the quantity of a transformation is trivial compared with the majority, but this is caused by the small number of tiles for the class. Even if a negligible class would not have a significant impact on the performance of change detection, it is of great importance to reveal the internal transformation pattern of the land statistics and support the multi-class classification task. Therefore, the probability analysis is designed as below. The probability analysis for land cover/use contains 3 parts: The probability matrix presents the probability of transformation from the source class (Y-axis) to the destination class (X-axis). The value of the probability is illustrated by the depth of the color in log scale. The distribution of the probability that a class does not change, which is a more detailed visualization of the diagonal values of the probability matrix. The distribution of the maximum probability that a class changes to another certain class. This is a deeper inspection to look for a fixed transformation pattern between two classes. The probability is calculated by the status change between the beginning survey and the end survey stated in the figure title. For example, Figure 6 is calculated by the transformation between survey 1 and survey 4, without taking into account possible intermediate changes in surveys 2 and 3. 2.3.1 Land cover analysis ¶ Figure 6: Land cover probability matrix from LC1 to LC4. The first information that the matrix provides is the blank blocks with zero probability of conversion. This discloses that transformation between some classes never happened during the past four decades. Besides, all the diagonal blocks have a distinct color depth, illustrating that all classes of land cover are most likely to keep their status rather than to change. Other evident features of this matrix are the columns with destination classes Grass and Herb Vegetation (21) and Closed Forest (41) . 
There are a few classes such as Shrubs (31) , Fruit Tree (33) , Garden Plants (35) and Open Forest (44) which have a noticeable trend of converting to these two classes, which is partially consistent with the quantity analysis while revealing some new findings. Figure 7: Land cover transformation probability without change. When it comes to the refined visualization of the diagonal blocks, it is clear that half of the classes have more than an 80% probability of not transforming, while the minimum one only has about 35%. This is caused by the accumulation of the 4 surveys together, which span 40 years. For a single decade, as in the first 3 sub-graphs of Figure 23 in the Appendix A2.1 , the majority are over 90% probability and the minimum rises to 55%. Figure 8: Maximum transformation probability to a certain class when land cover changes. For the transformed tiles, the maximum probability of converting into another class is shown in Figure 8. This graph together with the matrix in Figure 6 points out the internal transformation patterns. The top 5 possible transformations between the first survey and the fourth survey are: 1. 38% Open Forest (44) --> Closed Forest (41) 2. 36% Brush Meadows (32) --> Shrubs (31) 3. 34% Garden Plants (35) --> Grass and Herb Vegetation (21) 4. 29% Shrubs (31) --> Closed Forest (41) 5. 26% Cluster of Trees (47) --> Grass and Herb Vegetation (21) In this case, the accumulation takes effect as well. For a single decade, the maximum probability decreases to 25%, but the general distribution of the probability is consistent between the four surveys according to Figure 24 in the Appendix A2.1 . 2.3.2 Land use analysis ¶ Figure 9: Land use probability matrix from LU1 to LU4. The land use probability matrix has different features compared with the land cover probability matrix. Although most diagonal blocks have the deepest color depth, there are two areas highlighted by the red lines presenting different statistics. 
The upper area relates to Construction sites (146) and Unexploited Urban areas (147). These two classes tend to change to other classes rather than remain unchanged, which is reasonable since the construction time of buildings or infrastructure hardly exceeds 10 years. This is confirmed by the left side of the red-edged rectangular block, which has a deeper color, illustrating that construction and unexploited areas ended up in the Settlement and Urban Areas (superclass of 100-170). The lower red area accounts for the pattern concerning the Forest Areas (301-304): Afforestation (302), Lumbering areas (303) and Damaged Forest (304) thrive and recover between the surveys, finally becoming Forest (301) again.

Figure 10: Land use transformation probability without change.

Figure 10 further validates these assumptions. While most classes have a high probability of not changing, there are two deep valleys for classes 144 to 147 and 302 to 304, which correspond exactly to the patterns described above.

Figure 11: Maximum transformation probability to a certain class when land use changes.

Figure 11 shows the difference in the diversity of transformation destinations. The construction and unexploited areas turn into all kinds of urban areas: more than 95% of them change, yet the maximum probability towards any fixed class is less than 35%. In contrast, Afforestation, Lumbering areas and Damaged Forest return to Forest with a probability of more than 90%, so the transformation pattern within these four classes is fairly fixed. The distribution statistics, the quantity statistics and the probability matrices thus validate and complement each other throughout the exploratory analysis of the data.

3. Methods

The developed method should be integrated into the OFS framework for change detection and classification of land use and land cover illustrated in Figure 12.
The parts of interest for this project are highlighted in orange and are presented in the following.

Figure 12: Planned structure in FSO framework for final prediction.

Figure 12 shows on the left the input data types in the OFS framework. The current project works on the LC/LU neighbors introduced in Section 1.3. The main objective of the project - to detect change by means of these neighbors - is the temporal-spatial module in Figure 12. As proposed by the feasibility study, FSO has carried out studies on change detection and multi-class classification on the swisstopo aerial image time series to improve the efficiency of the interpretation work. The predicted LC and LU probabilities and information obtained by deep learning are defined as the image-level module. In a second stage of the project, the best model for combining the outputs of the temporal-spatial and image-level modules is explored, in order to evaluate the gain in performance after integrating the temporal-spatial module into the OFS framework. This is the so-called integration module. The rest of the input data is not part of the performance evaluation.

3.1 Temporal-spatial module

Figure 13: Time and space structure of a tile and its neighbors.

The input data to the temporal-spatial module are the historical interpretation results of the tile to predict and of its 8 neighbors. The first three surveys are used as inputs to train the models, while the fourth survey serves as the ground truth of the prediction. This utilizes both the time and the space information in the dataset, as depicted in Figure 13. During preprocessing, the tiles with missing neighbors were discarded from the dataset to keep the data format consistent; the loss is insignificant (about 400 out of 348'868 tiles). The determination of change is influenced by both land cover and land use.
When there is a disparity between the classifications of a specific tile in the fourth and the third survey, the tile is identified as changed (positive) in change detection. The joint prediction of land cover and land use is based on the assumption that a correlation may exist between them: if the land cover of a tile changes, it is probable that its land use will change as well. Moreover, the tiles carry numerical class labels, but the model should not infer a numerical relationship between classes, even when they belong to the same superclass and are closely related. To address this, we employ one-hot encoding, which transforms the single land cover column into 26 columns, with all values set to '0' except for the one column marked '1' that indicates the class. Although this increases the model's complexity to almost two thousand input columns, it is a necessary trade-off to eliminate the risk of numerical misinterpretation.

3.2 Change detection

Usually, spatial change detection is a remote sensing application performed on aerial or satellite images for multi-class change detection. In this project, however, a table of point records is used for binary classification into changed and unchanged classes. Different traditional and deep learning approaches have been explored to perform this task. The motivations for using them are given hereinafter. An extended version of this section with a detailed introduction to the machine learning models is available in Appendix A3.

Three traditional classification models, logistic regression (LR), XGBoost and random forest (RF), are tested. They represent the most popular approaches in the field: linear, boosting and bagging models. Logistic regression is well adapted to this project because it can explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
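As an aside, the one-hot encoding described in Section 3.1 can be sketched as follows; the class codes and the tile/survey layout are illustrative assumptions, not the exact FSO nomenclature:

```python
def one_hot(label, classes):
    """Encode a single class label as a 0/1 vector over the class list."""
    return [1 if c == label else 0 for c in classes]

# Hypothetical excerpt of the land cover class codes (the real list has 26).
lc_classes = [21, 31, 32, 33, 35, 41, 44, 47]

# One tile over three surveys: each survey contributes one block of columns,
# so the full input row grows to (9 tiles x 3 surveys x number of classes).
surveys = [44, 44, 41]
row = [bit for label in surveys for bit in one_hot(label, lc_classes)]
print(len(row), sum(row))  # 24 columns, exactly one '1' per survey
```

With 9 tiles, 3 surveys and the full land cover and land use class lists, this is how the input reaches almost two thousand columns.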
XGBoost has the advantage that weaker classifiers are introduced sequentially to focus on the areas where the current model is struggling, while misclassified observations receive extra weight during training. Finally, with random forest, higher accuracy may be obtained while still avoiding overfitting, thanks to the large number of trees and the sampling process. Beyond these popular traditional approaches, two deep learning algorithms are explored as well: the fully connected network and the convolutional recurrent neural network. Unlike traditional machine learning algorithms, deep learning does not require manual feature extraction or engineering; deep neural networks capture the desired features through the back-propagation optimization process. Besides, these networks can have special designs for temporal or spatial inputs: the assumption is that when the internal pattern of the dataset matches the network structure, the model performs better.

3.2.1 Focal loss

Deep neural networks need a differentiable loss function for optimization. For the imbalanced classification task of this project, the focal loss was chosen rather than the traditional (binary) cross entropy loss:

\[\begin{align} FL(p_t) = -\alpha (1-p_t)^{\gamma} \log(p_t) \end{align}\]

where \(p_t\) is the probability of predicting the correct class, \(\alpha\) is a balance factor between positive and negative classes, and \(\gamma\) is a modulation factor that controls how much weight is given to hard-to-classify examples. Focal loss aims to solve the problem of class imbalance in tasks like classification. It modifies the cross entropy loss by adding a factor that reduces the loss for easy examples and increases it for hard-to-classify examples. This way, focal loss focuses more on learning from misclassified examples.
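As a sketch, the formula above can be written directly as a per-sample pure-Python function (real training code would use a vectorized framework implementation):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one sample: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p is the predicted probability of the positive class, y the 0/1 label.
    alpha weights the positive class; (1 - alpha) weights the negative class.
    """
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy, well-classified example contributes far less than a hard one:
easy = focal_loss(0.95, 1)   # confident and correct
hard = focal_loss(0.30, 1)   # wrong with some confidence
print(easy < hard)  # True
```

With gamma set to 0 and alpha to 1, the function reduces to the ordinary cross entropy, which makes the role of the modulation factor explicit.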
Compared with other loss functions such as cross entropy, binary cross entropy and dice loss, some advantages of focal loss are:

- it can reduce the dominance of well-classified examples and prevent them from overwhelming the gradient;
- it can adaptively adjust the weight of each example based on its difficulty level;
- it can improve the accuracy and recall of rare classes by adjusting \(\alpha\) to give them more weight.

\(\alpha\) should be chosen based on the class frequency. A common choice is to set \(\alpha_t\) to 1 minus the frequency of class t; this way, rare classes get more weight than frequent classes. \(\gamma\) should be chosen based on how much focus to put on hard samples: a larger \(\gamma\) means more focus on hard samples, a smaller one less. The original paper suggested \(\gamma = 2\) as an effective value for most cases.

3.2.2 Fully connected network (FCN)

A fully connected network (FCN) is a type of neural network that consists of a series of fully connected layers. The major advantage of fully connected networks for this project is that they are structure agnostic: no special assumptions need to be made about the input (for example, that it consists of images or videos). A disadvantage of an FCN is that it can be computationally expensive and prone to overfitting due to the large number of parameters involved. Another disadvantage is that it does not exploit any spatial or temporal structure in the input data, which can lead to poor performance on some tasks. In our implementation, the FCN employs 4 hidden layers (2048, 2048, 1024 and 512 neurons respectively) besides the input and output layers. ReLU activation functions are used in the hidden layers, while a sigmoid function is applied at the output to scale the result to a probability.
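To make the cost argument concrete, the parameter count of the described architecture can be estimated as follows; the 2'000-column input width is an assumption taken from the one-hot discussion in Section 3.1, and the single output neuron is the change probability:

```python
def dense_params(sizes):
    """Parameters of a stack of fully connected layers: weights + biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Assumed ~2000 one-hot input columns, the four hidden layers from the text,
# and one sigmoid output neuron for the change probability.
sizes = [2000, 2048, 2048, 1024, 512, 1]
total = dense_params(sizes)
print(f"{total:,} parameters")  # 10,917,889 parameters
```

Roughly eleven million parameters for a purely tabular input illustrates why such a network is expensive and overfitting-prone compared with the traditional baselines.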
3.2.3 Convolutional recurrent neural network (ConvRNN)

A convolutional recurrent neural network (ConvRNN) is a type of neural network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are good at extracting spatial features from images, while RNNs are good at capturing temporal features from sequences. ConvRNNs can therefore be used for tasks that require both spatial and temporal features, as is intended in this project. Furthermore, the historical land cover and land use data can be translated into synthetic images: the channels represent the sequence of surveys and the pixel values represent the ground-truth labels. Thus, the spatial relationship of the neighbor tiles can be extracted from the data structure with the CNN.

Figure 14: Convolutional Recurrent Neural Network Pipeline.

In this project, we explored a ConvRNN with the structure shown in Figure 14. The sequence of surveys is treated as the input sequence \(x^t\). With the recurrent structure and the hidden states \(h^t\) transmitting information, the temporal information can be extracted. Unlike a traditional RNN, the function \(f\) in the hidden layers of the recurrent structure uses convolutional operations instead of matrix multiplications, and an additional CNN module is applied to the sequence output to detect the spatial information.

3.3 Performance metric

Once the different machine learning models are trained for the respective module, they have to be compared on the test set to evaluate their performance. This is done with the help of metrics.

3.3.1 Traditional metrics

As discovered in the distribution analysis, the dataset is strongly unbalanced: some classes are thousands of times larger than others. This matters for change detection. Moreover, among the 348'474 tiles in the dataset, only 58'737 (16.86%) have changed.
If the overall accuracy is chosen as the performance metric, the biased distribution would make the model tend to predict everything unchanged. In that case, the accuracy of the model can achieve 83.1%, which is a quite high value achieved without any effort. Therefore, avoiding the problem during the model training and selecting the suitable metric that can represent the desired performance are the initial steps. The constant model is defined as a model which predicts the third survey interpretation values as the prediction of the forth survey. In simple words, the constant model predicts that everything does not change. By this definition, we can calculate all kinds of metrics for other change detection models and compare them to the constant model metrics to indentify models with better performance. For change detection with the constant model, the performance is as below: Figure 15: Confusion matrix of constant distribution as prediction: TP=True Positive, TN=True Negative, FP=False Positive, FN=False Negative. Table 2: Metrics evaluation for constant model. Models Accuracy Balanced Accuracy Precision (PPV/NPV) Recall (TPR/TNR) F1-score Constant 0.831 0.500 Positive Negative 0.000 0.831 0.000 1.000 0.000 0.907 Definition of abbreviations: For positive case: Precision = TP / (TP + FP) Recall = TP / (TP + FN) (Positive predictive value, PPV) (True positive rate, TPR) For negative case: Precision = TN / (TN + FN) Recall = TN / (TN + FP) (Negative predictive value, NPV) (True negative rate, TNR) Detailed metrics definition can be found here The aim of the change detection is to predict the tiles of high confidence that do not change, so that the interpretation from the last survey can be used directly. However, the negative-case-related metrics above and the accuracy are not suitable for the present task because of the imbalance nature of the problem. 
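The constant-model figures in Table 2 can be reproduced directly from the tile counts (values differ from the reported table only by rounding):

```python
# Constant model: predict "unchanged" (negative) for every tile.
total_tiles, changed = 348_474, 58_737
tp, fp = 0, 0                      # no positive is ever predicted
tn, fn = total_tiles - changed, changed

accuracy = (tp + tn) / total_tiles
tpr = 0.0                          # recall of the positive class
tnr = tn / (tn + fp)               # recall of the negative class
npv = tn / (tn + fn)               # precision of the negative class
balanced_accuracy = 0.5 * (tpr + tnr)
f1_negative = 2 * npv * tnr / (npv + tnr)

print(round(accuracy, 3), balanced_accuracy, round(f1_negative, 3))
# -> 0.831 0.5 0.908
```

This makes explicit why the accuracy and the negative-case metrics look impressive for a model that detects nothing, while the balanced accuracy stays at 0.5.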
Indeed, because of the large proportion of unchanged tiles, they indicate a high performance for the constant model, which we know does not depict reality. After this test, the balanced accuracy, i.e. the mean of the true positive rate and the true negative rate, is considered a suitable metric for change detection.

3.3.2 Specific weighted metric for change detection

In theory, the true negative rate is equivalent to 1 minus the false positive rate, so optimizing balanced accuracy typically also minimizes the false positive rate. However, our primary objective is to reduce false negatives (i.e., changed cases labeled as unchanged), while maximizing the true positive rate and the true negative rate. False positives are of lesser concern, as they will be identified manually in subsequent steps. Consequently, balanced accuracy does not adequately reflect the project's primary objective. With the help of the FSO interpretation team, an additional, specific metric targeting this objective has been designed to measure the model performance. Recalling the Exploratory Data Analysis, some transformation patterns have been found there and are applied in this metric as well.

Figure 16: Workflow with multiple input to define a weighted metric.

As depicted in Figure 16, the FSO interpretation team designed two filters to derive a custom metric. The first filter combines the inputs from all the available modules (in this case, the image-level and temporal-spatial modules). The input modules give the probability of change or a multi-class classification prediction with confidence. As the predictions of the modules might differ, the first filter sets the final prediction of a tile to positive if any input module gives a positive prediction. Here, the threshold used to define a positive is a significant hyperparameter to fine-tune. The Weights Matrix defined by the human experts is the core of the entire metric.
Based on professional experience and the observations of the EDA, the experts assigned weights to all possible transformations. These weights express the importance of a transformation to the overall statistics. Besides, part of the labels are defined as Small Classes, meaning that these classes are negligible or not considered in this study. The second filter removes all the transformations related to the small classes and applies the weights matrix to all the remaining tiles. Finally, the weighted metric is calculated as follows:

\[\begin{align} Automatized \ Tiles &= \#Predicted \ Negatives \\ Automatized \ Capacity &= {\#Automatized \ Tiles \over \#Negatives \ (ground \ truth)} \\ Missed \ Weighted \ Changed \ Ratio &= {\sum \{Missed \ Change \times Weight\} \over \sum \{All \ Change \times Weight\}} \\ Weighted \ Metric &= Automatized \ Capacity \times (0.1 - Missed \ Weighted \ Changed \ Ratio) \ / \ 0.1 \end{align}\]

From now on, metrics like balanced accuracy and recall are still calculated for reference and analysis; however, the Weighted Metric is the decisive metric for model selection.

3.4 Training and testing plan

As introduced in Section 1.3, the 348'474 tiles with temporal-spatial information are selected for training. An 80%-20% split is applied to the selected tiles to create the train set and the test set respectively. An Adam optimizer and a multi-step learning-rate scheduler are deployed for better convergence. For the temporal-spatial module, metrics for an ablation study on the descriptors and the descriptor importance are computed first. The descriptor importance is taken from the XGBoost simulations.
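The weighted-metric formulas of Section 3.3.2 can be sketched as a small function; the record and weight structures are hypothetical placeholders, and the small-classes filter is omitted for brevity:

```python
def weighted_metric(records, weights):
    """Weighted metric as defined in Section 3.3.2.

    records: (predicted_change, true_change, src_class, dst_class) per tile.
    weights: expert weight of each (src_class, dst_class) transformation.
    """
    negatives = sum(1 for _, true, _, _ in records if not true)
    automatized = sum(1 for pred, _, _, _ in records if not pred)
    capacity = automatized / negatives

    all_w = sum(weights[(s, d)] for _, true, s, d in records if true)
    missed_w = sum(weights[(s, d)] for pred, true, s, d in records
                   if true and not pred)
    missed_ratio = missed_w / all_w
    return capacity * (0.1 - missed_ratio) / 0.1

# Toy example: 8 truly unchanged tiles, 2 changed ones, one change missed.
records = ([(False, False, 21, 21)] * 7      # correctly automatized
           + [(True, False, 21, 21)]         # false positive (manual check)
           + [(True, True, 44, 41),          # caught change
              (False, True, 35, 21)])        # missed, low-weight change
weights = {(44, 41): 3.0, (35, 21): 0.2}
print(round(weighted_metric(records, weights), 3))  # 0.375
```

Note how a missed change with a large expert weight can push the metric negative, as the constant model's value of about -10.7 in the results tables illustrates.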
The ablation study is performed with the logistic regression and consists of training the model with:

- 8 neighbors and time activated, "baseline"
- 4 neighbors (northern, western, southern, eastern neighbors)
- no spatial neighbors, "space deactivate"
- no temporal neighbors, "time deactivate"

Then, the baseline configuration is used to train the traditional algorithms and the deep learning ones. Metrics are compared and the best performing models are kept for the integration module. Finally, the performance of several configurations is compared for the integration module:

- direct outputs of the image-level module
- direct outputs of the best performing temporal-spatial module
- outputs of the best performing temporal-spatial module, followed by RF training for the integration module
- outputs of the best performing temporal-spatial module, followed by FCN training for the integration module

The extra information gained from the temporal-spatial module is studied by comparison with the image-level performance alone. The image-level data contain the multi-class classification prediction and its confidence, from which the change probability can be calculated. Therefore, the weighted metric can also be applied at the image level only. Then, RF and FCN are tested for the integration module, which combines the various types of information sources.

4. Experiments

The Experiments section covers the results obtained when performing the planned simulations for the temporal-spatial module and the integration module.

4.1 Temporal-spatial module

4.1.1 Feature engineering (time and space deactivation)

In the temporal-spatial module, the studied models take advantage of both the space (the neighbors) and the time (different surveys) information, as introduced in Section 3.1. An ablation study is performed here to assess the feature importance and determine which information really matters in the model.

Table 3: Model metrics for ablation plan.
| Logistic Regression | Best threshold | Accuracy | Balanced Accuracy | Precision (PPV / NPV) | Recall (TPR / TNR) | F1-score (pos / neg) |
|---|---|---|---|---|---|---|
| Time deactivate | 0.515 | 0.704 | 0.718 | 0.330 / 0.930 | 0.740 / 0.696 | 0.457 / 0.796 |
| Space deactivate | 0.505 | 0.684 | 0.711 | 0.316 / 0.930 | 0.752 / 0.670 | 0.445 / 0.779 |
| 4 neighbors | 0.525 | 0.707 | 0.718 | 0.332 / 0.929 | 0.734 / 0.701 | 0.458 / 0.799 |
| Baseline* | 0.525 | 0.711 | 0.720 | 0.337 / 0.928 | 0.734 / 0.706 | 0.462 / 0.802 |

*Baseline: 8 neighbors with time and space activated.

Table 3 reveals the performance change when time or space information is totally or partially (4 neighbors instead of 8) deactivated. While time deactivation and fewer neighbors hardly influence the balanced accuracy (only a 0.2% decrease), space deactivation decreases it by about 1%. The result demonstrates that space information is more vital to the algorithm than time information, even though both have a minor impact.

Figure 17: Feature importance analysis comparison of 4 (left) and 8 (right) neighbors.

Figure 18: Feature importance analysis comparison of time (left) and space (right) deactivation.

Figures 17 and 18 give the feature importance analysis from the XGBoost model. The sums of the feature importances of the variables related to the tile itself and of those related to its neighbors are plotted in the charts. The 4-neighbor and 8-neighbor configurations have similar capacities, but the summed importance of the neighbors is much higher for the latter. This is caused by the number of variables: with more neighbors, the number of neighbor-related variables increases and the sum of their feature importances grows as well. The feature importance illustrates the weight assigned to the input variables. From Figure 17, it is obvious that the variables related to the tile itself in past surveys are the most critical; furthermore, the more recent, the more important.
The neighbors on the east and the west (neighbors 3 and 4) are more significant than the others, even more so than the land use of the tile in the first survey. In conclusion, the feature importance is not evenly distributed. However, the ablation study shows that the model with all features as input achieves the best performance.

4.1.2 Baseline models with probability or tree models

Utilizing the time and space information from the neighbors, three baseline methods with probability or tree models are fine-tuned. The logistic regression outperforms the other two, achieving 72.0% balanced accuracy. As a result, more than 41'000 tiles are correctly predicted as unchanged, while only about 3'000 changed tiles are missed (the false negatives). The detailed metrics of each method are listed in Table 4.

Table 4: Performance metrics for traditional machine learning simulation of spatial-temporal model.

| Model | Accuracy | Balanced Accuracy | Precision (PPV / NPV) | Recall (TPR / TNR) | F1-score (pos / neg) |
|---|---|---|---|---|---|
| Logistic Regression | 0.711 | 0.720 | 0.337 / 0.928 | 0.734 / 0.706 | 0.462 / 0.802 |
| Random Forest | 0.847 | 0.715 | 0.775 / 0.849 | 0.134 / 0.992 | 0.229 / 0.915 |
| XGBoost | 0.837 | 0.715 | 0.533 / 0.869 | 0.297 / 0.947 | 0.381 / 0.906 |
| Constant | 0.830 | 0.500 | 0.000 / 0.830 | 0.000 / 1.000 | 0.000 / 0.907 |

Figure 19: Metric changes with different threshold for logistic regression.

Besides its optimal balanced accuracy, logistic regression allows adjusting its behavior manually by changing the decision threshold, since its output is the probability of change rather than a hard prediction. For example, we can trade off between the true positive rate and the negative predictive value. As shown in Figure 19, if we decrease the threshold probability, the precision of the negative case (NPV) increases while the true negative rate goes down. This means more tiles need manual checks; however, fewer changed tiles are missed.
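This threshold trade-off can be illustrated on toy predictions; the probabilities and labels below are invented for demonstration, not taken from the project data:

```python
def negative_case_metrics(probs, labels, threshold):
    """NPV and TNR when predicting 'changed' for p >= threshold."""
    preds = [p >= threshold for p in probs]
    tn = sum(1 for pr, y in zip(preds, labels) if not pr and y == 0)
    fn = sum(1 for pr, y in zip(preds, labels) if not pr and y == 1)
    fp = sum(1 for pr, y in zip(preds, labels) if pr and y == 0)
    npv = tn / (tn + fn) if tn + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return npv, tnr

# Toy scores: changed tiles (y=1) tend to get higher probabilities.
probs  = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
labels = [0,    0,    0,    1,    0,    0,    1,    1]
for thr in (0.3, 0.5):
    npv, tnr = negative_case_metrics(probs, labels, thr)
    print(thr, round(npv, 2), round(tnr, 2))
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises the NPV from 0.8 to 1.0 while the TNR drops from 0.8 to 0.6: fewer changes are missed, at the cost of more manual checks.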
Considering both the performance and these characteristics, logistic regression is selected as the baseline model.

4.1.3 Neural networks: FCN and ConvRNN

FCN and ConvRNN work differently: the FCN has no special structure designed for temporal-spatial data, while the ConvRNN has specific designs for the time and space information respectively. To study these two extremes, we explored their performance and compared it with the logistic regression, the best of the baseline models.

Table 5: Performance metrics for deep machine learning simulation of spatial-temporal model.

| Model | Weighted Metric | Raw Metric | Balanced Accuracy | Recall | Missed Changes | Missed Changes Ratio | Missed Weighted Changes | Missed Weighted Changes Ratio | Automatized Points | Automatized Capacity |
|---|---|---|---|---|---|---|---|---|---|---|
| LR (Macro)* | 0.237 | 0.197 | 0.655 | 0.954 | 349 | 0.046 | 18995 | 0.035 | 14516 | 0.364 |
| LR (BA)* | 0.249 | 0.207 | 0.656 | 0.957 | 326 | 0.043 | 17028 | 0.031 | 14478 | 0.363 |
| FCN | 0.259 | 0.21 | 0.656 | 0.958 | 322 | 0.042 | 15563 | 0.029 | 14490 | 0.363 |
| ConvRNN | 0.176 | 0.133 | 0.606 | 0.949 | 388 | 0.051 | 19026 | 0.035 | 10838 | 0.272 |
| Constant | -10.717 | -10.72 | 0.500 | 0.000 | 7607 | 1.000 | 542455 | 1.00 | 47491 | 1.191 |

*Macro: the model is trained with the Macro F1-score; BA: the model is trained with the Balanced Accuracy.

As a result of its implementation (see Section 3.2.2), the FCN outperforms all models with a weighted metric of 0.259, slightly above the logistic regression with 0.249. The ConvRNN does not perform well, even after increasing the size of the hidden states to 1024. Following deliberation, we posit that the absence of one-hot encoding during the generation of the synthetic images may be the cause, given that an increased number of channels would substantially increase the computational expense. Since the ground-truth label is applied directly as pixel value, the model may attempt to discern numerical relationships among distinct pixel values that, in reality, do not exist. This warrants further investigation in subsequent phases of our research.
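For reference, the synthetic-image encoding discussed above (one channel per survey, pixel values equal to the class labels) can be sketched as follows; the 3x3 neighbor ordering is a hypothetical row-by-row layout:

```python
def synthetic_image(tile_history, neighbor_histories):
    """Build a (surveys x 3 x 3) array of class labels: one channel per
    survey, the centre pixel is the tile itself, the rest its 8 neighbors.

    neighbor_histories lists the 8 neighbors row by row (hypothetical order).
    """
    grids = []
    for t in range(len(tile_history)):
        labels = [h[t] for h in neighbor_histories]
        labels.insert(4, tile_history[t])        # centre of the 3x3 block
        grids.append([labels[0:3], labels[3:6], labels[6:9]])
    return grids

tile = [44, 44, 41]                  # the tile over surveys 1-3
neigh = [[21, 21, 21]] * 8           # 8 neighbors, all Grass (21)
img = synthetic_image(tile, neigh)
print(img[2][1][1])  # centre pixel of channel 3 -> 41
```

Because the raw labels become pixel values, pixels 41 and 44 look numerically "close" even though the classes are unrelated, which is exactly the suspected weakness discussed above; one-hot encoding the channels would remove that artefact at the cost of many more channels.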
4.2 Integration module

Table 6 compares the performance of the FCN and of the image level alone to several configurations for the integration module.

Table 6: Performance metrics for the integration model in combination with a spatial-temporal model.

| Model | Weighted Metric | Raw Metric | Balanced Accuracy | Recall | Missed Changes | Missed Changes Ratio | Missed Weighted Changes | Missed Weighted Changes Ratio | Automatized Points | Automatized Capacity |
|---|---|---|---|---|---|---|---|---|---|---|
| FCN | 0.259 | 0.210 | 0.656 | 0.958 | 322 | 0.042 | 15563 | 0.029 | 14490 | 0.363 |
| image-level | 0.374 | 0.305 | 0.737 | 0.958 | 323 | 0.042 | 15735 | 0.029 | 20895 | 0.524 |
| LR + RF | 0.434 | 0.372 | 0.752 | 0.969 | 241 | 0.031 | 10810 | 0.020 | 21567 | 0.541 |
| FCN + RF | 0.438 | 0.373 | 0.757 | 0.968 | 250 | 0.032 | 11277 | 0.021 | 22010 | 0.552 |
| FCN + FCN | 0.438 | 0.376 | 0.750 | 0.970 | 229 | 0.030 | 9902 | 0.018 | 21312 | 0.534 |
| LR + FCN | 0.423 | 0.354 | 0.745 | 0.967 | 255 | 0.033 | 10993 | 0.020 | 21074 | 0.528 |

The study demonstrates that the image level contains more information related to change detection than the temporal-spatial neighbors (FCN row in Table 6). However, performance improves when the temporal-spatial module is combined with the image-level data, reaching a weighted metric of 0.438 (FCN+RF and FCN+FCN). Regarding the composition of the two modules, the FCN proves to be the best choice for the temporal-spatial module, while RF and FCN perform similarly in the integration module. The choice of the integration module could be influenced by the data format of other potential modules; this will be further studied by the FSO team.

5. Conclusion and outlook

This project studied the potential of historical and spatial neighbor data for the change detection task of the fifth interpretation campaign of the FSO areal statistics. For the evaluation of this specific project, a weighted metric was defined by the FSO team. The temporal-spatial information proved not to be as powerful as the image-level information, which detects change directly within the visual data.
However, an efficient prototype was built, with a 6% improvement of the weighted metric when combining the temporal-spatial module with the image-level module. This validates that integrating modules with different source information can enhance the final capacity of the entire workflow. The next research step would be to modify the current implementation of the ConvRNN: if the numerical relationships are removed from the synthetic image data, the ConvRNN should theoretically reach a performance similar to the FCN's. A plain CNN is also worth trying, to validate whether the temporal pattern matters in this dataset. Besides, by changing the size of the synthetic images, we can determine how the number of neighbor tiles impacts the model performance.

Appendix

A1. Classes of land cover and land use

Figure 20: Land Cover classification labels.

Figure 21: Land Use classification labels.

A2. Probability analysis of different periods

A2.1 Land cover

Figure 22: Land cover probability matrix.

Figure 23: Land cover transformation probability without change.

Figure 24: Maximum transformation probability to a certain class when land cover changes.

A2.2 Land use

Figure 25: Land use probability matrix.

Figure 26: Land use transformation probability without change.

Figure 27: Maximum transformation probability to a certain class when land use changes.

A3 Alternative version of Section 3.2

A3.1 Logistic regression

Logistic regression is a kind of Generalized Linear Model. It is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is used here as a predictive analysis. It describes data and explains the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
A3.2 XGBoost & random forest

Figure 28: Comparison of boosting and bagging models.

XGBoost and random forest both originate from the tree model: one is the sequential variant and the other the parallel variant. Extreme Gradient Boosting (XGBoost) is a distributed, scalable gradient-boosted decision tree (GBDT) machine learning algorithm. Gradient boosting is a flexible method used for regression, multi-class classification and other tasks, since it is compatible with all kinds of loss functions. It recasts boosting as a numerical optimization problem with the goal of reducing the loss function of the model by adding weak classifiers while employing gradient descent. Gradient descent, a first-order iterative approach, is then used to find a local optimum of the differentiable function. Weaker classifiers are introduced sequentially to focus on the areas where the current model is struggling, while misclassified observations receive extra weight during training.

Random forest is a bagging technique that contains a number of decision trees generated from the dataset. Instead of relying solely on one decision tree, it averages over a number of trees to improve the predictive accuracy. For each tree, the input features are a different sampled subset of all the features, making the model more robust and helping to avoid overfitting. These trees are each trained on a bootstrap-sampled subset of the dataset. Finally, the random forest takes the prediction of each tree and makes the final decision based on the majority vote. Higher accuracy is obtained and overfitting is avoided through the large number of trees and the sampling process.

A3.3 Focal loss

The next two methods are deep neural networks, which need a differentiable loss function for optimization. Here we first explain the difference between a loss function and an evaluation metric.
The choice of loss function and evaluation metric depends on the task and the data. The loss function should be chosen based on whether it suits the model architecture and output type, while the evaluation metric should be relevant to the problem domain and the application objectives. The loss function and the evaluation metric are two different concepts in deep learning: the loss function is used to optimize the model parameters during training, while the evaluation metric measures the performance of the model on a test set. They need not be the same. For example, here we use the focal loss to train a classification model, but use the balanced accuracy or the specifically defined metric to evaluate its performance. The reason is that some evaluation metrics may not be differentiable or easy to optimize, or they may not match the objective of the model. For the imbalanced classification task of this project, we consider the focal loss a better choice than the traditional (binary) cross entropy loss:

\[\begin{align} FL(p_t) = -\alpha (1-p_t)^{\gamma} \log(p_t) \end{align}\]

where \(p_t\) is the probability of predicting the correct class, \(\alpha\) is a balance factor between positive and negative classes, and \(\gamma\) is a modulation factor that controls how much weight is given to hard-to-classify examples. Focal loss aims to solve the problem of class imbalance in tasks like classification. It modifies the cross entropy loss by adding a factor that reduces the loss for easy examples and increases it for hard-to-classify examples. This way, focal loss focuses more on learning from misclassified examples. Compared with other loss functions such as cross entropy, binary cross entropy and dice loss, some advantages of focal loss are:

- It can reduce the dominance of well-classified examples and prevent them from overwhelming the gradient.
It can adaptively adjust the weight of each example based on its difficulty level. It can improve the accuracy and recall of rare classes by adjusting \\(\\alpha\\) to give more weight to them. \\(\\alpha\\) should be chosen based on the class frequency; a common choice is to set \\(\\alpha_t\\) = 1 - frequency of class t, so that rare classes get more weight than frequent classes. \\(\\gamma\\) should be chosen based on how much focus is wanted on hard samples: a larger \\(\\gamma\\) means more focus on hard samples, a smaller one less. The original paper suggested \\(\\gamma\\) = 2 as an effective value for most cases. A3.4 Fully connected network (FCN) \u00b6 Figure 29: Network structure of FCN. The fully connected network (FCN) in deep learning is a type of neural network that consists of a series of fully connected layers. A fully connected layer is a function from \\(\\mathbb{R}^m\\) to \\(\\mathbb{R}^n\\) that maps each input dimension to each output dimension. The FCN can learn complex patterns and features from data using the backpropagation algorithm. The major advantage of fully connected networks is that they are \u201cstructure agnostic\u201d: no special assumptions need to be made about the input (for example, that it consists of images or videos). Fully connected networks are used in a wide range of applications, such as image recognition, natural language processing, and recommender systems. A disadvantage of the FCN is that it can be very computationally expensive and prone to overfitting due to the large number of parameters involved. Another disadvantage is that it does not exploit any spatial or temporal structure in the input data, which can lead to poor performance on some tasks. A possible alternative to the fully connected network is the convolutional neural network (CNN), which uses convolutional layers that apply filters to local regions of the input data, reducing the number of parameters and capturing spatial features.
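The focal loss of A3.3 can be implemented directly from its formula; a minimal NumPy sketch for the binary case (the defaults alpha = 0.25 and gamma = 2 are illustrative choices, not project settings):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p : predicted probability of the positive class, y : label in {0, 1}.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balance factor
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# With gamma = 2, confident correct predictions are strongly
# down-weighted, while hard examples keep a large loss.
easy = focal_loss(np.array([0.95]), np.array([1]))   # well classified
hard = focal_loss(np.array([0.30]), np.array([1]))   # misclassified
print(easy, hard)
```

Setting gamma = 0 and alpha = 0.5 recovers a plain (scaled) cross-entropy, which makes the "focusing" role of gamma easy to verify numerically.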
A3.5 Convolutional neural network (CNN) \u00b6 A convolutional neural network (CNN) is a type of deep learning neural network designed for processing structured arrays of data such as images. CNNs are very good at detecting patterns in the input data, such as lines, shapes, colors, or even faces and objects. CNNs use a special technique called convolution, a mathematical operation that applies a filter (also called a kernel) to each part of the input data and produces an output called a feature map. Convolution helps to extract features from the input data and reduce its dimensionality. CNNs usually have multiple layers of convolution, followed by other types of layers such as pooling (which reduces the size of the feature maps), activation (which adds non-linearity to the network), dropout (which prevents overfitting), and fully connected layers (which perform classification or regression tasks). CNNs can be trained using backpropagation and gradient descent algorithms. CNNs are widely used in computer vision and have become the state of the art for many visual applications such as image classification, object detection, face recognition, and semantic segmentation. They have also been applied to other domains such as natural language processing for text analysis. Figure 30: Workflow of Convolutional Neural Network. In this project, the historical land cover and land use data can be translated into synthetic images, whose channels represent the sequence of surveys and whose pixel values represent the ground truth labels. Thus, the spatial relationship of the neighbouring tiles can be extracted from this data structure with the CNN. A3.6 Convolutional recurrent neural network (ConvRNN) \u00b6 A convolutional recurrent neural network (ConvRNN) is a type of neural network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
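The convolution operation central to A3.5 can be written out explicitly; a minimal NumPy sketch of a single filter sliding over a small image (stride 1, no padding; the edge-detection kernel is only an illustrative choice):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) -> feature map."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # One filter response = elementwise product summed over the patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter responds where pixel values change horizontally.
image = np.zeros((5, 6))
image[:, 3:] = 1.0                  # left half dark, right half bright
kernel = np.array([[-1.0, 1.0]])    # 1x2 edge-detecting kernel
fmap = conv2d(image, kernel)
print(fmap.shape)                   # output is smaller than the input
```

The feature map is non-zero only at the dark/bright boundary, which is exactly the "pattern detection" behaviour described above, and its reduced size illustrates the dimensionality reduction.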
CNNs are good at extracting spatial features from images, while RNNs are good at capturing temporal features from sequences. ConvRNNs can be used for tasks that require both spatial and temporal features, such as image captioning and speech recognition. A ConvRNN consists of two main parts: a CNN part and an RNN part. The CNN part takes an input image or signal and applies convolutional filters to extract features. The RNN part takes these features as a sequence and processes them with recurrent units that have memory. The output of the RNN part can be a single vector or a sequence of vectors, depending on the task. A ConvRNN can learn both spatial and temporal patterns from data that have both dimensions, such as audio signals or video frames. For example, a ConvRNN can detect multiple sound events in an audio signal by extracting frequency features with CNNs and capturing temporal dependencies with RNNs. Figure 31: Convolutional Recurrent Neural Network Pipeline. In this project, we explored a ConvRNN with the structure shown in Figure 31. The sequence of surveys is treated as the sequence of inputs \\(x^t\\) . With the recurrent structure and hidden states \\(h^t\\) to transmit information, the temporal information can be extracted.
Different from the traditional recurrent neural network, the function \\(f\\) in the hidden layers of the recurrent structure uses a convolutional operation instead of matrix multiplication, and an additional CNN module is applied to the sequence output to detect the spatial information.","title":"Using spatio-temporal neighbor data information to detect changes in land use and land cover"},{"location":"PROJ-LANDSTATS/#using-spatio-temporal-neighbor-data-information-to-detect-changes-in-land-use-and-land-cover","text":"Shanci Li (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Clotilde Marmy (ExoLabs) - Roxane Pott (swisstopo) Proposed by the Swiss Federal Statistical Office - PROJ-LANDSTATS September 2022 to March 2023 - Published on April 2023 All scripts are available on GitHub . Abstract : From 2020 on, the Swiss Federal Statistical Office started to update the land use/cover statistics over Switzerland for the fifth time. To help with and lessen the heavy workload of the interpretation process, partially or fully automated approaches are being considered. The goal of this project was to evaluate the role of spatio-temporal neighbors in predicting class changes between two periods for each survey sample point. The methodology focused on change detection, finding as many unchanged tiles as possible while missing as few changed tiles as possible. Logistic regression was used to assess the contribution of spatial and temporal neighbors to the change detection. While time deactivation and fewer neighbors cause a 0.2% decrease in balanced accuracy, space deactivation causes a 1% decrease. Furthermore, the performances of XGBoost, random forest (RF), fully connected network (FCN) and convolutional recurrent neural network (ConvRNN) are compared by means of a custom metric, established with the help of the interpretation team.
For the spatial-temporal module, the FCN outperforms all the models with a value of 0.259 for the custom metric, whereas the logistic regression reaches a custom metric of 0.249. Then, FCN and RF are tested to combine the best performing model with the model trained by OFS on image data only. When using temporal-spatial neighbors and image data as inputs, the final integration module achieves 0.438 in the custom metric, against 0.374 when only the image data is used. It was concluded that temporal-spatial neighbors could lighten the process of tile interpretation.","title":"Using spatio-temporal neighbor data information to detect changes in land use and land cover"},{"location":"PROJ-LANDSTATS/#1-introduction","text":"The introduction presents the background and the objectives of the project, and introduces the input data and its specific features.","title":"1. Introduction"},{"location":"PROJ-LANDSTATS/#11-background","text":"Since 1979, the Swiss Federal Statistical Office (FSO) has provided detailed and accurate information on the state and evolution of the land use and the land cover in Switzerland. It is a crucial tool for long-term spatial observation. With these statistics, it is possible to determine whether and to what extent changes in land cover and land use are consistent with the goals of Swiss spatial development policies ( FSO ). Figure 1: Visualization of the land cover and land use classification. Every few years, the FSO carries out a survey on aerial or satellite images all over Switzerland. A grid with sample points spaced 100 meters apart overlays the images, providing 4.1 million sample points on which the statistics are based. The classification of a hectare tile is assigned based on its center dot, as shown in Figure 1.
Currently, a time series of four surveys is accessible, based on aerial images captured in the following years: 1979\u20131985 (1st survey, 1985) 1992\u20131997 (2nd survey, 1997) 2004\u20132009 (3rd survey, 2009) 2013\u20132018 (4th survey, 2018) The first two surveys of the land statistics in 1979 and 1992 were made by visual interpretation of analogue aerial photos using stereoscopes. Since the 2004 survey, the methodology has been deeply renewed, in particular through the use of digital aerial photographs, which are observed stereoscopically on workstations using specific photogrammetry software . A new nomenclature (NOAS04) was also introduced in 2004, which systematically distinguishes 46 land use categories and 27 land cover categories. A numerical label from this catalogue is assigned to each point by a team of trained interpreters. The 1979 and 1992 surveys have been revised according to the nomenclature NOAS04, so that all readings (1979, 1992, 2004, 2013) are comparable. On this page you will find the geodata of the Land Use Statistics at the hectare level since 1979, as well as documentation on the data and the methodology used to produce them. Detailed information on basic categories and principal domains can be found in Appendix 1 .","title":"1.1 Background"},{"location":"PROJ-LANDSTATS/#12-objectives","text":"Manual interpretation work is known to be time-consuming and expensive. However, in a feasibility study , machine learning techniques showed great potential to help speed up the interpretation, especially deep learning algorithms. According to the study, 50% of the estimated interpretation workload could be saved. Therefore, the FSO is currently carrying out a project to assess the relevance of learning and mastering the use of artificial intelligence (AI) technologies to automate (even partially) the interpretation of aerial images for change detection and classification.
The project is called Area Statistics Deep Learning (ADELE). The FSO had already developed tools for change detection and multi-class classification using the image data. However, the current workflow does not exploit the spatial and temporal dependencies between different points in the surveys. The aim of this project is therefore to evaluate the potential of spatial-temporal neighbors in predicting whether or not points in the land statistics will change class. The methodology focuses on change detection, finding as many unchanged tiles as possible (automatized capacity) while missing as few changed tiles as possible. The detailed objectives of this project are to: explore the internal transformation patterns of tile classification from a data analytics perspective build a prototype that performs change detection for tiles in the next survey help the domain experts to integrate the prototype within the OFS workflow","title":"1.2 Objectives"},{"location":"PROJ-LANDSTATS/#13-input-data","text":"The raw data delivered by the domain experts is a table with 4'163'496 records containing the interpretation results of both land cover and land use from survey 1 to survey 4. An example record is shown in Table 1 and gives the following information: RELI: 8-digit number composed of the EAST hectare number concatenated with the NORTH hectare number EAST: EAST coordinate (EPSG:2056) NORTH: NORTH coordinate (EPSG:2056) LUJ: Land Use label for survey J LCJ: Land Cover label for survey J training: value 0 or 1; a value of 1 means that the point can be included in the training or validation set Table 1: Example record of raw data delivered by the domain experts.
RELI EAST NORTH LU4* LC4 LU3 LC3 LU2 LC2 LU1 LC1 training
74222228 2742200 1222800 242 21 242 21 242 21 242 21 0
75392541 2753900 1254100 301 41 301 41 301 41 301 41 0
73712628 2737100 1262800 223 46 223 46 223 46 223 46 0
* The shortened notation LC1/LU1 to LC4/LU4 will be used to denote the Land Cover/Use of survey 1 to survey 4 in the following documentation. For machine learning, training data quality has a strong influence on model performance. With the training label, domain experts from the FSO selected data points that are more reliable and representative. These 348'474 tiles and their neighbors compose the training and testing dataset for the machine learning methodology.","title":"1.3 Input data"},{"location":"PROJ-LANDSTATS/#2-exploratory-data-analysis","text":"As suggested by the domain experts, exploratory data analysis (EDA) is important to understand the data statistics and find potential internal patterns of class transformation. The EDA is carried out from three different perspectives: distribution, quantity and probability. Combining the three, we find that certain trends do exist in the transformation of both land cover and land use classes.
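As a concrete illustration, the RELI identifier can be decomposed back into the EAST/NORTH coordinates; this sketch is inferred from the example records in Table 1 (the offsets 2'000'000 and 1'000'000 are the EPSG:2056 false easting/northing) and is not part of the FSO tooling:

```python
def reli_to_coords(reli: int) -> tuple[int, int]:
    """Split an 8-digit RELI into (EAST, NORTH) EPSG:2056 coordinates."""
    # First four digits: east hectare number; last four: north hectare number.
    east_ha, north_ha = divmod(reli, 10_000)
    return 2_000_000 + east_ha * 100, 1_000_000 + north_ha * 100

print(reli_to_coords(74222228))   # (2742200, 1222800), as in Table 1
```

This also makes explicit why the RELI alone suffices to locate a tile and its 100 m neighbors on the sampling grid.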
For the land cover, the main findings are: distribution: most of the surface of Switzerland is covered by vegetation or forest; bare land and water areas take up a considerable portion as well; artificial areas take up a small portion of the land cover probability: transformations between some classes never happened during the past four decades; all land cover classes are more likely to keep their status than to change quantity: there are some clear patterns in quantitative changes: Open Forest goes to Closed Forest; Brush Meadows go to Shrubs; Garden Plants go to Grass and Herb Vegetation; Shrubs go to Closed Forest; Cluster of Trees goes to Grass and Herb Vegetation For the land use, the main findings are: distribution: agricultural and forest areas are the main land uses; unused areas also stand out from the other classes probability: transformations between some classes never happened during the past four decades; on the contrary, construction sites, non-exploited urban areas and forest areas tend to change to other classes rather than stay unchanged quantity: most transformations happened inside the superclasses of Arable and Grassland and of Forest not Agricultural . Readers particularly interested in the change detection methods can go directly to Section 3 ; otherwise, readers are welcome to read the illustrated and detailed EDA given hereafter.","title":"2. Exploratory data analysis"},{"location":"PROJ-LANDSTATS/#21-distribution-statistics","text":"Figure 2: Land cover distribution plot. Figure 3: Land use distribution plot. First, a glance at the overall distribution of land cover and land use is shown in Figures 2 and 3. The X-axis is the label of each class while the Y-axis is the number of tiles on a log scale. The records of the four surveys are plotted in different colors chronologically. By observation, some trends can be found across the four surveys.
Artificial areas only take up a small portion of the land cover (labels between 10 and 20), while most of the surface of Switzerland is covered by vegetation or forest (20 - 50). Bare land (50 - 60) and water areas (60 - 70) take up a considerable portion as well. For land use, it is obvious that the agricultural (200 - 250) and forest (300 - 310) areas are the main components, while the unused area (421) also stands out from the others. Most classes kept the same tendency during the past 40 years: 11 out of 27 land cover classes and 32 out of 46 land use classes are continuously increasing or decreasing over time. Especially for land use, only 10 classes rise with time while 22 classes drop, which indicates transformation patterns causing leakage from some classes towards those 10 classes. We will dive into these patterns in the following sections.","title":"2.1 Distribution statistics"},{"location":"PROJ-LANDSTATS/#22-quantity-statistics","text":"The data are explored in a quantitative way by three means: visualization of transformations between two surveys visualization of sequential transformations over time identification of patterns and of the most frequent transformations in different periods.","title":"2.2 Quantity statistics"},{"location":"PROJ-LANDSTATS/#23-probability-matrix","text":"The above analysis demonstrates the occurrence of transformations with quantitative statistics. However, the number of tiles per class is far from uniformly distributed, as shown in the distribution analysis: the largest class is thousands of times larger than the smallest one. Sometimes the quantity of a transformation is trivial compared with the majority simply because of the small number of tiles in the class.
Even if a negligible class would not have a significant impact on the performance of change detection, it is of great importance to reveal the internal transformation patterns of the land statistics and support the multi-class classification task. Therefore, the probability analysis is designed as below. The probability analysis for land cover/use contains 3 parts: The probability matrix presents the probability of transformation from the source class (Y-axis) to the destination class (X-axis). The value of the probability is illustrated by the depth of the color on a log scale. The distribution of the probability that a class does not change, which is a more detailed visualization of the diagonal values of the probability matrix. The distribution of the maximum probability that a class changes to another given class. This is a deeper inspection to look for a fixed transformation pattern between two classes. The probability is calculated from the status change between the beginning survey and the end survey stated in the figure title. For example, Figure 6 is calculated from the transformation between survey 1 and survey 4, without taking into account possible intermediate changes in surveys 2 and 3.","title":"2.3 Probability matrix"},{"location":"PROJ-LANDSTATS/#3-methods","text":"The developed method should be integrated into the OFS framework for change detection and classification of land use and land cover illustrated in Figure 12. The parts of interest for this project are highlighted in orange and will be presented in the following. Figure 12: Planned structure in the FSO framework for the final prediction. Figure 12 shows on the left the input data types in the OFS framework. The current project works on the LC/LU neighbors introduced in Section 1.3 . The main objective of the project - to detect change by means of these neighbors - is the temporal-spatial module in Figure 12.
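The probability matrix of Section 2.3 is essentially a row-normalised contingency table between two surveys; a minimal sketch with purely illustrative labels (the real analysis uses the NOAS04 codes):

```python
import numpy as np

def transition_matrix(src, dst, classes):
    """P[i, j] = probability that a tile of class i in the first survey
    is of class j in the second survey (rows sum to 1)."""
    idx = {c: k for k, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for s, d in zip(src, dst):
        counts[idx[s], idx[d]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

# Toy example mirroring a Section 2.2 pattern: shrubs often become forest.
src = ["shrubs", "shrubs", "shrubs", "grass", "grass", "forest"]
dst = ["forest", "forest", "shrubs", "grass", "grass", "forest"]
P = transition_matrix(src, dst, ["shrubs", "grass", "forest"])
print(P[0])   # shrubs row: 1/3 stay shrubs, 2/3 go to forest
```

Computing P between survey 1 and survey 4 directly, as in Figure 6, ignores intermediate states by construction: only the start and end labels enter the counts.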
As proposed by the feasibility study, the FSO had implemented studies on change detection and multi-class classification on the swisstopo aerial image time series to improve the efficiency of the interpretation work. The predicted LC and LU probabilities and information obtained by deep learning are defined as the image-level module. In a second stage of the project, the best model for combining the temporal-spatial and the image-level module outputs is explored to evaluate the gain in performance after integration of the spatial-temporal module in the OFS framework. This is the so-called integration module. The rest of the input data will not be part of the performance evaluation.","title":"3. Methods"},{"location":"PROJ-LANDSTATS/#31-temporal-spatial-module","text":"Figure 13: Time and space structure of a tile and its neighbors. The input data to the temporal-spatial module are the historical interpretation results of the tile to predict and of its 8 neighbors. The first three surveys are used as inputs to train the models while the fourth survey serves as the ground truth of the prediction. This utilizes both the time and space information in the dataset, as depicted in Figure 13. During preprocessing, the tiles with missing neighbors were discarded from the dataset to keep the data format consistent; their number is insignificant (about 400 out of 348'868). The determination of change is influenced by both land cover and land use: when there is a disparity between the classifications of a specific tile in the fourth survey and in the third one, the tile is identified as changed (positive) in change detection. The joint prediction of land cover and land use is based on the assumption that a correlation may exist between them: if the land cover of a tile undergoes a change, it is probable that its land use will also change. Moreover, the tiles are assigned numerical labels.
Nevertheless, the model should not infer a numerical relationship between classes, even when they belong to the same superclass and are closely related. To address this, we employ one-hot encoding, which transforms a single land cover column into 26 columns, with all values set to '0' except for one column marked as '1' to indicate the class. Despite increasing the model's complexity to almost two thousand input columns, this is a necessary trade-off to eliminate the risk of numerical misinterpretation.","title":"3.1 Temporal-spatial module"},{"location":"PROJ-LANDSTATS/#32-change-detection","text":"Usually, spatial change detection is a remote sensing application performed on aerial or satellite images for multi-class change detection. In this project, however, a table of point records is used for binary classification into changed and unchanged classes. Different traditional and newer deep learning approaches have been explored to perform this task. The motivations for using them are given hereinafter. An extended version of this section with a detailed introduction to the machine learning models is available in Appendix A3 . Three traditional classification models, logistic regression (LR) , XGBoost and random forest (RF) , are tested. The three models represent the most popular approaches in the field - the linear, boosting, and bagging models. In this project, logistic regression is well adapted because it can explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. Concerning XGBoost, it has the advantage that weak classifiers are introduced sequentially to focus on the areas where the current model struggles, while misclassified observations receive extra weight during training. Finally, in random forest, higher accuracy may be obtained and overfitting still avoided through the larger number of trees and the sampling process.
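The one-hot encoding described in Section 3.1 can be sketched as follows (the class codes below are an illustrative subset, not the full NOAS04 catalogue):

```python
import numpy as np

def one_hot(labels, classes):
    """Map each categorical label to a 0/1 indicator vector, so that the
    model cannot read an ordinal relationship into the numerical codes."""
    index = {c: i for i, c in enumerate(classes)}
    out = np.zeros((len(labels), len(classes)), dtype=int)
    for row, label in enumerate(labels):
        out[row, index[label]] = 1
    return out

lc_classes = [11, 21, 41, 46]            # toy subset of land cover codes
encoded = one_hot([21, 41, 21], lc_classes)
print(encoded)
# Each tile contributes one such vector per survey and per neighbor,
# which is how the input grows to almost two thousand columns.
```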
Beyond these traditional popular approaches, two deep learning algorithms are explored as well: the fully connected network and the convolutional recurrent neural network . Different from traditional machine learning algorithms, deep learning does not require manual feature extraction or engineering: deep neural networks capture the desired features through the back-propagation optimization process. Besides, these deep neural networks have designs specific to temporal or spatial inputs; it is assumed that when the internal pattern of the dataset matches the network structure, the model will perform better.","title":"3.2 Change detection"},{"location":"PROJ-LANDSTATS/#33-performance-metric","text":"Once the different machine learning models are trained for their respective module, a comparison has to be made on the test set to evaluate their performance. This is performed with the help of dedicated metrics.","title":"3.3 Performance metric"},{"location":"PROJ-LANDSTATS/#34-training-and-testing-plan","text":"As introduced in Section 1.3 , the 348'474 tiles with temporal-spatial information are selected for training. An 80%-20% split is applied to the selected tiles to create the train set and the test set respectively. The Adam optimizer and a multi-step learning rate scheduler are deployed for better convergence. For the temporal-spatial module, metrics for an ablation study on the descriptors and the descriptor importances are first computed. The descriptor importance is taken from the XGBoost simulations. The ablation study is performed with the logistic regression and consists of training the model with: 8 neighbors and time activated, \"baseline\" 4 neighbors (northern, western, southern, eastern neighbors) no spatial neighbors, \"space deactivate\" no temporal neighbors, \"time deactivate\" Then, the baseline configuration is used to train the traditional algorithms and the deep learning ones.
Metrics are compared and the best performing models are kept for the integration module. Finally, the performances of several configurations are compared for the integration module: direct outputs of the image-level module direct outputs of the best performing temporal-spatial module outputs of the best performing temporal-spatial module, followed by RF training for the integration module outputs of the best performing temporal-spatial module, followed by FCN training for the integration module The extra information gained from the temporal-spatial module is studied by comparison with the image-level performance only. The image-level data contain the multi-class classification prediction and its confidence, from which the change probability can be calculated. Therefore, the weighted metric can also be applied at the image level only. Then, RF and FCN are tested for the integration module, which combines the various types of information sources.","title":"3.4 Training and testing plan"},{"location":"PROJ-LANDSTATS/#4-experiments","text":"The Experiments section covers the results obtained when performing the planned simulations for the temporal-spatial module and the integration module.","title":"4. Experiments"},{"location":"PROJ-LANDSTATS/#41-temporal-spatial-module","text":"","title":"4.1 Temporal-spatial module"},{"location":"PROJ-LANDSTATS/#42-integration-module","text":"Table 5 compares the performance of the FCN and of the image-level module alone to several configurations of the integration module. Table 5: Performance metrics for the integration model in combination with a spatial-temporal model.
Model Weighted Metric Raw Metric Balanced Accuracy Recall Missed Changes Missed Changes Ratio Missed Weighted Changes Missed Weighted Changes Ratio Automatized Points Automatized Capacity
FCN 0.259 0.210 0.656 0.958 322 0.042 15563 0.029 14490 0.363
image-level 0.374 0.305 0.737 0.958 323 0.042 15735 0.029 20895 0.524
LR + RF 0.434 0.372 0.752 0.969 241 0.031 10810 0.020 21567 0.541
FCN + RF 0.438 0.373 0.757 0.968 250 0.032 11277 0.021 22010 0.552
FCN + FCN 0.438 0.376 0.750 0.970 229 0.030 9902 0.018 21312 0.534
LR + FCN 0.423 0.354 0.745 0.967 255 0.033 10993 0.020 21074 0.528
The study demonstrates that the image level contains more information related to change detection than the temporal-spatial neighbors ( FCN row in Table 5). However, the temporal-spatial module brings a performance improvement when combined with the image-level data, reaching 0.438 in the weighted metric ( FCN+RF and FCN+FCN ). Regarding the composition of different models for the two modules, the FCN proves to be the best one for the temporal-spatial module, while RF and FCN have similar performance in the integration module. The choice of integration module could be influenced by the data format of other potential modules. This will be further studied by the FSO team.","title":"4.2 Integration module"},{"location":"PROJ-LANDSTATS/#5-conclusion-and-outlook","text":"This project studied the potential of historical and spatial neighbor data in the change detection task for the fifth interpretation process of the area statistics of the FSO. For the evaluation of this specific project, a weighted metric was defined by the FSO team. The temporal-spatial information proved not to be as powerful as the image-level information, which directly detects change within visual data. However, an efficient prototype was built, with a 6% improvement in the weighted metric when combining the temporal-spatial module and the image-level module.
It is validated that the integration of modules with different information sources can help to enhance the final capacity of the entire workflow. The next research step of the project would be to modify the current implementation of the ConvRNN: if the numerical relationship is removed from the synthetic image data, the ConvRNN should theoretically have a performance similar to the FCN's. Also, a CNN is worth trying, to validate whether the temporal pattern matters in this dataset. Besides, by changing the size of the synthetic images, we can figure out how the number of neighbouring tiles impacts the model performance.","title":"5. Conclusion and outlook"},{"location":"PROJ-LANDSTATS/#appendix","text":"","title":"Appendix"},{"location":"PROJ-LANDSTATS/#a1-classes-of-land-cover-and-land-use","text":"Figure 20: Land Cover classification labels. Figure 21: Land Use classification labels.","title":"A1. Classes of land cover and land use"},{"location":"PROJ-LANDSTATS/#a2-probability-analysis-of-different-periods","text":"","title":"A2. Probability analysis of different periods"},{"location":"PROJ-LANDSTATS/#a3-alternative-version-of-section-32","text":"","title":"A3 Alternative version of Section 3.2"},{"location":"PROJ-QALIDAR/","text":"Cross-generational change detection in classified LiDAR point clouds for a semi-automated quality control \u00b6 Nicolas M\u00fcnger (Uzufly) - Gwena\u00eblle Salamin (ExoLabs) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo) Proposed by the Federal Office of Topography swisstopo - PROJ-QALIDAR September 2023 to February 2024 - Published in March 2024 All scripts are available on GitHub . Abstract : The acquisition of LiDAR data has become standard practice at the national and cantonal levels in Switzerland during recent years. In 2024, the Federal Office of Topography (swisstopo) will complete a comprehensive six-year campaign covering the whole Swiss territory. The point clouds produced are classified post-acquisition, i.e.
each point is attributed to a certain category, such as "building" or "vegetation". Despite the global control performed by the providers, local inconsistencies in the classification persist. To ensure the quality of a Swiss-wide product, swisstopo invests extensive time in the control of the classification. This project aims to highlight changes in a new point cloud compared to a previous generation acting as reference. We propose here a method where a common grid is defined for the two generations of point clouds and their information is converted into voxels, summarizing the distribution of classes and comparable one-to-one. This method highlights zones of change by clustering the concerned voxels. The experts of the swisstopo LiDAR team declared themselves satisfied with the precision of the method. 1. Introduction \u00b6 The usage of light detection and ranging (LiDAR) technology has seen a large increase in the field of geo-surveying over recent years 1 . Data obtained from airborne acquisition provide rich 3D information about land cover in the form of a point cloud. These point clouds are typically processed after acquisition in order to assign a class to each point, as displayed in Figure 1. Figure 1: View of the Rhine Falls in the classified point cloud of the product swissSURFACE3D . To conduct its LiDAR surveys, the Federal Office of Topography (swisstopo) mandates external companies in charge of the airborne acquisition and of the classification in post-processing. The process of verifying the quality of the supplied data is tedious, with an estimated duration of 42 working hours for the verification of an area of 216 km 2 . A significant portion of this verification process is dedicated to ensuring the precision of the point classification. With the first generation of the LiDAR product 2 nearing completion, swisstopo is keen to leverage the considerable time and effort invested to facilitate the quality assessment of the next generation.
In this context, swisstopo's LiDAR development team contacted the STDL to develop a change detection method. As reviewed by Stilla & Xu (2023), change detection in point clouds has already been explored in numerous ways 3 . The majority of the research focuses, however, on changes in geometry. Deep learning solutions are being extensively researched to apply the advancements in this field to change detection in point clouds 4 . However, to the best of our knowledge, no solution currently addresses the problem of change detection in the classification of two point clouds. Most challenges of change detection in point clouds come from the unstructured nature of LiDAR data, which makes it impossible to reproduce the same result across acquisitions. Therefore, the production of ground truth and the application of deep learning to point clouds of different generations can be challenging. To overcome this, data discretization by voxelization has already been studied in several works on change detection in point clouds, with promising results 5 6 .

The goal of this project is to create a mapping of the changes observed between two generations of point clouds for a common scene, with an emphasis on classification changes. The proposed method creates a voxel map for the reference point cloud and for the new point cloud, whose classification was not controlled as thoroughly. By using the same voxel grid for both generations, direct comparisons can be performed on the occupancy of voxels by the previous and the new classes. Based on the domain expert's criteria, an urgency level is assigned to each voxel: non-problematic, grey zone or problematic. Problematic voxels are then clustered into high-priority areas. The summarized process is displayed in Figure 2.

Figure 2: Overview of the workflow for change detection and assignment of a criticality level to the detected changes.

2. Data ¶

2.1 LiDAR point clouds ¶

The algorithm required two temporally distinct acquisitions of the same area. Throughout the document, we refer to the first point cloud as v.1 . It served as reference data and is assumed to have a properly controlled classification. The subsequent point cloud, representing a new generation, is referred to as v.2 .

2.1.1 Choice of the LiDAR products ¶

The swissSURFACE3D product was extensively controlled by swisstopo's LiDAR team before its publication. Therefore, its classification has the quality expected by the domain expert. It acted as the v.1 , i.e. as the reference generation. We thus needed to find a newer acquisition fulfilling the following conditions:

Availability in swissSURFACE3D : At the time of development, the initial classification control was not finished for the whole of Switzerland. Point clouds were still in production at swisstopo for some cantons.

Gap in acquisition dates : To have an exhaustive and representative panel of changes, the v.2 point cloud had to be acquired at least two years apart from the v.1 data.

Difference in density : The method aimed for robustness against changes in density, considering that advancements in technology lead to higher-quality sensors and an increased point density per square meter.

Larger number of classes : Anticipating that swisstopo's next iteration would introduce five additional classes, we required the v.2 point cloud to have more classes than the v.1 point cloud. Furthermore, to be comparable, the v.2 classes needed to be subsets of the ones present in v.1 .

Typology of the point cloud : The selected point clouds needed to be acquired, at least partially, in an urban environment, where classification errors are more prevalent and, consequently, more control time is allocated.

For our v.2 , we used the point cloud produced by the State of Neuchâtel, which covers the area within its cantonal borders.
The characteristics of each point cloud are summarized in Table 1.

Table 1: Characteristics of swissSURFACE3D, used as v.1 , and the LiDAR product of the State of Neuchâtel, used as v.2 .

|                       | swissSURFACE3D | Neuchâtel   |
| Acquisition period    | 2018-19        | 2022        |
| Planimetric precision | 20 cm          | 10 cm       |
| Altimetric precision  | 10 cm          | 5 cm        |
| Spatial density       | ~15-20 pts/m²  | ~100 pts/m² |
| Number of classes     | 7              | 21          |
| Dimension of one tile | 1000 x 1000 m  | 500 x 500 m |
| Provided file format  | LAZ            | LAZ         |

2.1.2 Area of interest ¶

The delimitation of the LiDAR tiles used in this project is shown in Figure 3. We chose to work with tiles of the dimensions of the Neuchâtel data, i.e. 500 x 500 m. The tiles are designated by letters, to which we refer in the remainder of this document. The tiles are located in the region of Le Locle. The zone covers an urban area, where quality control is the most time-consuming. It also features a variety of land covers, such as a large band of dense forest and agricultural fields.

Figure 3: Tiles used for the development of our method: A for a result control during hyperparameter tuning, B for the choice of the voxel size and C for a control of the results by the domain expert.

2.2 Annotations by the domain expert ¶

To understand the expected result, the domain expert controlled the v.2 point cloud in the region of Le Locle as if it were a new acquisition. A perimeter of around 1.2 km² was controlled. The problematic zones were each defined by a polygon with a textual description, as well as the current and the correct class as numbers. A sample of the annotations is shown in Figure 4.

Figure 4: Controlled area (left) and examples of control annotations within the detail zone, with the reported error as color and with the original and the corrected class as labels (right).

This provided us with annotations of areas where the point cloud data were incorrect. The annotations were used to calibrate the change detection.
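As a minimal illustration of how such classified tiles can be inspected, the snippet below counts the points per class in a classification array. This is a sketch of ours, not code from the project: the helper name is hypothetical, and the commented-out `laspy` call (the library used later in this report) shows how a real LAS/LAZ tile would feed into it.

```python
import numpy as np

def class_histogram(classification: np.ndarray) -> dict:
    """Return {class_id: point_count} from a LAS 'classification' array."""
    classes, counts = np.unique(classification, return_counts=True)
    return dict(zip(classes.tolist(), counts.tolist()))

# With laspy, a real tile would be loaded as follows (path is a placeholder):
#   las = laspy.read("tile.laz")
#   histogram = class_histogram(np.asarray(las.classification))
```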
It must be noted that this control was performed without referring to the v.1 point cloud. In this case, we assume that the v.1 contains no classification error and that the annotated areas therefore represent classification changes between the two generations.

3. Method ¶

3.1 Correspondence between classes ¶

To compare the classes between generations, we needed to establish their correspondence. We selected the classes from the swisstopo point cloud, i.e. the reference generation, as the common ground. Any class added in the new generation must come from a subdivision of an existing class, as explained in the requirements for the v.2 point cloud . This is the case with the Neuchâtel data. Each class from the Neuchâtel data was mapped to an overarching class from the reference generation, in accordance with the inputs from the domain expert. The details of this mapping are given in Table 2. Notice that the class Ground level noise received the label -1. This means that this class is not treated by our algorithm and every such point is removed from the point cloud. This was agreed with the domain expert, as this class is very different from the class Low Point (Noise) and does not provide any useful information.

Table 2: Mapping between the v.2 and v.1 point clouds. The field "Original ID" provides the class number for v.2 , the class name corresponds to the class description from the metadata, and the corresponding ID shows the class number from v.1 to which it is assigned.
| Original ID | Class name                        | Corresponding ID |
| 1           | Unclassified                      | 1                |
| 2           | Ground                            | 2                |
| 3           | Low vegetation                    | 3                |
| 4           | Medium vegetation                 | 3                |
| 5           | High vegetation                   | 3                |
| 6           | Building roofs                    | 6                |
| 7           | Low Point (Noise)                 | 7                |
| 9           | Water                             | 9                |
| 11          | Piles, heaps (natural materials)  | 1                |
| 14          | Cables                            | 1                |
| 15          | Masts, antennas                   | 1                |
| 17          | Bridges                           | 17               |
| 18          | Ground level noise                | -1               |
| 19          | Street lights                     | 1                |
| 21          | Cars                              | 1                |
| 22          | Building facades                  | 6                |
| 25          | Cranes, trains, temporary objects | 1                |
| 26          | Roof structures                   | 6                |
| 29          | Walls                             | 1                |
| 31          | Additional ground points          | 2                |
| 41          | Water (synthetic points)          | 9                |

Figure 5: Reallocation of points from the v.2 classes (left) to the v.1 classes (right) for tile B, with the class numbers from the second generation indicated in parentheses.

As visible in Figure 5, seven classes were reassigned to class 1 Undefined . However, they represented a small part of the point cloud. The most important classes were ground , composed in equal parts of ground and additional ground points, vegetation , with mainly points in high vegetation, and building , with mainly points on building roofs.

3.2 Voxelization of the point clouds ¶

The method relies on the voxelization of both point clouds. As defined in Xu et al. (2021) 7 , voxels are geometries in 3D space, defined on a regular 3D grid. They can be seen as the 3D equivalent to pixels in 2D. Figure 6 8 shows how a voxel grid is defined over a point cloud.

Figure 6: Representation of a point cloud (a) and its voxel grid (b), courtesy of Shi et al. (2018).

3.2.1 Preprocessing of LiDAR tiles ¶

It must be noted that the approach operated under the assumption that both point clouds were already projected in the same reference frame and that the 3D positions of the points were accurate. We did not perform any point-set registration as part of the workflow, as the method focuses on finding classification errors in the point cloud. Before creating the voxels, the tiles were cropped to the size of the generation with the smallest tiling grid.
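The class correspondence of Table 2 amounts to a simple lookup table. The sketch below expresses it as a dictionary and applies it to an array of v.2 class IDs, dropping ground-level noise (-1) as described above; the function and variable names are ours, not taken from the project's code.

```python
import numpy as np

# Table 2: v.2 (Neuchâtel) class IDs -> v.1 (swissSURFACE3D) class IDs.
# -1 flags "Ground level noise", which is removed before any further processing.
V2_TO_V1 = {
    1: 1, 2: 2, 3: 3, 4: 3, 5: 3, 6: 6, 7: 7, 9: 9,
    11: 1, 14: 1, 15: 1, 17: 17, 18: -1, 19: 1, 21: 1,
    22: 6, 25: 1, 26: 6, 29: 1, 31: 2, 41: 9,
}

def remap_classes(v2_classes: np.ndarray) -> np.ndarray:
    """Remap v.2 class IDs to their v.1 equivalents and drop ground-level noise."""
    mapped = np.array([V2_TO_V1[int(c)] for c in v2_classes])
    return mapped[mapped != -1]
```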
Here, the v.1 tiles were cropped from 1000 x 1000 m to the dimensions of v.2 , i.e. 500 x 500 m. A v.2 tile corresponds exactly to one quarter of a v.1 tile, so no additional operations were needed.

3.2.3 Voxelization process ¶

In the interest of keeping our solution free of charge for users, and to have greater flexibility in the voxelization process, we chose to develop our own solution rather than use pre-existing tools. We used the Python libraries laspy and pandas . Given a point cloud provided as a LAS or LAZ file, our implementation returns a table with one row per voxel. The voxels are identified by their center coordinates. In addition, the columns provide the number of points of each class contained within the voxel for each generation. Figure 7 shows a visual representation of the voxelization process for one voxel element.

Figure 7: Summarized process for the creation of one voxel in the v.1 (left) and the v.2 (right) generation, from the point cloud to the class distribution as a vector. The class distribution is saved for both generations in a table.

3.3 Determination of the voxel size ¶

The voxels must be sized to efficiently locate areas of change without being sensitive to negligible local variations in point location and density. We assumed that although a point cloud changes between two generations, the vast majority of its features would remain consistent on a tile of 500 x 500 m. Following this hypothesis, we evaluated how the voxel size influences the proportion of voxels not filled with the same classes in two separate generations. We call this situation a "categorical change". A visual example is given in Figure 8.

Figure 8: Example of a situation with no categorical change (left) and a second situation with a categorical change (right).

When the proportion of voxels presenting a categorical change was calculated for different voxel sizes, it rose drastically around a size of 1.5 m, as visible in Figure 9.
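The categorical-change proportion described above can be sketched as follows. This is an illustrative reimplementation of ours (names and structure are not from the project's code), assuming points are given as coordinate arrays with per-point class labels: each point is binned into a voxel by integer division of its coordinates, and a voxel counts as changed when the set of classes it contains differs between the two generations.

```python
import numpy as np

def categorical_change_rate(pts_v1, cls_v1, pts_v2, cls_v2, voxel_size):
    """Fraction of occupied voxels whose set of classes differs between generations.

    pts_*: (N, 3) float arrays of point coordinates.
    cls_*: (N,) integer arrays of class IDs.
    """
    def voxel_classes(pts, cls):
        # Bin each point into its voxel index on the common grid.
        idx = np.floor(np.asarray(pts) / voxel_size).astype(int)
        out = {}
        for key, c in zip(map(tuple, idx), cls):
            out.setdefault(key, set()).add(int(c))
        return out

    a = voxel_classes(pts_v1, cls_v1)
    b = voxel_classes(pts_v2, cls_v2)
    keys = set(a) | set(b)  # all voxels occupied in at least one generation
    changed = sum(a.get(k) != b.get(k) for k in keys)
    return changed / len(keys)
```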
We postulated that this is the minimum voxel size that allows observing changes without interference from the noisy nature of point clouds.

Figure 9: Proportion of categorical changes for different voxel sizes in tile B. The horizontal axis is the voxel size. The vertical axis represents the percentage of voxels experiencing a categorical change between the two generations.

For the rest of the development process, square voxels of 1.5 m are used. However, the voxel width and height can be modified in the scripts if desired.

3.4 Criticality tree ¶

The algorithm must not only detect changes, but also assign them a criticality level. We translated the domain expert's criteria into a decision tree, which sorts the voxels into different criticality levels for control. The decision tree went through several iterations, in dialogue with the domain expert. Figure 10 provides the final architecture of the tree.

Figure 10: Decision tree used to classify the voxels based on the different types of changes and their criticality.

The decision tree classifies the voxels into three buckets of criticality: "non-problematic", "grey zone" and "problematic". "Non-problematic" voxels show little to no change; no control is necessary. "Grey zone" voxels undergo a change that is coherent with their neighbors or that is due to the presence of class 1, i.e. the Undefined class, in the new generation. The detected changes should not be problematic, but might be interesting to verify in the case of a very thorough control. "Problematic" voxels are the ones with important changes, such as strong variations in the class proportions or changes not reflected in their neighborhood. They are relevant to control. Let us note that although only three final buckets are output, we preserved an individual number for each outgoing branch of the criticality tree, as they provide more detailed information. These numbers are referred to as "criticality numbers".
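To make the bucket idea concrete, here is a heavily simplified sketch of ours of how a voxel's two class-count vectors could be sorted into the three buckets. It covers only a few branches (a single-identical-class check, a noise check, and a class-distribution similarity check); the real tree of Figure 10 has 13 outcomes, and the noise index and "grey zone" shortcut here are assumptions of this sketch, not the project's logic.

```python
import numpy as np

NOISE = 7  # index of class 7, "Low Point (Noise)" -- an assumption of this sketch

def criticality_bucket(v1_counts: np.ndarray, v2_counts: np.ndarray,
                       threshold: float = 0.8) -> str:
    """Simplified three-bucket sorting for one voxel (per-class point counts)."""
    # Single, identical class in both generations -> non-problematic.
    if (np.count_nonzero(v1_counts) == 1 and np.count_nonzero(v2_counts) == 1
            and np.argmax(v1_counts) == np.argmax(v2_counts)):
        return "non-problematic"
    # Any noise in the new generation -> problematic.
    if v2_counts[NOISE] > 0:
        return "problematic"
    # Voxel occupied in only one generation: treated as problematic in this sketch.
    if not v1_counts.any() or not v2_counts.any():
        return "problematic"
    # Otherwise compare the class distributions with cosine similarity.
    cos = float(v1_counts @ v2_counts
                / (np.linalg.norm(v1_counts) * np.linalg.norm(v2_counts)))
    return "non-problematic" if cos >= threshold else "grey zone"
```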
The decisions of the criticality tree are divided into two major categories. Some are based on qualitative criteria, which are by definition true or false. Others, however, depend on thresholds that had to be defined.

3.4.1 Qualitative decisions ¶

Decision A: Is there only one class in both generations and is it the same? Every voxel that contains a single, common class in both generations is automatically identified as non-problematic.

Decision B: Is noise absent from the new generation? Any noise present is possibly a wrongly classified object and necessitates a control. Any voxel containing noise in the new generation is directed to the "problematic" bucket.

Decision G: Is the change a case of complete appearance or disappearance of a voxel? If the voxel is only present in one generation, it is part of a new or disappearing geometry that might or might not be problematic, depending on decisions H and J. If the voxel is present in both generations, we are facing a change in the class distribution due to new classes in it. Decision I then compares the voxel with its neighbors to determine whether it is problematic.

Decision J: Is it the specific case of a building facade or vegetation? Due to the higher point density in the v.2 point cloud, point proportions may change in voxels compared to the v.1 point cloud, even though the geometry already existed. We particularly noticed this on building facades and under dense trees, as shown in the example given in Figure 11. To avoid classifying these detections as problematic, a voxel with an appearance of points of the class building or vegetation is not problematic if it is located under a non-problematic voxel containing points of the same class.

Figure 11: Example of a non-problematic appearance of points in the v.2 point cloud due to the difference in density between the two generations.
3.4.2 Threshold-based decisions ¶

The various thresholds were set iteratively by visualizing the results on tile A and comparing them with the expert's annotations described in Section 2.2 . Once the global result seemed satisfactory, we assessed the criticality label for a subset of voxels. Eight voxels were selected randomly for each criticality number. Given that there are 13 possible outcomes, 104 voxels were evaluated. A first evaluation was performed on tile A without the input of the domain expert; it allowed for hyperparameter tuning. A second evaluation was conducted by the domain expert on tile C, after which he declared that no further adjustment of the thresholds was necessary.

Cosine similarity

Decisions C, D and E require evaluating the similarity between the distributions of the previous and the new classes occupying a voxel. We thus sought a metric adapted to comparing the two distributions. Many ways exist to measure the similarity between two distributions 9 . We settled on the well-known cosine similarity. Given two vectors X and Y , it is defined as: \(\text{Cosine Similarity}(\mathbf{X}, \mathbf{Y}) = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\| \|\mathbf{Y}\|}\) This metric measures the angle between two vectors; the magnitude of the vectors has no influence on the result. Therefore, the measure is unaffected by the density of the point clouds. The more the two vectors point in the same direction, the closer the metric is to one. A null cosine similarity corresponds to voxels where none of the classes present in the previous generation match those of the new one. One limitation of the cosine similarity is its requirement for both vectors to be non-zero. For cases where a voxel is occupied in only one generation, an arbitrary cosine similarity of -1 is set.

Decision C: Do the class proportions stay similar while the classes themselves do not change?
We assessed whether the class proportions stay similar between generations. A threshold of 0.8 is set on the cosine similarity. flowchart LR A[Prev. gen.
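The cosine similarity defined above, including the -1 convention for voxels occupied in only one generation, can be sketched in a few lines (the function name and the explicit `THRESHOLD` constant are ours):

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity between two class-distribution vectors.

    Returns -1.0 when one vector is all zeros, i.e. the voxel is occupied
    in only one generation, as described in the text.
    """
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0.0 or ny == 0.0:
        return -1.0
    return float(x @ y / (nx * ny))

# Threshold on the similarity used for the class-proportion check (decision C).
THRESHOLD = 0.8
```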