
Machine Learning Component Specification v1.0

Revision History

| Author(s) | Date | Description | Version |
|-----------|------|-------------|---------|
| Markus Borg | 2021-09-21 | Initial template. | 0.1 |
| Markus Borg | 2021-09-27 | Toward a complete draft. | 0.2 |
| Markus Borg | 2022-06-16 | Complete draft. | 0.9 |
| Markus Borg, Kasper Socha, Jens Henriksson | 2022-06-16 | Beta Release - Ready for peer-review | 0.99 |
| Markus Borg, Kasper Socha | 2022-12-19 | Production Release - Peer-reviewed by Software Quality Journal according to Issue #27. | 1.0 |

1 Introduction

This document contains the machine learning (ML) component specification for SMIRK – a pedestrian automatic emergency braking (PAEB) system. SMIRK is an advanced driver-assistance system (ADAS), intended to act as one of several systems supporting the driver in the dynamic driving task, i.e., all the real-time operational and tactical functions required to operate a vehicle in on-road traffic. SMIRK, including the accompanying safety case, is developed with full transparency under an open-source software (OSS) license. We develop SMIRK as a demonstrator in a simulated environment provided by ESI Pro-SiVIC.

1.1 Purpose

This document describes the ML-based pedestrian recognition component used in SMIRK. Two established third-party OSS libraries are important constituents. First, the document describes how the object detection architecture YOLOv5 by Ultralytics is used and trained for the SMIRK operational design domain (ODD). Second, it introduces how the safety cage architecture is realized using out-of-distribution (OOD) detection provided by SeldonIO's Alibi Detect. Finally, we provide the ML model learning argument patterns in line with the Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS).

1.2 Document Conventions

The number of academic publications in the list of references is unconventional for technical project documentation. This is a conscious decision. SMIRK is developed as a prototype in the context of a research project with limited resources. As part of our research, we aim to integrate (sometimes scattered) pieces from the state-of-the-art literature. Synthesis is a fundamental tool in our research, and we seek novel insights while focusing on refinement and integration. We actively choose to rely on reuse of design decisions from previously peer-reviewed publications. Building on previous work, i.e., standing on the shoulders of others, is a core concept in research that allows validation of previous work, incl. previously proposed requirements. When available, and unless open access publication models have been used, links to academic publications point to preprints on open repositories such as arXiv rather than peer-reviewed revisions behind paywalls.

Headings with a reference in brackets [X] refer to artifacts prescribed by the AMLAS process (Guidance on the Assurance of Machine Learning in Autonomous Systems). Due to formatting limitations in GitHub MarkDown, all figure and table captions appear in italic font to distinguish them from the running text. Explanatory text copied verbatim from public documents is highlighted using the quote formatting available in GitHub Markdown.

1.3 Glossary

  • AMLAS: Guidance on the Assurance of Machine Learning in Autonomous Systems
  • DNN: Deep Neural Network
  • ML: Machine Learning
  • ODD: Operational Design Domain
  • OOD: Out-Of-Distribution
  • OSS: Open Source Software
  • YOLO: You Only Look Once

1.4 Intended Audience and Reading Suggestions

The section is organized into internal stakeholders, i.e., roles that are directly involved in the SMIRK development, and external stakeholders, i.e., roles that are indirectly linked but contribute significantly to the successful completion of the SMIRK project. External stakeholders also include the ML safety community at large. Note that AMLAS prescribes a split between testers that are involved during the development and testers that are "sufficiently independent from the development activities." We refer to these roles as internal testers and independent testers, respectively.

Internal stakeholders

External stakeholders

1.6 References

The references are organized into 1) internal SMIRK documentation, 2) SMIRK data sets, 3) peer-reviewed publications, and 4) gray literature and white papers. When a reference listed under category 2) or 3) is used to motivate a design decision or a specific requirement, there is an explicit reference in the running text. Note that this document is self-contained; the references are provided for traceability to the underlying design rationales. Interested readers are referred to the discussions in the original sources.

Internal SMIRK documentation

SMIRK data sets

  • Development Data [N]
  • Internal Test Data [O]
  • Verification Data [P]

Peer-reviewed publications

Gray literature and white papers

2 ML Component Description [D]

The SMIRK pedestrian recognition component comprises, among other things, two ML-based constituents: a pedestrian detector and an anomaly detector. Further details are available in the Logical View of the system architecture. In this section, we describe the pedestrian detector. The anomaly detector is described in Section 4.

The SMIRK pedestrian detector uses the third-party OSS framework YOLOv5 by Ultralytics, implemented using PyTorch. YOLO is an established real-time object detection algorithm that was originally released by Redmon et al. (2016). The first version of YOLO introduced a novel object detection process that uses a single deep neural network (DNN) to perform both prediction of bounding boxes around objects and classification at once. Compared to the alternatives, YOLO was heavily optimized for fast inference to support real-time applications. A fundamental concept of YOLO is that the algorithm considers each image only once, hence its name "You Only Look Once." YOLO is referred to as a single-stage object detector. While there have been several versions of YOLO (and the original authors maintained them until v3), the fundamental ideas of YOLO remain the same across versions, including YOLOv5 used in SMIRK.

YOLO divides each input image into a square grid of individual cells. Each cell predicts bounding boxes capturing potential objects and provides confidence scores for each box. Furthermore, YOLO does a class prediction for objects in the bounding boxes. Note that for the SMIRK MVP, the only class we predict is pedestrian. Relying on the Intersection over Union method for evaluating bounding boxes, YOLO eliminates redundant bounding boxes. The final output from YOLO consists of unique bounding boxes with class predictions. Further details are available in the original paper by Redmon et al. (2016).
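To make the bounding-box elimination step concrete, the sketch below computes Intersection over Union for two boxes given in the center-based xywh format. The function name and the example boxes are illustrative choices for this document and are not part of the YOLOv5 code base.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x_center, y_center, width, height)."""
    # Convert from center format to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Area of the overlapping region (zero if the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0 else 0.0


# Two heavily overlapping pedestrian candidates yield a high IoU; one box would be discarded.
print(iou((0.50, 0.50, 0.20, 0.40), (0.52, 0.50, 0.20, 0.40)))
```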

The pedestrian recognition component in SMIRK uses the YOLOv5 architecture without any modifications. This paragraph presents a high-level description of the model architecture and the key technical details. We refer the interested reader to further details provided by Rajput (2020) and the OSS repository on GitHub. YOLOv5 provides several alternative DNN architectures. To enable real-time performance for SMIRK, we select YOLOv5s with 191 layers and about 7.5 million parameters.

Figure 1 shows the speed/accuracy tradeoffs for different YOLOv5 architectures with YOLOv5s depicted in orange. The results are provided by Ultralytics including instructions for reproduction. On the y-axis, COCO AP val denotes the mAP@0.5:0.95 metric measured on the 5,000-image COCO val2017 dataset over various inference sizes from 256 to 1,536. On the x-axis, GPU Speed measures average inference time per image on the COCO val2017 dataset using an AWS p3.2xlarge V100 instance at batch-size 32. The curve EfficientDet illustrates results from Google AutoML at batch size 8.

YOLOv5-tradeoffs

Figure 1: Speed/accuracy tradeoffs for different YOLOv5 architectures. (Image source: Ultralytics)

As a single-stage object detector, YOLOv5s consists of three core parts: 1) the model backbone, 2) the model neck, and 3) the model head. The model backbone extracts important features from input images. The model neck generates so-called "feature pyramids" using PANet (Liu et al., 2018) that support generalization to different sizes and scales. The model head performs the detection task, i.e., it generates the final output vectors with bounding boxes and class probabilities.

In SMIRK, we use the default configurations proposed in YOLOv5s regarding activation, optimization, and cost functions. As activation functions, YOLOv5s uses Leaky ReLU in the hidden layers and the sigmoid function in the final layer. We use the default optimization function in YOLOv5s, i.e., stochastic gradient descent. The default cost function in YOLOv5s is binary cross-entropy with logits loss as provided in PyTorch, which we also use.
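As an illustration of how a pretrained YOLOv5s model can be loaded and queried (not the SMIRK integration code itself), the following sketch uses the PyTorch Hub entry point documented by Ultralytics; the image path is a placeholder.

```python
import torch

# Load the small YOLOv5 variant with pretrained COCO weights via PyTorch Hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference on a single image (placeholder path) and inspect the detections.
results = model("example_frame.png")
results.print()               # summary of detected classes and confidences
detections = results.xywh[0]  # boxes in xywh format with confidence and class per detection
```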

3 Model Development Log [U]

This section describes how the YOLOv5s model has been trained for the SMIRK MVP. We followed the general process presented by Ultralytics for training on custom data.

First, we manually prepared two SMIRK datasets to match the input format of YOLOv5. In this step, we prepared the development data [N] and the internal test data [O] according to Ultralytics' instructions. We created a dataset.yaml with the paths to the two data sets and specified that we train YOLOv5 for a single class, i.e., pedestrians. The data sets were already annotated using ESI Pro-SiVIC, thus we only needed to export the labels to the YOLO format with one txt-file per image. Finally, we organized the individual files (images and labels) according to the YOLOv5 instructions (a minimal conversion sketch is provided after the list below). More specifically, each label file contains the following information:

  • One row per object
  • Each row contains class, x_center, y_center, width, and height.
  • Box coordinates are stored in normalized xywh format (from 0 - 1).
  • Class numbers are zero-indexed, i.e., they start from 0.
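To make the label format and the dataset.yaml described above concrete, the sketch below converts a hypothetical pixel-space bounding box into a normalized YOLO label row and writes a minimal single-class dataset.yaml. All paths, the image resolution, and the box coordinates are illustrative placeholders, not the actual SMIRK data layout.

```python
from pathlib import Path

IMG_W, IMG_H = 752, 480  # illustrative image resolution (placeholder)

def to_yolo_row(cls_id, x_min, y_min, x_max, y_max):
    """Convert a pixel-space bounding box to a normalized YOLO row: class x_center y_center width height."""
    x_center = (x_min + x_max) / 2 / IMG_W
    y_center = (y_min + y_max) / 2 / IMG_H
    width = (x_max - x_min) / IMG_W
    height = (y_max - y_min) / IMG_H
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# One txt-file per image, one row per object; pedestrian is class 0.
Path("labels").mkdir(exist_ok=True)
Path("labels/frame_000001.txt").write_text(to_yolo_row(0, 300, 180, 340, 290) + "\n")

# Minimal dataset.yaml pointing YOLOv5 to the development and internal test data (placeholder paths).
Path("dataset.yaml").write_text(
    "train: data/development/images\n"
    "val: data/internal_test/images\n"
    "nc: 1\n"
    "names: ['pedestrian']\n"
)
```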

Second, we trained the YOLOv5s model using the development data (as specified in dataset.yaml) from the pretrained weights in yolov5s.pt. The model was trained for 3 epochs with a batch-size of 16. The Internal Test Results [X] provides evidence that the ML model satisfies the requirements on the internal test data.
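As an illustration, the sketch below shows how such a training run can be launched with the train.py script from a local clone of the YOLOv5 repository. It is wrapped in Python for consistency with the other examples; the image size and the repository path are illustrative assumptions, not the recorded SMIRK settings.

```python
import subprocess

# Launch YOLOv5 training from a local clone of the Ultralytics repository (sketch).
subprocess.run(
    [
        "python", "train.py",
        "--img", "640",            # illustrative input resolution
        "--batch", "16",           # batch size used for the SMIRK MVP
        "--epochs", "3",           # number of epochs used for the SMIRK MVP
        "--data", "dataset.yaml",  # paths to development and internal test data
        "--weights", "yolov5s.pt", # start from the pretrained YOLOv5s weights
    ],
    cwd="yolov5",  # placeholder path to the cloned repository
    check=True,
)
```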

The final pedestrian detection model, i.e., the ML model [V], has a size of about 14 MB.

4 Outlier Detection for the Safety Cage Architecture

SMIRK relies on the open-source third-party library Alibi Detect from Seldon for outlier detection. The outlier detection is part of the safety cage architecture. Alibi Detect is a Python library that provides several algorithms for outlier, adversarial, and drift detection for various types of data (Klaise, 2020). For SMIRK, we trained Alibi Detect's autoencoder for outlier detection, with three convolutional and deconvolutional layers for the encoder and decoder, respectively. The final OOD detection model is roughly 150 MB.

Figure 2 shows an overview of the DNN architecture of an autoencoder. An encoder and a decoder are trained jointly in two steps to minimize a reconstruction error. First, the autoencoder receives input data X and encodes it into a latent space of fewer dimensions. Second, the decoder tries to reconstruct the original data and produces output X'. An and Cho (2015) proposed using the reconstruction error from an autoencoder to identify input that differs from the training data. Intuitively, if inlier data is processed by the autoencoder, the difference between X and X' will be smaller than for outlier data. By carefully selecting a threshold, this approach can be used for OOD detection.
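The following minimal numpy sketch illustrates the reconstruction-error criterion with randomly generated stand-in data and an arbitrary threshold; it is only an illustration of the decision rule, not the SMIRK detector.

```python
import numpy as np

def is_outlier(x, x_reconstructed, threshold):
    """Flag an input as OOD if the mean squared reconstruction error exceeds the threshold."""
    mse = np.mean((x - x_reconstructed) ** 2)
    return mse > threshold

# Stand-in data: a reconstruction close to the input passes, a poor reconstruction is rejected.
x = np.random.rand(64, 64, 3)
print(is_outlier(x, x + np.random.normal(0, 0.01, x.shape), threshold=0.004))  # False (inlier)
print(is_outlier(x, np.random.rand(64, 64, 3), threshold=0.004))               # True (outlier)
```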

Autoencoder

Figure 2: Overview architecture of an autoencoder. Adapted from WikiUser:EugenioTL (CC BY-SA 4.0)

For SMIRK, we trained Alibi Detect's autoencoder for OOD detection on the training data subset of the development data. The encoder part is designed with three convolutional layers followed by a dense layer, resulting in a bottleneck that compresses the input by 96.66%. The latent dimension is limited to 1,024 variables to limit the GPU VRAM requirements. The reconstruction error from the autoencoder is measured as the mean squared error between the input and the reconstructed instance. The mean squared error is used for OOD detection by computing the reconstruction error and considering an input image as an outlier if the error surpasses a threshold theta. The threshold used for OOD detection in SMIRK is 0.004, roughly corresponding to the threshold that rejects a number of samples equal to the number of outliers in the validation set. As explained in the Erroneous Behaviour Log, the OOD detection is only active for objects at least 10 m away from the ego car, as the results for close-up images are highly unreliable. Furthermore, as the constrained SMIRK ODD ensures that only a single object appears in each scenario, the safety cage architecture applies the policy "once an anomaly, always an anomaly": objects that get rejected once remain anomalous no matter what subsequent frames might contain.
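The sketch below outlines how an autoencoder-based outlier detector can be set up with Alibi Detect's OutlierAE, in the spirit of the library's documentation. The input shape, layer sizes, and stand-in training data are illustrative assumptions and do not reproduce the exact SMIRK encoder; only the 0.004 threshold matches the value described above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense, Flatten, InputLayer, Reshape
from alibi_detect.od import OutlierAE

# Illustrative input shape and latent dimension; the SMIRK values differ as described above.
input_shape, latent_dim = (64, 64, 3), 1024

encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=input_shape),
    Conv2D(32, 4, strides=2, padding="same", activation=tf.nn.relu),
    Conv2D(64, 4, strides=2, padding="same", activation=tf.nn.relu),
    Conv2D(128, 4, strides=2, padding="same", activation=tf.nn.relu),
    Flatten(),
    Dense(latent_dim),  # bottleneck compressing the input
])

decoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(latent_dim,)),
    Dense(8 * 8 * 128, activation=tf.nn.relu),
    Reshape((8, 8, 128)),
    Conv2DTranspose(64, 4, strides=2, padding="same", activation=tf.nn.relu),
    Conv2DTranspose(32, 4, strides=2, padding="same", activation=tf.nn.relu),
    Conv2DTranspose(3, 4, strides=2, padding="same", activation="sigmoid"),
])

# Reconstruction-error threshold; 0.004 is the SMIRK value described above.
od = OutlierAE(threshold=0.004, encoder_net=encoder_net, decoder_net=decoder_net)

X_train = np.random.rand(16, *input_shape).astype(np.float32)  # stand-in training images
od.fit(X_train, epochs=1, verbose=False)

preds = od.predict(X_train[:4], return_instance_score=True)
print(preds["data"]["is_outlier"], preds["data"]["instance_score"])
```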

5 ML Model Learning Argument Pattern [W]

The figure below shows the ML model learning argument pattern using the Goal Structuring Notation (GSN). The pattern closely resembles the example provided in AMLAS, but adapts it to the specific SMIRK case.

GSN-ML-Model_Learning_Argument_Pattern

Figure 3: SMIRK ML Model Learning Argument Pattern.

The top claim (G4.1) in this argument pattern is that the development of the learned model [V] is sufficient. The strategy is to argue over the internal testing of the model and that the ML development was appropriate (S4.1), in the context of creating a valid model that meets practical constraints such as real-time performance and cost (C4.2). Sub-claim (G4.2) is that the ML model satisfies the ML safety requirements when using the internal test data [O]. We justify that the internal test results indicate that the ML model satisfies the ML safety requirements (J3.1) by presenting evidence from the internal test results [X].

Sub-claim G4.3 addresses the approach that was used when developing the model. The claim is supported by three claims regarding the type of model selected, the transfer learning process used, and the model parameters selected, respectively. First, G4.5 claims that the type of model is appropriate for the specified ML safety requirements and the other model constraints. Second, G4.6 claims that the process followed to allow transfer learning is appropriate. ML development processes, including transfer learning, are highly iterative; thus, rationales for development decisions must be recorded. Third, G4.7 claims that the parameters of the ML model are appropriately selected to tune performance toward the object detection task in the specified ODD. Rationales for any decisions in G4.4 and G4.5 are recorded in the model development log [U].

6 ML Learning Argument [Y]

SMIRK instantiates the ML Learning Argument through a subset of the artifacts listed in the Safety Assurance Table. This instantiation activity uses as input the ML Learning Argument Pattern [W], as well as the following artifacts from preceding AMLAS activities: