add document for configuration level resources (#96)
Signed-off-by: Johnson Shih <jshih@microsoft.com>
johnsonshih authored Oct 3, 2023
1 parent 4a33693 commit 5fd4328
Showing 3 changed files with 136 additions and 1 deletion.
4 changes: 4 additions & 0 deletions docs/architecture/agent-in-depth.md
@@ -28,6 +28,10 @@ To enable resource sharing, the Akri Agent creates and updates the `Instance.dev

For more detailed information, see the [in-depth resource sharing doc](resource-sharing-in-depth.md).

The Akri Agent also exposes all discovered resources at the Configuration level. Configuration-level resources are referred to by the name of the Configuration, so a workload can request resources by Configuration name without knowing the specific Instance IDs. Behind the scenes, the Agent selects which Instances to reserve.

For more detailed information about Configuration-level resources, see the [Configuration-level resources doc](configuration-level-resource-in-depth.md).

## Resource discovery

The Agent discovers resources via Discovery Handlers (DHs). A Discovery Handler is anything that implements the `DiscoveryHandler` service defined in [`discovery.proto`](https://github.com/project-akri/akri/blob/main/discovery-utils/proto/discovery.proto). In order to be utilized, a DH must register with the Agent, which hosts the `Registration` service defined in [`discovery.proto`](https://github.com/project-akri/akri/blob/main/discovery-utils/proto/discovery.proto). The Agent maintains a list of registered DHs and their connectivity statuses, each of which is `Waiting`, `Active`, or `Offline(Instant)`. When registered, a DH's status is `Waiting`. Once a Configuration requesting resources discovered by a DH is applied to the Akri-enabled cluster, the Agent creates a connection with the DH requested in the Configuration and sets the status of the DH to `Active`. If the Agent is unable to connect to a DH or loses its connection with one, the DH's status is set to `Offline(Instant)`. The `Instant` marks the time at which the DH became unresponsive. If the DH has been offline for more than 5 minutes, it is removed from the Agent's list of registered Discovery Handlers. If a Configuration is deleted, the Agent drops the connections it made with all DHs for that Configuration and marks the DHs' statuses as `Waiting`. Note that, while probably not commonplace, the Agent allows multiple DHs to be registered for the same protocol. For example, you could have two udev DHs running on a node on different sockets.
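
The status transitions described above can be sketched as a small state machine. This is only an illustration in Python, not the Agent's actual code: the names `HandlerEntry`, `prune_offline`, and `OFFLINE_LIMIT_SECONDS` are hypothetical, and timestamps are passed in explicitly as plain seconds rather than read from a clock.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption for this sketch: the 5-minute limit from the text, in seconds.
OFFLINE_LIMIT_SECONDS = 5 * 60


@dataclass
class HandlerEntry:
    """Hypothetical record for one registered Discovery Handler."""
    state: str = "Waiting"                 # a newly registered DH starts as Waiting
    offline_since: Optional[float] = None  # set only while the DH is Offline

    def connect(self) -> None:
        # A Configuration requesting this DH was applied and the connection succeeded.
        self.state, self.offline_since = "Active", None

    def lose_connection(self, now: float) -> None:
        # Connection failed or dropped; remember when the DH became unresponsive.
        self.state, self.offline_since = "Offline", now

    def release(self) -> None:
        # The Configuration was deleted; the DH goes back to Waiting.
        self.state, self.offline_since = "Waiting", None


def prune_offline(registry: Dict[str, HandlerEntry], now: float) -> List[str]:
    """Remove DHs that have been Offline for more than the limit; return their names."""
    expired = [
        name
        for name, entry in registry.items()
        if entry.state == "Offline"
        and entry.offline_since is not None
        and now - entry.offline_since > OFFLINE_LIMIT_SECONDS
    ]
    for name in expired:
        del registry[name]
    return expired


# Example: two handlers become Active, then one stays Offline past the limit.
registry = {"udev": HandlerEntry(), "onvif": HandlerEntry()}
registry["udev"].connect()
registry["onvif"].connect()
registry["onvif"].lose_connection(now=0.0)
removed = prune_offline(registry, now=301.0)
```

Keeping the offline timestamp next to the state mirrors the `Offline(Instant)` status in the text, where the `Instant` is what makes the 5-minute check possible.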
102 changes: 102 additions & 0 deletions docs/architecture/configuration-level-resource-in-depth.md
@@ -0,0 +1,102 @@
# Configuration-level Resources

Akri supports creating a Kubernetes resource (i.e. a device plugin) for each individual device. Since each device in Akri is represented as an Instance custom resource, these are called Instance-level resources. Instance-level resources are named in the format `<configuration-name>-<instance-id>`. Akri also creates a Kubernetes device plugin for each Configuration, called a Configuration-level resource. A Configuration-level resource represents all of the devices discovered via a Configuration. With Configuration-level resources, instead of needing to know the specific Instances to request, resources can be requested by the Configuration name, and the Agent does the work of selecting which Instances to reserve. The example below shows a Deployment that requests resources at the Configuration level: its nginx broker Pod requests two `akri.sh/onvif-camera` resources.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: onvif-camera-broker-deployment
  labels:
    app: onvif-camera-broker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: onvif-camera-broker
  template:
    metadata:
      labels:
        app: onvif-camera-broker
    spec:
      containers:
      - name: onvif-camera-broker
        image: nginx
        resources:
          limits:
            akri.sh/onvif-camera: "2"
          requests:
            akri.sh/onvif-camera: "2"
```

With Configuration-level resources, users can use higher-level Kubernetes objects (Deployments, ReplicaSets, DaemonSets, etc.) or develop their own deployment strategies, rather than relying on the Akri Controller to deploy Pods to discovered devices.

### Maintaining Device Usage

The [in-depth resource sharing doc](resource-sharing-in-depth.md) describes how `Configuration.capacity` and `Instance.deviceUsage` are used to share resources between nodes.
The same data is used to share a resource between Configuration-level and Instance-level resources.

The `Instance.deviceUsage` field in Akri Instances is extended to support the Configuration device plugin.
An `Instance.deviceUsage` map may look like this:

```yaml
deviceUsage:
  my-resource-00095f-0: ""
  my-resource-00095f-1: ""
  my-resource-00095f-2: ""
  my-resource-00095f-3: "node-a"
  my-resource-00095f-4: ""
```
where an empty string means the slot is free and a non-empty string indicates the slot is used by the named node. To support the Configuration device plugin,
the `Instance.deviceUsage` format is extended to hold additional information: a value can be `<node_name>` (slot reserved through the Instance device plugin) or `C:<virtual_device_id>:<node_name>` (slot reserved through the
Configuration device plugin). For example, the `Instance.deviceUsage` below shows that slot `my-resource-00095f-2` is used by virtual device id `0` of the
Configuration device plugin on `node-b`, and slot `my-resource-00095f-3` is used by the Instance device plugin on `node-a`. The other three slots are
free.

```yaml
deviceUsage:
  my-resource-00095f-0: ""
  my-resource-00095f-1: ""
  my-resource-00095f-2: "C:0:node-b"
  my-resource-00095f-3: "node-a"
  my-resource-00095f-4: ""
```
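
The three value formats described above (empty string, `<node_name>`, and `C:<virtual_device_id>:<node_name>`) can be told apart with a few string checks. The following Python sketch is illustrative only and is not taken from the Akri code base; `SlotUsage` and `parse_slot_usage` are hypothetical names, and the sketch assumes node names never contain a colon.

```python
from dataclasses import dataclass
from typing import Optional

# Prefix that marks a slot reserved through the Configuration device plugin.
CONFIG_PREFIX = "C:"


@dataclass(frozen=True)
class SlotUsage:
    """Hypothetical parsed form of one deviceUsage value."""
    kind: str                                # "free", "instance", or "configuration"
    node_name: Optional[str] = None
    virtual_device_id: Optional[str] = None


def parse_slot_usage(value: str) -> SlotUsage:
    """Classify a deviceUsage value according to the format described in the text."""
    if value == "":
        return SlotUsage(kind="free")
    if value.startswith(CONFIG_PREFIX):
        # Expected shape: C:<virtual_device_id>:<node_name>
        parts = value.split(":", 2)
        if len(parts) != 3 or not parts[1] or not parts[2]:
            raise ValueError(f"malformed Configuration usage: {value!r}")
        return SlotUsage("configuration", node_name=parts[2], virtual_device_id=parts[1])
    # Any other non-empty string is taken to be a node name (Instance device plugin).
    return SlotUsage("instance", node_name=value)


# The deviceUsage map from the example above.
device_usage = {
    "my-resource-00095f-0": "",
    "my-resource-00095f-1": "",
    "my-resource-00095f-2": "C:0:node-b",
    "my-resource-00095f-3": "node-a",
    "my-resource-00095f-4": "",
}
parsed = {slot: parse_slot_usage(value) for slot, value in device_usage.items()}
free_slots = [slot for slot, usage in parsed.items() if usage.kind == "free"]
```

Here `free_slots` holds the three unreserved slots, which matches the "other three slots are free" reading of the example.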

## Deployment Strategies with Configuration-level resources

The Akri Agent and Discovery Handlers enable device discovery and Kubernetes resource creation: they discover devices, create Kubernetes resources to represent the devices, and ensure that at most `capacity` containers use a device at once via the device plugin framework. The Akri Controller eases device use. If a broker is specified in a Configuration, the Controller automatically deploys Kubernetes Pods or Jobs to discovered devices. Currently the Controller supports only two deployment strategies: deploying a non-terminating Pod (which Akri calls a "broker") to each Node that can see a device, or deploying a single Job to the cluster for each device discovered. Plenty of scenarios do not fit these two strategies, such as a ReplicaSet-like deployment of n Pods to the cluster. With Configuration-level resources, users can implement their own scenarios without the Akri Controller, because selecting resources is more declarative. A user specifies in a resource request how many OPC UA servers are needed, rather than needing to delineate the exact ones already discovered by Akri, as explained in Akri's current documentation on [requesting Akri resources](../docs/user-guide/requesting-akri-resources.md).

For example, with Configuration-level resources, the following Deployment could be applied to a cluster:

```yaml
apiVersion: "apps/v1"
kind: Deployment
metadata:
  name: onvif-broker-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      name: onvif-broker
  template:
    metadata:
      labels:
        name: onvif-broker
    spec:
      containers:
      - name: nginx
        image: "nginx:latest"
        resources:
          requests:
            "akri.sh/akri-onvif": "2"
          limits:
            "akri.sh/akri-onvif": "2"
```


Pods will only be successfully scheduled to a Node and run if the resources exist and are available. In the case of the
above scenario, if there were two cameras on the network, two Pods would be deployed to the cluster. If there are not
enough resources, say there is only one camera on the network,
the two Pods will be left in a `Pending` state until another camera is discovered. This is the case with any deployment on
Kubernetes where there are not enough resources. However, `Pending` Pods do not use up cluster resources.
31 changes: 30 additions & 1 deletion docs/user-guide/requesting-akri-resources.md
@@ -61,7 +61,36 @@ spec:
Apply your Deployment to the cluster and watch the broker start to run. If you inspect the Instance of the resource you requested in your deployment, you will see one of the slots has now been reserved by the node that is currently running the broker.
```bash
kubectl apply -f deployment-requesting-onvif-camera.yaml
kubectl get akrii onvif-camera-<id> -o yaml
```

## Requesting resources at Configuration level
Akri also exposes all discovered devices as resources at the Configuration level. Configuration-level resources are referred to by the name of the Configuration. With Configuration-level resources, instead of needing to know the specific Instance ID `onvif-camera-<id>` to request, you can use the Configuration name `<configuration-name>` to request resources. Behind the scenes, the Agent selects which Instances to reserve.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: onvif-camera-broker-deployment
  labels:
    app: onvif-camera-broker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: onvif-camera-broker
  template:
    metadata:
      labels:
        app: onvif-camera-broker
    spec:
      containers:
      - name: onvif-camera-broker
        image: nginx
        resources:
          limits:
            akri.sh/onvif-camera: "1"
          requests:
            akri.sh/onvif-camera: "1"
```
