docs: add event-bus and integration docs
Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

# Call-Home

The metrics collected by Call-Home are documented on the [openebs website](https://openebs.io/docs/main/user-guides/replicated-storage-user-guide/replicated-pv-mayastor/additional-information/call-home-metrics).

This document describes the original design of the call-home feature.

## What do we send home (on each transmission)?

- "Stable" Cluster Identity Information | ||
- A hash the [K8s] cluster UUID \ | ||
Don’t send the actual UUID as plain text (although we encrypt the report, so maybe this is irrelevant) | ||
|
||
- [K8s] namespace of the installed chart | ||
- We can infer this from the call-home pod itself (eg: `$.metadata.namespace`) | ||
- hash the value \ | ||
Don’t send the user’s actual namespace name | ||
|
||
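A minimal sketch of this anonymisation, assuming the `sha2` crate; the `anonymise` helper and the example namespace are hypothetical, as the document doesn't prescribe the exact mechanism:

```rust
use sha2::{Digest, Sha256};

/// Hash a sensitive identifier (e.g. the K8s cluster UUID or the chart
/// namespace) so only an anonymised digest is ever transmitted.
fn anonymise(value: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(value.as_bytes());
    // Hex-encode the 32-byte digest.
    hasher
        .finalize()
        .iter()
        .map(|byte| format!("{byte:02x}"))
        .collect()
}

fn main() {
    // e.g. the value of `$.metadata.namespace` from the call-home pod.
    println!("{}", anonymise("openebs"));
}
```
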
- Product Identity
  - A string value, to distinguish the reports being sent, e.g. "Mayastor" \
    This may allow extending call-home to other OpenEBS sub-projects

- Product version

  We take the control-plane version from the core-agent, as we support running with multiple data-plane versions for upgrade scenarios

- Deployment Scale Metrics
  - count (live state) of:
    - volumes
    - pools
    - nodes
    - replicas

  This information is state-based and can be taken from the control-plane (via the public REST API) as required (once per 24 hours won't cause excessive load on the control-plane)

- Volume Size Metrics (Provisioned Size)
  - maximum size
  - minimum size
  - average size
  - percentiles: 50%, 75%, 90%

  This information is state-based and can be taken from the control-plane (via the public REST API) as required (once per 24 hours won't cause excessive load on the control-plane)

- Pool Size Metrics (Total Capacity)
  - maximum capacity
  - minimum capacity
  - average capacity
  - percentiles: 50%, 75%, 90% (see the percentile sketch below)

  This information is state-based and can be taken from the control-plane (via the public REST API) as required (once per 24 hours won't cause excessive load on the control-plane)

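A sketch of how these size statistics could be computed from the sizes returned by the REST API; the nearest-rank percentile method and the sample values are assumptions, as the document doesn't specify them:

```rust
/// Nearest-rank percentile over a sorted slice of sizes (bytes).
fn percentile(sorted: &[u64], pct: f64) -> u64 {
    assert!(!sorted.is_empty());
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    // Provisioned sizes as they might come back from the REST API.
    let mut sizes: Vec<u64> = vec![10, 40, 20, 80, 160];
    sizes.sort_unstable();
    let avg = sizes.iter().sum::<u64>() / sizes.len() as u64;
    println!("min={} max={} avg={}", sizes[0], sizes[sizes.len() - 1], avg);
    for pct in [50.0, 75.0, 90.0] {
        println!("p{pct}={}", percentile(&sizes, pct));
    }
}
```
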
- Health Metrics

  To be defined in the "Health Metrics" section, below

- Churn Metrics

  For both the past 60-minute and past 24-hour periods:
  - count of volumes created
  - count of volumes deleted
  - count of replicas created
  - count of replicas deleted

  These require time series calculation and storage, to be performed by the "Statistics Module" (see below)

## Where are we sending it to?

- Remote "Health Reports" S3 bucket: <https://openebs.phonehome.datacore.com/openebs/report>
- It's sent as an encrypted [JSON] document

## When / How frequently do we send the information?

- On first start of the telemetry pod
- Every 24 hours thereafter (default)

## Solution Components

A modular architecture, decoupling the call-home transmission from the actual collection of data and from the (user) Health metrics exporter (Prometheus scrapes).

This makes it very simple for the user to disable call-home report sending, which is a requirement, as some users may not be comfortable with sending this information out.

### Event Bus

Any events required by call-home (e.g. volume/replica creation and deletion) are collected from the event bus.

These events should be raised within the appropriate code path, whether data-plane or control-plane.

The Metrics / Statistics modules subscribe to these event/message topics and process the messages, updating their internal metrics and statistical values accordingly.

The event bus need not be persistent, since event loss does not affect the product's main functionality, though it may impair its traceability.

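For illustration, a minimal sketch of raising such an event, assuming the `async-nats` crate (NATS is the bus chosen in the event-bus document); the subject name, address and payload are hypothetical:

```rust
use async_nats::jetstream;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    // Connect to the in-cluster NATS deployment (address is illustrative).
    let client = async_nats::connect("nats://mayastor-nats:4222").await?;
    let js = jetstream::new(client);

    // Publish a volume-creation event; the second await waits for the
    // JetStream acknowledgement.
    js.publish("events.volume.created", r#"{"volume":"vol-1"}"#.into())
        .await?
        .await?;
    Ok(())
}
```
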
### Events (control plane and/or data plane)

There are creation/deletion events for various resources, e.g.:

- create/delete volumes
- create/delete replicas

As well as edge-triggered events such as:

- Volume degraded
- Replica faulted

### Health Statistics Component

A component whose role is to store time series information for call-home and health metrics:

- metrics which can't be simply derived from the current state of the control-plane or data-plane components
  - e.g. the number of replicas created since installation
- volume creation/deletion count
- replica creation/deletion count
- replica fault count
- volume degradation count

In order to do this, this component subscribes to the events via the message bus.

Since we don't want users to provision additional storage for this component, we're constrained to storing all calculated data in memory.
However, since the accumulator data itself is relatively small, we can persist it to a [K8s] `configmap`.

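A sketch of how that persistence step might look, assuming the `kube` and `k8s-openapi` crates; the ConfigMap name, namespace and counter fields are hypothetical:

```rust
use k8s_openapi::api::core::v1::ConfigMap;
use kube::{
    api::{Api, Patch, PatchParams},
    Client,
};
use serde_json::json;

/// Persist the in-memory accumulators so they survive a pod restart.
async fn persist_counters(
    client: Client,
    volumes_created: u64,
    replicas_created: u64,
) -> Result<(), kube::Error> {
    let api: Api<ConfigMap> = Api::namespaced(client, "openebs");
    let cm = json!({
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": { "name": "call-home-stats" },
        "data": {
            "volumes_created": volumes_created.to_string(),
            "replicas_created": replicas_created.to_string(),
        }
    });
    // Server-side apply: creates the ConfigMap or updates it in place.
    api.patch(
        "call-home-stats",
        &PatchParams::apply("call-home-stats"),
        &Patch::Apply(&cm),
    )
    .await?;
    Ok(())
}
```
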
### Health Metrics Exporter Component

A side-car to the health stats, collecting the data and exporting it in a [prometheus] format.
This serves a dual purpose, allowing users' observability tools to scrape this information should they want to.

A cache can be used to avoid overloading the control-plane, since this type of data is not expected to vary at high rates anyway. \
This cache should be an optional component.

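A sketch of the export side, assuming the `prometheus` crate; the metric name is hypothetical, and the HTTP serving layer is elided:

```rust
use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

fn main() -> Result<(), prometheus::Error> {
    let registry = Registry::new();
    let volumes_created =
        IntCounter::new("volumes_created_total", "Volumes created since install")?;
    registry.register(Box::new(volumes_created.clone()))?;

    // Normally updated by the event-bus subscriber; incremented here
    // for illustration only.
    volumes_created.inc();

    // Render the registry in the Prometheus text exposition format,
    // e.g. from an HTTP `/metrics` handler.
    let mut buffer = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buffer)?;
    println!("{}", String::from_utf8_lossy(&buffer));
    Ok(())
}
```
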
### Call Home Component

A stateless component which acts as a collector and transmitter of data.

The information which is transmitted is obtained from various sources:

- Control-plane/REST API

  Mostly for state information (eg: disk types on nodes). \
  As implied, this is retrieved via the public REST OpenAPI

- Statistics Module
  - time series stats
  - health metrics stats

  This is retrieved from the stats module by scraping its [prometheus] export.

The collected data is then combined into a single [JSON] document, which is encrypted before being sent to the reporting endpoint.

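An illustrative sketch of assembling that document, assuming `serde_json`; all field names and values are hypothetical, and the encryption/transmission steps are elided:

```rust
use serde_json::json;

fn main() {
    // Values gathered from the REST API and the statistics module.
    let report = json!({
        "cluster_id": "sha256-digest-of-cluster-uuid",
        "product": { "name": "Mayastor", "version": "x.y.z" },
        "scale": { "volumes": 10, "pools": 3, "nodes": 3, "replicas": 30 },
        "churn_24h": { "volumes_created": 2, "volumes_deleted": 1 }
    });
    // The real component would encrypt this document and send it
    // to the reporting endpoint.
    println!("{}", serde_json::to_string_pretty(&report).unwrap());
}
```
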
[K8s]: https://k8s.io
[JSON]: https://datatracker.ietf.org/doc/html/rfc7159
[prometheus]: https://prometheus.io/

# Event Bus

As part of mayastor we wanted to have event-driven capabilities which allow us to respond to certain events and perform specific actions. \
A message bus ([NATS]) was initially used in early versions of mayastor, but due to several reasons, such as some bugs in the client libraries and how we used them (to be fair!), we ended up temporarily moving away from it in favour of p2p [gRPC] (yes, we were probably using the wrong stick in many cases). \
As a result, we ended up with high coupling between components, such as the io-engine and the core-agent.

With that out of the way, we still believe a message bus is a good solution for many use cases within mayastor:

1. Event driven reconcilers
2. Event accrual for metrics
3. Fault diagnostics system
4. etc.

> **NOTE**: What's a message bus after all? It's a messaging system that allows applications to communicate with each other by sending and receiving messages. It acts as a broker that routes messages between senders and receivers, which are loosely coupled.

## Enter NATS JetStream

We've compared several options and ended up selecting [NATS] (again!) as the message bus for our eventing system.

"NATS has a built-in persistence engine called [Jetstream] which enables messages to be stored and replayed at a later time. Unlike NATS Core which requires you to have an active subscription to process messages as they happen, JetStream allows the NATS server to capture messages and replay them to consumers as needed. This functionality enables a different quality of service for your NATS messages, and enables fault-tolerant and high-availability configurations."

### Pros of NATS

- Always on and available (Highly Available)
- Low CPU consumption
- Fast: a high-velocity communication bus
- High scalability
- Light-weight
- Supports wildcard-based subject subscriptions

### Cons of NATS

- Core NATS is fire-and-forget; with JetStream, however, it provides 'at least once' and 'exactly once' delivery guarantees
- No persistence in Core NATS, though it is possible with JetStream

---

We don't currently have a requirement for a messaging queue where order is important, nor do we rely on this information being persistent. \
However, for optimum functionality we prefer a highly available deployment, ensuring smooth operation of the event consumers.

We deploy a highly available NATS with JetStream enabled, but with an in-memory storage configuration.
Here's how we configure it via its helm chart:

```yaml
nats:
  jetstream:
    enabled: true
    memStorage:
      enabled: true
      size: "5Mi"
    fileStorage:
      enabled: false
cluster:
  enabled: true
  replicas: 3
```

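For illustration, a minimal consumer sketch against such a deployment, assuming the `async-nats` and `futures` crates; the stream, consumer and subject names are hypothetical:

```rust
use async_nats::jetstream::{self, consumer::pull, stream};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("nats://mayastor-nats:4222").await?;
    let js = jetstream::new(client);

    // Memory-backed stream capturing every subject under `events.>`.
    let events = js
        .get_or_create_stream(stream::Config {
            name: "events".to_string(),
            subjects: vec!["events.>".to_string()],
            storage: stream::StorageType::Memory,
            ..Default::default()
        })
        .await?;

    // Durable pull consumer, e.g. for the statistics module.
    let consumer = events
        .create_consumer(pull::Config {
            durable_name: Some("stats".to_string()),
            ..Default::default()
        })
        .await?;

    let mut messages = consumer.messages().await?;
    while let Some(message) = messages.next().await {
        let message = message?;
        println!("{}: {:?}", message.subject, message.payload);
        message.ack().await?;
    }
    Ok(())
}
```
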
## Events

Here we list the events which we're currently publishing on the event bus.

### Volume Events

| Category | Action | Source        | Description                                     |
|----------|--------|---------------|-------------------------------------------------|
| Volume   | Create | Control plane | Generated when a volume is successfully created |
| Volume   | Delete | Control plane | Generated when a volume is successfully deleted |

### Replica Events

| Category | Action      | Source     | Description                                      |
|----------|-------------|------------|--------------------------------------------------|
| Replica  | Create      | Data plane | Generated when a replica is successfully created |
| Replica  | Delete      | Data plane | Generated when a replica is successfully deleted |
| Replica  | StateChange | Data plane | Created upon a change in replica state           |

### Pool Events

| Category | Action | Source     | Description                                   |
|----------|--------|------------|-----------------------------------------------|
| Pool     | Create | Data plane | Generated when a pool is successfully created |
| Pool     | Delete | Data plane | Generated when a pool is successfully deleted |

### Nexus Events

| Category | Action          | Source     | Description                                         |
|----------|-----------------|------------|-----------------------------------------------------|
| Nexus    | Create          | Data plane | Created when a nexus is successfully created        |
| Nexus    | Delete          | Data plane | Created when a nexus is successfully deleted        |
| Nexus    | StateChange     | Data plane | Created upon a change in nexus state                |
| Nexus    | RebuildBegun    | Data plane | Created when a nexus child rebuild operation begins |
| Nexus    | RebuildEnd      | Data plane | Created when a nexus child rebuild operation ends   |
| Nexus    | AddChild        | Data plane | Created when a child is added to a nexus            |
| Nexus    | RemoveChild     | Data plane | Created when a child is removed from a nexus        |
| Nexus    | OnlineChild     | Data plane | Created when a nexus child becomes online           |
| Nexus    | SubsystemPause  | Data plane | Created when an I/O subsystem is paused             |
| Nexus    | SubsystemResume | Data plane | Created when an I/O subsystem is resumed            |
| Nexus    | Init            | Data plane | Created when a nexus enters the init state          |
| Nexus    | Reconfiguring   | Data plane | Created when a nexus enters the reconfiguring state |
| Nexus    | Shutdown        | Data plane | Created when a nexus is destroyed                   |

### Node Events

| Category | Action      | Source        | Description                         |
|----------|-------------|---------------|-------------------------------------|
| Node     | StateChange | Control plane | Created upon a change in node state |

### High Availability Events

| Category         | Action     | Source        | Description                                                            |
|------------------|------------|---------------|------------------------------------------------------------------------|
| HighAvailability | SwitchOver | Control plane | Created when a volume switch-over operation starts, fails or completes |

### Nvme Path Events

| Category | Action          | Source        | Description                                               |
|----------|-----------------|---------------|-----------------------------------------------------------|
| NvmePath | NvmePathSuspect | Control plane | Created when an NVMe path enters the suspect state        |
| NvmePath | NvmePathFail    | Control plane | Created when an NVMe path transitions to the failed state |
| NvmePath | NvmePathFix     | Control plane | Created when an NVMe controller reconnects to a nexus     |

### Host Initiator Events

| Category      | Action               | Source     | Description                                              |
|---------------|----------------------|------------|----------------------------------------------------------|
| HostInitiator | NvmeConnect          | Data plane | Created upon a host connection to a nexus                |
| HostInitiator | NvmeDisconnect       | Data plane | Created upon a host disconnection from a nexus           |
| HostInitiator | NvmeKeepAliveTimeout | Data plane | Created upon a host keep-alive timeout (KATO) on a nexus |

### IO-Engine Events

| Category         | Action          | Source     | Description                                        |
|------------------|-----------------|------------|----------------------------------------------------|
| IoEngineCategory | Start           | Data plane | Created when the io-engine initializes             |
| IoEngineCategory | Shutdown        | Data plane | Created when an io-engine shutdown starts          |
| IoEngineCategory | Stop            | Data plane | Created when an io-engine is stopped               |
| IoEngineCategory | ReactorUnfreeze | Data plane | Created when an io-engine reactor is healthy again |
| IoEngineCategory | ReactorFreeze   | Data plane | Created when an io-engine reactor is frozen        |

### Snapshot and Clone Events

| Category | Action | Source     | Description                                     |
|----------|--------|------------|-------------------------------------------------|
| Snapshot | Create | Data plane | Created when a snapshot is successfully created |
| Clone    | Create | Data plane | Created when a clone is successfully created    |

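The tables above suggest events keyed by a category and an action; a hypothetical serde model for such a message (the types and field names are illustrative, not the actual wire format):

```rust
use serde::{Deserialize, Serialize};

/// Illustrative subset of the event categories listed above.
#[derive(Debug, Serialize, Deserialize)]
enum Category {
    Volume,
    Replica,
    Pool,
    Nexus,
    Node,
}

/// Illustrative subset of the event actions listed above.
#[derive(Debug, Serialize, Deserialize)]
enum Action {
    Create,
    Delete,
    StateChange,
}

/// A hypothetical event message as it might travel on the bus.
#[derive(Debug, Serialize, Deserialize)]
struct EventMessage {
    category: Category,
    action: Action,
    /// Originating component, e.g. "control-plane" or "data-plane".
    source: String,
    /// Identifier of the affected resource, e.g. a volume UUID.
    target: String,
}

fn main() {
    let event = EventMessage {
        category: Category::Volume,
        action: Action::Create,
        source: "control-plane".to_string(),
        target: "volume-uuid".to_string(),
    };
    println!("{}", serde_json::to_string(&event).unwrap());
}
```
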
## Consumers

- [x] call-home
- [x] e2e testing
- [ ] support dump (kubectl-plugin)

[NATS]: https://nats.io/
[Jetstream]: https://docs.nats.io/nats-concepts/jetstream
[gRPC]: https://grpc.io/

# Integrations with other projects [ WIP ]

| Technology | Integration | Description |
|:-----------|:-----------:|:------------|
| [SPDK](https://spdk.io/) | [spdk-rs](https://github.com/openebs/spdk-rs) <br> [io-engine](./design/mayastor.md) | Mayastor uses SPDK to build a high-speed, low-latency storage backend |
| [gRPC](https://grpc.io/) | | Used for internal service communication |
| [etcd](https://etcd.io/) | | Used as the persistent configuration store (not for volume data) |
| [NATS](https://nats.io/) | | Used as the event bus |
| [OpenTelemetry](https://opentelemetry.io/) | [Tracing](./design/control-plane.md#distributed-tracing) | Tracing system for observability |
| [Helm](https://helm.sh/) | [Install Guide](https://openebs.io/docs/quickstart-guide/installation#installation-via-helm) | Installs/upgrades on a K8s cluster |
| [Prometheus](https://prometheus.io/) | [Monitoring](https://openebs.io/docs/user-guides/replicated-storage-user-guide/replicated-pv-mayastor/advanced-operations/monitoring) | Exports stats |
| [Kubernetes](https://kubernetes.io/) | [Install Guide](https://openebs.io/docs/quickstart-guide/installation) | Runs on K8s |