Update Flow Aggregator documentation with v2.3 changes
We include information about the new Proxy mode, as well as about the
supported version skew between Antrea Agent and Flow Aggregator.

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
antoninbas committed Feb 25, 2025
1 parent 87def6d commit 8c28629
Showing 2 changed files with 137 additions and 44 deletions.
1 change: 1 addition & 0 deletions ci/jenkins/README.md
@@ -232,6 +232,7 @@ DOCKER_REGISTRY="$(head -n1 ci/docker-registry)"

* [matrix-test [weekly]](https://jenkins.antrea.io/job/antrea-weekly-matrix-compatibility-test/):
runs Antrea e2e, K8s Conformance and NetworkPolicy tests, using different combinations of various operating systems and K8s releases.

| K8s Version | Node OS | Status |
| :------------: | :-------------: | :------: |
| 1.17.5 | CentOS 7 |[![Build Status](https://jenkins.antrea.io/buildStatus/icon?job=antrea-weekly-matrix-compatibility-test%2FIS_MATRIX_TEST%3DTrue%2CK8S_VERSION%3Dv1.17.5%2CTEST_OS%3Dcentos-7%2Clabels%3Dantrea-test-node)](https://jenkins.antrea.io/job/antrea-weekly-matrix-compatibility-test/IS_MATRIX_TEST=True,K8S_VERSION=v1.17.5,TEST_OS=centos-7,labels=antrea-test-node/)|
180 changes: 136 additions & 44 deletions docs/network-flow-visibility.md
@@ -15,17 +15,22 @@
- [Types of Flows and Associated Information](#types-of-flows-and-associated-information)
- [Connection Metrics](#connection-metrics)
- [Flow Aggregator](#flow-aggregator)
- [Deployment](#deployment)
- [Configuration](#configuration-1)
- [Configuring secure connections to the ClickHouse database](#configuring-secure-connections-to-the-clickhouse-database)
- [Example of flow-aggregator.conf](#example-of-flow-aggregatorconf)
- [IPFIX Information Elements (IEs) in an Aggregated Flow Record](#ipfix-information-elements-ies-in-an-aggregated-flow-record)
- [IEs from Antrea IE Registry](#ies-from-antrea-ie-registry-1)
- [Supported Capabilities](#supported-capabilities-1)
- [Storage of Flow Records](#storage-of-flow-records)
- [Correlation of Flow Records](#correlation-of-flow-records)
- [Aggregation of Flow Records](#aggregation-of-flow-records)
- [Antctl Support](#antctl-support)
- [Deciding which mode to use](#deciding-which-mode-to-use)
- [Aggregate Mode](#aggregate-mode)
- [Installation](#installation)
- [Configuring secure connections to the ClickHouse database](#configuring-secure-connections-to-the-clickhouse-database)
- [Example of flow-aggregator.conf](#example-of-flow-aggregatorconf)
- [IPFIX Information Elements (IEs) in an Aggregated Flow Record](#ipfix-information-elements-ies-in-an-aggregated-flow-record)
- [IEs from Antrea IE Registry](#ies-from-antrea-ie-registry-1)
- [Supported Capabilities](#supported-capabilities-1)
- [Storage of Flow Records](#storage-of-flow-records)
- [Correlation of Flow Records](#correlation-of-flow-records)
- [Aggregation of Flow Records](#aggregation-of-flow-records)
- [Antctl Support](#antctl-support)
- [Proxy Mode](#proxy-mode)
- [Installation](#installation-1)
- [IPFIX Information Elements (IEs) in an Proxied Flow Record](#ipfix-information-elements-ies-in-an-proxied-flow-record)
- [Version skew between Flow Aggregator and Antrea Agent](#version-skew-between-flow-aggregator-and-antrea-agent)
- [Quick Deployment](#quick-deployment)
- [Image-building Steps](#image-building-steps)
- [Deployment Steps](#deployment-steps)
@@ -243,38 +248,62 @@ through [Antrea Agent apiserver endpoint](prometheus-integration.md):

## Flow Aggregator

Flow Aggregator is deployed as a Kubernetes Service. The main functionality of Flow
Aggregator is to store, correlate and aggregate the flow records received from the
Flow Exporter of Antrea Agents. More details on the functionality are provided in
the [Supported Capabilities](#supported-capabilities-1) section.

Flow Aggregator is implemented as IPFIX mediator, which
consists of IPFIX Collector Process, IPFIX Intermediate Process and IPFIX Exporter
Process. We use the [go-ipfix](https://github.com/vmware/go-ipfix) library to implement
the Flow Aggregator.

### Deployment

To deploy a released version of Flow Aggregator Service, pick a deployment manifest from the
The Flow Aggregator consists of a K8s Deployment and Service. It has 2 main
modes of operation:

* `Aggregate` (default): in this mode, the Flow Aggregator stores, correlates
and aggregates the flow records received from the Flow Exporter of Antrea
Agents. For more information about this mode, including installation
instructions, refer to the [Aggregate Mode section](#aggregate-mode).
* `Proxy`: in this mode, the Flow Aggregator operates statelessly, and the flow
records received from the Flow Exporter of Antrea Agents are sent directly to
an IPFIX collector, without buffering or correlation / aggregation. For more
information about this mode, including installation instructions, refer to the
[Proxy Mode section](#proxy-mode).

The Flow Aggregator is implemented as an IPFIX mediator. It consists of an IPFIX
Collector Process, an IPFIX Intermediate Process, and an IPFIX Exporter
Process. We use the [go-ipfix](https://github.com/vmware/go-ipfix) library to
implement the Flow Aggregator. The "Intermediate Process" differs greatly based
on whether the Flow Aggregator operates in `Aggregate` or `Proxy` mode.

### Deciding which mode to use

If you are looking to export "raw" IPFIX records to an external IPFIX collector,
and want to minimize the footprint of the Flow Aggregator, you may be interested
in the `Proxy` mode. In this mode, the external IPFIX collector is responsible
for correlating flow records generated by the source and destination K8s Nodes,
based on the 5-tuple (if the source and destination Nodes are the same, flow
records will be sent by a single Flow Exporter). The information present in a
record depends on whether it was exported from the source or the destination
Node. For example, for Pod-to-Service traffic, destination Service information
is only included in the source records. See
[#6773](https://github.com/antrea-io/antrea/issues/6773) for more background on
why `Proxy` mode was introduced. Note that `Proxy` mode cannot be used with a
non-IPFIX destination collector.

Otherwise, the default `Aggregate` mode is probably right for you. It supports
all available destination collectors (e.g., ClickHouse), and its output is
easier to consume as the source and destination records have already been
correlated / aggregated.
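
To make the `Proxy` mode correlation responsibility more concrete, here is a
minimal sketch of how an external collector might group records by 5-tuple. The
CSV input format and the `correlate_by_5tuple` function name are hypothetical
and chosen only for illustration; a real collector would decode IPFIX records
instead. The last column stands in for the exporter address of each record.

```bash
#!/usr/bin/env bash
# Sketch only: group flow records by 5-tuple and list which exporters
# reported each connection.
# Hypothetical input, one record per line:
#   srcIP,dstIP,srcPort,dstPort,protocol,exporterIP
correlate_by_5tuple() {
  awk -F',' '
    {
      # The 5-tuple is the correlation key.
      key = $1 "," $2 "," $3 "," $4 "," $5
      if (!(key in exporters)) {
        order[++n] = key            # remember first-seen order
        exporters[key] = $6
      } else if (index("|" exporters[key] "|", "|" $6 "|") == 0) {
        # A second exporter reported the same connection (inter-Node flow).
        exporters[key] = exporters[key] "|" $6
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] " exporters=" exporters[order[i]]
    }
  '
}
```

Feeding the function records on standard input prints one line per connection.
A connection listed with two exporters corresponds to an inter-Node flow whose
source and destination records still need to be merged by the collector.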

### Aggregate Mode

#### Installation

To deploy a released version of the Flow Aggregator, pick a deployment manifest from the
[list of releases](https://github.com/antrea-io/antrea/releases). For any
given release `<TAG>` (e.g. `v0.12.0`), you can deploy Flow Aggregator as follows:
given release `<TAG>` (e.g. `v2.3.0`), you can deploy Flow Aggregator as follows:

```bash
kubectl apply -f https://github.com/antrea-io/antrea/releases/download/<TAG>/flow-aggregator.yml
```

To deploy the latest version of Flow Aggregator Service (built from the main branch), use the
checked-in [deployment yaml](../build/yamls/flow-aggregator.yml):

```bash
kubectl apply -f https://raw.githubusercontent.com/antrea-io/antrea/main/build/yamls/flow-aggregator.yml
```

### Configuration
We recommend updating the manifest with your desired configuration before
applying it.

The following configuration parameters have to be provided through the Flow
Aggregator ConfigMap. Flow aggregator needs to be configured with at least one
of the supported [Flow Collectors](#flow-collectors).
Aggregator ConfigMap.
`flowCollector` is mandatory for [go-ipfix collector](#deployment-steps), and
`clickHouse` is mandatory for [Grafana Flow Collector](#grafana-flow-collector-migrated).
We provide an example value for this parameter in the following snippet.
@@ -290,7 +319,7 @@ configuration is required. If a different FQDN or IP is desired, please use
the URL for `clickHouse.databaseURL` in the following format:
`<protocol>://<ClickHouse server FQDN or IP>:<ClickHouse port>`.
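
As an illustration of the URL format above, the following sketch composes a
`clickHouse.databaseURL` value from its three parts. The helper name
`clickhouse_database_url` and the example host are hypothetical, and the checks
are deliberately minimal; they do not reproduce the Flow Aggregator's own
validation.

```bash
#!/usr/bin/env bash
# Sketch only: build <protocol>://<host>:<port> for clickHouse.databaseURL.
clickhouse_database_url() {
  local proto="$1" host="$2" port="$3"
  # All three parts are required.
  if [ -z "$proto" ] || [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: clickhouse_database_url <protocol> <host> <port>" >&2
    return 1
  fi
  # The port must be a plain decimal number.
  case "$port" in
    *[!0-9]*) echo "port must be numeric: $port" >&2; return 1 ;;
  esac
  printf '%s://%s:%s\n' "$proto" "$host" "$port"
}
```

For example, `clickhouse_database_url tls clickhouse.example.com 9440` yields a
value that can be pasted into the ConfigMap (host and port here are
placeholders for your own ClickHouse server).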

#### Configuring secure connections to the ClickHouse database
##### Configuring secure connections to the ClickHouse database

Starting with Antrea v1.13, you can enable TLS when connecting to the ClickHouse
Server by setting `clickHouse.databaseURL` with protocol `tls` or `https`.
@@ -324,7 +353,7 @@ Prior to Antrea v1.13, secure connections to ClickHouse are not supported,
and TCP is the only supported protocol when connecting to the ClickHouse
server from the Flow Aggregator.

#### Example of flow-aggregator.conf
##### Example of flow-aggregator.conf

```yaml
flow-aggregator.conf: |
@@ -453,12 +482,12 @@ that flow aggregator has a cache limit of ~500k records for ClickHouse-Grafana
collector. If `clickHouse.commitInterval` is set to a value too large, there's
a risk of losing records.

### IPFIX Information Elements (IEs) in an Aggregated Flow Record
#### IPFIX Information Elements (IEs) in an Aggregated Flow Record

In addition to IPFIX information elements provided in the [above section](#ipfix-information-elements-ies-in-a-flow-record),
the Flow Aggregator adds the following fields to the flow records.

#### IEs from Antrea IE Registry
##### IEs from Antrea IE Registry

| IPFIX Information Element | Field ID | Type | Description |
|-------------------------------------------|----------|-------------|-------------|
@@ -488,18 +517,19 @@ the Flow Aggregator adds the following fields to the flow records.
| reverseThroughputFromDestinationNode | 150 | unsigned64 | The average amount of reverse traffic flowing from destination to source, since the previous report for this flow at the observation point, based on the records sent from the destination Node. The unit is bits per second. |
| flowEndSecondsFromSourceNode | 151 | unsigned32 | The absolute timestamp of the last packet of this flow, based on the records sent from the source Node. The unit is seconds. |
| flowEndSecondsFromDestinationNode | 152 | unsigned32 | The absolute timestamp of the last packet of this flow, based on the records sent from the destination Node. The unit is seconds. |
| clusterId | 158 | string | UUID of the cluster as generated by the Antrea Controller, in string format. |

### Supported Capabilities
#### Supported Capabilities

#### Storage of Flow Records
##### Storage of Flow Records

Flow Aggregator stores the flow records received from Antrea Agents in a hash map,
where the flow key is the 5-tuple of a network connection. The 5-tuple consists of Source IP,
Destination IP, Source Port, Destination Port and Transport protocol. Therefore,
Flow Aggregator maintains one flow record for any given connection, and this flow
record is updated until the connection in the Kubernetes cluster becomes invalid.

#### Correlation of Flow Records
##### Correlation of Flow Records

In the case of inter-Node flows, there are two flow records, one
from the source Node, where the flow originates from, and another one from the destination
@@ -509,7 +539,7 @@ Aggregator provides support for the correlation of the flow records from the
source Node and the destination Node, and it exports a single flow record with complete
information for both inter-Node and intra-Node flows.

#### Aggregation of Flow Records
##### Aggregation of Flow Records

Flow Aggregator aggregates the flow records that belong to a single connection.
As part of aggregation, fields such as flow timestamps, flow statistics etc. are
@@ -518,12 +548,74 @@ the [new fields](#ies-from-antrea-ie-registry) in Antrea Enterprise IPFIX registry
corresponding to the Source Node and Destination Node, so that flow statistics from
different Nodes can be preserved.

### Antctl Support
#### Antctl Support

antctl can access the Flow Aggregator API to dump flow records and print metrics
about flow record processing. Refer to the
[antctl documentation](antctl.md#flow-aggregator-commands) for more information.

### Proxy Mode

#### Installation

To deploy a released version of the Flow Aggregator, pick a deployment manifest from the
[list of releases](https://github.com/antrea-io/antrea/releases). For any
given release `<TAG>` (e.g. `v2.3.0`), you can deploy Flow Aggregator as follows:

```bash
kubectl apply -f https://github.com/antrea-io/antrea/releases/download/<TAG>/flow-aggregator.yml
```

We recommend updating the manifest with your desired configuration before
applying it. In order to proxy records to an external collector, you will need
to populate the `flowCollector` section of the Flow Aggregator ConfigMap.

Alternatively, you can use Helm to easily install the latest released version of
the Flow Aggregator:

```bash
# If you are installing an Antrea Helm chart for the first time:
helm repo add antrea https://charts.antrea.io
# To update available charts:
helm repo update
helm install flow-aggregator antrea/flow-aggregator --set mode=Proxy,flowCollector.enable=true,flowCollector.address="<addr:port:proto>" -n flow-aggregator --create-namespace
```

In `Proxy` mode, not all configuration parameters are applicable.
`flowCollector` is the only supported collector, and `activeFlowRecordTimeout` /
`inactiveFlowRecordTimeout` are not applicable.
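
The `--set` string in the Helm command above can be tedious to assemble by
hand. As a sketch, the hypothetical helper below builds it from the collector
address, port and protocol, following the `<addr:port:proto>` layout shown
above. The helper name is an assumption for illustration, and it does not
handle IPv6 literals, which contain `:` themselves.

```bash
#!/usr/bin/env bash
# Sketch only: assemble the Helm values for a Proxy mode installation.
# Arguments: <collector address> <collector port> <protocol>
proxy_mode_set_args() {
  local addr="$1" port="$2" proto="$3"
  printf 'mode=Proxy,flowCollector.enable=true,flowCollector.address=%s:%s:%s\n' \
    "$addr" "$port" "$proto"
}
```

It could then be used as
`helm install flow-aggregator antrea/flow-aggregator --set "$(proxy_mode_set_args 10.0.0.20 4739 tcp)" -n flow-aggregator --create-namespace`,
where the address, port and protocol are placeholders for your own collector.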

#### IPFIX Information Elements (IEs) in an Proxied Flow Record

In addition to IPFIX information elements provided in the [above section](#ipfix-information-elements-ies-in-a-flow-record),
the Flow Aggregator adds the following fields to the flow records before
forwarding them to the external collector:

| IPFIX Registry | IPFIX Information Element | Field ID | Type | Description |
|----------------|-------------------------------------------|----------|-------------|-------------|
| Antrea | sourcePodLabels* | 143 | string | K8s labels for the source Pod *if `recordContents.podLabels` is `true`. |
| | destinationPodLabels* | 144 | string | K8s labels for the destination Pod *if `recordContents.podLabels` is `true`. |
| | clusterId | 158 | string | UUID of the cluster as generated by the Antrea Controller, in string format. |
| IANA | flowDirection | 61 | unsigned8 | The direction of the flow as observed by the Flow Exporter: `0x00` (ingress flow), `0x01` (egress flow), `0xff` (direction N/A such as for intra-Node flows). |
| | originalExporterIPv4Address | 403 | ipv4Address | The IPv4 address (if any) used by the Flow Exporter in the Antrea Agent. |
| | originalExporterIPv6Address | 404 | ipv6Address | The IPv6 address (if any) used by the Flow Exporter in the Antrea Agent. |
| | originalObservationDomainId | 405 | unsigned32 | The Observation Domain ID originally reported by the Flow Exporter in the Antrea Agent. |

### Version skew between Flow Aggregator and Antrea Agent

As a rule, we recommend keeping the Flow Aggregator and the Antrea Agent at the
same (minor) version. During upgrades, there will be a small time window during
which this is not possible. Prior to Antrea v2.3, the Flow Aggregator Pod may
crash (and restart) in case of version mismatch, e.g., because of the
introduction of new Information Elements. Starting with Antrea v2.3, the
Flow Aggregator should be able to handle older or newer Agents gracefully. If
possible, we do recommend upgrading the Flow Aggregator Deployment last (i.e.,
after all Antrea Agents have been upgraded). The version skew between the Flow
Aggregator and Antrea Agents should be at most 4 minor versions, which also
corresponds to the maximum supported version "delta" for Antrea upgrades, as per
our [versioning policy](versioning.md).
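
The skew rule above can be checked before an upgrade with a small script. The
following is a sketch under stated assumptions: the function names are
hypothetical, versions are expected in the `vMAJOR.MINOR.PATCH` form used by
release tags, and the comparison is only attempted when both versions share the
same major version (it returns 2 otherwise, since counting minor versions
across a major version boundary is not covered here).

```bash
#!/usr/bin/env bash
# Sketch only: check that two versions differ by at most N minor versions.
major_of() {
  local v="${1#v}"          # strip the leading "v", if any
  printf '%s\n' "${v%%.*}"  # keep everything before the first dot
}

minor_of() {
  local v="${1#v}"
  v="${v#*.}"               # drop the major component
  printf '%s\n' "${v%%.*}"  # keep the minor component
}

# minor_skew_ok <flow aggregator version> <agent version> [max skew, default 4]
# Returns 0 if within the limit, 1 if not, 2 if the major versions differ.
minor_skew_ok() {
  local max="${3:-4}" a b d
  if [ "$(major_of "$1")" != "$(major_of "$2")" ]; then
    return 2
  fi
  a="$(minor_of "$1")"
  b="$(minor_of "$2")"
  d=$(( a - b ))
  if [ "$d" -lt 0 ]; then d=$(( -d )); fi
  [ "$d" -le "$max" ]
}
```

For instance, `minor_skew_ok v2.3.0 v2.1.0` succeeds, while
`minor_skew_ok v2.5.0 v2.0.0` fails because the two versions are 5 minor
versions apart.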

## Quick Deployment

If you would like to quickly try Network Flow Visibility feature, you can deploy