In this lab, we output metrics to the local console to keep things simple.
* This exercise is based on the following [repository](https://github.com/NovatecConsulting/opentelemetry-training/)
* All exercises are in the subdirectory `exercises`. There is also an environment variable `$EXERCISES` pointing to this directory. All directories given are relative to this one.
* Initial directory: `manual-instrumentation-metrics/initial`
* Solution directory: `manual-instrumentation-metrics/solution`
* Python source code: `manual-instrumentation-metrics/initial/src`
The environment consists of two components:
1. a Python service
- uses the [Flask](https://flask.palletsprojects.com) web framework
- listens on port 5000 and serves several HTTP endpoints
- simulates an application we want to instrument
2. an echo server
- listens on port 6000, receives requests and sends them back to the client
- called by the Python application to simulate communication with a remote service
- allows us to inspect outbound requests

To work on this lab, **open three terminals**.
1. Terminal to run the echo server

Navigate to

```sh
cd $EXERCISES
cd manual-instrumentation-metrics/initial
```

Start the echo server using
```sh
docker compose up
```

2. Terminal to run the application and view its output

Change to the Python source directory
```sh
cd $EXERCISES
cd manual-instrumentation-metrics/initial/src
```

Start the Python app/webserver
```sh
python app.py
```

3. Terminal to send requests to the HTTP endpoints of the service

The directory doesn't matter here.

Test the Python app:
```sh
curl -XGET localhost:5000; echo
```


To keep things concise, code snippets only contain what's relevant to that step.
If you get stuck, you can find the solution in the `exercises/manual-instrumentation-metrics/solution` directory.

---

### Configure metrics pipeline and obtain a meter

Let's create a new file `metric_utils.py` inside the `src` directory.
We'll use it to bundle configuration related to the metrics signal.
At the top of the file, specify the following imports for OpenTelemetry's metrics SDK.
Then, create a new function `create_metrics_pipeline`.
In a production scenario, one would deploy a backend to store the time-series data and a front end to analyze it.
For this lab, create a `ConsoleMetricExporter` to write a JSON representation of the metrics generated by the SDK to stdout.

Next, we instantiate a `PeriodicExportingMetricReader` that collects metrics at regular intervals and passes them to the exporter. Add the following code to the file `metric_utils.py`.

```py { title="metric_utils.py" }
# OTel SDK
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    MetricReader,
    PeriodicExportingMetricReader,
)

def create_metrics_pipeline(export_interval: int) -> MetricReader:
    # exporter writes a JSON representation of the metrics to stdout
    console_exporter = ConsoleMetricExporter()
    # reader collects metrics every `export_interval` milliseconds
    # and passes them to the exporter
    reader = PeriodicExportingMetricReader(
        exporter=console_exporter, export_interval_millis=export_interval
    )
    return reader
```


<!-- TODO -->

Then, define a new function `create_meter`.
To obtain a `Meter` we must first create a `MeterProvider`.
To connect the MeterProvider to our metrics pipeline, pass the `PeriodicExportingMetricReader` to the constructor.
Use the metrics API to register the global MeterProvider and retrieve the meter. Extend the file with the following code:

```py { title="metric_utils.py" }
# OTel API
from opentelemetry import metrics as metric_api

# OTel SDK
from opentelemetry.sdk.metrics import MeterProvider

def create_meter(name: str, version: str) -> metric_api.Meter:
    # connect the provider to our metrics pipeline
    # (the 5000 ms export interval is an assumed value for this sketch)
    metric_reader = create_metrics_pipeline(5000)
    provider = MeterProvider(metric_readers=[metric_reader])

    # register the global MeterProvider and obtain a meter from it
    metric_api.set_meter_provider(provider)
    meter = metric_api.get_meter(name, version)
    return meter
```


<!-- TODO re-use resource_utils -->

Finally, open `app.py` and import `create_meter`.
Invoke the function and assign the return value to a global variable `meter`.

```py { title="app.py" }
# custom
from metric_utils import create_meter

app = Flask(__name__)
meter = create_meter("app.py", "0.1")
```

### Create instruments to record measurements

As you have noticed, thus far, everything was fairly similar to the tracing lab.
However, in contrast to tracers, we do not use meters directly to generate metrics.
Instead, a meter is used to create instruments, and each instrument records a particular type of measurement.
Each type of instrument, except for histograms, has a synchronous and asynchronous variant.

For now, we will focus on the basic concepts and keep things simple, but as you become more familiar with OpenTelemetry, you will be able to leverage these components to create more sophisticated metric collection and analysis strategies.
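
To make this taxonomy concrete, here is a minimal sketch (the instrument names are illustrative and not part of the lab code) of how a meter creates synchronous and asynchronous instruments:

```py
from opentelemetry.metrics import CallbackOptions, Observation

def example_instruments(meter: metric_api.Meter):
    # synchronous instruments are invoked inline by application code
    items_done = meter.create_counter("work.items.done")     # only goes up
    queue_size = meter.create_up_down_counter("work.queue")  # can go up and down
    duration = meter.create_histogram("work.duration")       # distribution of values

    # asynchronous instruments are observed via callbacks that run
    # each time the metric reader collects
    def observe_queue_depth(options: CallbackOptions):
        yield Observation(42)  # placeholder value

    meter.create_observable_gauge(
        "work.queue.depth", callbacks=[observe_queue_depth]
    )
```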

In `metric_utils.py` add a new function `create_request_instruments`.
Here, we'll define workload-related instruments for the application.
As a first example, use the `meter` to create a `Counter` instrument to measure the number of requests to the `/` endpoint.
Every instrument must have a `name`, but we'll also supply the `unit` of measurement and a short `description`.

```py { title="metric_utils.py" }
def create_request_instruments(meter: metric_api.Meter) -> dict[str, metric_api.Instrument]:
    index_counter = meter.create_counter(
        name="index_called",
        # illustrative unit and description
        unit="request",
        description="Total amount of requests to /",
    )

    instruments = {"index_counter": index_counter}
    return instruments
```

For analysis tools to interpret the metric correctly, the name should follow OpenTelemetry's [semantic conventions](https://opentelemetry.io/docs/specs/semconv/general/metrics/) and the unit should follow the [Unified Code for Units of Measure (UCUM)](https://opentelemetry.io/docs/specs/semconv/general/metrics/).

Now that we have defined our first instrument, import the helper function into `app.py`.
Let's generate some metrics.
Call `create_request_instruments` in the file's main section and assign it to a global variable.
In our `index` function, reference the counter instrument and call the `add` method to increment its value.

```py { title="app.py" }
from metric_utils import create_meter, create_request_instruments

@app.route("/", methods=["GET", "POST"])
def index():
    # increment the counter on every request to /
    request_instruments["index_counter"].add(1)
    # ...

if __name__ == "__main__":
    # create the instruments once and keep them in a global variable
    request_instruments = create_request_instruments(meter)
    # ...
```

Start the web server using
```sh
python app.py
```

Use the third terminal to send a request to `/` via

```bash
curl -XGET localhost:5000; echo
```

Observe the result:

```json
"resource": { // <- origin
    // ... (output abridged)
}
```

The `resource` section identifies the origin of the measurements, while the `data` section contains a list of `data_points`, which are measurements recorded by the instrument.
Each measurement typically consists of a `value`, `attributes`, and a `timestamp`.
The `aggregation_temporality` indicates whether the metric is cumulative, and `is_monotonic` specifies whether the metric only increases (or decreases, in the case of a gauge). This model is designed to be flexible and extensible, ensuring compatibility with existing monitoring systems and standards like Prometheus and StatsD, facilitating interoperability with various monitoring tools.

### Metric dimensions

So far, we only used the `add` method to increment the counter.
However, `add` also has a second optional parameter to specify attributes.
This brings us to the topic of metric dimensions.
To illustrate their use, modify the `index` function as shown below.

```py { title="app.py" }
from flask import Flask, make_response, request

@app.route("/", methods=["GET", "POST"])
def index():
    request_instruments["index_counter"].add(
        # assumption for this sketch: label the measurement with the
        # HTTP method of the incoming request
        1, attributes={"http.request.method": request.method}
    )
```

Send a couple of POST and GET requests to `/` via

```bash
curl -XPOST localhost:5000; echo
```

```bash
curl -XGET localhost:5000; echo
```

Look at the output. What do you notice?

```json
// ... (output abridged): each unique combination of attribute values
// appears as its own data point
"data_points": [
    { "attributes": { "http.request.method": "GET" },  "value": "..." },
    { "attributes": { "http.request.method": "POST" }, "value": "..." }
]
```

You should notice that the counter no longer reports a single sum: every unique combination of attribute values produces its own data point, and each combination represents a distinct time series. Moreover, specific metrics may have less aggregative quality, which can make it more challenging to derive meaningful insights when aggregated.

In conclusion, the selection of metric dimensions is a delicate balancing act. Metrics with high cardinality, which result from introducing many attributes or a wide range of values, can lead to numerous unique combinations. This can increase storage requirements, network traffic, and processing overhead, as each unique combination of attributes represents a distinct time series that must be tracked. Moreover, metrics with low aggregative quality may be less useful when aggregated, making it more challenging to derive meaningful insights from the data. Therefore, it is essential to carefully consider the dimensions of the metrics to ensure that they are both informative and manageable within the constraints of the monitoring system.
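
To make the trade-off concrete, here is a hypothetical sketch (the attribute keys and the `user_id` variable are illustrative only): the HTTP method is a bounded dimension, while a per-user attribute is unbounded.

```py
# bounded dimension: HTTP methods form a small, fixed set, so this
# yields at most a handful of time series
request_instruments["traffic_volume"].add(
    1, attributes={"http.request.method": request.method}
)

# unbounded dimension: a distinct ID per client spawns a new time
# series for every user, which means high cardinality and costly storage
request_instruments["traffic_volume"].add(
    1, attributes={"user.id": user_id}  # hypothetical variable
)
```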

### Instruments to measure golden signals

{{< figure src="images/resource_workload_analysis.PNG" width=600 caption="workload and resource analysis" >}}

However, the [four golden signals](https://sre.google/sre-book/monitoring-distributed-systems/) (latency, traffic, errors, and saturation) provide a good starting point.

Let's instrument our application accordingly.

#### Traffic

Let's measure the total amount of traffic for a service.
First, go to `create_request_instruments` and `index` to delete everything related to the `index_counter` instrument.
Incrementing a counter on every route we serve would lead to a lot of code duplication.

Modify the code to look like this:

```py { title="metric_utils.py" }
def create_request_instruments(meter: metric_api.Meter) -> dict:
    traffic_volume = meter.create_counter(
        name="traffic_volume",
        # illustrative unit and description
        unit="request",
        description="total volume of requests to an endpoint",
    )

    instruments = {"traffic_volume": traffic_volume}
    return instruments
```

Instead, let's create a custom function `before_request_func` and annotate it with Flask's `@app.before_request` decorator.
That way, the function runs for every incoming request before it is handled by the view serving the route.

```py { title="app.py" }
@app.before_request
def before_request_func():
request_instruments["traffic_volume"].add(
1, attributes={"http.route": request.path}
)
```

#### Error rate

As a next step, let's track the error rate of the service.
Create a separate Counter instrument.
Ultimately, the decision of what constitutes a failed request is up to us.
In this example, we'll simply refer to the status code of the response.

```py { title="metric_utils.py" }
def create_request_instruments(meter: metric_api.Meter) -> dict:
    # ...
    error_rate = meter.create_counter(
        name="error_rate",
        # illustrative unit and description
        unit="request",
        description="rate of failed requests",
    )

    instruments = {
        # ...
        "error_rate": error_rate,
    }
```

To access it, create a function `after_request_func` and use Flask's `@app.after_request` decorator to execute it after a view function returns.

```py { title="app.py" }
from flask import Flask, make_response, request, Response

@app.after_request
def after_request_func(response: Response) -> Response:
    # sketch of the abridged check: treat status codes >= 400 as failures
    if response.status_code >= 400:
        request_instruments["error_rate"].add(
            1,
            attributes={
                "http.route": request.path,
                "http.response.status_code": response.status_code,
            },
        )
    return response
```


#### Latency

The time it takes a service to process a request is a crucial indicator of potential problems.
The tracing lab showed that spans contain timestamps that measure the duration of an operation.
A major challenge is that there is no unified definition of how to measure latency.
We could measure the time a service spends processing application code, the time it takes to get a response from a remote service, and so on.
To interpret measurements correctly, it is vital to have information on what was measured.

Use a `Histogram` instrument to track the distribution of request durations. Add it to `create_request_instruments`:
```py { title="metric_utils.py" }
def create_request_instruments(meter: metric_api.Meter) -> dict:
    # ...
    request_latency = meter.create_histogram(
        name="http.server.request.duration",
        # unit in seconds, following the semantic conventions for this metric
        unit="s",
        description="latency for a request to be served",
    )

    instruments = {
        # ...
        "request_latency": request_latency,
    }
```

```py { title="app.py" }
import time

@app.before_request
def before_request_func():
    request.environ["request_start"] = time.time_ns()

@app.after_request
def after_request_func(response: Response) -> Response:
    # ... (error-rate code from the previous step)
    # sketch of the abridged part: measure elapsed time and record it in seconds
    request_end = time.time_ns()
    duration = (request_end - request.environ["request_start"]) / 1_000_000_000
    request_instruments["request_latency"].record(
        duration, attributes={"http.route": request.path}
    )
    return response
```

We expect a majority of requests to be served in milliseconds.
Therefore, the default bucket bounds defined by the Histogram aren't a good fit.
We'll address this later, so ignore this for now.

#### Saturation

All the previous metrics have been request-oriented.
For completeness, we'll also capture some resource-oriented metrics.
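As a sketch of what such an instrument could look like (an assumption, not the lab's solution code, and it presumes the `psutil` package is installed), an asynchronous gauge can sample system utilization through a callback each time the reader collects:

```py
import psutil
from opentelemetry.metrics import CallbackOptions, Observation

def create_resource_instruments(meter: metric_api.Meter) -> dict:
    def observe_cpu_utilization(options: CallbackOptions):
        # sampled on every collection cycle of the metric reader
        yield Observation(psutil.cpu_percent() / 100)

    cpu_utilization = meter.create_observable_gauge(
        name="system.cpu.utilization",
        callbacks=[observe_cpu_utilization],
        unit="1",
        description="CPU utilization since the last measurement",
    )
    return {"cpu_utilization": cpu_utilization}
```

Because the callback runs at collection time, the gauge reports a fresh value without the application having to push measurements.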
