Merge pull request #117 from nordic-institute/OPMONDEV-185-docs
doc: fix typos and broken links
melbeltagy authored Jun 26, 2024
2 parents 3c3090c + 19cf313 commit 848d669
Showing 15 changed files with 77 additions and 65 deletions.
24 changes: 12 additions & 12 deletions docs/anonymizer_module.md
@@ -24,7 +24,7 @@ which include following modules:

The **Anonymizer module** is responsible of preparing the operational monitoring data for publication through
the [Opendata module](opendata_module.md). Anonymizer configuration allows X-Road Metrics extension administrator to set
-fine-grained rules for excluding whole operatinal monitoring data records or to modify selected data fields before the data is published.
+fine-grained rules for excluding whole operational monitoring data records or to modify selected data fields before the data is published.

The anonymizer module uses the operational monitoring data that [Corrector module](corrector_module.md) has prepared and stored
to MongoDb as input. The anonymizer processes the data using the configured ruleset and stores the output to the
@@ -40,7 +40,7 @@ through [Opendata module](opendata_module.md) is diagram below:

MongoDb is used to store "non-anonymized" operational monitoring data that should be accessible only by the X-Road Metrics administrators.
Anonymized operational monitoring data that can be published for wider audience is stored in the PostgreSQL. The Opendata UI needs
-access only to the PostgreSQL. To follow the "principal of least priviledge" it is recommended to
+access only to the PostgreSQL. To follow the "principal of least privilege" it is recommended to
install Opendata UI on a dedicated host that has no access at all to MongoDb.
However, the Anonymizer module needs access also to the "not-public" data, so it should
run on a host that has access to both MongoDb and PostgreSQL.
@@ -56,7 +56,7 @@ See [Opendata database](opendata_module.md)
For a connection to be known SSL-secured, SSL usage must be configured on both the client and the server before the connection is made.
If it is only configured on the server, the client may end up sending sensitive information before it knows that the server requires high security.

-To ensure secure connections `ssl-mode` and `ssl-root-cert` parameterers has to be provided in settings file.
+To ensure secure connections `ssl-mode` and `ssl-root-cert` parameters has to be provided in settings file.
Possible values for `ssl-mode`: `disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`
For detailed information see https://www.postgresql.org/docs/current/libpq-ssl.html
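As an editorial aside (not part of the diff above), the `ssl-mode` and `ssl-root-cert` settings correspond to the libpq `sslmode` and `sslrootcert` connection parameters. A minimal sketch of mapping such settings keys onto a libpq-style connection string; the settings structure and key names other than `ssl-mode`/`ssl-root-cert` are assumptions for illustration only:

```python
# Sketch: build a libpq-style DSN from hypothetical X-Road Metrics settings.
# Only the `ssl-mode` / `ssl-root-cert` keys come from the documentation;
# everything else here is an illustrative assumption.

def build_dsn(settings: dict) -> str:
    """Map settings-file keys to libpq connection-string parameters."""
    key_map = {
        "host": "host",
        "database-name": "dbname",
        "ssl-mode": "sslmode",
        "ssl-root-cert": "sslrootcert",
    }
    parts = [f"{key_map[k]}={v}" for k, v in settings.items() if k in key_map]
    return " ".join(parts)

dsn = build_dsn({
    "host": "opendata-db.example.org",
    "database-name": "opendata_EX",
    "ssl-mode": "verify-full",          # encrypts and verifies server identity
    "ssl-root-cert": "/etc/xroad-metrics/postgres-ca.pem",
})
print(dsn)
```

Of the documented `ssl-mode` values, only `verify-ca` and `verify-full` make use of `ssl-root-cert` to validate the server certificate.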

@@ -142,7 +142,7 @@ Settings that the user must fill in:
* name of PostgreSQL database where to store the anonymized data
* list of PostgreSQL users that should have read-only access to the anonymized data

-The read-only PostgrSQL users should be the users that Opendata-UI and Networking modules use to read data from the
+The read-only PostgreSQL users should be the users that Opendata-UI and Networking modules use to read data from the
PostgreSQL.


@@ -183,7 +183,7 @@ records that fulfill a set of conditions. These _substitution rules_ are defined
A substitution rule has two parts. First *conditions* has a set of rules that defines the set of records
where the substitution applies. These conditions have same format as the _hiding rules_ above.
-Second, there is the *subtitutions* part that consists of feature-value pairs, where feature is the name of the field
+Second, there is the *substitutions* part that consists of feature-value pairs, where feature is the name of the field
to be substituted and value contains the substitute string.
The below example defines two substitution rules.
@@ -229,7 +229,7 @@ flag when running xroad-metrics-anonymizer. For example to run anonymizer manual
xroad-metrics-anonymizer --profile TEST
```

-`xroad-metrics-anonymizer` command searches the settings file first in current working direcrtory, then in
+`xroad-metrics-anonymizer` command searches the settings file first in current working directory, then in
_/etc/xroad-metrics/anonymizer/_

### Manual usage
@@ -242,7 +242,7 @@ sudo su xroad-metrics

Currently following command line arguments are supported:
```bash
-xroad-metrics-anonymizer --help # Show description of the command line argumemts
+xroad-metrics-anonymizer --help # Show description of the command line arguments
xroad-metrics-anonymizer --limit <number> # Optional flag to limit the number of records to process.
xroad-metrics-anonymizer --profile <profile name> # Run with a non-default settings profile
```
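As an editorial aside (not part of this commit), the documented flags can be mirrored with a small `argparse` sketch; this is illustrative only and not the tool's actual implementation:

```python
import argparse

# Illustrative sketch of the documented command line flags
# (--limit, --profile); the real tool's option handling may differ.
parser = argparse.ArgumentParser(prog="xroad-metrics-anonymizer")
parser.add_argument("--limit", type=int, default=None,
                    help="Optional limit on the number of records to process")
parser.add_argument("--profile", default=None,
                    help="Run with a non-default settings profile")

args = parser.parse_args(["--limit", "1000", "--profile", "TEST"])
print(args.limit, args.profile)  # 1000 TEST
```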
@@ -285,7 +285,7 @@ To anonymize opendata add crontab entry to _/etc/cron.d/xroad-metrics-anonymizer

### Database indexes

-Anonymizer module would benefit in `insertTime` index while perfoming opendata anonymization.
+Anonymizer module would benefit in `insertTime` index while performing opendata anonymization.
Refer to [Indexes](database_module.md#indexes)

## Monitoring and Status
@@ -355,7 +355,7 @@ logger:

```

-The heartbeat file is written to `heartbeat-path` and hearbeat file name contains the X-Road instance name.
+The heartbeat file is written to `heartbeat-path` and heartbeat file name contains the X-Road instance name.
The above example configuration would write logs to
`/var/log/xroad-metrics/anonymizer/heartbeat/heartbeat_anonymizer_EXAMPLE.json`.
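As an editorial aside (not part of the diff), a monitoring script could check such a heartbeat file for staleness. The JSON schema assumed below (a `timestamp` field in epoch seconds plus a `status` field) is hypothetical; only the path convention comes from the documentation:

```python
# Sketch of a heartbeat freshness check. The heartbeat file's exact JSON
# schema is an assumption; adapt the field names to the real file contents.
def is_stale(heartbeat: dict, max_age_seconds: float, now: float) -> bool:
    """Flag a heartbeat older than max_age_seconds as stale."""
    return now - heartbeat["timestamp"] > max_age_seconds

# A beat written two hours ago, checked against a one-hour threshold:
beat = {"timestamp": 1000.0, "status": "SUCCEEDED"}
print(is_stale(beat, max_age_seconds=3600, now=1000.0 + 7200))  # True
```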

@@ -365,8 +365,8 @@ The heartbeat file consists last message of log file and status

## Metrics statistics

-Metrics statistics is executable script to calculate usefull statistical data on Metrics.
-Gethered data is stored in database.
+Metrics statistics is executable script to calculate useful statistical data on Metrics.
+Gathered data is stored in database.
Opendata module has API endpoint to view this data by accessing `api/statistics`

### Database Configuration
@@ -376,7 +376,7 @@ and created the database credentials. See [Database_Module](database_module.md#s

### Cron Settings

-Add cronjob entry to calculate metrics statistics regulary:
+Add cronjob entry to calculate metrics statistics regularly:

```
* * * * * xroad-metrics-statistics --profile TEST
4 changes: 2 additions & 2 deletions docs/collector_module.md
@@ -227,7 +227,7 @@ Every log line includes:
- **"local_timestamp"**: timestamp in local format '%Y-%m-%d %H:%M:%S %z'
- **"module"**: "collector"
- **"version"**: in form of "v${MINOR}.${MAJOR}"
-- **"activity"**: possible valuse "collector_start", "collector_worker", "collector_end"
+- **"activity"**: possible values "collector_start", "collector_worker", "collector_end"
- **level**: possible values "INFO", "WARNING", "ERROR"
- **msg**: message

@@ -273,7 +273,7 @@ logger:
```

-The heartbeat file is written to `heartbeat-path` and hearbeat file name contains the X-Road instance name.
+The heartbeat file is written to `heartbeat-path` and heartbeat file name contains the X-Road instance name.
The above example configuration would write logs to `/var/log/xroad-metrics/collector/heartbeat/heartbeat_collector_EXAMPLE.json`.

The heartbeat file consists last message of log file and status
8 changes: 4 additions & 4 deletions docs/corrector_module.md
@@ -307,7 +307,7 @@ file named `settings_DEV.yaml`, `settings_TEST.yaml`, and `settings_PROD.yaml`.
xroad-metrics-correctord --profile TEST
```
> [!IMPORTANT]
-> `xroad-metrics-corrector` command searches the settings file first in current working direcrtory, then in
+> `xroad-metrics-corrector` command searches the settings file first in current working directory, then in
`/etc/xroad-metrics/corrector/`

### Manual usage
@@ -332,7 +332,7 @@ xroad-metrics-correctord
> - The `CORRECTOR_DOCUMENTS_LIMIT` defines the processing batch size, and is executed continuously until the total of documents left is smaller than `CORRECTOR_DOCUMENTS_MIN` documents (default set to `CORRECTOR_DOCUMENTS_MIN` = `1`).
> - The estimated amount of memory per processing batch is indicated at [System Architecture](system_architecture.md) documentation.

-### sysetmd Service
+### systemd Service

#### Default Settings Profile

@@ -417,7 +417,7 @@ Every log line includes:
- **"local_timestamp"**: timestamp in local format '%Y-%m-%d %H:%M:%S %z'
- **"module"**: "corrector"
- **"version"**: in form of "v${MINOR}.${MAJOR}"
-- **"activity"**: possible valuse "corrector_main", "corrector_batch_run", "corrector_batch_start", "corrector_batch_raw", "DatabaseManager.get_raw_documents", "corrector_batch_update_timeout", "corrector_batch_update_old_to_done", "corrector_batch_remove_duplicates_from_raw", "corrector_batch_end"
+- **"activity"**: possible values "corrector_main", "corrector_batch_run", "corrector_batch_start", "corrector_batch_raw", "DatabaseManager.get_raw_documents", "corrector_batch_update_timeout", "corrector_batch_update_old_to_done", "corrector_batch_remove_duplicates_from_raw", "corrector_batch_end"
- **level**: possible values "INFO", "WARNING", "ERROR"
- **msg**: message

@@ -463,7 +463,7 @@ logger:
```

-The heartbeat file is written to `heartbeat-path` and hearbeat file name contains the X-Road instance name.
+The heartbeat file is written to `heartbeat-path` and heartbeat file name contains the X-Road instance name.
The above example configuration would write logs to `/var/log/xroad-metrics/corrector/heartbeat/heartbeat_corrector_EXAMPLE.json`.

The heartbeat file consists last message of log file and status
6 changes: 3 additions & 3 deletions docs/database_module.md
@@ -106,7 +106,7 @@ use admin
db.createUser(
{
user: "root",
-pwd: passwordPrompt(), // or cleartext password
+pwd: passwordPrompt(), // or clear text password
roles: [ { role: "userAdminAnyDatabase", db: "admin" }, "readWriteAnyDatabase" ]
}
)
@@ -231,7 +231,7 @@ For X-Road instance `EX` auth_db should have following users and access rights:
* anonymizer_state_EX: readWrite
* **collector_EX**:
* query_db_EX: readWrite,
-* collcetor_state_EX: readWrite
+* collector_state_EX: readWrite
* **corrector_EX**:
* query_db_EX: readWrite
* **reports_EX**:
@@ -491,7 +491,7 @@ vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
```
-### Swapiness
+### Swappiness
See also https://en.wikipedia.org/wiki/Paging#Swappiness
8 changes: 4 additions & 4 deletions docs/experimental/analysis_module/analyzer_installation.md
@@ -66,10 +66,10 @@ file named `settings_DEV.yaml`, `settings_TEST.yaml` and `settings_PROD.yaml`.
Then fill the profile specific settings to each file and use the --profile
flag when running opmon-analyzer. For example to run model update using the TEST profile:
```
-opmon-analyzer --profile TEST upate
+opmon-analyzer --profile TEST update
```

-`opmon-analyzer` command searches the settings file first in current working direcrtory, then in
+`opmon-analyzer` command searches the settings file first in current working directory, then in
_/etc/opmon/analyzer/_

### Manual usage
@@ -192,7 +192,7 @@ time period (e.g. 10 days), after which they are considered "expired" and will n
Requests that are part of a "true incident"
(an anomaly that was marked as "incident" before the expiration date)
are not used to update the model.
-This way, the historic averages remain to describe the "normal" behaviour.
+This way, the historic averages remain to describe the "normal" behavior.
Note that updating the model does not change the anomalies that have already been found
(the existing anomalies are not recalculated).

@@ -201,7 +201,7 @@ only the data from time intervals that have already completed are used. This is
for example, the number of requests within 10 minutes is compared to the (historic) number of requests within 1 hour,
as such comparison would almost certainly yield an anomaly.

-It is recommended that the model is given some time to learn the behaviour of a particular service call (e.g. 3 months).
+It is recommended that the model is given some time to learn the behavior of a particular service call (e.g. 3 months).
Therefore, the following approach is implemented for **new** service calls:
1. For the first 3 months since the first request was made by a given service call,
no anomalies are reported (this is the training period)
4 changes: 2 additions & 2 deletions docs/experimental/analysis_module/customization.md
@@ -23,8 +23,8 @@ The Analyzer back-end can be configured from **analysis_module/analyzer/analyzer
| relevant_cols_general | Database fields from the clean_data collection that are relevant for the analyzer and appear at the top level of the request. | relevant_cols_general = ["_id", 'totalDuration', 'producerDurationProducerView', 'requestNwDuration', 'responseNwDuration'] |
| relevant_cols_nested | Database fields from the clean_data collection that are relevant for the analyzer and are nested inside 'client' and 'producer'. | relevant_cols_nested = ["succeeded", "messageId", timestamp_field] + service_call_fields |
| relevant_cols_general_alternative | Database fields from the clean_data collection that are relevant for the analyzer and appear at the top level of the request, but are analogous for 'client' and 'producer' side. <br> For the Analyzer, only one field from each pair is necessary. In other words, if the field exists for the client side, then this value is used, otherwise the value from the producer side is used. <br> In configuration, these fields are presented as triplets, where the first element refers to the general name used in the Analyzer, the second and third value are the alternative fields in the database. | relevant_cols_general_alternative = [('requestSize', 'clientRequestSize', 'producerRequestSize')] |
-<timeunit\>_aggregation_time_window | Settings for a given aggregation time window. The following attributes should be speficied: <br> 1) 'agg_window_name' - a name (can be chosen arbitrarily) that will be used to refer to the aggregation window, <br>2) 'agg_minutes' - number of minutes to use for aggregation, <br> 3) 'pd_timeunit' - used in the pandas.to_timedelta method to refer to the same time period, should be one of (D,h,m,s,ms,us,ns). (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html). | hour_aggregation_time_window = \{'agg_window_name': 'hour', 'agg_minutes': 60, 'pd_timeunit': 'h'}|
-| <timeunits\>_similarity_time_window | Settings for a given similarity time window. For example, if the aggregation time window is hour, the similarity time window can be hour+weekday, meaning that the aggregated values from a given hour are compared to historic values collected from the same hour on the same weekday. The following attributes should be speficied: <br> 1) 'timeunit_name' - a name (can be chosen arbitrarily) that will be used to refer to the similarity window, <br> 2) 'agg_window' - one of <timeunit\>_aggregation_time_window, <br> 3) 'similar_periods' - a list of time periods. A given set of aggregated requests will be compared to the combination of these periods. Each value in the list is used to extract the necessary time component from a pandas.DatetimeIndex object, so each value should be one of (year, month, day, hour, minute, second, microsecond, nanosecond, dayofyear, weekofyear, week, dayofweek, weekday, quarter). (http://pandas.pydata.org/pandas-docs/version/0.17.0/api.html#time-date-components) | hour_weekday_similarity_time_window = {'timeunit_name': 'hour_weekday', 'agg_window': hour_aggregation_time_window, 'similar_periods': ['hour', 'weekday']\} |
+<timeunit\>_aggregation_time_window | Settings for a given aggregation time window. The following attributes should be specified: <br> 1) 'agg_window_name' - a name (can be chosen arbitrarily) that will be used to refer to the aggregation window, <br>2) 'agg_minutes' - number of minutes to use for aggregation, <br> 3) 'pd_timeunit' - used in the pandas.to_timedelta method to refer to the same time period, should be one of (D,h,m,s,ms,us,ns). (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html). | hour_aggregation_time_window = \{'agg_window_name': 'hour', 'agg_minutes': 60, 'pd_timeunit': 'h'}|
+| <timeunits\>_similarity_time_window | Settings for a given similarity time window. For example, if the aggregation time window is hour, the similarity time window can be hour+weekday, meaning that the aggregated values from a given hour are compared to historic values collected from the same hour on the same weekday. The following attributes should be specified: <br> 1) 'timeunit_name' - a name (can be chosen arbitrarily) that will be used to refer to the similarity window, <br> 2) 'agg_window' - one of <timeunit\>_aggregation_time_window, <br> 3) 'similar_periods' - a list of time periods. A given set of aggregated requests will be compared to the combination of these periods. Each value in the list is used to extract the necessary time component from a pandas.DatetimeIndex object, so each value should be one of (year, month, day, hour, minute, second, microsecond, nanosecond, dayofyear, weekofyear, week, dayofweek, weekday, quarter). (http://pandas.pydata.org/pandas-docs/version/0.17.0/api.html#time-date-components) | hour_weekday_similarity_time_window = {'timeunit_name': 'hour_weekday', 'agg_window': hour_aggregation_time_window, 'similar_periods': ['hour', 'weekday']\} |
| time_windows | A dictionary of pairs (anomaly_type, previously defined <timeunit\>_aggregation_time_window) for anomaly types that do not require comparison with historic values. The specified time window will be used to aggregate requests for the given anomaly type. | time_windows = \{ <br> "failed_request_ratio": hour_aggregation_time_window, <br> "duplicate_message_ids": day_aggregation_time_window, <br> "time_sync_errors": hour_aggregation_time_window} |
| historic_averages_time_windows | A list of previously defined <timeunits\>_similarity_time_windows for anomaly types that require comparison with historic averages. A separate AveragesByTimeperiodModel is constructed for each such similarity time window. | historic_averages_time_windows = [hour_weekday_similarity_time_window, weekday_similarity_time_window] |
| historic_averages_thresholds | A dictionary of confidence thresholds used in the AveragesByTimeperiodModel(s). An observation (an aggregation of requests within a given time window) is considered an anomaly if the confidence (estimated by the model) of being an anomaly is larger than this threshold. | historic_averages_thresholds = \{ <br> 'request_count': 0.95, <br> 'mean_request_size': 0.95, <br> 'mean_response_size': 0.95, <br> 'mean_client_duration': 0.95, <br> 'mean_producer_duration': 0.95} ] |
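As an editorial aside (not part of the diff), the `agg_minutes` / `pd_timeunit` pairing in the table above can be sanity-checked without pandas; the window dictionaries mirror the table's examples, while the consistency check itself is a hypothetical illustration:

```python
# Aggregation-window definitions, as in the configuration table above.
hour_aggregation_time_window = {
    "agg_window_name": "hour", "agg_minutes": 60, "pd_timeunit": "h",
}
day_aggregation_time_window = {
    "agg_window_name": "day", "agg_minutes": 1440, "pd_timeunit": "D",
}

# Hypothetical sanity check: agg_minutes should be a whole multiple of the
# pandas time unit it is paired with (D = 1440 min, h = 60 min, m = 1 min).
UNIT_MINUTES = {"D": 1440, "h": 60, "m": 1}

def window_consistent(win: dict) -> bool:
    return win["agg_minutes"] % UNIT_MINUTES[win["pd_timeunit"]] == 0

print(window_consistent(hour_aggregation_time_window))  # True
```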