Merge pull request #117 from nordic-institute/OPMONDEV-185-docs
doc: fix typos and broken links
melbeltagy authored Jun 26, 2024
2 parents 3c3090c + 19cf313 commit 848d669
Showing 15 changed files with 77 additions and 65 deletions.
24 changes: 12 additions & 12 deletions docs/anonymizer_module.md
@@ -24,7 +24,7 @@ which include following modules:

The **Anonymizer module** is responsible of preparing the operational monitoring data for publication through
the [Opendata module](opendata_module.md). Anonymizer configuration allows X-Road Metrics extension administrator to set
-fine-grained rules for excluding whole operatinal monitoring data records or to modify selected data fields before the data is published.
+fine-grained rules for excluding whole operational monitoring data records or to modify selected data fields before the data is published.

The anonymizer module uses the operational monitoring data that [Corrector module](corrector_module.md) has prepared and stored
to MongoDb as input. The anonymizer processes the data using the configured ruleset and stores the output to the
@@ -40,7 +40,7 @@ through [Opendata module](opendata_module.md) is diagram below:

MongoDb is used to store "non-anonymized" operational monitoring data that should be accessible only by the X-Road Metrics administrators.
Anonymized operational monitoring data that can be published for wider audience is stored in the PostgreSQL. The Opendata UI needs
-access only to the PostgreSQL. To follow the "principal of least priviledge" it is recommended to
+access only to the PostgreSQL. To follow the "principal of least privilege" it is recommended to
install Opendata UI on a dedicated host that has no access at all to MongoDb.
However, the Anonymizer module needs access also to the "not-public" data, so it should
run on a host that has access to both MongoDb and PostgreSQL.
@@ -56,7 +56,7 @@ See [Opendata database](opendata_module.md)
For a connection to be known SSL-secured, SSL usage must be configured on both the client and the server before the connection is made.
If it is only configured on the server, the client may end up sending sensitive information before it knows that the server requires high security.

-To ensure secure connections `ssl-mode` and `ssl-root-cert` parameterers has to be provided in settings file.
+To ensure secure connections `ssl-mode` and `ssl-root-cert` parameters has to be provided in settings file.
Possible values for `ssl-mode`: `disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`
For detailed information see https://www.postgresql.org/docs/current/libpq-ssl.html
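As an editorial aside (not part of the diff above), the `ssl-mode` and `ssl-root-cert` settings correspond to the libpq `sslmode` and `sslrootcert` connection parameters. A minimal sketch of mapping such settings keys onto a libpq-style connection string; the settings structure and key names other than `ssl-mode`/`ssl-root-cert` are assumptions for illustration only:

```python
# Sketch: build a libpq-style DSN from hypothetical X-Road Metrics settings.
# Only the `ssl-mode` / `ssl-root-cert` keys come from the documentation;
# everything else here is an illustrative assumption.

def build_dsn(settings: dict) -> str:
    """Map settings-file keys to libpq connection-string parameters."""
    key_map = {
        "host": "host",
        "database-name": "dbname",
        "ssl-mode": "sslmode",
        "ssl-root-cert": "sslrootcert",
    }
    parts = [f"{key_map[k]}={v}" for k, v in settings.items() if k in key_map]
    return " ".join(parts)

dsn = build_dsn({
    "host": "opendata-db.example.org",
    "database-name": "opendata_EX",
    "ssl-mode": "verify-full",          # encrypts and verifies server identity
    "ssl-root-cert": "/etc/xroad-metrics/postgres-ca.pem",
})
print(dsn)
```

Of the documented `ssl-mode` values, only `verify-ca` and `verify-full` make use of `ssl-root-cert` to validate the server certificate.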

@@ -142,7 +142,7 @@ Settings that the user must fill in:
* name of PostgreSQL database where to store the anonymized data
* list of PostgreSQL users that should have read-only access to the anonymized data

-The read-only PostgrSQL users should be the users that Opendata-UI and Networking modules use to read data from the
+The read-only PostgreSQL users should be the users that Opendata-UI and Networking modules use to read data from the
PostgreSQL.


@@ -183,7 +183,7 @@ records that fulfill a set of conditions. These _substitution rules_ are defined
A substitution rule has two parts. First *conditions* has a set of rules that defines the set of records
where the substitution applies. These conditions have same format as the _hiding rules_ above.
-Second, there is the *subtitutions* part that consists of feature-value pairs, where feature is the name of the field
+Second, there is the *substitutions* part that consists of feature-value pairs, where feature is the name of the field
to be substituted and value contains the substitute string.
The below example defines two substitution rules.
@@ -229,7 +229,7 @@ flag when running xroad-metrics-anonymizer. For example to run anonymizer manual
xroad-metrics-anonymizer --profile TEST
```

-`xroad-metrics-anonymizer` command searches the settings file first in current working direcrtory, then in
+`xroad-metrics-anonymizer` command searches the settings file first in current working directory, then in
_/etc/xroad-metrics/anonymizer/_

### Manual usage
@@ -242,7 +242,7 @@ sudo su xroad-metrics

Currently following command line arguments are supported:
```bash
-xroad-metrics-anonymizer --help # Show description of the command line argumemts
+xroad-metrics-anonymizer --help # Show description of the command line arguments
xroad-metrics-anonymizer --limit <number> # Optional flag to limit the number of records to process.
xroad-metrics-anonymizer --profile <profile name> # Run with a non-default settings profile
```
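As an editorial aside (not part of this commit), the documented flags can be mirrored with a small `argparse` sketch; this is illustrative only and not the tool's actual implementation:

```python
import argparse

# Illustrative sketch of the documented command line flags
# (--limit, --profile); the real tool's option handling may differ.
parser = argparse.ArgumentParser(prog="xroad-metrics-anonymizer")
parser.add_argument("--limit", type=int, default=None,
                    help="Optional limit on the number of records to process")
parser.add_argument("--profile", default=None,
                    help="Run with a non-default settings profile")

args = parser.parse_args(["--limit", "1000", "--profile", "TEST"])
print(args.limit, args.profile)  # 1000 TEST
```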
@@ -285,7 +285,7 @@ To anonymize opendata add crontab entry to _/etc/cron.d/xroad-metrics-anonymizer

### Database indexes

-Anonymizer module would benefit in `insertTime` index while perfoming opendata anonymization.
+Anonymizer module would benefit in `insertTime` index while performing opendata anonymization.
Refer to [Indexes](database_module.md#indexes)

## Monitoring and Status
@@ -355,7 +355,7 @@ logger:

```

-The heartbeat file is written to `heartbeat-path` and hearbeat file name contains the X-Road instance name.
+The heartbeat file is written to `heartbeat-path` and heartbeat file name contains the X-Road instance name.
The above example configuration would write logs to
`/var/log/xroad-metrics/anonymizer/heartbeat/heartbeat_anonymizer_EXAMPLE.json`.
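As an editorial aside (not part of the diff), a monitoring script could check such a heartbeat file for staleness. The JSON schema assumed below (a `timestamp` field in epoch seconds plus a `status` field) is hypothetical; only the path convention comes from the documentation:

```python
# Sketch of a heartbeat freshness check. The heartbeat file's exact JSON
# schema is an assumption; adapt the field names to the real file contents.
def is_stale(heartbeat: dict, max_age_seconds: float, now: float) -> bool:
    """Flag a heartbeat older than max_age_seconds as stale."""
    return now - heartbeat["timestamp"] > max_age_seconds

# A beat written two hours ago, checked against a one-hour threshold:
beat = {"timestamp": 1000.0, "status": "SUCCEEDED"}
print(is_stale(beat, max_age_seconds=3600, now=1000.0 + 7200))  # True
```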

@@ -365,8 +365,8 @@ The heartbeat file consists last message of log file and status

## Metrics statistics

-Metrics statistics is executable script to calculate usefull statistical data on Metrics.
-Gethered data is stored in database.
+Metrics statistics is executable script to calculate useful statistical data on Metrics.
+Gathered data is stored in database.
Opendata module has API endpoint to view this data by accessing `api/statistics`

### Database Configuration
@@ -376,7 +376,7 @@ and created the database credentials. See [Database_Module](database_module.md#s

### Cron Settings

-Add cronjob entry to calculate metrics statistics regulary:
+Add cronjob entry to calculate metrics statistics regularly:

```
* * * * * xroad-metrics-statistics --profile TEST
4 changes: 2 additions & 2 deletions docs/collector_module.md
@@ -227,7 +227,7 @@ Every log line includes:
- **"local_timestamp"**: timestamp in local format '%Y-%m-%d %H:%M:%S %z'
- **"module"**: "collector"
- **"version"**: in form of "v${MINOR}.${MAJOR}"
-- **"activity"**: possible valuse "collector_start", "collector_worker", "collector_end"
+- **"activity"**: possible values "collector_start", "collector_worker", "collector_end"
- **level**: possible values "INFO", "WARNING", "ERROR"
- **msg**: message

@@ -273,7 +273,7 @@ logger:
```

-The heartbeat file is written to `heartbeat-path` and hearbeat file name contains the X-Road instance name.
+The heartbeat file is written to `heartbeat-path` and heartbeat file name contains the X-Road instance name.
The above example configuration would write logs to `/var/log/xroad-metrics/collector/heartbeat/heartbeat_collector_EXAMPLE.json`.

The heartbeat file consists last message of log file and status
8 changes: 4 additions & 4 deletions docs/corrector_module.md
@@ -307,7 +307,7 @@ file named `settings_DEV.yaml`, `settings_TEST.yaml`, and `settings_PROD.yaml`.
xroad-metrics-correctord --profile TEST
```
> [!IMPORTANT]
-> `xroad-metrics-corrector` command searches the settings file first in current working direcrtory, then in
+> `xroad-metrics-corrector` command searches the settings file first in current working directory, then in
`/etc/xroad-metrics/corrector/`

### Manual usage
@@ -332,7 +332,7 @@ xroad-metrics-correctord
> - The `CORRECTOR_DOCUMENTS_LIMIT` defines the processing batch size, and is executed continuously until the total of documents left is smaller than `CORRECTOR_DOCUMENTS_MIN` documents (default set to `CORRECTOR_DOCUMENTS_MIN` = `1`).
> - The estimated amount of memory per processing batch is indicated at [System Architecture](system_architecture.md) documentation.

-### sysetmd Service
+### systemd Service

#### Default Settings Profile

@@ -417,7 +417,7 @@ Every log line includes:
- **"local_timestamp"**: timestamp in local format '%Y-%m-%d %H:%M:%S %z'
- **"module"**: "corrector"
- **"version"**: in form of "v${MINOR}.${MAJOR}"
-- **"activity"**: possible valuse "corrector_main", "corrector_batch_run", "corrector_batch_start", "corrector_batch_raw", "DatabaseManager.get_raw_documents", "corrector_batch_update_timeout", "corrector_batch_update_old_to_done", "corrector_batch_remove_duplicates_from_raw", "corrector_batch_end"
+- **"activity"**: possible values "corrector_main", "corrector_batch_run", "corrector_batch_start", "corrector_batch_raw", "DatabaseManager.get_raw_documents", "corrector_batch_update_timeout", "corrector_batch_update_old_to_done", "corrector_batch_remove_duplicates_from_raw", "corrector_batch_end"
- **level**: possible values "INFO", "WARNING", "ERROR"
- **msg**: message

@@ -463,7 +463,7 @@ logger:
```

-The heartbeat file is written to `heartbeat-path` and hearbeat file name contains the X-Road instance name.
+The heartbeat file is written to `heartbeat-path` and heartbeat file name contains the X-Road instance name.
The above example configuration would write logs to `/var/log/xroad-metrics/corrector/heartbeat/heartbeat_corrector_EXAMPLE.json`.

The heartbeat file consists last message of log file and status
6 changes: 3 additions & 3 deletions docs/database_module.md
@@ -106,7 +106,7 @@ use admin
db.createUser(
{
user: "root",
-pwd: passwordPrompt(), // or cleartext password
+pwd: passwordPrompt(), // or clear text password
roles: [ { role: "userAdminAnyDatabase", db: "admin" }, "readWriteAnyDatabase" ]
}
)
@@ -231,7 +231,7 @@ For X-Road instance `EX` auth_db should have following users and access rights:
* anonymizer_state_EX: readWrite
* **collector_EX**:
* query_db_EX: readWrite,
-* collcetor_state_EX: readWrite
+* collector_state_EX: readWrite
* **corrector_EX**:
* query_db_EX: readWrite
* **reports_EX**:
@@ -491,7 +491,7 @@ vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
```
-### Swapiness
+### Swappiness
See also https://en.wikipedia.org/wiki/Paging#Swappiness
8 changes: 4 additions & 4 deletions docs/experimental/analysis_module/analyzer_installation.md
@@ -66,10 +66,10 @@ file named `settings_DEV.yaml`, `settings_TEST.yaml` and `settings_PROD.yaml`.
Then fill the profile specific settings to each file and use the --profile
flag when running opmon-analyzer. For example to run model update using the TEST profile:
```
-opmon-analyzer --profile TEST upate
+opmon-analyzer --profile TEST update
```

-`opmon-analyzer` command searches the settings file first in current working direcrtory, then in
+`opmon-analyzer` command searches the settings file first in current working directory, then in
_/etc/opmon/analyzer/_

### Manual usage
@@ -192,7 +192,7 @@ time period (e.g. 10 days), after which they are considered "expired" and will n
Requests that are part of a "true incident"
(an anomaly that was marked as "incident" before the expiration date)
are not used to update the model.
-This way, the historic averages remain to describe the "normal" behaviour.
+This way, the historic averages remain to describe the "normal" behavior.
Note that updating the model does not change the anomalies that have already been found
(the existing anomalies are not recalculated).

@@ -201,7 +201,7 @@ only the data from time intervals that have already completed are used. This is
for example, the number of requests within 10 minutes is compared to the (historic) number of requests within 1 hour,
as such comparison would almost certainly yield an anomaly.

-It is recommended that the model is given some time to learn the behaviour of a particular service call (e.g. 3 months).
+It is recommended that the model is given some time to learn the behavior of a particular service call (e.g. 3 months).
Therefore, the following approach is implemented for **new** service calls:
1. For the first 3 months since the first request was made by a given service call,
no anomalies are reported (this is the training period)
4 changes: 2 additions & 2 deletions docs/experimental/analysis_module/customization.md
@@ -23,8 +23,8 @@ The Analyzer back-end can be configured from **analysis_module/analyzer/analyzer
| relevant_cols_general | Database fields from the clean_data collection that are relevant for the analyzer and appear at the top level of the request. | relevant_cols_general = ["_id", 'totalDuration', 'producerDurationProducerView', 'requestNwDuration', 'responseNwDuration'] |
| relevant_cols_nested | Database fields from the clean_data collection that are relevant for the analyzer and are nested inside 'client' and 'producer'. | relevant_cols_nested = ["succeeded", "messageId", timestamp_field] + service_call_fields |
| relevant_cols_general_alternative | Database fields from the clean_data collection that are relevant for the analyzer and appear at the top level of the request, but are analogous for 'client' and 'producer' side. <br> For the Analyzer, only one field from each pair is necessary. In other words, if the field exists for the client side, then this value is used, otherwise the value from the producer side is used. <br> In configuration, these fields are presented as triplets, where the first element refers to the general name used in the Analyzer, the second and third value are the alternative fields in the database. | relevant_cols_general_alternative = [('requestSize', 'clientRequestSize', 'producerRequestSize')] |
-<timeunit\>_aggregation_time_window | Settings for a given aggregation time window. The following attributes should be speficied: <br> 1) 'agg_window_name' - a name (can be chosen arbitrarily) that will be used to refer to the aggregation window, <br>2) 'agg_minutes' - number of minutes to use for aggregation, <br> 3) 'pd_timeunit' - used in the pandas.to_timedelta method to refer to the same time period, should be one of (D,h,m,s,ms,us,ns). (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html). | hour_aggregation_time_window = \{'agg_window_name': 'hour', 'agg_minutes': 60, 'pd_timeunit': 'h'}|
-| <timeunits\>_similarity_time_window | Settings for a given similarity time window. For example, if the aggregation time window is hour, the similarity time window can be hour+weekday, meaning that the aggregated values from a given hour are compared to historic values collected from the same hour on the same weekday. The following attributes should be speficied: <br> 1) 'timeunit_name' - a name (can be chosen arbitrarily) that will be used to refer to the similarity window, <br> 2) 'agg_window' - one of <timeunit\>_aggregation_time_window, <br> 3) 'similar_periods' - a list of time periods. A given set of aggregated requests will be compared to the combination of these periods. Each value in the list is used to extract the necessary time component from a pandas.DatetimeIndex object, so each value should be one of (year, month, day, hour, minute, second, microsecond, nanosecond, dayofyear, weekofyear, week, dayofweek, weekday, quarter). (http://pandas.pydata.org/pandas-docs/version/0.17.0/api.html#time-date-components) | hour_weekday_similarity_time_window = {'timeunit_name': 'hour_weekday', 'agg_window': hour_aggregation_time_window, 'similar_periods': ['hour', 'weekday']\} |
+<timeunit\>_aggregation_time_window | Settings for a given aggregation time window. The following attributes should be specified: <br> 1) 'agg_window_name' - a name (can be chosen arbitrarily) that will be used to refer to the aggregation window, <br>2) 'agg_minutes' - number of minutes to use for aggregation, <br> 3) 'pd_timeunit' - used in the pandas.to_timedelta method to refer to the same time period, should be one of (D,h,m,s,ms,us,ns). (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html). | hour_aggregation_time_window = \{'agg_window_name': 'hour', 'agg_minutes': 60, 'pd_timeunit': 'h'}|
+| <timeunits\>_similarity_time_window | Settings for a given similarity time window. For example, if the aggregation time window is hour, the similarity time window can be hour+weekday, meaning that the aggregated values from a given hour are compared to historic values collected from the same hour on the same weekday. The following attributes should be specified: <br> 1) 'timeunit_name' - a name (can be chosen arbitrarily) that will be used to refer to the similarity window, <br> 2) 'agg_window' - one of <timeunit\>_aggregation_time_window, <br> 3) 'similar_periods' - a list of time periods. A given set of aggregated requests will be compared to the combination of these periods. Each value in the list is used to extract the necessary time component from a pandas.DatetimeIndex object, so each value should be one of (year, month, day, hour, minute, second, microsecond, nanosecond, dayofyear, weekofyear, week, dayofweek, weekday, quarter). (http://pandas.pydata.org/pandas-docs/version/0.17.0/api.html#time-date-components) | hour_weekday_similarity_time_window = {'timeunit_name': 'hour_weekday', 'agg_window': hour_aggregation_time_window, 'similar_periods': ['hour', 'weekday']\} |
| time_windows | A dictionary of pairs (anomaly_type, previously defined <timeunit\>_aggregation_time_window) for anomaly types that do not require comparison with historic values. The specified time window will be used to aggregate requests for the given anomaly type. | time_windows = \{ <br> "failed_request_ratio": hour_aggregation_time_window, <br> "duplicate_message_ids": day_aggregation_time_window, <br> "time_sync_errors": hour_aggregation_time_window} |
| historic_averages_time_windows | A list of previously defined <timeunits\>_similarity_time_windows for anomaly types that require comparison with historic averages. A separate AveragesByTimeperiodModel is constructed for each such similarity time window. | historic_averages_time_windows = [hour_weekday_similarity_time_window, weekday_similarity_time_window] |
| historic_averages_thresholds | A dictionary of confidence thresholds used in the AveragesByTimeperiodModel(s). An observation (an aggregation of requests within a given time window) is considered an anomaly if the confidence (estimated by the model) of being an anomaly is larger than this threshold. | historic_averages_thresholds = \{ <br> 'request_count': 0.95, <br> 'mean_request_size': 0.95, <br> 'mean_response_size': 0.95, <br> 'mean_client_duration': 0.95, <br> 'mean_producer_duration': 0.95} ] |
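As an editorial aside (not part of the diff), the `agg_minutes` / `pd_timeunit` pairing in the table above can be sanity-checked without pandas; the window dictionaries mirror the table's examples, while the consistency check itself is a hypothetical illustration:

```python
# Aggregation-window definitions, as in the configuration table above.
hour_aggregation_time_window = {
    "agg_window_name": "hour", "agg_minutes": 60, "pd_timeunit": "h",
}
day_aggregation_time_window = {
    "agg_window_name": "day", "agg_minutes": 1440, "pd_timeunit": "D",
}

# Hypothetical sanity check: agg_minutes should be a whole multiple of the
# pandas time unit it is paired with (D = 1440 min, h = 60 min, m = 1 min).
UNIT_MINUTES = {"D": 1440, "h": 60, "m": 1}

def window_consistent(win: dict) -> bool:
    return win["agg_minutes"] % UNIT_MINUTES[win["pd_timeunit"]] == 0

print(window_consistent(hour_aggregation_time_window))  # True
```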