Merge pull request #74 from intergral/run_books
docs(runbooks): add docs for runbooks
Showing 7 changed files with 111 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Report Issues | ||
|
||
If any errors have been reported in the logs, or you are experiencing unexpected behaviour, please create an | ||
issue on the [GitHub](https://github.com/intergral/deep/issues/new/choose) project. This will allow us to improve Deep | ||
and hopefully help you resolve your issues. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Compaction/Retention | ||
|
||
Compaction and retention are the methods Deep uses to reduce the number of stored blocks. Together they improve | ||
query performance by lowering the block count and remove older data that is no longer needed. | ||
|
||
## Compaction | ||
|
||
Compaction works by grouping blocks by time frame and combining the blocks together to reduce the overall number of | ||
blocks that have to be scanned when performing a query. | ||
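The grouping step described above can be illustrated with a minimal Python sketch. It is not Deep's implementation: the `Block` type, the function names and the choice to bucket by end time are all assumptions made for illustration. Only the one-hour window mirrors the `compaction_window` default.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a stored block; not Deep's real type."""
    block_id: str
    end_time: datetime


def group_by_window(blocks, window):
    """Bucket blocks by the time window their end time falls into."""
    window_seconds = int(window.total_seconds())
    groups = defaultdict(list)
    for block in blocks:
        # Integer division maps every timestamp in the same window to one bucket.
        bucket = int(block.end_time.timestamp()) // window_seconds
        groups[bucket].append(block)
    return dict(groups)


def compaction_candidates(blocks, window=timedelta(hours=1)):
    """Return only the groups holding more than one block (worth combining)."""
    return [g for g in group_by_window(blocks, window).values() if len(g) > 1]
```

Under these assumptions, two blocks ending at 10:05 and 10:40 UTC fall into the same one-hour group and would be combined, while a block ending at 11:10 is left on its own.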
|
||
Compaction can be configured using the following settings: | ||
|
||
| Name | Default | Description | | ||
|---------------------------|---------|----------------------------------------------------------------------------| | ||
| compaction_window | 1h | This is the maximum time range a block should contain. | | ||
| max_compaction_objects | 6000000 | This is the maximum number of Snapshots that will be stored in each block. | | ||
| max_block_bytes | 100 GiB | This is the maximum size in bytes that each block can be. | | ||
| block_retention | 14d | This is the total time a block will be stored for. | | ||
| compacted_block_retention | 1h | This is the duration a compacted block will be stored for. | | ||
| compaction_cycle | 1h | The time between each compaction cycle. | | ||
|
||
By modifying these settings you can control how often blocks are compacted, how big they should be, and how much time | ||
they should span. There is no one-size-fits-all configuration for compaction. | ||
|
||
## Retention | ||
|
||
Retention is the process by which blocks are deleted. Once a block has been compacted it is marked for deletion. The deletion cycle | ||
runs based on the config and scans for marked blocks that are eligible to be deleted. A block is eligible for deletion | ||
if it has been compacted and the `compacted_block_retention` period has expired. |
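The eligibility rule above can be expressed as a short Python sketch. This is an illustration only: the function name, the `compacted_at` argument and the inclusive comparison at the retention boundary are assumptions, not Deep's actual code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def eligible_for_deletion(
    compacted_at: Optional[datetime],
    compacted_block_retention: timedelta,
    now: datetime,
) -> bool:
    """Return True if a block has been compacted and its retention has expired.

    compacted_at is None for a block that has not been compacted yet
    (a hypothetical representation chosen for this sketch).
    """
    if compacted_at is None:
        return False
    # Assumption: the block becomes eligible exactly when the period elapses.
    return now - compacted_at >= compacted_block_retention
```

With the default `compacted_block_retention` of 1h, a block compacted two hours ago would be eligible, while one compacted 30 minutes ago would not.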
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Block Increase | ||
|
||
## Meaning | ||
|
||
This alert indicates that the number of blocks has increased more than expected. It is intended to identify | ||
cases where compaction or retention is not working correctly. | ||
|
||
## Impact | ||
|
||
If the number of blocks increases steadily for a long enough period, this can impact the performance and cost of | ||
Deep. As the number of blocks increases, the time spent indexing and the cost of performing the indexing will increase. | ||
|
||
## Diagnosis | ||
|
||
Check the 'Deep/Tenants' dashboard for the number of blocks for the given tenant (tenant id should appear on the alert). | ||
This graph should give you insight into the block growth. | ||
|
||
A graph like the one below indicates that blocks are not being deleted. In this case the compactor logs should be inspected for | ||
any errors. | ||
|
||
 | ||
|
||
If there is only a short spike in the blocks, this probably means there was a sudden large increase in usage. The blocks | ||
should be monitored for further issues. | ||
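The distinction between steady growth and a short spike can be sketched in Python as below. This is a hypothetical heuristic for reading exported block counts, not the logic used by the alert or the dashboard; the function name and the 0.8 threshold are assumptions.

```python
def is_steady_growth(counts, min_rising_fraction=0.8):
    """Return True if block counts rise in most consecutive samples.

    counts: block counts sampled at regular intervals, oldest first.
    min_rising_fraction: assumed threshold for calling growth "steady".
    """
    if len(counts) < 2:
        return False
    # Difference between each sample and the one before it.
    deltas = [later - earlier for earlier, later in zip(counts, counts[1:])]
    rising = sum(1 for d in deltas if d > 0)
    return rising / len(deltas) >= min_rising_fraction
```

A series such as `[10, 12, 14, 16, 18, 20]` rises in every interval and is treated as steady growth, whereas `[10, 10, 40, 12, 11, 11]` rises in only one interval and is treated as a spike.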
|
||
## Mitigation | ||
|
||
If compaction/retention is not working then the compactor should be restarted. This could resolve issues with | ||
system memory or with failed tasks. Additionally, check the permissions on the storage provider to ensure | ||
Deep has permission to delete data. | ||
|
||
It is also advisable to check the [compaction settings](../config/compaction.md) that are being used to ensure they best suit your use case. | ||
|
||
{!_sections/bug_report.md!} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Missing Ring Node | ||
|
||
## Meaning | ||
|
||
This error happens when a ring is configured to have _n_ nodes but the actual number of nodes in the ring is either more | ||
or less than that number. | ||
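The condition behind this alert amounts to comparing the configured node count with the observed one. A minimal Python sketch follows; the function name and return values are illustrative assumptions and do not reflect how the alert rule itself is written.

```python
def ring_node_status(expected: int, actual: int) -> str:
    """Compare the configured ring size with the observed node count."""
    if actual > expected:
        return "more nodes"
    if actual < expected:
        return "fewer nodes"
    return "ok"
```

The two non-`ok` results correspond to the "More nodes" and "Fewer nodes" cases covered under Diagnosis.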
|
||
## Impact | ||
|
||
If the number of nodes remains incorrect, the health of the ring can become unstable. | ||
|
||
## Diagnosis | ||
|
||
The diagnosis of the alert depends on whether there are more or fewer nodes than expected. | ||
|
||
### More nodes | ||
|
||
If there are more nodes than there should be this could be due to a failure to shut down a node correctly. This can lead | ||
to an [unhealthy node](./unhealthy_ring_node.md) scenario. | ||
|
||
It could also be a sign that the number of replicas has been changed manually to address a scaling issue. Additionally, | ||
it is possible that the `HorizontalPodAutoscaler` has kicked in to address a resource issue. In either scaling case, | ||
the Helm chart config should be updated to reflect the changes if they are to become permanent. This way the alert | ||
config will be updated to reflect this change. | ||
|
||
### Fewer nodes | ||
|
||
If there are fewer nodes than there should be, this could be due to a failure to start a new node. This could be caused by | ||
resource starvation on the Kubernetes cluster. The description of the deployment should be reviewed, and any errors found there | ||
addressed. Reviewing the pod logs for any failures during start-up is also advised. | ||
|
||
## Mitigation | ||
|
||
There is no generic way to correct this issue, as it depends on the cause. Using the notes above, identify the error | ||
and look for ways to resolve the root cause. | ||
|
||
{!_sections/bug_report.md!} |