Commit
[9.0] [Security Solution][Siem migrations] Implement rate limit backoff (#211469) (#212177)

# Backport

This will backport the following commits from `main` to `9.0`:
- [[Security Solution][Siem migrations] Implement rate limit backoff (#211469)](https://github.com/elastic/kibana/pull/211469)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

### Backport metadata

- Source commit: `64426b2b4d99901a01ecef66a17db01049b05f1a`, by Sergi Massaneda <sergi.massaneda@elastic.co>, committed 2025-02-21T19:54:40Z
- Source pull request: [#211469](https://github.com/elastic/kibana/pull/211469), source branch `main`
- Labels: `release_note:skip`, `v9.0.0`, `Team:Threat Hunting`, `backport:version`, `v8.18.0`, `v9.1.0`, `v8.19.0`
- Branch label mapping: `^v9.1.0$` maps to `main`, `^v8.19.0$` maps to `8.x`, `^v(\d+).(\d+).\d+$` maps to `$1.$2`
- Suggested target branches: `9.0`, `8.x`
- Target pull request states:
  - `9.0` (label `v9.0.0`): NOT_CREATED
  - `8.18` (label `v8.18.0`): MERGED as [#212154](https://github.com/elastic/kibana/pull/212154), merge commit `4bf719063c6015b1a68703cbcafb56a281a4b491`
  - `main` (label `v9.1.0`, source branch): MERGED as [#211469](https://github.com/elastic/kibana/pull/211469), merge commit `64426b2b4d99901a01ecef66a17db01049b05f1a`
  - `8.x` (label `v8.19.0`): NOT_CREATED

Description of the source pull request (#211469):

## Summary

Implements an exponential backoff retry strategy when the LLM API throws rate limit (`429`) errors.

### Backoff implementation

- The `run` method from the `RuleMigrationsTaskClient` has been moved to the new `RuleMigrationTaskRunner` class.
- The settings for the backoff are defined in this class with:
```ts
/** Exponential backoff configuration to handle rate limit errors */
const RETRY_CONFIG = {
  initialRetryDelaySeconds: 1,
  backoffMultiplier: 2,
  maxRetries: 8,
  // max waiting time 4m15s (1*2^8 = 256s)
} as const;
```
- Only one rule will be retried at a time. The rest of the concurrent rule translations blocked by the rate limit will wait for the API to recover before attempting the translation again.

```ts
/** Executor sleep configuration
 * A sleep time applied at the beginning of each single rule translation in the execution pool,
 * The objective of this sleep is to spread the load of concurrent translations, and prevent hitting the rate limit repeatedly.
 * The sleep time applied is a random number between [0-value]. Every time we hit rate limit the value is increased by the multiplier, up to the limit.
 */
const EXECUTOR_SLEEP = {
  initialValueSeconds: 3,
  multiplier: 2,
  limitSeconds: 96, // 1m36s (5 increases)
} as const;
```

### Migration batching changes

```ts
/** Number of concurrent rule translations in the pool */
const TASK_CONCURRENCY = 10 as const;
/** Number of rules loaded in memory to be translated in the pool */
const TASK_BATCH_SIZE = 100 as const;
```

#### Before

- Batches of 15 rules were retrieved and executed in a `Promise.all`, requiring all of them to be completed before proceeding to the next batch.
- A "batch sleep" of 10s was executed at the end of each iteration.

#### In this PR

- Batches of 100 rules are retrieved and kept in memory. The execution is performed in a task pool with a concurrency of 10 rules. This ensures there are always 10 rules executing at a time.
- The "batch sleep" has been removed in favour of an "execution sleep" of rand[1-3]s at the start of each single rule migration. This individual sleep serves two goals:
  - Spread the load when the migration is first launched.
  - Prevent hitting the rate limit consistently: the sleep duration is increased every time we hit a rate limit.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
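
To make the numbers in the backoff and executor sleep settings easier to check, here is a minimal sketch of the arithmetic they imply. It is not the code from `RuleMigrationTaskRunner`: the helper names `retryDelaysSeconds` and `executorSleepLimits` are hypothetical, and the sketch assumes the first retry waits `initialRetryDelaySeconds` and that each later wait is the previous one times the multiplier.

```ts
// Values copied from the PR description; everything else below is illustrative.
const RETRY_CONFIG = {
  initialRetryDelaySeconds: 1,
  backoffMultiplier: 2,
  maxRetries: 8,
} as const;

const EXECUTOR_SLEEP = {
  initialValueSeconds: 3,
  multiplier: 2,
  limitSeconds: 96,
} as const;

/**
 * Hypothetical helper: the wait before each retry, assuming the first retry
 * waits `initialRetryDelaySeconds` and every later wait is multiplied by
 * `backoffMultiplier`.
 */
function retryDelaysSeconds(config: {
  initialRetryDelaySeconds: number;
  backoffMultiplier: number;
  maxRetries: number;
}): number[] {
  const delays: number[] = [];
  let delay = config.initialRetryDelaySeconds;
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    delays.push(delay);
    delay *= config.backoffMultiplier;
  }
  return delays;
}

/**
 * Hypothetical helper: the successive upper bounds of the random executor
 * sleep, one entry per rate limit hit, capped at `limitSeconds`.
 * Assumes `multiplier` is greater than 1, otherwise the loop would not end.
 */
function executorSleepLimits(config: {
  initialValueSeconds: number;
  multiplier: number;
  limitSeconds: number;
}): number[] {
  const limits = [config.initialValueSeconds];
  let current = config.initialValueSeconds;
  while (current < config.limitSeconds) {
    current = Math.min(current * config.multiplier, config.limitSeconds);
    limits.push(current);
  }
  return limits;
}

/** Sum of a list of durations in seconds. */
function totalSeconds(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0);
}

const delays = retryDelaysSeconds(RETRY_CONFIG);
const sleepLimits = executorSleepLimits(EXECUTOR_SLEEP);
console.log(delays, totalSeconds(delays), sleepLimits);
```

Under these assumptions the eight retry waits are 1, 2, 4, 8, 16, 32, 64 and 128 seconds, 255 seconds in total, which agrees with the "4m15s" in the `RETRY_CONFIG` comment. The executor sleep bound grows 3, 6, 12, 24, 48, 96, reaching the 96-second limit after five increases, as the `EXECUTOR_SLEEP` comment states.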