[Security Solution][Siem migrations] Implement rate limit backoff #211469
Conversation
Pinging @elastic/security-threat-hunting (Team:Threat Hunting)
From the initial review it looks LGTM to me. It's easier to test more once it's merged, and we might stumble over something, but from what I can see it looks great!
I was mostly looking at the telemetry, graph execution, error detection, etc. I think testing it with a larger batch of rules is better done once it's merged, considering there are also unit tests and manual tests performed by you.
Gj!
Changes look great 🚀 . Just posted some small nits and questions.
Starting backport for target branches: 8.18, 8.x, 9.0
💚 Build Succeeded
Metrics [docs]
History
cc @semd
…astic#211469)

## Summary

Implements an exponential backoff retry strategy when the LLM API throws rate limit (`429`) errors.

### Backoff implementation

- The `run` method has been moved from the `RuleMigrationsTaskClient` to the new `RuleMigrationTaskRunner` class.
- The settings for the backoff are defined in this class:

```ts
/** Exponential backoff configuration to handle rate limit errors */
const RETRY_CONFIG = {
  initialRetryDelaySeconds: 1,
  backoffMultiplier: 2,
  maxRetries: 8,
  // max waiting time 4m15s (1+2+4+...+128 = 255s)
} as const;
```

- Only one rule is retried at a time; the rest of the concurrent rule translations blocked by the rate limit wait for the API to recover before attempting the translation again.

```ts
/** Executor sleep configuration
 * A sleep applied at the beginning of each single rule translation in the execution pool.
 * The objective of this sleep is to spread the load of concurrent translations and prevent hitting the rate limit repeatedly.
 * The sleep time applied is a random number between [0-value]. Every time we hit the rate limit the value is increased by the multiplier, up to the limit.
 */
const EXECUTOR_SLEEP = {
  initialValueSeconds: 3,
  multiplier: 2,
  limitSeconds: 96, // 1m36s (5 increases)
} as const;
```

### Migration batching changes

```ts
/** Number of concurrent rule translations in the pool */
const TASK_CONCURRENCY = 10 as const;
/** Number of rules loaded in memory to be translated in the pool */
const TASK_BATCH_SIZE = 100 as const;
```

#### Before

- Batches of 15 rules were retrieved and executed in a `Promise.all`, requiring all of them to complete before proceeding to the next batch.
- A "batch sleep" of 10s was executed at the end of each iteration.

#### In this PR

- Batches of 100 rules are retrieved and kept in memory. The execution is performed in a task pool with a concurrency of 10 rules, which ensures there are always 10 rules executing at a time.
- The "batch sleep" has been removed in favour of an "execution sleep" of rand[1-3]s at the start of each single rule migration. This individual sleep serves two goals:
  - Spread the load when the migration is first launched.
  - Avoid hitting the rate limit consistently: the sleep duration is increased every time we hit a rate limit.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

(cherry picked from commit 64426b2)
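The backoff arithmetic described above can be sketched as a small standalone helper. This is a minimal illustration under the stated `RETRY_CONFIG` values, not the actual Kibana implementation; `retryDelaySeconds`, `maxTotalWaitSeconds`, and `withRateLimitRetry` are hypothetical names introduced here:

```typescript
/** Backoff settings, copied from the PR description. */
const RETRY_CONFIG = {
  initialRetryDelaySeconds: 1,
  backoffMultiplier: 2,
  maxRetries: 8,
} as const;

/** Delay in seconds before retry attempt `attempt` (1-based): 1, 2, 4, ..., 128. */
function retryDelaySeconds(attempt: number): number {
  return (
    RETRY_CONFIG.initialRetryDelaySeconds *
    RETRY_CONFIG.backoffMultiplier ** (attempt - 1)
  );
}

/** Worst-case total wait if every attempt is rate limited. */
function maxTotalWaitSeconds(): number {
  let total = 0;
  for (let attempt = 1; attempt <= RETRY_CONFIG.maxRetries; attempt++) {
    total += retryDelaySeconds(attempt);
  }
  return total;
}

function sleep(seconds: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, seconds * 1000));
}

/**
 * Hypothetical wrapper: run `task`, retrying with exponential backoff while
 * `isRateLimited` classifies the thrown error as a rate limit (429).
 * Any other error, or exhausting `maxRetries`, rethrows.
 */
async function withRateLimitRetry<T>(
  task: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (!isRateLimited(err) || attempt > RETRY_CONFIG.maxRetries) {
        throw err;
      }
      await sleep(retryDelaySeconds(attempt));
    }
  }
}
```

With these values the eight delays are 1, 2, 4, ..., 128 seconds, which sum to 255s (~4m15s) of worst-case waiting.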
💔 Some backports could not be created
Note: Successful backport PRs will be merged automatically after passing CI.
Manual backport: to create the backport manually run:
Questions? Please refer to the Backport tool documentation
…off (#211469) (#212154)

# Backport

This will backport the following commits from `main` to `8.18`:
- [[Security Solution][Siem migrations] Implement rate limit backoff (#211469)](#211469)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Sergi Massaneda <sergi.massaneda@elastic.co>
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions? Please refer to the Backport tool documentation
…ff (#211469) (#212177)

# Backport

This will backport the following commits from `main` to `9.0`:
- [[Security Solution][Siem migrations] Implement rate limit backoff (#211469)](#211469)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
…ff (#211469) (#212178)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Security Solution][Siem migrations] Implement rate limit backoff (#211469)](#211469)

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)
rate limit errors */\nconst RETRY_CONFIG = {\n initialRetryDelaySeconds: 1,\n backoffMultiplier: 2,\n maxRetries: 8,\n // max waiting time 4m15s (1*2^8 = 256s)\n} as const;\n```\n- Only one rule will be retried at a time, the rest of the concurrent\nrule translations blocked by the rate limit will await for the API to\nrecover before attempting the translation again.\n\n```ts\n/** Executor sleep configuration\n * A sleep time applied at the beginning of each single rule translation in the execution pool,\n * The objective of this sleep is to spread the load of concurrent translations, and prevent hitting the rate limit repeatedly.\n * The sleep time applied is a random number between [0-value]. Every time we hit rate limit the value is increased by the multiplier, up to the limit.\n */\nconst EXECUTOR_SLEEP = {\n initialValueSeconds: 3,\n multiplier: 2,\n limitSeconds: 96, // 1m36s (5 increases)\n} as const;\n```\n\n### Migration batching changes\n\n```ts\n/** Number of concurrent rule translations in the pool */\nconst TASK_CONCURRENCY = 10 as const;\n/** Number of rules loaded in memory to be translated in the pool */\nconst TASK_BATCH_SIZE = 100 as const;\n```\n\n#### Before \n\n- Batches of 15 rules were retrieved and executed in a `Promise.all`,\nrequiring all of them to be completed before proceeding to the next\nbatch.\n- A \"batch sleep\" of 10s was executed at the end of each iteration.\n\n#### In this PR\n\n- Batches of 100 rules are retrieved and kept in memory. The execution\nis performed in a task pool with a concurrency of 10 rules. This ensures\nthere are always 10 rules executing at a time.\n- The \"batch sleep\" has been removed in favour of an \"execution sleep\"\nof rand[1-3]s at the start of each single rule migration. 
This\nindividual sleep serves two goals:\n - Spread the load when the migration is first launched.\n- Prevent hitting the rate limit consistently: The sleep duration is\nincreased every time we hit a rate limit.\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"64426b2b4d99901a01ecef66a17db01049b05f1a"}},{"branch":"8.x","label":"v8.19.0","branchLabelMappingKey":"^v8.19.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
Summary
Implements an exponential backoff retry strategy when the LLM API throws rate limit (
429
) errors.Backoff implementation
run
method from theRuleMigrationsTaskClient
has been moved to the newRuleMigrationTaskRunner
class.Migration batching changes
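As a rough illustration of the backoff behaviour described above, the retry loop could look like the following. This is a minimal sketch, not the actual `RuleMigrationTaskRunner` code: `withBackoffRetry`, `sleepSeconds`, and `isRateLimitError` are hypothetical names, and it assumes the LLM client rejects with an error object carrying `status: 429`.

```ts
// Sketch of an exponential backoff retry for 429 errors (hypothetical helper).
const RETRY_SKETCH = {
  initialRetryDelaySeconds: 1,
  backoffMultiplier: 2,
  maxRetries: 8,
} as const;

const sleepSeconds = (seconds: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, seconds * 1000));

// Assumption: rate limit errors surface as objects with `status: 429`.
const isRateLimitError = (err: unknown): boolean =>
  typeof err === 'object' && err !== null && (err as { status?: number }).status === 429;

async function withBackoffRetry<T>(task: () => Promise<T>): Promise<T> {
  let delaySeconds = RETRY_SKETCH.initialRetryDelaySeconds;
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up on non-429 errors, or once the retry budget is exhausted.
      if (!isRateLimitError(err) || attempt >= RETRY_SKETCH.maxRetries) {
        throw err;
      }
      await sleepSeconds(delaySeconds);
      delaySeconds *= RETRY_SKETCH.backoffMultiplier; // 1s, 2s, 4s, ...
    }
  }
}
```

Because every blocked translation awaits the same recovery window, the delay grows geometrically instead of hammering the API at a fixed interval.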
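For illustration, a fixed-concurrency task pool like the one described here can be sketched as follows. `runWithConcurrency` is a simplified, hypothetical stand-in for the runner's pool, not the actual implementation; each "lane" keeps pulling the next pending item, so up to `concurrency` items are executing at any given time.

```ts
// Sketch of a task pool with fixed concurrency (hypothetical helper).
async function runWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;
  // JS is single-threaded: reading and incrementing `nextIndex` with no
  // `await` in between is race-free, so each lane claims a unique item.
  const lanes = Array.from({ length: Math.min(concurrency, items.length) }, async () => {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index]);
    }
  });
  await Promise.all(lanes);
  return results;
}
```

With `concurrency = 10` and a batch of 100 items, a new translation starts as soon as any running one finishes, which is what keeps 10 rules in flight at all times.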
#### Before

- Batches of 15 rules were retrieved and executed in a `Promise.all`, requiring all of them to be completed before proceeding to the next batch.
- A "batch sleep" of 10s was executed at the end of each iteration.

#### In this PR