refactor: optimize ratelimit logs #2875

Merged
merged 3 commits into from
Feb 6, 2025

Conversation

Contributor

@ogzhanolguncu ogzhanolguncu commented Feb 6, 2025

What does this PR do?

Fixes # (issue)

If there is not an issue for this, please create one first. This is used for tracking purposes and also helps us understand why this PR exists.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Enhanced database indexing has been implemented to optimize query performance and improve overall responsiveness.
  • Refactor

    • Simplified and clarified the logic for retrieving rate limit log data, ensuring more consistent and reliable operations.

changeset-bot bot commented Feb 6, 2025

⚠️ No Changeset found

Latest commit: 32c2d0c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

vercel bot commented Feb 6, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Preview | Comments | Updated (UTC)
--- | --- | --- | --- | ---
dashboard | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 6, 2025 11:57am
engineering | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 6, 2025 11:57am
play | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 6, 2025 11:57am
www | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 6, 2025 11:57am

Contributor

coderabbitai bot commented Feb 6, 2025

📝 Walkthrough

Walkthrough

This pull request introduces SQL migrations to add reversible index management on two ClickHouse tables. Two new indexes—idx_workspace_time (a composite index on workspace_id and time) and idx_request_id (a single-column index on request_id)—are created on both the ratelimits.raw_ratelimits_v1 and metrics.raw_api_requests_v1 tables using the minmax type with a granularity of 1. Additionally, modifications in the ratelimits.ts file reformat Zod schema definitions, adjust type inferences, and refactor the getRatelimitLogs function to use a common table expression (CTE) and a LEFT JOIN for improved clarity.
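
For orientation, the shape of such a reversible migration can be sketched as follows. The migration files themselves are not shown in this conversation, so the exact statements are an assumption: the sketch relies on Goose's `-- +goose Up` / `-- +goose Down` annotations and ClickHouse's `ALTER TABLE ... ADD INDEX` / `DROP INDEX` syntax, and `renderIndexMigration` is a hypothetical helper, not code from the PR.

```typescript
// Hypothetical helper (not from the PR): renders a reversible Goose migration
// that adds ClickHouse data-skipping indexes of type minmax, granularity 1.
interface SkipIndex {
  name: string; // index name, e.g. "idx_workspace_time"
  expression: string; // indexed expression, e.g. "(workspace_id, time)"
}

function renderIndexMigration(table: string, indexes: SkipIndex[]): string {
  const up = indexes.map(
    (i) =>
      `ALTER TABLE ${table} ADD INDEX IF NOT EXISTS ${i.name} ${i.expression} TYPE minmax GRANULARITY 1;`,
  );
  // Drop in reverse order so the rollback mirrors the upgrade.
  const down = indexes
    .slice()
    .reverse()
    .map((i) => `ALTER TABLE ${table} DROP INDEX IF EXISTS ${i.name};`);
  return ["-- +goose Up", ...up, "", "-- +goose Down", ...down].join("\n");
}

const migration = renderIndexMigration("ratelimits.raw_ratelimits_v1", [
  { name: "idx_workspace_time", expression: "(workspace_id, time)" },
  { name: "idx_request_id", expression: "request_id" },
]);
console.log(migration);
```

One point worth checking when applying a migration like this: in ClickHouse an index added with `ADD INDEX` covers newly written data parts, and existing parts are only covered once the index is materialized (`ALTER TABLE ... MATERIALIZE INDEX`).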

Changes

File(s) | Change Summary
--- | ---
internal/clickhouse/schema/048_..._indexes_v1.sql, internal/clickhouse/schema/049_..._indexes_v1.sql | Added Goose migrations for index management on ClickHouse tables. Each file adds two indexes, idx_workspace_time (composite on workspace_id, time) and idx_request_id (on request_id), using the minmax type with granularity 1, along with corresponding rollback commands.
internal/clickhouse/src/ratelimits.ts | Refactored Zod schema formatting and type inference for ratelimit logs. Modified the getRatelimitLogs function to simplify condition construction by introducing a filtered CTE and incorporating a LEFT JOIN with metrics data for enhanced query clarity and maintainability.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant S as Ratelimit Service
    participant DB as Database

    C->>S: Request ratelimit logs with filters
    S->>S: Build filter conditions and construct CTE (filtered_ratelimits)
    S->>DB: Execute SQL query with LEFT JOIN on metrics data
    DB-->>S: Return filtered results
    S-->>C: Respond with ratelimit logs

Suggested reviewers

  • mcstepp
  • chronark
  • perkinsjr
  • MichaelUnkey

Contributor

github-actions bot commented Feb 6, 2025

Thank you for following the naming conventions for pull request titles! 🙏

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/clickhouse/src/ratelimits.ts (1)

309-326: LGTM! Well-structured CTE with optimized filtering.

The CTE effectively organizes the filtering logic and should work well with the new indexes mentioned in the PR summary (idx_workspace_time and idx_request_id).

Consider adding a comment explaining the cursor-based pagination logic for better maintainability:

+    -- Cursor-based pagination using time and request_id as composite cursor
     AND (({cursorTime: Nullable(UInt64)} IS NULL AND {cursorRequestId: Nullable(String)} IS NULL) 
          OR (time, request_id) < ({cursorTime: Nullable(UInt64)}, {cursorRequestId: Nullable(String)}))
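
To make the composite cursor concrete, here is a minimal in-memory sketch of the same tuple comparison. It assumes rows are returned newest-first (the ORDER BY direction is not shown in this review), and `isBeforeCursor`, `compareDesc`, and `nextPage` are hypothetical names used only for illustration.

```typescript
interface LogRow {
  time: number;
  request_id: string;
}
type Cursor = LogRow | null;

// Mirrors (time, request_id) < (cursorTime, cursorRequestId):
// compare time first and fall back to request_id only on a tie.
// Note: JavaScript compares strings by UTF-16 code unit, which matches a
// byte-wise comparison only for ASCII ids.
function isBeforeCursor(row: LogRow, cursor: Cursor): boolean {
  if (cursor === null) return true; // no cursor means first page
  if (row.time !== cursor.time) return row.time < cursor.time;
  return row.request_id < cursor.request_id;
}

// Newest first, with request_id as the tie-breaker (assumed DESC ordering).
function compareDesc(a: LogRow, b: LogRow): number {
  if (a.time !== b.time) return b.time - a.time;
  if (a.request_id === b.request_id) return 0;
  return a.request_id < b.request_id ? 1 : -1;
}

function nextPage(rows: LogRow[], cursor: Cursor, limit: number) {
  const page = rows
    .filter((r) => isBeforeCursor(r, cursor))
    .sort(compareDesc)
    .slice(0, limit);
  const last = page.length > 0 ? page[page.length - 1] : null;
  return { page, nextCursor: last };
}
```

The tie-breaker matters when several rows share the same `time`: with a cursor on `time` alone, rows with an equal timestamp would be either skipped or repeated across pages.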
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 990d7d1 and 010aadf.

📒 Files selected for processing (3)
  • internal/clickhouse/schema/048_raw_ratelimits_metrics_indexes_v1.sql (1 hunks)
  • internal/clickhouse/schema/049_raw_api_metrics_ratelimit_indexes_v1.sql (1 hunks)
  • internal/clickhouse/src/ratelimits.ts (8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (15)
  • GitHub Check: Test Packages / Test ./packages/hono
  • GitHub Check: Test Packages / Test ./packages/cache
  • GitHub Check: Test Packages / Test ./packages/api
  • GitHub Check: Test Packages / Test ./internal/clickhouse
  • GitHub Check: Test Packages / Test ./internal/resend
  • GitHub Check: Test Packages / Test ./internal/keys
  • GitHub Check: Test Packages / Test ./internal/id
  • GitHub Check: Test Packages / Test ./internal/hash
  • GitHub Check: Test Packages / Test ./internal/encryption
  • GitHub Check: Test Packages / Test ./internal/billing
  • GitHub Check: Build / Build
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
internal/clickhouse/schema/049_raw_api_metrics_ratelimit_indexes_v1.sql (3)

1-4: Goose Up Migration for API Metrics Table Indexes
The migration commands are correctly encapsulated with the goose marker for upgrades. The composite index idx_workspace_time on (workspace_id, time) using the minmax type with a granularity of 1 appears correctly defined. Please verify that your query patterns on metrics.raw_api_requests_v1 actually benefit from this composite index.


5-7: Additional Index for API Metrics Table
The addition of the single-column index idx_request_id on request_id is defined appropriately. Reusing similar index types across tables helps maintain consistency, but ensure that this index aligns with the workload and query filtering requirements.


8-13: Rollback Commands for API Metrics Table Indexes
The "goose down" section provides clear and reversible commands that drop the newly added indexes. It is important to double-check these commands in a staging environment to ensure that dropping these indexes does not unexpectedly degrade query performance.

internal/clickhouse/schema/048_raw_ratelimits_metrics_indexes_v1.sql (3)

1-4: Goose Up Migration for Ratelimits Table Indexes
This up migration correctly adds the composite index idx_workspace_time on (workspace_id, time) for the ratelimits.raw_ratelimits_v1 table. It follows a similar pattern to the metrics table, which promotes consistency across the codebase. Confirm that this index actually improves the performance of the ratelimit log queries.


5-7: Additional Index for Ratelimits Table
The single-column index idx_request_id on the request_id column is properly defined with the minmax strategy and a granularity of 1. Having parallel indexing strategies across different tables can simplify maintenance. Verify that this change provides tangible query performance improvements.
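
As background for that verification, the following is a simplified model of how a minmax skip index prunes data. It is a sketch of the general idea only, not ClickHouse's implementation: each granule stores the smallest and largest value it contains, and a granule is read only if its range overlaps the filter. The function names and the numeric data are illustrative assumptions.

```typescript
// One index entry per granule: the smallest and largest value it contains.
interface MinMaxEntry {
  granule: number;
  min: number;
  max: number;
}

function buildMinMaxIndex(values: number[], granuleSize: number): MinMaxEntry[] {
  const entries: MinMaxEntry[] = [];
  for (let start = 0; start < values.length; start += granuleSize) {
    const chunk = values.slice(start, start + granuleSize);
    entries.push({
      granule: start / granuleSize,
      min: Math.min(...chunk),
      max: Math.max(...chunk),
    });
  }
  return entries;
}

// Granules that may contain a value in [lo, hi]; all others can be skipped.
function granulesToRead(index: MinMaxEntry[], lo: number, hi: number): number[] {
  return index.filter((e) => e.max >= lo && e.min <= hi).map((e) => e.granule);
}

// Values stored in sorted order: ranges do not overlap, so one granule is read.
const ordered = buildMinMaxIndex([1, 2, 3, 4, 5, 6, 7, 8, 9], 3);
console.log(granulesToRead(ordered, 4, 5)); // [ 1 ]

// Values stored in shuffled order: every granule spans a wide range.
const shuffled = buildMinMaxIndex([1, 9, 5, 2, 8, 4, 3, 7, 6], 3);
console.log(granulesToRead(shuffled, 4, 5)); // [ 0, 1, 2 ]
```

The second case is the one to watch for: if request_id values are not correlated with the order in which rows are stored, each granule's range may be wide and the index may skip very little.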


8-13: Rollback Migration for Ratelimits Table Indexes
The rollback commands for dropping the indexes seem straightforward and complete. It is a good practice to validate the rollback process in a test environment to ensure that the removal of indexes does not interrupt any critical query paths.

internal/clickhouse/src/ratelimits.ts (4)

42-47: LGTM! Type definitions are well-formatted.

The type definitions have been reformatted for better readability while maintaining the same functionality.


270-284: LGTM! Status condition construction is simplified.

The status condition construction is now more readable and maintainable with clear null checks and boolean conversions.


285-302: LGTM! Identifier conditions are well-structured.

The identifier conditions are now more readable with clear null checks and proper operator handling.


346-363: Verify metrics table join performance.

The LEFT JOIN with the metrics table looks good, but consider these points:

  1. The join is only on request_id without additional time-based conditions
  2. The subquery filters on workspace_id and time which should help performance

Please verify that:

  • The join performance is acceptable with large datasets
  • No duplicate records are produced when multiple metrics exist for the same request_id
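
The duplicate concern can be illustrated with a small in-memory LEFT JOIN. This is a sketch with made-up rows and hypothetical helper names (`leftJoin`, `firstPerRequest`), intended only to show how more than one metrics row for the same request_id multiplies the result.

```typescript
interface Ratelimit {
  request_id: string;
  passed: boolean;
}
interface Metric {
  request_id: string;
  path: string;
}
type JoinedRow = Ratelimit & { path: string | null };

// LEFT JOIN on request_id: every ratelimit row is kept, and it is repeated
// once per matching metrics row.
function leftJoin(left: Ratelimit[], right: Metric[]): JoinedRow[] {
  const out: JoinedRow[] = [];
  for (const l of left) {
    const matches = right.filter((r) => r.request_id === l.request_id);
    if (matches.length === 0) {
      out.push({ ...l, path: null });
    } else {
      for (const m of matches) out.push({ ...l, path: m.path });
    }
  }
  return out;
}

// One possible guard: keep only the first metrics row per request_id.
function firstPerRequest(metrics: Metric[]): Metric[] {
  const seen = new Map<string, Metric>();
  for (const m of metrics) {
    if (!seen.has(m.request_id)) seen.set(m.request_id, m);
  }
  return Array.from(seen.values());
}

const ratelimits: Ratelimit[] = [
  { request_id: "req_1", passed: true },
  { request_id: "req_2", passed: false },
];
const metrics: Metric[] = [
  { request_id: "req_1", path: "/v1/ratelimit" },
  { request_id: "req_1", path: "/v1/ratelimit" }, // duplicate metrics row
];

console.log(leftJoin(ratelimits, metrics).length); // 3
console.log(leftJoin(ratelimits, firstPerRequest(metrics)).length); // 2
```

Whether deduplication is needed at all depends on whether the metrics table can actually hold more than one row per request_id, which this conversation does not establish.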

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
internal/clickhouse/src/ratelimits.ts (3)

257-270: Consider simplifying the status condition construction.

The status condition construction can be simplified by filtering for the "is" operator before mapping, which removes the null return and the trailing filter(Boolean).

-    const statusCondition = !hasStatusFilters
-      ? "TRUE"
-      : args.status
-          ?.map((filter, index) => {
-            if (filter.operator === "is") {
-              const paramName = `statusValue_${index}`;
-              paramSchemaExtension[paramName] = z.boolean();
-              parameters[paramName] = filter.value === "passed";
-              return `passed = {${paramName}: Boolean}`;
-            }
-            return null;
-          })
-          .filter(Boolean)
-          .join(" OR ") || "TRUE";
+    const statusCondition = !hasStatusFilters
+      ? "TRUE"
+      : args.status
+          ?.filter(filter => filter.operator === "is")
+          .map((filter, index) => {
+            const paramName = `statusValue_${index}`;
+            paramSchemaExtension[paramName] = z.boolean();
+            parameters[paramName] = filter.value === "passed";
+            return `passed = {${paramName}: Boolean}`;
+          })
+          .join(" OR ") || "TRUE";
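
A standalone version of the suggested filter-then-map form might look like the sketch below. To stay self-contained it leaves out the Zod `paramSchemaExtension` bookkeeping from the diff and keeps only the parameter map; `StatusFilter` and `buildStatusCondition` are hypothetical names, not identifiers from ratelimits.ts.

```typescript
// Assumed filter shape: only `operator` and `value` are used in the diff.
type StatusFilter = { operator: string; value: string };

function buildStatusCondition(status: StatusFilter[] | undefined): {
  condition: string;
  parameters: Record<string, boolean>;
} {
  const parameters: Record<string, boolean> = {};
  const condition =
    (status ?? [])
      .filter((f) => f.operator === "is")
      .map((f, index) => {
        const paramName = `statusValue_${index}`;
        parameters[paramName] = f.value === "passed";
        return `passed = {${paramName}: Boolean}`;
      })
      // An empty array joins to "", which falls through to "TRUE".
      .join(" OR ") || "TRUE";
  return { condition, parameters };
}

console.log(buildStatusCondition(undefined).condition); // TRUE
```

One behavioral difference from the original form: `index` is now counted after filtering, so parameter names are numbered contiguously rather than by position in `args.status`. That is harmless as long as the names are only used within this query.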

272-289: Consider using LIKE for better performance in identifier filtering.

The position function might not be as performant as LIKE for string pattern matching in ClickHouse.

             switch (p.operator) {
               case "is":
                 return `identifier = {${paramName}: String}`;
               case "contains":
-                return `position({${paramName}: String}, identifier) > 0`;
+                return `identifier LIKE concat('%', {${paramName}: String}, '%')`;
               default:
                 return null;
             }
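
Before switching to LIKE, note that LIKE treats `%` and `_` in the pattern as wildcards, while a substring search matches them literally. The sketch below shows the difference with a tiny illustrative LIKE matcher and an escaping helper; `escapeLikePattern`, `escapeRegex`, and `like` are hypothetical names, and the matcher is a simplified model rather than ClickHouse's implementation.

```typescript
// Escapes the characters LIKE treats specially so a value matches literally.
function escapeLikePattern(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

function escapeRegex(ch: string): string {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustration only: % matches any run of characters, _ matches exactly one,
// and a backslash makes the following character literal.
function like(text: string, pattern: string): boolean {
  let regex = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      i += 1;
      regex += escapeRegex(pattern[i]);
    } else if (ch === "%") {
      regex += "[\\s\\S]*";
    } else if (ch === "_") {
      regex += "[\\s\\S]";
    } else {
      regex += escapeRegex(ch);
    }
  }
  return new RegExp("^" + regex + "$").test(text);
}

const needle = "r_1";
console.log("userX1".includes(needle)); // false: literal substring search
console.log(like("userX1", "%" + needle + "%")); // true: "_" acts as a wildcard
console.log(like("userX1", "%" + escapeLikePattern(needle) + "%")); // false
console.log(like("user_1", "%" + escapeLikePattern(needle) + "%")); // true
```

Separately, ClickHouse documents `position` as `position(haystack, needle)`, so the removed and added lines in the diff above may not test the same direction of containment. That is worth checking against the intended "contains" behavior before comparing their performance.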

332-349: Consider optimizing the metrics subquery.

The metrics subquery could benefit from the new indexes. Also, consider adding workspace_id to the JOIN condition for additional safety.

 LEFT JOIN (
     SELECT 
         request_id,
+        workspace_id,
         host,
         method,
         path,
         request_headers,
         request_body,
         response_status,
         response_headers,
         response_body,
         service_latency,
         user_agent,
         colo
     FROM metrics.raw_api_requests_v1
     WHERE workspace_id = {workspaceId: String}
         AND time BETWEEN {startTime: UInt64} AND {endTime: UInt64}
- ) m ON fr.request_id = m.request_id
+ ) m ON fr.request_id = m.request_id AND fr.workspace_id = m.workspace_id
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 010aadf and 32c2d0c.

📒 Files selected for processing (1)
  • internal/clickhouse/src/ratelimits.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: Test Packages / Test ./packages/rbac
  • GitHub Check: Test Packages / Test ./packages/nextjs
  • GitHub Check: Test Packages / Test ./packages/cache
  • GitHub Check: Test Packages / Test ./packages/api
  • GitHub Check: Test Packages / Test ./internal/clickhouse
  • GitHub Check: Test Packages / Test ./internal/resend
  • GitHub Check: Test Packages / Test ./internal/keys
  • GitHub Check: Test Packages / Test ./internal/id
  • GitHub Check: Test Packages / Test ./internal/hash
  • GitHub Check: Test Packages / Test ./internal/encryption
  • GitHub Check: Test Packages / Test ./internal/billing
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: Build / Build
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
internal/clickhouse/src/ratelimits.ts (3)

253-255: LGTM! Clear and descriptive boolean flags.

The introduction of boolean flags improves code readability by making the filter presence checks explicit.


295-312: LGTM! Well-structured CTE for improved readability.

The introduction of the CTE improves query organization and maintainability. The filtering conditions on workspace_id and time line up with the columns covered by the new idx_workspace_time index.
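
The overall query shape discussed in this review can be sketched as below. Only the CTE name, the cursor condition, the `fr` / `m` aliases, and the join on request_id come from the review comments; the selected columns, the ORDER BY direction, the `{limit: Int}` parameter, and the `buildRatelimitLogsQuery` helper are assumptions made for illustration.

```typescript
// Hypothetical assembly of a CTE + LEFT JOIN query in the style described here.
function buildRatelimitLogsQuery(
  statusCondition: string,
  identifierCondition: string,
): string {
  return `
WITH filtered_ratelimits AS (
    SELECT request_id, time, identifier, passed
    FROM ratelimits.raw_ratelimits_v1
    WHERE workspace_id = {workspaceId: String}
      AND time BETWEEN {startTime: UInt64} AND {endTime: UInt64}
      AND (${statusCondition})
      AND (${identifierCondition})
      -- Cursor-based pagination using time and request_id as composite cursor
      AND (({cursorTime: Nullable(UInt64)} IS NULL AND {cursorRequestId: Nullable(String)} IS NULL)
           OR (time, request_id) < ({cursorTime: Nullable(UInt64)}, {cursorRequestId: Nullable(String)}))
)
SELECT fr.*, m.host, m.method, m.path
FROM filtered_ratelimits fr
LEFT JOIN (
    SELECT request_id, host, method, path
    FROM metrics.raw_api_requests_v1
    WHERE workspace_id = {workspaceId: String}
      AND time BETWEEN {startTime: UInt64} AND {endTime: UInt64}
) m ON fr.request_id = m.request_id
ORDER BY fr.time DESC, fr.request_id DESC
LIMIT {limit: Int}`.trim();
}

console.log(buildRatelimitLogsQuery("TRUE", "TRUE"));
```

Keeping every filter inside the CTE means the join only sees rows that already passed the workspace, time, status, identifier, and cursor conditions.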


350-351: Verify index usage for ORDER BY clause.

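The ORDER BY clause sorts on time and request_id. Verify whether it actually benefits from the new indexes: ClickHouse data-skipping indexes such as minmax prune granules while filtering and do not by themselves speed up sorting.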

@perkinsjr perkinsjr merged commit 3b7d228 into main Feb 6, 2025
28 of 29 checks passed
@perkinsjr perkinsjr deleted the improve-ratelimit-logs branch February 6, 2025 13:39
2 participants