
Add Transporter framework to make indexing strategy configurable #554


Merged
jtnelson merged 71 commits into develop from feature/refactor-indexing
Apr 26, 2025

Conversation

jtnelson
Member

@jtnelson jtnelson commented Apr 20, 2025

  • New Transporter Configuration: Added a transporter option in concourse.yaml that controls how Concourse moves data from the write-optimized Buffer to the read-optimized Database, where it becomes fully indexed.

  • Two Transport Strategies:

    • Streaming Transporter (legacy): Processes writes incrementally in small batches, with transport operations competing with reads and writes for system resources. While this provides consistent throughput by amortizing indexing costs across operations, it can lead to "stop-the-world" situations during high load when transport operations block all reads (and implicitly writes, which perform verification reads).
    • Batch Transporter (default/new): Performs indexing entirely in the background, allowing reads and writes to continue uninterrupted during the indexing phase. Only a brief critical section is required when merging the fully-indexed data into the Database. This approach dramatically improves overall system throughput by eliminating the resource contention between transport operations and normal read/write operations.
  • Configuration Options:

# Simple configuration
transporter: batch  # or "streaming"

# Advanced configuration
transporter:
  type: batch       # or "streaming"
  num_threads: 2    # default: 1
  • Performance Benefits: The Batch Transporter significantly improves system throughput by:

    • Moving the time-consuming indexing work to background threads
    • Minimizing the duration of critical sections where locks block concurrent operations
    • Reducing "stop-the-world" situations during high load
    • Making system performance more predictable and responsive
  • Use Case Recommendation: The Batch Transporter is particularly beneficial for workloads with high concurrent activity or large data volumes that require extensive indexing.
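
The two configuration forms listed above (a bare string, or a mapping with type and num_threads) can be reduced to one shape before use. The following is a minimal sketch of that normalization, not Concourse's actual parsing code: the function name normalize_transporter and the validation rules are assumptions made for illustration. The defaults (batch, one thread) are the ones stated in this description.

```python
# Hypothetical helper: not part of Concourse. It only illustrates how the
# simple and advanced forms of the `transporter` option could be unified.
VALID_TYPES = {"batch", "streaming"}


def normalize_transporter(value=None):
    """Return {"type": ..., "num_threads": ...} for either config form."""
    if value is None:
        value = {}                      # option omitted: fall back to defaults
    if isinstance(value, str):
        value = {"type": value}         # simple form: `transporter: batch`
    if not isinstance(value, dict):
        raise TypeError("transporter must be a string or a mapping")

    kind = str(value.get("type", "batch")).lower()   # default stated above
    threads = int(value.get("num_threads", 1))       # default stated above

    if kind not in VALID_TYPES:
        raise ValueError(f"unknown transporter type: {kind!r}")
    if threads < 1:
        raise ValueError("num_threads must be at least 1")
    return {"type": kind, "num_threads": threads}
```

Passing the value already loaded from concourse.yaml (for example "streaming", or {"type": "batch", "num_threads": 2}) yields the same dictionary shape in both cases.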

This change addresses a key bottleneck in Concourse's storage engine by taking the indexing process off the critical path of normal reads and writes.
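
To make the difference between the two strategies concrete, here is a toy model of the pattern described above: index a batch with no lock held, then merge the result in one short critical section. It is a sketch under stated assumptions, not the code in this pull request. The Database class, build_segment, batch_transport, and streaming_transport are all hypothetical names, and a single mutex stands in for whatever locking Concourse actually uses.

```python
import threading
from collections import defaultdict


class Database:
    """Hypothetical read-optimized store: maps a term to the ids containing it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._index = defaultdict(set)
        self.merges = 0                 # number of critical sections entered

    def read(self, term):
        with self._lock:
            return set(self._index.get(term, ()))

    def merge(self, segment):
        # The only step that blocks readers: fold in an already-built index.
        with self._lock:
            for term, ids in segment.items():
                self._index[term] |= ids
            self.merges += 1


def build_segment(writes):
    """Index (record_id, text) writes without holding any lock."""
    segment = defaultdict(set)
    for record_id, text in writes:
        for term in text.lower().split():
            segment[term].add(record_id)
    return segment


def batch_transport(buffer_page, database):
    """Batch style: index the whole page off the critical path, merge once."""
    segment = build_segment(buffer_page)
    database.merge(segment)


def streaming_transport(buffer_page, database):
    """Streaming style (simplified): every write takes the lock separately,
    so transport repeatedly competes with concurrent readers."""
    for write in buffer_page:
        database.merge(build_segment([write]))
```

Running batch_transport on a background thread (threading.Thread(target=batch_transport, args=(page, db))) leaves readers blocked only for the duration of the single merge call, whereas the streaming variant enters the critical section once per write.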

jtnelson added 30 commits April 20, 2025 07:37
1. currentPage is cleared on start
2. The currentPage is not batched for transport
3. There is a check to make sure the currentPage is not trying to be removed
jtnelson added 28 commits April 23, 2025 20:31
…ner case where accepting Writes causing issues
…eServer which is not suitable for production"

This reverts commit 4de376e.
…leServer which is not suitable for production"

This reverts commit a369cf6.
@jtnelson jtnelson merged commit 2882cdd into develop Apr 26, 2025
6 checks passed
@jtnelson jtnelson deleted the feature/refactor-indexing branch April 26, 2025 14:46
1 participant