
Add Transporter framework to make indexing strategy configurable #554


Merged on Apr 26, 2025 (71 commits)
bc01437
Encapsulate cache updating logic in a distinct method
jtnelson Apr 20, 2025
cfff11b
Add a method for appending Segments to the database and updating stor…
jtnelson Apr 20, 2025
301a495
add some StoreTest hooks into a higher level abstract class
jtnelson Apr 20, 2025
b5258e6
add unit tests for Database#append
jtnelson Apr 20, 2025
6d23480
rename Database#append as Database#merge
jtnelson Apr 20, 2025
03ac11d
add pluggable framework for transport data from Buffer to Database
jtnelson Apr 20, 2025
de90b7b
add config for transporter and config handling
jtnelson Apr 20, 2025
f5c699f
incorporate BatchTransportable interface into the buffer
jtnelson Apr 20, 2025
f712fe2
update test suite
jtnelson Apr 20, 2025
d632dae
formatting
jtnelson Apr 20, 2025
0aa6658
update package name
jtnelson Apr 20, 2025
17241b7
fix a corner case with Keys util class
jtnelson Apr 21, 2025
94310b3
update documentation
jtnelson Apr 21, 2025
55dec20
Integrate StreamingImporter with Engine and add unit tests specifical…
jtnelson Apr 21, 2025
e0dbd93
don't batch a null page
jtnelson Apr 21, 2025
d4a82ff
don't instantiate transporter until the Engine is started because it …
jtnelson Apr 21, 2025
6435ce6
BatchTransporter needs to handle null writes at the end of the Page
jtnelson Apr 21, 2025
0397e20
refactor unit test to switch between modes in a more realistic manner
jtnelson Apr 21, 2025
c86639d
fix unit test
jtnelson Apr 21, 2025
0d1dd8a
move most of the buffer transport monitoring from the Buffer and Engi…
jtnelson Apr 21, 2025
b3ffa8f
rename config
jtnelson Apr 21, 2025
6da1f32
fix formatting
jtnelson Apr 21, 2025
a9185af
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 22, 2025
257e1d8
fix bug that made it possible for BatchTransporter to cause an arrayi…
jtnelson Apr 22, 2025
c658374
stop engine after transporter test is finished
jtnelson Apr 22, 2025
3d4053b
grab transport lock in Engine write methods...
jtnelson Apr 22, 2025
f0b271e
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 22, 2025
87a2b03
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 22, 2025
6b5e1ad
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 22, 2025
1aec615
Make sure that
jtnelson Apr 22, 2025
4b3dda9
add exception handling and tracking for the Transporter
jtnelson Apr 22, 2025
c486bd6
add unit test to reproduce a rare corner case of transports modifyin…
jtnelson Apr 22, 2025
0303af3
update masterLock documentation in Database
jtnelson Apr 22, 2025
1cb211d
move remaining transport remnants from the buffer
jtnelson Apr 22, 2025
957bbf1
add some ETE tests for transporter throughput
jtnelson Apr 23, 2025
a829f4f
Remove tracking of verify scans in buffer because it was never used a…
jtnelson Apr 23, 2025
903d36c
update changelog
jtnelson Apr 23, 2025
1521b05
fix formatting
jtnelson Apr 23, 2025
bfb2268
fix test
jtnelson Apr 23, 2025
d0b4eab
Refactor Buffer to streamline locking (e.g., remove transportLoc) and…
jtnelson Apr 23, 2025
389e5a2
fix unit test
jtnelson Apr 23, 2025
3c11d34
fix documentation
jtnelson Apr 23, 2025
9e30aae
remove busy wait from Buffer in favor of wait/notify
jtnelson Apr 23, 2025
ce5171d
add stats and hang detection to base Transporter
jtnelson Apr 24, 2025
75a2885
small updates
jtnelson Apr 24, 2025
2017b8f
fix unit test
jtnelson Apr 24, 2025
2092e51
Add static method to check if source has pending batches
jtnelson Apr 24, 2025
e4dd3b1
doc updates
jtnelson Apr 24, 2025
79d9f88
these changes should be okay
jtnelson Apr 24, 2025
68a7233
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 24, 2025
c0bfeb6
fix some variable names
jtnelson Apr 24, 2025
c4984b2
add some diagnostic logging about transporter type
jtnelson Apr 24, 2025
810a9bd
make Batch immutable, add logging
jtnelson Apr 25, 2025
545ede3
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 25, 2025
c82ff69
add inline documentation to Segment#compile
jtnelson Apr 25, 2025
04646c4
compile the Segment before merging it, in BatchTransporter
jtnelson Apr 25, 2025
61591a8
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 25, 2025
3026de8
ignore test that might be causing problems
jtnelson Apr 25, 2025
1a9603d
update logging
jtnelson Apr 26, 2025
28b330a
make the batch transporter enabled by default
jtnelson Apr 26, 2025
c831366
fix formatting
jtnelson Apr 26, 2025
856c372
update changelog
jtnelson Apr 26, 2025
584bd25
update documentation
jtnelson Apr 26, 2025
c419cd6
grab transport read lock in add/removeUnlocked methods to prevent cor…
jtnelson Apr 26, 2025
86bc7c0
only grab the transport read lock in add/removeUnlocked if we're actu…
jtnelson Apr 26, 2025
123b4cf
Revert "ignore test that might be causing problems"
jtnelson Apr 26, 2025
a369cf6
Revert "use TThreadPoolServer for management server instead of TSimpl…
jtnelson Apr 26, 2025
8e25f01
Reapply "use TThreadPoolServer for management server instead of TSimp…
jtnelson Apr 26, 2025
df22b66
Merge branch 'develop' into feature/refactor-indexing
jtnelson Apr 26, 2025
e287856
fix bug...
jtnelson Apr 26, 2025
85b3caa
update documentation to indicate the batch transporter is default
jtnelson Apr 26, 2025
31 changes: 31 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,37 @@

#### Version 0.12.0 (TBD)

##### Batch Transporter for Improved Automatic Indexing

We've introduced a new mechanism to control how Concourse Server transports data from the Buffer to the Database, significantly improving system throughput and responsiveness:

* **New Transporter Configuration**: Added the `transporter` configuration option in `concourse.yaml` that allows you to specify how Concourse moves data from the write-optimized Buffer to the read-optimized Database, where it becomes fully indexed.

* **Two Transport Strategies**:
  * **Streaming Transporter** (legacy): Processes writes incrementally in small batches, with transport operations competing with reads and writes for system resources. While this provides consistent throughput by amortizing indexing costs across operations, it can lead to "stop-the-world" situations during high load when transport operations block all reads (and implicitly writes, which perform verification reads).
  * **Batch Transporter** (default/new): Performs indexing entirely in the background, allowing reads and writes to continue uninterrupted during the indexing phase. Only a brief critical section is required when merging the fully-indexed data into the Database. This approach dramatically improves overall system throughput by eliminating the resource contention between transport operations and normal read/write operations.

* **Configuration Options**:
  ```
  # Simple configuration
  transporter: batch # or "streaming"

  # Advanced configuration
  transporter:
    type: batch # or "streaming"
    num_threads: 2 # default: 1
  ```

* **Performance Benefits**: The Batch Transporter significantly improves system throughput by:
  * Moving the time-consuming indexing work to background threads
  * Minimizing the duration of critical sections where locks block concurrent operations
  * Reducing "stop-the-world" situations during high load
  * Making system performance more predictable and responsive

* **Use Case Recommendation**: The Batch Transporter is particularly beneficial for workloads with high concurrent activity or large data volumes that require extensive indexing.
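
The two configuration shapes above (a bare string, or a map with `type` and `num_threads`) could be normalized along these lines. This is only a sketch: `TransporterSettings` and `fromYamlValue` are hypothetical names, not Concourse's actual config handling, and the input is assumed to be whatever object a YAML parser already produced for the `transporter` key. The defaults (`batch`, one thread) are taken from the changelog text.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the two settings named in the changelog. */
final class TransporterSettings {

    enum Type { BATCH, STREAMING }

    final Type type;
    final int numThreads;

    TransporterSettings(Type type, int numThreads) {
        if(numThreads < 1) {
            throw new IllegalArgumentException("num_threads must be >= 1");
        }
        this.type = type;
        this.numThreads = numThreads;
    }

    /**
     * Accept either the simple form (a String) or the advanced form (a Map).
     * A missing value falls back to the documented defaults.
     */
    static TransporterSettings fromYamlValue(Object value) {
        if(value == null) {
            return new TransporterSettings(Type.BATCH, 1);
        }
        if(value instanceof String) {
            return new TransporterSettings(parseType((String) value), 1);
        }
        if(value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            Object type = map.get("type");
            Object threads = map.get("num_threads");
            Type parsedType = type == null ? Type.BATCH
                    : parseType(type.toString());
            int parsedThreads = threads == null ? 1
                    : threads instanceof Number
                            ? ((Number) threads).intValue()
                            : Integer.parseInt(threads.toString().trim());
            return new TransporterSettings(parsedType, parsedThreads);
        }
        throw new IllegalArgumentException(
                "Unsupported transporter config: " + value);
    }

    // Enum.valueOf throws IllegalArgumentException for unknown names.
    private static Type parseType(String name) {
        return Type.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}
```

Treating the scalar form as shorthand for a map with only `type` set keeps a single code path for validation.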

This enhancement represents a fundamental improvement to Concourse's architecture, addressing a key bottleneck in the storage engine by separating the indexing process from the critical path of normal operations.
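
The separation described above can be illustrated with a minimal sketch: the expensive indexing step runs with no lock held, and only the final merge takes a write lock. Everything here is an assumption made for illustration (`BatchTransportSketch`, the `String[]` write format, the `TreeMap` standing in for the Database); it does not reflect the real `Transporter`, `Segment`, or `Database` classes touched by this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: index outside the lock, merge inside a short one. */
final class BatchTransportSketch {

    // Stand-in for the read-optimized Database: key -> values.
    private final Map<String, List<String>> database = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Build a sorted "segment" from a batch of {key, value} writes. No lock is held. */
    static Map<String, List<String>> index(List<String[]> batch) {
        Map<String, List<String>> segment = new TreeMap<>();
        for (String[] write : batch) {
            segment.computeIfAbsent(write[0], k -> new ArrayList<>())
                    .add(write[1]);
        }
        return segment;
    }

    /** The only critical section: fold an already-indexed segment into the database. */
    void merge(Map<String, List<String>> segment) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, List<String>> entry : segment.entrySet()) {
                database.computeIfAbsent(entry.getKey(),
                        k -> new ArrayList<>()).addAll(entry.getValue());
            }
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    /** Readers are blocked only while a merge holds the write lock. */
    List<String> read(String key) {
        lock.readLock().lock();
        try {
            List<String> values = database.get(key);
            return values == null ? new ArrayList<>()
                    : new ArrayList<>(values);
        }
        finally {
            lock.readLock().unlock();
        }
    }

    /** Transport one batch: index first (unlocked), then merge (locked). */
    void transport(List<String[]> batch) {
        merge(index(batch));
    }
}
```

In this sketch `transport` could be called from a background thread; readers would then wait only for the duration of `merge`, not for `index`.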

##### Search
We made several changes to improve search performance and accuracy:

@@ -72,7 +72,14 @@ public static boolean isWritable(String key) {
         boolean writable = key.length() > 0
                 && KEY_VALIDATION_REGEX.matcher(key).matches();
         if(writable) {
-            WRITABLE_KEY_CACHE.add(key);
+            try {
+                WRITABLE_KEY_CACHE.add(key);
+            }
+            catch (Exception e) {
+                // Ignore any concurrency exceptions while modifying the
+                // non-concurrent collection because failure would just
+                // cause us to re-validate the key in the future.
+            }
         }
         return writable;
     }
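
For comparison, a different way to handle the same race is to back the cache with a concurrent set so that `add` cannot throw under contention. This is a sketch of that alternative, not what the PR does: `WritableKeys`, `KEY_PATTERN`, and `CACHE` are hypothetical stand-ins, the regex is illustrative rather than Concourse's `KEY_VALIDATION_REGEX`, and this set is unbounded, which may not match the properties of the real cache.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Illustrative alternative: a thread-safe cache of validated keys. */
final class WritableKeys {

    // Assumption: a simple placeholder pattern, not the project's actual regex.
    private static final Pattern KEY_PATTERN = Pattern
            .compile("^[a-zA-Z0-9_]+$");

    // Concurrent set, so concurrent add/contains calls are safe.
    private static final Set<String> CACHE = ConcurrentHashMap.newKeySet();

    static boolean isWritable(String key) {
        if(CACHE.contains(key)) {
            return true; // already validated
        }
        boolean writable = !key.isEmpty()
                && KEY_PATTERN.matcher(key).matches();
        if(writable) {
            CACHE.add(key); // a lost race only means a redundant add
        }
        return writable;
    }

    static boolean isCached(String key) {
        return CACHE.contains(key);
    }
}
```

The trade-off is the extra synchronization cost of a concurrent collection on every lookup, whereas the change in the diff keeps the existing collection and tolerates an occasional failed insert.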