Summary
This PR introduces a significant optimization to the reindexing process in JanusGraph by allowing a subset of vertices to be reindexed instead of scanning the entire storage. This enhancement provides substantial performance improvements, primarily when the specific subset of vertices for indexing is already known.
NOTE
This feature is currently supported only for CQL storage. Other storage backends still need to be implemented. See KeyColumnValueStore.java.
Motivation
Previously, reindexing required scanning all vertices in storage, which could be highly resource-intensive and time-consuming, particularly for large datasets.
This update enables users to focus on a targeted subset of vertices, reducing the time and computational load for reindexing. This is especially beneficial in environments where only specific vertices are relevant to a given index or data update.
Changes
API in JanusGraphManagement
Benefits
Enhanced Flexibility: This feature allows users to update specific sections of the graph more easily without impacting the entire dataset.
Backward Compatibility
This feature is backward compatible and does not impact existing functionality. Users not specifying a subset will still experience the previous behavior of scanning the entire storage.
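To illustrate the difference between the two code paths, here is a minimal, self-contained sketch of the idea. It is not the JanusGraph API: the class `SubsetReindexSketch`, its in-memory store, and the `reindex` methods are all hypothetical names made up for this example. It only shows that passing a known set of vertex ids limits the work to those vertices, while omitting it falls back to a full scan.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in JanusGraph.
final class SubsetReindexSketch {

    // Stand-in for the storage backend: vertex id -> indexed property value.
    private final Map<Long, String> store = new LinkedHashMap<>();

    // Stand-in for the index being rebuilt: property value -> vertex ids.
    private final Map<String, List<Long>> index = new LinkedHashMap<>();

    void addVertex(long id, String name) {
        store.put(id, name);
    }

    // Previous behavior: no subset given, so every vertex in the store is scanned.
    int reindex() {
        return reindex(null);
    }

    // New behavior: only the given vertex ids are read and indexed.
    // A null argument means "no subset" and triggers the full scan.
    // Returns the number of vertices that were actually read.
    int reindex(Collection<Long> vertexIds) {
        Collection<Long> targets = (vertexIds == null) ? store.keySet() : vertexIds;
        int verticesRead = 0;
        for (Long id : targets) {
            String name = store.get(id);
            if (name == null) {
                continue; // id not present in the store: nothing to index
            }
            verticesRead++;
            List<Long> ids = index.computeIfAbsent(name, k -> new ArrayList<>());
            if (!ids.contains(id)) {
                ids.add(id); // keep the operation idempotent across repeated runs
            }
        }
        return verticesRead;
    }

    List<Long> lookup(String name) {
        return index.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        SubsetReindexSketch graph = new SubsetReindexSketch();
        graph.addVertex(1L, "alice");
        graph.addVertex(2L, "bob");
        graph.addVertex(3L, "carol");
        graph.addVertex(4L, "bob");

        // Subset reindex: only vertices 2 and 4 are read.
        System.out.println("subset read: " + graph.reindex(List.of(2L, 4L)));
        // Full reindex: all four vertices are read.
        System.out.println("full read:   " + graph.reindex());
    }
}
```

In this sketch the subset call touches two vertices and the full call touches four, which mirrors the backward-compatible default described above: callers that do not pass a subset get the same full scan as before.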