-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate index state sharing into mem management #1545
Integrate index state sharing into mem management #1545
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/remove-redunant-allocs #1545 +/- ##
====================================================================
+ Coverage 85.14% 85.32% +0.18%
- Complexity 1289 1302 +13
====================================================================
Files 168 169 +1
Lines 5249 5308 +59
Branches 495 499 +4
====================================================================
+ Hits 4469 4529 +60
+ Misses 573 571 -2
- Partials 207 208 +1 ☔ View full report in Codecov by Sentry. |
* | ||
* @param sharedIndexState to return to the system. | ||
*/ | ||
public void release(SharedIndexState sharedIndexState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we wrap this whole function in try and finally to ensure that if any exceptions comes from any of the method we are releasing the write lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes sense
if (refCount <= 0) { | ||
log.info("Evicting entry from shared index state cache for key {}", sharedIndexState.getModelId()); | ||
sharedIndexStateCache.remove(sharedIndexState.getModelId()); | ||
JNIService.freeSharedIndexState(sharedIndexState.getSharedIndexStateAddress(), sharedIndexState.getKnnEngine()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this logic it means that if some query is still using the SharedIndexState then we will not be able to release the memory. But if a user has done the clear cache api then in that case we will just skip over this release and return that we have cleared the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The release portion is tied to an IndexAllocation not the query. When the index allocation is closed, we will decrease the reference count to this. Once it goes to 0, we free. If an index is loaded and needs shared state, it will need to obtain read lock and so will be blocked. So, after this completes, it will have to load again.
On clear cache, all IndexAllocations will be closed leading to refcount going down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so this line will prevent https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L273 the scenario where query is happening and we cannot release the memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes thats correct
Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state. There was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything. Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types. Signed-off-by: John Mazanec <jmazane@amazon.com>
* @return ShareModelContext | ||
*/ | ||
public SharedIndexState get(long indexAddress, String modelId, KNNEngine knnEngine) { | ||
this.readWriteLock.readLock().lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we don't add lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its possible that underlying shared index state gets freed in release and this returns a value with an invalid pointer
this.readWriteLock.writeLock().lock(); | ||
|
||
try { | ||
if (!sharedIndexStateCache.containsKey(sharedIndexState.getModelId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why not use get and do null check? This will avoid get call at line no 107
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense let me add
* @return ++referenceCount | ||
*/ | ||
private long incRef() { | ||
return referenceCount.incrementAndGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may be void since you are not using incremented value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth. It is required for decRef() so I figured for symmetry I would make it return long
if (StringUtils.isEmpty(modelId) || KNNEngine.FAISS != knnEngine) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking Engine is faiss or not is already covered in the JNIService.isSharedIndexStateRequired
why we are adding it here again? We can avoid that check otherwise if we need to share state for some other engine then we will end up making changes here and also in JNIService.
public static boolean isSharedIndexStateRequired(long indexAddr, KNNEngine knnEngine) {
if (KNNEngine.FAISS == knnEngine) {
return FaissService.isSharedIndexStateRequired(indexAddr);
}
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. will update
BWC seems unrelated. Created an issue around it: #1556 |
src/main/java/org/opensearch/knn/index/memory/SharedIndexStateManager.java
Outdated
Show resolved
Hide resolved
Signed-off-by: John Mazanec <jmazane@amazon.com>
da8a936
into
opensearch-project:feature/remove-redunant-allocs
…#1545) Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state. There was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything. Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types. Signed-off-by: John Mazanec <jmazane@amazon.com>
…#1545) Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state. There was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything. Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types. Signed-off-by: John Mazanec <jmazane@amazon.com>
Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state. There was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything. Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types. Signed-off-by: John Mazanec <jmazane@amazon.com>
…#1545) Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state. There was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything. Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types. Signed-off-by: John Mazanec <jmazane@amazon.com>
Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state. There was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything. Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types. Signed-off-by: John Mazanec <jmazane@amazon.com>
Description
Last PR in series to fix #1507
Adds the ability to share index state amongst indices during index load operations into the plugins memory management system. Introduces a manager of the shared state that will properly manage the lifecycle of the shared state.
Additionally, there was a bug in clear cache that had to be fixed to get this change working as well. Previously, only one index file per clear cache would be freed. This fixes that logic to clear everything.
Added unit tests and an integration test to confirm functionality. In addition, modified recall integration tests to get more coverage on the different algo configs. Along with this, had to fix a few things around the computation of recall for non-l2 space types.
Note - the addition of these integration tests will increase the CI time. Its a tradeoff that can be discussed. In future, we can take some items to reduce this time (such as #1540 to decouple CI for lib tests and plugin tests)
Once merged, I will create a PR to merge the feature branch into main so it can be released in 2.13.
Issues Resolved
#1507
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.