-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Remote Vector Index Build] Add download + indexOuput#write implementation to RemoteIndexBuildStrategy #2554
base: main
Are you sure you want to change the base?
Conversation
1a11805
to
3674aa2
Compare
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
3674aa2
to
6d950eb
Compare
@@ -93,12 +100,12 @@ public void buildAndWriteIndex(BuildIndexParams indexInfo) throws IOException { | |||
log.debug("Submit vector build took {} ms for vector field [{}]", time_in_millis, indexInfo.getFieldName()); | |||
|
|||
stopWatch = new StopWatch().start(); | |||
awaitVectorBuild(); | |||
BlobPath downloadPath = awaitVectorBuild(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume that awaitVectorBuild
returns a BlobPath
constructed from the response the remote index build client gave. This may change in future PRs, see: #2548
bc0f66a
to
c945c88
Compare
BlobContainer blobContainer = getRepository().blobStore().blobContainer(downloadPath.parent()); | ||
// TODO: We are using the sequential download API as multi-part parallel download is difficult for us to implement today and | ||
// requires some changes in core. For more details, see: https://github.com/opensearch-project/k-NN/issues/2464 | ||
String fileName = downloadPath.toArray()[downloadPath.toArray().length - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a place that can cause NPE, can we ensure that length checks and other things to ensure that this line is not throwing NPEs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that I'm trying to not take the file path resolution as part of this PR as whether we get it as a BlobPath
or String
or any other format is still tbd by the client implementation. Let me add some assertions / checks here for now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So BlobPath
is not supposed to be used to represent a specific file, only a given directory. So in latest revision I'm going to take in a String and do the BlobPath
creation manually.
src/main/java/org/opensearch/knn/index/store/IndexOutputWithBuffer.java
Outdated
Show resolved
Hide resolved
…ategy Signed-off-by: Jay Deng <jayd0104@gmail.com>
c945c88
to
6b6c3b1
Compare
Description
As a follow-up to #2550, this PR includes the implementation + testing for downloading a graph file from the remote repository and then writing to an
IndexOutput
.The key change here is that in order to manage memory consumption, we use a buffer of size 64kb, the same as used in the
IndexOutputWithBuffer
class, to buffer the graph file from the remote repository into theIndexOutput
, else we will go OOM on the download of larger graph files.For now, we are only using sequential download for the graph file. Parallel download of the graph file has several difficulties associated with it, both on the download side as well as on the writing to
IndexOutput
side. For more details on these, see: #2464. We take on parallel blob download as a future improvement.Related Issues
Relates #2465
Relates #2392
Relates #2391
Check List
- [ ] New functionality has been documented.- [ ] API changes companion pull request created.--signoff
.- [ ] Public documentation issue/PR created.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.