-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Confluence pagination handling #5440
Confluence pagination handling #5440
Conversation
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
@@ -38,7 +40,7 @@ public class ConfluenceRestClient extends AtlassianRestClient { | |||
//public static final String REST_API_SPACES = "/rest/api/api/spaces"; | |||
public static final String FIFTY = "50"; | |||
public static final String START_AT = "startAt"; | |||
public static final String MAX_RESULT = "maxResults"; | |||
public static final String LIMIT = "limit"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe LIMIT_PARAM
or LIMIT_PARAM_NAME
for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to LIMIT_PARAM
} while (startAt < total); | ||
log.debug("Content items fetched so far: {}", total); | ||
paginationLinks = searchContentItems.getLinks(); | ||
} while (paginationLinks != null && paginationLinks.getNext() != null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any concerns about having this as a loop that we don't have control over the bounds for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Is there an expectation that this while loop terminates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It is paginating the confluence site content using confluence search results api provided cursor pointer. It can be a large site but it can't go infinite because we are relaying on the Confluence api response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that it is bounded by the response. But, that could put our code in an infinite loop if Confluence had a bug or the server itself were maliciously sending the same link over and over.
Should we have any sort of checks to ensure that doesn't happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dlvenable it looks like it is infinite loop in one way or the other. Even if it terminates the loop here, it will come back here to read more. So, as a source, that is OK, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, agree with Krishna. Even if we exit the loop, we will come back here to complete the page navigation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that makes sense. We'd hit it again. Maybe we should have a metric or some other mechanism to at least detect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, currently, I am recording the metric at the end of the loop. I will move inside in my next PR
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
} while (startAt < total); | ||
log.debug("Content items fetched so far: {}", total); | ||
paginationLinks = searchContentItems.getLinks(); | ||
} while (paginationLinks != null && paginationLinks.getNext() != null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that it is bounded by the response. But, that could put our code in an infinite loop if Confluence had a bug or the server itself were maliciously sending the same link over and over.
Should we have any sort of checks to ensure that doesn't happen?
* pagination handling Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com> * adding option to pass batch size for the source coordinator partition Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com> * renaming the constant Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com> * review comments Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com> --------- Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Description
With each search call, Confluence API returns the cursors for the next page that we should use to navigate through the search results. Enhanced the code to follow through these cursor based navigation
Issues Resolved
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.