-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#305 getAncestors Database functionality. #311
base: master
Are you sure you want to change the base?
Conversation
1. Made the necessary changes as mentioned by the team. 3. Made the necessary changes to the getAncestors Database functionality.
JaCoCo model module code coverage report - scala 2.13.11
|
JaCoCo agent module code coverage report - scala 2.13.11
|
JaCoCo reader module code coverage report - scala 2.13.11
|
JaCoCo server module code coverage report - scala 2.13.11
|
-- has_more - Flag indicating if there are more partitionings available | ||
|
||
-- Status codes: | ||
-- 11 - OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super important I suppose but still, according to https://github.com/AbsaOSS/fa-db/blob/master/core/src/main/scala/za/co/absa/db/fadb/status/README.md we would maybe want to use status 10 instead of 11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see 11. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as needed
-- | ||
------------------------------------------------------------------------------- | ||
DECLARE | ||
partitionCreateAt TIMESTAMP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partitioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the local variables start with _
by convention - avoids confusion with OUT parameters and column names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you the convention is useful to know. I have made the change.
LIMIT i_limit | ||
OFFSET i_offset; | ||
|
||
IF FOUND THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You return status already from the query. And there is no reason to return 42. There are no records returned if ancestors don't exist. Have a look at runs.get_partitioning_checkpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We then simply process the data as
if (results.nonEmpty && results.head.hasMore) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense although. From runs.get_paritioning_checkpoint_v2 it has a similar logic to this.
What I will do is comment it out for now and determine if it is necessary.
* limitations under the License. | ||
*/ | ||
|
||
CREATE OR REPLACE FUNCTION runs.get_ancestors( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would name it to runs.get_partitioning_ancestors
, otherwise the name is little ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, changed
-- | ||
-- Parameters: | ||
-- i_id_partitioning - id that we asking the Ancestors for | ||
-- i_limit - (optional) maximum number of partitionings to return, default is 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important:
Don't we used 10 as the default limit in our functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change it to 10. I don't think it is important either.
-- has_more - Flag indicating if there are more partitionings available | ||
|
||
-- Status codes: | ||
-- 11 - OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see 11. 😉
-- | ||
------------------------------------------------------------------------------- | ||
DECLARE | ||
partitionCreateAt TIMESTAMP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the local variables start with _
by convention - avoids confusion with OUT parameters and column names.
-- Status codes: | ||
-- 11 - OK | ||
-- 41 - Partitioning not found | ||
-- 42 - Ancestor Partitioning not found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no need for this status (and error one furthermore). If no ancestors found, it's OK, simple an empty list (particularly with paging).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed as needed
WHERE | ||
PF2.fk_partitioning = i_id_partitioning | ||
AND | ||
P.created_at < partitionCreateAt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this condition?
Actually I think the whole query is incorrect, unfortunately.
It should be
FROM
flows.partitioning_to_flow PF
INNER JOIN flows.flows F ON F.id_flow = PF.id_flow
INNER JOIN runs.partitionings P ON P.id_partitioning = F.fk_primary_partitioning
WHERE
PF.fk_partitioning = i_id_partitioning AND
P.id_partitioning IS DISTINCT FROM i_id_partitioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but I think that it should be as the current since your query would get the children partitionings as well. Not just the Ancestors. Please let me know if I am mistaken with my analysis on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that you understood the problem, technologies, and I checked your tests and it looks logically correct. Good job!
However, the biggest problem I see is this condition: P.created_at < partitionCreateAt
- we cannot count on creation time - parent can be created before or after child creation, so you need to adjust the query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made this change as discussed, I did not realise there was an added field in the Flows table. That made this very easy. Thank you for the feedback, it has been changed and everything is working as intended.
IN i_offset BIGINT DEFAULT 0, | ||
OUT status INTEGER, | ||
OUT status_text TEXT, | ||
OUT ancestorid BIGINT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OUT ancestorid BIGINT, | |
OUT ancestor_id BIGINT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
assert(row.getLong("ancestorid").contains(partId1)) | ||
assert(returnedPartitioningParsed == expectedPartitioning1) | ||
assert(row.getString("author").contains("Grandpa")) | ||
assert(row.getString("author").contains("Grandpa")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I'd recommend to test whether the result is final, by adding this on the end of this local scope:
assert(!queryResult.next())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in to all tests, thank you
assert(row.getString("status_text").contains("OK")) | ||
assert(row.getLong("ancestorid").contains(partId4)) | ||
assert(returnedPartitioningParsed == expectedPartitioning4) | ||
assert(row.getString("author").contains("Grandson")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, add the check on finality
Btw, I draw a simple diagram of partitioning hierarchy for this test and I see that this test checks ancestors correctly. For other reviewers, this is the hierarchy - ancestors are requested for Partitioning 5, and ancestors are: 1, 2, 3, 4:
1 2
| |
3 4 6
\ | |
5 7
| /
8
consider adding this 'diagram' into the code comment somewhere for easier understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, you can use .map
to iterate over a list of expected results - you are checking 4 rows and the code is almost the same, so can be massively simplified :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you will modify once I have the solution correct. I have also added the diagram in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, looking forward to it, I'll review once adjusted :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the map implementation as mentioned.
Take a look, I had to add a few other items to make it work.
I could be missing something or if my method is not appropriate, please let me know.
assert(row.getString("status_text").contains("OK")) | ||
assert(row.getLong("ancestorid").contains(partId7)) | ||
assert(returnedPartitioningParsed == expectedPartitioning7) | ||
assert(row.getString("author").contains("Daughter")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here - briefly checked, what you are checking seems to be correct, but can be massively simplified & I'd like you to add the last assert on data finality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
val Time5 = OffsetDateTime.parse("1992-08-07T10:00:00Z") | ||
val Time6 = OffsetDateTime.parse("1992-08-08T10:00:00Z") | ||
val Time7 = OffsetDateTime.parse("1992-08-09T10:00:00Z") | ||
val Time8 = OffsetDateTime.parse("1992-08-09T11:00:00Z") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to change these times so that they are not incremental - let's say that they are more 'random', let's say that there isn't this nice chronological sequence of partitioning time creation. For example, in real life, parent can be created in later moment than a child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the entire premise of time creation from the query
1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Made changes as requested
1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended.
1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended.
1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended. 4. Removed unnecessary files
This PR is strictly for the getAncestors DataBase Functionality.
Any feedback is welcome.