-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add general_merge tests and a test data generator module #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments. I think adding such a test with generated data is always interesting but it's not where I would have started :) So something I would still like to have is very basic test vectors with static data, primarily because these are very easy to reason about and after we have set it up once it becomes trivial to add all kinds of edge cases we will stumble over. We could have them in CSV file like this one from BIP340: https://github.com/bitcoin/bips/blob/master/bip-0340/test-vectors.csv But let's add this here first since it's already in pretty good shape.
from random import randint | ||
|
||
|
||
def generate_ip(ip_type="v4", subnet_size="16"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The generate_data.py
file could be moved into a util
or shared
folder. But can be done as a follow-up when more tests are added.
tests/generate_data.py
Outdated
def generate_asns(count): | ||
asns = [] | ||
while count > 0: | ||
asn = randint(1, 100000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use our own max asn value for the end of the range here, i.e. max_encode
. Can also use the default for now since we are pretty married to it currently with the assumptions in the core code: 33521664
.
return network | ||
|
||
|
||
def generate_ip_networks( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, the ip networks (and to some degree asns too) generate methods don't have guarantees that the generated result is a set without overlap, right? While the chance seems low, there could be overlapping results which means the test outcomes are not deterministic. I think it would be neat if these functions could always generate a set. This would then allow us to make stricter testing. I.e. we could be sure that merging two sets of 10 with overlap of 1 have a result of 19 entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, done. However it's a bit more complicated as comparing sets of networks requires checking whether the networks are subnets of each other, hence the make_disjoint
function to enforce non-overlapping networks after generation.
tests/generate_data.py
Outdated
""" | ||
extra_new = [] | ||
for extra in extra_items: | ||
included = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could use boolean here
tests/generate_data.py
Outdated
end_ip = int(ipaddress.IPv4Address("255.255.255.255")) | ||
subnet_mask = end_ip << (32 - subnet_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is cleaner:
subnet_mask = ipaddress.IPv4Network(f"0.0.0.0/{subnet_size}", strict=False).netmask
tests/merge_test.py
Outdated
|
||
TEST_FILE_PATHS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the files are not cleaned up after each test run yet, right? should do that to avoid test polution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks -- I went with using pytest's tmp_path
which writes to tmp
and overwrites on every run, but doesn't clean up the run after it's finished.
65c42c1
to
2f9991d
Compare
Changes look good so far but similar to what I reported in the other PR, I am not sure how to run the test currently:
|
9899f4b
to
866834f
Compare
hmm sorry, not sure what happened there. I should have stacked these test PRs. |
Now the test works but I am getting these deprecation warnings. Maybe another version incompatibility issue?
|
iiuc Pandas should support numpy > 1.x , so this is odd. I can't see anything in pandas about this deprecation in particular. Our pandas version requirement is what pandas version do you have? |
Sorry, I thought I had already answered here. I did have 1.5.3. It looks good to me now when I use the new pandas version but it seems this branch is conflicting with the upgrade of the pandas version in master. Can you rebase one more time, then it should be ready to go :) |
the generate_data module has functions to create IP networks and addrs and create AS files as we would get from RPKI or RIR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
main_data = generate_file_items(100) | ||
main_ips = [item.split()[0] for item in main_data] | ||
rpki_ips = main_ips[:50] | ||
irr_ips = main_ips[50:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note (also to myself), since we may remove IRR as a source in the short-mid term when RPKI has adopted broadly enough, we shouldn't hardcode it in the tests unless necessary. Here I think the variables could have been named base
and extra
as well and that's a bit more future-proof.
Initial approach to tests, starting with the merge process. The generate_data module has functions to create IP networks and addrs and create AS files as we would get from RPKI or RIR.