Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add general_merge tests and a test data generator module #32

Merged
merged 1 commit into from
Dec 11, 2024

Conversation

jurraca
Copy link
Contributor

@jurraca jurraca commented Nov 4, 2024

Initial approach to tests, starting with the merge process. The generate_data module has functions to create IP networks and addrs and create AS files as we would get from RPKI or RIR.

Copy link
Collaborator

@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments. I think adding such a test with generated data is always interesting but it's not where I would have started :) So something I would still like to have is very basic test vectors with static data, primarily because these are very easy to reason about and after we have set it up once it becomes trivial to add all kinds of edge cases we will stumble over. We could have them in CSV file like this one from BIP340: https://github.com/bitcoin/bips/blob/master/bip-0340/test-vectors.csv But let's add this here first since it's already in pretty good shape.

from random import randint


def generate_ip(ip_type="v4", subnet_size="16"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The generate_data.py file could be moved into a util or shared folder. But can be done as a follow-up when more tests are added.

def generate_asns(count):
asns = []
while count > 0:
asn = randint(1, 100000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use our own max asn value for the end of the range here, i.e. max_encode. Can also use the default for now since we are pretty married to it currently with the assumptions in the core code: 33521664.

return network


def generate_ip_networks(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the ip networks (and to some degree asns too) generate methods don't have guarantees that the generated result is a set without overlap, right? While the chance seems low, there could be overlapping results which means the test outcomes are not deterministic. I think it would be neat if these functions could always generate a set. This would then allow us to make stricter testing. I.e. we could be sure that merging two sets of 10 with overlap of 1 have a result of 19 entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done. However it's a bit more complicated as comparing sets of networks requires checking whether the networks are subnets of each other, hence the make_disjoint function to enforce non-overlapping networks after generation.

"""
extra_new = []
for extra in extra_items:
included = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could use boolean here

Comment on lines 10 to 11
end_ip = int(ipaddress.IPv4Address("255.255.255.255"))
subnet_mask = end_ip << (32 - subnet_size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is cleaner:

subnet_mask = ipaddress.IPv4Network(f"0.0.0.0/{subnet_size}", strict=False).netmask

Comment on lines 10 to 11

TEST_FILE_PATHS = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the files are not cleaned up after each test run yet, right? should do that to avoid test polution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks -- I went with using pytest's tmp_path which writes to tmp and overwrites on every run, but doesn't clean up the run after it's finished.

@jurraca jurraca force-pushed the merge-tests branch 2 times, most recently from 65c42c1 to 2f9991d Compare November 18, 2024 11:53
@jurraca jurraca mentioned this pull request Nov 18, 2024
@fjahr
Copy link
Collaborator

fjahr commented Nov 24, 2024

Changes look good so far but similar to what I reported in the other PR, I am not sure how to run the test currently:

 [nix] pytest tests/merge_test.py
===================================================================================================================== test session starts =====================================================================================================================
platform darwin -- Python 3.11.9, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/XXX/projects/python/kartograf
collected 0 items / 1 error

=========================================================================================================================== ERRORS ============================================================================================================================
____________________________________________________________________________________________________________ ERROR collecting tests/merge_test.py _____________________________________________________________________________________________________________
ImportError while importing test module '/Users/XXX/projects/python/kartograf/tests/merge_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/nix/store/na2a6spzlqg0rrgz8462bla7hxvygjcq-python3-3.11.9/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/merge_test.py:10: in <module>
    from kartograf.merge import general_merge
E   ModuleNotFoundError: No module named 'kartograf'

@jurraca jurraca force-pushed the merge-tests branch 3 times, most recently from 9899f4b to 866834f Compare November 25, 2024 14:55
@jurraca
Copy link
Contributor Author

jurraca commented Nov 25, 2024

hmm sorry, not sure what happened there. I should have stacked these test PRs.

@fjahr
Copy link
Collaborator

fjahr commented Nov 26, 2024

Now the test works but I am getting these deprecation warnings. Maybe another version incompatibility issue?

$ pytest tests
===================================================================================================================== test session starts =====================================================================================================================
platform darwin -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0
rootdir: /Users/XXX/projects/python/kartograf
collected 3 items

tests/merge_test.py ...                                                                                                                                                                                                                                 [100%]

====================================================================================================================== warnings summary =======================================================================================================================
tests/merge_test.py::test_merge
tests/merge_test.py::test_merge_disjoint
tests/merge_test.py::test_merge_joint
  /Users/XXX/projects/python/kartograf/venv/lib/python3.12/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
  See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
    return np.find_common_type(types, [])

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================ 3 passed, 3 warnings in 0.70s ================================================================================================================

@jurraca
Copy link
Contributor Author

jurraca commented Nov 27, 2024

iiuc Pandas should support numpy > 1.x , so this is odd. I can't see anything in pandas about this deprecation in particular. Our pandas version requirement is ==1.5.3 which is a bit old (we could at least have it be >=1.5.3). That deprecation went into effect in July 2023. I've been running pandas 2.2.1 with numpy 1.26.1 so haven't seen it.

what pandas version do you have?

@fjahr
Copy link
Collaborator

fjahr commented Dec 7, 2024

what pandas version do you have?

Sorry, I thought I had already answered here. I did have 1.5.3. It looks good to me now when I use the new pandas version but it seems this branch is conflicting with the upgrade of the pandas version in master. Can you rebase one more time, then it should be ready to go :)

the generate_data module has functions to create IP networks and addrs and
create AS files as we would get from RPKI or RIR.
Copy link
Collaborator

@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

main_data = generate_file_items(100)
main_ips = [item.split()[0] for item in main_data]
rpki_ips = main_ips[:50]
irr_ips = main_ips[50:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note (also to myself), since we may remove IRR as a source in the short-mid term when RPKI has adopted broadly enough, we shouldn't hardcode it in the tests unless necessary. Here I think the variables could have been named base and extra as well and that's a bit more future-proof.

@fjahr fjahr merged commit b5dda3b into asmap:master Dec 11, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants