Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bogon.py updates #45

Merged
merged 4 commits into from
Jan 26, 2025
Merged

Bogon.py updates #45

merged 4 commits into from
Jan 26, 2025

Conversation

jurraca
Copy link
Contributor

@jurraca jurraca commented Jan 3, 2025

Refactors the bogon.py module with some performance and clarity improvements, and adds tests for the module.

Copy link
Collaborator

@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far. But I don't think we should handle things gracefully in is_bogon_pfx, having an invalid network there is almost certainly an developer error and so it should fail loudly.

kartograf/bogon.py Outdated Show resolved Hide resolved
kartograf/bogon.py Outdated Show resolved Hide resolved
kartograf/bogon.py Outdated Show resolved Hide resolved
@jurraca jurraca force-pushed the bogons-fixes branch 2 times, most recently from b6d8e81 to bb1ca3a Compare January 11, 2025 11:34
@jurraca
Copy link
Contributor Author

jurraca commented Jan 11, 2025

I added the bugfix commit here for ease but its the same as #37.
Regarding handling valid or invalid networks, we could be getting invalid networks from the RPKI files no? Currently we just cast it and hope it works, afaict. I went in circles with this for a while, and settled on 7ce9637, which:

  • renames format_pfx to parse_pfx for clarity
  • returns None if it fails casting to a network
  • handles that return in the parsing logic of rpki irr and collectors parsing
    Maybe that last commit doesn't fit in this PR, but I couldn't figure out a clean way to do it without it. Let me know if I'm thinking about this wrong.

Copy link
Collaborator

@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the renaming and returning None makes sense, thanks! When you are adding the logging I will do some more testing and then it should be ready soon.

On the error handling and formatter/parser: First of all this is of course cleaning up stuff that I didn't do well initially. In the beginning I didn't really know what to expect from the data formats and so I was taking it step by step, looking at what I was getting and then adjusting the parsing. At the time I didn't think through where I wanted the decision to lie if a pfx is valid or not and I was probably a bit too permissive because I wanted to get something works. As I looked at it now a formatting function didn't seem to be the right place and I fear that it may drop some networks that are formatted in a way we did not anticipate but that may be valid if we bring it into the right format. Going forward, ideally we should anticipate in what ways the pfx can be invalid and document these cases, similar how this is done for the comments case. If we encounter something new that we didn't anticipate I guess it is debatable if we should fail or not. Probably logging to the debug file is a better option for now and as we get more knowledge what we might be missing and if we don't see debug logs there anymore maybe we can get more confident and fail more quickly.

What do you think? Does that make sense?

formatted_pfx = str(ipaddress.ip_network(pfx))
return f"{formatted_pfx}"
return str(ipaddress.ip_address(pfx))
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you print the pfx to our debug log here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like: "Parser encountered unknown error while parsing ...."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, but I added this in the {rpki, irr, collectors}/parse.py code instead so we don't have to pass the debug.log file around.

if network.subnet_of(ipaddress.ip_network(ipv6_range)):
return True

if prefix:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't needed. There is a prefix and before the places where it is called and I think it's better to handle this outside the function rather than inside.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_bogon_asn also doesn't take such precautions and I would like it more if there were consistent on this.

Copy link
Contributor Author

@jurraca jurraca Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you, I was trying to get to a place where we cast via ipaddress.ip_network(pfx) only once, and everything downstream in the parsing takes an IPvXNetwork() object -- thanks for reading thru my intent :)


last_modified = datetime.strptime(entry["last-modified"], '%Y-%m-%dT%H:%M:%SZ')
last_modified = last_modified.replace(tzinfo=timezone.utc)
last_modified = last_modified.timestamp()

# Bogon prefixes and ASNs are excluded since they can not
# be used for routing.
if is_bogon_pfx(route) or is_bogon_asn(origin):
if route and (is_bogon_pfx(route) or is_bogon_asn(origin)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would do the same and be a bit easier to reason about?

Suggested change
if route and (is_bogon_pfx(route) or is_bogon_asn(origin)):
if not route or is_bogon_pfx(route) or is_bogon_asn(origin):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice yes, thank you, but this went in #47


# Bogon prefixes and ASNs are excluded since they can not
# be used for routing.
if is_bogon_pfx(prefix) or is_bogon_asn(asn):
if prefix and (is_bogon_pfx(prefix) or is_bogon_asn(asn)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and in the irr parse file, I would prefer: if not prefix or

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, #47

asn = asn.upper().rstrip('\n')

if is_bogon_pfx(prefix) or is_bogon_asn(asn):
if prefix and (is_bogon_pfx(prefix) or is_bogon_asn(asn)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and in the irr parse file, I would prefer: if not prefix or

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, #47

kartograf/util.py Show resolved Hide resolved
@fjahr
Copy link
Collaborator

fjahr commented Jan 12, 2025

I added the bugfix commit here for ease but its the same as #37.

Hm, this was a bit confusing to me because I wasn't sure where to add my review comments on this part. In the future, can you be explicit which PR you would me to review first if there is overlap? In Bitcoin Core we either say "This depends on XY, please review that first" or even put a PR into draft status.

@jurraca
Copy link
Contributor Author

jurraca commented Jan 14, 2025

this was a bit confusing

OK, i've rebased this on #47 , and this PR depends on it.

Probably logging to the debug file is a better option for now

Yep, good point. When I update my daily run with this, I'll go through the debug logs and see what we get.

I didn't really know what to expect from the data formats

Totally! More than the existing code, it took me a while to think through how the error handling should work tbh. I'm actually surprised this hasn't blown up more often, so it's likely that the upstream sources are pretty good about IP formats.

@fjahr
Copy link
Collaborator

fjahr commented Jan 19, 2025

Looking good here too after another light pass, but also needs rebase first :)

@jurraca
Copy link
Contributor Author

jurraca commented Jan 21, 2025

rebased and ran another reproduction run to test.

using sets instead of lists is a faster lookup. Also, compiling the set
with networks at module load time will be faster than casting to an ip_network
at every iteration.
we're really parsing here, not formatting: the network may be invalid
from sources provided upstream. Thus we return None for an invalid
network and handle accordingly.
@fjahr
Copy link
Collaborator

fjahr commented Jan 26, 2025

ACK, looks great now!

@fjahr fjahr merged commit f9a6fb7 into asmap:master Jan 26, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants