-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bogon.py updates #45
Bogon.py updates #45
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far. But I don't think we should handle things gracefully in is_bogon_pfx
, having an invalid network there is almost certainly an developer error and so it should fail loudly.
b6d8e81
to
bb1ca3a
Compare
I added the bugfix commit here for ease but its the same as #37.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the renaming and returning None makes sense, thanks! When you are adding the logging I will do some more testing and then it should be ready soon.
On the error handling and formatter/parser: First of all this is of course cleaning up stuff that I didn't do well initially. In the beginning I didn't really know what to expect from the data formats and so I was taking it step by step, looking at what I was getting and then adjusting the parsing. At the time I didn't think through where I wanted the decision to lie if a pfx is valid or not and I was probably a bit too permissive because I wanted to get something works. As I looked at it now a formatting function didn't seem to be the right place and I fear that it may drop some networks that are formatted in a way we did not anticipate but that may be valid if we bring it into the right format. Going forward, ideally we should anticipate in what ways the pfx can be invalid and document these cases, similar how this is done for the comments case. If we encounter something new that we didn't anticipate I guess it is debatable if we should fail or not. Probably logging to the debug file is a better option for now and as we get more knowledge what we might be missing and if we don't see debug logs there anymore maybe we can get more confident and fail more quickly.
What do you think? Does that make sense?
kartograf/util.py
Outdated
formatted_pfx = str(ipaddress.ip_network(pfx)) | ||
return f"{formatted_pfx}" | ||
return str(ipaddress.ip_address(pfx)) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you print the pfx to our debug log here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like: "Parser encountered unknown error while parsing ...."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, but I added this in the {rpki, irr, collectors}/parse.py
code instead so we don't have to pass the debug.log
file around.
kartograf/bogon.py
Outdated
if network.subnet_of(ipaddress.ip_network(ipv6_range)): | ||
return True | ||
|
||
if prefix: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this isn't needed. There is a prefix and
before the places where it is called and I think it's better to handle this outside the function rather than inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_bogon_asn
also doesn't take such precautions and I would like it more if there were consistent on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you, I was trying to get to a place where we cast via ipaddress.ip_network(pfx)
only once, and everything downstream in the parsing takes an IPvXNetwork()
object -- thanks for reading thru my intent :)
kartograf/irr/parse.py
Outdated
|
||
last_modified = datetime.strptime(entry["last-modified"], '%Y-%m-%dT%H:%M:%SZ') | ||
last_modified = last_modified.replace(tzinfo=timezone.utc) | ||
last_modified = last_modified.timestamp() | ||
|
||
# Bogon prefixes and ASNs are excluded since they can not | ||
# be used for routing. | ||
if is_bogon_pfx(route) or is_bogon_asn(origin): | ||
if route and (is_bogon_pfx(route) or is_bogon_asn(origin)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would do the same and be a bit easier to reason about?
if route and (is_bogon_pfx(route) or is_bogon_asn(origin)): | |
if not route or is_bogon_pfx(route) or is_bogon_asn(origin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice yes, thank you, but this went in #47
kartograf/rpki/parse.py
Outdated
|
||
# Bogon prefixes and ASNs are excluded since they can not | ||
# be used for routing. | ||
if is_bogon_pfx(prefix) or is_bogon_asn(asn): | ||
if prefix and (is_bogon_pfx(prefix) or is_bogon_asn(asn)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here and in the irr parse file, I would prefer: if not prefix or
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, #47
kartograf/collectors/parse.py
Outdated
asn = asn.upper().rstrip('\n') | ||
|
||
if is_bogon_pfx(prefix) or is_bogon_asn(asn): | ||
if prefix and (is_bogon_pfx(prefix) or is_bogon_asn(asn)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here and in the irr parse file, I would prefer: if not prefix or
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, #47
Hm, this was a bit confusing to me because I wasn't sure where to add my review comments on this part. In the future, can you be explicit which PR you would me to review first if there is overlap? In Bitcoin Core we either say "This depends on XY, please review that first" or even put a PR into draft status. |
OK, i've rebased this on #47 , and this PR depends on it.
Yep, good point. When I update my daily run with this, I'll go through the debug logs and see what we get.
Totally! More than the existing code, it took me a while to think through how the error handling should work tbh. I'm actually surprised this hasn't blown up more often, so it's likely that the upstream sources are pretty good about IP formats. |
Looking good here too after another light pass, but also needs rebase first :) |
rebased and ran another reproduction run to test. |
using sets instead of lists is a faster lookup. Also, compiling the set with networks at module load time will be faster than casting to an ip_network at every iteration.
we're really parsing here, not formatting: the network may be invalid from sources provided upstream. Thus we return None for an invalid network and handle accordingly.
ACK, looks great now! |
Refactors the
bogon.py
module with some performance and clarity improvements, and adds tests for the module.