Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Check PIM Register/-Stop handling in pim_igmp_vrf topotest #18329

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

gromit1811
Copy link
Contributor

Check whether PIM Register messages are generated towards the RP and answered by Register-Stop from the RP. To force generation of PIM Register, move the DR role from the RP on R11/12 to R1 (otherwise, with DR and RP being the same routers, no Register messages would be needed). The RPs need to generate Register-Stop because there are no other group members besides the ones on the path whether the traffic is received from on the RPs.

Useful as a regression test for #18216 even though the situation there was slightly different: In that case, the RP was the VRF router and its Register-Stop messages ended up the in the wrong VRF. In our case, the DRs are the VRF routers and the Register messages would end up in the wrong VRF. But our check works for both cases: If we don't receive Register-Stop messages, either the Register or the Register-Stop messages got lost and will trigger an assertion failure.

Note: The last commit is the actual change to check PIM Register. The other commits are just minor cleanup/bugfix changes to the original topotest.

@frrbot frrbot bot added the tests Topotests, make check, etc label Mar 6, 2025
@Jafaral
Copy link
Member

Jafaral commented Mar 6, 2025

@gromit1811 gromit1811 force-pushed the bugfix/pim_vrf_topotest branch from c3a53d4 to cea2017 Compare March 11, 2025 10:23
@gromit1811 gromit1811 marked this pull request as draft March 11, 2025 10:24
@gromit1811
Copy link
Contributor Author

Converted to draft while working on IPv6 support

@gromit1811 gromit1811 force-pushed the bugfix/pim_vrf_topotest branch from cea2017 to 5778a40 Compare March 17, 2025 15:10
@github-actions github-actions bot added size/XL and removed size/M labels Mar 17, 2025
@gromit1811
Copy link
Contributor Author

gromit1811 commented Mar 17, 2025

Added IPv6 support. Open issues, to be fixed before removing "draft" status:

  • Lots of duplicated code, refactor
  • Rebase to current master
  • Register/Register-Stop check sometimes fails with IPv6. Looks like a pim6d issue rather than a test case bug. Analyze
  • We need explicit vrf red and vrf blue statements in r1/pim*d.conf to avoid the following assertion failure: 2025/03/10 17:02:12 PIM: lib/routing_nb_config.c:51: routing_control_plane_protocols_control_plane_protocol_create(): assertion (vrf) failed. An assertion failure is never the correct way to handle config errors, check what the actual problem is.

@gromit1811 gromit1811 force-pushed the bugfix/pim_vrf_topotest branch 2 times, most recently from 1dca706 to cad49e3 Compare March 18, 2025 10:34
Copy link
Member

@Jafaral Jafaral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gromit1811

Comment on lines 12 to 17
vrf blue
exit-vrf
!
vrf red
exit-vrf
!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that we should fix this. A more permanent fix for this test is to actually reflect the interface vrf assignment below. I never liked relying on side effects from the kernel to map interfaces to vrfs because the configs don't really tell us much about that. Here is the the vrf info from the test file.

        "ip link set dev r1-eth0 vrf blue up",
        "ip link set dev r1-eth1 vrf blue up",
        "ip link set dev r1-eth2 vrf red up",
        "ip link set dev r1-eth3 vrf red up",

So for interfaces below you would have

interface r1-eth0 vrf blue

That is the correct config that will behave exactly as intended and also reflects the mapping for interfaces to vrfs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also how I'd write it if I had to write it from scratch.
But the vrf ... / exit-vrf are still required to avoid the assertion failure.
BTW, IIRC, I've seen cases where I had the "interface XXX vrf YYY" in the config and then at runtime, "show run" listed the interfaces but stripped the vrf options again. Don't remember the exact circumstances, but it that's still the case, that probably should get fixed as well...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is probably a race condition. The kernel sends us an new interface message say new eth0. If there is no vrf binding in the kernel for that interface, then it will show without a vrf in the configs, or if the message arrived late.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked this again. It even happens when I interactively configure this using vtysh, so it's unlikely to be a race condition (I don't type that fast 😉)

For instance with FRR from git master 361f80a plus this PR, i.e. post-vrf-fix. System VRF config:

504: ovl2336: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:49:4a:dd:27:aa brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 1280 maxmtu 65575 
    vrf table 2336 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
root@debian:~/frr# ip -d l sh dev veth0a
512: veth0a@veth0b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovl2336 state UP mode DEFAULT group default qlen 1000
    link/ether 12:36:87:5a:e2:f2 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    veth 
    vrf_slave table 2336 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 

-> VRF ovl2336 plus interface veth0a in this VRF.

Trying to configure these interactively in FRR with initial config empty:

debian# show run
Building configuration...

Current configuration:
!
frr version 10.4-dev-my-manual-build
frr defaults traditional
hostname debian
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
end
debian# conf
debian(config)# vrf ovl2336 
debian(config-vrf)# exit-vrf 
debian(config)# interface veth0a vrf ovl2336 
debian(config-if)# ip pim
debian(config-if)# exit
debian(config)# router pim vrf ovl2336 
debian(config-pim)# join-prune-interval 5
debian(config-pim)# end
debian# show run
Building configuration...

Current configuration:
!
frr version 10.4-dev-my-manual-build
frr defaults traditional
hostname debian
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
vrf ovl2336
exit-vrf
!
interface veth0a
 ip pim
exit
!
router pim
 join-prune-interval 5
exit
!
end

-> All my config commands are there, but all vrf ovl2336 arguments have been stripped. Am I doing something wrong? Am I expecting too much?

Copy link
Contributor Author

@gromit1811 gromit1811 Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I open yet another issue for that? And/or add a check to this topotest?

@gromit1811 gromit1811 requested a review from Jafaral March 18, 2025 16:03
Copy link
Member

@Jafaral Jafaral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Sorry, for the incremental review, finding time here and there!

Comment on lines +5 to +15
"address":"fe80::XXXX:XXXX:XXXX:XXXX",
"flagMulticast":true,
"flagBroadcast":true,
"lanDelayEnabled":true,
"ff18:100::1":{
"*":{
"source":"*",
"group":"ff18:100::1",
"upTime":"--:--:--",
"expire":"--:--",
"prune":"--:--",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, fields that you don't care about can be omitted completely, and that would simplify the test and make it a tiny bit faster :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without fully understanding what this test is trying to test, I didn't want to deviate too much from the original and just did things the same way as in IPv4.

@Jafaral
Copy link
Member

Jafaral commented Mar 18, 2025

new format suggestion as well.

@Jafaral
Copy link
Member

Jafaral commented Mar 19, 2025

@gromit1811 if you rebase , you no longer need to do the funny vrf business. Fixed on master.

Using "router pim" blocks instead of "ip pim ..." statements. No functional
changes.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Shouldn't be required, but without this, the "router pim vrf ..." command
introduced in the previous commit crashes with an assertion failure:

2025/03/10 17:02:12 PIM: [G822R-SBMNH] config-from-file# router pim vrf blue
2025/03/10 17:02:12 PIM: lib/routing_nb_config.c:51: routing_control_plane_protocols_control_plane_protocol_create(): assertion (vrf) failed

To be analyzed/fixed later. Once fixed, this commit may be reverted.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
…otest

Only documentation/formatting, no functional changes.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
…test

We use VRF interfaces as loopback interfaces, but they're not real loopback
interfaces, so in contrast to lo, pimd will not automatically use passive
mode for them. So explicitly enable passive mode when adding them to PIM.

Doesn't change results of the topotest, but causes less clutter in captured
PCAP files.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Check whether PIM Register messages are generated towards the RP and
answered by Register-Stop from the RP. To force generation of PIM Register,
move the DR role from the RP on R11/12 to R1 (otherwise, with DR and RP
being the same routers, no Register messages would be needed). The RPs need
to generate Register-Stop because there are no other group members besides
the ones on the path whether the traffic is received from on the RPs.

Useful as a regression test for FRRouting#18216
even though the situation there was slightly different: In that case, the RP
was the VRF router and its Register-Stop messages ended up the in the wrong
VRF. In our case, the DRs are the VRF routers and the Register messages
would end up in the wrong VRF. But our check works for both cases: If we
don't receive Register-Stop messages, either the Register or the
Register-Stop messages got lost and will trigger an assertion failure.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
@gromit1811 gromit1811 force-pushed the bugfix/pim_vrf_topotest branch from 1c18bec to e2d76b9 Compare March 21, 2025 13:05
@gromit1811
Copy link
Contributor Author

Rebased to get the vrf fix.

Added yet another test case to check interface/router removal. PIM interface removal seems to work properly, but completely disabling PIM (e.g. causing the pimreg interface to disappear) doesn't. Also, "no router pim vrf XXX" doesn't actually remove the PIM section from config, "show run" afterwards still shows it.

I guess that's not expected behavior, right? I'll open an issue for that unless somebody tells me it's OK like this.

@Jafaral
Copy link
Member

Jafaral commented Mar 26, 2025

Rebased to get the vrf fix.

Added yet another test case to check interface/router removal. PIM interface removal seems to work properly, but completely disabling PIM (e.g. causing the pimreg interface to disappear) doesn't. Also, "no router pim vrf XXX" doesn't actually remove the PIM section from config, "show run" afterwards still shows it.

I guess that's not expected behavior, right? I'll open an issue for that unless somebody tells me it's OK like this.

Yeah, that should probably be fixed.

Configure IPv6 addresses in parallel to IPv4, configure OSPFv3 and PIM IPv6
in addition to OSPFv2 and PIM IPv4 and run all test cases sequentially using
first IPv4 and then IPv6.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Merge test functions for IPv4/IPv6 and both VRFs to reduce insane amount of
duplicated code.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Explicitly specify the VRFs in each interface context. Also make dynamic RP
configuration vtysh command more readable. Requested by @Jafaral after
review.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Use unified frr.conf configs instead of individual per-daemon config files
as suggested in review by @Jafaral

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
…otest

Check whether removing PIM interfaces or the whole PIM router has the
desired effect (removal from pimd, removal from config).

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Also add a little delay between joining and stream start. This way, we can
reliably reproduce FRRouting#18445.

Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
@gromit1811 gromit1811 force-pushed the bugfix/pim_vrf_topotest branch from 22d3ed9 to 1577506 Compare March 28, 2025 14:56
@github-actions github-actions bot added the rebase PR needs rebase label Mar 28, 2025
@gromit1811
Copy link
Contributor Author

Yeah, that should probably be fixed.

Not fixed, but at least reported: #18537

@gromit1811
Copy link
Contributor Author

Pushed an update with some pytest mark fixes and a modified join/MC stream start order: Join first, then start MC stream 1 second later (before: start MC stream and then immediately join). With this, #18445 now also occurs for IPv4 ☹️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master rebase PR needs rebase size/XXL tests Topotests, make check, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants