Mctp bridge support #71


Open
wants to merge 3 commits into main

Conversation

faizana-nvidia

This PR introduces ALLOCATE_ENDPOINT_ID message support and adds MCTP bridge endpoints to the existing peer structure.

Signed-off-by: Faizan Ali <faizana@nvidia.com>
* Update the peer structure to accommodate bridge-type endpoints.
* New dbus argument in the AssignEndpointStatic dbus method to take the
  bridge/pool start EID.
* Update the existing pytest methods to account for the new signature of
  the AssignEndpointStatic dbus method (pool start EID).

Signed-off-by: Faizan Ali <faizana@nvidia.com>
* New mctpd method to send the Allocate Endpoint ID MCTP control message.
* Timer for the MCTP bridge to settle (observed that it takes some time for
  bridges such as an FPGA to allocate downstream EIDs to endpoints, during
  which any request to a downstream endpoint fails).
* Update the AssignEndpoint and AssignEndpointStatic methods to consider the
  endpoint's nature (MCTP bridge, based on the pool size received).
* Check for the presence of contiguous available EIDs of the pool size.
* Pytest framework changes:

> Introduced a new endpoint of MCTP bridge type (pool size = 2)
> New test method to validate endpoint_allocate_eid()
> Updated the existing pytest method to accommodate the new endpoint
  in the present network
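The contiguous-EID availability check mentioned above could be sketched roughly like this (illustrative Python only; the function name, range bounds, and data structures are assumptions, not mctpd code):

```python
def find_contiguous_eids(used_eids, pool_size, range_start=9, range_end=0xfe):
    """Scan an EID range for pool_size contiguous unused EIDs.

    Returns the start EID of the first fitting run, or None if no run of
    that length exists. Range bounds are placeholders, not mctpd config.
    """
    run_start = None
    run_len = 0
    for eid in range(range_start, range_end + 1):
        if eid in used_eids:
            # Run broken by an allocated EID; start over.
            run_start, run_len = None, 0
            continue
        if run_start is None:
            run_start = eid
        run_len += 1
        if run_len == pool_size:
            return run_start
    return None
```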

'''
The system contains an MCTP bridge (FPGA) over USB, which has
downstream endpoints such as EROTs and IROTs.

The MCTP bridge is on a single bus (usb1-1), populated as the mctpusb0 interface.

USBMON :
==================

cat /sys/kernel/debug/usb/usbmon/1u
866e6900 886711482 S Bo:1:002:1 -115 13 = 1ab4000d 010008c8 00800100 0c  -> SET EID 0xc
866e6900 886711719 C Bo:1:002:1 0 13 >
82a05f00 886711737 C Bi:1:002:2 0 15 = 1ab4000f 01080cc0 00000100 120c0f  -> Bridge size 0xf
82a05f00 886711753 S Bi:1:002:2 -115 512 <
866e6200 886712339 S Bo:1:002:1 -115 11 = 1ab4000b 010c08c8 008105  -> GET MSSG TYPE (eid : 0xc)
866e6200 886712449 C Bo:1:002:1 0 11 >
82a05f00 886712461 C Bi:1:002:2 0 14 = 1ab4000e 01080cc0 00010500 017e
82a05f00 886712474 S Bi:1:002:2 -115 512 <
866e6900 886712862 S Bo:1:002:1 -115 11 = 1ab4000b 010c08c8 008203  -> GET UUID (eid : 0xc)
866e6900 886712961 C Bo:1:002:1 0 11 >
82a05f00 886712975 C Bi:1:002:2 0 28 = 1ab4001c 01080cc0 00020300 xxxx xxxx xxxx xxxx
82a05f00 886712993 S Bi:1:002:2 -115 512 <
866e6200 886714375 S Bo:1:002:1 -115 14 = 1ab4000e 010c08c8 00830800 0f0d  -> ALLOCATE EID 0xc, bridge size 0xf, start eid 0xd
866e6200 886714455 C Bo:1:002:1 0 14 >
82a05f00 886714475 C Bi:1:002:2 0 15 = 1ab4000f 01080cc0 00030800 010f0d
82a05f00 886714490 S Bi:1:002:2 -115 512 <
85e6ab80 890712851 S Bo:1:002:1 -115 11 = 1ab4000b 010d08c8 008405 -> GET MSSG TYPE (eid : 0xd)
85e6ab80 890713119 C Bo:1:002:1 0 11 >
82a05f00 890713452 C Bi:1:002:2 0 17 = 1ab40011 01080dc0 00040500 0401057e 7f
82a05f00 890713487 S Bi:1:002:2 -115 512 <
85e6a780 890713984 S Bo:1:002:1 -115 11 = 1ab4000b 010d08c8 008503 -> GET UUID (eid : 0xd)
85e6a780 890714073 C Bo:1:002:1 0 11 >
82a05f00 890714451 C Bi:1:002:2 0 28 = 1ab4001c 01080dc0 00050300 xxxx xxxx xxxx xxxx
82a05f00 890714496 S Bi:1:002:2 -115 512 <
85e6a180 890718152 S Bo:1:002:1 -115 11 = 1ab4000b 010e08c8 008605 -> GET MSSG TYPE (eid : 0xe) (Timedout)
85e6a180 890718331 C Bo:1:002:1 0 11 >
86654980 891215635 S Bo:1:002:1 -115 11 = 1ab4000b 010e08c8 008703 -> GET UUID (eid : 0xe)  (Timedout)
86654980 891215839 C Bo:1:002:1 0 11 >
86654e00 891712957 S Bo:1:002:1 -115 11 = 1ab4000b 010f08c8 008805 -> GET MSSG TYPE (eid : 0xf)
86654e00 891713082 C Bo:1:002:1 0 11 >
82a05f00 891713576 C Bi:1:002:2 0 17 = 1ab40011 01080fc0 00080500 0401057e 7f
82a05f00 891713615 S Bi:1:002:2 -115 512 <
85e6a400 891715076 S Bo:1:002:1 -115 11 = 1ab4000b 010f08c8 008903
85e6a400 891715199 C Bo:1:002:1 0 11 >

MCTPD Traces
=======================
busctl call au.com.codeconstruct.MCTP1 /au/com/codeconstruct/mctp1/interfaces/mctpusb0 au.com.codeconstruct.MCTP.BusOwner1 AssignEndpoint ay 0
yisb 12 1 "/au/com/codeconstruct/mctp1/networks/1/endpoints/12" true

mctpd: read_message got from sockaddr_mctp_ext eid 12 net 1 type 0x00 if 3 hw len 0 0x len 6
mctpd: physaddr if 3 hw len 0 0x requested allocation of pool size = 15
mctpd: emit_endpoint_added: /au/com/codeconstruct/mctp1/networks/1/endpoints/12
endpoint_allocate_eid Asking for contiguous EIDs for pool with start eid : 13
mctpd: endpoint_send_allocate_endpoint_id Allocation was rejected as it was Allocated by other bus
endpoint_send_allocate_endpoint_id Allocated size of 15, starting from EID 13

Bridge Time expired, set pool eids
mctpd: emit_endpoint_added: /au/com/codeconstruct/mctp1/networks/1/endpoints/13
mctpd: Error getting endpoint types for peer eid 14 net 1 phys physaddr if 3 hw len 0 0x state 1. Ignoring error -110 Connection timed out
mctpd: Error getting UUID for peer eid 14 net 1 phys physaddr if 3 hw len 0 0x state 1. Ignoring error -110 Connection timed out

mctpd: emit_endpoint_added: /au/com/codeconstruct/mctp1/networks/1/endpoints/14
mctpd: emit_endpoint_added: /au/com/codeconstruct/mctp1/networks/1/endpoints/15
===============================================================================

mctp addr show
eid 8 net 1 dev mctpusb0

busctl tree au.com.codeconstruct.MCTP1
`- /au
  `- /au/com
    `- /au/com/codeconstruct
      `- /au/com/codeconstruct/mctp1
        |- /au/com/codeconstruct/mctp1/interfaces
        | |- /au/com/codeconstruct/mctp1/interfaces/lo
        | `- /au/com/codeconstruct/mctp1/interfaces/mctpusb0
        `- /au/com/codeconstruct/mctp1/networks
          `- /au/com/codeconstruct/mctp1/networks/1
            `- /au/com/codeconstruct/mctp1/networks/1/endpoints
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/12
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/13
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/14
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/15
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/16
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/17
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/18
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/19
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/20
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/21
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/22
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/23
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/24
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/25
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/26
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/27
              |- /au/com/codeconstruct/mctp1/networks/1/endpoints/8

==========================================================================
PYTEST

ninja -C obj test
ninja: Entering directory `obj'
[0/1] Running all tests.
1/1 test-mctpd        OK             21.17s   20 subtests passed

Ok:                 1
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0

'''

Signed-off-by: Faizan Ali <faizana@nvidia.com>
@jk-ozlabs
Member

Thanks for the contribution! I'll get to a proper review shortly.

I have some pending changes that rework a lot of the peer, link and network allocation mechanisms. That shouldn't affect your code too much, but I'll request a rebase once that is merged.

@jk-ozlabs jk-ozlabs self-assigned this Apr 25, 2025
@faizana-nvidia
Author

> Thanks for the contribution! I'll get to a proper review shortly.
>
> I have some pending changes that rework a lot of the peer, link and network allocation mechanisms. That shouldn't affect your code too much, but I'll request a rebase once that is merged.

Sure no problem

Member

@jk-ozlabs jk-ozlabs left a comment

So the main design point here is how we're handling the pool allocations. It looks like your particular use-case is around static allocations, which I'll focus on here.

As I mentioned in the dbus changes, we cannot add arguments without further version-compatibility changes. After a bit of chatting with the team, I think a better approach would be to add a new dbus call to explicitly allocate a bridge and a predefined pool (which would include the pool size). Perhaps something like:

AllocateBridgeStatic(addr: ay, pool_start: y, pool_size: y)
  • where the Set Endpoint ID response must match the expected pool size.

(we would also want purely-dynamic pools to be allocated from SetupEndpoint and friends, but that would result in a dynamic pool allocation. This dynamic pool would be defined either by a toml config option, or via a new TMBO dbus interface. However, we can handle those later, I think)

Would that work?

Comment on lines +187 to +188
mctp_ctrl_cmd_alloc_eid_op alloc_eid_op : 2;
uint8_t : 6;
Member

Bitfield members have an implementation-defined layout, so we cannot use them for on-the-wire formats.

I'd suggest using specifically-sized fields (ie. uint8_t here), and handling the mask/shift operations on parse.

(same for `mctp_ctrl_resp_alloc_eid` below too)
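A mask/shift parse could look like the following (Python for brevity; the C version would apply the same masks on the raw byte). Per DSP0236, the Allocate Endpoint IDs operation sits in the low two bits of its byte, with the remaining bits reserved; the field placement here is an illustration of the technique, not mctpd code:

```python
# Extract/compose the Allocate Endpoint IDs operation byte without
# relying on C bitfield layout. Operation is bits [1:0]; bits [7:2]
# are reserved.
ALLOC_EID_OP_MASK = 0x03

def parse_alloc_eid_op(op_byte):
    """Return (operation, reserved_bits) via mask/shift."""
    return op_byte & ALLOC_EID_OP_MASK, op_byte >> 2

def build_alloc_eid_op(operation):
    """Compose the byte, writing reserved bits as zero."""
    return operation & ALLOC_EID_OP_MASK
```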

Author

Thank you for your comments; your ask makes sense to me. I had tried to follow a similar approach to mctp_ctrl_cmd_set_eid, where a bitfield was used. Nonetheless, I will change this and rely on bitmask/shift operations instead.

@@ -123,16 +123,18 @@ Similar to SetupEndpoint, but will always assign an EID rather than querying for
existing ones. Will return `new = false` when an endpoint is already known to
`mctpd`.

#### `.AssignEndpointStatic`: `ayy` → `yisb`
#### `.AssignEndpointStatic`: `ayyy` → `yisb`
Member

This is a breaking change in the dbus API, which we cannot do here without proper versioning updates.

However, I'm not sure that having the caller be responsible for the pool sizes makes the most sense. I had planned to handle this as part of the mctpd configuration, allowing pools to be defined there. I'm happy to hear arguments for this approach though - what drove this design element?

Author

I feel like there's a misunderstanding: the new argument was to take the pool start EID, while the pool size itself would have been filled in based on whatever the bridge reports as its pool size.

About this breaking the existing dbus API, I totally agree with you; based on your suggestion, I agree that having a new dbus method such as AllocateBridgeStatic would be a more convenient and stable way of doing this.

But I do have a few questions on the suggested signature:
AllocateBridgeStatic(addr: ay, pool_start: y, pool_size: y)

Q1) Do we really need pool_size as an argument here? (I get your next point about checking this against the Set Endpoint ID response, but I don't see why it's necessary.)

> where the Set Endpoint ID response must match the expected pool size.
Q2) If we go with a separate dbus method for allocating EIDs, we would need the bridge's own EID as an argument as well:
AllocateBridgeStatic(addr: ay, bridge_eid: y, pool_start: y)

But this would again lead us to split the method (static and dynamic) to cover dynamic EID allocation as well.
I presume this is what you meant by:

> (we would also want purely-dynamic pools to be allocated from SetupEndpoint and friends, but that would result in a dynamic pool allocation. This dynamic pool would be defined either by a toml config option, or via a new TMBO dbus interface. However, we can handle those later, I think)

i.e. for dynamic pool allocation, we would prefer the SetupEndpoint and AssignEndpoint methods? Please correct me if I misunderstood you.

Member

> Q2) If we go with a separate dbus method for allocating EIDs, we would need the bridge's own EID as an argument as well.

Yes, good point. I had intended to include this, but missed it in the prototype.

> Q1) Do we really need pool_size as an argument here?

Yes, I think so.

With the static strategy that you're proposing here, the general interaction model will be that some external process (say, OpenBMC's Entity Manager) is taking responsibility for controlling the EID space. With the current approach (of just specifying the start), there is split responsibility: the bridge itself now controls part of the allocation scheme. To provide a consistent approach to the allocations, we would need control of the pool size so that we're not at risk of creating an overlapping pool with some other (future) bridge pool allocation.

We probably have a choice about what to do in case those pool requests do not match (ie, the call argument vs. the Set Endpoint ID response) - either refusing the pool allocation entirely, or truncating the pool to the smaller of those values. I think the latter would be reasonable.
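The truncate-to-smaller option amounts to a min() over the two sizes (trivial sketch; the function name is hypothetical, not mctpd code):

```python
def effective_pool_size(requested, bridge_reported):
    """Pool size to use when the caller's argument and the size the
    bridge reports in its Set Endpoint ID response disagree: the
    smaller of the two wins, so the pool is truncated, not refused."""
    return min(requested, bridge_reported)
```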

> But this would again lead us to split the method (Static and Dynamic) to cover dynamic EID allocation as well.

We don't really need to split it - the existing calls (SetupEndpoint, AssignEndpoint and possibly AssignEndpointStatic) would just gain support for the pool allocation as well - depending on what the device requests from the Set Endpoint ID response. In those cases, if a pool is requested, it would be completely dynamic, and we would serve that from a per-whole-network EID pool (configurable, via the toml file).

How does that sound?

Author

Yeah, sounds feasible and reasonable to me (I should have thought about the case where the ask is not for the full bridge capacity but a smaller chunk of it, giving better control to upper applications like dbus sensors etc.). So we'll poll from the start EID through the requested pool size to check for the presence of devices on the network. But if we go with dynamic EID allocation, since the pool size will rely on whatever the bridge has as capacity, we'll utilize all the EIDs from the pool.

Whatever pool size we ask for: either it's more than the bridge's capability, in which case the bridge assigns only up to its maximum capability, truncating the value, with us notified via the Allocate Endpoint ID response; or, if the ask is less than the maximum capability, the bridge will honor it, since we'll update the request with the required size.

Comment on lines +295 to +298
if (self.pool_size):
    data = bytes(hdr + [0x00, 0x01, self.eid, self.pool_size])
else:
    data = bytes(hdr + [0x00, 0x00, self.eid, 0x00])
Member

minor nit: just make that second byte conditional, not the whole data list
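i.e. the hunk above could keep a single data expression and compute only the flag byte conditionally. A standalone sketch of that shape (the function wrapper and names are mine, not the test framework's):

```python
def build_set_eid_resp(hdr, eid, pool_size):
    # Only the pool flag byte differs between the two cases; the final
    # byte can reuse pool_size directly (0 when there is no pool).
    pool_flag = 0x01 if pool_size else 0x00
    return bytes(hdr + [0x00, pool_flag, eid, pool_size or 0x00])
```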

Author

Acknowledged; will update it.

@@ -2173,6 +2184,11 @@ static int method_assign_endpoint_static(sd_bus_message *call, void *data,
}
dfree(peer_path);

/* MCTP Bridge Support */
if(peer->pool_size > 0) {
Member

minor: formatting here should be:

    if (peer->pool_size > 0) {

(with the space after the if)

Author

My bad. While we're on this: what clang-format style are we following here? Any steps on how I could run it locally would come in handy.

Member

We don't have a full .clang-format definition, but we're roughly doing kernel format here. If it helps, I can put an actual config together, but then we'd need a lot of churn in applying it.

So that's probably better done just before a release, to avoid too many merge conflicts.

Or, if you're OK with a bit of churn, I can do that beforehand?

Author

I'm sold on the before-release option :)

peer->pool_size = allocated_pool_size;
peer->pool_start = allocated_pool_start;

//delay for settling MCTP Bridge after allocation of downstream eid
Member

I would prefer that we issue the Allocate Endpoint IDs immediately. If the bridge needs some time to settle, we can implement that with a retry.

For your device, what do you see when you attempt to do the Allocate immediately? Does the command fail, or something else?

Author

Let me highlight the issue better: it's not that Allocate Endpoint ID is getting delayed (we send it as part of the sequence right after the bridge EID is assigned). Rather, after Allocate Endpoint ID, in my setup, I see no response from the downstream EIDs: basically timeouts for the GET_MESSG_TYPE and GET_UUID MCTP commands on the downstream EIDs.

On the topic of a retry mechanism: in my setup, based on observational data, I find that it takes around 4 seconds for all EIDs to reach a proper state after receiving ALLOCATE_EID, but this also depends on the bus topology and how many downstream endpoints there are.

That being said, this could easily become a common issue for bridges, owing to network congestion, resource congestion, etc.
A retry mechanism would then be needed as a general part of the design for messages like GET_MESSG_TYPE and GET_UUID, which is currently missing from the implementation.

I would like to explore any other suggestions beyond a retry mechanism, if possible :)

Member

> Let me highlight the issue better, it's not that Allocate Endpoint ID is getting delayed (we are sending it as part of sequence right after bridge eid is assigned), but post allocate endpoint id, referring to my setup, I see no response from the downstream EIDs, basically timeout for GET_MESSG_TYPE and GET_UUID mctp commands on downstream eids.

ah, gotcha!

Yeah, this isn't a trivial problem, and I've been thinking of potential approaches for a while.

My current leaning is that once an endpoint pool is assigned, mctpd would perform some sort of polling to determine which of the pool's EIDs are active (ie, have been assigned to a device, and that device is responding).

We could potentially do this via a Get Routing Table Entries command to the bridge, but that's a bit of a mis-use of the command, and isn't necessarily returning the data we want (the bridge may have routing information to a device, but the device may not be present).

So, I think the most reliable approach is that the pool's EIDs are periodically polled using a Get Endpoint ID command, which is the spec-recommended method of detecting endpoint presence, and is what we use for the existing recovery process. We may be able to share some code with the recovery infrastructure for this too.

We would schedule those polls when the pool is allocated, and then repeat until we detect a device - at which point we would perform full enumeration. If no device is ever present, mctpd would end up polling forever, but I don't think there's an alternative there.
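That poll-until-present loop might be shaped roughly as follows (illustrative Python; `query_get_endpoint_id`, the interval, and the bounded poll count are assumptions for the sketch, not mctpd behaviour, which would poll indefinitely as described above):

```python
import time

POLL_INTERVAL = 1.0  # seconds between Get Endpoint ID probes (assumed)

def poll_pool_eids(pool_start, pool_size, query_get_endpoint_id,
                   max_polls=None, sleep=time.sleep):
    """Probe each EID in the pool with Get Endpoint ID until it answers.

    Returns the set of EIDs that responded; EIDs that stay quiet are
    retried each round (bounded here by max_polls so the sketch
    terminates; the real daemon would keep polling)."""
    pending = set(range(pool_start, pool_start + pool_size))
    active = set()
    polls = 0
    while pending and (max_polls is None or polls < max_polls):
        for eid in sorted(pending):
            if query_get_endpoint_id(eid):
                active.add(eid)
        pending -= active
        polls += 1
        if pending:
            sleep(POLL_INTERVAL)
    return active
```

Once an EID responds, full enumeration (message types, UUID) would be triggered for it, matching the flow described in the comment above.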


How about doing a GetRoutingTableEntries? The bridge should return ERROR_NOT_READY if it is still assigning EIDs downstream from the previous AllocateEndpointID. Once we get a successful response, we can start querying the downstream EIDs for UUIDs and message types.

Member

> the bridge should return ERROR_NOT_READY if it is still assigning EIDs downstream from the previous AllocateEndpointID

that's not necessarily what the bridge will do, though: there's no concept of the routing state being "complete" at any point (eg, for a downstream port that may have an endpoint hotplugged at some point in the future, or perhaps never!)

For that reason, I don't think we'll be able to avoid polling.

Author

One downside I feel (if we don't go via the GetRoutingTable implementation; I have a changelist ready, but never mind) is that we'll end up consuming EIDs from the pool (especially for dynamic EID allocation based on the bridge's pool size), which is a bit concerning if the network becomes complex.

Taking the routing-table approach makes us definitive about which EIDs need to be consumed from the pool (minimizing utilization, based on the devices available downstream).

Contains one interface (lladdr 0x10, local EID 8), and one endpoint (lladdr
0x1d), that reports support for MCTP control and PLDM.
Contains two interfaces (lladdr 0x10, local EID 8) and (lladdr 0x11, local EID 10), with one endpoint (lladdr
0x1d) and an MCTP bridge (lladdr 0x1e, pool size 2), that reports support for MCTP control and PLDM.
Member

Please don't alter the default sysnet unless it's needed by a significant number of tests (which, in this case, it is not). Just set up the test fixture's defaults as needed for your new test.

Author

I see; so it's better to update the current interface (lladdr 0x10, local EID 8) with a pool size, and add pool-size-many EIDs to the network to simulate a bridge, rather than creating a new one?

Member

Yep! Just update the interface/endpoints/etc in the individual test case.

@jk-ozlabs
Member

In general, can you add a bit more of an explanation / rationale as part of your commit messages, instead of just log output? There is some good guidance for commit messages up in the "Patch formatting and changelogs" section of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst

@jk-ozlabs
Member

We'll also need to consider the routing setup for bridged endpoints. Ideally we would:

  1. create a route for the bridge itself, plus a neighbour entry with the appropriate physical address data
  2. create a range route for the allocated endpoint pool, using the bridge as a gateway for that range (ie, no neighbour entry)

the issue is that there is no kernel support for (2) at present: we need some kernel changes to implement gateway routes. It is possible to create "somewhat-fake" routes for those endpoints, using a neighbour table entry for each (bridged) peer that uses the bridge phys address, but that's a bit suboptimal. I'd prefer not to encode that hack into mctpd if possible.

I do have a todo for the kernel changes necessary for that, sounds like I should get onto it!
