-
Notifications
You must be signed in to change notification settings - Fork 3
Ingest ereports from SPs #370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit refactors `gateway_sp_comms::SharedSocket` to make the received message handler a generic trait. This way, the `SharedSocket` type and its associated machinery for discovering SPs and forwarding received messages to per-SP handlers can be used for the ereport ingestion socket as well as for `control-plane-agent` messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look particularly closely at the ereport parsing etc details (I'll defer that to you and Cliff, if that's okay). The structural MGS changes look good; just a handful of nits and questions.
recv_handler_task: JoinHandle<()>, | ||
log: Logger, | ||
} | ||
|
||
impl Drop for SharedSocket { | ||
// Hand-rolled `Debug` impl as the message type (`T`) needn't be `Debug` for the | ||
// `SharedSocket` to be debug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine, but I'll plug https://docs.rs/derive-where/latest/derive_where/ since we use it in omicron, if you want to pull it in here too.
@@ -282,15 +284,18 @@ details in `dump.json`." | |||
pub struct SingleSp { | |||
interface: String, | |||
cmds_tx: mpsc::Sender<InnerCommand>, | |||
ereport_req_tx: mpsc::Sender<ereport::WorkerRequest>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not do this as a part of this PR, but I'm curious for your thoughts: there are a bunch of places like this one where we're going from one "thing" (in this case, an mpsc::Sender
) to two "things". The original "one" is always named generically, and the new one is named indicating it's related to ereports. Do you think we should go back and rename the generic things to indicate they're intended for control-plane-agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it might be worthwhile to do that (especially if we ever add a third port to the management network...). You're right that I just didn't really want to touch all the control-plane-agent code in this PR, but I'd be happy to do it in a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No argument from me. Do you think it would be clearer to move the control-plane-agent stuff to its own submodule too? (Not as part of this PR, but maybe alongside the renamings I suggested in another comment?)
i thought i had gotten rid of these...
as per [this comment][1] from @jgallagher. this is similar to the control-plane-agent protocol. wow, it's almost like we're reimplementing TCP (but without flow control because thats hard). [1]: #370 (comment)
this way they get logged as UUIDs, which is nicer, as it matches how the control plane formats them
per today's chat with @cbiffle. this is to prevent a malformed ereport from wrecking the rest of the packet if it contains a break byte (0xff) or similar.
this way, if we encounter an individual ereport that's malformed, we decode the rest of the packet and let Nexus decide what to do.
this will make life easier for upstack software
this moves the message types out of `gateway-messages` (used by the `control-plane-agent` task) to their own crate, so that the snitch task can use them. i've also switched from hubpack serialization to `zerocopy`, as this is what the snitch is using. this should permit sharing the type defs between the SP and MGS. note that i've also updated `zerocopy` from v0.6 to v0.8 here, as (AFAICT) the older version doesn't know how to do fallible zerocopy from-bytes conversions for enums. i'm happy to land the version update separately.
This branch updates our `zerocopy` dependency from v0.6.x to v0.8.x. I initially made the upgrade as I wanted to use some new `zerocopy` features in #370, but have factored it out to land separately. Of course, we'll need to update Hubris and MGS, as well. Note that `zerocopy`'s `read_from_prefix` now returns the rest of the buffer, making some code a little bit simpler where we were previously doing that manually. Other than that, there's not a lot to this change, besides deriving some additional marker traits (`Immutable` and `KnownLayout`).
This pull request implements the MGS side of the SP ereport ingestion
protocol. For more information on the ereport ingestion protocol, refer
to the following RFDs:
In particular, this branch makes the following changes:
gateway-messages
representing the ereport protocol wiremessages exchanged between MGS and the SP; these are defined in
RFD 545.
shared_socket
module ingateway-sp-comms
. Currently, theSharedSocket
code for handlingreceived packets is tightly coupled to the control plane agent message
types. Ereport requests and responses are sent on a separate UDP port.
Therefore, I've hacked up this code a bit to allow
SharedSocket
tobe generic over a
RecvHandler
trait that defines how to handlereceived packets and dispatch them to single-SP handlers. This is
implemented for both the control-plane-agent protocol and, separately,
for the ereport protocol.
code for decoding ereport packets and a per-SP worker task that tracks
the metadata sent by the SP and adds it to each batch of ereports.
A corresponding Omicron branch, oxidecomputer/omicron#7903, depends on
this branch and integrates the ereport code into the MGS app binary and
the SP simulator.