Add field_behavior=REQUIRED when it makes sense #64
Conversation
protos/sigstore_bundle.proto
@@ -55,15 +55,15 @@ message VerificationMaterial {
 }
 // This is the inclusion promise and/or proof, where
 // the timestamp is coming from the transparency log.
-repeated dev.sigstore.rekor.v1.TransparencyLogEntry tlog_entries = 3;
+repeated dev.sigstore.rekor.v1.TransparencyLogEntry tlog_entries = 3 [(google.api.field_behavior) = REQUIRED];
There’s no guarantee an entry is added to a tlog.
Ah, that is interesting. Technically speaking, one can query the log to check the presence, however, it would be nice if the inclusion proof was attached in the first place.
I wonder if it makes sense to add sigstore-specific annotations like 'required for offline bundle'
WDYT?
There’s still the case where no tlog is used if you’re using only a signed time stamp (in cases of private Sigstore instances). I think we would end up needing multiple annotations, for example how to verify with a timestamp, how to verify with a tlog, how to verify with both. That may be better to have in comments or a part of a reference implementation of verification logic.
This is a follow-up to sigstore#41 Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Nice, thanks for adding these!
Can somebody clarify the point of storing and comparing the generated code under source control?
At least for Go, this is the official way to distribute the Go package: there are no built packages in Go, so a dependency points at a source code repository. For the other languages, I would say it's to make sure that the code generates properly, to catch an error where the proto files are erroneous.
Can it be generated into a
Frankly speaking, I am not in a position to review and/or propose changes on my behalf for binary changes like https://github.com/sigstore/protobuf-specs/actions/runs/3981004986/jobs/6831243603#step:5:459
I assume you have run
Frankly speaking, I have not.
I'm in agreement with you that this is really annoying, sorry about that.
TBH this is mostly for Go, because Go is extremely opinionated about the release process. We could push to a release branch, but that's out-of-sync with how everything else gets released. I don't think there's a really good reason for the other languages to do it; if the goal was to make sure the code generates properly, we have workflows to ensure that the generated files remain in sync, so you can safely ignore these.
Provided you have
The same goes for just
My wild guess is that it might be caused by the fact I use Apple's M1. If I change the platform to
@vlsi That is interesting; I'm on macOS/Arm and have never seen any issues with that. I'll reach out to you on sigstore slack.
Weird. I am able to do this on an m1 Mac, running macOS with no issues.
This suggests to me that the
If you run:
It should drop you in a shell. Do you see your
If you'd rather not debug this for now, I'm happy to push the generated code for you, just let me know. We can open an issue to discuss changing this workflow as well.
Signed-off-by: Fredrik Skogman <kommendorkapten@github.com>
protos/sigstore_rekor.proto
 // The inclusion proof can be used for online verification that the
 // entry was appended to the log, and that the log has not been
 // altered.
-InclusionProof inclusion_proof = 6;
+InclusionProof inclusion_proof = 6 [(google.api.field_behavior) = REQUIRED];
Doing my second round, I think we don't need the inclusion proof marked as `REQUIRED`, as it would only be used for online verifications, which are dictated by the policy, not the bundle itself. So I would remove `REQUIRED` here.
cc: @haydentherapper
@kommendorkapten, I'm inclined to think the comment should read "The inclusion proof can be used for offline verification" rather than "online verification". WDYT?
The inclusion proof requires that the client performs a request against the transparency log, so it's an online operation. The inclusion promise (i.e. verifying the SET) is what can be done offline.
What is the purpose of having `InclusionProof inclusion_proof` then? I thought the values in `InclusionProof inclusion_proof` enable offline verification.
The inclusion promise is just that: a promise from the log to include the entry in the log. It's basically a signature over the entry, together with a timestamp from the log. If a user trusts the log, this can be enough to verify the artifact, as the log promises the entry will be put on the log. (A technical detail is that right now the entry is added to the log synchronously, but this may not hold in the future, as there are plans to start updating the tree in batches, or to expose a batch API.)
The inclusion proof is a bit more complicated, as it depends on the state of the Merkle tree at the time the entry was inserted. It can then be used at a later time to verify the consistency of the tree, i.e. verifying that the tree has not been modified since the entry was added (i.e. the entry has not been removed from the tree) by requesting a consistency proof via this endpoint: https://www.sigstore.dev/swagger/#/tlog/getLogProof
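As a concrete illustration of what checking an inclusion proof involves, here is a minimal sketch of the RFC 6962/9162 root recomputation in Python (hashing details per those RFCs; the function names are illustrative and not taken from any sigstore client):

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 domain-separates leaves (0x00) from interior nodes (0x01).
    return hashlib.sha256(b"\x00" + entry).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def root_from_inclusion_proof(index: int, tree_size: int,
                              leaf: bytes, proof: list[bytes]) -> bytes:
    """Recompute the tree root from a leaf and its inclusion proof
    (RFC 6962/RFC 9162 algorithm); the caller compares the result
    against a root hash it trusts."""
    if index >= tree_size:
        raise ValueError("leaf index out of range")
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in proof:
        if sn == 0:
            raise ValueError("proof too long")
        if fn % 2 == 1 or fn == sn:
            r = node_hash(p, r)
            if fn % 2 == 0:
                # We were the rightmost node at this level; skip levels
                # until we have a left sibling (or reach the root path).
                while fn % 2 == 0 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    if sn != 0:
        raise ValueError("proof too short")
    return r
```

Note this only shows the hash arithmetic: a client would still compare the result against a trusted root, and log consistency over time is a separate check, as described above.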
I would make both optional. At least one should be required for this message, but there's no way to easily represent this.
Currently Rekor returns both an inclusion proof and promise always, but I think this behavior shouldn't be reflected in the specs. Future iterations of the log may choose to return either a promise or an inclusion proof or both, but I don't think we should guarantee that a promise is always set.
Something else to consider is that the log index may not be set for an inclusion promise, because the entry has yet to be admitted into the log. Once again, the current behavior of Rekor is to always put an entry into the log immediately, but this may not always be the case. I'd rather have clients handle it as optional for now; it'll be less of a burden in the future if we do make it optional. Thoughts?
Does that mean that if Rekor omits the proof, then offline verification would not be possible?
By the way, is there something like 'caching Rekor proxy' like there are proxy servers for artifact repositories?
So what's tricky here is that there's what Rekor does currently, and what I want Rekor to do in the future. So my first question is: should the protos reflect the current state of the service, or the ideal state?
Slight aside: right now a promise is simply a signed statement from Rekor. Even with a set of hashes to calculate an inclusion proof, there is no guarantee that an entry actually exists in Rekor without doing an online lookup. This is because you need to verify consistency (that the log is still append-only), which would require an online lookup. I've been thinking about how to improve this, but we're a while away from being able to have stronger offline verification.
With what we've got right now, a client could "verify" offline using either the hashes from the proof or the promise; they're pretty much equivalent. Currently Rekor will return both because of its design, but technically it could omit one or the other: a promise would be omitted when a proof is requested (because if a proof can be generated, why need a promise?), and a proof would be omitted when only a promise can be generated, i.e. while the entry is actively being uploaded. (Rekor operates in a batched mode, but the batch size is currently 1, so entries are uploaded immediately.)
So I don't really know the best way to handle this for whether it should be required or optional. I'd lean towards being permissive and allow either the proof or promise or both, but not require that a promise is always present.
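The permissive stance described above (accept a proof, a promise, or both, but not neither) is easy to state in code. A sketch in Python over the JSON form of a tlog entry, with field names following the proto JSON mapping and the helper name made up:

```python
def offline_material_present(tlog_entry: dict) -> bool:
    """Permissive client check: accept an entry that carries an inclusion
    promise (SET), an inclusion proof, or both; reject only if it
    carries neither."""
    has_promise = bool(tlog_entry.get("inclusionPromise"))
    has_proof = bool(tlog_entry.get("inclusionProof"))
    return has_promise or has_proof
```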
For proxies, we've chatted a little bit about Rekor read-only replicas. I see this as a requirement for being able to scale out Rekor. This is definitely something I want to do, we just need to scope it out and figure out technical blockers.
@haydentherapper I think this is good as-is now, as it reflects the current way Rekor works?
As there are more changes expected in Rekor (SET v2, omitting the log index), I think we can wait with those until Rekor is updated?
Nice!
I'm afraid one more re-generation is needed.
Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Signed-off-by: Fredrik Skogman <kommendorkapten@github.com>
Nice
Co-authored-by: Hayden B <hblauzvern@google.com> Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
@@ -63,7 +63,7 @@ message VerificationMaterial {
 message Bundle {
 // MUST be application/vnd.dev.sigstore.bundle+json;version=0.1
 // when encoded as JSON.
-string media_type = 1;
+string media_type = 1 [(google.api.field_behavior) = REQUIRED];
Possibly irrelevant, but is anybody familiar with what this annotation implies about non-JSON-serialized protobufs? My understanding is that `media_type` is only sensible when encoded as JSON, so it may not make sense to set it as `REQUIRED` here.
Let me try it out!
So I was digging around a bit, and was not able to get a program to fail marshalling to either binary or JSON when the preconditions were not met (i.e. some `REQUIRED` fields were not present). This led me to the documentation, where I found this:
"Note: The vocabulary given in this document is for descriptive purposes only, and does not itself add any validation. The purpose is to consistently document this behavior for users."
So any developer would still need to perform validation manually, it seems. The generated (Go) code does not seem to include any annotations for JSON marshal/unmarshal that would imply whether a given field is `REQUIRED` or not.
Maybe this is old news to everyone, but it was not for me (my previous experience is to rely on `optional` keywords and treat anything else as `REQUIRED`).
But to go back to @woodruffw's question: I think it still makes sense to include the `media_type` even when binary marshalling is being conducted, as we may not know what a downstream consumer intends to do with the message. Maybe there is a service pulling the data from a central registry and serializing it to disk (via JSON) for further consumption by a different service?
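Since `REQUIRED` adds no runtime enforcement, any such check has to be written by hand. A minimal sketch in Python over a JSON-decoded bundle (field names follow the proto JSON mapping; the subset of required fields checked here is illustrative only):

```python
def missing_required_fields(bundle: dict) -> list[str]:
    """Return the paths of absent fields from an illustrative subset of
    those marked REQUIRED; an empty list means the check passed."""
    missing = []
    if not bundle.get("mediaType"):
        missing.append("mediaType")
    material = bundle.get("verificationMaterial") or {}
    if not material.get("tlogEntries"):
        missing.append("verificationMaterial.tlogEntries")
    return missing
```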
@haydentherapper, have you got any different experiences with `REQUIRED` fields?
This is correct, we need to find a library for enforcing this (I know of the internal google one, not sure what’s available for OSS)
TBH I think it's weird that we have `media_type` in the proto at all. It's a property of the serialization, not the underlying data structure. I think it would make more sense if we could hook into the JSON marshal step and add it there. Or ask clients to add it after the JSON marshalling (though it'd be very easy to forget).
That said, I know we can't, so we're kinda stuck. I still think it's wrong to set it unless you set it right before JSON serialization, especially if you're not intending on doing that directly. In practice, it'll probably just always be set to the JSON value so I won't fight super hard here, but I don't think it should be required.
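The "set it right before JSON serialization" idea could look like the following sketch in Python (illustrative only; a real client would go through protobuf's JSON marshalling rather than a plain dict, and the function name is made up):

```python
import json

# Value taken from the comment in the proto definition.
BUNDLE_MEDIA_TYPE = "application/vnd.dev.sigstore.bundle+json;version=0.1"

def marshal_bundle(bundle: dict) -> str:
    """Stamp the media type at the last moment, right before JSON
    serialization, instead of carrying it in the in-memory structure."""
    stamped = dict(bundle)
    stamped["mediaType"] = BUNDLE_MEDIA_TYPE
    return json.dumps(stamped)
```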
What if it were a fixed string? Or at least an enum? That might constrain clients, or at least clarify in code that it must be one of certain values.
At the same time, I do want it as a type hint. So yeah, maybe not required, but otherwise I don't know how to package this unless it's inside another wrapper proto with (payload=bundle, media_type).
Agreed -- IMO it's not ideal, but the alternatives (hooking into the marshalling, or doing it after marshalling) are also probably more error prone/more likely to be forgotten due to a lack of controls.
I'm loathe to add yet another layer of schematization here, but maybe what we really need is a JSON Schema definition that gets generated from these protobuf definitions. That schema could then include (and enforce) the requirement hints.
Just from a brief look, it seems like https://github.com/chrusty/protoc-gen-jsonschema might do the trick, and it has support for its own `required` annotation to mark fields as non-optional.
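To make that concrete, a hand-written sketch of roughly what a generated schema for the `Bundle` message could look like (this is not actual protoc-gen-jsonschema output; the property subset is illustrative, with names per the proto JSON mapping):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Bundle",
  "type": "object",
  "required": ["mediaType", "verificationMaterial"],
  "properties": {
    "mediaType": {
      "type": "string",
      "const": "application/vnd.dev.sigstore.bundle+json;version=0.1"
    },
    "verificationMaterial": { "type": "object" }
  }
}
```

A standard JSON Schema validator would then reject a bundle missing `mediaType`, which the protobuf tooling alone does not.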
All good ideas...but I think this is a case where "done" is more important than "right" so we should just pick something and do it.
Maybe for now let's leave it required?
Works for me. This is definitely not something that should block any other work here, and I can pursue it as an independent follow-up.
LGTM mod one comment
My vote is to err on the side of not marking things required for now.
But I suppose an error either way doesn't really hurt here, since it's not really enforced in proto and these things being missing should cause other verification errors anyway.
@@ -69,7 +70,7 @@ message CertificateAuthority {
 // that are allowed.
 message TrustedRoot {
 // MUST be application/vnd.dev.sigstore.trustedroot+json;version=0.1
-string media_type = 1;
+string media_type = 1 [(google.api.field_behavior) = REQUIRED];
Same media type question here.
This is good, I think. The current state, where some fields that are required are marked as such and some are not, is a bit confusing. So either we make a "best effort" like this PR, or remove them altogether until we can offer a good way to programmatically verify a message.
Where did we leave off with this PR? Are we going to proceed with this and cut a new version (as it's a breaking change)?
I'm +1 for getting this merged -- I agree that these annotations have limited value on the programmatic side, but they do have some value on the design/engineering side (in terms of visually confirming my intuition that various fields actually are required). In particular, it's nice to have a confirmation that the SET and log entry body are required, while the inclusion proof (intuitively) isn't! CC @tnytown
I'm ok too with merging this.
We don't need to cut a breaking change, we can do a patch update as this is only clarifications and does not change the original intent.
edit: sentence missed a crucial word, don't, thanks @feelepxyz for pointing that out 🚀
@kommendorkapten to clarify, you mean we don't need to do a major version/breaking change release?
#79 partially overlapped with this, so we'll need to rebase/deconflict here 🙂
Closing since we've made a lot of fields required already.
Summary
This is a follow-up to #41
Release Note
More protobuf fields are marked with field_behavior=REQUIRED
Documentation
NONE