
Add field_behavior=REQUIRED when it makes sense #64

Closed
wants to merge 6 commits into from

Conversation


@vlsi vlsi commented Jan 22, 2023

Summary

This is a follow-up to #41

Release Note

More protobuf fields are marked with field_behavior=REQUIRED

Documentation

NONE

@@ -55,15 +55,15 @@ message VerificationMaterial {
}
// This is the inclusion promise and/or proof, where
// the timestamp is coming from the transparency log.
repeated dev.sigstore.rekor.v1.TransparencyLogEntry tlog_entries = 3;
repeated dev.sigstore.rekor.v1.TransparencyLogEntry tlog_entries = 3 [(google.api.field_behavior) = REQUIRED];
Collaborator

There’s no guarantee an entry is added to a tlog.

Author

Ah, that is interesting. Technically speaking, one can query the log to check for the entry's presence; however, it would be nice if the inclusion proof were attached in the first place.

I wonder if it makes sense to add sigstore-specific annotations like 'required for offline bundle'

WDYT?

Collaborator

@haydentherapper haydentherapper Jan 22, 2023

There’s still the case where no tlog is used if you’re using only a signed timestamp (in the case of private Sigstore instances). I think we would end up needing multiple annotations, for example: how to verify with a timestamp, how to verify with a tlog, and how to verify with both. That may be better to have in comments or as part of a reference implementation of the verification logic.

This is a follow-up to sigstore#41

Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
@kommendorkapten
Member

Nice, thanks for adding these!

@vlsi
Author

vlsi commented Jan 23, 2023

Can somebody clarify what is the point of storing and comparing the generated code under source control?

@kommendorkapten
Member

Can somebody clarify what is the point of storing and comparing the generated code under source control?

At least for Go, this is the official way to distribute the Go package: since there are no built packages in Go, the dependency is against a source code repository.

For the other languages, it's to make sure that the code generates properly, I would say, and to catch errors where the proto files are erroneous.

@vlsi
Author

vlsi commented Jan 23, 2023

At least for Go, this is the official way to distribute the Go package, as there are no built packages in Go, the dependency is against a source code repository.

Can it be generated into a release or whatever branch?

Frankly speaking, I am not in a position to review, or to propose on my own behalf, binary changes like https://github.com/sigstore/protobuf-specs/actions/runs/3981004986/jobs/6831243603#step:5:459

@kommendorkapten
Member

I assume you have run make locally prior to pushing?

@vlsi
Author

vlsi commented Jan 23, 2023

Frankly speaking, I have not.

@vlsi vlsi mentioned this pull request Jan 23, 2023
@znewman01
Contributor

Can somebody clarify what is the point of storing and comparing the generated code under source control?

I'm in agreement with you that this is really annoying, sorry about that.

At least for Go, this is the official way to distribute the Go package, as there are no built packages in Go, the dependency is against a source code repository.

For the other languages, it's to make sure that the code generates properly I would say, to catch an error where the proto files are erroneous.

TBH this is mostly for Go, because Go is extremely opinionated about the release process. We could push to a release branch but that's out-of-sync with how everything else gets released.

I don't think there's a really good reason for the other languages to do it; if the goal was to make sure protoc is successful, we can just do that in a workflow.

Frankly speaking, I am not in a position to review and/or propose changes on my behalf for binary changes like https://github.com/sigstore/protobuf-specs/actions/runs/3981004986/jobs/6831243603#step:5:459

We have workflows to ensure that they remain in sync, so you can safely ignore these.

I assume you have run make locally prior to pushing?
Frankly speaking, I have not.

Provided you have docker installed, all you should need to do for an update is to run make.

@vlsi
Author

vlsi commented Jan 23, 2023

% make all
Building development docker image
docker build -t protobuf-specs-build .
[+] Building 0.1s (9/9) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                      0.0s
 => => transferring dockerfile: 37B                                                                                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                         0.0s
 => => transferring context: 2B                                                                                                                                                                                                           0.0s
 => [internal] load metadata for docker.io/namely/protoc-all@sha256:07f1ba9dbe11f5675e2efc8617c9552217dc4c3eb5ccd108f7c3889878dbae50                                                                                                      0.0s
 => [internal] load build context                                                                                                                                                                                                         0.0s
 => => transferring context: 41B                                                                                                                                                                                                          0.0s
 => [1/4] FROM docker.io/namely/protoc-all@sha256:07f1ba9dbe11f5675e2efc8617c9552217dc4c3eb5ccd108f7c3889878dbae50                                                                                                                        0.0s
 => CACHED [2/4] RUN set -ex &&     apt-get update &&     apt-get install -y --no-install-recommends     python3-pip                                                                                                                      0.0s
 => CACHED [3/4] COPY ./dev-requirements.txt /tmp/                                                                                                                                                                                        0.0s
 => CACHED [4/4] RUN python3 -m pip install --upgrade pip &&     python3 -m pip install --requirement /tmp/dev-requirements.txt                                                                                                           0.0s
 => exporting to image                                                                                                                                                                                                                    0.0s
 => => exporting layers                                                                                                                                                                                                                   0.0s
 => => writing image sha256:5a2f6273d08f8049e3c2c1f49c12bfed460cf2a1cb4f1eb9ad693e338c08dc98                                                                                                                                              0.0s
 => => naming to docker.io/library/protobuf-specs-build                                                                                                                                                                                   0.0s
Generating go protobuf files
docker run --platform linux/amd64 -v /Users/.../protobuf-specs:/defs protobuf-specs-build -d protos -l go --go-module-prefix github.com/sigstore/protobuf-specs/gen/pb-go
/usr/bin/find: 'protos': No such file or directory
make: *** [go] Error 1

The same goes for just make.

My wild guess is that it might be caused by the fact that I use Apple's M1.

If I change the platform to linux/arm64/v8, the error message is different:

Generating go protobuf files
docker run --platform linux/arm64/v8 -v /Users/.../protobuf-specs:/defs protobuf-specs-build -d protos -l go --go-module-prefix github.com/sigstore/protobuf-specs/gen/pb-go
Unable to find image 'protobuf-specs-build:latest' locally
docker: Error response from daemon: pull access denied for protobuf-specs-build, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
make: *** [go] Error 125

@kommendorkapten
Member

@vlsi That is interesting; I'm on macOS/Arm and have never seen any issues with this. I'll reach out to you on sigstore Slack.

@znewman01
Contributor

Weird. I am able to do this on an M1 Mac running macOS with no issues.

/usr/bin/find: 'protos': No such file or directory

This suggests to me that the protos directory isn't getting mounted into your running container environment. That's what -v /Users/.../protobuf-specs:/defs is supposed to do.

If you run:

docker run -it --platform linux/amd64 -v $PWD:/defs --entrypoint /bin/bash protobuf-specs-build 

It should drop you in a shell. Do you see your protobuf-specs dir mounted if you type ls? Specifically, you should see a protos directory.


If you'd rather not debug this for now, I'm happy to push the generated code for you—just let me know. We can open an issue to discuss changing this workflow as well.

Signed-off-by: Fredrik Skogman <kommendorkapten@github.com>
// The inclusion proof can be used for online verification that the
// entry was appended to the log, and that the log has not been
// altered.
InclusionProof inclusion_proof = 6;
InclusionProof inclusion_proof = 6 [(google.api.field_behavior) = REQUIRED];
Member

On my second pass, I think we don't need the inclusion proof marked as REQUIRED, as it would only be used for online verification, which is dictated by the policy, not the bundle itself. So I would remove REQUIRED here.
cc: @haydentherapper

Author

@kommendorkapten, I'm inclined to think the comment should read "The inclusion proof can be used for offline verification" rather than "online verification". WDYT?

Member

The inclusion proof requires that the client performs a request against the transparency log, so it's an online operation. The inclusion promise (i.e. verifying the SET) is what can be done offline.

Author

What is the purpose of having InclusionProof inclusion_proof then?
I thought the values in InclusionProof inclusion_proof enable offline verification.

Member

The inclusion promise is just that: a promise from the log to include the entry in the log. It's basically a signature over the entry, together with a timestamp from the log. If a user trusts the log, this can be enough to verify the artifact, as the log promises the entry will be put on the log. (A technical detail is that right now the entry is added to the log synchronously, but this may not hold in the future, as there are plans to start updating the tree in batches, or to expose a batch API.)

The inclusion proof is a bit more complicated, as it depends on the state of the Merkle tree at the time the entry was inserted. It can then be used at a later time to verify the consistency of the tree, i.e. verifying that the tree has not been modified since the entry was added (i.e. the entry has not been removed from the tree), by requesting a consistency proof via this endpoint: https://www.sigstore.dev/swagger/#/tlog/getLogProof
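For intuition, checking an inclusion proof amounts to recomputing the Merkle root from the entry's leaf hash and the supplied sibling hashes, then comparing it against a root hash that is itself authenticated (e.g. via a signed checkpoint). The following is only an illustrative RFC 6962-style sketch, not the actual sigstore client code, and the names are made up:

package tlogsketch

import (
	"bytes"
	"crypto/sha256"
)

// leafHash and nodeHash use the RFC 6962 domain-separated hashing scheme.
func leafHash(entry []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(entry)
	return h.Sum(nil)
}

func nodeHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// verifyInclusion recomputes the root of a tree with treeSize leaves from the
// entry at leafIndex and the sibling hashes in proof, and compares it against
// rootHash (which must itself be authenticated, e.g. via a signed checkpoint).
func verifyInclusion(leafIndex, treeSize uint64, entry []byte, proof [][]byte, rootHash []byte) bool {
	if leafIndex >= treeSize {
		return false
	}
	fn, sn := leafIndex, treeSize-1
	r := leafHash(entry)
	for _, p := range proof {
		if sn == 0 {
			return false // more proof hashes than tree levels
		}
		if fn%2 == 1 || fn == sn {
			// Our node is a right child, or the last node in its level.
			r = nodeHash(p, r)
			// Skip levels where the node was promoted without a sibling.
			for fn%2 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, rootHash)
}

The InclusionProof message carries essentially these inputs (log index, tree size, root hash, and the sibling hashes).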

Collaborator

I would make both optional. At least one should be required for this message, but there's no way to easily represent this.

Currently Rekor returns both an inclusion proof and promise always, but I think this behavior shouldn't be reflected in the specs. Future iterations of the log may choose to return either a promise or an inclusion proof or both, but I don't think we should guarantee that a promise is always set.

Something else to consider is that the log index may not be set for an inclusion promise, because the entry has yet to be admitted into the log. Now, once again, the current behavior of Rekor is to always put an entry into the log immediately, but this may not always be the case. I'd rather have clients handle it as optional for now; it'll be less of a burden in the future if we do make it optional. Thoughts?

Author

Does that mean that if Rekor omits the proof, then offline verification would not be possible?

By the way, is there something like a 'caching Rekor proxy', like there are proxy servers for artifact repositories?

Collaborator

So what's tricky here is there is what Rekor does currently, and what I want Rekor to do in the future. So my first question is, should the protos reflect the current state of the service, or the ideal state?

Slight aside: right now a promise is simply a signed statement from Rekor. Even with a set of hashes to calculate an inclusion proof, there is no guarantee that an entry actually exists in Rekor without doing an online lookup. This is because you need to verify consistency (that the log is still append-only), which would require an online lookup. I've been thinking about how to improve this, but we're a while away from being able to have stronger offline verification.

With what we've got right now, a client could "verify" offline using either the hashes from the proof or the promise; they're pretty much equivalent. Currently Rekor will return both because of its design, but technically it could omit one or the other: a promise would be omitted when a proof is requested (because if a proof can be generated, why need a promise?), or a proof would be omitted when only a promise can be generated because the entry is actively being uploaded (Rekor operates in a batched mode, but the batch size is currently 1, so entries are uploaded immediately).
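For illustration, the promise side is essentially just an offline signature check over the canonicalized entry. A rough sketch, assuming an ECDSA log key and leaving the Rekor-specific payload canonicalization out of scope:

package tlogsketch

import (
	"crypto/ecdsa"
	"crypto/sha256"
)

// verifyPromise checks an inclusion promise (SET) against the log's public key.
// canonicalPayload is the canonicalized entry (body, integrated time, log ID,
// log index); building it is Rekor-specific and not shown here.
func verifyPromise(logKey *ecdsa.PublicKey, canonicalPayload, set []byte) bool {
	digest := sha256.Sum256(canonicalPayload)
	return ecdsa.VerifyASN1(logKey, digest[:], set)
}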

So I don't really know the best way to handle this for whether it should be required or optional. I'd lean towards being permissive and allow either the proof or promise or both, but not require that a promise is always present.

For proxies, we've chatted a little bit about Rekor read-only replicas. I see this as a requirement for being able to scale out Rekor. This is definitely something I want to do; we just need to scope it out and figure out the technical blockers.

Member

@haydentherapper I think this is good as-is now, as it reflects the current way Rekor works?

As there are more changes expected in Rekor (SET v2, omitting the log index), I think we can wait with those until Rekor is updated?

Member

@kommendorkapten kommendorkapten left a comment

Nice!

@vlsi
Author

vlsi commented Jan 23, 2023

I'm afraid one more re-generation is needed.

vlsi added 2 commits January 23, 2023 18:36
Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Signed-off-by: Fredrik Skogman <kommendorkapten@github.com>
Member

@kommendorkapten kommendorkapten left a comment

Nice

Co-authored-by: Hayden B <hblauzvern@google.com>
Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
@@ -63,7 +63,7 @@ message VerificationMaterial {
message Bundle {
// MUST be application/vnd.dev.sigstore.bundle+json;version=0.1
// when encoded as JSON.
string media_type = 1;
string media_type = 1 [(google.api.field_behavior) = REQUIRED];
Member

Possibly irrelevant, but is anybody familiar with what this annotation implies about non-JSON-serialized protobufs? My understanding is that media_type is only sensible when encoded as JSON, so it may not make sense to set it as REQUIRED here.

Member

Let me try it out!

Member

So I was digging around a bit, and was not able to get a program to fail marshalling to either binary or JSON when the preconditions were not met (i.e. some REQUIRED fields were not present). This led me to the documentation, where I found this:

Note: The vocabulary given in this document is for descriptive purposes only, and does not itself add any validation. The purpose is to consistently document this behavior for users.

So any developer would still need to perform validation manually, it seems. The generated (Go) code does not seem to include any annotations for JSON marshal/unmarshal that would imply whether a given field is REQUIRED or not.
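For illustration, manual enforcement could walk a message's field descriptors and inspect the field_behavior extension. A minimal sketch, assuming the google.golang.org/protobuf and google.golang.org/genproto annotation packages (the helper name is made up):

package validatesketch

import (
	"google.golang.org/genproto/googleapis/api/annotations"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/descriptorpb"
)

// missingRequired reports the names of top-level fields annotated with
// (google.api.field_behavior) = REQUIRED that are not populated. The
// annotation itself carries no runtime enforcement, so a check like this
// has to be called explicitly.
func missingRequired(m proto.Message) []string {
	var missing []string
	msg := m.ProtoReflect()
	fields := msg.Descriptor().Fields()
	for i := 0; i < fields.Len(); i++ {
		fd := fields.Get(i)
		opts, ok := fd.Options().(*descriptorpb.FieldOptions)
		if !ok || opts == nil {
			continue
		}
		behaviors, _ := proto.GetExtension(opts, annotations.E_FieldBehavior).([]annotations.FieldBehavior)
		for _, b := range behaviors {
			if b == annotations.FieldBehavior_REQUIRED && !msg.Has(fd) {
				missing = append(missing, string(fd.Name()))
			}
		}
	}
	return missing
}

A real validator would also need to recurse into message-typed fields and handle oneofs.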

Maybe this is old news to everyone, but it was not for me (my previous experience is to rely on the optional keyword and treat anything else as REQUIRED).

But to go back to @woodruffw's question: I think it still makes sense to include the media_type even when binary marshalling is being conducted, as we may not know what a downstream consumer intends to do with the message. Maybe there is a service pulling the data from a central registry and serializing it to disk (via JSON) for further consumption by a different service?

Member

@haydentherapper have you had any different experience with REQUIRED fields?

Collaborator

This is correct; we need to find a library for enforcing this (I know of the internal Google one, but I'm not sure what's available for OSS).

Contributor

TBH I think it's weird that we have media_type in the proto at all. It's a property of the serialization, not the underlying data structure. I think it would make more sense if we could hook into the JSON marshal step and add it there. Or ask clients to add it after the JSON marshalling (though it'd be very easy to forget).

That said, I know we can't, so we're kinda stuck. I still think it's wrong to set it unless you set it right before JSON serialization, especially if you're not intending on doing that directly. In practice, it'll probably just always be set to the JSON value, so I won't fight super hard here, but I don't think it should be required.

Contributor

What if it were a fixed string? Or at least an enum? That maybe constrains clients, or at least clarifies in code that it must be one of certain values.

At the same time, I do want it as a type hint. So yeah, maybe not required, but otherwise I don't know how to package this unless it's inside another wrapped proto with (payload=bundle, media_type)

Member

Agreed -- IMO it's not ideal, but the alternatives (hooking into the marshalling, or doing it after marshalling) are also probably more error prone/more likely to be forgotten due to a lack of controls.

I'm loath to add yet another layer of schematization here, but maybe what we really need is a JSON Schema definition that gets generated from these protobuf definitions. That schema could then include (and enforce) the requirement hints.

Just from a brief look, it seems like https://github.com/chrusty/protoc-gen-jsonschema might do the trick, and it has support for its own required annotation to mark fields as non-optional.

Contributor

All good ideas... but I think this is a case where "done" is more important than "right", so we should just pick something and do it.

Maybe for now let's leave it required?

Member

Works for me. This is definitely not something that should block any other work here, and I can pursue it as an independent follow-up.

Member

@woodruffw woodruffw left a comment

LGTM mod one comment

Contributor

@znewman01 znewman01 left a comment

My vote is to err on the side of not marking things required for now.

But I suppose an error either way doesn't really hurt here, since it's not really enforced in proto and these things being missing should cause other verification errors anyway.


@@ -69,7 +70,7 @@ message CertificateAuthority {
// that are allowed.
message TrustedRoot {
// MUST be application/vnd.dev.sigstore.trustedroot+json;version=0.1
string media_type = 1;
string media_type = 1 [(google.api.field_behavior) = REQUIRED];
Contributor

Same media type question here.

@kommendorkapten
Member

kommendorkapten commented Feb 9, 2023

But I suppose an error either way doesn't really hurt here, since it's not really enforced in proto and these things being missing should cause other verification errors anyway.

This is good, I think. As we have REQUIRED on some fields already, I think adding more makes it a bit more consistent and clear for a user. When we go after #67 and have a strong test harness, we can nail it down to perfection.

I think the state now, where some fields are marked as required and some (which are) not, is a bit confusing. So either we make a "best effort" like this PR, or remove them altogether until we can offer a good way to programmatically verify a message.

@haydentherapper
Collaborator

Where did we leave off with this PR? Are we going to proceed with this and cut a new version (as it's a breaking change)?

@woodruffw
Member

I'm +1 for getting this merged -- I agree that these annotations have limited value on the programmatic side, but they do have some value on the design/engineering side (in terms of visually confirming my intuition that various fields actually are required).

In particular, it's nice to have a confirmation that the SET and log entry body are required, while the inclusion proof (intuitively) isn't!

CC @tnytown

Member

@kommendorkapten kommendorkapten left a comment

I'm ok too with merging this.
We don't need to cut a breaking change; we can do a patch update, as these are only clarifications and do not change the original intent.

edit: the sentence missed a crucial word, don't; thanks @feelepxyz for pointing that out 🚀

@feelepxyz
Member

We need to cut a new breaking release

@kommendorkapten to clarify, you mean we don't need to do a major version/breaking change release?

@woodruffw
Member

#79 partially overlapped with this, so we'll need to rebase/deconflict here 🙂

@haydentherapper
Collaborator

Closing since we've made a lot of fields required already.
