Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: generic xds client resource watching e2e #8183

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Mar 19, 2025

This is the change to make generic xDS client do resource watching from xDS management server. The xDS client uses the ADS (Aggregated Data Service) stream transport channel that was created in #8144. The e2e tests are written using the listener resource type from #8144 and grpctransport from #8066.

The PR copies the existing clientimpl, authority, clientimpl_watcher from internal xdsclient code along with the helpers that are part of gRPC and then modify them to use the generic client types and interfaces.

The PR also adds String() and Equal() methods for ServerIdentifierExtension and implements them for grpctransport.ServerIdentifierExtension. It also adds a custom map implementation similar to resolver.AddrMap to re-use existing channels for similar server configurations.

Recommend to review each commit of the PR separately in the order.

Commits with copy prefix are copy paste of internal xdsclient code and are followed by modifications. Review only the modification commit.

RELEASE NOTES: None

Copy link

codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 75.64259% with 199 lines in your changes missing coverage. Please review.

Project coverage is 82.23%. Comparing base (5edab9e) to head (eea4f8b).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/clients/xdsclient/authority.go 69.67% 114 Missing and 27 partials ⚠️
xds/internal/clients/xdsclient/clientimpl.go 79.38% 26 Missing and 14 partials ⚠️
...s/internal/clients/grpctransport/grpc_transport.go 89.09% 4 Missing and 2 partials ⚠️
.../internal/clients/xdsclient/clientimpl_watchers.go 86.66% 3 Missing and 3 partials ⚠️
...ds/internal/clients/internal/testutils/wrappers.go 87.50% 2 Missing and 1 partial ⚠️
xds/internal/clients/xdsclient/xdsclient.go 81.25% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8183      +/-   ##
==========================================
+ Coverage   82.17%   82.23%   +0.06%     
==========================================
  Files         410      416       +6     
  Lines       40236    41192     +956     
==========================================
+ Hits        33065    33876     +811     
- Misses       5822     5900      +78     
- Partials     1349     1416      +67     
Files with missing lines Coverage Δ
xds/internal/clients/internal/internal.go 96.00% <ø> (+7.11%) ⬆️
xds/internal/clients/internal/testutils/channel.go 100.00% <100.00%> (ø)
xds/internal/clients/xdsclient/logging.go 100.00% <100.00%> (ø)
xds/internal/clients/xdsclient/xdsconfig.go 100.00% <100.00%> (ø)
...ds/internal/clients/internal/testutils/wrappers.go 87.50% <87.50%> (ø)
xds/internal/clients/xdsclient/xdsclient.go 82.35% <81.25%> (+82.35%) ⬆️
...s/internal/clients/grpctransport/grpc_transport.go 89.32% <89.09%> (-1.25%) ⬇️
.../internal/clients/xdsclient/clientimpl_watchers.go 86.66% <86.66%> (ø)
xds/internal/clients/xdsclient/clientimpl.go 79.38% <79.38%> (ø)
xds/internal/clients/xdsclient/authority.go 69.67% <69.67%> (ø)

... and 41 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@purnesh42H purnesh42H force-pushed the generic-xds-client-resource-watchine-e2e branch from c99ca98 to 02db583 Compare March 19, 2025 19:48
@purnesh42H purnesh42H requested a review from dfawley March 19, 2025 20:07
@purnesh42H purnesh42H added Type: Feature New features or improvements in behavior Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Mar 19, 2025
@purnesh42H purnesh42H added this to the 1.72 Release milestone Mar 19, 2025
@purnesh42H
Copy link
Contributor Author

purnesh42H commented Mar 19, 2025

@dfawley could you do a pass on the implementation? For now, I have removed the metrics from client as it requires some more thought. It can be added in separate commit or PR. Otherwise the main implementation can be reviewed to make sure we are following the right approach and do the changes (if needed) before we go further. Also, I will be adding more e2e tests in subsequent commits

@purnesh42H purnesh42H requested a review from dfawley March 20, 2025 20:21
@purnesh42H purnesh42H force-pushed the generic-xds-client-resource-watchine-e2e branch 6 times, most recently from aeb32c1 to 7064881 Compare March 23, 2025 19:36
@purnesh42H purnesh42H changed the title Generic xds client resource watchine e2e xds: Generic xds client resource watchine e2e Mar 24, 2025
@purnesh42H purnesh42H changed the title xds: Generic xds client resource watchine e2e xds: generic xds client resource watchine e2e Mar 24, 2025
@dfawley dfawley assigned purnesh42H and unassigned dfawley Mar 26, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-resource-watchine-e2e branch 2 times, most recently from 4a038d9 to 3bacb44 Compare March 28, 2025 18:27
@purnesh42H purnesh42H requested a review from dfawley March 28, 2025 18:31
@purnesh42H purnesh42H assigned dfawley and unassigned purnesh42H Mar 28, 2025
@dfawley dfawley changed the title xds: generic xds client resource watchine e2e xds: generic xds client resource watching e2e Mar 28, 2025
@purnesh42H
Copy link
Contributor Author

purnesh42H commented Mar 28, 2025

@dfawley i have made the change to use ServerConfig and ServerIdentifier as map key for xdsclient and transport respectively. Now we don't need equal. Only thing is for grpcTransport, we return runtime error if extensions are not *ServerIdentifierExtension. Though I was testing and found that if the two extensions pointing to same address but have different values, they are still considered equal. Is that okay since that's our transport?

Could you do the pass on last commit and confirm if we can finalize this approach?

@purnesh42H purnesh42H force-pushed the generic-xds-client-resource-watchine-e2e branch 2 times, most recently from 03adc89 to 5bcc906 Compare March 28, 2025 18:57
@dfawley dfawley removed their assignment Apr 1, 2025
@dfawley dfawley requested a review from easwars April 1, 2025 22:45
@easwars easwars self-assigned this Apr 3, 2025
@easwars
Copy link
Contributor

easwars commented Apr 3, 2025

The PR copies the existing clientimpl, authority, clientimpl_watcher from internal xdsclient code along with the helpers that are part of gRPC and then modify them to use the generic client types and interfaces.

Didn't we at some point say that we will move the code as-in in one PR and then make changes in a follow-up, so that we can clearly see what is changing. If that is not feasible, could you please describe the changes in detail in each of those non-test files that are being copied over. Thanks.

@@ -59,6 +59,11 @@ type ServerIdentifier struct {
//
// For example, a custom TransportBuilder might use this field to
// configure a specific security credentials.
//
// Extensions must be able to be used as a map key, or the client may
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

// Extensions may be any type that is comparable, as they are used as map keys internally.
// See: https://go.dev/ref/spec#Comparison_operators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Thanks

@@ -59,6 +59,11 @@ type ServerIdentifier struct {
//
// For example, a custom TransportBuilder might use this field to
// configure a specific security credentials.
//
// Extensions must be able to be used as a map key, or the client may
// panic. Any equivalent extensions in all ServerIdentifiers present in a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please help me understand this.

// Any equivalent extensions in all ServerIdentifiers present in a
// single client's configuration should have the same value.  Not following
// this restriction may result in excess resource usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this basically means if all the fields in ServerIdentifier are not equal between 2 server identifiers, then they will create a separate transport to the server. So, even if they are pointing to same server but some fields are different they will create two separate transports to that server.


func ExampleServerIdentifierExtension() {
si := clients.ServerIdentifier{ServerURI: "localhost:5678", Extensions: grpctransport.ServerIdentifierExtension{Credentials: "local"}}
fmt.Printf("%v", si)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to a comment with the expected output. Otherwise, this does nothing.

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah actually, you need to mention the output as a comment below. I added. https://go.dev/blog/examples

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you can not print it.

Can you docstring examples to explain what they're attempting to show?

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the docstring. Still printing though but combined print and initialization in single line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By examples, do you mean add more examples?

Comment on lines +41 to +42
// It must be set by value (not pointer) in the
// clients.ServerIdentifier.Extensions field (See Example).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Pointer types are comparable too? Maybe I'm missing something.

https://go.dev/ref/spec#Comparison_operators

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With pointer there are 2 issues

  1. two different pointer with same values are unequal
  2. same pointers with different values are equal

So, we decided to go by value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same pointers with different values are equal

That's not possible. But (1) is the reason why using by value is preferred. Otherwise you have to reuse instances for equivalent values.

type Builder struct {
// Credentials is a map of credentials names to credentials.Bundle which
// can be used to connect to the server.
Credentials map[string]credentials.Bundle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Unexported


if logger.V(2) {
logger.Info("Creating a new connection to the server for ServerIdentifier: %v", si)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/connection/transport

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you could move this log to be right before the last return statement, and change "Creating" to "Created".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

t.Fatalf("Got nil `tr1` transport from Build(serverID1), want non-nil")
}
if tr1.(*grpcTransport).cc == nil {
t.Fatalf("Got nil grpc.ClientConn in transport `tr1`, want non-nil")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever possible for this to be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually removed these stuff from the test. As you mentioned, we shouldn't to verifying the internal fields

Comment on lines 164 to 168
if len(b.serverIdentifierMap) != 1 {
t.Fatalf("Builder.serverIdentifierMap has unexpected length %d, want 1", len(b.serverIdentifierMap))
}
if b.serverIdentifierMap[serverID1] != tr1 {
t.Fatalf("Builder.serverIdentifierMap[serverID1] = %v, want %v", b.serverIdentifierMap[serverID1], tr1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of tests that verify implementation details instead of verifying functionality. If we change the implementation for whatever reason without actually changing the provided functionality, this test would have to be rewritten. Instead I would recommend spawning real backends, and ensuring that new connections are created to them when a new transport is expected to be created (and no new connections are created when a transport is expected to be re-used). I would even recommend moving this test to grpctransport_test package, thereby ensuring that you can never depend on the implementation.

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i simplified this test to just create transports. The dedupe part is being tested in refcounted_transport_test.go which is using hooks similar internal clientImpl ref count tests.

Comment on lines +248 to +252
for i, cfg := range a.xdsChannelConfigs {
if isServerConfigEqual(sc, cfg.serverConfig) {
return i
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within an authority, we should only be using a single server at a time. If it fails, we fall back to the next one

But when that happens, we still keep the higher priority server in the hopes that it will become ready. So, we a single authority could have more than one xDS channel. Only one can be active though.

Comment on lines +750 to +753
xc, cleanup, err := a.getChannelForADS(sc, a)
if err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there any reason why we didn't do the client with failing transport?

Oversight I guess.

@purnesh42H purnesh42H force-pushed the generic-xds-client-resource-watchine-e2e branch from c879605 to 26f807f Compare April 3, 2025 06:46
@purnesh42H purnesh42H requested a review from dfawley April 3, 2025 12:11
@purnesh42H purnesh42H assigned dfawley and unassigned purnesh42H Apr 3, 2025
@purnesh42H
Copy link
Contributor Author

purnesh42H commented Apr 3, 2025

The PR copies the existing clientimpl, authority, clientimpl_watcher from internal xdsclient code along with the helpers that are part of gRPC and then modify them to use the generic client types and interfaces.

Didn't we at some point say that we will move the code as-in in one PR and then make changes in a follow-up, so that we can clearly see what is changing. If that is not feasible, could you please describe the changes in detail in each of those non-test files that are being copied over. Thanks.

yeah its not possible to have separate PRs with copy paste as they will fail the build. Basically if you see the commits, the one starting with copy are exact copy from internal xds client and then all the copy commits are followed by the modifications. So, you can review the modifications. For 3 files you mentioned this is the copy commit 94b4135 and the next commit 35cd65e is modifications. Ofcourse they were later trimmed a bit more based on doug's review but nothing major. Same pattern is followed for copied e2e tests.

@purnesh42H purnesh42H requested a review from easwars April 3, 2025 16:00
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: rename to examples_test.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


func ExampleServerIdentifierExtension() {
si := clients.ServerIdentifier{ServerURI: "localhost:5678", Extensions: grpctransport.ServerIdentifierExtension{Credentials: "local"}}
fmt.Printf("%v", si)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you can not print it.

Can you docstring examples to explain what they're attempting to show?

"google.golang.org/grpc/xds/internal/clients/grpctransport"
)

func ExampleServerIdentifierExtension() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExampleServerIdentifierExtension_HowToSetInServerIdentifier or _Usage or something?

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it requires to follow format "Example{struct name}" else it gives warning that the struct is not defined.

)

func ExampleServerIdentifierExtension() {
si := clients.ServerIdentifier{ServerURI: "localhost:5678", Extensions: grpctransport.ServerIdentifierExtension{Credentials: "local"}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment: // Note the extensions field is set by value and not by pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +41 to +42
// It must be set by value (not pointer) in the
// clients.ServerIdentifier.Extensions field (See Example).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same pointers with different values are equal

That's not possible. But (1) is the reason why using by value is preferred. Otherwise you have to reuse instances for equivalent values.

Comment on lines 99 to 100
var creds credentials.Bundle
if creds, ok = b.Credentials[sce.Credentials]; !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this would more commonly be written to use type inference:

creds, ok := b.Credentials[sce.Credentials]
if !ok {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

type grpcTransport struct {
cc *grpc.ClientConn
cc *grpc.ClientConn
refCount int32 // accessed atomically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If atomic, then it should be atomic.Int32. But you need a lock otherwise there will be races between transport re-use and dereferencing. In which case you don't need an atomic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incrRef and decRef were doing atomic add and subtract so the var itself need not be atomic? But yeah you correctly pointed out the race condition of deleting the transport and at the someone else getting it. I have switched to lock so now both decrement and deletion are serialized.

Comment on lines 159 to 163
if g.decrRef() != 0 {
return nil
}

g.deleteFromServerIdentiferMap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a race. If you dec the counter then take the lock, then re-use happens, then you're going to delete a used transport. Best to take the lock and remove the atomic and keep things simple. Performance does not matter on these code paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah fixed using lock. Thanks for pointing it out.

Comment on lines +185 to +187
tr.closed.Do(func() {
err = tr.grpcTransport.Close()
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can guarantee this is only called once, so the Once shouldn't be needed. No?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will you guarantee? I think we can guarantee in our implementation may be but this transportRef is returned to the user of grpcTransport which can be used by anyone who is using our xdsclient so we should make sure its idempotent.

Comment on lines +50 to +56
// Override the gRPC transport creation hook to get notified.
origGRPCTransportCreateHook := grpcTransportCreateHook
grpcTransportCreateCh := testutils.NewChannelWithSize(1)
grpcTransportCreateHook = func() {
grpcTransportCreateCh.Replace(struct{}{})
}
defer func() { grpcTransportCreateHook = origGRPCTransportCreateHook }()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you don't need the hooks if you just check the pointer of the returned transport to confirm you are getting the same one when expected, or a new when when re-use is not expected?

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if that's a better alternative. As @easwars pointed out, then we are being tied to implementation for assertion. So, in this case we will have a get method in transportReference which will return underlying transport but then to compare we will have to type cast etc. to compare. Also, for close we still need a hook? So, better to be consistent? Do you see any big limitation with hooks? Ofcourse a code change can disturb the order of them but that still has to go through a PR review.

@dfawley dfawley assigned purnesh42H and unassigned dfawley Apr 3, 2025
@purnesh42H purnesh42H requested a review from dfawley April 4, 2025 05:06
@purnesh42H purnesh42H assigned dfawley and easwars and unassigned easwars and purnesh42H Apr 4, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-resource-watchine-e2e branch from 1fd3ff5 to eea4f8b Compare April 7, 2025 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants