Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: grpc health service #4020

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

mouchar
Copy link

@mouchar mouchar commented Mar 4, 2025

Overview

Implements gRPC Health check using standard grpc library.

Reuses the health as reported by gosundheit health checker. Grpc health server
implements also streaming Watch method, so it was necessary to add
Stop() failback to stop server even if some clients ignore NOT_SERVING
status. Timeout for graceful shutdown was chosen to match timeouts for
http and telemetry servers.

What this PR does / why we need it

Dex already provides health check endpoint via HTTP. For integrations based on
gRPC, notably if we want to use client-side health checks for higher reliability,
as described here, we also need
to report service health using grpc.health.v1.Health service.

Special notes for your reviewer

Golang is not my mother tongue, so if there is something utterly wrong, please let me
know.

cmd/dex/serve.go Outdated
Comment on lines 519 to 531
go func() {
var status healthgrpc.HealthCheckResponse_ServingStatus
for {
switch healthChecker.IsHealthy() {
case true:
status = healthgrpc.HealthCheckResponse_SERVING
default:
status = healthgrpc.HealthCheckResponse_NOT_SERVING
}
healthcheck.SetServingStatus("", status)
time.Sleep(healthCheckPeriod)
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could ae avoid using goroutines in goroutines here and use a separate group for the health endpoint?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated.

api.RegisterDexServer(grpcSrv, server.NewAPI(serverConfig.Storage, logger, version, serv))

grpcMetrics.InitializeMetrics(grpcSrv)
if c.GRPC.Reflection {
logger.Info("enabling reflection in grpc service")
reflection.Register(grpcSrv)
}
ctx, cancelHealthcheck := context.WithCancel(context.Background())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need a ctx here?
I think group does the trick with canceling the function.

Can you check it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need context for early breaking from the for { } loop below when shutdown is requested.

Indeed, group cancels the function, but after the gracefulStop of grpc server times out (see my comment below). Meanwhile, the healhcheck loop keeps updating status if not properly stopped from the outside (which is not a big deal, but I prefer stopping stuff when it is not needed anymore).

I can use bare channel for receiving shutdown event, but context is more convenient.


group.Add(func() error {
return grpcSrv.Serve(grpcListener)
}, func(err error) {
logger.Debug("starting graceful shutdown", "server", "grpc")
grpcSrv.GracefulStop()
ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There you also probably do not need a context.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This context with timeout is essential for correct shutdown of grpc server. which now provides streaming method grpc.health.v1.Health/Watch. When stream is kept open by one or more client, gracefulStop waits indefinitely until all streams are closed. That's why I added 1-minute timeout on gracefulStop, followed by grpcSrv.Stop() when server doesn't stop on time.

Again, I can implement it using bare channels, but I believe that using context lib is a more idiomatic approach.

Implements gRPC Health check using standard grpc library. Reuses the
health as reported by gosundheit health checker. Grpc health server
implements also streaming Watch method, so it was necessary to add
Stop() failback to stop server even if some clients ignore NOT_SERVING
status. Timeout for graceful shutdown was chosen to match timeouts for
http and telemetry servers.

Signed-off-by: Robert Moucha <robert.moucha@gooddata.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants