Collect Gloo metrics and some snapshots on test failure #10400

Merged
merged 33 commits into from
Nov 28, 2024
45537f3
Enable stats in common recommendations
ryanrolds Nov 22, 2024
ff88db6
Fetch and save Gloo metrics and some snapshots on failure
ryanrolds Nov 23, 2024
24322c5
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 23, 2024
e2b43ec
Added changelog
ryanrolds Nov 23, 2024
31d5f5b
Adding changelog file to new location
Nov 25, 2024
ba62682
Deleting changelog file from old location
Nov 25, 2024
2cbdc1e
Move to using gloo admin cli package
ryanrolds Nov 25, 2024
da1b4ae
Merge branch 'rolds/test_failure_collect_metrics_dump' of ssh://githu…
ryanrolds Nov 25, 2024
788c36f
Adjusting error handling
ryanrolds Nov 25, 2024
1e353f2
Adjusting error handling
ryanrolds Nov 25, 2024
f40c5a0
Adjusting error handling
ryanrolds Nov 25, 2024
a32eac9
Fix error check failing
ryanrolds Nov 25, 2024
1b100f5
Removed intentional failure
ryanrolds Nov 25, 2024
5618f3e
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 25, 2024
b88fb8d
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 25, 2024
1f6a3be
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 25, 2024
510fb88
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
nfuden Nov 26, 2024
49d65dd
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 26, 2024
34ee46c
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
nfuden Nov 26, 2024
11d4336
Partial work unifying the dump logic
ryanrolds Nov 27, 2024
9eed2ca
Merge branch 'rolds/test_failure_collect_metrics_dump' of ssh://githu…
ryanrolds Nov 27, 2024
d526096
More work, should be at least building and running now. Still fixing …
ryanrolds Nov 27, 2024
432a4b3
Working dump
ryanrolds Nov 27, 2024
690e5be
Intentionally fail a kube2e test
ryanrolds Nov 27, 2024
068f973
Minor log adjustment
ryanrolds Nov 27, 2024
afe6f10
Clean up
ryanrolds Nov 27, 2024
9755e4a
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 27, 2024
6709c45
Fixing import cycle
ryanrolds Nov 27, 2024
d184371
Adjusting the port forwarding for fetching envoy state
ryanrolds Nov 27, 2024
3f63697
Making the envoy dumping a little more fault tolerant
ryanrolds Nov 27, 2024
8c494c4
Don't wipe while making subdirs
ryanrolds Nov 27, 2024
04c342b
Removed intentional failures
ryanrolds Nov 27, 2024
b326bb1
Merge branch 'main' into rolds/test_failure_collect_metrics_dump
ryanrolds Nov 27, 2024
14 changes: 14 additions & 0 deletions changelog/v1.18.0-rc3/collect-more-artifacts-on-ci-failure.yaml
@@ -0,0 +1,14 @@
+changelog:
+  - type: NON_USER_FACING
+    description: >-
+      Gloo Gateway controller metrics and xds/krt snapshots are now collected and included in
+      the test failure artifacts.
+      After encountering some test failures that proved difficult to debug without knowing more
+      about the state of the cluster, we have added additional artifacts to be collected when
+      a test fails.
+      This will help us to more easily diagnose the cause of test failures.
+  - type: NON_USER_FACING
+    description: >-
+      Unified the kube2e and kubernetes/e2e test failure artifact collection.
+      Previously, the test failure artifacts for kube2e and kubernetes/e2e tests were different
+      and produced by their own logic.
12 changes: 12 additions & 0 deletions pkg/utils/glooadminutils/admincli/client.go
@@ -14,6 +14,8 @@ import (

 const (
 	InputSnapshotPath = "/snapshots/input"
+	xdsSnapshotPath   = "/snapshots/xds"
+	krtSnapshotPath   = "/snapshots/krt"
 )

 // Client is a utility for executing requests against the Gloo Admin API
@@ -84,6 +86,16 @@ func (c *Client) InputSnapshotCmd(ctx context.Context) cmdutils.Cmd {
 	return c.Command(ctx, curl.WithPath(InputSnapshotPath))
 }

+// XdsSnapshotCmd returns the cmdutils.Cmd that can be run, and will execute a request against the XDS Snapshot path
+func (c *Client) XdsSnapshotCmd(ctx context.Context) cmdutils.Cmd {
+	return c.Command(ctx, curl.WithPath(xdsSnapshotPath))
+}
+
+// KrtSnapshotCmd returns the cmdutils.Cmd that can be run, and will execute a request against the KRT Snapshot path
+func (c *Client) KrtSnapshotCmd(ctx context.Context) cmdutils.Cmd {
+	return c.Command(ctx, curl.WithPath(krtSnapshotPath))
+}
+
 // GetInputSnapshot returns the data that is available at the input snapshot endpoint
 func (c *Client) GetInputSnapshot(ctx context.Context) ([]interface{}, error) {
 	var outLocation threadsafe.Buffer
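The client.go change adds commands for the `/snapshots/xds` and `/snapshots/krt` admin paths alongside the existing `/snapshots/input` one. As a rough illustration of how a failure hook might pull all three and tolerate one of them failing, here is a minimal, self-contained sketch. It does not use `admincli.Client`: `fetchAll`, `httpFetcher`, `errEmptyBody`, the artifact file names, and the `httptest` stand-in server are all hypothetical, and only the three paths come from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"time"
)

// snapshotPaths maps a hypothetical artifact file name to the admin path it
// is fetched from. Only the paths are taken from client.go.
var snapshotPaths = map[string]string{
	"input_snapshot.json": "/snapshots/input",
	"xds_snapshot.json":   "/snapshots/xds",
	"krt_snapshot.json":   "/snapshots/krt",
}

// errEmptyBody marks a snapshot endpoint that answered with no content.
var errEmptyBody = errors.New("empty snapshot body")

// fetchAll calls fetch once per snapshot path. A failure for one snapshot is
// recorded and does not prevent the remaining snapshots from being fetched.
func fetchAll(fetch func(path string) ([]byte, error)) (map[string][]byte, map[string]error) {
	bodies := map[string][]byte{}
	failures := map[string]error{}
	for file, path := range snapshotPaths {
		body, err := fetch(path)
		switch {
		case err != nil:
			failures[file] = err
		case len(body) == 0:
			failures[file] = errEmptyBody
		default:
			bodies[file] = body
		}
	}
	return bodies, failures
}

// httpFetcher returns a fetch function that issues GET requests against baseURL.
func httpFetcher(baseURL string) func(path string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	return func(path string) ([]byte, error) {
		resp, err := client.Get(baseURL + path)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return nil, fmt.Errorf("GET %s: unexpected status %s", path, resp.Status)
		}
		return io.ReadAll(resp.Body)
	}
}

func main() {
	// Stand-in for a port-forwarded admin server, so the sketch runs anywhere.
	// The krt path fails on purpose to show that the other snapshots survive.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/snapshots/krt" {
			http.Error(w, "krt unavailable", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintf(w, `{"path": %q}`, r.URL.Path)
	}))
	defer srv.Close()

	outDir, err := os.MkdirTemp("", "failure-artifacts")
	if err != nil {
		fmt.Println("could not create output directory:", err)
		return
	}
	defer os.RemoveAll(outDir)

	bodies, failures := fetchAll(httpFetcher(srv.URL))
	for file, body := range bodies {
		if err := os.WriteFile(filepath.Join(outDir, file), body, 0o644); err != nil {
			failures[file] = err
		}
	}
	for file, err := range failures {
		fmt.Printf("could not collect %s: %v\n", file, err)
	}
	fmt.Printf("fetched %d snapshots, %d failures\n", len(bodies), len(failures))
}
```

Passing the fetch step in as a function keeps the collection loop independent of the transport, so the same loop could be driven by a curl-based command like the ones in the diff instead of `net/http`.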
25 changes: 25 additions & 0 deletions pkg/utils/kubeutils/kubectl/cli.go
@@ -342,3 +342,28 @@ func (c *Cli) GetContainerLogs(ctx context.Context, namespace string, name strin
 	stdout, stderr, err := c.Execute(ctx, "-n", namespace, "logs", name)
 	return stdout + stderr, err
 }
+
+// GetPodsInNsWithLabel returns the pods in the specified namespace with the specified label
+func (c *Cli) GetPodsInNsWithLabel(ctx context.Context, namespace string, label string) ([]string, error) {
+	podStdOut := bytes.NewBuffer(nil)
+	podStdErr := bytes.NewBuffer(nil)
+
+	// Fetch the name of the Gloo Gateway controller pod
+	getGlooPodNamesCmd := c.Command(ctx, "get", "pod", "-n", namespace,
+		"--selector", label, "--output", "jsonpath='{.items[*].metadata.name}'")
+	err := getGlooPodNamesCmd.WithStdout(podStdOut).WithStderr(podStdErr).Run().Cause()
+	if err != nil {
+		fmt.Printf("error running get gloo pod name command: %v\n", err)
+	}
+
+	// Clean up and check the output
+	glooPodNamesString := strings.Trim(podStdOut.String(), "'")
+	if glooPodNamesString == "" {
+		fmt.Printf("no %s pods found in namespace %s\n", label, namespace)
+		return []string{}, nil
+	}
+
+	// Split the string on whitespace to get the pod names
+	glooPodNames := strings.Fields(glooPodNamesString)
+	return glooPodNames, nil
+}
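`GetPodsInNsWithLabel` strips single quotes from the kubectl output and splits it on whitespace. The quotes most likely end up in the output because the `jsonpath='...'` argument is not passed through a shell, so kubectl treats them as literal template text. The parsing step can be sketched in isolation as below. `parsePodNames` and `podNames` are hypothetical helpers, not part of the PR. Unlike the diff, the sketch also trims surrounding whitespace, and `podNames` returns the command error instead of printing it, which is one possible design and not what the merged code does.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePodNames converts the stdout of
//   kubectl get pod --output jsonpath='{.items[*].metadata.name}'
// into a slice of pod names. It removes surrounding whitespace and the
// literal single quotes, then splits on whitespace.
func parsePodNames(stdout string) []string {
	trimmed := strings.Trim(strings.TrimSpace(stdout), "'")
	return strings.Fields(trimmed)
}

// podNames is a variant that propagates a command failure to the caller, so
// "no pods matched" and "kubectl failed" can be told apart.
func podNames(stdout string, runErr error) ([]string, error) {
	if runErr != nil {
		return nil, fmt.Errorf("listing pods: %w", runErr)
	}
	return parsePodNames(stdout), nil
}

func main() {
	fmt.Println(parsePodNames("'gloo-7d9f gloo-8c2a'")) // prints [gloo-7d9f gloo-8c2a]
	fmt.Println(len(parsePodNames("''")))               // prints 0

	names, err := podNames("'gateway-proxy-1'", nil)
	fmt.Println(names, err)
}
```

Because the function in the diff always returns a nil error, callers cannot currently distinguish an empty result from a failed kubectl call. That may be deliberate for a best-effort failure dump.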
37 changes: 19 additions & 18 deletions projects/gloo/pkg/servers/iosnapshot/history_test.go
@@ -1,4 +1,4 @@
-package iosnapshot
+package iosnapshot_test

import (
"context"
@@ -8,6 +8,7 @@ import (
gomegatypes "github.com/onsi/gomega/types"
"github.com/solo-io/gloo/pkg/schemes"
gloov1 "github.com/solo-io/gloo/projects/gloo/pkg/api/v1/kube/apis/gloo.solo.io/v1"
+"github.com/solo-io/gloo/projects/gloo/pkg/servers/iosnapshot"
apiv1beta1 "sigs.k8s.io/gateway-api/apis/v1beta1"

wellknownkube "github.com/solo-io/gloo/projects/gloo/pkg/api/v1/kube/wellknown"
@@ -52,16 +53,16 @@ var _ = Describe("History", func() {
ctx context.Context

clientBuilder *fake.ClientBuilder
-history History
+history iosnapshot.History

-historyFactorParams HistoryFactoryParameters
+historyFactorParams iosnapshot.HistoryFactoryParameters
)

BeforeEach(func() {
ctx = context.Background()
clientBuilder = fake.NewClientBuilder().WithScheme(schemes.DefaultScheme())

-historyFactorParams = HistoryFactoryParameters{
+historyFactorParams = iosnapshot.HistoryFactoryParameters{
Settings: &v1.Settings{
Metadata: &core.Metadata{
Name: "my-settings",
@@ -98,11 +99,11 @@ var _ = Describe("History", func() {
},
}

-history = NewHistory(
+history = iosnapshot.NewHistory(
historyFactorParams.Cache,
historyFactorParams.Settings,
clientBuilder.WithObjects(clientObjects...).Build(),
-append(CompleteInputSnapshotGVKs, deploymentGvk), // include the Deployment GVK
+append(iosnapshot.CompleteInputSnapshotGVKs, deploymentGvk), // include the Deployment GVK
)
})

@@ -136,15 +137,15 @@ var _ = Describe("History", func() {
},
}

-history = NewHistory(&xds.MockXdsCache{},
+history = iosnapshot.NewHistory(&xds.MockXdsCache{},
&v1.Settings{
Metadata: &core.Metadata{
Name: "my-settings",
Namespace: defaults.GlooSystem,
},
},
clientBuilder.WithObjects(clientObjects...).Build(),
-CompleteInputSnapshotGVKs, // do not include the Deployment GVK
+iosnapshot.CompleteInputSnapshotGVKs, // do not include the Deployment GVK
)
})

@@ -377,11 +378,11 @@ var _ = Describe("History", func() {
},
}

-history = NewHistory(
+history = iosnapshot.NewHistory(
historyFactorParams.Cache,
historyFactorParams.Settings,
clientBuilder.WithObjects(clientObjects...).Build(),
-CompleteInputSnapshotGVKs)
+iosnapshot.CompleteInputSnapshotGVKs)
})

Context("Kubernetes Core Resources", func() {
@@ -663,11 +664,11 @@ var _ = Describe("History", func() {
Context("GetEdgeApiSnapshot", func() {

BeforeEach(func() {
-history = NewHistory(
+history = iosnapshot.NewHistory(
historyFactorParams.Cache,
historyFactorParams.Settings,
clientBuilder.Build(), // no objects, because this API doesn't rely on the kube client
-CompleteInputSnapshotGVKs,
+iosnapshot.CompleteInputSnapshotGVKs,
)
})

@@ -758,11 +759,11 @@ var _ = Describe("History", func() {
Context("GetProxySnapshot", func() {

BeforeEach(func() {
-history = NewHistory(
+history = iosnapshot.NewHistory(
historyFactorParams.Cache,
historyFactorParams.Settings,
clientBuilder.Build(), // no objects, because this API doesn't rely on the kube client
-CompleteInputSnapshotGVKs,
+iosnapshot.CompleteInputSnapshotGVKs,
)
})

@@ -794,7 +795,7 @@ var _ = Describe("History", func() {

})

-func getInputSnapshotObjects(ctx context.Context, history History) []client.Object {
+func getInputSnapshotObjects(ctx context.Context, history iosnapshot.History) []client.Object {
snapshotResponse := history.GetInputSnapshot(ctx)
Expect(snapshotResponse.Error).NotTo(HaveOccurred())

@@ -804,7 +805,7 @@ func getInputSnapshotObjects(ctx context.Context, history History) []client.Obje
return responseObjects
}

-func getProxySnapshotResources(ctx context.Context, history History) []crdv1.Resource {
+func getProxySnapshotResources(ctx context.Context, history iosnapshot.History) []crdv1.Resource {
snapshotResponse := history.GetProxySnapshot(ctx)
Expect(snapshotResponse.Error).NotTo(HaveOccurred())

@@ -814,7 +815,7 @@ func getProxySnapshotResources(ctx context.Context, history History) []crdv1.Res
return responseObjects
}

-func getEdgeApiSnapshot(ctx context.Context, history History) *v1snap.ApiSnapshot {
+func getEdgeApiSnapshot(ctx context.Context, history iosnapshot.History) *v1snap.ApiSnapshot {
snapshotResponse := history.GetEdgeApiSnapshot(ctx)
Expect(snapshotResponse.Error).NotTo(HaveOccurred())

@@ -827,7 +828,7 @@ func getEdgeApiSnapshot(ctx context.Context, history History) *v1snap.ApiSnapsho
// setSnapshotOnHistory sets the ApiSnapshot on the history, and blocks until it has been processed
// This is a utility method to help developers write tests, without having to worry about the asynchronous
// nature of the `Set` API on the History
-func setSnapshotOnHistory(ctx context.Context, history History, snap *v1snap.ApiSnapshot) {
+func setSnapshotOnHistory(ctx context.Context, history iosnapshot.History, snap *v1snap.ApiSnapshot) {
gwSignal := &gatewayv1.Gateway{
// We append a custom Gateway to the Snapshot, and then use that object
// to verify the Snapshot has been processed
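The comment on `setSnapshotOnHistory` describes a common pattern for testing an asynchronous setter: append a known signal object to the input, then wait until that object becomes visible. The body of the helper is cut off in this view, so the actual waiting mechanism is not shown. The sketch below illustrates the general idea only, with a hypothetical `asyncStore` type and a plain polling loop. None of these names come from the Gloo codebase.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// asyncStore is a hypothetical stand-in for a type whose Set call returns
// before the new value is visible to readers.
type asyncStore struct {
	mu    sync.RWMutex
	items []string
}

// Set applies the new items on a separate goroutine after a short delay.
func (s *asyncStore) Set(items []string) {
	go func() {
		time.Sleep(20 * time.Millisecond)
		s.mu.Lock()
		defer s.mu.Unlock()
		s.items = items
	}()
}

// Contains reports whether name is currently visible in the store.
func (s *asyncStore) Contains(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, item := range s.items {
		if item == name {
			return true
		}
	}
	return false
}

// setAndWait appends a sentinel item, calls Set, and polls until the sentinel
// is visible or the timeout elapses.
func setAndWait(s *asyncStore, items []string, timeout time.Duration) error {
	const sentinel = "signal-item"
	// Copy before appending so the caller's slice is never modified.
	withSignal := append(append([]string{}, items...), sentinel)
	s.Set(withSignal)

	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if s.Contains(sentinel) {
			return nil
		}
		time.Sleep(5 * time.Millisecond)
	}
	return fmt.Errorf("items not processed within %s", timeout)
}

func main() {
	store := &asyncStore{}
	if err := setAndWait(store, []string{"gateway-a"}, time.Second); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("processed:", store.Contains("gateway-a"))
}
```

Waiting on a sentinel avoids fixed sleeps in tests: the helper returns as soon as the update is observable and fails with a clear error if it never is.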