Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve automated Slack messaging #425

Merged
merged 71 commits into from
Feb 4, 2025

Conversation

andrewmogan
Copy link
Collaborator

Last week, Eric requested (and Alessandro seconded) that our automated Slack workflow failure alerts list the jobs and steps that failed. I took this as an opportunity to overhaul how our messages are generated. The major changes are listed below.

  • Bump slack-github-action to v2, which requires some minor syntax changes in how the workflow is called.
  • Our slack-notification.yml reusable workflow now uses a gh api call to get a JSON list of failed jobs and steps.
  • Rather than choosing one of two hard-wired, generic messages, Slack messages are now generated by slack_payload_generator.py under scripts/github-ci. This script parses the gh api output to generate a JSON payload that can be parsed by slack-github-action.
  • In the case of integration test failures, slack_payload_generator.py gets a brief error summary from the junit xml files output by our pytests.

This new method allows for more flexibility and modularity when generating Slack messages, meaning future requests for changes or improvements should be easier to handle.

Note to self: remove ref: amogan/improved_slack_messaging after this is merged.

@jcfreeman2
Copy link
Collaborator

Two things:

  1. It looks like almost all our v5 integration tests are removed in this PR
  2. What does the removal of the line workflow_success: ${{ needs.publish_to_cvmfs.result == 'success' }} throughout do?

@andrewmogan
Copy link
Collaborator Author

  1. That's because I forgot to uncomment them! That's fixed now.
  2. That line was previously necessary in case the publish_to_cvmfs step was skipped due to a prior failure. The previous implementation generated one of two Slack messages based on workflow success or failure. But now, we can just use the output of gh api <etc> to check for failed jobs and steps after the workflow is finished.

jcfreeman2 pushed a commit that referenced this pull request Feb 3, 2025
@jcfreeman2
Copy link
Collaborator

OK, I ran the v5 integration test which I knew would fail, and the output to Slack looked great (2:21 PM Chicago time message in #daq-release-notifications). The result for the passing v4 integration test also looked good (2:29 PM message).

However, while playing around a bit, I uncovered something interesting: basically, if I cancel an integration test which runs using the develop branch, it registers as an error on Slack (see 2:39 PM message). However, if I cancel it while using this feature branch, it registers as a success (see 2:43 PM message). Is it easy to revert back to the previous behavior? I suppose ideally we'd even have it appear in Slack as a "run cancelled" or maybe not at all, but that would be a separately filed Issue.

@jcfreeman2
Copy link
Collaborator

I've just confirmed that with the latest commits on this branch, canceling an integration test Workflow sends a "Cancelled" message; PR is approved.

@jcfreeman2 jcfreeman2 merged commit fbaf971 into develop Feb 4, 2025
5 of 15 checks passed
@jcfreeman2 jcfreeman2 deleted the amogan/improved_slack_messaging branch February 4, 2025 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants