Prod release 2024-01-31T10:17:53 by Jean Schmidt #10

Closed
5 changes: 5 additions & 0 deletions .github/workflows/on-pr.yml
@@ -1,10 +1,15 @@
name: On PR

on:
  pull_request: {}
  push:
    branches:
      - main

permissions:
  id-token: write
  contents: read

jobs:
  tflint-plan:
    name: tflint + terraform plan
2 changes: 1 addition & 1 deletion .github/workflows/release-do-open-pr.yml
@@ -25,11 +25,11 @@ jobs:
run: pip install virtualenv

- name: Open PR
working-directory: runners
shell: bash
run: make open-rel-pr
env:
GHA_PRIVATE_KEY_DEPLOY : ${{ secrets.RELEASE_APP_PRIVATE_KEY }}
GITHUB_APP_ID: ${{ secrets.RELEASE_APP_ID }}
GITHUB_APP_INSTALLATION_ID: ${{ secrets.RELEASE_APP_INSTALLATION_ID }}
GITHUB_REPOSITORY: ${{ github.repository }}
FAST_RELEASE_FIREFIGHT: ${{ github.event.inputs.fast_release_firefight }}
1 change: 0 additions & 1 deletion .github/workflows/release-on-comment-pr.yml
@@ -23,7 +23,6 @@ jobs:
- name: React to PR comment
run: |
make COMMENTS="PROCEED_TO_VANGUARD,ABORT_DEPLOYMENT_SHUTDOWN_VANGUARD,PROCEED_TO_PRODUCTION,CLEANUP_DEPLOYMENT" LABELS="deploy-to-vanguard,abort-vanguard,deploy-to-prod,cleanup-deployment" CHECK_REMOVE_LABELS="deploy-to-canary,,deploy-to-vanguard,deploy-to-prod" CHECK_COMMENTS="Successfully deployed to canary##Successfully deployed to vanguard#Successfully deployed to production" react-pr-comment
working-directory: runners
shell: bash
env:
GHA_PRIVATE_KEY_DEPLOY : ${{ secrets.RELEASE_APP_PRIVATE_KEY }}
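
Reviewer note: the four variables passed to `react-pr-comment` above read as parallel lists (comma-separated, except `CHECK_COMMENTS`, which is `#`-separated and may contain empty slots). The `make` target itself is not part of this diff, so the following is only a minimal sketch of one plausible reading, in which the i-th comment adds the i-th label, removes the i-th `CHECK_REMOVE_LABELS` entry, and requires the i-th bot comment to be present. The helper name `resolve_comment` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values copied from the workflow step above.
COMMENTS="PROCEED_TO_VANGUARD,ABORT_DEPLOYMENT_SHUTDOWN_VANGUARD,PROCEED_TO_PRODUCTION,CLEANUP_DEPLOYMENT"
LABELS="deploy-to-vanguard,abort-vanguard,deploy-to-prod,cleanup-deployment"
CHECK_REMOVE_LABELS="deploy-to-canary,,deploy-to-vanguard,deploy-to-prod"
CHECK_COMMENTS="Successfully deployed to canary##Successfully deployed to vanguard#Successfully deployed to production"

# Split each list; a non-whitespace IFS keeps the empty middle fields.
IFS=',' read -r -a comments <<< "$COMMENTS"
IFS=',' read -r -a labels <<< "$LABELS"
IFS=',' read -r -a remove_labels <<< "$CHECK_REMOVE_LABELS"
IFS='#' read -r -a check_comments <<< "$CHECK_COMMENTS"

# Hypothetical helper (not from the repository): look up a user comment and
# print the label to add, the label to remove, and the bot comment assumed
# to be required before the transition is accepted.
resolve_comment() {
  local wanted="$1" i
  for i in "${!comments[@]}"; do
    if [[ "${comments[$i]}" == "$wanted" ]]; then
      printf 'add=%s remove=%s requires=%s\n' \
        "${labels[$i]}" "${remove_labels[$i]:-}" "${check_comments[$i]:-}"
      return 0
    fi
  done
  return 1  # unknown comment: nothing to do
}

resolve_comment "PROCEED_TO_PRODUCTION"
# prints: add=deploy-to-prod remove=deploy-to-vanguard requires=Successfully deployed to vanguard
```

Under this reading, `ABORT_DEPLOYMENT_SHUTDOWN_VANGUARD` is the only comment with empty remove/check slots, which would match it being usable at any point once vanguard is involved.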
41 changes: 2 additions & 39 deletions .github/workflows/release-on-release-label.yml
@@ -57,16 +57,15 @@ jobs:
aws-region: us-east-1

- name: Notify job started
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Starting to deploy to canary, wait for its conclusion and I'll guide you to next steps" add-comment-to-pr
env:
GHA_PRIVATE_KEY_DEPLOY : ${{ secrets.RELEASE_APP_PRIVATE_KEY }}
GITHUB_APP_ID: ${{ secrets.RELEASE_APP_ID }}
GITHUB_APP_INSTALLATION_ID: ${{ secrets.RELEASE_APP_INSTALLATION_ID }}
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Terraform Apply / ARC canary (apply-arc-canary arc-canary)
working-directory: runners
shell: bash
run: make apply-arc-canary arc-canary
env:
@@ -78,7 +77,6 @@ jobs:
TERRAFORM_EXTRAS: -auto-approve -lock-timeout=15m

- name: Notify job success
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Successfully deployed to canary, add a comment with PROCEED_TO_VANGUARD in order to proceed deploying to vanguard, or close this PR in order to abort" add-comment-to-pr
env:
@@ -103,7 +101,6 @@ jobs:
run: pip install virtualenv

- name: Notify deploy-to-canary job Failure
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Something went wrong when deploying to canary, either re-run the job or close this PR to abort the deployment process" add-comment-to-pr
env:
@@ -154,7 +151,6 @@ jobs:
aws-region: us-east-1

- name: Notify job started
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Starting to deploy to vanguard, wait for its conclusion and I'll guide you to next steps" add-comment-to-pr
env:
@@ -164,7 +160,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="PROCEED_TO_VANGUARD" wait-check-user-comment
env:
@@ -174,7 +169,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check bot comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="Successfully deployed to canary" wait-check-bot-comment
env:
@@ -184,7 +178,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Check PR approval
working-directory: runners
shell: bash
run: make wait-check-pr-approved
env:
@@ -194,7 +187,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Terraform Apply / ARC vanguard (apply-arc-vanguard arc-vanguard)
working-directory: runners
shell: bash
run: make apply-arc-vanguard arc-vanguard
env:
@@ -203,12 +195,9 @@ jobs:
GITHUB_TOKEN: ${{ secrets.LIST_PYTORCH_RUNNERS_GITHUB_TOKEN }}
KUBECONFIG: ${{ runner.temp }}/kubeconfig
NO_EKSCTL: 'true'
ROCKSET_API_KEY: ${{ secrets.ROCKSET_API_KEY }}
ROCKSET_API_SERVER: ${{ secrets.ROCKSET_API_SERVER }}
TERRAFORM_EXTRAS: -auto-approve -lock-timeout=15m

- name: Notify job success
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Successfully deployed to vanguard. In order to proceed find someone to approve this PR and then add a comment with PROCEED_TO_PRODUCTION in order to proceed deploying to production environment or ABORT_DEPLOYMENT_SHUTDOWN_VANGUARD in order to stop vanguard and close this PR" add-comment-to-pr
env:
@@ -233,7 +222,6 @@ jobs:
run: pip install virtualenv

- name: Notify deploy-to-vanguard job Failure
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Something went wrong when deploying to vanguard, either re-run the job or comment ABORT_DEPLOYMENT_SHUTDOWN_VANGUARD to revert vanguard to old state, abort the deployment process and close the PR" add-comment-to-pr
env:
@@ -284,7 +272,6 @@ jobs:
aws-region: us-east-1

- name: Notify job started
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Starting to deploy to prod, wait for its conclusion and I'll guide you to next steps" add-comment-to-pr
env:
@@ -294,7 +281,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="PROCEED_TO_PRODUCTION" wait-check-user-comment
env:
@@ -304,7 +290,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check bot comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="Successfully deployed to vanguard" wait-check-bot-comment
env:
@@ -314,21 +299,17 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Terraform Apply / ARC prod (arc-prod)
working-directory: runners
shell: bash
run: make apply-arc-prod arc-prod
run: make apply arc-prod
env:
GHA_PRIVATE_KEY_CANARY: ${{ secrets.GHA_PRIVATE_KEY_CANARY }}
GHA_PRIVATE_KEY: ${{ secrets.GHA_PRIVATE_KEY }}
GITHUB_TOKEN: ${{ secrets.LIST_PYTORCH_RUNNERS_GITHUB_TOKEN }}
KUBECONFIG: ${{ runner.temp }}/kubeconfig
NO_EKSCTL: 'true'
ROCKSET_API_KEY: ${{ secrets.ROCKSET_API_KEY }}
ROCKSET_API_SERVER: ${{ secrets.ROCKSET_API_SERVER }}
TERRAFORM_EXTRAS: -auto-approve -lock-timeout=15m

- name: Notify job success
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Successfully deployed to production, add a comment with CLEANUP_DEPLOYMENT in order to merge this PR and stop vanguard" add-comment-to-pr
env:
@@ -353,7 +334,6 @@ jobs:
run: pip install virtualenv

- name: Notify deploy-to-prod job Failure
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Something went wrong when deploying to production, re-run the given job. If it does not work, manual action is required" add-comment-to-pr
env:
@@ -406,7 +386,6 @@ jobs:
aws-region: us-east-1

- name: Notify job started
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Starting to revert vanguard to old state and shut it down, wait for its conclusion and I'll guide you to next steps" add-comment-to-pr
env:
@@ -416,7 +395,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="ABORT_DEPLOYMENT_SHUTDOWN_VANGUARD" wait-check-user-comment
env:
@@ -426,7 +404,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Terraform Apply / ARC vanguard OFF (apply-arc-vanguard arc-vanguard-off)
working-directory: runners
shell: bash
run: make apply-arc-vanguard arc-vanguard-off
env:
@@ -435,12 +412,9 @@ jobs:
GITHUB_TOKEN: ${{ secrets.LIST_PYTORCH_RUNNERS_GITHUB_TOKEN }}
KUBECONFIG: ${{ runner.temp }}/kubeconfig
NO_EKSCTL: 'true'
ROCKSET_API_KEY: ${{ secrets.ROCKSET_API_KEY }}
ROCKSET_API_SERVER: ${{ secrets.ROCKSET_API_SERVER }}
TERRAFORM_EXTRAS: -auto-approve -lock-timeout=15m

- name: Notify job success
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Successfully reverted vanguard to current prod_live branch state and disabled it" add-comment-to-pr
env:
@@ -449,7 +423,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Close PR
working-directory: runners
shell: bash
run: make close-pr
env:
@@ -474,7 +447,6 @@ jobs:
run: pip install virtualenv

- name: Notify abort-vanguard job Failure
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Something went wrong when restoring vanguard state, THIS IS A MAJOR ISSUE, firefight starts **NOW**" add-comment-to-pr
env:
@@ -525,7 +497,6 @@ jobs:
aws-region: us-east-1

- name: Notify job started
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Finishing deployment, by shutting down vanguard and merging this PR" add-comment-to-pr
env:
@@ -535,7 +506,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="CLEANUP_DEPLOYMENT" wait-check-user-comment
env:
@@ -545,7 +515,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Double-check bot comment added
working-directory: runners
shell: bash
run: make WAIT_COMMENT="Successfully deployed to production" wait-check-bot-comment
env:
@@ -555,7 +524,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Terraform Apply / ARC vanguard OFF (arc-vanguard-off)
working-directory: runners
shell: bash
run: make arc-vanguard-off
env:
@@ -564,12 +532,9 @@ jobs:
GITHUB_TOKEN: ${{ secrets.LIST_PYTORCH_RUNNERS_GITHUB_TOKEN }}
KUBECONFIG: ${{ runner.temp }}/kubeconfig
NO_EKSCTL: 'true'
ROCKSET_API_KEY: ${{ secrets.ROCKSET_API_KEY }}
ROCKSET_API_SERVER: ${{ secrets.ROCKSET_API_SERVER }}
TERRAFORM_EXTRAS: -auto-approve -lock-timeout=15m

- name: Notify job success
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Successfully stopped vanguard" add-comment-to-pr
env:
@@ -579,7 +544,6 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}

- name: Merge PR
working-directory: runners
shell: bash
run: make merge-pr
env:
@@ -604,7 +568,6 @@ jobs:
run: pip install virtualenv

- name: Notify cleanup-deployment job Failure
working-directory: runners
shell: bash
run: make COMMENT_TO_ADD="Something went wrong when stopping vanguard and merging the PR, please take manual action from now on to stabilize the status of the system" add-comment-to-pr
env: