Introducing break the glass as a principle #40

Closed
wants to merge 7 commits into from

@grmhay commented Oct 20, 2021

We (representing Morgan Stanley) believe the community is leaving one situation for each implementer to overcome on their own: the source of truth for desired state (e.g. github.com, or a Git-equivalent that an enterprise may run, recognizing that there are both centralized and decentralized approaches to storing desired state) is less available than users' expected SLA for making configuration changes.
Put succinctly: if (in our example) GitHub is unavailable and you want to make changes to your System State, there should be one approach, and a set of tooling, that allows reconciliation after the fact. A further example is disconnected systems (e.g. Kubernetes on a ship), where the system may be cut off from the store where the desired state resides. How, then, would the system state be updated in an emergency and later reconciled with the desired state?
Leaving this to implementers both harms adoption of GitOps and is inefficient, as I believe we share a common challenge that we can solve once within the project.
The first step, as this project has so well established, is a glossary of terms that lets us describe the problem, plus a draft principle to add. I have included both in this PR.

Signed-off-by: Graeme Hay <grmhay@gmail.com>
@scottrigby (Member)

@grmhay thanks for this PR! To get more discussion on this, you may want to:

  1. add this to the next WG meeting agenda?
  2. threaded discussions may help. The place to start would be here: https://github.com/open-gitops/project/discussions
  3. to help put this into context with previous/existing discussion, add a list of links to previous PRs and comments as needed. I could help with that if it's useful

@scottrigby (Member)

My gut response is that perhaps principle 3 could somehow state that agents should be able to pull the manifests from the source WHENEVER NEEDED (not just when a CI job runs or, as you said, only while your source of truth is up).

We removed the "break glass" glossary item temporarily, because:

  • The principles or other glossary items were revised and no longer mention this, so it was an orphaned glossary item
  • Less importantly (but still a factor): because "break glass" starts with "B", it was the first thing people read when opening the glossary, and it tells them when NOT to do GitOps (or when to pause, etc.). Even though it should be mentioned somewhere (perhaps best practices?), leading with it seemed like not putting our best foot forward

What about adding something like "whenever needed" to principle 3?

3. **Pulled Automatically**
-    Software agents automatically pull the desired state declarations from the source.
+    Software agents automatically pull the desired state declarations from the source <whenever needed>.

Then link "whenever needed" to a glossary item about source uptime, which could then link to your "Intermediate State Store" item and perhaps some version of the former "break glass" glossary item?

@christianh814 (Member)

My two cents: I don't think "break glass" is something that should be a principle.

This is something that can be a "best practice", an "operating model", or a "white paper".

Break glass is too specific for these principles, which are meant to be open-ended.

@todaywasawesome (Member)

Ah really interesting idea @grmhay! You make some really good points.

A couple of things:

  1. The format of the principles is "The desired state of a GitOps managed system must be:" so the formatting would need to fit within that framework. Something like "Always appliable" or something that would match.
  2. The basic concern is "what if I need to deploy something and GitOps isn't working?", in which case I need to be able to use manual processes, potentially outside of Git, to make things work again. I think we can all understand that this might happen, but is it something that should be covered by GitOps itself? If you're having to break glass every week, then we may not really have achieved GitOps, right? I could
  3. I think we should develop the idea of a break-glass policy as a whitepaper within the documents repo in the meantime.

@scottrigby (Member)

Also cross-linking older discussion open-gitops/project#86

@christianh814 (Member)

Just revisiting this. @grmhay Would you want to close this and open a "best practice" or "white paper" PR?

@grmhay (Author) commented Feb 23, 2022 via email

@williamcaban

Any updates or progress with this?

@scottrigby (Member)

Asked again in this Slack thread.

Comment on lines +43 to +45
- ## Intermediate State Store
A system for storing a copy of the declarations that are mastered in the State Store. This system's purpose is to bridge the gap between the availability of the State Store and the availability users expect when making configuration changes to the Software System. The Intermediate State Store will offer availability equal to, or close enough to, users' expectations for updating configuration in the Software System.
Where an Intermediate State Store is used, Reconciliation is used between the State Store and the Intermediate State Store and then again between the Intermediate State Store and the Software System.
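To make the two reconciliation hops concrete, here is a minimal Python sketch. The `StateStore` and `IntermediateStateStore` classes are hypothetical in-memory stand-ins, not the API of any existing GitOps tool. The point it illustrates: when the State Store is unreachable, the Intermediate State Store keeps serving its last good copy, so the agent can still reconcile the Software System.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the (hypothetical) State Store cannot be reached."""


@dataclass
class StateStore:
    # Hypothetical stand-in for the authoritative State Store, e.g. a Git repo.
    declarations: dict
    online: bool = True

    def read(self) -> dict:
        if not self.online:
            raise Unavailable("state store unreachable")
        return dict(self.declarations)


@dataclass
class IntermediateStateStore:
    # Last copy of the declarations successfully pulled from the State Store.
    copy: dict = field(default_factory=dict)

    def sync_from(self, source: StateStore) -> bool:
        """Hop 1: State Store -> Intermediate State Store."""
        try:
            self.copy = source.read()
            return True
        except Unavailable:
            return False  # keep serving the last good copy

    def read(self) -> dict:
        return dict(self.copy)


def reconcile(system: dict, iss: IntermediateStateStore) -> dict:
    """Hop 2: Intermediate State Store -> Software System (desired state wins)."""
    desired = iss.read()
    system.clear()
    system.update(desired)
    return system


source = StateStore({"replicas": 3})
iss = IntermediateStateStore()
system: dict = {}

iss.sync_from(source)
reconcile(system, iss)  # system now mirrors {"replicas": 3}

source.online = False   # State Store outage
iss.sync_from(source)   # fails; the last good copy is kept
reconcile(system, iss)  # still reconciles from the Intermediate State Store
```

The agent only ever talks to the Intermediate State Store, so its availability, not the State Store's, bounds how quickly configuration changes can be applied.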
Member

I think this part has become (and will become even) more relevant as people adopt OCI for things other than container images. Or, as @monadic put it:

GitOps is a transaction system with a Git backend and OCI cache

Are we moving into leader election / consensus territory?

Member

Where do you see leader election or consensus playing a role here? There is exactly one SSoT state store and N intermediate state stores. In simple setups you'll have exactly one intermediate state store. In more complex scenarios you might have more than one (e.g. one per environment). Synchronization always happens unidirectionally, from the state store into the intermediate state stores. The intermediate state stores are independent of each other and don't need to be synchronized laterally.

                           +======+         +==============+
                     +---->| iss1 |<--------| gitops agent |
+=============+      |     +======+         +==============+
| state store |------+            
+=============+      |     +======+         +==============+
                     +---->| iss2 |<--------| gitops agent |
                     |     +======+         +==============+
                     |
                     |     +======+         +==============+
                     +---->| iss3 |<--------| gitops agent |
                           +======+         +==============+
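The one-way fan-out in the diagram can be sketched in a few lines of Python. This is a hypothetical illustration using plain dicts as stores (`fan_out` and the `unreachable` parameter are invented for the example). It shows why no leader election or consensus is needed: data only flows from the state store outwards, and an intermediate store that misses a sync simply lags without affecting the others.

```python
from typing import Collection, Dict


def fan_out(
    state_store: Dict[str, str],
    intermediates: Dict[str, Dict[str, str]],
    unreachable: Collection[str] = (),
) -> Dict[str, bool]:
    """Copy the state store's declarations into each intermediate store.

    Data only flows state store -> intermediate store. Intermediates never
    write back and never sync with each other, so there is nothing to elect
    a leader for or reach consensus on.
    """
    results: Dict[str, bool] = {}
    for name, iss in intermediates.items():
        if name in unreachable:
            results[name] = False  # this one lags behind; the others are unaffected
            continue
        iss.clear()
        iss.update(state_store)  # one-way copy
        results[name] = True
    return results


state_store = {"app.yaml": "v2"}
intermediates = {"iss1": {"app.yaml": "v1"}, "iss2": {"app.yaml": "v1"}, "iss3": {}}
fan_out(state_store, intermediates, unreachable={"iss2"})
```

Here `iss1` and `iss3` end up at `v2` while `iss2` keeps `v1` until the next successful sync.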

Member

IMHO, what's important to point out here is which requirements the intermediate state store has to satisfy for this whole setup to satisfy the GitOps principles:

  1. It must version all the artifacts
  2. Artifacts, once stored, must be immutable
  3. It must retain a version history
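A minimal Python sketch of an intermediate state store that meets those three requirements. `VersionedStore` is hypothetical, and using a SHA-256 content digest as the version is an assumption made for the example (it mirrors how content-addressed stores such as Git or OCI registries identify artifacts, but is not their actual API).

```python
import hashlib
import json
from types import MappingProxyType
from typing import Dict, List, Mapping


class VersionedStore:
    """Hypothetical intermediate state store meeting the three requirements."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Mapping[str, str]] = {}
        self._history: List[str] = []  # every version ever pushed, oldest first

    def put(self, declarations: Dict[str, str]) -> str:
        # 1. Version every artifact: here the version is a digest of its content.
        payload = json.dumps(declarations, sort_keys=True).encode()
        version = hashlib.sha256(payload).hexdigest()
        # 2. Immutable once stored: content-addressed entries are never replaced,
        #    and readers only get a read-only view.
        if version not in self._artifacts:
            self._artifacts[version] = MappingProxyType(dict(declarations))
        # 3. Retain the version history (append-only, never truncated).
        self._history.append(version)
        return version

    def get(self, version: str) -> Mapping[str, str]:
        return self._artifacts[version]

    def history(self) -> List[str]:
        return list(self._history)


store = VersionedStore()
v1 = store.put({"replicas": "3"})
v2 = store.put({"replicas": "5"})
```

Because the version is derived from the content, "changing" an artifact necessarily produces a new version, and rolling back is just reading an older entry from the retained history.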

- ## Feedback

Open GitOps follows [control-theory](https://en.wikipedia.org/wiki/Control_theory) and operates in a closed loop. In control theory, feedback represents how previous attempts to apply a desired state have affected the actual state. For example, if the desired state requires more resources than exist in a system, the software agent may attempt to add resources, automatically roll back to a previous version, or send alerts to human operators.
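A toy Python sketch of one pass of that closed loop, covering the three responses in the example (add resources, roll back, alert). `Cluster`, `Agent`, and the integer "resource units" are hypothetical simplifications for illustration only; a real agent would observe the actual state of the system rather than a single counter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    capacity: int      # resource units currently available (hypothetical)
    max_capacity: int  # ceiling the agent may scale the cluster up to
    running: int = 0   # units in use by the currently applied version


@dataclass
class Agent:
    alerts: List[str] = field(default_factory=list)

    def reconcile(self, desired: int, previous: int, cluster: Cluster) -> str:
        """One pass of the closed loop: compare desired vs. available, then act."""
        if desired <= cluster.capacity:
            cluster.running = desired
            return "applied"
        if desired <= cluster.max_capacity:
            # Feedback says resources are short: try adding them first.
            cluster.capacity = desired
            cluster.running = desired
            return "scaled-and-applied"
        # Cannot satisfy the desired state: roll back and tell a human.
        cluster.running = previous
        self.alerts.append(
            f"desired={desired} exceeds max_capacity={cluster.max_capacity}; "
            f"rolled back to {previous}"
        )
        return "rolled-back"


cluster = Cluster(capacity=4, max_capacity=8)
agent = Agent()
agent.reconcile(3, 0, cluster)   # "applied"
agent.reconcile(6, 3, cluster)   # "scaled-and-applied"
agent.reconcile(10, 6, cluster)  # "rolled-back", plus one alert
```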

- ## Break the Glass
The process of editing the Intermediate State Store directly in the event that a configuration update needs to be made to the Software System but the State Store is unavailable.
Member

I suppose an example of why the state store might be unavailable would help readers understand the situation better. Also worth adding: a note that this should be a very rare exception and that proper authorization (e.g. multi-party authorization) must be in place.
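A hypothetical Python sketch of a guarded break-glass edit along the lines suggested above: it is only allowed while the State Store is actually down, requires a minimum number of distinct approvers (multi-party authorization), writes an audit entry, and afterwards reports what must be reconciled back into the State Store. All names (`break_the_glass`, `AuditLog`, `pending_reconciliation`, the approver names) are invented for the example, with plain dicts standing in for the stores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class BreakGlassDenied(Exception):
    pass


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)


def break_the_glass(
    iss: Dict[str, str],
    change: Dict[str, str],
    approvers: Set[str],
    required_approvals: int,
    state_store_online: bool,
    log: AuditLog,
) -> None:
    """Edit the Intermediate State Store directly, but only as a guarded exception."""
    if state_store_online:
        raise BreakGlassDenied("state store is available: use the normal workflow")
    if len(approvers) < required_approvals:  # multi-party authorization
        raise BreakGlassDenied(f"need {required_approvals} approvers, got {len(approvers)}")
    iss.update(change)
    log.entries.append({"change": dict(change), "approvers": sorted(approvers)})


def pending_reconciliation(state_store: Dict[str, str], iss: Dict[str, str]) -> Dict[str, str]:
    """Once the State Store is back: ISS entries that differ from the State Store.

    These must be committed to the State Store (or reverted) before the next
    one-way sync overwrites them. Deleted keys are not tracked in this sketch.
    """
    return {k: v for k, v in iss.items() if state_store.get(k) != v}


state_store = {"replicas": "3"}
iss = dict(state_store)
log = AuditLog()
break_the_glass(iss, {"replicas": "6"}, {"alice", "bob"}, 2, False, log)
pending_reconciliation(state_store, iss)  # {"replicas": "6"}
```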

Can the state store be anything? S3? Cassandra?

@scottrigby (Member)

There has been recent (and excellent!) discussion in the reviews and comments on this PR. We have a discussion item for this here: open-gitops/project#86. Could one of you please summarize the conversation above and move it into the discussion linked here? That way we can keep this conversation alive even though I'm now closing this PR.

BTW, we'll be using those discussion topics as the basis for the KubeCon EU OpenGitOps project meeting in Amsterdam.

@scottrigby closed this Apr 7, 2023
7 participants