-
Could you mark what is a phase and what is a state?
-
In Argus we currently have no phases and two boolean states: open and acked. "open=True" means that neither the source nor a human has closed the incident; "acked=True" means that at least one ack exists that has not expired. A stateful incident when created has an Event.INCIDENT_START,
Whether something is shown in the UI depends on the currently set incident filter. "pending" (name pending) could be implemented in several ways (see #804 and #805). IIRC, while an incident is "pending" you need to be able to take it out of pending remotely, so a flag stored in the database is probably needed. With this flag in one position, deletion is allowed and closing/acking is not. In the other, deletion is not allowed and closing/acking is OK. We can add such a flag without changing anything in our existing process, by never using the flag in its "pending" position.
The problem is "clear". We do not combine incidents into a single incident today, and we do not delete incidents; our plan was to hide incidents, not delete them. Our "closed" covers your "clear". The user has full control over what is shown in the UI through the filters. There are multiple groups using Argus today and I doubt they all use the exact same process, which means we have to be backwards compatible: everything needs to work without "clear" being used.
I think, for the time being, it is best if "phase" exists only in Incident.metadata, so that I (we) can think of another solution in the meantime.
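To make the flag idea concrete, here is a minimal sketch in plain Python (not Django, and not Argus's actual models). The class names `Ack` and `IncidentSketch`, the `pending` attribute, and the helper methods are all hypothetical; treating an ack without an expiration as never expiring is also an assumption of this sketch. It only illustrates the rules described above: "acked" is derived from unexpired acks, the flag switches between "deletion allowed" and "closing/acking allowed", and "phase" lives only in the metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Ack:
    # Assumption: expiration=None means the ack never expires.
    expiration: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        return self.expiration is None or self.expiration > now


@dataclass
class IncidentSketch:
    open: bool = True          # neither the source nor a human has closed it
    pending: bool = False      # hypothetical flag; the name is still undecided
    acks: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # "phase" would live here

    def acked(self, now: Optional[datetime] = None) -> bool:
        # acked is True when at least one ack has not expired
        now = now or datetime.now(timezone.utc)
        return any(ack.is_active(now) for ack in self.acks)

    def may_delete(self) -> bool:
        # Flag in the "pending" position: deletion is allowed.
        return self.pending

    def may_close_or_ack(self) -> bool:
        # Flag in the other position: closing/acking is allowed.
        return not self.pending


# Usage: an existing process never sets the flag, so nothing changes for it.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
classic = IncidentSketch(acks=[Ack(expiration=now + timedelta(hours=1))])
from_correlator = IncidentSketch(pending=True, metadata={"phase": "PENDING"})
```

Because `pending` defaults to `False`, groups that never use the flag keep today's behaviour, which is the backwards-compatibility property asked for above.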
-
The exact details should not matter to the GUI/Argus. Some of the states, phases (and we also have statuses, to keep it interesting...) or combinations are only relevant for the correlator. The correlator and the incident UI have evolved separately, resulting in a mixed bag of phases/states/statuses that may or may not be relevant to Argus. I've tried to limit the summary to what is relevant to Argus / an incident UI, but for completeness' sake I can give the following. Yes, this is confusing...
First I'd like to introduce the concept of an Alarm. An Alarm is what the correlator calls an Incident internally. The correlator has a collection of active Alarms and no concept of historical Alarms/Incidents. Whenever I mention an Alarm, this relates to the correlator; Incidents relate to the UI / Argus.
We have 3 phases (for Alarms, of which PENDING and (to a lesser extent) FINALIZED are also relevant for Incidents)
We have 2 states that are Correlator/Alarms only
We have 3 ui statuses (for Incidents):
Which results in the following valid combinations
-
We (Geant) have a number of phases and states that an incident can go through. These serve a (partly) similar purpose but are separate, due to "historical reasons". Let's call them phases for now. From a UI perspective, we have the following phases:
Pending: The incident has just been recorded and we're still in the process of root cause analysis; multiple SNMP traps may come in that are tied together.
Finalized: After some period (at least 60 seconds, but possibly longer if the correlator decides so), an incident that is still active finalizes. The following happens:
Clear: After an incident has become inactive (i.e. the network problem has disappeared), it goes to the clear phase. It is still shown in the UI until a user actively closes it.
Closed: An incident is closed and is no longer visible in the default view of the UI. An incident may close from the following triggers:
I think I covered most/everything wrt phases and states, but this is a relatively complex topic and I may have missed something 😅
Do you think you could use this as a basis on how Argus might want to work with phases?
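As a starting point for that discussion, the four UI phases above could be sketched roughly like this. This is plain Python with hypothetical names (`Phase`, `next_phase`, `visible_by_default`, `MIN_PENDING_SECONDS`), not correlator or Argus code. It only encodes the transitions spelled out in the descriptions; anything not described (for example whether a pending incident can go straight to clear, or the other close triggers) is deliberately left out.

```python
from enum import Enum


class Phase(Enum):
    PENDING = "pending"
    FINALIZED = "finalized"
    CLEAR = "clear"
    CLOSED = "closed"


# "At least 60 seconds"; the correlator may decide to wait longer.
MIN_PENDING_SECONDS = 60


def next_phase(phase, *, active, age_seconds, user_closed=False):
    """Return the phase after one evaluation step (sketch only)."""
    # Pending -> Finalized: still active after the minimum pending period.
    if phase is Phase.PENDING and active and age_seconds >= MIN_PENDING_SECONDS:
        return Phase.FINALIZED
    # Finalized -> Clear: the network problem has disappeared.
    if phase is Phase.FINALIZED and not active:
        return Phase.CLEAR
    # Clear -> Closed: a user actively closes the incident.
    if phase is Phase.CLEAR and user_closed:
        return Phase.CLOSED
    # Any combination not described above leaves the phase unchanged.
    return phase


def visible_by_default(phase):
    # Closed incidents are hidden in the default view; clear ones are still shown.
    return phase is not Phase.CLOSED
```

For example, `next_phase(Phase.PENDING, active=True, age_seconds=30)` stays in `Phase.PENDING`, while the same call with `age_seconds=60` moves to `Phase.FINALIZED`.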