Tuesday, February 16, 2010

Bringing monitors back to a healthy state: the difference between ‘Reset Health’ and ‘Recalculate Health’.

As stated in some postings of mine, when an Alert is generated by a Monitor it is NOT sufficient to close that particular Alert and be done with it. After all, a Monitor is a ‘State Machine’ which can have three different states (Healthy, Warning or Critical).

When a Monitor changes State (from Healthy to Critical, for instance) it raises an Alert. But closing that Alert does not set the Monitor back to its Healthy State. So when the same Critical condition occurs again, an Alert will NOT be raised: the Monitor is still in the Critical State which raised the old (and already closed) Alert, so there is no State Change.
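To make this behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the idea and not how Operations Manager is implemented; the class `ToyMonitor` and its methods are made up for illustration, and the model assumes that an Alert is raised on every change to a non-Healthy State.

```python
from dataclasses import dataclass, field

HEALTHY, WARNING, CRITICAL = "Healthy", "Warning", "Critical"


@dataclass
class ToyMonitor:
    """Hypothetical stand-in for a Monitor: a state machine that alerts on change."""
    state: str = HEALTHY
    alerts: list = field(default_factory=list)

    def observe(self, new_state):
        # An Alert is only raised when the State actually changes
        # to something other than Healthy.
        if new_state != self.state:
            self.state = new_state
            if new_state != HEALTHY:
                self.alerts.append({"state": new_state, "open": True})

    def close_alerts(self):
        # Closing Alerts does not touch the Monitor's State.
        for alert in self.alerts:
            alert["open"] = False


monitor = ToyMonitor()
monitor.observe(CRITICAL)   # Healthy -> Critical: first Alert
monitor.close_alerts()      # Alert closed, Monitor stays Critical
monitor.observe(CRITICAL)   # same condition again: no State Change, no Alert
print(len(monitor.alerts), monitor.state)
```

In this model the second Critical condition goes unnoticed because the Monitor never left the Critical State.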

So the Health Explorer is needed to set the related Monitor back to its Healthy State, so that a new Alert is raised when the next Critical or Warning situation (State Change) occurs. The Health Explorer offers two options for this: Reset Health and Recalculate Health.

Even though both options can have the same end result (a Monitor which is Healthy again), there are certain things to reckon with:

Reset Health
This option always works. It resets the Health of the Monitor back to Healthy, no matter what the current state is. So even when the Monitor is already in a Healthy State, it will be forced through a reset and end up in a Healthy State.

Recalculate Health
This option will not always work; it depends on the Monitor involved. When the Monitor has on-demand detection, the option to recalculate its Health will work.

On-demand detection, you ask?
Good question. Glad you asked. The ‘on-demand detection’ option means that the Monitor is allowed to recalculate its Health at any moment and does NOT depend on the interval set for these calculations to run.
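The difference between the two options can be sketched with another small toy model in Python. Again, this is an illustration under assumptions and not the product's real logic: `ToyMonitor`, `probe` and the method names are hypothetical, and a Recalculate on a Monitor without on-demand detection is modelled as simply doing nothing.

```python
class ToyMonitor:
    """Hypothetical Monitor with an optional on-demand detection probe."""

    def __init__(self, state, on_demand_detection, probe):
        self.state = state
        self.on_demand_detection = on_demand_detection
        self.probe = probe  # callable returning the State measured right now

    def reset_health(self):
        # Always works: forces the Monitor to Healthy, whatever the State is.
        self.state = "Healthy"
        return True

    def recalculate_health(self):
        # Only works with on-demand detection: the Monitor measures again
        # immediately instead of waiting for its next scheduled interval.
        if not self.on_demand_detection:
            return False
        self.state = self.probe()
        return True


def cause_is_solved():
    # Assumed probe result once the underlying problem has been fixed.
    return "Healthy"


scheduled_only = ToyMonitor("Critical", False, cause_is_solved)
with_on_demand = ToyMonitor("Critical", True, cause_is_solved)

print(scheduled_only.recalculate_health(), scheduled_only.state)
print(with_on_demand.recalculate_health(), with_on_demand.state)
print(scheduled_only.reset_health(), scheduled_only.state)
```

Note that in this sketch a Recalculate reflects whatever the probe measures, so it only ends in a Healthy State when the cause really has been solved, while a Reset forces Healthy regardless.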

How to differentiate between such Monitors you ask?
Well, you could take a look at the XML of that particular Monitor. But that is not very user friendly. Marius Sutara has written a blog posting about a rule of thumb. Unfortunately, the video he refers to has been withdrawn. Basically, it comes down to checking the initial State Change of the Monitor in the Health Explorer. When the Monitor goes from Unmonitored to any State but Healthy, OR it goes from Unmonitored to Healthy AND the initial State Change Context isn’t like ‘The monitor has been initialized for the first time or it has exited maintenance mode’, then chances are very high that on-demand detection is present.
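Written out as a small Python function, the rule of thumb looks roughly like this. It is a sketch only: the function name and its two parameters are made up, and the values are assumed to be read by hand from the first State Change shown in the Health Explorer.

```python
# Context text shown for a Monitor that was merely initialized.
INIT_CONTEXT = (
    "The monitor has been initialized for the first time "
    "or it has exited maintenance mode"
)


def likely_on_demand(first_new_state, first_context):
    """Rule of thumb: is on-demand detection probably present?

    first_new_state: State the Monitor went to from Unmonitored.
    first_context:   Context text of that initial State Change.
    """
    # Unmonitored -> anything but Healthy: the Monitor must have measured.
    if first_new_state != "Healthy":
        return True
    # Unmonitored -> Healthy: only a hint when the Context is not
    # the generic initialization message.
    return INIT_CONTEXT not in first_context


print(likely_on_demand("Critical", ""))
print(likely_on_demand("Healthy", INIT_CONTEXT))
print(likely_on_demand("Healthy", "some measured value"))
```

The outcome is a likelihood and not a guarantee; the XML of the Monitor remains the definitive source.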

OK. Now what?
Well, that depends on you. A Reset will always work. So when an Alert has been raised by a Monitor, the cause has been solved and the Alert has been closed, the related Monitor can be reset. Or, when the Monitor uses on-demand detection, a Recalculate can be used as well.

Used sources
As stated before, Marius Sutara has blogged about it, to be found here. And Boris Yanushpolsky has written a good posting about on-demand-detection as well, to be found here.
