Wednesday, December 2, 2009

OpsMgr. Where the technology ends and the organization starts. Part III: Beyond the Alert

Postings in the same series:
Part   I:  Introduction.
Part  II: I Think I Saw An Alert.

As announced in the last posting in this series (I Think I Saw an Alert), this article is about handling Alerts: setting up a Notification Model, configuring alert forwarding to remote systems such as an Enterprise Management System (EMS) or a service desk system, and closing an Alert. When all this is done right, there is much to be gained.

Setting up a Notification Model
I will not dive into the technical details here. I have already written an article about them (scratching the surface) based on OpsMgr R2, which can be found here. However, a more generic approach can add value.

For instance, all Alerts with the Priority High (out of the box, most Alerts coming from Microsoft MPs have the Priority Medium) AND the Severity Critical can be sent as a mail message to a mailbox owned by the IT Operations Department. This way the department knows what is going on without that mailbox being flooded.
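The subscription criterion above boils down to a simple filter. The Python sketch below is illustrative only: the `Alert` class, the `should_notify` function and the sample Alert names are hypothetical and not part of any OpsMgr API; the Priority and Severity levels mirror the ones shown in the OpsMgr Console.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class Severity(IntEnum):
    INFORMATION = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Alert:
    name: str
    priority: Priority
    severity: Severity


def should_notify(alert: Alert) -> bool:
    """Hypothetical subscription criterion: Priority High AND Severity Critical."""
    return alert.priority == Priority.HIGH and alert.severity == Severity.CRITICAL


alerts = [
    Alert("Disk full", Priority.HIGH, Severity.CRITICAL),
    Alert("Service stopped", Priority.MEDIUM, Severity.CRITICAL),  # Medium: a common MP default
    Alert("Backup warning", Priority.HIGH, Severity.WARNING),
]

# Only Alerts matching both conditions end up in the IT Operations mailbox.
to_mailbox = [a.name for a in alerts if should_notify(a)]
print(to_mailbox)  # ['Disk full']
```

Requiring both conditions is what keeps the mailbox quiet: a Critical Alert with the default Medium Priority does not get through.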

Less is More…
Of course, additional filtering can be applied as well, and the more systems are monitored, the more filtering is needed.

Advice 01:
Filter the notifications to such an extent that the number of mail messages drops. This greatly enhances the quality of the mail messages. Less (mail messages) is more (less noise in the mail messages that do get out).

Advice 02:
It takes time to set up a well-working Notification Model, and every now and then it needs attention and some additional tweaking and tuning. So the MP containing the subscriptions (Notifications Internal Library) needs to be covered by a solid backup plan. When the Notification Model is adjusted substantially, version control is needed as well. And: DOCUMENT the differences between the versions of this MP.
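Documenting the differences becomes much easier when each exported version of the MP is kept and compared. A minimal sketch, assuming each version of the MP has already been exported to a text (XML) file: `describe_changes` is a hypothetical helper built on Python's standard `difflib`, and the sample lines are placeholders, not the real MP schema.

```python
import difflib
from typing import List


def describe_changes(old_text: str, new_text: str,
                     old_label: str = "v1", new_label: str = "v2") -> List[str]:
    """Return a unified diff between two exported versions of the MP."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))


# Placeholder lines standing in for an exported MP; NOT the real MP schema.
v1 = "subscription: IT Ops mailbox\nfilter: Priority=High AND Severity=Critical"
v2 = "subscription: IT Ops mailbox\nfilter: Priority=High AND Severity=Critical AND Group=Production"

for line in describe_changes(v1, v2):
    print(line)
```

The diff output can be pasted straight into the change log that accompanies each version.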

Alert forwarding to an Enterprise Management System (EMS) / Service Desk System
With a Connector, OpsMgr can forward Alerts to an EMS or a Service Desk System. Free (!) Connectors are available for:

  • IBM Tivoli Enterprise Console
  • HP OpenView Operations for Unix
  • HP OpenView Operations for Windows
  • BMC Remedy ARS

Besides these Connectors, a Universal Connector is also available. The R2 Connectors can be downloaded from here; the TechNet documentation can be found here.

Connectors aren’t very hard to configure, but take your time to read the documentation. Basically they work like a Notification Model, except that the Alerts aren’t sent out as mail messages but to HP OVO or IBM TEC, for instance. The most important part is mapping the Severity, Alert fields and Resolution State of OpsMgr to those of the remote system.
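A minimal sketch of such a mapping, assuming hypothetical remote-side values: every Connector comes with its own mapping configuration, so check its documentation for the real one. New (0) and Closed (255) are OpsMgr's built-in Resolution States; 42 is a made-up stand-in for a custom state like the ones described in the 2nd posting.

```python
# Hypothetical remote-side values; every Connector defines its own mapping.
SEVERITY_MAP = {
    "Information": "minor",
    "Warning": "major",
    "Critical": "critical",
}

# New (0) and Closed (255) are built-in OpsMgr Resolution States;
# 42 is a hypothetical custom state such as "Acknowledged".
RESOLUTION_STATE_MAP = {
    0: "open",
    42: "acknowledged",
    255: "closed",
}


def map_alert(severity: str, resolution_state: int) -> dict:
    """Translate OpsMgr Alert fields into the remote system's vocabulary.

    Unmapped values raise KeyError, so gaps in the mapping surface while
    testing instead of turning into wrong events on the remote system.
    """
    return {
        "severity": SEVERITY_MAP[severity],
        "status": RESOLUTION_STATE_MAP[resolution_state],
    }


print(map_alert("Critical", 0))  # {'severity': 'critical', 'status': 'open'}
```

Failing loudly on an unmapped value is a deliberate choice: a silent default would hide exactly the mapping errors you want to catch early.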

Advice 03:
Start small when configuring a Connector. Treat it like a faucet: don’t turn it wide open, but start on a modest scale. Keep a keen eye on the remote system to see what is coming in AND how. Is the mapping done right? Tune and tweak it and go from there.
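Opening the faucet step by step can be expressed as a widening scope. The sketch below assumes a hypothetical three-stage rollout; the server names, stage numbers and the `forward` function are made up for illustration, and in practice the scope is configured in the Connector itself.

```python
from dataclasses import dataclass

# Hypothetical pilot servers for the first stage of the rollout.
PILOT_SERVERS = frozenset({"SRV01", "SRV02"})


@dataclass(frozen=True)
class Alert:
    server: str
    severity: str


def forward(alert: Alert, stage: int) -> bool:
    """Return True if the Connector should forward this Alert at this stage."""
    if stage == 1:  # pilot: Critical Alerts from a handful of servers only
        return alert.severity == "Critical" and alert.server in PILOT_SERVERS
    if stage == 2:  # all servers, still Critical only
        return alert.severity == "Critical"
    # stage 3 and beyond: Warnings as well
    return alert.severity in ("Critical", "Warning")


print(forward(Alert("SRV01", "Critical"), stage=1))  # True
print(forward(Alert("SRV99", "Critical"), stage=1))  # False
```

Only move to the next stage once the remote system shows the expected events with the expected mapping.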

Closing an Alert
It may sound silly, but bear with me. Sometimes I see organizations where Alerts are solved and then simply closed. But other crucial steps are needed as well.

Suppose you have an Alert triggered by a Monitor. A Monitor is like a traffic light (also referred to as a State Machine). When it is red (Critical), it can’t turn red again; first it has to go back to green (Healthy). This holds even when the Alert raised by that very same Monitor has already been closed. So the next time a new issue arises that should trigger an Alert, that Monitor will not raise a new one, since it is still in a Critical state.
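The traffic-light behaviour can be sketched as a tiny state machine. This is a simplified, hypothetical model (the `ManualResetMonitor` class is not the OpsMgr API): it only turns green again through an explicit health reset, and it raises an Alert only on the Healthy to Critical transition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    source: str
    closed: bool = False


class ManualResetMonitor:
    """Hypothetical model of a monitor that does not heal itself.

    It turns red on an error event and only turns green again through an
    explicit health reset. An Alert is raised only on the Healthy -> Critical
    transition, like a traffic light that must turn green before it can turn red.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "Healthy"

    def on_error_event(self) -> Optional[Alert]:
        if self.state == "Critical":
            return None  # already red: no state change, so no new Alert
        self.state = "Critical"
        return Alert(source=self.name)

    def reset_health(self) -> None:
        self.state = "Healthy"


monitor = ManualResetMonitor("Example monitor")
first = monitor.on_error_event()   # Healthy -> Critical: Alert raised
assert first is not None
first.closed = True                # the operator closes the Alert...
second = monitor.on_error_event()  # ...but the monitor is still red
print(second)                      # None: the new issue goes unnoticed

monitor.reset_health()             # back to green
third = monitor.on_error_event()   # a new issue now raises a new Alert
print(third)                       # Alert(source='Example monitor', closed=False)
```

Note that closing the Alert never touches the monitor's state; only the reset does.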

I know, there are many ‘self-healing’ monitors out there, but not all monitors are, and some take a while to go ‘green’ again. So this is what I advise my customers to do when a new Alert comes in:

  1. Use the Alert Resolution State as described in the 2nd posting of this series.
  2. READ the Alert and its properties. Often the Alert also contains knowledge on how to solve its cause, or refers to one or more KB articles. Use that knowledge, since it can be a real time-saver.
  3. Troubleshoot the cause(s) of the Alert.
  4. When all is well, go back to the OpsMgr Console.
  5. Open the Health Explorer of the related object/server to check whether it was a monitor that raised the Alert.
  6. If so, select that very same monitor and reset it to a Healthy state.
  7. When done, refresh the Alert View and check whether the Alert is still present. If it is, close it.
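The order of steps 5 to 7 matters: the monitor is reset before the Alert is closed. The sketch below encodes that order as a hypothetical Python helper (`closing_steps` is not an OpsMgr function). When the Alert was raised by a rule rather than a monitor, there is no health state to reset, so only the Alert itself needs closing.

```python
from typing import List


def closing_steps(raised_by_monitor: bool, monitor_state: str) -> List[str]:
    """Hypothetical helper: steps 5 to 7 of the procedure, in order."""
    steps = ["Check in Health Explorer whether a monitor raised the Alert"]
    if raised_by_monitor and monitor_state != "Healthy":
        # A monitor stuck in Critical will not raise a new Alert later on.
        steps.append("Reset that monitor to Healthy")
    steps.append("Refresh the Alert View")
    steps.append("Close the Alert if it is still present")
    return steps


for step in closing_steps(raised_by_monitor=True, monitor_state="Critical"):
    print(step)
```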

Advice 04:
Never close an Alert without taking a closer look at it. Check the Health Explorer and the Computer state as well. Make sure the monitor that triggered the Alert is set back to a Healthy state after the cause of the Alert has been solved. This way everything is back under control, and the next time a new issue arises, a new Alert will be raised.

Bringing it all together
In this series I have tried to show that even though OpsMgr is a good enterprise monitoring solution, the organization itself needs to work with it and make it part of its daily tasks and procedures. Only then will OpsMgr be used to its fullest extent. Of course there is much more to it than this series covers, and every organization differs in details I can’t address here. But at the end of the day all IT organizations are aiming at the same ultimate goal: making IT a robust process, thus enabling their organization to be competitive and on top of things. OpsMgr can be a great aid in reaching that goal. Use it wisely and be pleasantly surprised.
