Wednesday, August 22, 2012

OM12 Core Monitoring Functionality Tuning Guide

A clean installation of OM12 contains a set of MPs out of the box. These MPs are divided into subsets, each delivering a particular functionality, such as the core monitoring functionality.

This subset is the foundation for many other MPs that you will import when you start rolling out OM12 in your organization. In other words, those MPs depend on it. An example is the Active Directory Server Common Library MP, part of the AD MP:
image
All MPs highlighted in yellow are part of the subset of OM12 MPs which deliver the core monitoring functionality.

However, this same subset delivers another crucial functionality as well: monitoring the health of the Operations Manager infrastructure, its components and services.

To get that crucial functionality just right, additional tuning is required. However, unlike SCOM R2, there is no guidance on how to do that. I have searched everywhere and read the Deployment and Operations guides many times, but didn’t find anything about it.

Hence this posting, which points out some important things you need to know about configuring this subset of MPs so that the monitoring of your OM12 environment is just right. So let’s start!

Spoiler Alert
The information about tuning the monitoring of your OM12 environment is based on the MP guide (OM2007_MP_OpsMgrR2.doc) delivered with the SCOM R2 MP, version 6.1.7695.0. This posting covers only the information relevant to OM12, tailored to OM12.

Tuning the monitoring of your OM12 environment
High-level overview of the steps we’re going to take:

  1. Configure automatic agent management.
    • Create a Run As account with administrator access on the target computers.
    • Add a Run As account to the Automatic Agent Management Account Run As profile to enable automatic agent recovery.
  2. Create a new management pack for customizations.
  3. Enable recovery for the Health Service Heartbeat Failure monitor.
  4. ONLY WHEN REQUIRED: Add a Run As account to the Validate Alert Subscription Account Run As profile.

Detailed steps:

  1. Configuring automatic agent management
    Why? This enables automatic remediation for OM12 Agents that are having issues. The remediation can be a restart of the OM12 Agent service (healthservice) or, only when you have configured it(!), an automated reinstall of the OM12 Agent.

    In order to get this up and running, two additional actions are required:
    > Create a Run As account with administrator access on the target computers;
    > Add that to the Automatic Agent Management Account Run As profile to enable automatic agent recovery.

    Steps:
    1. In the OM12 Console, go to Administration.
    2. In the navigation pane, expand Administration, expand Run As Configuration, and then click Profiles.
    3. Double-click Automatic Agent Management Account, and then click the Run As Accounts tab.
    4. Click Add, and then in the Run As Account drop-down menu, click an existing account that has administrator access to the agents, or click New to create a new Run As account for the AD account you want to use.
    5. For This Run As account will be used to manage the following objects, ensure All targeted objects is selected, and then click OK.
    6. Click Save.

  2. Create a new MP for customizations
    In the next steps we’re going to set some overrides, which need to be stored in a dedicated unsealed MP. In this case the MP is named Overrides OM12 Core. I am not going to explain how to create such an MP, since this is basic knowledge.

  3. Enable recovery for the Health Service Heartbeat Failure monitor
    Why? This monitor also contains some recovery options, which can be configured according to your company’s policies:
    image 

    Steps:
    1. In the OM12 Console go to Tools > Search > Monitors
      image
    2. In the search box type Health Service Heartbeat Failure > Search
    3. The Monitor will be found. Click on it and the Monitor Details pane will show its details. You might hit one or two bugs here: at Step 1 the OM12 Console might crash (restart it), and at Step 3 the Monitor Details pane might stay empty. For the latter, just select the Monitor a few more times and the details will suddenly appear.
    4. In the Monitor Details pane, click the link View Knowledge and the properties of that Monitor (Health Service Heartbeat Failure) will be shown. Go to the tab Overrides.
    5. In the list, under Recovery, click the recovery named "Enable and Restart Health Service", click Override, and then click For all objects of class: Health Service Watcher.
    6. Under Override-controlled parameters, in the Override column, select the check box next to the Enabled value that appears in the Parameter Name column.
    7. In the Override Value column, in the drop-down box, click True.
    8. In the Select destination management pack section, select the management pack that you created earlier (Main Step 2 – Creating a New MP for customizations), and then click OK.
    9. Repeat steps 5 through 8 for:
      > Restart Health Service;
      > Reinstall Health Service (triggered from Diagnostic);
      > Resume Health Service;
      > Set the "Computer Not Reachable" monitor to success because the "Ping Computer on Heartbeat Failure" diagnostic succeeded.
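
    With five recoveries to override, it can help to write the intended overrides down before clicking through the console. The following is only a minimal Python sketch of such a checklist. It does not talk to OM12 in any way; the RecoveryOverride type and both helper functions are hypothetical names made up for this illustration. The monitor, recovery, class and MP names are the ones used in the steps above.

```python
from dataclasses import dataclass

# Names taken from the steps above. Everything else in this sketch is a
# hypothetical planning aid; it does not connect to OM12.
MONITOR = "Health Service Heartbeat Failure"
TARGET_CLASS = "Health Service Watcher"
DESTINATION_MP = "Overrides OM12 Core"

RECOVERIES = [
    "Enable and Restart Health Service",
    "Restart Health Service",
    "Reinstall Health Service (triggered from Diagnostic)",
    "Resume Health Service",
    'Set the "Computer Not Reachable" monitor to success because the '
    '"Ping Computer on Heartbeat Failure" diagnostic succeeded',
]


@dataclass(frozen=True)
class RecoveryOverride:
    """One override to set by hand in the console (steps 5 through 8)."""
    monitor: str
    recovery: str
    target_class: str
    parameter: str
    value: bool
    destination_mp: str


def build_override_plan(recoveries=RECOVERIES):
    """Return one Enabled=True override per recovery, all stored in the same MP."""
    return [
        RecoveryOverride(MONITOR, name, TARGET_CLASS, "Enabled", True, DESTINATION_MP)
        for name in recoveries
    ]


def format_checklist(plan):
    """Render the plan as a plain-text checklist, one line per override."""
    lines = []
    for number, item in enumerate(plan, start=1):
        lines.append(
            f"[ ] {number}. {item.recovery}: {item.parameter} = {item.value} "
            f"(class: {item.target_class}, MP: {item.destination_mp})"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_checklist(build_override_plan()))
```

    Drop an entry from RECOVERIES if your company policy does not allow it, for example the automated reinstall, and tick off the remaining lines as you save each override.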

    Additional advice
    When you hover the mouse over a recovery task a pop-up will appear giving additional explanation about that particular recovery task. This way you know better what a certain recovery task does:

    image


  4. ONLY WHEN REQUIRED: Add a Run As account to the Validate Alert Subscription Account Run As profile
    Why? The account in this profile is used to validate whether the Notification subscriptions are in scope. It needs administrator access within OM12 and admin access on the OM12 Management Servers.

    !!! Additional warning !!!
    By default this Run As Profile is already populated with the Local System Windows Account, targeted against the OM12 Management Server which was first installed in the new Management Group. Many times this is sufficient and DOESN’T need any modification. When it works SKIP THIS STEP!!!

    Only in locked-down environments might an AD account be required here instead of the Local System account. In that case you’ll need to follow this step.

    Steps:
    1. In the OM12 Console, go to Administration.
    2. In the navigation pane, expand Administration, expand Run As Configuration, and then click Profiles.
    3. Double-click Validate Alert Subscription Account, and then click the Run As Accounts tab.
    4. Click Add, and then in the Run As Account drop-down menu, click an existing account that has administrator access within OM12 and on the OM12 Management Servers, or click New to create a new Run As account for the AD account you want to use.
    5. Remove the Local System Windows Account.
    6. For This Run As account will be used to manage the following objects, ensure A selected class, group or object is selected.
    7. Select the OM12 Management Server(s) which send out the notifications and then click OK.
    8. Click Save.

Hopefully this mini guide helped you out with fine-tuning OM12 itself. See you all next time. There is a lot more to share!

3 comments:

StDenis said...

Hi, usable article, but there are some notes (about part 3):
1. Recovery "Set the "Computer Not Reachable" monitor to success because the "Ping Computer on Heartbeat Failure" diagnostic succeeded"" is enabled by default.
2. Enabling recovery "Resume Health Service" may have problems with SCCM MP that used pause Health Service for quick "Maintenance Mode".
3. I made overrides for class "Health Service Watcher (Agent)", but not just "Health Service Watcher". It's more safe.

Ted Hacker said...

Great posting.
- Do you have any suggestions on how to handle the situation where we have different accounts for administrator access to some servers which will impact the use of the "Automatic Agent Management Account"? Is there a way to get some servers managed through one account and others through another domain account? Is there a way to do this by assigning the management server an account that is the default for the connected agents' "Automatic Agent Management Account"?

Ted Hacker said...

I see that the profile allows for a scope of where an account is used. It allows for a class, group or object to use the account against.
- For Windows agent management tasks, what host class should I use? Windows computer, health service, windows operating system? Which host class should I use for Linux management tasks?