Wednesday, January 12, 2011

SCOM R2: Am I Healthy? – Part II – Know What You Have

----------------------------------------------------------------------------------
Postings in the same series:
Part I – The Introduction
Part III – Are the DBs OK?
Part IV – Is SCOM R2 Up-to-date or Outdated?
Part V – Let’s Use The Community
 
----------------------------------------------------------------------------------

The second posting of this series is all about the inventory of your SCOM R2 environment. A good and clear picture of your environment is key to the overall health of that same environment.

Why? Suppose you have too many servers reporting to a single SCOM R2 Management Server. This will negatively affect the overall performance of your SCOM R2 environment. Or suppose you think the RMS is beefed up enough but it turns out it isn’t. And when that shows, it is always at the wrong moment of the day and/or week.

Or even worse, the non-clustered RMS fails and an MS has to be promoted to RMS. Which MS is best suited for this temporary task? Where is the Encryption Key? Where is the password? Does the RMS also have an SMS-enabled device attached to it, which needs to be connected to the MS that has become the temporary RMS?

It might sound stupid, but many times I bump into SCOM R2 environments where the system engineers assume they have a clear picture of it. But when I ask questions like:

  • What AD accounts are used for SCOM;
  • Are those accounts written down;
  • Do you have the passwords available for those accounts;
  • Are there one or more Gateway Servers in place;
    • If yes, what are their names and in what Forests/Workgroups do they reside?
    • If yes, what PKI did you use?
    • If yes, is that PKI still operational?
  • The exact number of SCOM R2 Management Servers;
  • Whether the RMS is running on a physical box or not;
  • Whether the RMS is clustered;
  • How many Agents are pushed and how many manually installed;
  • What CU level the environment is running on;
  • Whether the SQL server is running on a physical box or not;
  • The size of the databases;
  • What dimension/size the SCOM R2 environment was initially designed for;
  • etc.

Some or many of those questions aren’t answered right away. And yes, it is understandable. Many times the people who first ran the SCOM R2 environment are working in different departments now or are working for another company. And not much is documented, and when it is, it isn’t shared with the new system engineers. So one or more blind spots are present and need to be addressed.

When you don’t have a document describing the SCOM R2 environment, now is the time to write one. Better to write it down in a moment of ‘calm and peace’ (I know, system engineers are always busy) than to run into one or more blind spots while trying to recover from a disaster. Because at moments like that it is too late (and you hear a small voice nagging ‘I told you so…’).

So what do you write down about your SCOM environment?

Here is a ‘shortlist’ (duh!) of the kind of information that is required in order to ‘Know What You Have’. Only then is good management possible. Anything else is just an assumption and as we all know, ‘assumption is the mother of all …’.

‘One’ piece of advice: take your time to complete the document. An hour or so every day for a week should be enough. And when the document is completed, keep it up to date. That takes about five minutes. Max. It saves a lot of time. So good versioning is required. And store it in a good location which is available and accessible to the other team members as well, like SharePoint.

So let’s start. The document should contain information like this:

  1. Management Group information
    1. Name of the Management Group;
    2. How many Agents are being used (or: how many servers/clients are being monitored);
    3. AEM setting;
    4. ODR setting;
    5. CEIP setting;
    6. Database Grooming Setting;
    7. Number of allowed missed heart beats;
    8. Manually installed Agents:
      1. Are they allowed;
      2. Are they approved automatically;
    9. Connectors, per Connector:
      1. Name;
      2. Functionality;
      3. Used AD accounts with passwords (encrypted!);
      4. Used FQDNs and IP Addresses;
      5. Configured settings;
    10. Whether 3rd Party software (nWorks, Jalasoft, OpsLogix etc.) is being used;
    11. History:
      1. When was it installed;
      2. What version was it originally;
      3. Has it ever been migrated to other hardware or from P to V or vice versa;
      4. Any major issues like outages and other major downtimes;
    12. Installed Language;
    13. Is the Reporting Component installed:
      1. What is the SSRS url;
      2. What SQL server is hosting the Data Warehouse;
    14. Is the Web Console installed:
      1. What is the url;
      2. Is SSL used;
      3. Is it published to the internet.

  2. Placement;
    1. FQDN of Forest;
    2. LAN segment;
    3. Environment, like Production, Testing or anything else.

  3. Version of SCOM (RTM (I hope not!), SP1, R2 and CU level when R2 is being used);

  4. RMS information;
    1. FQDN;
    2. IP address;
    3. LAN segment;
    4. Physical location (even when it is virtualized);
    5. P or V box;
    6. Amount of CPU, RAM, Disks;
    7. Server OS and patch level;
    8. Disk configuration, RAID settings and sizes;
    9. Whether it is clustered or not;
    10. SMS enabled device attached to it or not;
    11. AD accounts being used for the SCOM service accounts WITH passwords (or a reference to the encryption tool where they are stored), like:
      1. SCOM SDK Account;
      2. SCOM Action Account;
      3. SCOM Data Warehouse Read Account;
      4. SCOM Data Warehouse Write Account;
      5. SCOM Health Account;
      6. Any third party software account;
      7. Run-As-Profile accounts (like needed for the SQL/AD MPs).
    12. Backup of Encryption Key and its location (stored outside the RMS with the password as well);
    13. Does the RMS perform other tasks as well (do monitored servers report to it/is 3rd party software installed on it).

  5. SQL Server and SCOM databases
    1. FQDN;
    2. IP address;
    3. LAN segment;
    4. Physical location (even when it is virtualized);
    5. P or V box;
    6. Amount of CPU, RAM, Disks;
    7. Server OS and patch level;
    8. SQL Server:
      1. Version;
      2. Edition;
      3. Architecture;
      4. Patch Level (CUs, SPs and the lot);
      5. Installed features;
    9. Disk configuration, RAID settings and sizes;
    10. Whether it is clustered or not;
    11. Does the SQL server also host other DBs or SQL Instances;
    12. SCOM Databases sizes and locations.

  6. The total number of Management Servers and per Management Server:
    1. FQDN;
    2. IP address;
    3. LAN segment;
    4. Physical location (even when it is virtualized);
    5. P or V box;
    6. Amount of CPU, RAM, Disks;
    7. Server OS and patch level;
    8. Disk configuration, RAID settings and sizes;
    9. Function of the Management Server: what is being monitored and how many.

  7. The total number of Gateway Servers and per Gateway Server:
    1. FQDN;
    2. IP address;
    3. LAN segment;
    4. Physical location (even when it is virtualized);
    5. P or V box;
    6. Amount of CPU, RAM, Disks;
    7. Server OS and patch level;
    8. Disk configuration, RAID settings and sizes;
    9. Functions of Gateway server(s): what is being monitored and how many;
    10. Whether the Gateway server(s) is/are configured in a fail-over configuration;
    11. FQDN of PKI which is used;
    12. SCOM Action Account for the Forest (and its password) where the GW server resides.

  8. Management Packs
    1. What Microsoft MPs are loaded and configured;
    2. What MPs are custom made;
    3. Versions of MPs;
    4. What Third Party MPs are loaded and configured, if so:
      1. Are additional Connectors installed and if so how are they configured (accounts, FQDNs, IP addresses and the lot);
      2. Are additional servers installed, if so treat them the same as a Management Server while writing down the details.

  9. Backups
    1. Are the SCOM DBs backed up on a regular basis:
      1. What tooling is used;
      2. Where are the backups stored;
      3. What retention policy is used;
    2. Are the SCOM servers (RMS, Management Server(s), Gateway Server(s), Third Party Server(s)) backed up on a regular basis:
      1. What tooling is used;
      2. Where are the backups stored;
      3. What retention policy is used;
    3. Are the unsealed MPs backed up on a regular basis;
    4. Are all the backups tested for validity on a regular basis.
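
The same per-server fields (FQDN, IP address, LAN segment and so on) come back for the RMS, every Management Server and every Gateway Server. One way to spot blind spots early is to keep those fields in a small structure and list whatever is still empty. What follows is only a sketch in Python under my own assumptions, not a SCOM tool: the class name ServerRecord, its field names and the helper missing_fields are made up for illustration, and the sample values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ServerRecord:
    """Hypothetical record for one server in the inventory document."""
    role: str = ""                 # e.g. "RMS", "MS" or "Gateway"
    fqdn: str = ""
    ip_address: str = ""
    lan_segment: str = ""
    physical_location: str = ""
    physical_or_virtual: str = ""  # "P" or "V"
    cpu_ram_disks: str = ""
    os_and_patch_level: str = ""
    disk_configuration: str = ""


def missing_fields(record: ServerRecord) -> List[str]:
    """Return the names of the fields that are still empty (the blind spots)."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


if __name__ == "__main__":
    # Placeholder values only; replace them with the details of your own environment.
    rms = ServerRecord(role="RMS", fqdn="rms01.example.local", ip_address="192.0.2.10")
    for name in missing_fields(rms):
        print("Still to document:", name)
```

Run it once per server and the output is a to-do list of what still has to be written down. Passwords do not belong in a structure like this; refer to the encryption tool where they are stored instead.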

When you have the document ready, you will benefit from it in many ways. I know, it is a lot of information to gather, but one day you will need it and then you will be HAPPY with the effort you put into creating this document.

And when you have this document you have a better understanding of your SCOM environment as well. So when things look like they are going sour, you know how to act.

2 comments:

John Bradshaw said...

Oh, I wish I had done this before my current situation arose!!
Shall definitely be gathering this information and keeping it up-to-date when I sort out the current install mess I've created!! :)
Looks very comprehensive.
thx Marnix

Unknown said...

Has anyone created a document as described, or is there any sample document of it, so I can use it as a hint?