Want to know more?
- Read more about the utility here;
- Watch a demo of it in action here;
- Register for the download here.
In the fourth posting of this series I will take a deeper dive into the status of SCOM R2, or rather, the patch and hotfix level of the entire Management Group AND the monitored servers. This topic deserves an entire posting because it is often neglected, even though it is key to the overall health, performance and availability of the entire SCOM R2 infrastructure.
So let’s start pinpointing outdated components and correcting them!
Before I start, I want you to know that this posting assumes proper update management is in place AND functional. It goes without saying that all Windows Servers are patched and updated on a regular (monthly?) basis. This posting will not describe that, since it is a basic requirement for any IT environment that wants to stay in good shape.
01 – SCOM R2 Core MP
Let’s start very close to SCOM R2 itself: it is time to review the version of its Core MP. This MP is key to the overall health, performance and availability of SCOM R2.
If there is one MP I really appreciate AND that gets better with every release, it’s this MP. All the flows, Agent thresholds, queue sizes and the lot are configured in this MP. Remember the days when we had to configure certain thresholds (Private Bytes, for instance) ourselves? Those days are over with the latest versions of the Core MP for SCOM R2.
The Product Team releases a new version roughly every three to four months. Besides bug fixes (if any), each version also contains improvements based on best practices and input from the field: customers, the official Forums, PFEs and MVPs. Additional functionality is added as well to improve SCOM R2 as a whole. And on top of it all, new reports are added and existing ones are improved.
When upgrading from an old version of the SCOM R2 Core MP (for example when upgrading from SCOM SP1 to SCOM R2), it is good to know that ADDITIONAL reports have been added. When you upgrade the Core MP from the online catalog, these additional Reports won’t be added because they aren’t present in the original version of the SCOM R2 Core MP. To solve that, go to this blog posting of mine.
The latest version of the SCOM R2 Core MP contains a very nice feature: it detects whether WMI on the monitored Windows Servers is still functional and running. When it’s not, SCOM R2 can take action to restore WMI. As we all know, WMI isn’t always as robust as it should be, so this MP takes away a lot of manual labor. Of course, it would be better if WMI were robust by itself, but that is a different topic altogether :).
Say what? How do you know which version of the Core MP is the most recent? Where do you download it? And how do you know when a new version has arrived? Good questions! Fortunately I post about it on a regular basis. The posting about the most current version of the SCOM R2 Core MP, with all the information you need, can be found here.
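Once you know the latest published version, comparing it with what you have imported is simple, but only if you compare the parts as numbers. Here is a minimal Python sketch. It assumes you have noted the installed and latest version numbers by hand, for example from the Management Packs view in the Console and from the posting mentioned above. The MP names and version numbers in it are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted MP version such as '6.1.7221.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_outdated(installed: str, latest: str) -> bool:
    """True when the installed version is lower than the latest known version.

    Tuples compare element by element, so 1.0.0.5 < 1.0.0.49 as intended,
    whereas a plain string comparison would claim '1.0.0.49' < '1.0.0.5'.
    """
    return parse_version(installed) < parse_version(latest)


def outdated_mps(inventory: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the names of MPs whose installed version lags the latest one."""
    return sorted(
        name
        for name, (installed, latest) in inventory.items()
        if is_outdated(installed, latest)
    )


# Hypothetical inventory: MP name -> (installed version, latest published version).
inventory = {
    "Hypothetical.Core.MP": ("6.1.7221.0", "6.1.7221.49"),
    "Hypothetical.Vendor.MP": ("1.0.0.5", "1.0.0.5"),
}

print(outdated_mps(inventory))  # ['Hypothetical.Core.MP']
```

The same check works for every other imported MP, as long as you keep the list of latest versions up to date.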
One word of caution, however: like any other update, patch, fix or updated MP, TEST IT before you put it into production. A single virtualized box can be used as a test environment, running an isolated AD Forest, SQL and SCOM R2, all based on trial licenses. Also keep a keen eye on the Official TechNet Forums to learn whether the update is OK.
The latest version of the Core MP is a good one, BUT it needs some additional attention, which I also blogged about here.
02 – The other MPs
In your SCOM R2 environment other MPs will be present as well (otherwise SCOM R2 is only monitoring itself…). Many times I bump into SCOM R2 environments that run other outdated MPs as well, not just an outdated Core MP for SCOM R2.
But updating those MPs without proper planning can be even worse than running outdated MPs. Why? Two reasons:
Having said that, it is still best practice to run the latest versions since they deliver much added functionality and monitoring options. Some good and shiny examples are:
03 – Cumulative Updates (CUs)
We all remember the days when SCOM hit RTM. Soon SP1 came out, followed by a whole chain of hotfixes and patches. It was a challenge to know which hotfixes to install, when, and in what order. I kept a Word document listing many hotfixes, with an overview per hotfix of what it did and whether it was required by default or only under certain circumstances. Alongside, I kept a watchful eye on Kevin Holman’s blog, since he maintained a list of those hotfixes as well.
But it wasn’t very effective. Fortunately Microsoft listened and introduced the well-known CU system (already present in SQL Server and Exchange, for instance) in SCOM R2 as well. Now all hotfixes and patches are combined into a CU, which adds functionality as well. So no more hassle with running multiple hotfixes and patches, just a single executable. Another good thing about the CUs is that each CU contains all the hotfixes, patches and updates of the previous ones. So by installing just the LATEST CU for SCOM R2, one is up to speed again.
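The reasoning behind "just install the latest CU" can be written as a set check. When every CU contains all fixes of its predecessors, the newest CU covers everything. The Python sketch below models this with made-up fix identifiers. It does not read real CU contents, which you still get from the KB article of each CU.

```python
from typing import Dict, Set

# Hypothetical fix identifiers per CU number; real contents are listed in each CU's KB article.
CU_FIXES: Dict[int, Set[str]] = {
    1: {"fix-a", "fix-b"},
    2: {"fix-a", "fix-b", "fix-c"},
    3: {"fix-a", "fix-b", "fix-c", "fix-d"},
}


def is_cumulative(cu_fixes: Dict[int, Set[str]]) -> bool:
    """True when every CU contains all fixes of the CU before it (superset check)."""
    numbers = sorted(cu_fixes)
    return all(cu_fixes[prev] <= cu_fixes[nxt] for prev, nxt in zip(numbers, numbers[1:]))


def fixes_gained(cu_fixes: Dict[int, Set[str]], installed_cu: int) -> Set[str]:
    """Fixes picked up by jumping from the installed CU (0 = none) straight to the latest."""
    latest = max(cu_fixes)
    return cu_fixes[latest] - cu_fixes.get(installed_cu, set())


print(is_cumulative(CU_FIXES))             # True
print(sorted(fixes_gained(CU_FIXES, 1)))   # ['fix-c', 'fix-d']
```

Because each set is a superset of the previous one, skipping CU2 and going straight from CU1 to CU3 loses nothing.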
Again, I can’t say it often enough: TESTING is key, and so is RTFM. WAITING is another good thing to do. So when a new CU is released, just wait a few weeks, test it, and follow the blogs and the OpsMgr TechNet Forum. Also keep an eye on the official Microsoft web pages about the CU, since they will reveal new information as well. The section ‘List of known issues for this update’ will tell you many things you need to know. So READ it.
I can recommend this CU, BUT when you are running SCVMM and use PRO Tips, be careful. Read the ‘List of known issues for this update’ and you will know why…
04 – It’s all about patches and updates…
SCOM R2 and its Agents hit many core components of the monitored Windows Servers and of the Windows Servers on which the SCOM R2 infrastructure is installed. Some of these core components, like the JET database and WMI, need additional patching, depending on the Windows Server version (2003 or 2008) you are running (mind you, some hotfixes are meant for both server versions).
Here is a list of hotfixes and patches I always advise my customers to install on their servers. Again <sigh>, TESTING is required. In one customer’s environment the mentioned SP1 for XML Parser broke an in-house built application that ran on XML version 4.x… So be careful.
05 – CU and SP level for SQL Server
I know, just as with the previous item 04, some IT shops go by the credo ‘If it ain’t broke, don’t fix it’. But personally I think it is better to prevent serious issues than to solve them afterwards.
Since SCOM R2 is built on SQL Server, this component needs serious attention as well.
Again, caution is required. For instance, does the SQL server instance run only the SCOM DBs or DBs for other applications as well? If so, do those applications support the latest SPs? Sometimes they don’t.
So inform yourself.
Having said that, there are some issues with SQL Server 2008 SP1 that are fixed in CU#3 and later. Also make sure you know which versions of SQL Server Microsoft officially supports for running SCOM R2, to be found here.
And when you decide to install an update, it is perhaps better to skip the CUs (like CU#7 for SQL Server 2008 SP1) and install SP2 right away.
As you can see, much of the overall health, availability and performance of the entire SCOM R2 infrastructure, including the monitored Windows Servers, is directly related to hotfixes, patches and service packs for Windows Server and SQL Server. The versions of the Core MP for SCOM R2 and of the imported MPs play a significant role as well. Go and check it out yourself and, when needed, it is time for some RFCs to be filed… Have fun!
In the past Cameron posted a four-part series about the same topic. Together with the earlier mentioned White Paper (there is some overlap, of course), it will give you a good understanding of Groups in SCOM: what they do, how they function and how to use them properly.
Since I am talking about Groups, I want to mention some other postings as well, since they give more in-depth information about Group creation:
The White Paper written by Cameron also contains many other links to web pages all about Group creation, calculation and Best Practices. So whenever you are bored, there is enough to read!
However, this article has been publicly available for a few days now. Even though I have posted a whole series about Dashboards, Cameron’s article reveals new angles and approaches.
So when you are interested in Dashboards for SCOM R2 and want to know more about them, go here. New things to be learned.
Thanks Cameron and WindowsITPro for sharing this good information.
Some examples of the Dashboards Cameron created (nice job!):
01-24-2010 Update: Based on a comment it turned out that the Discovery Cycle can’t be more than 24 hours (86400 seconds). This blog posting has been corrected based on that input. Thanks for keeping me sharp!
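When you override the interval of such a Discovery, it helps to convert hours to seconds and check the result against the 86400-second cap before applying the override. This is a small Python sketch. It assumes the override value is entered in seconds and that the cap is exactly 86400 seconds, as stated in the update note. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600
# 86400 seconds = 24 hours, the maximum Discovery interval mentioned in the update note.
MAX_DISCOVERY_INTERVAL_SECONDS = 86_400


def discovery_interval_seconds(hours: float) -> int:
    """Convert a Discovery interval in hours to seconds, rejecting out-of-range values."""
    seconds = int(round(hours * SECONDS_PER_HOUR))
    if seconds <= 0:
        raise ValueError("the interval must be positive")
    if seconds > MAX_DISCOVERY_INTERVAL_SECONDS:
        raise ValueError(
            f"{seconds} seconds exceeds the {MAX_DISCOVERY_INTERVAL_SECONDS}-second maximum"
        )
    return seconds


print(discovery_interval_seconds(24))  # 86400
print(discovery_interval_seconds(12))  # 43200
```

Asking for 25 hours raises a ValueError instead of producing an override value that is out of range.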
Based on this blog posting of Kevin Holman I changed some Discovery Rules in the Management Groups which I control for some customers of mine.
However, there are many more MGs of other customers that I no longer control, but I still want those MGs to be in good condition. So I contacted the system engineers running those environments, told them about the issue and sent them the URL of the blog posting.
Some of them reported back that the issue was fixed. Others told me they could not find the Discoveries described in Kevin’s posting. So this posting is all about how to find these Discoveries and change them accordingly, or rather, a quicker way to get the job done. So let’s start.
Never ever be afraid to ask a question because you think it might sound stupid. In my humble opinion there is only ONE truly stupid question: the one that is NEVER asked…
Combined you will have a thorough understanding about what to do. Thanks Charles & Christopher for sharing!
The latest Dell MP is a good showcase of how mature Dell has become with SCOM, and of its willingness to learn & listen. Of course, like any other MP, there is always room for improvement. But still, the latest Dell MP is really good.
As it turns out, Microsoft and Dell are gaining momentum when it comes down to integrating Dell’s products with System Center. A video has been published in which Brad Anderson, CVP of the Management & Security Division, interviews Laurie Tolson, Vice President of Systems Management at Dell.
They also discuss Dell's solution for Hyper-V Cloud Fast Track, Microsoft's set of programs for private cloud deployment. And believe me, the cloud is coming. There is much cloud washing going on, but that doesn’t mean EVERYTHING is just vapor…
On Codeplex three new IPs have been published:
Other cool stuff can be found there as well:
So go check it out yourself.
But wouldn’t it be nice to have an option to export the devices as well? In the same format (IP address and friendly name), so that when the list of devices in the MP gets corrupted or damaged in any way, one is easily back on track by importing the latest export file?
Yesterday I visited a customer of mine. One of his system engineers had bumped into the issue where the list of devices got corrupted. He was forced to add many of the devices by hand, since the originally used import list wasn’t up to date anymore.
Since he is a real script kiddo, he built a PS script that exports the list of devices present in the OpsLogix Ping MP. This script is run by Task Scheduler on a regular basis, so he always has a backup available in case ‘disaster’ strikes. Some time ago disaster struck again. But now he had this export file, ready to be used. Within a few minutes all the devices were back again!
I asked him whether I was allowed to blog about it and to share his script with the Community. And guess what: he agreed. But he doesn’t want me to mention his name, so let’s call him Mister X. This is his favorite lunch:
All credits go to him. Thank you Mister X! Much appreciated. I owe you a hamburger!
The export functionality comes in two parts:
The batch file is managed by the Task Scheduler in order to have it run on a scheduled basis. The entries in red need modification:
PS script (opslogix.ps1):
$rootMS="NAME OF RMS"  # replace with the name of your RMS
$mclass = get-monitoringclass -name "OpsLogix.IMP.Ping.Target"  # ping target class of the OpsLogix Ping MP
I am sure many people will appreciate this. Nice!
Of course, before importing the file again, one or more valid Source Hosts must be present.
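Mister X’s script itself is PowerShell, and only its first lines are shown above. To illustrate the export/re-import round trip independently of SCOM, here is a Python sketch. It writes (IP address, friendly name) pairs to a two-column CSV file and reads them back with some validation. The CSV layout is an assumption, so check which format the OpsLogix Ping MP import actually expects before relying on it.

```python
import csv
import io
import ipaddress
from typing import Iterable, List, TextIO, Tuple

Device = Tuple[str, str]  # (IP address, friendly name)


def export_devices(devices: Iterable[Device], stream: TextIO) -> None:
    """Write one 'IP address,friendly name' row per device."""
    writer = csv.writer(stream)
    for ip, name in devices:
        writer.writerow([ip, name])


def import_devices(stream: TextIO) -> List[Device]:
    """Read the rows back, skipping blank lines and rejecting malformed IP addresses."""
    devices: List[Device] = []
    for line_no, row in enumerate(csv.reader(stream), start=1):
        if not row:
            continue
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        ip, name = row[0].strip(), row[1].strip()
        ipaddress.ip_address(ip)  # raises ValueError for an invalid address
        devices.append((ip, name))
    return devices


# Round trip in memory; for a real backup file use open(path, "w", newline="").
backup = io.StringIO()
export_devices([("10.0.0.1", "core-switch-01"), ("10.0.0.2", "firewall-02")], backup)
backup.seek(0)
print(import_devices(backup))  # [('10.0.0.1', 'core-switch-01'), ('10.0.0.2', 'firewall-02')]
```

Validating the file on import means a damaged backup is rejected before any devices are re-added.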
The third posting in this series is all about the SCOM R2 DBs and their health. The overall health AND performance of SCOM R2 are directly related to the health of the SCOM R2 DBs, so it is key to have insight into their status. This posting will enable you to do just that, without needing to become a SQL or SCOM guru. Just like checking the oil of your car.
01 – Along came the Community
In the old days one had to run many different SQL queries against the DBs. However, thanks to the combined efforts of two fellow MVPs, Pete Zerger and Oskar Landman, this isn’t needed anymore.
Just import the SCC Health Check Management Pack Version 2 (RTFM is required since an additional data source has to be created) and you will get 27(!) new Reports, all targeted at showing the status of your SCOM R2 environment, including the Health of the DBs:
(The highlighted Reports are the ones related to the health of the SCOM DBs.)
And not just that. These Reports also give very good information, shown in the Report Details area when a Report is selected. Besides the explanation, some useful URLs are shown as well. These URLs give you more information about Config Churn (how to identify it and how to battle it) and about the Known Issue with the LocalizedText table eating away too much DB space, including how to solve that one:
02 – Let’s not forget the SCOM R2 Core MP…
On top of it all, the latest core MP for SCOM R2 does a really good job of monitoring the health of the SCOM DBs as well. The DBs are monitored in many ways so when something goes wrong you will certainly know it.
One important monitor to reckon with is the Operational Database Space Free (%) Monitor. As the name implies, it monitors the percentage of available free space in the OpsMgr DB. It raises a Warning when this falls below 40% and an Error when it falls below 20%:
When the OpsMgr DB has too little free space, many issues might occur, like being unable to import any new MP, as I blogged about earlier on. So do not alter this Monitor in any way. And when an Alert comes in from this Monitor, act upon it.
Mind you, this Monitor does not monitor the size of the DB as a file on disk. It looks inside the DB itself and monitors the available free space within that DB. I had some serious discussions about this some time ago.
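To make the distinction concrete: the percentage is computed from the space allocated inside the database versus the space actually used, not from the size of the file on disk. The Python sketch below mimics the Warning/Error logic with the 40% and 20% thresholds mentioned above. Whether the real monitor treats exactly 40% or 20% as healthy is an assumption here, and the sizes are made-up examples.

```python
def free_space_percent(allocated_mb: float, used_mb: float) -> float:
    """Percentage of free space inside the database (allocated vs. used), not on disk."""
    if allocated_mb <= 0:
        raise ValueError("allocated size must be positive")
    return 100.0 * (allocated_mb - used_mb) / allocated_mb


def monitor_state(free_pct: float, warning_below: float = 40.0, error_below: float = 20.0) -> str:
    """Map a free-space percentage to a health state; boundaries are assumed to be strict."""
    if free_pct < error_below:
        return "Error"
    if free_pct < warning_below:
        return "Warning"
    return "Healthy"


# A DB with 20,000 MB allocated and 14,000 MB in use has 30% free inside the DB.
pct = free_space_percent(allocated_mb=20_000, used_mb=14_000)
print(pct, monitor_state(pct))  # 30.0 Warning
```

A growing file on disk does not change this picture. Only the ratio of used to allocated space inside the DB matters.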
03 – Let’s check the Default Settings for the SCOM R2 DBs
While installing SCOM R2 the DBs are created and the settings for the Recovery Model, Autogrowth and Autoshrink are configured based on best practice and shouldn’t be altered UNLESS you know what you’re doing and know its consequences. But never ever enable Autoshrink because it will automatically ‘enable’ a lot of unwanted issues as well…
Having said that, it’s worthwhile checking these settings. It wouldn’t be the first time a SQL Operator changed one or more of them (they seem to love the Autoshrink setting) without knowing the consequences or communicating it to the SCOM people involved…
Data Warehouse DB (OperationsManagerDW) settings
For this DB the settings are a bit different. The Recovery Model for the OperationsManagerDW DB is set to Simple and Autoshrink to False. So far so good:
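Instead of clicking through the database properties, you can read the relevant settings from the sys.databases catalog view (columns name, recovery_model_desc and is_auto_shrink_on) and check them in one go. The Python sketch below only evaluates rows shaped like the result of that query. How you fetch them (for example with a SQL client library) is left out. The expected values encode only what is stated in this posting: Autoshrink off for both DBs and the Simple recovery model for OperationsManagerDW.

```python
from typing import Dict, List, Mapping

# Query whose result rows settings_drift() expects (run it against the SCOM SQL instance).
QUERY = (
    "SELECT name, recovery_model_desc, is_auto_shrink_on FROM sys.databases "
    "WHERE name IN ('OperationsManager', 'OperationsManagerDW')"
)

# Expected recovery models; only the DW value is stated in this posting.
EXPECTED_RECOVERY_MODEL: Dict[str, str] = {"OperationsManagerDW": "SIMPLE"}


def settings_drift(rows: List[Mapping[str, object]]) -> List[str]:
    """Return a readable finding for every setting that deviates from the expected value."""
    findings = []
    for row in rows:
        name = str(row["name"])
        if bool(row["is_auto_shrink_on"]):
            findings.append(f"{name}: Autoshrink is ON, switch it off")
        expected = EXPECTED_RECOVERY_MODEL.get(name)
        actual = str(row["recovery_model_desc"])
        if expected and actual != expected:
            findings.append(f"{name}: recovery model is {actual}, expected {expected}")
    return findings


# Sample rows as a SQL Operator might have left them.
rows = [
    {"name": "OperationsManager", "recovery_model_desc": "SIMPLE", "is_auto_shrink_on": 1},
    {"name": "OperationsManagerDW", "recovery_model_desc": "FULL", "is_auto_shrink_on": 0},
]
for finding in settings_drift(rows):
    print(finding)
```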
04 – Are the DBs properly dimensioned?
Based on my second posting, you know by now how many objects are being monitored by SCOM R2. This number has a direct relationship with the size of the SCOM R2 DBs. So it is time to do some checking.
Do they match? Are the DBs properly sized or not? When they seem to have too much space, that is not an issue, as long as enough disk space is available that isn’t needed by other applications.
However, when the DBs are too small, it is time to make them a bit bigger. Especially the OpsMgr DB will benefit from this. But how do you know what size the DBs need to be? Good question! The answer is simple: go here, download the sheet and use it as intended. It will explain itself.
Besides this sheet, also take into account the recommendations from the field, based on real-life experience, to be found here. See the screen dump as well:
So now you have gained good insight into your environment. But there might still be some issues at hand that need attention. And when you take a good look at those issues, they turn out to be related to the DBs as well. To name a few:
05 – The Ghost Computers. Am I being haunted?
Sometimes you might bump into an issue where you have removed some monitored Computers from the SCOM R2 environment. But they are still present in the SCOM R2 Console. No matter what you do, like clearing the cache of the Console for instance, they keep coming back, totally grayed out of course. So now what?
Good news! A much respected, long-standing SCOM MVP (Maarten Goet) blogged about this issue some time ago. However, the solution is unsupported, so use it wisely and at your own risk. Since it is better to be safe than sorry, BACKUP the OpsMgr DB first before running the queries mentioned in his blog posting.
06 – Where are my new Reports?
When you import a new MP into SCOM R2, it will take a while before all the Reports show up, since they need to be uploaded to the SQL Server Reporting Services (SSRS) instance. Sometimes, however, they won’t appear at all, or only partially. The latter might happen with the MP I mentioned in Item 01.
When one does not create the required data source BEFORE importing this MP, only some pieces of the Reports contained within this MP will be uploaded to the SSRS instance. To remedy it, go through these steps:
Whenever I bumped into it, the cause was related to altered permissions on the SCOM R2 service accounts. So check them thoroughly. Also read this posting of mine. Resetting the accounts mentioned there will force SCOM R2 to upload the Reports contained in the MPs to the SSRS instance, so any missing Reports will be uploaded again. When that goes wrong, it will be logged in the OpsMgr event log of the RMS. So keep a keen eye on that log.
07 – Does the information flow into the DBs?
Sometimes a hiccup might occur and the information, collected from the monitored servers and other objects, might not flow from the Management Servers into the SCOM DBs. When that happens, SCOM R2 will tell you.
Whenever I bumped into this issue, the cause was altered permissions, so check the SCOM R2 service accounts. Other, temporary issues were network related, like an enabled firewall on the SQL server without the proper exceptions set. Those can be found here (under the header Operations Manager 2007 Firewall Scenarios).
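A quick way to rule out the firewall is to test, from each Management Server, whether the SQL Server port accepts TCP connections. This Python sketch assumes a default SQL Server instance listening on TCP 1433. Named instances or non-default ports need the port they actually use. A successful connection only proves that the network path is open. It says nothing about the permissions of the service accounts.

```python
import socket


def sql_port_reachable(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and tries each address it finds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable or name resolution failure
        return False
```

Call it as sql_port_reachable("<your SQL server>") on each Management Server. A False on one server and True on another points at a firewall or routing difference rather than at SQL itself.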
08 – Does Grooming Work?
It is important to know whether grooming is still running and functional. Grooming means that closed Alerts are removed from the OpsMgr DB as configured in the Database Grooming settings:
To check whether grooming is doing fine, and how to solve it when it isn’t, go here.
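The rule itself is simple: a closed Alert becomes eligible for grooming once it has been closed for longer than the configured number of days. The Python sketch below models that rule on hypothetical alert records (not the actual OperationsManager schema) to show a practical check. If closed Alerts well past the retention window are still present after grooming should have run, grooming is probably not doing its job.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def overdue_for_grooming(alerts: List[Dict], retention_days: int, now: datetime) -> List[str]:
    """IDs of closed Alerts that have been closed for longer than retention_days."""
    cutoff = now - timedelta(days=retention_days)
    return [
        alert["id"]
        for alert in alerts
        if alert["state"] == "Closed" and alert["closed_at"] < cutoff
    ]


# Hypothetical alert records; open Alerts have no closed_at value.
now = datetime(2011, 1, 10, 12, 0)
alerts = [
    {"id": "a1", "state": "Closed", "closed_at": datetime(2011, 1, 1)},
    {"id": "a2", "state": "Closed", "closed_at": datetime(2011, 1, 9)},
    {"id": "a3", "state": "New", "closed_at": None},
]
print(overdue_for_grooming(alerts, retention_days=7, now=now))  # ['a1']
```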
As you can see, much of the Health of SCOM is directly related to the DBs. It goes without saying that the underlying Server OS and hardware (whether P or V) must be up to specs as well. So check it out yourself and regain control over SCOM R2. The next posting in this series will be all about updates and the lot. So stay tuned and see you all next time.
When you are using AVIcode and have any questions, you can post them here. The AVIcode forum is staffed by Microsoft employees and MVPs with deep AVIcode knowledge and experience.
The second posting of this series is all about the inventory of your SCOM R2 environment. A good and clear picture of your environment is key to the overall health of that same environment.
Why? Suppose you have too many servers reporting to a single SCOM R2 Management Server. This will negatively affect the total performance of your SCOM R2 environment. Or suppose you think the RMS is beefed up enough, but it turns out it isn’t. And when that turns out, it is always at the wrong moment of the day and/or week.
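One item of such an inventory is how many agents report to each Management Server. Given a list of agents and their primary Management Server (for example exported from the Console), counting them is trivial. In this Python sketch the agent and server names, and the maximum per Management Server, are hypothetical. Take the real limit from Microsoft’s sizing guidance for your hardware.

```python
from collections import Counter
from typing import Dict


def agents_per_ms(agent_to_ms: Dict[str, str]) -> Counter:
    """Count how many agents report to each Management Server."""
    return Counter(agent_to_ms.values())


def overloaded(agent_to_ms: Dict[str, str], max_agents: int) -> Dict[str, int]:
    """Management Servers carrying more agents than max_agents (a hypothetical limit)."""
    return {ms: n for ms, n in agents_per_ms(agent_to_ms).items() if n > max_agents}


# Hypothetical inventory: agent name -> primary Management Server.
inventory = {
    "web01": "MS01",
    "web02": "MS01",
    "sql01": "MS01",
    "dc01": "MS02",
}
print(overloaded(inventory, max_agents=2))  # {'MS01': 3}
```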
Or even worse, the non-clustered RMS fails and an MS has to be promoted to RMS. Which MS is best suited for this temporary task? Where is the Encryption Key? Where is the password? Does the RMS also have an SMS-enabled device attached to it that needs to be connected to the MS that has become the temporary RMS?
It might sound stupid, but many times I bump into SCOM R2 environments where the system engineers assume they have a clear picture of them. But when I ask questions like:
Some or many of those questions aren’t answered right away. And yes, that is understandable. Often the people who originally ran the SCOM R2 environment now work in different departments or for another company. Not much is documented, and what is documented isn’t shared with the new system engineers. So one or more blind spots are present and need to be addressed.
When you don’t have a document describing the SCOM R2 environment, it is time to write one now. Better to write it down in a moment of ‘calm and peace’ (I know, system engineers are always busy) than to run into one or more blind spots while trying to recover from a disaster. At moments like that it is too late (and you hear a small voice nagging ‘I told you so…’).
So what do you write down about your SCOM environment?
Here is a ‘shortlist’ (duh!) of the kind of information required in order to ‘Know What You Have’. Only then is good management possible. Anything else is just an assumption and as we all know, ‘assumption is the mother of all …’.
‘One’ piece of advice: take your time to complete the document. An hour or so every day for a week should be enough. When the document is completed, keep it up to date. That takes about five minutes. Max. Saves a lot of time. So good versioning is required. And store it in a location that is available and accessible to the other team members as well, like SharePoint.
So let’s start. The document should contain information like this:
When you have the document ready, you will benefit from it in many ways. I know, it is a lot of information to gather, but one day you will need it, and then you will be HAPPY with the effort you made while creating this document.
And when you have this document, you have a better understanding of your SCOM environment as well. So when things start to go sour, you know how to act.
Postings in the same series:
Part II – Know What You Have
Part III – Are the DBs OK?
Part IV – Is SCOM R2 Up-to-date or Outdated?
Sometimes I get questions and comments from people who have run their SCOM R2 environment for a while without maintaining it properly. And that is not good. Like every other ICT based solution, it needs proper care. For SCOM R2, updates (Cumulative Updates) are published on a regular schedule. The Core MP (the MP of SCOM R2 itself) is also updated regularly. And updates come out for the other MPs as well.
Of course, it is not wise to implement every update as soon as it comes out. It needs some testing, and sometimes it is good to wait some weeks as well, in order to see what the community has to say about its experiences with the updated product. And when in doubt, go to the official TechNet SCOM Forum and post a question about it. You will definitely get an answer there.
Actually, what I am telling here is nothing new. It is all about good Change Management and proper maintenance of your ICT assets, and SCOM R2 is no exception. But somehow, I sometimes have the feeling that SCOM R2 gets neglected. And when it does and issues start to occur, like a partially malfunctioning SCOM R2 environment, people start to complain.
Of course, it can be solved. But wouldn’t it be better to prevent it, so that no hiccups occur or, even worse, outages of your SCOM R2 environment?
Therefore I have decided to post a whole new series about this topic, or rather, about how to check whether your SCOM R2 environment is still healthy and operating as it should.
This series will look like this:
So stay tuned. See you all next time.
There are a couple of new X-Plat agents developed by Javier Ripoll and available on Codeplex, delivering monitoring for the Debian 5 and Ubuntu 10 Linux distributions. If you’re interested in monitoring either of these platforms, these agents and MPs are worth a second look.
Debian 5 (x86) System Center Operations Manager 2007 R2 Agent.
Ubuntu 10 Management Pack for SCOM 2007 R2.
Requires the Agent (in this project). It is based on the Red Hat 5 Management Pack.
Thanks Michael for sharing!
The books are ordered chronologically. Books without a publishing date are at the end of the list.
What it does? Taken directly from the website:
However, I do not have that much experience with the product. But I have seen some very cool demonstrations. A fellow MVP, Simon Skinner, has written a whole series about AVIcode 5.7. Since he is an experienced user of this software, these postings are really spot on. This is what he has written so far:
Time for a small review of that year in relation to this blog. So I pulled some statistics to get insight into how this blog performed in 2010. Some information:
This is just a small part of all the statistics about my blog. But they are impressive, all the more so when you know that the blog is run solely by me. However, without the input from the community I would not have the energy and drive I have now.
So again, THANK YOU ALL for spending your time on my blog and for all your comments. After all, this blog is meant for the community and not for myself (even though I must admit I use it as my online knowledge base). I am nothing more than a tool, the blog itself is the means, and SCOM is the product it’s all about, meant to help the community use SCOM in a better way. Looking at the numbers, it seems I am succeeding in that approach. Nice!
I searched the internet but didn’t find anything solid. So it was time to take a deeper dive into the servers as well. Soon I found something that seems to be the cause: WMI is experiencing issues. The servers involved are W2K08 R2. These are some of the WMI-related events these servers show:
So it is clear WMI is having serious issues which need to be addressed.
There is a hotfix available for fixing WMI on W2K08 R2 based servers: http://support.microsoft.com/kb/981314. It addresses a memory leak in WMI. Normally this leak does not become visible; it only does when the Win32_Service class is queried frequently. And guess what SCOM does?
So before the Lync MP was imported, WMI on these servers didn’t get that much load. But since the Lync MP was imported, WMI on these servers is being queried more frequently, so the leak surfaces.
Tonight the hotfix will be applied and I am pretty sure this will help and resolve the issue. When the results do come in I will update this posting accordingly.
This posting will show you some tips, tricks and advice on how to go about it. Some might seem too obvious, but they are still worth mentioning. So bear with me.
Hope this walkthrough helps in troubleshooting MPs that do not seem to land properly on one or more servers.
Wh00t! Nice! Of course I realize I couldn’t have achieved this on my own, so thanks everybody who helped me with it. Much appreciated!