I bumped into this strange issue at a customer’s site.
Issue
SCOM R2 was in place with the Web Console hosted on the RMS, so Windows Authentication was being used. The Web Console ran like clockwork; only the Health Explorer gave issues. It ran fine on the RMS itself, but from any other machine it didn’t work properly: only the buttons were shown and the rest of the screen was grayed out, like this:
Also the RMS showed this error in the OpsMgr Event Log: EventID 10, source: Web Console. Description: bla bla bla Unhandled asynchronous postback error occured…
Things I tried to remedy it
No matter what I tried, nothing helped:
Recycled App pool;
Ran iisreset from a command prompt;
Rebooted RMS;
Tried the Web Console with NetBIOS names and later with FQDN;
Repaired ASP.NET Ajax (required for running web based Health Explorer).
But all to no avail.
What finally worked
Time for another approach. When the more sophisticated measures didn’t seem to land, it was time for a drastic one: the REMOVAL of the Web Console, followed by a fresh installation.
This can be done easily. Just run setup, select Modify and deselect the Web Console. Within a few minutes the Web Console is gone. Rerun setup and select Modify again. Now select the Web Console, the required type of authentication and a few minutes later a new Web Console is present, all without requiring a reboot of the RMS :).
And now the web-based Health Explorer runs like clockwork again, from any server or workstation:
Even though I don’t like approaches like these (I still don’t know what caused it) sometimes measures like these are required. Fix it and move on to another more interesting issue :).
When one wants to monitor non-Windows servers (RHEL and SLES, for instance), WS-MAN must be present on the SCOM R2 Management Servers, or at least on the SCOM R2 Management Servers to which these non-Windows servers report.
But what is WS-MAN? And where to find it? What version do I need? And does one need to install it on Windows 200x servers or not? In this posting I'll try to shed some more light on this topic. So let’s start.
What is WS-MAN?
WS-MAN (WS-Management) is a remote management standard; WinRM (Windows Remote Management) is Microsoft’s implementation of it on Windows. It enables people and processes to remotely manage systems and execute programs on them.
OK, I see. But why does SCOM need WS-MAN?
Windows servers and non-Windows servers are two different worlds. To make them talk to each other, some enablers for this kind of communication need to be in place. One of these ‘enablers’ is WS-MAN, which is present in both worlds. For clarification, look at the picture below, taken from a slide deck Barry Shilmover used at Tech-Ed Barcelona 2008 for his presentation about SCOM R2 and X-plat monitoring:
What version do I need?
That depends on the server OS you are running. On Windows Server 2003, WS-MAN 1.1 will do. On Windows Server 2008 or higher, WS-MAN 2.0 is required. When Windows 8 comes out, a new version of WS-MAN is to be expected as well.
Where do I find it and do I need to install it on all Windows 200x servers?
That’s a funny thing. When you run Windows Server 2003, you need to download it from here (make sure you choose the correct architecture!) and install it (Next > Next > Finish).
When you run Windows Server 2008 or higher, it’s already there! So no need to install it again. Open the Windows Services mmc and look for Display Name Windows Remote Management (WS-Management). The service name is WinRM :).
You only need to install it (when not running Windows Server 2008 or later, that is!) on the SCOM R2 Management Server to which the non-Windows servers report. But as a best practice, it is advised to install it on all SCOM R2 Management Servers, so one can easily switch the non-Windows servers to another SCOM R2 Management Server.
Do I need to configure it in any kind of way?
Yes, for SCOM R2 you’ll need to configure WS-MAN to allow basic authentication by running this command from an (elevated) cmd-prompt:
winrm set winrm/config/client/auth @{Basic="true"}
When you don’t do this, the Discovery of the non-Windows Servers will fail.
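To double-check that the setting took effect, one can look at the output of `winrm get winrm/config/client/auth`. The snippet below is only a minimal Python sketch, not part of SCOM or WinRM. It assumes the output lists one `Name = value` pair per line; the function name and the sample text are made up for illustration.

```python
def basic_auth_enabled(winrm_output: str) -> bool:
    """Return True when the text contains a 'Basic = true' entry.

    Assumes the 'Name = value' layout printed by
    'winrm get winrm/config/client/auth'; adjust if your output differs.
    """
    for line in winrm_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == "basic":
            # startswith tolerates trailing annotations after the value
            return value.strip().lower().startswith("true")
    return False  # no Basic entry found at all


# Illustrative sample only, not captured from a real server.
SAMPLE = """Auth
    Basic = true
    Digest = true
    Kerberos = true
"""

if __name__ == "__main__":
    print(basic_auth_enabled(SAMPLE))
```

Paste the real command output into the function on the Management Server; if it returns False, the Discovery of non-Windows servers is likely to fail as described above.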
SNMP, OIDs and SCOM don’t seem to be a very exciting mix at first glance. However, when combined in a smart manner, they extend your monitoring solution in an awesome way. This posting is about just that. It describes at a high level how to go about it and highlights some potential pitfalls. And as an extra gift, it shows two short YouTube videos demonstrating the power of SNMP, OIDs and SCOM working together.
How about… checking the UPS systems: whether they’re still powered, and whether the batteries are being charged or not? And seeing the battery capacity percentage displayed in a graph in the SCOM Console? Or how about getting an Alert when the temperature in your datacenter is too high? And having a graph of the temperature plotted in near real time in SCOM as well? Or getting an Alert when water is detected in the datacenter?
All of this – and much much more – can be realized with SCOM, some good equipment, good software for SNMP walks (available for free like GetIf) and testing.
Nowadays one can buy for not too much money a data center thermometer with Ethernet connection or ‘Industrial Ethernet Temperature, Humidity, Pressure Sensors With Relay Outputs’, like this one:
Many times these devices are white labeled, thus sold under different brands. One of the real manufacturers is Comet Systems, to be found here. In the Netherlands these devices are sold under another brand, like Atal. Even though this information seems trivial it’s very important. It has everything to do with the related MIB files, about which I will tell more later on.
SNMP Get vs. SNMP Trap
Anyhow, devices like these are really awesome since they contain a whole SNMP stack of their own which can be queried by SCOM, using a simple SNMP Get command. The advantage of this, compared to an SNMP Trap, is that a single Monitor can be built and targeted against a whole bunch of devices. With an SNMP Trap this won’t do, and a Monitor has to be built per device. Besides that, SNMP Traps have more downsides. So whenever I can, I stay away from SNMP Traps.
White label, other brands and the MIB mix-up
As stated before, these devices often come from a couple of factories all over the globe. Companies buy them in bulk and rebrand them under a label of their own. However, it’s necessary to know exactly what type of device you’re using so you know exactly what MIB file to use. For instance, the device in the picture above is sold in the Netherlands under a totally different label and model.
However, the same MIB still applies, and it only refers to the brand and model names of the real manufacturer. So this is the hardest part: finding the original label and model type. Only then do you know what part of the MIB file relates to your device. But once you have tackled this, the rest is (almost) a walk in the park.
Let’s walk SNMP, some high level steps
Place the correct MIB file into the directory where GetIf loads its MIB files from. Start GetIf, enter the correct IP address and community string, and connect to the device. Go to the MBrowser tab, browse through the SNMP stack and find the OID you’re looking for, like temperature:
Write down this OID (highlighted in yellow) since you’re going to need it in SCOM later on.
Another interesting OID in this case is the one for flood detection, since this device is an Ethernet thermometer with additional inputs. One of those inputs is the LG-12 Flood detector, which works really simply and shows only two values: value 1 for all is OK (no water detected) and value 0 for ‘Houston, we’ve got a problem’ (water detected):
Also write down this OID.
Let’s create a Flood Detection Monitor, some high level steps
Create this kind of Monitor: SNMP > Probe Based Detection > Simple Event Detection > Event Monitor - Single Event and Single Event.
Don’t forget to DISABLE the Monitor and enable it through an override, targeted against the group containing all these devices! Of course, these devices need to be monitored by SCOM as network devices.
Use the same OID for both SNMP Probes (First and Second). And for Parameter Name (used in both Expressions, First and Second) use this entry: /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value.
Configure the Health and Alerting and save the Monitor. Don’t forget to enable the Monitor by using an override targeted against a Group containing these devices.
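The logic behind the two Expressions can be sketched in a few lines of Python. This is not how SCOM evaluates the Monitor internally; it is only an illustration of the mapping described above (value 0 means water detected, value 1 means all is OK). The function name and the state labels are assumptions made for this example.

```python
HEALTHY = "Healthy"
CRITICAL = "Critical"


def flood_monitor_state(snmp_value):
    """Map the flood detector's SNMP value to a monitor state.

    Per the device described above: 0 = water detected, 1 = no water.
    Any other value matches neither Expression, so None is returned
    to indicate 'no state change'.
    """
    value = str(snmp_value).strip()
    if value == "0":   # First Expression: situation is not OK
        return CRITICAL
    if value == "1":   # Second Expression: situation is OK
        return HEALTHY
    return None


if __name__ == "__main__":
    for reading in (1, 0, "1"):
        print(reading, "->", flood_monitor_state(reading))
```

Since both SNMP Probes use the same OID, the only difference between the First and Second Expression is the value they compare against.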
Time for a test of the Flood Detector Monitor
Let’s say the Flood Detector Monitor is properly built and configured. So it’s time for some testing. In this case I have made two videos and uploaded them to YouTube.
Water Alert
In the first video the flood detector is put into a paper cup with some water:
Now the circuit closes (OID gets value 0) and SCOM raises an Alert.
Water is gone, Alert as well
In the second video the flood detector is removed from the paper cup:
Now the circuit is open again (OID gets value 1, all is well in SCOM) and the related Monitor is set to a Healthy state again, thus closing the Alert:
Let’s create a Temperature Monitor
For this the same steps are used as for the Flood Detector Monitor.
Of course, a different OID and other values are at play here. Suppose you want an Alert when the temperature of your datacenter exceeds 25 degrees Celsius. The First Expression (situation is not OK) looks like this:
The Second Expression (situation is OK) looks like this:
Configure the Health and Alerting and save the Monitor. Don’t forget to enable the Monitor by using an override targeted against a Group containing these devices.
And now monitoring is in place and an Alert will be raised when the temperature of 25 degrees Celsius is exceeded.
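As a rough illustration of the two Expressions, here is a small Python sketch of the threshold logic. The exact operators used in the Monitor are an assumption (greater than 25 for not OK, less than or equal to 25 for OK), as is the idea that the OID returns whole degrees Celsius; check what your own device reports before copying the numbers.

```python
THRESHOLD_C = 25.0  # alert threshold from the example above


def temperature_monitor_state(reading_c, threshold_c=THRESHOLD_C):
    """Return 'Critical' when the reading exceeds the threshold.

    Assumed operators: First Expression is 'greater than threshold',
    Second Expression is 'less than or equal to threshold'.
    """
    if float(reading_c) > threshold_c:
        return "Critical"
    return "Healthy"


if __name__ == "__main__":
    for reading in (18.5, 25, 25.1, "30"):
        print(reading, "->", temperature_monitor_state(reading))
```

The two Expressions should not overlap and should not leave a gap between them, otherwise the Monitor either flaps or never returns to Healthy.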
Let’s plot the temperature in near real time
For this a Rule is required, using the same OID as the Temperature Monitor: Collection Rules > Performance Based > SNMP Performance:
Configure the SNMP Probe (nothing more than the OID and frequency of probing) and you’re done. Don’t forget to enable the Rule by using an override targeted against a Group containing these devices and you’re in business.
In the SCOM Console add a Performance View targeted against these SNMP Network Devices or targeted against the Rule you created earlier. Be patient and within an hour or so data starts coming in :).
Conclusion
Even though SNMP, OIDs and SCOM might seem boring, there are many possibilities to extend your monitoring solution into places you didn’t expect. Many devices with an SNMP stack are available on the market. When you have the related MIB file and it contains some good OIDs, you can build almost anything. Happy SCOMming!
As we all know some weeks ago Microsoft released a new version of the Server OS MP, version 6.0.6957.0. But there were some issues with it. Some minor, some major.
Fortunately, Microsoft listened and one of them, Barry Shilmover, fixed many issues of the MP with the aid of Kevin Holman. I myself can’t imagine a better team! Bob and I had the honor of testing the fixed MP.
Kevin Holman also blogged about this new MP and what’s new. Taken directly from his blog:
There are many commercials out there about how to get your hair back. I know, I am almost bald, but I don’t care. But sometimes there are issues which make you lose hair, or make the hair you still have go grey.
Recently I had such an issue. All about monitoring an Oracle database, running on a Windows server. The customer wanted to monitor that database with SCOM without purchasing a MP for it.
With the OLE DB Data Source wizard one can connect to such a database, run a query against it and collect some performance counters as well. How nice! At least, this is what theory tells us. But real life is a bit harder as it turned out.
The Beginning Of It All
First I used Maarten Damen’s blog posting, to be found here. It is a really good posting since it contains really good information. However, while creating the OLE DB Data Source, I added a query and I decided to populate BOTH Run-as Profiles which are created by the same wizard. Both decisions turned out to be wrong. However, SCOM didn’t show that to me directly.
It took a while for the OLE DB Data Source to show up in the SCOM Console and even a bit longer to enter a monitored status (Healthy! Yeah baby, I made it!) but all seemed to be well. After all, it got into a Monitored status and a GREEN one at that! Nice!
The Day After…
But in reality things weren’t that OK. For sure, the Monitor was OK and GREEN. But after a week or so, the Health Explorer still didn’t show any state change. Nothing. The Monitor was green, but since when was nowhere to be found. And that triggered me to take a look at the OpsMgr event log of the Watcher Node.
Ouch!
SNAFU is paying us a visit
EventID 1102 all over the place. Descriptions like: Rule/Monitor "OleDbCheck_a28c517f4fdc4038b880b2ac02796256.NoConnectionMonitor" running for instance "XYZ Oracle DB Check" with id:"{F8C4A05D-AAD4-56CA-7CC4-9AC095323F83}" cannot be initialized and will not be loaded. Management group "XYZ"
When I recycled the Health Service cache, the error came back and the Health Explorer of the same OLE DB Data Source showed now a Healthy status with a date and time: exactly matching the same date and time the first EventID 1102 was logged!
So this is a real false-positive. Yes, all is OK sir! And all the while, nothing is OK! Like SNAFU…
What Caused it all
Deep and thorough investigation turned up this issue:
Used Run-as-Profile
When the OLE DB Data Source wizard has done its job, TWO Run-as-Profiles are created. And one must configure only ONE of them. Not both! While running the wizard one has the option to select Simple Authentication:
So now you only have to configure that Run-as-Profile and not the other! The Synthetic Transaction Action Profile is only used when Integrated Security is being used. But when one has selected the option to use Simple Authentication, one can skip this Run-as-Profile and only configure the Simple Authentication RunAs Profile.
Time For Another Error…
And yes, things got moving now. EventID 1102 went away and instead we got another one, EventID 11852: OleDb Module encountered a failure 0x80040e14 during execution and will post it as output data item. : ORA-00903: invalid table name. Because of this the Health Explorer for this OLE DB Data Source went into a critical state.
The Oracle DBA checked everything: yes, the account used by SCOM is OK and logged on to the Oracle DB. And yes, the query looked for an existing table and YES the account has enough permissions to run that query against that table. Tried many other queries, the DBA even created a table for that account and SCOM OLE DB Data Source check, but still the same error!
So somewhere along the line the query got garbled, which caused Oracle to pull the plug.
However, we had already spent a lot of time getting rid of the first EventID 1102 (even Microsoft CSS told us to configure BOTH Run-as-Profiles, so it was trial and error here…) so we decided to remove the query as a test.
And guess what? All became well now! So the presence of the data source is successfully tested, but not its responsiveness by running a query. In the end the customer decided to leave it at that: only the presence of the data source is tested and no query is used at all.
So after a long journey we came back to the posting of Maarten Damen (by removing all the fluff we had added ourselves) and got it working.
Still it’s interesting to know what happens to the query fired from SCOM to the Oracle database. Guess it will be another time to check that one out.
A few years ago when I started with SCOM I bumped into many issues. Not only because I started out fresh but also SCOM was just RTM. And as we all know, there were a few ‘bumps’ in that road. Many of those got rapidly fixed.
But still, I was hungry for more knowledge and detailed information on SCOM and how to address certain issues. So I started to look on the internet for more information and found it in a few blogs out there. In those days there weren’t many blogs, but the few which were to be found were really good, like the ones from Pete Zerger, David Allen, Kevin Holman and Cameron Fuller.
Soon I started to mail those guys and you know what, I got answers from them! Wow! And then I started to grow and learn as well. Really fast, thanks to these guys who shared really good information. They inspired me to start a blog of my own.
To me these guys are really special. So when Cameron Fuller told some time ago he was about to stop his own blog and to continue blogging on SCC, it made me a bit sad. Don’t get me wrong. I deeply respect SCC, but people like Cameron need their own space as well.
So it’s good to see he’s back. He’s still blogging for SCC, but this guy doesn’t suffer from writer’s block; it’s more like an overkill of inspiration :). Therefore he started to blog on a very regular basis on the site of the company he works for, Catapult Systems. Hopefully they have rented enough disk space and bandwidth for their website since Cameron knows how to blog seriously :).
Want to know more? Go here: (Even though he might look a bit dangerous on the picture, he’s really an OK guy.)
When one creates a Web Application in order to monitor the availability and responses of a certain website, one might bump into this error: Untrusted CA:
Investigation taught me there might be two separate causes for this Alert, both related to the Watcher Node which runs this Monitor. This posting is about the second cause. For the posting about the first cause, go here.
Issue
Even when the Root Certificates are up to date on the Watcher Node, there might be another issue at play. In this case, a certificate is being used which isn’t present in the CA store of the Watcher Node.
How to solve it
Open an RDP session on the Watcher Node generating the Alert Untrusted CA (log on with local admin permissions) and start IE. Surf to the website which triggers these Alerts. Wait until the website is fully loaded and import the certificate for your account into the certificate store Intermediate Certification Authorities:
The wizard opens on the page Welcome to the Certificate Import Wizard. Follow the instructions and when prompted what store to select, choose the option Place all certificates in the following store > Intermediate Certification Authorities.
Finish the wizard. The Certificate is stored now.
Open an MMC > add Snap-in > Certificates > for Local Computer Account and My User Account. Export the certificate you just imported from the store Certificates - Current User\Intermediate Certification Authorities to a folder on your drive.
Import the certificate you just exported into the store Certificates (Local Computer)\Intermediate Certification Authorities:
Now all is well and the Watcher Node won’t throw the error Untrusted CA anymore.
Credits: Thanks to this blog posting I was able to crack this issue.
When one creates a Web Application in order to monitor the availability and responses of a certain website, one might bump into this error: Untrusted CA:
Investigation taught me there might be two separate causes for this Alert, both related to the Watcher Node which runs this Monitor. This posting is about the first cause.
Issue
Sometimes I bump into environments where Windows Servers are installed and not patched on a regular basis. Many times this is because the people involved live by the credo ‘If it ain’t broke, don’t fix it’. Even though it might sound plausible, there is too much to be said against it. But this posting isn’t about this approach, so I’ll refrain from it.
But whenever one bumps into an environment like that, there is a huge chance the certificate store of the server involved is too old, thus missing out on renewed (and revoked!!!!) Root Certificates.
Servers like these are easily pinpointed. Open an RDP session on the Watcher Node generating the Alert Untrusted CA (log on with local admin permissions) and start IE. Surf to the website which requires monitoring. When IE throws this error (or a similar one, since the error differs per IE version):
the Root Certificates need to be updated.
How to solve it
Open Control Panel, go to Add or Remove Programs and select Add/Remove Windows Components. The Windows Components Wizard is started. Scroll down, select the option Update Root Certificates and click Next.
When the installation is finished the updated Root Certificates will be ‘installed’ on the server. Now the monitored website will be fine and the error will be gone.
However…
In some conditions the error will return. If so, there is another issue at play. Go here to find out how to solve that issue.
I already knew it before since other people bumped into the same issue as well, blogged about it and even filed a bug report on Connect. But somehow the bug report didn’t seem to land properly – the impact wasn’t fully understood – so it was closed without a fix.
So it was time to organize a new effort. A new bug report was filed on Connect and it was given exposure on my blog. And people started to vote on it, so it was brought again to the attention of Microsoft.
Because of this two things happened:
The first bug report was reopened and Microsoft told frankly they didn’t understand the impact at first;
A new effort at Microsoft was organized to FIX it!
And a few days later I got the message the bug report was closed, simply because Microsoft FIXED it!!!
The yellow highlighted area is in Dutch so I translate it: Solved as in Repaired.
So THANK YOU Community and THANK YOU Microsoft for taking reported matters seriously.
for any one running SQL Server 2008 for their SCOM R2 environment and yet not very interested in the latest version of the Server OS MP BUT very interested in the two additional Reports which are very sharp looking and good as well (all credits go to Microsoft of course).
The nice thing about the MP containing these two Reports (Microsoft.Windows.Server.Reports.mp) is that it works perfectly together with the previous version of the Server OS MP! I tried it in one of my SCOM R2 CU#5 test environments: the Report MP was imported quickly, the Reports were uploaded to SSRS as well, and they work like a charm.
Of course, I tried this only in a SCOM R2 test environment so try it yourself in a test environment as well before moving to production but until now everything looks great!
And again, I tested it only on SQL Server 2008. I have heard rumors the Reports don’t work with SQL Server 2005.
Had to update the Proxy Agent for many (+200) SNMP-enabled devices in SCOM R2 CU#5.
When using the GUI for it this becomes a gruesome task, also because the list with available Proxy Agents (+400 in my case) isn’t sorted. Only the Management Servers are shown on top. And the remaining servers aren’t sorted at all, nor is any search function available…
So it was time to search the internet and gladly I found two postings from ‘Da Master of SCOM’, Pete Zerger. Together with Marco Shaw ‘Da Master of PowerShell’, they created a very good PS script. The first posting contains the first version of the script (without the aid of Marco Shaw) and the second is based on the input and knowledge of Marco Shaw.
Scripts can be found here (Part I) and here (Part II).
All credits go to Da Masters, Pete Zerger and Marco Shaw. Thanks guys!
The response we got, and are still getting, is a bit overwhelming I must say. At this moment the vote counter for this bug report is already at 44(!). Besides that, we got many comments on our blog postings.
However, somehow some people misunderstood our effort and started bashing and flaming. Gladly I have to approve any comment before it’s shown on my blog. And some of the comments I got were really bad. So I removed them and only approved the good ones.
Therefore I want to clarify the objective of our joint effort:
NOT to flame or bash any person or company in any kind of way BUT to make SCOM/OM12 an even better product. And we strongly believe that the total quality of SCOM/OM12 is based upon the overall quality of the Management Packs. So the better the MPs become, the better SCOM/OM12 will become as well.
So if you’re looking for a place to flame or bash, move on and look elsewhere, since my blog won’t be the stage for that kind of purpose!!!
And to the persons who gave solid and valid feedback: THANK YOU and know that I respect all of you.
Yesterday I posted an article about the new Server OS MP, version 6.0.6957.0.
At a first glance all seemed to be well. But the same day I saw many bugs related to the MP coming in, especially on Kevin Holman’s blog. So I decided to pull my posting about that MP and rewrite it. Which I did.
But frankly, that’s not enough. A restaurant can only become good, or even better, when its guests give good, constructive feedback. This analogy goes for the MPs as well. I don’t like bashing or flaming. But giving good feedback instead is something else.
This morning I talked with Bob Cornelissen about it. He is a Dutch SCOM addict and knows his stuff inside and out. We both felt something had to be done besides blogging about it.
By blogging about this bug report we hope that many of you will add their vote as well in order to get it top on the list of Microsoft. This way we can aim for a new MP containing all the required fixes.
Last Friday Microsoft released the newest version of the Windows Server OS MP, version 6.0.6956.0.
But hold your horses! Don’t import this MP right away since there are some issues with it.
You have to know these issues before you import this MP. So read this whole posting and don’t forget to visit Kevin Holman’s blog as well in order to get a clear picture about this MP.
Nonetheless, this MP adds some new functionality which is good.
The Good News
Two new very cool Reports are added to this MP. These are THE Reports many customers of mine asked for.
Beware though, these two new Reports aren’t found in the MP Catalog accessed by the SCOM Console, but put into the msi-file containing the new MP, to be found here. When you run the msi-file, the MPs will be extracted, among them the MP containing these Reports:
Really, these Reports are a MUST have! Finally, we see the performance of a server at a glance! And now we can add Groups as well, which will be enumerated in the Report. So no more aggregation (all servers thrown together on a pile per graph, like CPU) but a performance overview PER server, like this one where I have chosen the Group Windows Computers and yet an overview is created per server:
Cluster Shared Volume Monitoring (amount of free space and availability) is added. Customers of mine who run Hyper-V on Windows Server 2008 R2 will love this one!
BPA (Server Manager Best Practices Analyzer). Put into an MP of its own (Microsoft.Windows.Server.2008.R2.Monitoring.BPA.mp). So you can decide for yourself whether or not to use that MP, which is also a good move by Microsoft.
Personally I have a feeling this MP looks a bit like the Server OS MP used in SCA (System Center Advisor), a cloud based solution for companies who want to check whether their Windows based systems and enterprise applications like the Server OS, SQL, AD and Hyper-V are in line with the Best Practices as advised by Microsoft.
Noise is cut down. Many Rules/Monitors are modified in order to cut down on the noise they created earlier.
Performance Collection Rules are disabled by default. Which is good as well since too much performance collection was taking place with the old version of the MP. So this saves a lot of space in the databases of SCOM. And network bandwidth as well.
How many Rules are disabled you ask? The MP Guide lists them all in the appendix ‘Windows Server 2008 Rules and Monitors Disabled by Default’. I have counted about thirteen(!) pages…
The Not So Happy News
Well, check out Kevin Holman’s posting about this MP and don’t forget to read the comments. There are some issues (some minor, some a bit nastier) to reckon with. Until now it seems there are workarounds for it.
The System Center Operations Manager Team posted an article all about the new dashboards present in OM12.
This posting covers the mechanism of the Dashboards, the distribution to the end-user (OM12 Console, Web Console or Web Part for SharePoint) and how to create such a Dashboard.
Ever wanted to integrate SCOM and SharePoint? Like many other wishes we had for SCOM they have come true for OM12.
For OM12 there is a web part available, the Operations Manager Web Part, which integrates View-Only OM12 dashboards into your SharePoint environment. So people can look, but not click or do anything else, like running tasks.
TechNet already had an article about it, but it was missing many details, like screenshots and WHY you would want to integrate OM12 with SharePoint. Fortunately the System Center Operations Manager Team has posted an excellent article about it, covering all details INCLUDING screenshots.
Last Friday Microsoft released the newest version of the ConfigMgr (SCCM) MP, version 6.0.6000.3.
Even though the version number itself doesn’t show major changes (the previous version of this MP was 6.0.6000.2) the most reported issues about this MP are resolved which is very good.
Anyone who runs the ConfigMgr MP is strongly advised to read the article posted by Kevin Holman. This posting covers the changes in this MP in detail AND contains crucial information about cleaning up the localizedtext table.