Monday, July 8, 2013

Getting Rid Of EventID 1202: ‘…Condition indicates wrong server configuration…’

Issue
I bumped into this issue at a customer's location. When a SCOM 2012 SP1 Management Server had its Health Service stopped, its cache removed (~:\Program Files\System Center 2012\Operations Manager\Server\Health Service State) and the Health Service restarted, the server froze for several minutes.

After that, the OpsMgr event log started logging many events with EventID 1202, stating: New Management Pack with id:"ZXYZ", version:"A.B.C.D" conflicts with cached Management Pack. Condition indicates wrong server configuration.

These events occurred right after EACH event with EventID 1201, stating a new MP had been downloaded. So basically EVERY downloaded MP wasn’t right! That’s bad! So it was time for an investigation.
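
To put a number on that pattern, here is a minimal Python sketch that counts how many EventID 1201 events are directly followed by an EventID 1202. It assumes the event IDs have already been exported from the OpsMgr event log in chronological order. The function name `count_conflicts` and its input list are hypothetical and not part of SCOM.

```python
from typing import Iterable, Optional, Tuple

MP_RECEIVED = 1201   # a new Management Pack was downloaded
MP_CONFLICT = 1202   # the new Management Pack conflicts with the cached one


def count_conflicts(event_ids: Iterable[int]) -> Tuple[int, int]:
    """Return (downloads, conflicts).

    downloads: number of 1201 events seen.
    conflicts: number of 1202 events that directly follow a 1201 event.
    """
    downloads = 0
    conflicts = 0
    previous: Optional[int] = None
    for event_id in event_ids:
        if event_id == MP_RECEIVED:
            downloads += 1
        elif event_id == MP_CONFLICT and previous == MP_RECEIVED:
            conflicts += 1
        previous = event_id
    return downloads, conflicts
```

When both numbers are equal, every downloaded MP was flagged, which is the situation described above.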

Cause
All the SCOM 2012 SP1 Management Servers are VMs running on VMware. Running SCOM 2012 SP1 Management Servers as VMs on VMware isn’t new at all: I see it at many customers, and until now I had never seen this issue at any of them.

So together with the VMware guru of this company we started to investigate the issue. We took one SCOM 2012 SP1 Management Server and ran it multiple times through the steps causing the EventID 1202:

  • Stop Health Service
  • Remove Cache
  • Empty the SCOM event log
  • Start Health Service
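
The steps above can be sketched as a small Python script. This is only an illustration, not necessarily how the steps were carried out here: the service name `HealthService`, the event log name `Operations Manager` and the function `run_reproduction` are assumptions, and the cache folder has to be passed in because it depends on the install drive. By default the sketch only lists the steps (dry run) and touches nothing.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

SERVICE_NAME = "HealthService"     # assumption: Windows service name of the Health Service
EVENT_LOG = "Operations Manager"   # assumption: name of the OpsMgr event log


def run_reproduction(cache_dir: str, dry_run: bool = True) -> List[str]:
    """Walk through the four test steps in order and return their names.

    With dry_run=True (the default) nothing is executed.
    With dry_run=False the script must run elevated on the Management Server.
    """
    steps: List[Tuple[str, Optional[List[str]]]] = [
        ("stop-health-service", ["net", "stop", SERVICE_NAME]),
        ("remove-cache", None),  # handled with shutil instead of a command
        ("clear-event-log", ["wevtutil", "cl", EVENT_LOG]),
        ("start-health-service", ["net", "start", SERVICE_NAME]),
    ]
    performed: List[str] = []
    for name, command in steps:
        performed.append(name)
        if dry_run:
            continue
        if command is None:
            # Delete the Health Service State folder; the service recreates it.
            shutil.rmtree(cache_dir, ignore_errors=True)
        else:
            subprocess.run(command, check=True)
    return performed
```

Calling `run_reproduction(cache_dir)` with the path to the Health Service State folder returns the step names in order. Only pass `dry_run=False` on a server where a cache rebuild is acceptable.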

Soon we noticed with Resource Monitor that about a minute after the Health Service was started, CPU utilization flatlined at 100% for about three to five minutes. While this happened, there were 76(!) threads WAITING to be processed!

Test
We stopped this VM and added some more vCPUs to it. This wasn’t an issue at all, since the VMware hosts have enough CPUs to share and only 5% of the total CPU capacity is used during production. Based on the waiting threads we decided to bump up the total number of vCPUs for this SCOM 2012 SP1 Management Server to 8.

When the VM was up and running again, we ran the same test once more. This time the CPU spiked to 100% but no longer flatlined. And yes, there were multiple events with EventID 1201 (MP received) but NOT A SINGLE event with EventID 1202!

Solution
All SCOM 2012 SP1 Management Servers were given 8 vCPUs. And yes, most of the time these servers don’t use them at all. But when a refresh of the cache takes place these vCPUs are really needed and make the difference between good and faulty SCOM 2012 SP1 Management Servers.

So whenever you see EventID 1202 happening on your SCOM 2012 SP1 Management Servers AND you’re sure the configuration of these servers is by the book, simply add additional vCPUs to those servers and be done with it.

During ‘normal’ business hours most of these vCPUs won’t be used at all, as you can see in this screenshot taken from one of the SCOM 2012 SP1 Management Servers:
image

It’s just that these servers now have ‘some’ capacity in reserve for when they really need it. That isn’t often, but the cache refresh is a crucial process nonetheless.
