Friday, November 28, 2008

SCOM Database Owner

When one installs a SCOM environment (for a customer, for instance), one does so with a certain account. This account automatically becomes the owner of the database. In itself that is nothing to worry about. But what happens when this account gets disabled or removed?

Certain issues will arise, like SCOM not being able to discover new systems.

Therefore it is better to change the ownership of the SCOM databases to an account that will never be deleted. Even better is to use a dedicated Active Directory (AD) account for it. Be sure to grant this account only the needed permissions and nothing more.

Otherwise errors like this one will pop up in the SQL Server log:

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 28005
Date: xx-xx-xxxx
Time: xx:xx:xx
User: N/A
Computer: BLA
Description:
An exception occurred while enqueueing a message in the target queue. Error: 15404, State: 19. Could not obtain information about Windows NT group/user 'DOMAIN\ACCOUNT', error code 0xea.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 65 6d 00 00 10 00 00 00 em......
0008: 08 00 00 00 53 00 43 00 ....x.x.
0010: 4f 00 4d 00 53 00 51 00 x.x.x.x.
0018: 4c 00 00 00 07 00 00 00 x.......
0020: 6d 00 61 00 73 00 74 00 m.a.s.t.
0028: 65 00 72 00 00 00 e.r...
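
To find out which account SQL Server is complaining about, the account name can be pulled out of the event description. The snippet below is only a minimal sketch: it assumes the description text has the same wording as the event quoted above, and the function name extract_account is made up for this example.

```python
import re

# Matches the account name in the 15404 message text quoted in the event above.
# Assumption: the wording of the message is identical to that event.
ACCOUNT_PATTERN = re.compile(
    r"Could not obtain information about Windows NT group/user '([^']+)'"
)


def extract_account(description):
    """Return the account named in the message, or None if it is not present."""
    match = ACCOUNT_PATTERN.search(description)
    return match.group(1) if match else None


sample = (
    "An exception occurred while enqueueing a message in the target queue. "
    "Error: 15404, State: 19. Could not obtain information about "
    "Windows NT group/user 'DOMAIN\\ACCOUNT', error code 0xea."
)
print(extract_account(sample))
```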

Follow this procedure for correcting this issue:

First you have to create a dedicated AD account and grant it just enough permissions. This account will be used in the procedure:

1. Start SQL Server Management Studio
2. Go to Databases --> System Databases
3. Right-click the database and select Properties
4. In the 'Select a page' pane, select Files
5. On the right, under the header Database, the current owner is displayed
6. Check the account. If it is not the dedicated AD account, click the button with the three dots
7. The screen 'Select Database Owner' appears
8. Click Browse. Select the dedicated AD account
9. Save the modifications
10. Close all screens

Repeat these steps for all SCOM related databases.
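
When there are several databases, the same change can also be scripted. The sketch below only builds the T-SQL statements (ALTER AUTHORIZATION ON DATABASE) so they can be reviewed before running them in a query window. It is not part of the procedure above. The database names are the SCOM defaults and may differ in your environment, and the account name is purely hypothetical.

```python
def quote_identifier(name):
    """Wrap a SQL Server identifier in brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def owner_change_statements(databases, new_owner):
    """Build one ALTER AUTHORIZATION statement per database."""
    owner = quote_identifier(new_owner)
    return [
        "ALTER AUTHORIZATION ON DATABASE::{} TO {};".format(quote_identifier(db), owner)
        for db in databases
    ]


# Assumptions: default SCOM database names and a hypothetical dedicated account.
SCOM_DATABASES = ["OperationsManager", "OperationsManagerDW"]
DEDICATED_ACCOUNT = "DOMAIN\\svc-scom-dbowner"

for statement in owner_change_statements(SCOM_DATABASES, DEDICATED_ACCOUNT):
    print(statement)
```

Printing the statements instead of executing them keeps the change under manual control, which seems the safer choice for a production database.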

Wednesday, November 26, 2008

New DHCP MP solves performance issues

At a customer's site, with a SCOM implementation already in place, there were performance issues with an agent-monitored DC.

Even though the server wasn't very dimensioned, the CPU peaked at 100% for long periods.

First I tried the overrides for the DNS MP, since the DC involved also functions as a DNS server. It helped, but only for a short period of time. The same DC also functions as one of the DHCP servers. After removing the DHCP MP and replacing it with the newest version (6.0.6452.0, a QFE release to fix the high CPU utilization issues), all is well again. The DC performs neatly, the CPU no longer flatlines at 100% and it is monitored by SCOM.

So don't hesitate: replace any old version of the DHCP MP with the most recent one, to be found here.
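
Whether an imported DHCP MP is older than the fixed release can be checked by comparing the version numbers part by part. This is just a small sketch of that comparison. The fixed version is the one mentioned above, while the 'installed' version in the example is a made-up value.

```python
# Version of the DHCP MP containing the high-CPU fix, as mentioned in this post.
FIXED_DHCP_MP_VERSION = "6.0.6452.0"


def parse_version(version):
    """Turn '6.0.6452.0' into (6, 0, 6452, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_replacement(installed, fixed=FIXED_DHCP_MP_VERSION):
    """Return True when the installed MP version is older than the fixed one."""
    return parse_version(installed) < parse_version(fixed)


# Hypothetical installed version, for illustration only.
print(needs_replacement("6.0.6278.0"))
```

Comparing tuples of integers avoids the trap of comparing the strings themselves, where "6.0.10000.0" would wrongly sort before "6.0.6452.0".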

Thursday, November 20, 2008

SCOM R2 Beta & upgrading existing SCOM SP1 Reporting component

When one upgrades an existing SCOM SP1 installation to R2 and this installation already has the reporting functionality installed, this component must be upgraded separately. Otherwise the OpsMgr event log will show these two events:
Event Type: Error
Event Source: Health Service Modules
Event Category: Data Warehouse
Event ID: 31565
Date: 20-11-2008
Time: xx:xx:xx
User: N/A
Computer: blah
Description:
Failed to deploy Data Warehouse component. The operation will be retried.Exception 'SqlScriptException': Batch ordinal: 5; Exception: Invalid column name 'SchemaName'.

One or more workflows were affected by this.

Workflow name: Microsoft.SystemCenter.DataWarehouse.Deployment.Component
Instance name: xxxx.xxxx.local
Instance ID: {DB2FC0CC-CC92-834B-6C5A-387AB914C800}
Management group: xxxx
and
Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 1108
Date: 20-11-2008
Time: xx:xx:xx
User: N/A
Computer: blah
Description:
Secure Reference Override with id:"{63B43DA7-B8B1-6BBB-7EC0-2DC564855F6E}" requesting credentials identified by SSID:"007341DBB34FB8216AFD79D1C51874024D79A41F4100000000000000000000000000000000000000" cannot be resolved while loading configuration for instance "tempdb" with id:"{F142C30D-F3FC-AA12-9DF4-A5010B5E6A80}" in management group "xxxx".
This issue is similar to the upgrade from SCOM RTM to SP1. For that upgrade the Data Warehouse database had to be upgraded separately as well.

Just start the setup of SCOM R2 again and this time select 'Install Operations Manager 2007 R2 Reporting'. This will start the Operations Manager 2007 R2 Reporting Setup Upgrade Wizard. Follow the onscreen instructions and get a cup of coffee, since it can take a while depending on the size of the database. Afterwards (after a successful upgrade, that is) the earlier mentioned events are gone and SCOM R2 will show the newly added reports in its reporting pane.

SCOM R2 Beta 1 released!

Microsoft has just released SCOM/OpsMgr R2 Beta 1. It can be downloaded from the Connect site.

Wednesday, November 19, 2008

Service Level Dashboard Management Pack

14-04-2009 Update: In SCOM R2 RC Service Level Tracking has been integrated. Want to know more? Read about it in this posting.
Some time ago Microsoft released the latest version (6.0.6278.6) of this Management Pack. It is meant for monitoring, reporting and tracking line-of-business (LOB) application service level compliance. It displays the performance and availability of these LOBs against their Service Level Agreements (SLAs). An executive overview can be found here.

This overview is very important since it contains a diagram of how this MP works. Without a proper understanding of it, there is a great chance this MP will give a wrong view of the LOBs against their SLAs, thus delivering wrong information!

In order for this to work, a set of components has to be created in OpsMgr:

  1. SLA
    This action happens outside OpsMgr, but it is the most important one since the SLA for the LOB has to be defined. This information will be used by OpsMgr

  2. Web Application Monitors & Synthetic Transactions
    Watcher nodes have to be deployed and configured in order to perform synthetic transactions, such as connecting to the Web site, logging on with a special account, starting a (bogus) transaction and logging off. Another example is querying the related database(s). The Web Application monitor runs on the watcher node and uses the synthetic transactions to check whether the web application is available and to measure its performance.

  3. Distributed Application
    A DA has to be built which represents the LOB/service. Here the monitors mentioned in step 2 are grouped and related to each other. For every component defined in this DA the availability and performance will be measured. The SLA levels are set with an override. The same override also offers the option to group the monitored LOBs under a logical name, titled 'Dashboard Group'. Any name can be given. This name will later be present in the reports and can be selected while defining the parameters for a report. This way the LOBs can be logically grouped together in one report.

  4. Dashboard Reports
    When this MP is imported, a new set of reports is loaded as well. These reports work on the DAs defined earlier. Every report evaluates whether the LOB was compliant with the SLA over the given reporting period, based on the levels set for this LOB. The Dundas Gauges are available in the summary reports and let the viewer see at a single glance whether or not the SLAs are met. Of course all of these reports offer hotlinks so one can drill down to a certain aspect.
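
To make the idea of 'compliant or not' concrete, here is a small worked sketch of the arithmetic behind an availability SLA. It is not how the MP itself calculates its figures. The function names, the 30-day period and the 99.9% target are assumptions chosen for the example.

```python
def availability_percent(period_minutes, downtime_minutes):
    """Availability over a reporting period, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def is_compliant(period_minutes, downtime_minutes, sla_target_percent):
    """True when the measured availability meets or exceeds the SLA target."""
    return availability_percent(period_minutes, downtime_minutes) >= sla_target_percent


# Assumed example: a 30-day reporting period and a 99.9% availability target.
PERIOD_MINUTES = 30 * 24 * 60  # 43200 minutes

for downtime in (30, 60):
    measured = availability_percent(PERIOD_MINUTES, downtime)
    verdict = is_compliant(PERIOD_MINUTES, downtime, 99.9)
    print(downtime, round(measured, 3), verdict)
```

With these numbers, 30 minutes of downtime still meets a 99.9% target, while 60 minutes does not. That shows how little room such an SLA leaves and why defining it carefully matters.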

Conclusion

The ease of use of this MP can cause one to overlook the hardest part: defining the SLA and translating it into a DA in OpsMgr. But when this has been done properly, IT management will find they have a good and easy-to-use tool to monitor the SLAs of the LOBs. This MP lifts OpsMgr to a new level of 'awareness'. It shows Microsoft's dedication to making OpsMgr a success and THE monitoring tool for today and tomorrow. In OpsMgr R2 this MP becomes an integrated part and as such will be even better and easier to use. A must-have for organizations dealing with LOBs and SLAs.


 

Screendumps

Image 1: (Example of a Web Application Monitor)

 

Image 2: (Example of a DA based on the new DA template with added components)

Image 3: (Setting the SLAs specifics for this LOB and defining a logical name)

 

Image 4: (Checking whether the override has done its work. Can take up to 15 minutes)

 

Image 5: (Running a summary report. Check out the Dundas Gauges!)

Tuesday, November 18, 2008

OpsMgr gets hosed: EventID 5300 & the Dell MP version 3.1 A01

Dell MP version 3.1 A01 has many issues. It can even cause the whole OpsMgr environment to become unstable. The Health Service of the RMS stalls and the event log of the RMS displays EventID 5300:
Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 5300
Date: xx-xx-xxxx
Time: xx:xx:xx
User: N/A
Computer: blah
Description:
Local health service is not healthy. Entity state change flow is stalled with pending acknowledgement.

Management Group: xxx
Management Group ID: xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxxx

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
When one restarts the Health Service all is well again (for a short period of time, that is).

When you don't have the Dell MP imported, other issues may be at hand:
Network problems between the RMS and the SQL server hosting the OpsMgr database

SQL server hosting the OpsMgr database is overloaded

Wrong authorizations for OpsMgr on the SQL server hosting the OpsMgr database


However, when you are running the most recent version of the Dell MP it is very likely this MP is the culprit. One can run a few queries on the OpsMgr database to see what components are causing the most noise in the OpsMgr environment. These queries will show many components of the Dell MP being the noisiest.
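
The idea behind such a noise check is simply counting which sources generate the most entries. The sketch below shows that counting on rows that have already been exported from the database. It does not contain the actual queries, and the row layout (source name, timestamp) and the names in the sample data are hypothetical.

```python
from collections import Counter


def noisiest_sources(rows, top=3):
    """Count entries per source and return the `top` noisiest, highest first.

    Assumption: each row is a (source_name, timestamp) pair taken from a
    query result; this layout is hypothetical.
    """
    counts = Counter(source for source, _timestamp in rows)
    return counts.most_common(top)


# Made-up sample rows, for illustration only.
sample_rows = [
    ("Hypothetical SNMP monitor A", "2008-11-18 09:00"),
    ("Hypothetical SNMP monitor A", "2008-11-18 09:01"),
    ("Hypothetical SNMP monitor A", "2008-11-18 09:02"),
    ("Hypothetical SNMP rule B", "2008-11-18 09:00"),
    ("Hypothetical SNMP rule B", "2008-11-18 09:03"),
    ("Hypothetical disk monitor C", "2008-11-18 09:05"),
]

for source, count in noisiest_sources(sample_rows):
    print(count, source)
```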

A faster way though is to simply remove the Dell MP and to see whether the EventID 5300 disappears.

Dell has acknowledged this issue and is working on a new MP. This MP is to be released in December this year.
1/16/2009 - Update : Dell has stated the new MP won't be released before April 2009.
Cause: the latest version of the Dell MP contains many SNMP rules & monitors. In fact far too many. They create so much noise that sooner or later the RMS stalls under the flood created by these monitors and rules. These rules & monitors are mostly meant to monitor DRAC.

There is a modified MP to be found on the internet from which all these SNMP monitors and rules have been removed. However, as stated before, the Dell MP has other problems as well, which aren't nice either. Kevin Holman has written an article about it. It is certainly worth reading.

Monday, November 17, 2008

Error 1334 when trying to modify an existing OpsMgr installation

This posting is outdated. See this posting on my blog for the update.

Issues with DNS MP version 6.0.6278.27

12/23/2008 : Look here for the update on this posting

This MP causes high CPU spikes on under-dimensioned DNS servers. The cause is that many scripts issued by this MP run simultaneously on the DNS server. This might drive the CPU to 100%, making the DNS server unavailable.

It is a known issue at Microsoft and they are about to release a new version with these bugs fixed.

For now there is a workaround; look here.

Friday, November 14, 2008

R2 comes!

At TechEd 2008 in Barcelona Microsoft announced the newest release of OpsMgr: OpsMgr R2. Beta will come this month (November 2008), RC Q1 2009 & RTM Q2 2009.

The demos were really spectacular.
From what I've seen I must say R2 is the grown-up version of OpsMgr SP1. Can't wait for the beta release to field-test it! And... pigs DO fly at Redmond....
Even though much has changed or will change in R2, these are the main items:

Native support for monitoring Linux/Unix
Out of the box R2 will support the monitoring of 14 types of Linux/Unix distributions. Monitoring of the seven most used daemons and all services on a non-Windows box is supported.

Performance Enhancements
The engine has been made more intelligent. It shows only the information queried for. So not everything will be loaded, but just those parts needed. The demo video was impressive. Like OpsMgr SP1 on steroids!

Service Level Tracking
In SP1 with a special Management Pack. In R2 out of the box and improved.

Process Monitoring
Ever wanted to know about certain (un)wanted processes running on monitored boxes? R2 provides this kind of monitoring and, last but not least, enables automatic start/closure of these very same processes.

Improved way of importing Management Packs
Directly out of the OpsMgr UI: connecting to the catalog, searching for the needed MPs and importing them directly into OpsMgr. When an underlying MP isn't there, OpsMgr R2 will notify you of it and propose corrective actions.

One-click Alert Subscriptions
Back from MOM 2005: with a right-click one can subscribe to an alert

Power to the WebConsole
Health Explorer and the import of MPs are now available in the WebConsole

Overrides
An easy-to-obtain overview of the applied overrides. The view can easily be customized.

Management Pack Templates
More are available, the ones already present in OpsMgr SP1 are improved. Better Distributed Applications can be built now.

Reporting
No more empty reports because the wrong objects have been selected. With the selection of a report only the related objects will be shown (filtered), so the chance of ending up with an empty report becomes very small.

Monitoring non-Windows applications
3rd parties like Novell & Xandros will deliver high quality MPs for monitoring Oracle, Apache, Samba, Linux based DNS, DHCP.