Saturday, August 8, 2009

Alert ‘Secure Reference Override’. Is it a bug?

With OpsMgr R2, many new features have been added, and existing features have been updated or made more secure. Two of these are Run As Accounts (RAAs) and Run As Profiles (RAPs). These already existed in OpsMgr RTM/SP1 but have been made more secure. New features for RAAs and RAPs are distribution and targeting.

How this works is neatly explained in the Operations Manager 2007 R2 Security Guide, found here, pages 23 & 24. So I won’t go into too much detail about it.

Last week at a customer’s site, where a new OpsMgr R2 implementation is being done, I bumped into a situation which puzzles me. What happened?

At this site multiple NLB Clusters and Windows 2008 Clusters are to be monitored. So after OpsMgr Agents had been deployed to these servers and all was OK, the related MPs were imported and configured one by one. For this the related RAAs and RAPs were also configured. However, thinking I was being smart, I used the same account for the RAAs: the default Agent Action account, which is a domain account. On the related servers this account was given the appropriate permissions. Since this company needs to be compliant, I chose the option ‘More Secure’ as the distribution mechanism for the RAAs and selected the appropriate servers.

But then all monitored servers started to generate Secure Reference Override Failures, including the ones which aren’t NLB or Windows Clusters at all…

At first I had used the common Agent Action Account for both RAPs, so I thought that was the error. I created a new domain account for each RAA and changed the RAPs accordingly, but these Alerts kept popping up. Then I cleaned the RAPs, deleted the RAAs, recreated them and configured the RAPs again. But the same Alerts from the non-NLB and non-Cluster servers came back.

I noticed one thing: the RAP for Clustering (Windows Cluster Action Account) doesn’t come from the Windows Cluster MP, but is present by default in a freshly installed OpsMgr. So I added the NLB servers here as well, since these servers are a cluster too.

It only solved the problem a little bit: the other servers (none of them NLB or Windows Cluster) still generate these Alerts and the Windows Cluster servers still generate the Alert because of the NLB RAP.

Then I checked the TechNet OpsMgr Forum. It seems that others are experiencing the same problem with the Dynamics MP.

When checking the OpsMgr event log on these non-NLB and non-Cluster servers, I first saw EventID 1108 popping up (triggering the Secure Reference Override Alert), immediately followed by EventID 1102, stating that an NLB discovery script couldn’t run:

Source:        HealthService
Date:          8/8/2009 11:54:07 AM
Event ID:      1102
Task Category: Health Service
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MS02.systemcenter.org
Description:
Rule/Monitor "Microsoft.Windows.NetworkLoadBalancing.2008.Cluster.Discovery" running for instance "MS02.systemcenter.org" with id:"{512D160A-04FB-EFB3-B587-D2D0B172449D}" cannot be initialized and will not be loaded. Management group "OpsMgr R2 RTM"
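
When many servers log this event, it helps to pull the workflow name and the instance out of the description text to see which discoveries fail where. Below is a minimal Python sketch that does this for the description quoted above. The regular expression is based only on this single sample, and the function name `parse_event_1102` is made up for illustration, so other wordings of the event may not match.

```python
import re

# Pattern derived from the single event 1102 description quoted above (an assumption:
# other variants of this event may be worded differently).
EVENT_1102 = re.compile(
    r'Rule/Monitor "(?P<workflow>[^"]+)" running for instance "(?P<instance>[^"]+)" '
    r'with id:"(?P<instance_id>\{[0-9A-Fa-f-]+\})" cannot be initialized'
)


def parse_event_1102(description):
    """Return workflow, instance and instance id as a dict, or None if the text does not match."""
    match = EVENT_1102.search(description)
    return match.groupdict() if match else None


# The description text from the event shown above.
sample = (
    'Rule/Monitor "Microsoft.Windows.NetworkLoadBalancing.2008.Cluster.Discovery" '
    'running for instance "MS02.systemcenter.org" with '
    'id:"{512D160A-04FB-EFB3-B587-D2D0B172449D}" cannot be initialized and will not be loaded. '
    'Management group "OpsMgr R2 RTM"'
)

if __name__ == "__main__":
    print(parse_event_1102(sample))
```

Grouping the parsed results per workflow would show quickly whether only the NLB discovery is affected or other workflows as well.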

Apparently this script runs under the same RAA on every server, but since that RAA isn’t distributed to these servers, it can’t run there?
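
To make this suspicion concrete, here is a small Python sketch of the idea. It is only a model of my hypothesis, not of how OpsMgr works internally: a workflow is loaded on a set of servers, the RAA is distributed to a (possibly smaller) set of servers, and every server in the first set but not in the second would fail. The function name and the server names NLB01 and NLB02 are hypothetical; MS02 is the server from the event above.

```python
def servers_missing_credentials(workflow_targets, distributed_to):
    """Return the servers that load the workflow but never received the Run As Account.

    workflow_targets: servers on which the workflow is loaded.
    distributed_to:   servers the RAA was distributed to ('More Secure'),
                      or None when it goes to every agent ('Less Secure').
    This is a simplified model of the hypothesis, not OpsMgr's actual logic.
    """
    if distributed_to is None:
        # 'Less Secure': every agent has the credentials, so nothing is missing.
        return set()
    return set(workflow_targets) - set(distributed_to)


# Hypothetical example: the NLB discovery is loaded on all three servers,
# but the RAA is only distributed to the two NLB nodes.
targets = {"NLB01", "NLB02", "MS02.systemcenter.org"}
more_secure = {"NLB01", "NLB02"}

if __name__ == "__main__":
    print(servers_missing_credentials(targets, more_secure))
    print(servers_missing_credentials(targets, None))
```

In this model the non-NLB server is the only one left without credentials under ‘More Secure’, and nothing is left over under ‘Less Secure’, which matches what I see in the Alerts.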

The SQL MP, however, has no problem with this setting (More Secure). So is it an MP-related issue? Must the MP be built in such a manner that it supports these new features? I don’t know. But the NLB and Cluster MPs force me into using the ‘Less Secure’ option, where the RAA is distributed to all monitored servers…

Anyone out there experiencing the same issues with these MPs? Or do other MPs have these issues as well? Or did you solve it? Hope to hear from you.

I replayed it in one of my test environments and was able to reproduce this behavior. So these issues are not related to a specific Management Group.

Next week I’ll provide feedback to Microsoft about it through the Connect website.
