Issue
A customer of mine, who has one of his SCOM environments connected to OMS, reported to me that the Alert ‘Operations Manager Failed to Access the Windows Event Log’ was coming in for many SCOM managed servers, but not all of them. They noticed that every one of these Alerts was about an attempt to access a non-existent event log: ATA?
Time to investigate
As it turned out, the Alert about not being able to access the ATA event log only occurred on a subset of the SCOM managed servers. As stated before, this particular SCOM MG is connected to OMS, and only a Group of its computers is managed by OMS. The Alert pops up for exactly those servers.
The non-existent event log, ATA, belongs to Microsoft Advanced Threat Analytics. The specific Rule causing this Alert is Microsoft.SystemCenter.CollectATAEvents.
This Rule comes from the MP Microsoft System Center Advisor Advanced Threat Analytics.
What surprises me here is the targeting of the Rule. One of the basics MP authors are taught (even though I am not an MP author, I am familiar with the foundation and the rules) is NOT to use the Windows Computer Class as a target, simply because it’s too broad. It’s like using buckshot instead of a well-aimed bullet…
And yet, this Rule is like buckshot: its target is the Windows Computer Class.
Ouch!
And even though this Rule is disabled by default, it’s enabled for the Group Microsoft System Center Advisor Monitoring Server Group.
This Group is populated with all the SCOM managed servers that are also connected to OMS. None of those servers has a Microsoft ATA event log, yet the Rule tries to read from it.
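To make the pattern concrete, here is a minimal Python sketch of the logic as I understand it: the Rule runs wherever it is enabled, and the Alert fires wherever the log it wants to read is missing. The server names are made up, and this is only a toy model of the behaviour, not how SCOM evaluates it internally.

```python
# Toy model of why only some servers raise the Alert.
# All server names are hypothetical and only serve as an illustration.

# SCOM managed servers that are also connected to OMS
# (members of the Microsoft System Center Advisor Monitoring Server Group).
advisor_group = {"srv-app01", "srv-app02", "srv-db01"}

# SCOM managed servers that are NOT connected to OMS.
scom_only = {"srv-file01", "srv-web01"}

# Servers that actually have an ATA event log: none in this environment.
servers_with_ata_log = set()


def servers_raising_alert(rule_enabled_for, servers_with_log):
    """Return the servers where the Rule runs but the log it reads is missing."""
    return sorted(rule_enabled_for - servers_with_log)


alerting = servers_raising_alert(advisor_group, servers_with_ata_log)
print(alerting)
```

In this model every member of the Group ends up in the result, while the servers outside the Group never run the Rule and stay quiet, which matches what the customer saw.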
But when looking deeper into this Rule, it gets even weirder: the Rule doesn’t contain any filters at all.
Wow. When an ATA log is present, this basically means EVERY ATA event is uploaded to OMS. How much data is that? Consider this running on hundreds of servers…
So now we have the culprit and the cause. Time to solve it.
Workaround
Since this is a badly written Rule and we don’t have access to the source code, we need a workaround, which is nothing more than an Override that disables it.
In this case I set an Override (Disable) for the Group Windows Server Computer Group and also ENFORCED that Override, in order to be 100% sure it’s effective.
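Why enforce it? The Rule is already enabled for another Group, so two Group-level settings now contradict each other. The Python sketch below is a deliberately simplified model of that situation, assuming that an enforced Override beats a non-enforced one. The `Override` class and the `effective_enabled` function are hypothetical names of my own, not part of any SCOM API, and SCOM’s real precedence logic has more levels than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Override:
    """Hypothetical stand-in for a Group-targeted Override on the Rule's Enabled property."""
    target_group: str
    enabled: bool
    enforced: bool = False


def effective_enabled(default_enabled, overrides, memberships):
    """Simplified model: an enforced Override wins over non-enforced ones.

    Ties between Overrides of equal standing are not modelled;
    the first one listed simply wins here.
    """
    applicable = [o for o in overrides if o.target_group in memberships]
    if not applicable:
        return default_enabled
    winner = max(applicable, key=lambda o: o.enforced)
    return winner.enabled


overrides = [
    # The existing setting that enables the Rule for the OMS connected servers.
    Override("Microsoft System Center Advisor Monitoring Server Group", enabled=True),
    # The workaround: disable the Rule and enforce that choice.
    Override("Windows Server Computer Group", enabled=False, enforced=True),
]

# A server that is a member of both Groups.
memberships = {
    "Microsoft System Center Advisor Monitoring Server Group",
    "Windows Server Computer Group",
}

rule_enabled = effective_enabled(False, overrides, memberships)
print(rule_enabled)
```

In this model the Rule ends up disabled for a server that sits in both Groups, because the enforced Disable Override outranks the Override that enabled it.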
Case closed.
1 comment:
If you monitor additional logs (e.g. IIS logs), all other servers monitored by SCOM and OMS display the same error if they don’t have that log. A very annoying issue with OMS.