Okay, SCOM 2007 had some serious issues with network monitoring. So in SCOM 2012 this component got a complete overhaul and was rewritten from the ground up. And indeed, network monitoring in SCOM 2012 has improved compared to SCOM 2007. But to say it has really become top-notch is a bit too much.
No, SCOM 2012 won’t replace the purebred network monitoring tools. But guess what? Those tools will never replace SCOM 2012 either. Ever. No matter what the marketing departments of those very same vendors want you to believe.
But when the network monitoring part of SCOM 2012 is put into perspective (SCOM 2012 monitors tons of workloads, whether on-premises, cloud-based or on mobile units, and from different angles, inside and outside) it’s okay. It has become an integrated part of the famous 360-degree monitoring. And for once I agree with the marketing team of Microsoft, because on this topic they tell the truth without any overestimation.
And now what?!
However, some things seem not to change and can still cause some strange issues. Suppose you have a brand new SCOM 2012 R2 RTM environment in place and everything is by the book. Many servers (Windows & Unix) are monitored, and many different kinds of workloads are running on those very same servers. And yes, many important network devices are being monitored as well.
And now one of those important monitored network devices goes down. In this case there were other monitoring solutions in place as well, and they triggered the alarms. However, SCOM, which was monitoring that network device too, stayed quiet. And not for a few minutes but for a long, long time. And it reported the network device to be HEALTHY!
Time to investigate
This really puzzled me, so it was time for a deep dive into the way SCOM monitors network devices and alerts upon them. I agree, noise is bad, but not alerting when something is really amiss is even worse!
In Health Explorer of any given monitored network device you’ll find these two Unit Monitors:
- ICMP Ping
- SNMP Ping
So far so good. Both Unit Monitors are targeted against the Class Node, which is basically any monitored network device. However, per Unit Monitor there is an override in place which disables it.
The ICMP Ping Unit Monitor is disabled when the network device is covered by SNMP only, and the SNMP Ping Unit Monitor is disabled when the network device is covered by ICMP only. And this makes perfect sense.
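To make the override logic concrete, here is a minimal Python sketch of it. The function name and the mode labels ("icmp", "snmp", "both") are made up for illustration and are not SCOM identifiers; the sketch only mirrors the rule described above.

```python
def enabled_ping_monitors(access_mode):
    """Return the ping Unit Monitors that stay enabled for a device.

    access_mode is a hypothetical label for how the device is covered:
    "icmp" (ICMP only), "snmp" (SNMP only) or "both".
    """
    monitors = {"ICMP Ping", "SNMP Ping"}
    if access_mode == "snmp":
        # Device is covered by SNMP only: the override disables ICMP Ping.
        monitors.discard("ICMP Ping")
    elif access_mode == "icmp":
        # Device is covered by ICMP only: the override disables SNMP Ping.
        monitors.discard("SNMP Ping")
    elif access_mode != "both":
        raise ValueError("unknown access mode: %r" % access_mode)
    return monitors


for mode in ("icmp", "snmp", "both"):
    print(mode, "->", sorted(enabled_ping_monitors(mode)))
```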
But the configuration of those Unit Monitors really puzzled me.
The options Interval and Number of Samples are the most important here. First of all, the Interval on the SNMP Ping Unit Monitor isn’t 240 seconds in SCOM 2012 R2 but 300 seconds, which is 5 minutes. The Number of Samples is indeed set to three. This basically means any given monitored network device can be down for 15 minutes (3 samples x 300 seconds) before SCOM 2012 R2 triggers an Alert!
On top of that, when a network device goes down, I want it to be a Critical Alert, not a Warning. But since this Unit Monitor (and the ICMP Ping Unit Monitor) rolls up to a Dependency Monitor, which also triggers the Alert, this kind of modification shouldn’t be done at the Unit Monitor level.
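The arithmetic behind that number is simple enough to put in a few lines of Python. This is only a rough sketch that assumes the state changes after Number of Samples consecutive failed polls, one per Interval; the function name is mine, not something from SCOM.

```python
def detection_delay_seconds(interval_seconds, samples):
    """Approximate time a device can be down before the monitor changes state.

    Assumption: the monitor needs `samples` consecutive failed polls,
    one poll every `interval_seconds`.
    """
    return interval_seconds * samples


# Default SNMP Ping values mentioned in this post: 300 seconds, 3 samples.
delay = detection_delay_seconds(300, 3)
print(delay, "seconds =", delay / 60, "minutes")  # 900 seconds = 15.0 minutes
```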
So for the Unit Monitor SNMP Ping I set these two overrides:
- Interval: from 300 seconds to 30 seconds;
- Number of Samples: from 3 to 2.
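A small Python sketch shows what these two overrides do to the detection time. The dictionary keys and helper names are hypothetical and only model the idea of an override replacing a default; the calculation assumes one poll per Interval and a state change after Number of Samples failed polls.

```python
# Default SNMP Ping values and the two overrides from the list above.
DEFAULTS = {"interval_seconds": 300, "samples": 3}
OVERRIDES = {"interval_seconds": 30, "samples": 2}


def effective_settings(defaults, overrides):
    """Overrides win over defaults; untouched settings keep their default."""
    return {**defaults, **overrides}


def detection_delay_seconds(settings):
    """Approximate delay: one poll per interval, `samples` failed polls needed."""
    return settings["interval_seconds"] * settings["samples"]


before = detection_delay_seconds(DEFAULTS)
after = detection_delay_seconds(effective_settings(DEFAULTS, OVERRIDES))
print("before:", before, "seconds")  # before: 900 seconds
print("after:", after, "seconds")    # after: 60 seconds
```

So with the overrides in place the SNMP Ping Unit Monitor should change State after roughly one minute instead of fifteen.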
Time to take a look at the second Unit Monitor, ICMP Ping.
So this Unit Monitor changes State after 6 minutes (Interval of 120 seconds x Number of Samples of 3), which is still too long. Also, a Warning State is generated, not a Critical one…
Time to move on to the Dependency Monitor, Network Device Responsiveness, since I want a Critical Alert with Priority High (for the Notifications, which send out only New Alerts that are Critical and have Priority High).
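Why Severity and Priority matter here becomes clear when you picture the Notification criteria as a filter. The Python sketch below is purely illustrative: the `Alert` class, its fields and the sample alerts are hypothetical and do not represent the SCOM object model.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical, simplified alert record for illustration only.
    name: str
    severity: str          # e.g. "Critical" or "Warning"
    priority: str          # e.g. "High", "Medium" or "Low"
    resolution_state: str  # e.g. "New" or "Closed"


def should_notify(alert):
    """Mirror the criteria above: only New, Critical, High-priority Alerts."""
    return (
        alert.resolution_state == "New"
        and alert.severity == "Critical"
        and alert.priority == "High"
    )


alerts = [
    Alert("Device down (default settings)", "Warning", "Medium", "New"),
    Alert("Device down (after overrides)", "Critical", "High", "New"),
    Alert("Old device alert", "Critical", "High", "Closed"),
]

to_send = [a.name for a in alerts if should_notify(a)]
print(to_send)  # ['Device down (after overrides)']
```

An Alert raised as a Warning never passes such a filter, which is why a device going down with the default settings would not reach anyone by Notification.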
Time to test it
Next, a new test network device was added to SCOM to be monitored. Once SCOM was monitoring it, the network cable was unplugged.
And YES! After a minute SCOM raised a Critical Alert with priority High. This Alert was neatly pushed out by the Notification Model as well. Awesome!
When you’re running SCOM 2012 R2 and are monitoring network devices, check the settings of the Monitors and make sure they match the requirements of your organization. Chances are you have to make some modifications.