Monday, January 26, 2015

OM12 Quick Trick: Overview Of Clean & Dirty Server Reboots


Update 01-26-2015
A regular reader of my blog pointed out a BIG mistake in this posting: you can’t use Event Collection Rules to send out Notifications, simply because Notifications REQUIRE Alerts to work. And guess what? Event Collection Rules DON’T create Alerts, so NO Notifications are sent out. Ever! Now that I think about it, I wonder how I could have made such a mistake, as if I were a total newbie to SCOM… Therefore I decided to pull this posting, update it and repost it. A BIG word of thanks to Keith Kleiman, who pointed this out to me.

Why this posting?
Many times I am asked whether SCOM can show which Windows Servers were rebooted, and whether it was a clean restart (e.g. initiated by an administrator or a PowerShell script, planned or unplanned) or a dirty reboot (unexpected, e.g. a power-off or a system failure).

The funny thing is, SCOM already collects that kind of information. However, it isn’t shown by default in the SCOM Console. All you have to do is build two Views and, when required, adjust the Notification Model so those server reboots (only the dirty ones, or the clean ones as well) are sent out by SCOM as an SMS or e-mail message.

Another trick I want to share with you is to build such a View in the ‘My Workspace’ area of the SCOM Console first. There are multiple reasons, but these are the two main ones:

  1. The Views you create here aren’t stored in an MP, unlike all the other Views. So you see the end result much faster, and you can tweak the View until it fits the requirements of your organization without impacting other users;
  2. Whenever you build a View in the Monitoring pane it’s stored in an MP, which takes time and uses CPU on the SCOM Management Servers. Sometimes those CPU resources can’t be spared, especially in really big SCOM environments. I’ve seen SCOM environments having issues with updating MPs, stalling the SCOM Consoles. In SCOM 2012x issues like these are mostly gone, but in SCOM 2007x they happened regularly.

Procedure
Open the SCOM Console with Admin permissions and go to the My Workspace area.

Clean Server Restarts View

  1. Right click on Favorite Views > New > Event View;
  2. Name: Clean Server Restarts. Give it a proper Description > Under the header Select conditions, select the option with a specific event number. Now you’ve got a screen like this one:
    image
  3. Click on the blue link ‘specific’ and you’ll get this screen:
    image
    Enter Event ID 6006 (written to the System log by the EventLog service when it stops during a clean shutdown).
  4. OK > OK. The View is now created and will SOON be available:
    image
  5. However, the Computer Names are missing. Right click on the new View > Properties > tab Display, select the column Logging Computer and move it to the required position using the arrow buttons:
    image
  6. Now the View is ready:
    image
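To make the filtering logic of this View concrete, here is a minimal Python sketch of what it does conceptually: keep only the collected events with Event ID 6006 and show them with the Logging Computer column first. The CollectedEvent record, its field names and the sample computer names are hypothetical; this is not how SCOM stores events internally.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record for a collected event; SCOM's own data model differs.
@dataclass
class CollectedEvent:
    logging_computer: str
    event_id: int
    source: str
    time: datetime


def clean_restarts_view(events, columns=("logging_computer", "event_id", "time")):
    """Return rows for Event ID 6006 only, with the columns in the given order."""
    return [
        tuple(getattr(event, column) for column in columns)
        for event in events
        if event.event_id == 6006
    ]


# Hypothetical sample data.
sample_events = [
    CollectedEvent("srv01.contoso.local", 6006, "EventLog", datetime(2015, 1, 20, 22, 0)),
    CollectedEvent("srv02.contoso.local", 6008, "EventLog", datetime(2015, 1, 21, 3, 15)),
    CollectedEvent("srv03.contoso.local", 7036, "Service Control Manager", datetime(2015, 1, 21, 8, 0)),
]

if __name__ == "__main__":
    for row in clean_restarts_view(sample_events):
        print(row)
```

Moving the Logging Computer column in step 5 corresponds to changing the order of the columns tuple.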

Dirty Server Restarts View

  1. Right click on Favorite Views > New > Event View;
  2. Name: Dirty Server Restarts. Give it a proper Description > Under the header Select conditions, select the option with a specific event number. Now you’ve got a screen like this one:
    image
  3. Click on the blue link ‘specific’ and you’ll get this screen:
    image
    Enter Event ID 6008 (written to the System log at the next startup when the previous shutdown was unexpected).
  4. OK > OK. The View is now created and will SOON be available:
    image
  5. However, the Computer Names are missing. Right click on the new View > Properties > tab Display, select the column Logging Computer and move it to the required position using the arrow buttons:
    image
  6. OK. Now the View is ready:
    image
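Taken together, the two Views split reboot events into two buckets. The Python sketch below is a simplification that assumes events arrive as hypothetical (logging_computer, event_id) pairs; it shows that classification and a per-server count, and ignores all other Event IDs.

```python
from typing import Dict, Iterable, Optional, Tuple

CLEAN_SHUTDOWN_ID = 6006  # EventLog service stopped: clean shutdown/restart
DIRTY_SHUTDOWN_ID = 6008  # previous shutdown was unexpected: dirty reboot


def classify_restart(event_id: int) -> Optional[str]:
    """Map an Event ID to 'clean', 'dirty' or None (not a restart marker)."""
    if event_id == CLEAN_SHUTDOWN_ID:
        return "clean"
    if event_id == DIRTY_SHUTDOWN_ID:
        return "dirty"
    return None


def restart_summary(records: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Count clean and dirty restarts per logging computer."""
    summary: Dict[str, Dict[str, int]] = {}
    for computer, event_id in records:
        kind = classify_restart(event_id)
        if kind is None:
            continue  # unrelated event, skip it
        counts = summary.setdefault(computer, {"clean": 0, "dirty": 0})
        counts[kind] += 1
    return summary


if __name__ == "__main__":
    print(restart_summary([("srv01", 6006), ("srv02", 6008), ("srv02", 7036)]))
```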

When both these Views are okay, you can rebuild them in the Monitoring pane. Remember to create an MP FIRST, and don’t add these new Views to the root of the Monitoring pane, since they’ll end up in the Default MP and that’s a No Go Area!

Okay that’s nice. But how about sending out Notifications?
A bit more work is required here. The reason is that the previously mentioned Rules can’t be used to send out Notifications. Why? Notifications only work for Alerts shown in the Console, but the previously mentioned Rules are EVENT COLLECTION Rules, so they’ll NEVER create an Alert. As a result, no Notification will ever be sent out based on those Event Collection Rules.
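The difference can be modelled in a few lines. The Python sketch below is a conceptual model only, with hypothetical Rule and Pipeline types rather than any SCOM API: a collection rule just stores the event, an alert generating rule raises an Alert, and notifications are built from Alerts alone.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rule:
    name: str
    event_id: int
    generates_alert: bool  # False: event collection rule, True: alert generating rule


@dataclass
class Pipeline:
    collected_events: List[Tuple[str, str]] = field(default_factory=list)
    alerts: List[Tuple[str, str]] = field(default_factory=list)


def process(event_id: int, computer: str, rules: List[Rule], pipeline: Pipeline) -> None:
    """Run one incoming event through every rule that matches its Event ID."""
    for rule in rules:
        if rule.event_id != event_id:
            continue
        if rule.generates_alert:
            pipeline.alerts.append((rule.name, computer))
        else:
            pipeline.collected_events.append((rule.name, computer))  # stored, never alerted


def notifications(pipeline: Pipeline) -> List[str]:
    """Notification subscriptions only look at Alerts, never at collected events."""
    return [f"Notify: {name} on {computer}" for name, computer in pipeline.alerts]
```

Feeding a 6008 event through a collection-only rule leaves notifications(pipeline) empty; only an alert generating rule produces something to notify on.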

This means you have to create two new Alert Generating Rules: one for clean server restarts and another one for dirty server reboots. Disable them by default and enable them for a special Group of servers you want to keep track of, like the production servers. Of course, you can also enable these two Rules for the Groups containing the different Windows Server versions, like Windows Server 2003, 2008x and 2012x. By creating two Rules, disabling them by default and enabling them through Overrides targeted at Groups of your choice, you never have to create these Rules more than once.
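The Override approach boils down to a simple decision per server. This Python sketch assumes a hypothetical group_members mapping and only models Overrides that enable the Rule; SCOM's real override resolution (for example conflicting or enforced Overrides) is not covered.

```python
from typing import Dict, Iterable, Set


def rule_enabled_for(
    computer: str,
    enabled_by_default: bool,
    enabling_groups: Iterable[str],
    group_members: Dict[str, Set[str]],
) -> bool:
    """True if the rule is on by default, or an enabling override targets a group containing the computer."""
    if enabled_by_default:
        return True
    return any(computer in group_members.get(group, set()) for group in enabling_groups)


# Hypothetical group membership ("Production Servers" is an example group).
group_members = {
    "Windows Server 2012 R2 Computer Group": {"srv01", "srv02"},
    "Production Servers": {"srv02", "app07"},
    "Windows Server 2008 Computer Group": {"old01"},
}

# One rule, disabled by default, enabled through overrides on two groups.
enabling_groups = ["Windows Server 2012 R2 Computer Group", "Production Servers"]

if __name__ == "__main__":
    for name in ("srv01", "app07", "old01"):
        print(name, rule_enabled_for(name, False, enabling_groups, group_members))
```

The Rule itself is defined once; covering another set of servers only means adding a Group to the list of enabling Overrides.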

A: Alert Generating Rule for Clean server restarts
What we need is Event ID 6006 from the System event log. This is easy and straightforward as well:

  1. SCOM Console > Authoring > Management Pack Objects > Rules > right click > Create New Rule > Alert Generating Rules > Event Based > NT Event Log (Alert) > Choose a proper Unsealed MP or create a new one (e.g. _Server Reboots) > Next
    image
  2. Give the Rule a proper Name (e.g. Clean Server Restart Alert) and Description. Choose Windows Computer as the Rule target and clear the check box Rule is enabled. I know this doesn’t adhere to the Best Practices of MP authoring, since the target Windows Computer is way too generic, like aiming with buckshot. But in this case we DISABLE the Rule and enable it later on through one or more Overrides aimed at specific Groups containing a special set of Windows Servers. So in this case it’s not that bad :) > Next
    image
  3. Select System as the Log name > Next
    image
  4. Type 6006 as the Value for the Parameter Name Event ID:
    image
    Select the line Event Source by clicking on its left side and click Delete. Now you’ve got this:
    image
    > Next
  5. Give it a proper Alert Name (e.g. A Windows Server Got a Clean Reboot). Remove the entry in the Alert Description field and type Server[space], hit the button with the three dots, select Target > DNS Name and then type: [space] got a clean restart. Now you’ve got something like this:
    image
    > OK. Since it’s a CLEAN server restart, I expect it to be planned, so a Critical Alert is a bit too much. Therefore I’ve set it to be a WARNING Alert with Priority High > Create
    image
  6. The Rule is now being created. When it’s in place, enable it through an Override for a Group containing the servers you want to be alerted about when they get a clean restart. In this example I’ve enabled this Rule for the Group Windows Server 2012 R2 Computer Group:
    image
  7. Let’s test it and gracefully restart a Windows Server 2012 R2 server monitored by SCOM:
    image
    Bingo! So now you can use this newly created Rule (Clean Server Restart Alert) for your Notification Model.
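Steps 4 and 5 can be sketched in Python as well. This is an illustration under assumptions, not SCOM's expression engine: events are hypothetical dicts, make_criteria is a made-up helper, and passing source=None stands for the deleted Event Source line, so any source writing Event ID 6006 to the System log matches. The alert text mirrors the description built with the Target DNS Name.

```python
from typing import Callable, Optional


def make_criteria(event_id: int, source: Optional[str] = None) -> Callable[[dict], bool]:
    """Build a match function; source=None means there is no Event Source criterion."""
    def matches(event: dict) -> bool:
        if event.get("log_name") != "System" or event.get("event_id") != event_id:
            return False
        return source is None or event.get("source") == source
    return matches


clean_restart_criteria = make_criteria(6006)  # Event Source line deleted


def clean_restart_alert(event: dict, target_dns_name: str) -> Optional[dict]:
    """Return the alert the rule would raise, or None when the criteria do not match."""
    if not clean_restart_criteria(event):
        return None
    return {
        "name": "A Windows Server Got a Clean Reboot",
        "description": f"Server {target_dns_name} got a clean restart",
        "severity": "Warning",
        "priority": "High",
    }
```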

B: Alert Generating Rule for Dirty server reboots
In this case we need to create an Alert Generating Rule for Event ID 6008 from the System event log.

The steps for authoring this Rule are almost the same as those described for generating Alerts for clean server restarts, so I’ll only highlight the differences:

  1. No differences;
  2. Name it, e.g. Dirty Server Reboot Alert, and give it a good description, like:
    image
  3. No differences;
  4. Capture Event ID 6008:
    image
  5. Change the Alert Name to A Windows Server Got A Dirty Reboot! and modify the Alert Description so it states that the server got a dirty reboot and that additional checks are required, for example:
    image
    Since it’s a DIRTY reboot I set the Severity to Critical and the Priority to High:
    image
  6. Create the Rule and Override it as desired. In this example I Override it for the Group Windows Server 2012 R2 Computer Group.
  7. Let’s test it and give a monitored WS 2012 R2 server a dirty reboot:
    image
    Bingo! And now you can use this newly created Rule for your Notification Model as well.
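Since the two Rules differ only in a handful of settings, they can be described side by side. The Python sketch below captures those differences as data; the RestartRuleSpec type and the evaluate function are hypothetical, and the dirty-reboot description text is only an example wording.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RestartRuleSpec:
    rule_name: str
    event_id: int
    alert_name: str
    description_suffix: str
    severity: str
    priority: str


CLEAN = RestartRuleSpec(
    "Clean Server Restart Alert", 6006,
    "A Windows Server Got a Clean Reboot", "got a clean restart",
    "Warning", "High",
)
DIRTY = RestartRuleSpec(
    "Dirty Server Reboot Alert", 6008,
    "A Windows Server Got A Dirty Reboot!",
    "got a dirty reboot, additional checks are required",  # example wording
    "Critical", "High",
)


def evaluate(spec: RestartRuleSpec, log_name: str, event_id: int,
             dns_name: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (alert name, description, severity, priority), or None if the event doesn't match."""
    if log_name != "System" or event_id != spec.event_id:
        return None
    return (spec.alert_name, f"Server {dns_name} {spec.description_suffix}",
            spec.severity, spec.priority)
```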

Wow! And how about Reports?
That’s easy as well. Seriously.

  1. Reporting > Microsoft Generic Report Library > Custom Event > Open;
  2. Be patient :) > for From, choose an offset like a month or so (Advanced > Today > Minus > 1 > Month). Now you’ve got something like this:
    image
  3. Add Group > Windows Server Computer Group > Add. Now it looks like this:
    image
  4. Under Report Fields select these items in the SAME order(!):
    - Logging Computer
    - Object
    - Event ID
    - Level
    - Date
    By default the order is wrong. Use the arrow buttons to sort that out. It looks like this now:
    image
  5. Select Report Field Event ID > hit the button Filter > enter Event ID 6008
    image
  6. Run the Report once in order to test it.
    image
  7. When all is okay > File > Publish (or Save to Management Pack; the two options work differently) > give it a proper name like Dirty Server Reboots the last month.
  8. Repeat Steps 1 to 7 for another report about the clean server reboots. Only in Step 5 use Event ID 6006, and in Step 7 use another name like Clean Server Reboots the last month.
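What the Custom Event report returns can be approximated in Python: filter on one Event ID, keep the last month (Today minus 1 month), and output the Report Fields in the order listed in step 4. The dict keys and the minus_one_month helper are assumptions for illustration; the actual report runs against the SCOM Data Warehouse.

```python
import calendar
from datetime import datetime

# Same order as the Report Fields selected in step 4.
REPORT_FIELDS = ("logging_computer", "object", "event_id", "level", "date")


def minus_one_month(d: datetime) -> datetime:
    """Same day one month earlier, clamped to the last day of that month."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)


def custom_event_report(events, event_id, today):
    """Rows for one Event ID within the last month, fields in REPORT_FIELDS order, oldest first."""
    start = minus_one_month(today)
    rows = [
        tuple(event[field] for field in REPORT_FIELDS)
        for event in events
        if event["event_id"] == event_id and start <= event["date"] <= today
    ]
    return sorted(rows, key=lambda row: row[-1])
```

Calling custom_event_report(events, 6008, today) mirrors the dirty-reboot report; passing 6006 gives the clean-reboot variant from step 8.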

Recap
As you can see, it’s quite easy to have SCOM monitor server reboots, whether those are clean restarts or dirty reboots. Some of this monitoring is already available out of the box, whereas for Notifications some additional (very basic and easy!) authoring is required.

4 comments:

Wilson328 said...

I use a PowerShell script with the new-maintenancewindow cmdlet to put my systems in maintenance mode for 15 minutes when it detects a user-initiated reboot (criteria: event ID 1074 and event source USER32). I kick off the PowerShell script with a command channel in my subscription and it works pretty well.

My users never remember to place their systems in MM when bouncing their servers so this method cuts down on a *lot* of noise.

And 15 minutes is just long enough for even the slowest physical servers to finish rebooting.

AdamF said...

Have you ever tried sending these alerts via SMS? I have it working, however I am unable to find a way to pass the server name into the SMS message so I know which server has had the 6008 unexpected shutdown.

AdamF said...

Is it possible to pass the server name into the "Alert name" field so I can see on an SMS message which server has had the 6008 event ?

Anonymous said...

Hi,

Do you have any problems with the HealthService not picking these events up? I've created an alert rule for bluescreen events and dirty shutdowns, but all these are logged before the agent starts (after boot), so SCOM does not pick them up. Writing the event manually works as intended.