Tuesday, March 15, 2011

Notification partially working and nagging EventID 4503 in the OpsMgr Eventlog of the RMS?

Wow! Got this situation at a customer’s site. It took me a long, long time to crack it, but I solved it nonetheless.

What was the deal?

Environment

  • +6 Channels were in place (mail, SMS and the lot);
  • +30 Subscribers;
  • +30 Subscriptions.

Symptoms
Some people were receiving Notifications but others weren’t. Strangest thing was that there seemed to be no logic behind it at all. So some Alerts tied to Channel A didn’t get out, but other Alerts, using the same Channel, did get out.

Meanwhile, the OpsMgr Event log of the RMS showed EventID 4503 every time an Alert failed to get out.

Things I tried but all to no avail
I tried many things, like opening all Subscriptions, cycling through all screens and saving them one by one. I opened all Subscribers looking for errors, found some and corrected them, but EventID 4503 kept on nagging me… I checked all Channels as well but found nothing wrong there either.

Through Google I found some blog postings about the same EventID, but in those cases the Notification Model had come to a complete halt and the cause was totally different. So I had another, as yet undefined, cause to deal with, and those postings did not help in my particular case.

So it was time for some drastic measures:

  • I exported the MP containing the Notification Model (Channels, Subscribers and Subscriptions) which is the Notifications Internal Library MP. This MP is unsealed and every change in the Notification Model is put into this MP.
  • Ran it through XML Notepad 2007, checked every entry but found nothing strange there as well. It was a hideous task since it’s just bare xml code…
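Checking an exported MP entry by entry can also be scripted. The following is only a minimal sketch: it lists every element in an XML document whose text contains a `$Data/…$` style placeholder. The `find_placeholders` function and the sample fragment are made up for illustration and do NOT reflect the real schema of the Notifications Internal Library MP; against a real export you would read the file contents and pass them in instead.

```python
import xml.etree.ElementTree as ET


def find_placeholders(xml_text, needle="$Data/"):
    """Return (tag, text) pairs for every element whose text contains needle."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if needle in text:
            hits.append((elem.tag, text))
    return hits


# Hypothetical fragment for illustration only; NOT the real MP layout.
SAMPLE = """
<ManagementPack>
  <Channel name="SMTP A">
    <Subject>Alert: $Data/Context/DataItem/AlertDescription$</Subject>
    <Body>See the console for details</Body>
  </Channel>
  <Channel name="SMTP B">
    <Subject>Plain subject</Subject>
  </Channel>
</ManagementPack>
"""

if __name__ == "__main__":
    for tag, text in find_placeholders(SAMPLE):
        print(tag, "->", text)
```

A listing like this does not say which entry is wrong, but it narrows a wall of bare XML down to the handful of places where variable content gets substituted.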

Then I got even more drastic, since it really made me a ‘bit’ frustrated. It looked like SCOM was getting ahead of me, something I do NOT like…

  • Documented every single Channel, Subscriber and Subscription;
  • Exported the MP (again) to be on the safe side;
  • Removed ALL Subscriptions, Subscribers and Channels;
  • Recreated ONE SMTP Channel (as documented), ONE Subscriber and ONE Subscription;
  • Test-fired an Alert that was covered by the Notification Model;
  • And… BANG! EventID 4503.

Lost a battle or two BUT got WISER nonetheless
Even though all seemed to be lost, the last time this event happened made me a lot wiser. Why? There was no more ‘noise’ present in the MP containing the Notification Model. So the actual cause was still there but easier to pinpoint. So it was time for some reasoning:

  1. The Subscription was as default as it could get, By The Book. So no issues there;
  2. The Rule triggering the Alerts wasn’t the cause either. A simple Rule, no Mumbo-Jumbo there either;
  3. The Subscriber was also just fine. Nothing fancy or any other magic happening there.
  4. What was left, was the CHANNEL!  So somehow the Channel was causing all this mischief. But why? And most important: from WHERE?

Let’s ZOOM in on the CHANNEL
The Channel was pretty straightforward as well, using two Exchange servers. I talked to the Exchange guys/girls and they told me everything was just fine there. However, I have learned my lessons the hard way about presuming. So time to test it as well.

I downloaded the Notifications Test Tool and used the configured SMTP Channel to send an Alert to my internal e-mail account. Which worked! Now one could say this is not good, since the Notification Channel turned out just fine. But the thing is, a Channel consists of many components, each of which might break or cause issues. So all this tool told me was that the BASIC functionality of the Channel was OK, like the configured Exchange servers. But how about the deeper configuration settings of this Channel?

Especially these configurable items of any SMTP Channel may cause some serious grief:

  1. E-mail subject;
  2. E-mail message;
  3. Encoding.

Configurable Item 3 wasn’t at play here since that is all about garbled mail messages. And here, the issues occurred before a mail message was even generated, so configurable Items 1 or 2 (or both) were the ones causing all this havoc.

How the Final Battle was won, thus ending the war with SCOM
Time to zoom into Items 1 & 2 as stated above. This is what I did:

  1. Created another SMTP Channel, but now with the default configuration for the E-mail Subject and E-mail Message configurable items;
  2. Retargeted the Subscription so it used the new SMTP Channel;
  3. Removed the other SMTP Channel so no noise was present any more;
  4. Test-fired an Alert that was covered by the Notification Model;
  5. And… YES! A Notification was sent out and received by me!

Time for some experiments. Quickly it turned out the configurable item E-mail Message wasn’t any problem. When I copied the formula from the ‘problematic’ SMTP Channel into the working one, it kept on working. However, when I copied the formula from the E-mail Subject from the ‘problematic’ SMTP Channel into the working one, EventID 4503 was back again!
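The experiment above boils down to copying one setting at a time from the broken Channel into the working one and retesting. A minimal sketch of that idea follows. Everything in it is hypothetical: the dictionaries stand in for Channel settings, and `fake_sends_ok` is a stand-in for actually test-firing an Alert and checking for EventID 4503, which in reality was done by hand in the console.

```python
def find_breaking_fields(working, suspect, sends_ok):
    """Copy each differing field from suspect into working, one at a time.

    Returns the names of the fields that make sends_ok() fail on their own.
    """
    breaking = []
    for field, value in suspect.items():
        if working.get(field) == value:
            continue  # identical setting, nothing to test
        trial = dict(working)  # start from the known-good configuration
        trial[field] = value   # change exactly one thing
        if not sends_ok(trial):
            breaking.append(field)
    return breaking


def fake_sends_ok(channel):
    # Stand-in for a real test notification (assumption for this sketch).
    return "AlertDescription" not in channel["subject"]


working = {"subject": "Alert raised", "message": "Details in console"}
suspect = {
    "subject": "Alert: $Data/Context/DataItem/AlertDescription$",
    "message": "Custom message text",
}

if __name__ == "__main__":
    print(find_breaking_fields(working, suspect, fake_sends_ok))
```

Changing one setting per test run is what makes the result conclusive: when the failure comes back, there is only one candidate to blame.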

Let’s crush the ENEMY!
Finally! The item causing the grief was found. But WHY? When I compared the ‘problematic’ SMTP Channel with the working SMTP Channel there was only ONE difference: the E-mail subject of the ‘problematic’ Channel contained the header $Data/Context/DataItem/AlertDescription$, which is actually the Description of the Alert as shown in the SCOM R2 Console.

Depending on the Alert the Description might contain some strange characters, or can get as lengthy as a short novel. And strange characters or lengthy subject headers might cause issues with mail messages…
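To make ‘strange characters’ and ‘lengthy’ a bit more concrete, here is a minimal sketch that flags text which is risky to use as a mail subject. It is not what SCOM does internally, and I never established which of these properties actually triggered EventID 4503. The function name `subject_risks` is made up; the limits of 78 (recommended) and 998 (maximum) characters per line come from the Internet message format standard (RFC 5322), and the length check ignores the `Subject: ` prefix, so it is only an approximation.

```python
def subject_risks(subject, soft_limit=78, hard_limit=998):
    """Return a list of reasons why this text is risky as a mail subject."""
    risks = []
    if "\r" in subject or "\n" in subject:
        # A raw line break would end the header line prematurely.
        risks.append("line break")
    if any(ord(ch) < 32 and ch not in "\r\n\t" for ch in subject):
        risks.append("control character")
    if any(ord(ch) > 126 for ch in subject):
        # Non-ASCII text must be encoded before it can go into a header.
        risks.append("non-ASCII character")
    if len(subject) > hard_limit:
        risks.append("exceeds hard limit")
    elif len(subject) > soft_limit:
        risks.append("exceeds soft limit")
    return risks


if __name__ == "__main__":
    print(subject_risks("Disk space low"))
    print(subject_risks("First line\r\nSecond line"))
    print(subject_risks("x" * 100))
```

An Alert Description that spans several lines or runs on for paragraphs would trip at least one of these checks, whereas a short Alert name typically would not.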

Time for another experiment:

  1. Recreated an SMTP Channel but without the header $Data/Context/DataItem/AlertDescription$ in configurable item E-mail subject;
  2. Test-fired an Alert that was covered by the Notification Model;
  3. And… YES! A Notification was sent out and received by me!!!

Great! I pinpointed not only the item where it went wrong (the SMTP Channel) but, more importantly, the REAL cause: the header $Data/Context/DataItem/AlertDescription$. Nice!

Time to party!
I imported the export file of the Notifications MP, containing ALL Channels, Subscribers and Subscriptions. Adjusted the Channels based on my findings (removing the header $Data/Context/DataItem/AlertDescription$ in configurable item E-mail subject) and all was well now. All Alerts meeting the Notification criteria were sent out. No more EventID 4503 in the OpsMgr Event log to be found again. Nice!

So whenever you bump into this nagging Event and the earlier mentioned blog postings do not match your case, try this approach by removing the header $Data/Context/DataItem/AlertDescription$ in configurable item E-mail subject of the SMTP Channels and you should be fine again.

And not just that, my customer wanted to document its Notification Model in more detail. Not needed any more since I already had done so while solving this issue! So this customer was happy twice: the Notification Model was fully operational AND thoroughly documented.
