Wednesday, April 23, 2014

Update Rollup 2 For SCOM 2012 R2 Is Out!

Just now Microsoft released Update Rollup 2 for SCOM 2012 R2, to be found in KB2929891.

This UR#2 fixes 9 issues:

  1. Issue 1
    This update rollup makes the stored procedure performance aggregate more robust against out-of-range values.
  2. Issue 2
    Adding multiple regular expressions (RegEx) to a group definition causes an SQL exception when the group is added or run.
  3. Issue 3
    Web applications fail when they are monitored by the System Center Operations Manager 2012 R2 APM agent.
  4. Issue 4
    Service Level Objectives (SLO) dashboards sometimes load in several seconds and sometimes take minutes to load. Additionally, the dashboard is empty after it loads in some cases.
  5. Issue 5
    Operations Manager Console crashes when you try to override the scope in the Authoring pane.
  6. Issue 6
    The System Center Operations Manager console is slow to load views if you are a member of a custom Operator role.
  7. Issue 7
    This update rollup includes a fix for the dashboard issue that was introduced in Update Rollup 1.
  8. Issue 8
    SQL Time Out Exceptions for State data (31552 events) occur when you create Data Warehouse workflows.
  9. Issue 9
    This update rollup includes a fix for the Event Data source.

Good to know: this UR#2 requires additional manual actions, just like UR#1 for SCOM 2012 R2. So my advice is NOT to use the automated update feature but to roll out this UR#2 manually, on a schedule which suits you and your organization best.

This update also contains a fix for the data warehouse BULK insert command timeout issue. This fix consists of a registry key which must be added manually on every SCOM 2012 R2 UR#2 MS server on which you wish to override the default bulk insert command timeout. The procedure for adding this registry key is described in the same KB article.

Again, just like my advice for UR#1 for SCOM 2012 R2: wait before rolling out this UR in your production environment. Test it first in your test environment if you’ve got one. And if you don’t, wait a while before applying this UR as well. It wouldn’t be the first time some nasty side effects got through QC at Microsoft, resulting in broken environments.

Tuesday, April 22, 2014

Oops! I’ve Restored My SCOM 2012x Environment But Forgot All About The Data Warehouse DB…

Situation
Whenever your SCOM 2012x environment gets hosed AND the only way to resolve it is to restore the backup of your SCOM 2012x databases, people tend to forget the Data Warehouse database. Or don’t want to restore it since it’s way too big, sometimes over 1 TB.

And many times people tend to think the Data Warehouse database can be skipped since ‘it all happens in the Operations Manager database whereas the Data Warehouse database is only used for Reporting’.

Issue(s)
However, now new issues arise, since the connection between BOTH SCOM 2012x databases is far more complex than that. When one restores ONLY the Operational database and NOT the Data Warehouse as well, this connection is out of sync, causing all kinds of unwanted issues and resulting in bad reporting (among other things).

Oops! I did it (again)! Now what?
Luckily the OpsMgr Engineering Team posted an excellent article about how to solve these particular issues. So far I haven’t tested it myself.

However, these people know their trade so I am very happy with this additional knowledge, enabling me to resolve issues like these WITHOUT running another restore, and this time of BOTH SCOM 2012x databases.

Go here and check it out yourself.

Free Ebook: Microsoft System Center: Integrated Cloud Platform

A few days ago Microsoft released a new FREE ebook all about Microsoft System Center: Integrated Cloud Platform.

This book is targeted toward IT executives and architects interested in the big picture of how Microsoft’s cloud strategy is delivered using Windows and Microsoft System Center.

Microsoft provides an all-encompassing approach to understanding and architecting Windows Server 2012 R2, System Center 2012 R2, and Windows Azure based solutions for infrastructure as a service.

The combination of Windows, System Center, and Windows Azure is a cloud-integrated platform, delivering what Microsoft calls the “Cloud OS,” which is a common platform spanning private cloud, public cloud (Windows Azure), and service provider clouds. This platform enables a single virtualization, identity, data, management, and development platform across all three cloud types.

This FREE ebook is available in these formats: PDF, Mobi and ePub. It can be downloaded from here.

Tuesday, April 15, 2014

Troubleshooting Flow For Slow SCOM 2012x Consoles

When the SCOM 2012x Consoles (both the UI and the web based one) are slow, it’s a challenge to find the cause. Many times it’s not limited to a single isolated cause; instead many things are at play, working in concert to create a slow SCOM 2012x Console experience.

This is a bad thing since end users will stop using SCOM because ‘it’s slow’ or, even worse, ‘unresponsive’. People start to think SCOM is a bad product and turn away from it. When this happens it’s quite a challenge to address, and the technical aspect is the easiest part. Convincing the end users to start using SCOM again is a whole different story altogether.

Therefore I’ve written this posting in order to help you to get SCOM back on track again, and even better to prevent this from happening. There is much to tell so let’s start.

The foundation: Compute, Storage & Networking
Many times (dead) slow SCOM Consoles are the result of a whole long chain of issues. Therefore it’s better to start at the beginning of it all, the three pillars of your data center/cloud based solution: Compute, Networking and Storage.

Compute
Yes, virtualization is everywhere and has become the norm. 99% of the SCOM installations I bump into are virtualized, which is totally understandable and even the default way of operating. It is not an issue in itself NOR a cause for slow SCOM Consoles.

And yet, overcommitted hosts, badly configured hosts, or hosts running old and outdated technologies are many times the culprit for bad performance. Sure, SCOM isn’t a production critical system. But that doesn’t mean it should be put on second-rate hosts or even worse.

At least you want your monitoring/management solution to be on par with your production environment. How else is it going to be used to measure and manage it? It simply won’t.

So make sure the SCOM environment is running on good virtualization hosts, of the kind which would be used for the production environment as well. Also make sure these hosts aren’t overcommitted and use similar configuration settings as the hosts for the production environment. A VM is a VM, and whether it’s production or not, the same configuration rules are at play here.

When running physical SCOM servers, make sure those servers are using modern technologies and not technologies which were modern 5 or more years ago. Again, SCOM isn’t production and yet it needs to be taken seriously, thus installed on serious, current hardware and not on the leftovers which would otherwise be ditched or put into digital playgrounds. Hardware like that isn’t meant to run SCOM.

Storage
This is a nice one. Storage configured for maximized capacity measured in TBs IS NOT the type of storage you want to put your monitoring/management solution on. Ever! This simply will bring down the best applications no matter how good you tune them. Simply stay away from it.

Many times I see iSCSI based SANs being used. And many times these types of SANs perform well, as long as there are no shortcuts and the disks running the SCOM databases are configured properly. So always make sure these disks use dedicated LANs for iSCSI traffic, so they can get all the resources they need. This requires additional configuration on the virtualization side of things, but believe me, it’s worth the effort!

Networking
Network connectivity between the SCOM Management Servers and all related SQL servers has to be spot on. So placing one or more SCOM Management Servers on other LAN segments, or even on the other side of a WAN connection, is a no-go. Of course there are exceptions, like remote locations connected by dark fiber, but even in those situations you must be a full 100% sure the latency is really low.

Otherwise the availability of your SCOM 2012x MG functions and Resource Groups will take a serious beating, resulting in an unstable SCOM environment.
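One rough way to get a feel for that latency is to time a few TCP connections from a Management Server to the SQL server. The sketch below is only an illustration and not a replacement for proper network monitoring: the function names are made up for this example, and the port to test (1433 is the SQL Server default, but your instance may listen elsewhere) is an assumption you have to verify yourself.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, attempts=5, timeout=2.0):
    """Time several TCP connects to host:port and return the samples in ms.

    A TCP connect takes roughly one network round trip, so this gives a
    coarse indication of latency between this machine and the target.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection raises OSError when the target is unreachable
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples.append(elapsed_ms)
    return samples


def summarize(samples):
    """Reduce a list of latency samples to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Run from a Management Server, a call like `summarize(tcp_connect_latency_ms("sql01.example.local", 1433))` (the host name is hypothetical) returns the three figures in milliseconds. Compare the result for a remote Management Server with one sitting next to the SQL server to see how much the link adds.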

Same goes for the related SQL servers. Make sure they’re connected properly together with the SCOM 2012x Management Servers. And when using a separate dedicated SQL Server Reporting Services (SSRS) instance, make sure to apply the same connectivity rules as well. Otherwise SCOM Reporting won’t deliver the expected performance either.

The SQL servers
Paul Keely has written excellent documents all about using SQL for System Center technologies. These documents contain tons of good information, so use it. At least read it and take notice of it. It will help you to design proper SQL servers for your SCOM 2012x environment.

Some good tips and tricks
SQL has the nasty habit to take away ALL available RAM. For SQL server this seems to be okay, but the underlying Server OS might get starved. Which is bad for SQL as well since it runs on top of the OS…

Therefore when provisioning SQL servers, reserve at least 4 GB of RAM for the Server OS itself. This will prevent the Server OS from starving, enabling SQL to run better.
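The arithmetic behind this tip is simple, but it helps to write it down. The sketch below works out a candidate memory cap for SQL Server in MB, assuming the 4 GB reservation mentioned above. The function name is made up for this example, and whether you apply the result through the SQL Server ‘max server memory (MB)’ setting or in some other way is up to you and your DBA.

```python
def sql_max_server_memory_mb(total_ram_gb, os_reserve_gb=4.0):
    """Return a candidate memory cap for SQL Server, in MB.

    total_ram_gb  : RAM assigned to the (virtual) server, in GB
    os_reserve_gb : RAM left for the Server OS (4 GB per the tip above)
    """
    if total_ram_gb <= os_reserve_gb:
        raise ValueError("server has no RAM left for SQL after the OS reservation")
    return int((total_ram_gb - os_reserve_gb) * 1024)


# A server with 32 GB of RAM leaves 28 GB for SQL
print(sql_max_server_memory_mb(32))  # 28672
```

Treat the outcome as a starting point only. Other software running on the same server needs RAM too and should be subtracted as well.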

Split the databases and logs! Put the temp DB on a disk of its own. The same goes for the log files and for the SCOM databases as well. Even better, use a dedicated SQL server for the OpsMgr database and another for the OpsMgrDW database. Put the system databases on a disk of their own as well.

And for bigger environments, even use a dedicated SQL server for the SSRS instance being used for SCOM. When using SQL Server Standard edition AND using all those SQL servers and instances SOLELY for System Center 2012x technologies, no additional money for SQL Server licenses is required.

I won’t pretend I am a SQL DBA. But from personal experience I know this approach works well and results in a well performing SCOM 2012x environment. And when issues do arise, because all SQL servers are split, it’s far easier to pinpoint the issues at hand and remedy them.

The SCOM 2012x MS Servers
Like I stated in a previous posting of mine, nowadays it’s better to roll out an additional SCOM 2012x MS server. This makes your environment more scalable and robust without having to go back to the drawing board when the monitoring requirements change.

I have seen this happen many times before, and it makes a BIG difference when an additional MS server is already available, compared to having to provision a new VM, roll out SCOM and configure it. The latter will take much more time, even when the installation of the SCOM server itself is automated by using PowerShell.

Resource Pools
By default a newly installed MG contains three Resource Pools out of the box. Use them wisely and add Resource Pools even more wisely. Don’t add them like you’re adding subscriptions, for instance.

Every single Resource Pool requires attention and maintenance performed by the SCOM 2012x MS servers and MG as such. There are many different use cases and scenarios out there, some of them almost demanding a dedicated Resource Pool whereas others might fold in just fine with one of the already present Resource Pools.

The All Management Servers Resource Pool is a special one, requiring additional care. Sometimes there are good reasons to ‘break’ the automatic population rule of the Resource Pool and to remove one or more MS servers from it.

But be very careful here and only do this when you have really solid reasons for it (like isolating a SCOM 2012x MS server for a dedicated role) AND know what adverse side effects it might have on the overall health and availability of your MG as a whole. When you don’t, please don’t touch that Resource Pool.

UNIX/Linux monitoring
SCOM 2012x MS servers participating in the Resource Pool used for monitoring UNIX/Linux systems might require additional RAM and CPU. Simply because the UNIX/Linux SCOM Agents are nothing but web services, fully managed by the SCOM 2012x MS servers.

So additional power for those servers might come in handy. Just monitor them more closely as more UNIX/Linux systems are added to the mix. This will prevent performance issues as well.

Okay. I’ve followed all your advice and yet, the SCOM Consoles are dead slow! Now what?
Time to investigate! Start at the very bottom of things:

  1. Compute
  2. Storage
  3. Networking.

Use the available tools for it and don’t forget about SCOM itself. There are many good reports out there enabling you to get a deeper insight into the performance of your SCOM infrastructure. These are not limited to CPU, RAM and disk queue length, but also cover:

  1. Number of deadlocks on the related SQL DB engines;
  2. RAM consumption of the related SQL DB engines;
  3. Number of transactions per second for the related SQL DBs;
  4. Active connections count for the related SQL DBs.

On top of it all, also take a look at what’s hitting the SCOM environment, like too many Alerts, State Changes, Performance Counters, Event collections and so on. The report Data Volume By Management Pack (found under System Center Core Monitoring Reports) quickly shows you which MP is generating the most volume.
[Screenshot: Data Volume By Management Pack report]

When you click on one of those values (like the one highlighted in yellow), a second report is rendered, Data Volume By Workflow and Instance. There you’ll see which Monitors/Rules in particular cause the biggest bulk of the total data volume in your SCOM MG:
[Screenshot: Data Volume By Workflow and Instance report]

By ‘simply’ tuning the first three Monitors you’ll address almost 35% of the data volume created by the identified MP in the first Report!
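The same calculation is easy to repeat on exported report data. The sketch below ranks workflows by data volume and works out which share of the total the top few account for. The workflow names and counts are made-up sample values, not figures from the reports above, and the function name is chosen only for this illustration.

```python
# Hypothetical sample: data volume (item count) per Monitor/Rule in one MP
SAMPLE_VOLUMES = {
    "Monitor A": 400,
    "Monitor B": 250,
    "Monitor C": 150,
    "Rule D": 90,
    "Rule E": 60,
    "Rule F": 50,
}


def top_contributors(volumes, top_n=3):
    """Return the names of the top_n workflows and their share of the total (%)."""
    ranked = sorted(volumes.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:top_n]
    total = sum(volumes.values())
    share = 100.0 * sum(count for _, count in top) / total if total else 0.0
    return [name for name, _ in top], share


names, share = top_contributors(SAMPLE_VOLUMES, top_n=3)
print(names, f"{share:.1f}%")
```

The higher the share of the top few workflows, the more you gain by tuning just those before looking at anything else.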

When you run reports like these on a weekly basis for the first few months and try to tune the identified Monitors and Rules, your SCOM environment will take a much smaller performance hit. When more in control, run these Reports on a monthly basis and go from there.

Another two reports which come in handy are both from the SCC Health Check Reports, Alerts - Top 20 Alerts By Alert Count (OM) and Alerts - Top 20 Alerts By Repeat Count (OM).

These Reports will quickly show you which Alerts are triggered most often. When you solve the causes of those Alerts and tune the related Rules/Monitors, your SCOM environment will suffer fewer performance issues since far fewer Alerts come in.

Again, run these Reports on a weekly basis for starters. When more in control, run these reports on a monthly basis.

Recap
When using SCOM 2012x, design and implement it properly. Even so, like any other technology, it requires maintenance and a watchful eye, for which you can use the tips and tricks I provided. Soon you’ll see you’re in control of your environment and know how to check it when performance issues do arise.

Thursday, April 10, 2014

!!!Don’t Use The Resource Pool Fix Anymore!!!

The past
When the Release Candidate of SCOM 2012 was available there were some issues with it. One of those issues was related to the Resource Pools. They were rather sensitive resulting in an unstable MG. Microsoft got a lot of feedback for it and released a quick fix KB article for it, KB2714482.

In this KB article one was told – when experiencing the Resource Pools issue with the Release Candidate version of SCOM 2012 – how to add a registry key (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager) with certain values, modifying the release request time and resource pool latency.

The present
So far so good. But now the BAD news.

KB2714482 was pulled by Microsoft a long time ago, almost at the same time SCOM 2012 went RTM and became generally available. And there is a reason for it. The quick fix isn’t required any more since the SCOM 2012x versions after the Release Candidate contain a fix for it.

So now this quick fix works in a negative manner, resulting in unexpected and unstable behavior of the availability and recovery time of the Resource Pools.

At every customer location I check the SCOM 2012x Management Servers for the presence of this key and make an export of it. After having a talk with the customer, I remove the registry key. I also reboot the servers so they start clean and with the normal configuration.

So far I have seen SCOM 2012x environments become much more stable and reliable after doing so.

My two cents
Check your SCOM 2012x Management Servers for the presence of this key. When found, make an export of it, remove the key and reboot your servers. Yes, the Resource Pools will turn up grey for some time (sometimes up to 20 minutes, depending on the size and monitoring load). However afterwards your SCOM 2012x Management Servers will be okay and the overall availability and stability of your MG will improve.
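The check and the export can be scripted with the standard Windows `reg` command. The sketch below is a minimal illustration to be run on a Management Server. The function names and the backup file name are made up for this example, and it deliberately leaves removing the key and rebooting as manual steps.

```python
import subprocess

# Key path as named in this posting, using the HKLM abbreviation accepted by reg.exe
POOL_MANAGER_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager"
)


def build_query_command(key=POOL_MANAGER_KEY):
    """Command line that checks whether the key exists."""
    return ["reg", "query", key]


def build_export_command(backup_file, key=POOL_MANAGER_KEY):
    """Command line that exports the key to a .reg file (/y overwrites)."""
    return ["reg", "export", key, backup_file, "/y"]


def key_exists(key=POOL_MANAGER_KEY, run=subprocess.run):
    """Return True when 'reg query' finds the key (exit code 0)."""
    result = run(build_query_command(key), capture_output=True, text=True)
    return result.returncode == 0


def export_key(backup_file, key=POOL_MANAGER_KEY, run=subprocess.run):
    """Export the key to backup_file; return True on success."""
    result = run(build_export_command(backup_file, key), capture_output=True, text=True)
    return result.returncode == 0
```

A typical use would be to call `key_exists()` first and, only when it returns True, `export_key("PoolManager-backup.reg")` before removing the key by hand.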

Wednesday, April 9, 2014

OM12 R2 UNIX/Linux Agent Update Failing? Try Ping & Reverse Lookup

Issue
At a customer location I had to update a whole bunch of SCOM UNIX/Linux Agents to SCOM 2012 R2 UR#1. However, most of them failed with many different error messages. This was a bit strange since these systems were being monitored as intended, so there were no errors or issues there.

Cause
For SCOM it’s crucial being able to resolve the FQDNs of the UNIX/Linux systems. Also the reverse lookup has to be okay. Otherwise the management of these systems will fail.

Solution
So for every UNIX/Linux system which failed to update to the latest SCOM Agent version, I first ran a ping and then used the returned IP address for a reverse lookup with the NSLOOKUP utility. I ran both commands from the SCOM Management Server being used to update the SCOM Agent on those UNIX/Linux systems.

When everything matched (IP address <-> FQDN) I reran the upgrade of the related UNIX/Linux Agent. And guess what? 98% ran just fine, leaving only a small number of servers not accepting this upgrade.
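When many systems are involved, the same forward and reverse check can be scripted. The sketch below uses Python’s standard resolver calls instead of ping and NSLOOKUP, so it is a stand-in for the manual procedure and not the procedure itself. The function name is made up for this example, and like the manual check it only tells you something useful when run on the Management Server performing the upgrades.

```python
import socket


def check_forward_reverse(fqdn, resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP, resolve that IP back, and compare the names.

    Returns (ok, ip, reverse_name); ok is True only when the reverse
    lookup returns the same FQDN (compared case-insensitively).
    """
    try:
        ip = resolve(fqdn)
    except OSError:  # forward lookup failed
        return False, None, None
    try:
        reverse_name = reverse(ip)[0]  # first element is the primary host name
    except OSError:  # no PTR record or reverse lookup failed
        return False, ip, None
    ok = reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return ok, ip, reverse_name
```

Calling `check_forward_reverse("linux01.example.local")` (a hypothetical host name) for every failed system gives a quick list of the ones whose DNS records need fixing before the Agent upgrade is retried.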

Recap
Whenever upgrading the SCOM 2012x Agent on a UNIX/Linux system fails, first run a ping and a reverse lookup on the SCOM Management Server being used to run those upgrades. Many times it will solve the upgrade issues, making it much easier to single out the real problematic servers.

Monday, April 7, 2014

Free MP Authoring Tool For IT Pros Gets Update

In January 2014 Silect, in a joint effort with Microsoft, released a new MP authoring tool for IT Pros. This new tool also made the Visio MP Authoring tool and the old SCOM 2007 MP Authoring tool obsolete.

Based on customer feedback, Silect now follows up this first release with an updated version, aptly titled MP Author SP1.

Customers already registered and using MP Author will be emailed instructions on how to obtain SP1. When you’re an IT Pro and new to MP Authoring, this is the tool to have. Go here.