Monday, August 26, 2013

SCOM & SQL: Provide Sufficient Memory Please

Issue
Bumped into this issue at a customer's location. The Management Group (MG) came to a standstill, starting with the first two of the errors below:
  1. Database error. MPInfra_p_ManagementPackInstall failed with exception:
    Database error. MPInfra_p_ManagementPackInstall failed with exception:
    There is insufficient system memory in resource pool 'default' to run this query.

  2. AppDomain OperationsManager.dbo[runtime].78 was unloaded by escalation policy to ensure the consistency of your application. Out of memory happened while accessing a critical resource.
    The application domain in which the thread was running has been unloaded.


  3. .NET Framework execution was aborted by escalation policy because out of memory. Thread was being aborted.

Errors 1 and 2 were thrown first. Afterwards the Console (UI) wouldn't start anymore. It only threw error 3, on every computer where the Console (UI) was started.

Cause
After some thorough investigation it turned out that the SQL server hosting the OpsMgr database was running out of memory. Because of that, it couldn't serve any more requests, resulting in an unresponsive SCOM Console.

Resolution
This SQL server had only 8 GB of RAM. After doubling it to 16 GB and allocating a maximum of 12 GB of RAM to the related SQL instance, everything was just fine again.
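
For those who want to script the memory cap: the post doesn't say how the 12 GB limit was applied, so the following is only a minimal sketch, assuming it is set through SQL Server's standard `max server memory (MB)` option with `sp_configure`. The function name `build_max_memory_script` and its parameters are hypothetical. It converts the GB figure to MB and prints the T-SQL to run against the instance.

```python
def build_max_memory_script(total_ram_gb: int, sql_max_gb: int) -> str:
    """Return T-SQL that caps SQL Server memory at sql_max_gb.

    Hypothetical helper for illustration only; it does not connect
    to any server, it just builds the statements as text.
    """
    # Leave some RAM for the OS and other processes on the box.
    if sql_max_gb <= 0 or sql_max_gb >= total_ram_gb:
        raise ValueError("SQL max memory must be positive and below total RAM")

    # 'max server memory (MB)' is expressed in megabytes.
    max_mb = sql_max_gb * 1024

    return "\n".join([
        # The option is an advanced one, so advanced options must be visible.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max server memory (MB)', {max_mb};",
        "RECONFIGURE;",
    ])


if __name__ == "__main__":
    # Values from this case: 16 GB in the server, 12 GB for the instance.
    print(build_max_memory_script(total_ram_gb=16, sql_max_gb=12))
```

With the values from this case, the cap works out to 12288 MB, leaving 4 GB for Windows and anything else running on the server.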

Recap
The SQL server hosting the OpsMgr database is crucial to the overall health, performance and availability of the related Management Group. Therefore this server needs to be a beefy one in terms of CPU, RAM and disk I/O.

2 comments:

Richard Kinser said...

How big was this environment? In my environment, we have a little over 1000 servers being monitored. Here is the current RAM Allocation (these are VMs):

OperationsManager SQL Server - 8GB
OperationsManagerDW SQL Server - 8GB
OperationsManagerAC SQL Server - 16GB

Though we were originally keeping about 90 days' worth of events in the ACS Database, now we are only keeping 2 weeks, so it's probably overkill.

Just curious how big the environment was you saw this issue in.

Marnix Wolf said...

Hi Richard.

This environment monitors about 300+ Windows Servers, 500+ network devices (with many customizations), 100+ UNIX/Linux servers, also with many customizations.

In itself not so big an environment, but with a high level of customization, which creates additional load. Also, monitoring UNIX servers creates more load than monitoring the same amount of Windows-based servers.

Cheers,
Marnix