As it turned out, this server ran many monitored objects, services and applications; in fact, too many. By default the queue of a SCOM Agent is set to 15 megabytes (15360 KB).
For this server the maximum queue size was raised to 75 megabytes (76800 KB). After the Agent was restarted, all was well again. The largest value allowed for the queue size is 256 megabytes (262144 KB).
The registry value you need to update can be found here: HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups\<MG NAME>\MaximumQueueSizeKb.
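For those who would rather script the change than edit the registry by hand, here is a minimal sketch in Python. It only uses the standard winreg module, so the actual write works on Windows only and needs administrative rights. The function names (queue_key_path, mb_to_kb, set_queue_size) are made up for this example, and the management group name "MG1" in the usage comment is a placeholder for your own <MG NAME>. The 256 MB ceiling is the one mentioned above; treat the range check as an assumption rather than something the Agent itself enforces.

```python
# Sketch: raise the SCOM Agent queue size via the registry (Windows only).
# All function names here are illustrative, not part of any SCOM tooling.

MAX_QUEUE_KB = 262144  # 256 MB, the upper limit mentioned in the post


def queue_key_path(management_group: str) -> str:
    """Return the key path (below HKLM) that holds MaximumQueueSizeKb."""
    return (
        r"SYSTEM\CurrentControlSet\services\HealthService"
        r"\Parameters\Management Groups" + "\\" + management_group
    )


def mb_to_kb(megabytes: int) -> int:
    """Convert megabytes to KB and reject values outside 1 KB..256 MB."""
    kb = megabytes * 1024
    if kb <= 0 or kb > MAX_QUEUE_KB:
        raise ValueError(f"queue size must be between 1 and 256 MB, got {megabytes}")
    return kb


def set_queue_size(management_group: str, megabytes: int) -> None:
    """Write MaximumQueueSizeKb as a DWORD. Requires Windows and admin rights."""
    import winreg  # imported here so the helpers above work on any platform

    kb = mb_to_kb(megabytes)
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE,
        queue_key_path(management_group),
        0,
        winreg.KEY_SET_VALUE,
    ) as key:
        winreg.SetValueEx(key, "MaximumQueueSizeKb", 0, winreg.REG_DWORD, kb)


# Example (run from an elevated prompt on the agent machine):
# set_queue_size("MG1", 75)   # "MG1" is a placeholder management group name
```

As with the manual change, the Agent (the HealthService service) has to be restarted afterwards before the new queue size is used.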
Whenever you run into a server that has a very large number of monitored objects and not all the collected information seems to reach SCOM, try the steps described here.
This applies to SCOM R2 Agents and OM12 Agents as well.
The way I found this issue was that SCOM was sending out notifications like this:
Alert: Health Service Heartbeat Failure
Path: Microsoft.SystemCenter.AgentWatchersGroup
Last modified by: System
Last modified time: 11/14/2012 7:58:52 AM
Alert description: The System Center Management Health Service on computer failed to heartbeat.
The solution presented here fixed my issue immediately in SCOM 2012 CU1. I now have a whole bunch of working agents as they should be.
Thanks so much for posting!