Thoughts on Azure, OMS & SCOM: Crucial OpsMgr Services explained. Part I: The Basics

Friday, September 18, 2009

Crucial OpsMgr Services explained. Part I: The Basics

---------------------------------------------------------------------------------
Postings in the same series:
Part II: The SDK Service
Part III: The Config Service
Part IV: The Health Service
---------------------------------------------------------------------------------

I often get questions about the OpsMgr-specific services, such as:

What are they and what are their purposes?
Why are certain services only available on OpsMgr Management Servers and not on Agent Managed servers?
Why are certain OpsMgr services disabled on Management Servers but enabled on the Root Management Server?

Since there is much to tell and I strive for readable blog postings, I have decided to launch a new series to cover this topic. At the moment I do not know exactly how many postings this series will contain, but let’s start with Part I: The Basics and see how it goes from there.

As you may have noticed, every OpsMgr Management Server has three OpsMgr services. In the list below, the first name is the RTM/SP1 name of the service and the second is its R2 name. (Check out this posting.)

OpsMgr SDK Service / System Center Data Access
OpsMgr Health Service / System Center Management
OpsMgr Config Service / System Center Management Configuration

An OpsMgr managed server (a server which runs the OpsMgr Agent) runs only the second service, OpsMgr Health Service / System Center Management.

As expected, every service has its own purpose. Also, when checking a Management Server (not the Root Management Server!), one will notice that only the second service is running and the other two (SDK & Config) are disabled. This is by design and shouldn’t be altered.
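To make the layout above concrete, here is a minimal Python sketch of which service is expected in which state on each role. It is only an illustration: the short names `SDK`, `Health` and `Config`, the `State` enum and the `deviations` helper are made up for this example and are not part of any OpsMgr API.

```python
# Hypothetical model of the service layout described in this posting.
# Nothing here talks to a real server; all names are assumptions.
from enum import Enum


class State(Enum):
    RUNNING = "running"
    DISABLED = "disabled"
    ABSENT = "absent"  # the service does not exist on this role


SERVICES = ("SDK", "Health", "Config")

EXPECTED = {
    "RMS":   {"SDK": State.RUNNING,  "Health": State.RUNNING, "Config": State.RUNNING},
    "MS":    {"SDK": State.DISABLED, "Health": State.RUNNING, "Config": State.DISABLED},
    "Agent": {"SDK": State.ABSENT,   "Health": State.RUNNING, "Config": State.ABSENT},
}


def deviations(role, observed):
    """Return {service: (expected, observed)} for every service that deviates."""
    expected = EXPECTED[role]
    return {
        name: (expected[name], observed.get(name, State.ABSENT))
        for name in SERVICES
        if observed.get(name, State.ABSENT) != expected[name]
    }
```

Feeding it the observed states of a Management Server where somebody started the SDK Service returns that single deviation; a server matching the table returns an empty dictionary.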

Question: Why does a Root Management Server run all three services?
One can look upon the OpsMgr hierarchy as a Windows NT4 domain. There one had a Primary Domain Controller (PDC) with a writable SAM and some or many Backup Domain Controllers (BDCs) with a read-only copy of the SAM. And no matter from which location the User Manager or Server Manager was being run, it always connected to the PDC since that server contained the only writable SAM. Any adjustment was done there and then replicated to the BDCs.

The RMS is just the PDC of OpsMgr and the Management Servers are the BDCs. So the RMS maintains the OpsMgr Management Group in every kind of way. Importing MPs? Done there. Adjusting MPs? Done there. (Web) Console connections? Authorizations? Notifications? Setting permissions? Scoping Views? Deleting objects? Setting Overrides? Yep! The RMS does it all. All changes are written to the OpsMgr-related databases and replicated to the Management Servers (better: the Management Servers are notified by the RMS that things have changed, so they have to update their configuration).
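The 'write the change, then notify' flow can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how OpsMgr is implemented: the class names, the dictionary standing in for the database and the version counter are all hypothetical.

```python
# Toy model: the RMS is the only writer; Management Servers only get told
# that something changed and then refresh themselves. All names are made up.
class ManagementServer:
    def __init__(self, name):
        self.name = name
        self.config_version = 0

    def on_config_changed(self, database):
        # Notified by the RMS: pick up the current state from the database.
        self.config_version = database["version"]


class RootManagementServer:
    def __init__(self, database, management_servers):
        self.database = database
        self.management_servers = management_servers

    def apply_change(self, description):
        # 1. The change is written to the database first...
        self.database["version"] += 1
        self.database["changes"].append(description)
        # 2. ...then every Management Server is notified to update itself.
        for ms in self.management_servers:
            ms.on_config_changed(self.database)


db = {"version": 0, "changes": []}
servers = [ManagementServer("MS01"), ManagementServer("MS02")]
rms = RootManagementServer(db, servers)
rms.apply_change("import MP")
rms.apply_change("set override")
```

After the two changes, both Management Servers hold the same configuration version as the database, even though neither of them wrote anything itself.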

This makes the RMS a rather busy fellow. It has a lot of work to process.

In order to do that, it needs all three OpsMgr services to be in a running state. This also explains why enabling all three services on a Management Server is a recipe for disaster: the OpsMgr Management Group no longer knows who the ‘leader’ is and ends up in a situation comparable to what the cluster world calls a ‘split-brain’ scenario. That isn’t nice either…
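A quick way to reason about this is to count how many servers in the Management Group run one of the RMS-only services. The Python sketch below does just that on hand-written sample data; the server names, the short service names and both helper functions are assumptions for the sake of the example, not an OpsMgr API.

```python
# Services that should run on the RMS only (short names are made up).
RMS_ONLY_SERVICES = {"SDK", "Config"}


def servers_acting_as_rms(running_by_server):
    """Return the sorted names of servers running any RMS-only service."""
    return sorted(
        server
        for server, running in running_by_server.items()
        if RMS_ONLY_SERVICES & set(running)
    )


def has_single_leader(running_by_server):
    """True when exactly one server behaves like the RMS."""
    return len(servers_acting_as_rms(running_by_server)) == 1


healthy = {
    "RMS01": {"SDK", "Health", "Config"},
    "MS01": {"Health"},
    "MS02": {"Health"},
}
# Somebody enabled and started all three services on MS01:
broken = dict(healthy, MS01={"SDK", "Health", "Config"})
```

With the `healthy` data only `RMS01` is reported; with the `broken` data both `MS01` and `RMS01` show up, which is exactly the 'who is the leader?' situation.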

Only when a Management Server must be promoted to Root Management Server do these services need to be started and set to start automatically. But that is another story.

Question: Doesn’t that make the RMS vulnerable?
Hmmm. Yes. It does. Therefore in environments where Enterprise Monitoring is crucial, a Clustered RMS is advised. Personally I have some doubts about the mechanism behind Clustering but that is another discussion. But indeed, the RMS introduces a SPOF (Single Point Of Failure).

On System Center Central there was a very interesting discussion going on about the new version of OpsMgr to be released in 201x. One or more people raised the idea of distributing the RMS roles, bringing OpsMgr in line with Windows Domain Controllers as we know them today. That is a really good idea. Let’s hope it gets picked up.

Question: What about the specifics of these OpsMgr Services?
Good question. I will describe every service in a separate blog posting. The blog postings in this series will look like this:
- Crucial OpsMgr Services explained. Part II: The SDK Service.
- Crucial OpsMgr Services explained. Part III: The Config Service.
- Crucial OpsMgr Services explained. Part IV: The Health Service.

And perhaps some other postings on this topic as well, based on the feedback I get.

5 comments:

Anonymous said...: I appreciate your posting! I am looking forward to the next posting regarding the services. I am new to SCOM, using R2.; September 18, 2009 at 5:53 PM
Anonymous said...: Thanks a lot,
Easy to understand with the NT4 analogy.
I'm waiting for another post ;-); September 21, 2009 at 8:52 PM
Marnix Wolf said...: Hi Anonymous (1 and 2) :)

Thanks for visiting my blog and your comment.

I just posted the second article about the SDK Service and a way to monitor its usage.

Best regards,
Marnix Wolf; September 21, 2009 at 9:43 PM
Tom Martin said...: Thanks for your post Marnix. When you refer to a management server, does this include Gateway Servers?

Thanks,
Tom; September 29, 2009 at 6:21 PM
Marnix Wolf said...: Hi Tom.

Thanks for visiting my blog. Good question you have.

When I refer to a Management Server it doesn't include a Gateway Server. A Gateway Server only runs the Health Service and can never be promoted to an RMS.

As a matter of fact, a Gateway Server is more like a super OpsMgr Agent.

On one side there are many OpsMgr Agents communicating with it, based on Kerberos authentication; on the other side the Gateway Server communicates with a Management Server based on certificate authentication.

Also, a Gateway Server has a huge benefit: it compresses the communication between itself and the Management Server to a great extent.

This is because the Agents already compress the traffic a bit. However, this is only traffic from ONE Agent. The Gateway server sends information from MANY Agents to the Management Server, thus the compression is much more successful since it is done on much more data.
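The effect is easy to show with any general-purpose compressor. The Python sketch below uses `zlib` purely as a stand-in (I am not claiming this is the algorithm OpsMgr uses) and made-up monitoring data for 50 hypothetical Agents; it compares compressing every Agent's data on its own with compressing everything in one go.

```python
import zlib

# Made-up counter names; every Agent reports the same set.
COUNTERS = [
    "ProcessorTime", "AvailableMBytes", "DiskQueueLength", "PagesPerSec",
    "NetworkBytesTotal", "HandleCount", "ThreadCount", "SystemUpTime",
]


def agent_payload(agent_id):
    """Build fake monitoring data: same structure per Agent, different host."""
    lines = [
        f"<sample host='SRV{agent_id:03d}' counter='{counter}' "
        f"value='{(agent_id * 7 + k) % 100}'/>"
        for k, counter in enumerate(COUNTERS)
    ]
    return "\n".join(lines).encode("utf-8")


def compressed_sizes(payloads):
    """Return (bytes when each payload is compressed alone, bytes when batched)."""
    per_agent = sum(len(zlib.compress(p)) for p in payloads)
    batched = len(zlib.compress(b"\n".join(payloads)))
    return per_agent, batched


payloads = [agent_payload(i) for i in range(50)]
raw = sum(len(p) for p in payloads)
per_agent, batched = compressed_sizes(payloads)
print(f"raw: {raw}  per agent: {per_agent}  batched: {batched}")
```

The batched number comes out lower because the compressor can reuse what it has already seen from the other Agents, which is the same reason a Gateway Server gets more out of compression than a single Agent does.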

Therefore Gateway Servers are a great fit for sites that are connected over WAN links. Suppose you have a datacenter where the bulk of the servers are located, and on a couple of other locations there are some servers to be monitored as well.

A good way to go about it is to place the Management Group (RMS, MS and SQL) in the datacenter and on the other locations, Gateway servers. When these servers are in the same forest as the Management Group then certificates aren't needed. Kerberos will suffice.

Hope this clears things up a bit for you.

Best regards,
Marnix Wolf; September 29, 2009 at 7:48 PM