Thoughts on Azure, OMS & SCOM: SCOM vNext – Part IV – Topology Simplification, Pooling and Timeline

Thursday, December 2, 2010

----------------------------------------------------------------------------------
Postings in the same series:
Part I – The Next Generation of SCOM
Part II – Holistic View of Application Health
Part III – Network Monitoring
----------------------------------------------------------------------------------

In this fourth and last posting of the series I will describe another new feature in OpsMgr vNext: Topology Simplification. On top of that, I will show the timeline set for this newest version of OpsMgr, along with a wrap-up of the whole session presented at Tech-Ed.

Let’s start!

Topology Simplification

In one of my blog postings I compared today’s SCOM hierarchy (SP1/R2) with NT4:

‘…One can look upon the OpsMgr hierarchy as Windows NT4. Here one had a Primary Domain Controller (PDC) with a writable SAM and one or more Backup Domain Controllers (BDCs) with a read-only copy of the SAM. And no matter from what location the User Manager or Server Manager was being run, it always connected to the PDC since that server contained the only writable SAM. Any adjustment was done there and then replicated to the BDCs. The RMS is just the PDC of OpsMgr and the Management Servers are the BDCs. So the RMS maintains the OpsMgr Management Group in every kind of way. Importing MPs? Done there. Adjusting MPs? Done there. (Web) Console connections? Authorizations? Notifications? Setting permissions? Scoping Views? Deleting objects? Setting Overrides? Yep! The RMS does it all. All changes are being put into the OpsMgr related DBs and replicated to the Management Servers…’.

Well, the NT4 days are history, so it’s time to throw out the RMS! Why? Because it is a SPoF (Single Point of Failure). Say what? The RMS can be clustered? Yes, you are right. But personally I don’t like that kind of setup since it is prone to errors, especially when an update like a CU has to be applied. And when a clustered RMS breaks down, it can take a lot of time and energy to get it right again.

Microsoft received a lot of feedback from its customers and MVPs about the RMS and its SPoF ‘capabilities’. And again, they LISTENED!

With OpsMgr vNext the RMS won’t be there anymore. All OpsMgr vNext Management Servers will be running the SDK and Config service, alongside the Health Service! Also an additional database has been added: the Config DB, which is OPTIONAL and advised to be used in REALLY LARGE OpsMgr vNext environments with a huge config space!

How does OpsMgr vNext function without the RMS? Let’s take a look at these pictures:

Picture 01-A:
As you can see, all OpsMgr vNext servers are connected to all three OpsMgr related DBs. On all three Management Servers the three OpsMgr services are running. And every single Management Server has a certain set of managed (monitored) items ‘talking’ to it: the MS on the left has a Gateway Server talking to it, the MS in the middle manages a set of servers and the MS on the right monitors network devices and Windows Azure.

So that’s good news: load distribution. However, this is nothing new compared to today’s situation with OpsMgr.

But wait just a minute! Suppose the MS in the middle dies. Now what? The servers reporting to that MS aren’t monitored anymore? That’s bad! With OpsMgr as it is today we have one SPoF, and with OpsMgr vNext we would have multiple SPoFs?

Fortunately, Microsoft thought this one through and introduced a new concept in OpsMgr vNext to address that issue: an OpsMgr vNext Server Pool. Without it one would experience the above-mentioned issue, where a single instance of a Health Service run by an OpsMgr Management Server monitors a set of devices, and when that Management Server dies, so does the monitoring of those devices.

Picture 01-B:
But take a look at this picture where a POOL is being used. A Pool is nothing more than a logical grouping of multiple Health Service instances. So instead of a set of monitored devices being managed by a single instance of a Health Service, running on a single Management Server, these devices are being managed by a set of multiple Health Services, hosted by multiple Management Servers:

Picture 01-C:
And when the Management Server on the left dies, the other Health Services in the Pool will automatically take over the devices which were being managed by that broken Management Server! Wow! Sweet!
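
To make the idea a bit more tangible, here is a toy sketch in Python of what such a Pool does conceptually. This is NOT how OpsMgr vNext implements it: the class `Pool`, its methods and the names `MS1`, `MS2`, `MS3` and `srv-a` are all made up for illustration, and the 'least loaded member' rule is simply an assumption on my side.

```python
from collections import defaultdict


class Pool:
    """Toy model of a Pool: a logical group of Health Service instances."""

    def __init__(self, members):
        self.members = list(members)  # healthy Health Service instances
        self.assignments = {}         # device -> member currently managing it

    def _least_loaded(self):
        # Count devices per member and pick the healthy member with the fewest.
        load = defaultdict(int)
        for member in self.assignments.values():
            load[member] += 1
        return min(self.members, key=lambda m: load[m])

    def assign(self, device):
        if not self.members:
            raise RuntimeError("no healthy members left in the pool")
        self.assignments[device] = self._least_loaded()

    def fail(self, member):
        # Remove the dead member and hand its devices to the survivors.
        self.members.remove(member)
        orphaned = [d for d, m in self.assignments.items() if m == member]
        for device in orphaned:
            del self.assignments[device]
            self.assign(device)
        return orphaned


pool = Pool(["MS1", "MS2", "MS3"])
for device in ["srv-a", "srv-b", "srv-c", "srv-d", "srv-e", "srv-f"]:
    pool.assign(device)

moved = pool.fail("MS1")  # the Management Server on the left dies
print("taken over by the rest of the pool:", moved)
print(pool.assignments)
```

After `fail("MS1")` every device still has a member managing it, which is exactly the point of the Pool: the monitoring moves, it doesn’t stop.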

And these Pools aren’t limited to MS only! No way. One can also create Pools for Agents. Why should one do that? Imagine you use certain monitored servers as Proxies for OpsMgr vNext in order to monitor Windows Azure or a set of websites. When the server hosting that OpsMgr vNext Agent dies, the additional monitoring being run by that same server (as a Proxy) would come to an end as well.

When using a set of Agents in a Pool, that would not happen and monitoring of those Azure based apps and websites would be taken over by one or more Agents in that same pool!

So the SPoFs are really gone in OpsMgr vNext! That’s really a HUGE improvement!

Java EE (J2EE) Web Service Monitoring:
Think of WebSphere/WebLogic/JBoss/Tomcat on Windows. In telegram style: the investments made on the .NET side are also made on the J2EE server side. A demonstration of the first beta version was given. The way it is presented in the OpsMgr vNext Consoles is the same across all monitored objects in OpsMgr vNext, so it is the same user experience as monitoring other applications and services. Nice!

Partners:
Many third parties write MPs for OpsMgr. However, with OpsMgr as it is today it is a challenge for them to build a truly native MP, meaning an MP that is fully integrated into OpsMgr without the need for additional layers of software installed alongside it, or even on separate servers.

With OpsMgr vNext Microsoft addresses this issue by providing a standard, simple, reliable and consistent way for Partners to author native MPs and build their rich solutions on top of OpsMgr vNext.

Timeline:
When can we expect OpsMgr vNext to go RTM? Q4 2011.

Total Wrap Up:
Wow! OpsMgr vNext is really something new. It is not just a rebranding. No way! OpsMgr vNext has much to offer compared to today’s version of OpsMgr (R2 with CU#3). Many things have been altered, adjusted, improved, added and enriched.

Sometimes I get questions from the field about the upgrade path to OpsMgr vNext, and whether it is going to be the same as the move from MOM 2005 to SCOM 2007. In that scenario an upgrade wasn’t possible. One had to install SCOM 2007 alongside MOM 2005.

Good news is that OpsMgr vNext allows for an upgrade path from OpsMgr R2 CU#3 to OpsMgr vNext. As Microsoft puts it: ‘It will be a Seamless Experience’. Custom-made MPs will also migrate from OpsMgr R2 CU#3 to OpsMgr vNext. Only some small scenarios based on PS and SDK will not work right away, because of some DB schema updates, but MPs and DAs should just work. When they don’t, Microsoft wants to know!

So OpsMgr vNext has much to offer, like:

Revamped Network Monitoring;
360 degree Monitoring of Applications;
Removing the RMS and introducing Pools thus removing SPoFs completely;
Adding monitoring of J2EE;
Adding monitoring of .NET based applications, based on the AVIcode acquisition;
Better ways for Partners to develop native MPs.

Personally I think this list will only grow in the months to come. In Q2 of 2011 the first public beta will be available. From that moment on we can see what additional features have been added as well.

For now Microsoft has shown its dedication to the OpsMgr product. Many things have been added, improved and enhanced. Much of it based on the input from YOU!

So whenever you bump into an issue with today’s version of SCOM, do not hesitate and post it on Connect. Microsoft listens and cares! Only with input from its end users is Microsoft able to develop the next generation of a product which really adds value.

4 comments:

RageFan said...: Just like Microsoft to leave out 54 degrees of monitoring ;-); December 3, 2010 at 3:56 PM
Marnix Wolf said...: :)

Thanks for telling me. A typo it was...

Cheers,
Marnix; December 3, 2010 at 5:22 PM
aenagy said...: Did Microsoft talk about data consistency in management packs? The problem we encounter every time we implement a new MP is that the data fields are not populated consistently (Computername, Hostname, Fully qualified hostname, etc), either within the same MP or across all MPs. As a result of the inconsistent data coming from SCOM we need to implement a lot of ugly Perl on our enterprise event management/correlation solution (TEC/Omnibus) to try to normalize the data. As you can imagine this code is difficult to maintain and constantly needs to be tweaked in cases where the author of a new MP decided to be cute/funny/stupid/lazy.; December 4, 2010 at 8:27 PM
Marnix Wolf said...: Hi aenagy,

no Microsoft did not directly. However, they know it is an issue and they are going to address it.

Cheers,
Marnix; December 5, 2010 at 2:10 PM