Thoughts on Azure, OMS & SCOM: MMS 2013: Co-Presenting Two Birds-Of-a-Feather Sessions

I had the honor of co-presenting two BOF (Birds-Of-a-Feather) sessions at MMS 2013:

BOF01 - Cloud and Datacenter Management Roundtable;
BOF10 - Operations Manager Best Practices.

For BOF01 Cameron Fuller was the speaker (thanks Cameron for having me as co-presenter).

For BOF10 Eric Olmstead was the speaker.

However, Eric was a bit surprised, since he had only proposed this session and had no intention of becoming the speaker. So he was glad to hand that role over to Joseph Chan (Principal Lead Program Manager at Microsoft), who accepted me as co-presenter (thank you Chan!).

What BOF is all about
BOF sessions have an open format. The slide deck (if any) consists of three slides at most: the title slide, a slide showing some additional information, and the end slide asking for evaluations of the session.

In a BOF the audience can ask any question related to the topic of the session, and anyone present can answer it. This always makes a BOF a bit exciting, since one never knows what questions are going to pop up.

So a BOF is dynamic and highly interactive. The speakers become more like moderators who have to manage the discussions, questions and answers.

Therefore a BOF reflects the community very closely. Anyone can chime in and share their personal experience and knowledge.

Who attended one or both BOF sessions?
Many experienced people were present at both sessions, such as (but not limited to) Bob Cornelissen, Kevin Greene, Damian Flynn, Pete Zerger, John Joyner, Maarten Goet, Walter Eikenboom, Gordon McKenna, Arie Haan and Raymond Chou.

Microsoft was represented as well by people like (but not limited to) Daniele Muscetta, Joseph Chan, Satya Vel, Eugene Bykov and Vlad Joanovic.

On top of that there was the audience itself: people who work with one or more System Center components on a daily basis and therefore bring a lot of experience and knowledge.

So a lot of brain power was present during both BOF sessions, creating the ideal environment to answer even the hardest questions.

Some of the questions asked
These are some of the questions which were asked during these BOF sessions:

SCOM 2007 R2: Upgrade or not?
Even though Microsoft provides the mechanisms to run an in-place upgrade, experiences from the field (both Cameron’s and mine) aren’t always that good. A couple of times already an upgrade went bad, resulting in a total restore of the environment. Many SCOM 2007 R2 environments also carry around a lot of legacy, having gone from SCOM 2007 RTM > SCOM 2007 SP1 > SCOM 2007 R2, with old MPs, connectors and the lot. And in many cases Windows Server 2003 and SQL Server 2005 are still being used, both NOT supported by SCOM 2012. Therefore it’s better to start fresh and opt for the alongside scenario, where a brand new SCOM 2012 environment is built and step by step takes over the functionality of the SCOM 2007 R2 environment.
How about Agentless Exception Monitoring (AEM)?
Additional question: What kind of load does it create on the Management Server (MS) which becomes the AEM endpoint (collector)?
Even though Microsoft provides good guidelines, it’s hard to say for a given situation how many AEM clients a single MS can handle. It’s also good to know that AEM clients which don’t run a SCOM Agent show up as unmanaged computers in the SCOM Console. When there are many unmanaged AEM clients, it’s even better to run a dedicated SCOM Management Group for this purpose. The MS that becomes the AEM Collector must also use disks that allow high IO in order to keep up with the read/write actions. The better the disks, the more AEM clients a single AEM Collector can handle.
The Dell MP and UNIX/Linux servers don’t work well together. Why and how to work around it?
The Dell MP is targeted at Windows Servers, or more specifically at the Windows based Dell management client running on that server. So the Dell MP will only discover Dell servers running a Windows Server OS with a SCOM Agent in place. UNIX/Linux servers won’t be discovered. There are some options to work around it, like using SNMP to query the MIB table for Dell specific OIDs and monitoring those. Another approach is to replace them with Windows Server (this one was suggested by a Microsoft PM).
Audit Collection Services (ACS) and the future
ACS got an update when SCOM 2012 went RTM, targeted at new security features present in Windows Server 2012. This question was answered by a Microsoft PM, and somehow I got the feeling that something about ACS will be communicated in the near future. Nonetheless, ACS is here to stay and won’t be scrapped.
When monitoring UNIX/Linux servers, an MS can’t take the load as described by Microsoft.
Additional questions: Why? And: How come an MS monitoring UNIX/Linux systems is far busier than an MS monitoring Windows servers?
First and foremost there are huge differences between a SCOM Agent for a Windows Server and a SCOM ‘Agent’ for a UNIX/Linux server. The latter is more of a web service. A SCOM Agent for a Windows Server is an independent entity: it downloads the MPs, processes them, runs all the related scripts as required, collects the results and sends them back to the MS. The UNIX/Linux ‘Agent’ however is lightweight and is controlled by the MS. So the MS processes the MPs targeted at the UNIX/Linux servers and sends ‘orders’ to the UNIX/Linux ‘Agent’, which carries them out and sends the results back to the MS.

The Why question is harder to answer. How are the MS servers provisioned? What kind of disks do they use? When these are plain virtual disks, there might be a performance issue. Also, where do all the MS servers and the related SQL servers reside? In the same LAN segment with very low latency? Or are they separated by a WAN connection (not advised, ever!)? Besides that, what is being monitored on those UNIX/Linux servers? Multiple instances and many UNIX/Linux daemons? Because that will increase the load on the MS involved. It’s also good to know when the behavior (an overloaded MS) started. Perhaps an MP puts too much load on the environment. As you can see, many things might be happening here.
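To make the difference in where the work happens more concrete, here is a toy model in Python. It is only a sketch of the description above: the function name, its parameters and the idea of counting a fixed number of workflows per host are illustrative assumptions, not SCOM APIs and not Microsoft sizing figures.

```python
def ms_work_per_cycle(windows_hosts, unix_hosts, workflows_per_host):
    """Toy count of the work a Management Server (MS) does in one cycle.

    Hypothetical model: every host has the same number of workflows.
    """
    # Windows agents run their workflows locally; the MS only receives results.
    # For UNIX/Linux hosts the MS runs each workflow itself and sends
    # 'orders' to the remote 'Agent'.
    workflows_run_on_ms = unix_hosts * workflows_per_host
    results_received = (windows_hosts + unix_hosts) * workflows_per_host
    return {
        "workflows_run_on_ms": workflows_run_on_ms,
        "results_received": results_received,
    }


if __name__ == "__main__":
    # Same number of hosts, very different amount of work on the MS itself.
    print(ms_work_per_cycle(windows_hosts=500, unix_hosts=0, workflows_per_host=100))
    print(ms_work_per_cycle(windows_hosts=0, unix_hosts=500, workflows_per_host=100))
```

In this simplified picture both calls deliver the same number of results to the MS, but only in the UNIX/Linux case does the MS have to run the workflows itself, which is why it ends up busier.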
Can I use an already present SQL server for SCOM?
That depends on a number of things: how many items the SCOM environment is going to monitor, how heavily SCOM Reporting is expected to be used, what kind of SQL server we are talking about, what specs it has, what kind of IO, RAM and CPU it has to offer, and what its SQL collation settings are. Once all these questions have been answered, it can be decided whether to use the existing SQL server, to provision one or more new ones, or to install a new dedicated SQL instance on the existing SQL server.

Some rules of thumb are: when SCOM Reporting is going to be used heavily together with customized Reports, it’s better to separate the Data Warehouse from the OpsMgr database. When many network devices are going to be monitored, the Temp database must get its own disk. When the SCOM MG is going to monitor many servers and network devices, it’s better to use dedicated SQL servers. The SQL collation setting is also very important: when the existing SQL server uses a different SQL collation, a new SQL instance for SCOM must be installed on that server.

Sometimes the existing SQL server is a ‘beast’: it has many CPUs, is packed with lots of RAM and runs super fast disks. When that server has enough power left to host the SCOM related databases, it’s relatively safe to use it. In all other conditions it’s better to use dedicated SQL servers.
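The rules of thumb above can be written down as a small decision helper. This is only a sketch in Python: the class, the field names and the order in which the rules are checked are my own assumptions for illustration, and yes/no flags stand in for what in practice are judgment calls about load and specs.

```python
from dataclasses import dataclass


@dataclass
class SqlPlanInput:
    # All fields are hypothetical yes/no simplifications of the questions above.
    collation_matches: bool
    heavy_custom_reporting: bool
    many_network_devices: bool
    many_servers: bool
    existing_server_has_headroom: bool


def recommend(inp):
    """Return a list of recommendations based on the rules of thumb."""
    advice = []
    if inp.many_servers and inp.many_network_devices:
        advice.append("use dedicated SQL servers")
    elif not inp.existing_server_has_headroom:
        advice.append("use dedicated SQL servers")
    elif not inp.collation_matches:
        advice.append("install a new SQL instance for SCOM on the existing server")
    else:
        advice.append("the existing SQL server is relatively safe to use")
    if inp.heavy_custom_reporting:
        advice.append("separate the Data Warehouse from the OpsMgr database")
    if inp.many_network_devices:
        advice.append("give the Temp database its own disk")
    return advice


if __name__ == "__main__":
    beast = SqlPlanInput(
        collation_matches=True,
        heavy_custom_reporting=True,
        many_network_devices=False,
        many_servers=False,
        existing_server_has_headroom=True,
    )
    for line in recommend(beast):
        print(line)
```

The helper does not replace the actual assessment of IO, RAM and CPU; it only shows how the individual rules combine into one answer.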
How do I size my SCOM environment?
For this Microsoft released the SCOM 2012 Sizing Helper, an Excel sheet with some macros running in the background. This sheet helps you to size your SCOM environment properly.
How to go about Resource Groups?
Additional questions/remarks: Resource Groups are a new feature in SCOM 2012, taking over the RMS functionality and spreading it over all MS servers present in a MG. But sometimes Resource Groups show unforeseen behavior or even (very seldom!) result in a broken MG. How to scale Resource Groups? How to go about it?

Microsoft is aware of this and is working hard on new, updated documentation about Resource Groups. This document will be made available to the public soon.

There were many more questions asked, but these are the ones which stood out for me personally.

Both BOF sessions were great and I love the open format, where everyone can ask good questions and chime in to answer them. Awesome, and thank you all for being there and making these BOF sessions a great success.

Thursday, April 11, 2013
