Thursday, June 30, 2011

Distributed Applications (DAs) – Part IV: Tips & Tricks

----------------------------------------------------------------------------------
Postings in the same series:
Part   I - Why use them?
Part  II - How about DAD?
Part III - Do’s and Don’ts
----------------------------------------------------------------------------------

First some background information about the reason WHY behind this series
This posting is for me personally the MOAP (Mother Of All Postings) since it took me four to five weeks to collect all required data, check, validate and double check them and document everything in a proper way.

But it’s time well spent. Actually, this collection of Tips & Tricks about DAs triggered a new series of postings (Postings 1 to 3) instead of a single posting, since I wanted to put everything described in this posting in the proper perspective. So if you haven’t read the first three postings, please go back and read them first in order to get the most out of this series.

About this Posting
In the fourth and last posting about DAs I will share some experiences and lessons learned. It’s good to know that some of these lessons were learned by other colleagues/project team members (like Arthur Nieuwland), whereas others were learned by myself. So this posting is the result of the ‘blood’, sweat and tears shed by many people. Without the input of us all this posting wouldn’t contain 17 items, which makes this last posting true to the real spirit of the community: it’s all about sharing.

!!!Spoiler Alert!!!
Some of the items might be well known whereas others might be totally new. Item 17 is really really something special. It’s a BOMBSHELL!!!

Let’s start since it’s a long long list.

  1. DAD freezes many times while I am working with it
    Basically what’s happening is this: One is working in DAD and wants to add an Object to a new or existing Component (Group) by using the Object Picker.
    image
    And when one hits enter, all seems to be just fine. Until one selects an Object in order to add it to a Component (Group): DAD freezes. Totally.
    Cause
    Many people might think SCOM is broken. But that’s not the case. What happens is that SCOM kicks off a query in order to enumerate ALL Objects which relate to the Object you’re looking for. This is a generic query. But when one adds an Object to a Component (Group), SCOM kicks off a second query in order to enumerate ALL the Objects which are similar to the one which you just added to a Component (Group), since DAD wants to add a ‘Wunder Bar’ containing all these Objects.
    image
    Under normal conditions this is OK. But when there are ‘tons’ of Objects, like logical disks, network interfaces (when a network MP is in place) this causes SCOM to hang. Big time.

    Workaround
    Easy. Saturate Object Picker (with 7+ Object Types) in order to work around a frozen DAD. How? Simply add new Component Groups to your DA (give them names like 1, 2, 3 etc etc) by selecting Objects which aren’t present in big numbers in your SCOM environment. Per NEW Object Type DAD will create an additional Wunder Bar. So the game is that you need to add a different Object per Component Group. Otherwise DAD won’t create a new Wunder Bar. You must add new Components until you’re shown the Replace Visible Object Type screen:
    image

    Object Picker will show 7 Wunder Bars now:
    image

    In the Replace Visible Object Type screen, leave the default option selected (Leave the new object type not visible) and click OK. Now the Object Picker is saturated and one can select the Objects which are present in the SCOM database in huge numbers without freezing DAD. When all Objects are added to your DA remove the ones you don’t need and save the DA. May cost some time but when one gets the hang of it, it certainly beats the time lost when DAD freezes (again).

  2. DAs don't show items any more but group them
    When a DA contains 7+ Components (or Component Groups) they won’t be shown separately but grouped together like ‘Healthy’, ‘Not Monitored’, ‘Warning’ etc etc. This is by design. Cameron Fuller pointed this one out a while ago and his approach is the one I use as well: split DAs into multiple layers. Also see posting Part III about DAs, 3rd bullet Divide and Conquer.

  3. When I want to add a component to a DA it takes ages for the 'Create New Component Group' screen to load
    image
    Funny thing: the Add Component button kicks off a query which enumerates all Object Types present in the SCOM database. This query isn’t really fast, so it takes a lot of time in big SCOM environments. Therefore it takes a ‘while’ before the Create New Component Group screen is shown. The same thing happens when one wants to look at the properties of a Component (Group) which already exists. For the latter there isn’t a workaround, for the former there is. Don’t use that button but the Search for objects screen instead:
    image

    When the Object is found, right click it and select the option Add To. Depending on your DA and the Component (Groups) already present one can choose between New Component Group or existing Component (Groups):
    image

  4. When I create a relationship between two or more Component Groups in a DA, what happens underwater?
    Common misconception: adding a relationship ('Create Relationship' button) does NOT create a Monitor (or alter an existing one for that matter). ONLY a relationship will be created. Nothing less, nothing more.

  5. Should one use Relationships or not?
    Relationships are more a visual item than anything else. Take a look at these two pictures. The one on the left is the DA without a Relationship between Database 1 and Database 2 whereas the picture on the right shows the same DA but now with an added Relationship.
    image  image

    One could say the latter picture clarifies the relationship, but that's the whole point of any DA already. Somehow the Components relate to each other. If there wasn't any kind of Relationship at all, those very same Components shouldn't be in the DA in the first place. Also, when the DA consists of multiple components there are already enough lines between all those components. The credo 'Less is more' comes into play here. So personally I don't use Relationships that often.

  6. Let's query the SCOM database!
    The 'Advanced Search' link can be very helpful. Use it and be surprised. It works much faster than the earlier mentioned Add Component button.

  7. I want to check out the settings of the Monitors created by DAD. Is there a quick way to do this?
    When altering settings of the Dependency Monitors (created per DA and per Component Group) it's better to open the Diagram View of the DA.
    image
    image
    Right click on any Component Group and open the Health Explorer. From there any Monitor can be opened and adjusted as required.
    image
    image

  8. How Health Rolls up
    By default, the Health Rollup Policies of the Dependency Monitors of the DA and Component Groups are set to 'Worst state of any member'. In many situations this is OK, but through overrides it can be set to other states as well.
    image
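    Suppose you are monitoring the redundancy of some Objects like services, four in total. A critical situation only arises when 3 or more services go down. Here 'Worst state of any member' is too strict, so the Health Rollup Policy must be changed. 'Best state of any member' keeps the rollup healthy as long as at least one member is healthy, so it only turns critical when all four services are down; when the threshold really is 3 out of 4, the percentage-based rollup policy is the closer fit. Adjust it as required > Apply > OK and be done with it.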
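
    To make the difference between the rollup policies tangible, here is a minimal sketch in Python. It is not SCOM code: the function names are made up for illustration, and the percentage policy is modelled under the assumption that SCOM looks at the healthiest given percentage of members and reports the worst state among those (the rounding up is an assumption as well).

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that a higher value means a worse state.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def worst_of_any(members):
    """'Worst state of any member': one bad member is enough."""
    return max(members)


def best_of_any(members):
    """'Best state of any member': one healthy member is enough."""
    return min(members)


def worst_of_percentage(members, percent):
    """Percentage-based policy (assumed semantics, see the text above).

    Takes the healthiest `percent` of the members, rounded up to whole
    members, and returns the worst state found among them.
    """
    ranked = sorted(members)  # healthiest members first
    count = max(1, -((-len(ranked) * percent) // 100))  # integer ceiling
    return max(ranked[:count])


# Four redundant services, three of them down.
three_down = [Health.HEALTHY, Health.CRITICAL, Health.CRITICAL, Health.CRITICAL]
# Four redundant services, two of them down.
two_down = [Health.HEALTHY, Health.HEALTHY, Health.CRITICAL, Health.CRITICAL]

print(worst_of_any(two_down).name)
print(best_of_any(three_down).name)
print(worst_of_percentage(two_down, 50).name)
print(worst_of_percentage(three_down, 50).name)
```

    In this model the default policy already turns critical with a single failed service, 'Best state of any member' stays healthy until all four are down, and a 50% setting flips to critical at the third failure. Always verify the actual behaviour in Health Explorer before relying on it.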

  9. When the 'flow' isn't working or how to redirect it...
    Sometimes you create a DA which doesn't work. Health doesn't roll up. When looking at the DA in Diagram View, the top level or some sublevels (the Component Groups) are in an unmonitored status. When you expand the Diagram further, many times you'll see that the underlying components are OK: they're monitored. But why doesn't the status roll up?
    image

    This is where Health Explorer comes into play. Simply because the Diagram View only displays the looks of the DA and Health Explorer shows you what is happening under water. As stated before, the Dependency Monitors (residing at DA and Component Group level and created by DAD) are targeted against the Parent Monitor 'Availability'. But sometimes the monitored Objects, like network interfaces of network devices monitored by Jalasoft Xian IO, don't do very much with that Parent Monitor. In this particular case all the Monitors are grouped under the Parent Monitor 'Performance'. So some additional tweaking is required in order to get the flow running again.
    image


    In this case the Dependency has to be changed (NOT the Parent Monitor) from 'Availability' to 'Performance'. Save your changes and within a few minutes all will be fine.
    image

    This trick can also be used when a particular Class is added to a DA of which the stakeholders tell you that some Performance Monitors are more important to them. Another approach would be to set the Dependency at ENTITY level. This way all four Parent Monitors (Availability, Configuration, Performance and Security) will be used.

    Another approach could be more granular. Suppose you're monitoring an Object which involves tons of Monitors. Chances are that not all of those Monitors are required in your DA. So it's better to select only that particular Monitor which is important to that DA. This can be done by changing the Dependency of the Monitor and selecting the single important Monitor.
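
    The reason why a Component Group stays unmonitored can be sketched in a few lines of Python. This is a simplified model and not SCOM code: the function name, the dictionary layout and the state names as strings are all made up for illustration. The idea is that the Dependency only looks at the Monitors below the level it points to, so a member with Monitors under 'Performance' only contributes nothing when the Dependency points to 'Availability'.

```python
# Severity order used to pick the worst state.
SEVERITY = {"Healthy": 0, "Warning": 1, "Critical": 2}

# The four Parent Monitors mentioned in the text.
AGGREGATES = ("Availability", "Configuration", "Performance", "Security")


def dependency_state(member_monitors, target):
    """Return the state a Dependency Monitor would see for one member.

    member_monitors maps a Parent Monitor name to the states of the
    Monitors grouped under it. target is one of AGGREGATES or 'Entity'.
    """
    if target == "Entity":
        # Entity level: every Parent Monitor counts.
        states = [s for name in AGGREGATES for s in member_monitors.get(name, [])]
    else:
        states = list(member_monitors.get(target, []))
    if not states:
        # Nothing below the chosen level, so nothing rolls up.
        return "Not monitored"
    return max(states, key=SEVERITY.__getitem__)


# A network interface whose Monitors all live under 'Performance'.
network_interface = {"Performance": ["Healthy", "Warning"]}

print(dependency_state(network_interface, "Availability"))
print(dependency_state(network_interface, "Performance"))
print(dependency_state(network_interface, "Entity"))
```

    With the default target the interface shows up as not monitored; pointing the Dependency to 'Performance' or to the Entity level makes the Warning visible.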

  10. Do You Want it ALL or just the Important Bits?
    Many times when building DAs one is tempted to use whole entities like Windows Computer, Network Devices, SQL Computer and the lot. Many times however the details are the most important things. So it's better to go to a more granular level like the disk space of a certain disk drive, a certain SQL DB or a certain Service/Process.

    Also because many organizations have carved their ICT into departments as well. One department is responsible for hardware, the other for the Virtual infrastructure and SAN, another for the databases and another department for a certain set of services/applications. So the DA has to reflect that as well.

    There is no point in creating a set of DAs for an application owner which will raise Alerts for the related Computer Objects as well, since Alerts like that are meant for another Department. This way a DA will die soon since it gives too many false-positives. The server itself might experience issues but those very same issues won't necessarily affect the service/application. Therefore, differentiate in your DAs in order to reflect the ICT departments and their related responsibilities. And keep in mind for whom you're building that particular set of DAs. The overall quality of SCOM is judged on the QUALITY of the Alerts, not the QUANTITY...

  11. Don't open Health Explorer of the DA itself
    Many times a single DA contains one or more Component Groups. Per Component Group one or more Objects are added. When the Health Explorer of that DA is opened, it will aggregate ALL Health Explorers of ALL objects present in that very same DA. This will bog down the overall performance of SCOM, so it's better NOT to open the Health Explorer of any DA. An alternative is to open the DA in Diagram View (which can already take some time depending on the size of the DA) and use the 'Problem Path' button. Now the Object with the problem will be shown. Select that Object and open Health Explorer on that level.

  12. Error 'The value does not fall into the expected range' when opening/closing DAs
    This error puzzled me for some time. But I think I know the two most plausible reasons behind this error:
    • DA is defective
      This is bad. Remove it and rebuild it. This doesn't happen often though: roughly 1 time out of a hundred this error is thrown. When it happens to you far more often, chances are you're experiencing issues with SCOM itself, like a database issue for instance.
    • DA was still loading and being enumerated in the background but the screen of the DA is closed
      Some DAs are big (especially top level DAs or big aggregate DAs), so SCOM is busy enumerating all the Objects within those DAs. Sometimes those enumerations time out or, when the screen is closed, are cut off midway, which generates the earlier mentioned error.

      How to differentiate between a defective DA and the enumeration issue? Easy. Just open the DA in Diagram View. When the above mentioned error is thrown while the Diagram View is opened, chances are the DA turned sour. When the DA opens neatly in the Diagram View, the DA itself is OK. Of course, give a DA time to land. So when it's built and saved, wait some time (like 10 to 20 minutes) before opening it in Diagram View.

  13. Error 'Verification failed with [x] errors' when removing a DA
    This error occurs when one tries to remove a DA which is put into another DA as a Component Group. So remove the DA from the other DA(s) and save the changes. Now the DA can be removed without any error message.

  14. I have a DA in place so now I can run reports against it, can’t I?
    Nice one! The answer is No and Yes. Why? First of all creating a DA doesn't create any Rules. None. So for any given DA no data collection for Reports takes place. So that's the No.

    The Yes comes into play when taking a look at the Objects which are part of that very same DA. Suppose you add one or more SQL databases to a DA. Against these very same databases, collection of data for Reports already takes place. So against Components like those one can run Reports and be successful at it. But technically speaking this is outside the scope of the DA and only works because that Object has some collection Rules running against it, all taking place outside the DA itself.

  15. Based on item 14: How to run Reports against certain DA Components?
    Open Reporting and click till you drop? Based on trial & error you should come a long way. But it takes too much time, effort AND frustration. Luckily there is a far better way: use the Diagram View of the DA in conjunction with the Targeted Reports option.
    image
    This way one can select an Object of any given DA and the Action Pane will show the related Reports. Another advantage of this approach is that SCOM will do a lot of work for you, like selecting the correct Classes for the Report. This step can be hard to grasp so it's good SCOM does the work for you :). Just try and be surprised.

  16. My Group is GONE?
    Sometimes it’s better to use Groups in order to populate Components in DAs. But when a Group is added to a DA the very same Group won’t show up anymore in the Groups view of the Authoring Wunder Bar in SCOM. This is by design so don’t start doubting yourself. When you want to edit the Group, you have to remove it from the DA, save the DA and within a few minutes the Group will be shown again in the Authoring Wunder Bar under Groups. Now you can edit it.

  17. Can I put components from other unsealed MPs into a DA which resides in another unsealed MP?
    Officially, No. You can’t because an unsealed MP can’t reference another unsealed MP. You’ll get this error message when you save the DA:
    image
    image
    And:
    image

    But unofficially, there is a workaround!
    Got this from the highly respected project team leader, Arthur Nieuwland. Really a good trick it is. Awesome! 

    Let’s go back to the first screen Create New Component Group. But instead of the most specific level (x64 Based Windows Server Operating System), which resides at the UNSEALED MP level (which is another unsealed MP than the unsealed MP where the DA resides), you select one level higher which resides in the core MP of SCOM, which is a SEALED MP.

    So now the unsealed MP containing the DA references a SEALED MP, which is allowed!

    Like this: 
    image

    Now the Object can be added and the DA saved. All is well now: 
    image
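
    The logic behind this workaround can be sketched in Python. This is only a model of the reasoning, not SCOM code: apart from the class name taken from the example above, every class name, MP name and function name below is a made-up placeholder, and the rule is reduced to "an unsealed MP may reference its own content or a sealed MP".

```python
# Hypothetical class hierarchy: class name -> (parent class, MP it lives in).
CLASSES = {
    "x64 Based Windows Server Operating System": ("Parent OS class", "Other unsealed MP"),
    "Parent OS class": (None, "Core sealed MP"),
}

# Hypothetical MPs and whether they are sealed.
SEALED = {
    "Core sealed MP": True,
    "Other unsealed MP": False,
    "DA unsealed MP": False,
}


def can_reference(source_mp, target_mp):
    """An MP can use its own content or anything in a sealed MP."""
    return source_mp == target_mp or SEALED[target_mp]


def nearest_usable_class(class_name, da_mp):
    """Walk up the hierarchy until a class is found that the DA's MP may reference."""
    current = class_name
    while current is not None:
        parent, mp = CLASSES[current]
        if can_reference(da_mp, mp):
            return current
        current = parent
    return None  # no usable class anywhere in the hierarchy


print(can_reference("DA unsealed MP", "Other unsealed MP"))
print(nearest_usable_class("x64 Based Windows Server Operating System", "DA unsealed MP"))
```

    The first call fails the check, which matches the error on save; the second call ends up one level higher, at the class which resides in the sealed MP.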

As you can see, there is a lot more to DAs than one would first expect. When one follows the guidelines described in this series of postings, DAs can become a great asset to your SCOM environment, thus enabling SCOM to deliver more bang for the buck. Enjoy creating DAs but keep in mind that good planning is required in order to get it right. Happy SCOMming!

4 comments:

Daniel said...

Great Article Marnix!!

Thinking about item 8, "how health roll up". Maybe my english is a bit poor, but i don´t understand the text "Worst State of the specified percentage of members in good health state".
AFAIK it means that if you have three objects and two are failed:
- if you select 70% it will be healthy
- And if you select 60% it will be in critical state.

Thanks in advance.

Howard said...

Awesome Post.Wish I knew about it when I did it last year. TO get around the groups disappearing out the console you can create the as Sub Groups:)

Dennis de Jager said...

At 16. My Group is GONE, I have found a working solution for that.
For example I Made a 2 groups called SCOM App Server and SCOM DB Server to use in my DA and both disappeared.

I went on and came to the point where I had so many groups hidden in the Distributed Applications that managing them became a hastle, so I decided to put them in a group call SCOM as a subgroup.

They don't disappear anymore since they are subgroups! You can still manage members, exclusions and so on.

So maybe you can update 16 with this work around.

Dennis de Jager said...

Hmmmm I should have read Howard's comment first!