Tuesday, June 11, 2013

High Level Overview: NetApp SAN Monitoring With DATA ONTAP MP

Update 06-12-2013:
Today Cameron Fuller posted a blog article all about tuning this MP. Awesome! For more information about tuning this MP once everything is in place, see his article. Thanks Cameron!

This posting gives a high-level overview of the steps required to monitor a NetApp SAN with the Data ONTAP MP, titled OnCommand PlugIn by NetApp. This overview is based on version 3.2 of the OnCommand PlugIn.

For the full installation manual, use the PDF files supplied by NetApp. These manuals are included in the downloadable executable (OnCommand-PlugIn-Microsoft_3.2_x64_NetApp.exe).

Dependencies
This MP has several dependencies. Unless they are all in place AND properly configured, the OnCommand PlugIn won’t work, so make sure everything is accounted for.

  1. PowerShell version 3.0 has to be installed on ALL OM12 Management Servers;
  2. NetApp OnCommand PlugIn has to be installed on ALL OM12 Management Servers;
  3. SNMP on ALL NetApp Filers must be enabled and configured;
  4. ALL NetApp Filers must be present in OM12 as network devices (so run a Discovery);
  5. The OM12 Action account requires permissions on the NetApp Filers;
  6. A SQL Server to host the database the OnCommand PlugIn uses. The SQL Server hosting the OpsMgr SQL database will do the trick.

Dependencies 1, 3, 4 and 5 must be in place before you start installing the OnCommand PlugIn. Dependency 2 is taken care of when installing and configuring the OnCommand PlugIn.
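The rule above (1, 3, 4 and 5 before installing, 2 handled by the installer) is easy to lose track of across several Management Servers and Filers. Below is a minimal Python sketch of a pre-install checklist, assuming you fill in each item's status yourself. The `Dependency` class and the `missing_prerequisites` function are hypothetical helpers for illustration, not part of the OnCommand PlugIn:

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    number: int        # numbering follows the Dependencies list in this post
    description: str
    satisfied: bool    # hypothetical: filled in by hand (or your own checks) per environment


# Per the post, dependencies 1, 3, 4 and 5 must be in place before installing.
# Dependency 2 is the PlugIn itself; dependency 6 is not in the post's pre-install list.
REQUIRED_BEFORE_INSTALL = {1, 3, 4, 5}


def missing_prerequisites(deps: list[Dependency]) -> list[Dependency]:
    """Return the pre-install dependencies that are not satisfied yet."""
    return [d for d in deps if d.number in REQUIRED_BEFORE_INSTALL and not d.satisfied]


if __name__ == "__main__":
    checklist = [
        Dependency(1, "PowerShell 3.0 on ALL OM12 Management Servers", True),
        Dependency(2, "OnCommand PlugIn on ALL OM12 Management Servers", False),
        Dependency(3, "SNMP enabled and configured on ALL NetApp Filers", True),
        Dependency(4, "ALL NetApp Filers present in OM12 as network devices", False),
        Dependency(5, "Action account has permissions on the NetApp Filers", True),
        Dependency(6, "SQL Server available for the PlugIn database", True),
    ]
    blockers = missing_prerequisites(checklist)
    for d in blockers:
        print(f"Not ready: dependency {d.number}: {d.description}")
    if not blockers:
        print("All pre-install dependencies are in place.")
```

With the sample statuses above only dependency 4 is reported; dependency 2 is skipped because the installer takes care of it.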

Installation & configuration
The installation starts really simply: install the OnCommand PlugIn on ALL OM12 Management Servers. Make sure PowerShell 3.0 is installed and operational before you start; otherwise the installation will fail.

  1. Installing the OnCommand PlugIn on all OM12 Management Servers
    1. Run the file OnCommand-PlugIn-Microsoft_3.2_x64_NetApp.exe with elevated permissions;
    2. Follow the wizard and select the required components, e.g.: SCOM Management Packs, Storage Monitoring, SCOM Console Integration, Cmdlets, Documentation and OnCommand Discovery Agent;
    3. If you run SCORCH and/or Hyper-V, you can also select the components related to those technologies;
    4. The account you specify requires local admin permissions on the OM12 Management Servers. Often the OM12 Action account works best;
    5. As of version 3.2 this MP also uses a SQL database. Using the same SQL Server that hosts the OpsMgr database works fine for me.

  2. Configuring the NetApp MP
    Make sure all NetApp Filers are already discovered and monitored in OM12 as network devices.
    1. During the installation of the OnCommand PlugIn on the OM12 Management Servers, two NetApp MPs are imported: OnCommand Data ONTAP and OnCommand Data ONTAP Reports;
    2. If the OM12 Console was open while you installed the OnCommand PlugIn, close it and open it again;
    3. Create an MP to store the overrides for the NetApp MPs;
    4. Go to Monitoring > Data ONTAP > Storage Systems > Management Server. Select one of the listed OM12 Management Servers and, in the right-hand pane of the OM12 Console under the header Health Service Tasks, click Data ONTAP: Add Controller;
    5. Add all NetApp controllers, one by one;
    6. Go to Monitoring > Data ONTAP > Storage Systems > Controllers. Select one of the listed Controllers and, in the right-hand pane of the OM12 Console under the header Data ONTAP Controller Tasks, select Data ONTAP Manage Controller Credentials;
    7. Add the required credentials for each Controller. Best practice here is to use an AD-based account. If SSL isn’t required, clear that option. Because of a bug, clearing the SSL requirement may raise an error; simply click Continue and carry on.

  3. Discovering the NetApp components
    Now all NetApp components need to be discovered; otherwise there is no monitoring :)
    1. Search for the Rule Data ONTAP: Discovery Rule. Use this shortcut: go to Tools (top menu bar of the OM12 Console) > Search > Rules. It saves you a lot of time;
    2. By default this Rule is disabled. Enable it through an override and store that override in the MP created in Step 2.3;
    3. Now the Discovery has to be started. Go to Monitoring > Data ONTAP > Storage Systems > Management Server. Select one of the listed Controllers and, in the right-hand pane of the OM12 Console under the header Data ONTAP Controller Tasks, select Data ONTAP: Run Discovery Task;
    4. If the OM12 Action account is authorized to access the NetApp SAN, you don’t need to enter credentials to run this task;
    5. After an hour or so all NetApp components are discovered, and a bit later they will show a monitored state.

  4. Required: Tuning!
    This MP is really good and much appreciated by many of my customers. However, many of the Monitors in this MP are set to zero, so they need careful tuning to get the best out of this MP.

    Other Monitors use inappropriate thresholds. This isn’t a bug but is done on purpose, forcing you to tune them to your environment. Once that is done, this MP really delivers added value.
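To make the tuning idea concrete, here is a toy Python model of a percentage-threshold Monitor whose shipped defaults are replaced by values stored in your own override MP (Step 2.3). The monitor name is taken from the comments below; the threshold numbers, `effective_thresholds` and `evaluate` are hypothetical and reflect neither the MP's real defaults nor how OpsMgr applies overrides internally:

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


MONITOR = "Data ONTAP: Volume Space Utilization (%)"

# Hypothetical placeholder defaults, NOT the values NetApp actually ships.
DEFAULT_THRESHOLDS: dict[str, dict[str, float]] = {
    MONITOR: {"warning": 50.0, "critical": 60.0},
}

# Hypothetical overrides, as you would store them in your own override MP.
OVERRIDES: dict[str, dict[str, float]] = {
    MONITOR: {"warning": 80.0, "critical": 90.0},
}


def effective_thresholds(monitor: str) -> dict[str, float]:
    """Start from the shipped defaults and let any override replace them, key by key."""
    merged = dict(DEFAULT_THRESHOLDS[monitor])
    merged.update(OVERRIDES.get(monitor, {}))
    return merged


def evaluate(monitor: str, value: float) -> Health:
    """Map a utilization percentage to a health state using the effective thresholds."""
    thresholds = effective_thresholds(monitor)
    if value >= thresholds["critical"]:
        return Health.CRITICAL
    if value >= thresholds["warning"]:
        return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    for pct in (40.0, 85.0, 95.0):
        print(pct, evaluate(MONITOR, pct).value)
```

With only the placeholder defaults, a volume at 85% would be Critical; with the override applied it is a Warning and only turns Critical at 90%.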

Compliments!
Compliments to NetApp for delivering such a good MP. When properly tuned (like any other MP), this MP really delivers added value for any organization running one or more NetApp SANs and OM12. I have seen many third-party MPs, but few are of this level. A job well done, NetApp!

10 comments:

SCOM Artist said...

Hello. First of all, thanks for this post; it is a great overview of configuring the MP.

I recently installed version 4.0 and I'm slightly confused by the wording of the documentation.

I installed the MP, added my controllers, set up credentials, and completed the "Run Discovery Task" that is located inside the SCOM console. I also set up the SNMP alert destinations on the NetApp. The documentation mentions that discovery occurs by default at 4-hour intervals. So do I still need to enable the Rules for discovery?

Unknown said...

Just a word of caution about this MP/Plug-In: we’ve found that some of the rules that run/sync each day actually close all open alerts (e.g. Data ONTAP: Volume Space Utilization (%) monitoring), and then after a few minutes all of the closed alerts re-open. This can cause two alert storms: 1. for closed alerts, 2. for new alerts. NetApp tech support states that this is caused by the re-sync rule, the one that syncs with the NetApp controllers every so often (e.g. every night): it basically clears all such open alerts and then reopens them when it discovers threshold breaches. According to NetApp tech support, this is by design. I believe it is a flaw, but there is no word on whether they will fix or enhance this so that open alerts stay open and the repeat count gets incremented.
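The difference between the behavior described above and what the commenter would prefer can be shown with a toy Python model. It is not NetApp's or SCOM's actual alert logic; `Alert`, `resync_close_and_reopen` and `resync_keep_open` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass

Event = tuple[str, str]  # (action, alert key), e.g. ("closed", "vol1")


@dataclass
class Alert:
    key: str               # hypothetical: one alert per monitored volume
    open: bool = True
    repeat_count: int = 0


def resync_close_and_reopen(
    alerts: dict[str, Alert], breaches: set[str]
) -> tuple[dict[str, Alert], list[Event]]:
    """Behavior described in the comment: close every open alert, then raise new ones."""
    events: list[Event] = []
    for alert in alerts.values():
        if alert.open:
            alert.open = False
            events.append(("closed", alert.key))
    fresh = {key: Alert(key) for key in breaches}
    events.extend(("new", key) for key in fresh)
    return fresh, events


def resync_keep_open(
    alerts: dict[str, Alert], breaches: set[str]
) -> tuple[dict[str, Alert], list[Event]]:
    """Preferred behavior: keep still-valid alerts open and bump their repeat count."""
    events: list[Event] = []
    for key in breaches:
        existing = alerts.get(key)
        if existing is not None and existing.open:
            existing.repeat_count += 1
        else:
            alerts[key] = Alert(key)
            events.append(("new", key))
    # Close only the alerts whose condition has actually cleared.
    for key, alert in alerts.items():
        if alert.open and key not in breaches:
            alert.open = False
            events.append(("closed", key))
    return alerts, events


if __name__ == "__main__":
    volumes = {"vol1", "vol2", "vol3"}
    _, storm = resync_close_and_reopen({v: Alert(v) for v in volumes}, volumes)
    _, quiet = resync_keep_open({v: Alert(v) for v in volumes}, volumes)
    print(len(storm), len(quiet))  # 6 0
```

With three volumes still over their threshold at resync time, the first function emits six events (three closed, three new), while the second emits none and just increments each alert's repeat count.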

TommyGun said...

So... I don't understand how monitoring actually works. We have 2 dedicated Management Servers for DATA ONTAP, so we enabled the Discovery Rule for those two (although the manual states that you should enable this rule for one Management Server only). Anyhow, when adding the controllers, you can select any Management Server you like, even one that is not running the DATA ONTAP software. Afterwards, there is no way of seeing which Management Server is doing the actual work, and there is also no way of 'moving' controllers between Management Servers (since there is no Resource Pool). I can see the OnCommand event log being created and it is indeed working, but how can I determine whether it is using just one or more Management Servers, and whether those are the correct (dedicated) ones?
Thanks!

Marnix Wolf said...

Hi TommyGun.

This eludes me as well. To be frank, it's been some time since I worked with this MP, so I am not able to answer your good question. I guess it's better to contact NetApp support. Hopefully they can explain the deeper workings of this MP to you.

Cheers,
Marnix

TommyGun said...

Thanks Marnix, I will let you know the outcome.

Marnix Wolf said...

Thanks man.

TommyGun said...

Well, here is the very useful comment from NetApp...


Microsoft would still have to indicate how their SCOM software functions to determine which Management Server is doing the work.
That type of functionality is outside of our Management Pack.

Marnix Wolf said...

Hi Tommy Gun.

Thanks for the feedback. As you may know, this response from NetApp is b%$sh^#t, simply because they could use a custom Resource Pool as the source. My guess is that NetApp lacks the resources/budget/interest to make this happen and serve its customer base properly. Hence the b%$sh^#t response you got.

Cheers,
Marnix

TommyGun said...

Hi Marnix,
Yes, I totally agree with you. There is a resource pool for the “Clustered Data ONTAP” Management Pack, but there is none for “OnCommand Data ONTAP”, hence my question, to which they replied as posted above.

This is what I mailed back:
Well, I doubt Microsoft can say anything about a third-party Management Pack, and I would expect NetApp to have full knowledge of how its own software works. Is there no way at all of forwarding my question internally within NetApp?

Which led to:
It’s not a situation where we haven’t obtained the information internally or contacted the right Developer to answer the question.
We have been asked before about the load balancing of Management Servers and determined via previous cases that this functionality is within SCOM, not our Management Pack.
Asking Microsoft how they determine “which Management Server is doing the actual work” is not a question about “a third party Management Pack”; it’s asking Microsoft how SCOM works.


So I will try to reverse-engineer the Management Pack and hope to come up with some answers myself.

Marnix Wolf said...

Sigh. Somehow, somewhere at NetApp one or more people are stuck in the infamous 'the computer says no' loop. No way to get past that. Too bad.