Wednesday, June 1, 2011

Logical Disk Fragmentation Level Monitor: Creating Noise outside SCOM…

The Monitor ‘Logical Disk Fragmentation Level’, introduced in the latest versions of the Server OS MP, creates a lot of ‘noise’, and that noise isn’t limited to the Console.

I know, the Monitor runs only once a week (every Saturday at 3 AM). But when there is no policy or agreement in place about fragmentation, SCOM will raise anywhere from a few to many Alerts in the Console. And that’s only the starting point of the noise: a discussion will soon follow about whether or not to defrag the logical disks on which SCOM raised these Alerts, and that discussion will generate more noise than the Alerts themselves.

But why? In order to get to the bottom of this let’s take a few steps back.

Past, Present & Future
In the past there was a world where server virtualization was something you had heard of or even tested, but didn’t use at all, or not on any reasonable scale, in production environments. Much has changed. Today all of the customers I work for run one or more virtualized infrastructures. Whenever a server is provisioned, it’s a virtual one. Sometimes a physical box is required and provisioned, but it isn’t the standard anymore.

In the old world, disk fragmentation was also at play. And, under most circumstances, it was easily addressed. Just run a defrag tool and be done with it. Of course, in some SAN configurations other approaches were required, but in 80% of the cases a local tool would do the trick.

Along came a virtualized server
But how to go about defragmentation on virtualized servers? There are many new ‘ingredients’ to reckon with when SCOM raises an Alert about a ‘logical disk’ in a VM which needs to be defragmented. Because the logical disk is also virtual, hosted by the Virtual Infrastructure. So where does one need to run the defragmentation tool/command? On the VM level, no matter what? On the host level? (But what host is running the VM at that moment?) On SAN level? And when we’re talking SAN, are we talking about LUNs? Or the ‘whole’ SAN? Also, what kind of disk are we talking about? A fixed or dynamic one? Are they present on a cluster shared volume or not? Are we talking Hyper-V or VMware? Are there snapshots of that VM present or not?

The discussion has started
As you can see, the decision whether to defrag or not isn’t a simple one. Even when it’s decided to do so, at what level and where do you run it? In today’s world it isn’t up to the systems engineers alone anymore. More knowledge about the Virtual Infrastructure, storage and SAN is required. So the colleagues responsible for those areas must be brought into the loop as well.

And the winner is…
Unfortunately there isn’t a default answer available. Simply because there are too many variables at play which not only differ per company but also per environment (Production vs Test for instance). And when looking for an answer on the internet there are good and solid postings to be found which contradict each other…

So what to do?
Whenever I bump into a situation like this – believe me I have been there a few times – I let the company decide and provide them these basic guidelines:

  • Bring the colleagues responsible for the VI and storage/SAN into the loop from the beginning, when nobody has a fixed opinion yet and all options are open (this way they know their opinion really matters);
  • Set a time span for an outcome, like two to three days (otherwise nothing will happen);
  • During that time, disable the Monitor temporarily (reduces the noise in SCOM);
  • Write down the pros and cons of running the defrag tool/command at each level (VM, host, SAN etc);
  • Write down the pitfalls (for example: defragging causes high IO which is bad for other VMs, when a snapshot is in place defragging is a no go area…);
  • Write down the decision and the reasons why;
  • Write a test plan, targeted at a limited set of servers, test it and monitor the impact on the VI, Hosts, VMs and SAN;
  • Process the results, adjust if needed – test it again! – and when all is well, put it into production in batches;
  • Adjust the monitor as required.

It may sound stupid to go through such a scenario but it’s the only way to get it right. Otherwise you might find yourself in a situation where your VI and SAN are hammered down by defragmentation actions…
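
As a rough illustration of the pitfall check and the batched rollout from the guidelines above, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the `Disk` record, its fields, the server names and the batch size are hypothetical, and the only rule it encodes is the snapshot pitfall mentioned earlier. Your own decision document will most likely contain more rules (disk type, hypervisor, SAN behaviour and so on).

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Disk:
    """Hypothetical inventory record; none of this comes from SCOM itself."""
    server: str
    drive: str
    is_virtual: bool
    has_snapshot: bool


def defrag_candidates(disks: List[Disk]) -> List[Disk]:
    """Keep only disks that do not hit the snapshot pitfall.

    Assumed rule: a virtual disk with a snapshot in place is a no-go.
    Add your own agreed rules here.
    """
    return [d for d in disks if not (d.is_virtual and d.has_snapshot)]


def batches(items: List[Disk], size: int) -> Iterator[List[Disk]]:
    """Yield the candidates in fixed-size batches for a staged rollout."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Feeding the first batch into the test plan, monitoring the impact, and only then moving on to the next batch keeps the IO load on the VI and SAN under control.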

Some helpful internet URLs in order to create more uncertainty about whether or not to defragment a logical disk of a VM:


Another piece of advice:
The Monitor contains a Recovery Task for automatically defragmenting the logical disk. This Task is disabled by default. Don’t enable it, since you would lose control: SCOM will kick off the defrag command without any clue about the earlier mentioned questions and their answers. So stay away from it and stay in charge of your own environment.


Vincent said...

It's also good to take the inner workings of your SAN into account (you already included the SAN person in step 1, but (s)he isn't always available, so I thought I'd post it).

Most SANs write their data randomly in their own file system (NetApp, Compellent and more). When you host your disks on these kinds of SANs, defragmenting is useless, since the only thing you will clean up is your FAT. And the performance impact of that is minimal at best.

Marnix Wolf said...

Hi Vincent.

Thanks for your comment. Much appreciated.


Michael Wilson said...

So, any ideas how to enable the defrag recovery for only physical disks? Does a class exist for this, or would we be required to create a custom class, and if so how?

Marnix Wolf said...

Hi Michael.

No, I don't know how to create such a particular class since I am not an MP author.