Saturday, September 25, 2010

The Curious Case of The Empty Reports

Had an interesting (and not so nice) issue at a customer’s site: a SCOM R2 environment had been in place for some months and was fully operational. SCOM R2 Reporting was in place as well, and many customized Reports had been built for it. Many servers were being monitored by SCOM R2 and the Reports were just fine.

The Case
But after a couple of months I was told that the Reports, which had shown data before, now turned up empty. Only a header was shown, but NO data!

At first I suspected that the wrong targeting was being used, or that an object had been selected which was new in SCOM so there wasn’t much data to show yet. But it soon turned out that this wasn’t the case.

Time to Investigate

EVERY Report ran just fine, but showed NO data! The RMS and MS servers showed no errors whatsoever about the DW. The Events stated that all SCOM R2 Management Servers were successfully storing data in the DW! SCOM R2 itself (in the Console, that is) showed no errors whatsoever either. The graphs under Data Warehouse showed that data was indeed being put into the DW.

So time to check out the SQL side of it all. The SSRS instance worked just fine. The ReportServer loaded neatly in IE. So no errors there. When SQL Profiler was started it was clear that data was pouring into the DW. The Reports contained in the MPs had been uploaded to SSRS as well.

The logs on the SQL server revealed NOTHING either! The DW db itself was healthy as well. I could connect to it, query it all the way down and check out its tables, and it had more than 95% free space. That was to be expected, because I prefer to oversize the DW just a little bit so the customer can use that DB to its fullest extent without having to resize it first.

Case is Clear
Now I had the case where data was pouring into the DW and the Reports themselves were in place AND functional, but somehow the data (present in the DW, I knew that for sure) wasn’t shown.

Time to put the finger on the problem

So time for a deeper dive. I ran some easy Reports showing the availability of a server, based on the HealthService Watcher. I targeted them against the RMS, so I knew for sure I had targeted a server which had been present ever since SCOM R2 was installed. Then I ran the Report multiple times, each time selecting a timeframe going further back in time.

And suddenly the Report contained data! So now it was important to find the exact point in time after which no more data was shown. It took me quite a few Report runs, but finally I had pinpointed the date where the data stopped.
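Narrowing down the date like this is essentially a binary search over days. Below is a minimal Python sketch of that idea, not the procedure I actually followed. It assumes a hypothetical `has_data(day)` check that you would have to supply yourself (for example by running the Report for that single day and seeing whether anything comes back; nothing like it ships with SCOM), and it assumes that data exists for every day up to one cutoff and for none after it.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def last_day_with_data(first: date, last: date,
                       has_data: Callable[[date], bool]) -> Optional[date]:
    """Return the latest day in [first, last] for which has_data(day) is True.

    Assumes has_data is True for every day up to a single cutoff and False
    for every day after it. Returns None when even `first` has no data.
    `has_data` is a hypothetical, user-supplied check.
    """
    if not has_data(first):
        return None

    lo, hi = 0, (last - first).days  # offsets in days, counted from `first`
    # Invariant: the day at offset `lo` is known to have data.
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        if has_data(first + timedelta(days=mid)):
            lo = mid       # data still present, cutoff is at mid or later
        else:
            hi = mid - 1   # no data, cutoff is before mid
    return first + timedelta(days=lo)
```

With roughly a year of history this takes around ten checks instead of one per day, which matters when every check means running a Report by hand.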

When did it go wrong?
Time to note that date and check out SCOM R2 itself, in particular to see whether any new MPs had been imported on that same date. There weren’t, but a day earlier some MPs had been imported: the SCVMM MP, an adjusted MP which enables the discovery of VMs based on VMware (which the customer uses as its virtualization platform), and an MP containing the overrides for the Exchange 2010 MP.

Making the Reports work again
So these are the steps I took in order to get it working again:

  1. Removed the customized MP which depends on the SCVMM MP;
  2. Removed the SCVMM MP;
  3. Checked the OpsMgr event log of the RMS. It now threw an error every 10 seconds about writing data to the DW. It stated that a dataset coming from the Exchange 2010 MP caused some serious issues;
  4. Removed the Ex2010 MPs altogether, including the MP containing the overrides. Exported that MP first of course;
  5. Bummer! The OpsMgr event log of the RMS still threw the same error;
  6. Made a backup of the DW db and verified that the backup was valid;
  7. Closed ALL SCOM R2 Consoles;
  8. Stopped all SCOM R2 related services on the RMS and MS servers;
  9. Ran a special SP (stored procedure) against the DW in order to remove the dataset (ONLY DO THIS WHEN YOU KNOW WHAT YOU ARE DOING, OTHERWISE YOU END UP WITH AN UNSUPPORTED CONFIGURATION!!!!);
  10. The SP ran for ten minutes and completed successfully;
  11. Removed the cache files of ALL Consoles, RMS and MS servers;
  12. Cleared the OpsMgr logs on RMS and MS servers;
  13. Rebooted the SQL, RMS and MS servers;
  14. Checked out the OpsMgr logs of RMS and MS servers;
  15. No more errors to be seen, dataset error is gone as well;
  16. Checked out the SQL server, which showed rather high CPU, RAM and disk usage. There was a lot of IO going on, but the server is a beast so no worries there;
  17. Ran the same Report and now, hour by hour, more data was coming in. Every 15 minutes another day was added to the Report!
  18. Kept an eye on SCOM R2 itself and the OpsMgr event log. Nothing wrong there. SQL was pumping data like crazy!
  19. Kept on running the Report every ten minutes and every time more data was shown, more days were coming in;
  20. Went outside for a walk in order to breathe some fresh air.

Recap
I have imported the Exchange 2010 MP in many environments already without running into this issue, so the Exchange MP is not to blame. The same goes for the SCVMM MP. That MP isn’t a beauty, I must add, but I never experienced this issue with it before either. The adjusted MP for discovering non-Hyper-V based VMs isn’t shocking either.

But somehow, somewhere, things turned sour. This resulted in data being collected into the DW but never being aggregated, so the Reports stayed empty. After I had removed the MPs which I suspected of causing this issue, the blockage inside the DW was gone as well, and the Beast (the SQL Server) started to aggregate the data. And the Reports started to show data again.

Next week I will import the Exchange MP and its overrides MP again. A day or so later the MP containing the Exchange 2010 Reports will follow. And of course, I will keep a watchful eye on it all now. But personally I believe all will be just fine and that I bumped into a rare situation. Which is good, because I wasn’t happy when all these Reports were EMPTY…

1 comment:

Unknown said...

Hello,
I have a problem with my SCOM Reports. When I run the SCVMM 2012 Virtual Machine Utilization Report in Operations Manager, it shows an average memory usage of 100%, which is completely different from my actual memory usage. My actual memory usage is far lower than that.
Can anyone tell me what the problem is?