Thursday, January 5, 2012

SCOM R2 doesn’t work: SDK Service isn’t initialized. However, SDK is up & running…

Issue
Bumped into this situation: SCOM R2 didn’t run anymore. When people tried to start the Console, a message was shown about the SDK Service not running. But when they checked the status of this service on the RMS, all seemed to be fine: all three SCOM R2 services were in a running state. However, the OpsMgr event log told a different story: EventID 33333 all over. This event tells that SCOM R2 isn’t able to store data in the database. So time for some investigation.

Cause
When one bumps into a situation like this, it’s time to investigate the SQL server which hosts the SCOM R2 databases, since that is where the cause is to be found. Possible causes are:

  1. SQL Engine not running;
  2. SCOM database is corrupted (you don’t want to go there…);
  3. SCOM database is running in single-user mode;
  4. SCOM database is full;
  5. SCOM database log file is full.

In this case option 5 was at play: the log file was full. That is strange, since by default the Recovery Model for the SCOM R2 databases is set to Simple, so the log file stays small and nothing serious happens to it. But now the log file was totally used (0% space left). And indeed, the Recovery Model for the OpsMgr database AND the OpsMgrDW database had been changed to FULL…
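For those who prefer a quick scripted check over clicking through the GUI, here is a minimal sketch of how the causes listed above could be told apart. The two T-SQL queries use standard SQL Server features (the sys.databases view and DBCC SQLPERF(LOGSPACE)); the function name suspected_causes, the dict layout and the 95% threshold are assumptions made for this illustration, not something from the original investigation.

```python
# T-SQL the checks could be based on; run them in a query window on the
# instance that hosts the SCOM databases.
STATE_QUERY = (
    "SELECT name, state_desc, user_access_desc, recovery_model_desc "
    "FROM sys.databases;"
)
LOG_SPACE_QUERY = "DBCC SQLPERF(LOGSPACE);"


def suspected_causes(row, log_used_pct, threshold=95.0):
    """Map one sys.databases row plus its log usage to likely causes.

    `row` is a dict keyed by the column names from STATE_QUERY.
    `log_used_pct` is the "Log Space Used (%)" value from LOG_SPACE_QUERY.
    The 95% threshold is an arbitrary value chosen for this sketch.
    """
    causes = []
    if row["state_desc"] != "ONLINE":
        causes.append("database not online (state: %s)" % row["state_desc"])
    if row["user_access_desc"] == "SINGLE_USER":
        causes.append("database in single-user mode")
    if log_used_pct >= threshold:
        causes.append("log file (nearly) full")
    if row["recovery_model_desc"] != "SIMPLE":
        causes.append(
            "recovery model is %s, not SIMPLE" % row["recovery_model_desc"]
        )
    return causes


# Values as described in this post: database online, log 100% used, FULL model.
example = {
    "name": "OpsMgr",
    "state_desc": "ONLINE",
    "user_access_desc": "MULTI_USER",
    "recovery_model_desc": "FULL",
}
print(suspected_causes(example, 100.0))
```

A full data file or a corrupted database doesn’t show up in these two queries, so those two causes still need a separate look.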

Time for some actions.

Solution
In this case I stopped all SCOM R2 services on the RMS and the Health Service on the MS servers. Since SQL wasn’t happy either with the filled log files for the SCOM R2 databases and it’s a dedicated SQL server, I restarted the SQL engine so everything was fresh.

First thing I did was giving more space to the log files of both SCOM R2 databases, just a couple of GBs per log file, but just enough to do the trick:

  1. Open SQL Server Management Studio, log on to the correct instance, select the SCOM R2 database, right click it and select Properties;
  2. Go to the second page on the left (Files) and adjust the file size of the log file by incrementing it with a couple of GBs;
  3. Click OK and the change will be applied right away;
  4. Repeat this action for the other SCOM database as well.
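The same change can also be made with a T-SQL statement instead of the Properties dialog. Below is a minimal sketch that only builds the statements, so they can be reviewed before being run in a query window. The logical log file names (OpsMgr_log, OpsMgrDW_log) and the 4096 MB size are placeholders for this illustration; check the real logical names on the Files page first.

```python
def grow_log_statement(database, logical_log_name, new_size_mb):
    """Build the T-SQL that sets a log file to a new, larger size.

    SQL Server requires the new SIZE to be larger than the current size.
    """
    if new_size_mb <= 0:
        raise ValueError("new_size_mb must be positive")
    db = database.replace("]", "]]")            # escape for [bracket] quoting
    name = logical_log_name.replace("'", "''")  # escape for the N'...' literal
    return "ALTER DATABASE [{0}] MODIFY FILE (NAME = N'{1}', SIZE = {2}MB);".format(
        db, name, int(new_size_mb)
    )


# Hypothetical logical file names and target size; adjust to the environment.
for database, log_name in [("OpsMgr", "OpsMgr_log"), ("OpsMgrDW", "OpsMgrDW_log")]:
    print(grow_log_statement(database, log_name, 4096))
```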

Now the log files have some space again so they can ‘breathe’. Now it’s time for the second action: backing up the SCOM R2 databases. Better to be safe than sorry :). When you don’t add space to the log files, you’ll get an error when trying to back up the SCOM R2 databases:
[image: screenshot of the backup error]

In SQL Server Management Studio:

  1. Right click the SCOM R2 database, select Tasks and the option Back Up;
  2. Back up the database to a file location and, on the Options page under the header Reliability, select the option Verify backup when finished;
  3. Let the backup job run by clicking OK and wait until it’s finished;
  4. Repeat this action for the other SCOM database as well.
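A scripted equivalent of these steps could look like the sketch below: a BACKUP DATABASE statement followed by RESTORE VERIFYONLY, which is the T-SQL counterpart of the Verify backup when finished option. The backup paths are hypothetical, and the sketch assumes each backup goes to a new file, so the verify step checks the backup that was just written.

```python
def backup_statements(database, backup_path):
    """Build T-SQL for a full backup to disk plus a verification of that file."""
    db = database.replace("]", "]]")       # escape for [bracket] quoting
    path = backup_path.replace("'", "''")  # escape for the N'...' literal
    return [
        "BACKUP DATABASE [{0}] TO DISK = N'{1}';".format(db, path),
        "RESTORE VERIFYONLY FROM DISK = N'{0}';".format(path),
    ]


# Hypothetical backup locations; pick a disk with enough free space.
for database, path in [
    ("OpsMgr", r"D:\Backup\OpsMgr.bak"),
    ("OpsMgrDW", r"D:\Backup\OpsMgrDW.bak"),
]:
    for statement in backup_statements(database, path):
        print(statement)
```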

Now we have valid backups of both SCOM R2 databases, so there is a way back when things turn sour. Until now I haven’t seen this happening but better be safe than sorry :).

Since SCOM R2 wasn’t installed in such a manner as to facilitate the Full Recovery Model for both SCOM R2 databases (neither enough disk space nor enough I/O power for a smooth running SCOM R2 environment), I decided to change the Recovery Model back to Simple for both SCOM R2 databases:

In SQL Server Management Studio:

  1. Select the SCOM R2 database, right click it and select Properties;
  2. Go to the fourth page on the left (Options) and adjust the Recovery Model of the database to Simple;
  3. Click OK. Now the Recovery Model will be set back to Simple;
  4. Repeat this action for the other SCOM database as well.
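In T-SQL this change is a single ALTER DATABASE … SET RECOVERY statement per database. The sketch below builds that statement together with a query on sys.databases to confirm the result. The helper name recovery_statements is made up for this illustration, and the database names are the ones used in this post.

```python
ALLOWED_MODELS = ("SIMPLE", "FULL", "BULK_LOGGED")


def recovery_statements(database, model="SIMPLE"):
    """Build T-SQL to set a database's recovery model and to check it afterwards."""
    model = model.upper()
    if model not in ALLOWED_MODELS:
        raise ValueError("unknown recovery model: %s" % model)
    bracketed = database.replace("]", "]]")  # escape for [bracket] quoting
    literal = database.replace("'", "''")    # escape for the N'...' literal
    return [
        "ALTER DATABASE [{0}] SET RECOVERY {1};".format(bracketed, model),
        "SELECT name, recovery_model_desc FROM sys.databases "
        "WHERE name = N'{0}';".format(literal),
    ]


for database in ("OpsMgr", "OpsMgrDW"):
    for statement in recovery_statements(database):
        print(statement)
```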

Now the log files of both SCOM R2 databases are still huge, but they’re almost empty. Perfect time to shrink the log files! Before that it’s better to run a backup of the databases again, both to be safe and to make sure the shrink actions will land properly.

In SQL Server Management Studio:

  1. Select the SCOM R2 database, right click it and select Tasks > Shrink > Files;
  2. For File type, make sure to select Log;
  3. Under the header Shrink Action select the option Reorganize pages before releasing unused space. Shrink File to: xyz MB (minimum is 0 MB);
  4. Select a proper file size (not too big since the Recovery Model is Simple);
  5. And click OK. This can take some minutes, depending on the size and speed of the SQL server;
  6. Repeat this action for the other SCOM database as well.
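The shrink itself can also be scripted with DBCC SHRINKFILE, which takes the logical file name and a target size in MB and has to run in the context of the database that owns the file. The sketch below only builds the statements. The logical log file names and the 2048 MB target are placeholders, not values from this environment.

```python
def shrink_log_statement(database, logical_log_name, target_size_mb):
    """Build T-SQL that shrinks one log file to roughly target_size_mb."""
    if target_size_mb < 0:
        raise ValueError("target_size_mb cannot be negative")
    db = database.replace("]", "]]")            # escape for [bracket] quoting
    name = logical_log_name.replace("'", "''")  # escape for the N'...' literal
    return "USE [{0}]; DBCC SHRINKFILE (N'{1}', {2});".format(
        db, name, int(target_size_mb)
    )


# Hypothetical logical file names and target size; adjust to the environment.
for database, log_name in [("OpsMgr", "OpsMgr_log"), ("OpsMgrDW", "OpsMgrDW_log")]:
    print(shrink_log_statement(database, log_name, 2048))
```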

Now all is well again and SCOM will operate smoothly, AFTER the SCOM R2 services on the RMS and MS servers are started again of course :).
