Monday, October 5, 2015

OM12 DW Database Error. EventID 31553: ‘Arithmetic overflow error converting IDENTITY to data type int’

Issue
An OM12 environment had some MPs updated, including the core MPs based on UR#9 AFTER the required SQL scripts were applied. So far so good.

Out of the blue, the OpsMgr event log on the MS servers started to log EventID 31533 with the message  ‘…Data was written to the Data Warehouse staging area but processing failed on one of the subsequent operations. Exception 'SqlException': Sql execution failed. Error 8115, Level 16, State 1, Procedure ManagedEntityChange, Line 237, Message: Arithmetic overflow error converting IDENTITY to data type int…’

I’ve seen a lot 31533 events but never before I saw the message Arithmetic overflow error converting IDENTITY to data type int.

So I reached out to some people I know. And gladly I got a response.

Cause
As it turns out, the  related table ran out of ID’s. This can be checked by running this query against the Data Warehouse database: DBCC CHECKIDENT ("Managedentity"). The value should be lower than 2147483647.

WARNING
Before I start: Microsoft PFE recommends to open a case with Microsoft Customer Support Services, so think TWICE before running this fix.

!!!WARNING!!!
Be sure you know what you do. When possible, do this with a SQL DBA so you have additional SQL knowledge and experience available. Also know that fixing this issue for a certain table can’t be enough and that other tables will suffer from the issue as well. In cases like these it’s better to open a case with Microsoft Customer Support Services.

Fix
Reseed the related table when the value is 2147483647 or higher.

  1. First and foremost, BACKUP both SCOM databases, and ascertain yourself the backups are in working order (aka: they can be 100% restored). Also, BEFORE running the backup, STOP all SCOM related services on ALL SCOM Management Servers so new data will be processed by the SCOM databases. Enable these services AFTER this procedure has been run;
  2. Find the current number of rows in the table. This number will be used in Step 3 as value n: select count(*) from managedentity;
  3. Reseed the identity of this table with this query: DBCC CHECKIDENT ("Managedentity",RESEED, n+1). Again: n is the value found in Step 2. Also: there should be no spaces between the , and RESEED;
  4. Check and confirm that the value was changed: DBCC CHECKIDENT ("Managedentity");
  5. Enable the SCOM related services on the Management Servers;
  6. Confirm that EventID 31553 is gone.

Changes are however, other tables will show up in the OpsMgr event log with the same error. Run Steps 2 to 6 in order to remedy it. And again STOP the SCOM related services on the Management Servers before going through these steps and enable them afterwards.

2 comments:

thx1200 said...

Did you find out the root cause? Running out of identities is almost assuredly a bug because filling that many rows would have caused huge DB bloat and horrible performance that you would have noticed earlier.

Nathan Gau said...

Hi Marnix, just figured I'd let you know there are a couple other tables that can produce these results as well. Those include "ManagedEntityProperty" and "ManagedEntityManagementGroup". I put together a quick blog on it here:

http://blogs.technet.com/b/nathangau/archive/2016/04/25/event-id-31553-in-scom-arithmetic-overflow-error.aspx