Wednesday, February 10, 2016
Wednesday, January 20, 2016
Never too old to learn…
Okay. Some time ago I had a serious issue because of the PS cmdlet Remove-SCOMDisabledClassInstance. It went so far that a TOTAL restore of BOTH OM12x SQL databases was required in order to get things back on track again.
The cause? Me. Sort of.
Before Moment Zero
In a rather sizeable SCOM 2012x production environment there was an issue with a certain script running on the SCOM 2012x Agents. Thorough investigation showed it was the Discovery script of the Discovery object Discover Agent Relationship Settings, which discovers an additional attribute for the Class Health Service: Managed Through Active Directory (Boolean).
So the instances of the Class Health Service are already Discovered (and managed by SCOM). The Discovery object Discover Agent Relationship Settings only discovers the additional attribute Managed Through Active Directory (Boolean). Basically a yes or a no on whether the SCOM Agent is Active Directory Integrated or not. That’s all.
Making Moment Zero possible…
Because for this customer not a single SCOM Agent is Active Directory Integrated (ADI), it was decided to DISABLE this particular Discovery. Not needed, so why run it while the related script causes issues?
Also, the discovery of the Class instances themselves isn’t done by this particular Discovery. It only adds an ATTRIBUTE (a yes or no for ADI) to existing Class instances. No harm in that?!
Also in place, for some time already, is the OpsMgr 2012x Self Maintenance MP, made by Tao Yang. This MP is fully configured, INCLUDING the workflow OpsMgr 2012 Self Maintenance Remove Disabled Discovery Objects Rule, which runs the PS cmdlet Remove-SCOMDisabledClassInstance on a schedule (at 20:30, every 24 hours).
Please be reminded that WITHOUT this MP Moment Zero would have happened just the same, simply by MANUALLY running the PS cmdlet Remove-SCOMDisabledClassInstance. So this issue didn’t happen because of the OpsMgr 2012x Self Maintenance MP.
With this, all was set for Moment Zero to happen…
Moment Zero strikes!
The next day, the SCOM 2012x environment was dead. None of the SCOM Agents could communicate with the SCOM 2012x Management Group anymore. Simply because the SCOM Management Servers didn’t recognize ANY of the SCOM Agents as a trusted entity, so communication was cut off immediately by the very same SCOM MS servers…
Also, in the SCOM Console under Administration > Agent Managed, not a SINGLE SCOM Agent was listed. Totally empty. The SCOM SQL database revealed the same information, so apparently ALL SCOM Agents were removed!
So this explained why all SCOM MS servers refused to communicate with all SCOM Agents. Simply because they weren’t present in the SCOM SQL database anymore. So the WHY question was answered, but the answer to the HOW question eluded me for some minutes…
HOW Moment Zero came to be
So I backtracked all the steps I had taken in the days before. When working in different SCOM environments for different customers one quickly learns to log all steps. I use OneNote for this, which is an excellent tool for the job.
When going through all the actions of the days before I noticed the action in which I disabled the Discovery Discover Agent Relationship Settings object.
Could it be that disabling this particular Discovery (which adds only a boolean attribute (yes/no) to an already discovered Class instance), combined with the scheduled workflow which executes the PS cmdlet Remove-SCOMDisabledClassInstance, REMOVED all Health Service instances?
As its name states, the PS cmdlet removes CLASS instances for which the related Discovery is DISABLED. And even though I disabled a Discovery for an additional boolean attribute, the PS cmdlet doesn’t work on that granular level. Nor does it differentiate between Discoveries targeted at the same Class!
As a result, the PS cmdlet REMOVED ALL SCOM Health Service instances! As CONFIGURED by me!!!
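To make the effect tangible, here is a small toy model in Python. It is NOT how the cmdlet is implemented, only a sketch of the behaviour as I experienced it: the clean-up looks at the Class, so a disabled Discovery that merely adds an attribute still marks every instance of its Class for removal. All names in it (`Discovery`, `remove_disabled_class_instances`, the example hosts) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discovery:
    name: str
    target_class: str          # Class the Discovery is associated with
    adds_only_attribute: bool  # True when it only populates a property
    enabled: bool = True


def remove_disabled_class_instances(instances, discoveries):
    """Toy model: drop every instance whose Class has a disabled Discovery.

    Note that adds_only_attribute is never consulted. That is the point:
    the clean-up does not care what the disabled Discovery contributed.
    """
    doomed_classes = {d.target_class for d in discoveries if not d.enabled}
    return [i for i in instances if i["class"] not in doomed_classes]


discoveries = [
    Discovery("Discover Agent Relationship Settings", "Health Service",
              adds_only_attribute=True, enabled=False),
    Discovery("Example Disk Discovery", "Logical Disk",
              adds_only_attribute=False, enabled=True),
]
instances = [
    {"class": "Health Service", "host": "agent01"},
    {"class": "Health Service", "host": "agent02"},
    {"class": "Logical Disk", "host": "agent01"},
]

remaining = remove_disabled_class_instances(instances, discoveries)
print(remaining)  # only the Logical Disk instance is left
```

Both Health Service instances are gone, even though the disabled Discovery only added a yes/no attribute to them.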
Time to fix Moment Zero
After a session with Microsoft Customer Support Services, it was decided to restore BOTH SCOM SQL databases. Simply because it was the fastest way to fix this issue.
First both SCOM SQL databases were backed up and then the restores of a previous backup (when all was still okay) were run and successfully executed. Now SCOM ‘recognized’ all SCOM Agents again and resumed communications…
After this we had a meeting about this issue. We talked about the cause, the fix and what we learned from it. A small recap:
- PS cmdlet Remove-SCOMDisabledClassInstance works on Class instance level, NOT on attribute level. Meaning, only a Class instance as a whole can be undiscovered, not a particular attribute of a Class instance, even when there is a specific Discovery for it.
- The OpsMgr 2012x Self Maintenance MP ‘saved the day’. Simply because it runs the PS cmdlet Remove-SCOMDisabledClassInstance on a daily basis. Had this not been the case and had the PS cmdlet been run manually weeks later, it would have been far more difficult to pinpoint the root cause of this situation.
- Disabling a Discovery isn’t to be taken lightly. It can have huge consequences for your SCOM environment. So check, double check and think over what might happen when the PS cmdlet Remove-SCOMDisabledClassInstance is executed.
- DOCUMENT all disabled Discoveries and inform the SCOM administrators about it. Keep the document in a central place, like DFS, SharePoint or OneDrive for Business.
- Running the PS cmdlet Remove-SCOMDisabledClassInstance must be done with GREAT care and consideration. Enabling this workflow in the OpsMgr 2012x Self Maintenance MP can be a time saver, but the same warning applies.
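The ‘check and double check’ advice can be turned into a simple pre-flight check on the documented list of disabled Discoveries. The Python sketch below is only an illustration of that idea. The list of critical Classes, the data layout and the function name are my own assumptions, not something SCOM or the Self Maintenance MP provides.

```python
# Hypothetical pre-flight check; Class and Discovery names are examples only.
CRITICAL_CLASSES = {"Health Service"}


def risky_disabled_discoveries(disabled, critical_classes=CRITICAL_CLASSES):
    """Return the disabled Discoveries whose target Class is critical."""
    return [d for d in disabled if d["target_class"] in critical_classes]


# The documented inventory of disabled Discoveries (kept on SharePoint, DFS, ...).
disabled = [
    {"name": "Discover Agent Relationship Settings",
     "target_class": "Health Service"},
    {"name": "Example Disk Discovery", "target_class": "Logical Disk"},
]

risky = risky_disabled_discoveries(disabled)
for d in risky:
    print(f"STOP: '{d['name']}' targets {d['target_class']}; "
          "review before any clean-up runs")
```

When the list comes back non-empty, don’t run the clean-up (manually or scheduled) until someone has reviewed those entries.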
It made me feel humble again and I learned a lot from it.
Friday, January 15, 2016
01-15-2016 Update: As of now the slide deck and recordings of this event can be downloaded from the WMUG NL website.
This evening I joined the first (online) event of the Windows Management User Group Netherlands (WMUG NL), Introduction To OMS Alerting.
This event was presented by Tao Yang, so this session was in very good hands.
After a quick introduction of OMS (available Data Plans, the different Solutions, Log Searches and so on) he quickly moved on to the topic of his presentation: Alerting in OMS.
As a ‘warning’ he told the audience that OMS Alerting is still under development (Preview). So what he showed today won’t necessarily be the same when OMS Alerting becomes Generally Available. Chances are more features will be added, whereas existing ones will be improved or extended.
He demonstrated the Solution Change Tracking in order to receive an Alert when a configuration has changed. For this he created – with the help of the UI(!) – a query about the SQL Server Agent service being stopped.
He used this query for OMS Alerting. This way OMS will Alert him when the SQL Server Agent service is stopped on any server running it. He showed the schedule for it, the trigger (how many times the query must turn up a result), a time window, subject and recipients.
Then he told more about OMS Alert Remediation. This enables one to create an Azure Automation Runbook which is started automatically when a certain OMS Alert is raised. For this certain requirements must be met:
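The trigger and time window can be pictured with a few lines of Python. This is just my own sketch of the idea, not OMS code: I assume an Alert fires when the query returned at least `threshold` results within the time window, and the function name and timestamps are made up.

```python
from datetime import datetime, timedelta


def should_alert(result_times, now, window, threshold):
    """True when at least `threshold` query results fall inside the window."""
    recent = [t for t in result_times if now - window <= t <= now]
    return len(recent) >= threshold


now = datetime(2016, 1, 15, 20, 0)
# Query results found 2, 9 and 40 minutes ago.
hits = [now - timedelta(minutes=m) for m in (2, 9, 40)]

print(should_alert(hits, now, timedelta(minutes=15), threshold=1))  # True
print(should_alert(hits, now, timedelta(minutes=15), threshold=3))  # False
```

Only two of the three results fall inside the 15 minute window, so a threshold of 3 doesn’t raise an Alert.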
- Azure Automation Account;
- Azure Automation integrated with OMS;
- Runbooks must be published;
- Hybrid Workers when targeting on premises infrastructure.
For now you can’t enter Runbook parameters. Perhaps this feature will be added later on by Microsoft. He demonstrated the Runbook, built in the PowerShell ISE, with the Azure Runbook add-on. After explaining how the PS script is set up, he showed it in his Azure portal.
He pointed out that when you have an Azure Runbook which is expected to touch on-prem workloads, the related Webhook must be set to Hybrid Worker.
At the end of his presentation Tao showed us two links to his blog, all about OMS Alerting.
It was a good session with some good insights and demonstrations all about OMS Alerting. Luckily the demo devil decided to stay away from this session, so there were no glitches.
A BIG thanks to Tao and WMUG NL for this online event all about OMS Alerting.
With SCOM 2012x one can ‘monitor’ network devices out of the box. However, this functionality is limited, basic at best and has quite a few flaws, one of them being that it can’t handle a real load of network devices to monitor (1500+). Because of that I limit the monitoring of network devices with SCOM to a small set, and even those are monitored at a basic level.
Already with SCOM 2007x there was a 3rd party solution available, built by Jalasoft, titled Xian Network Manager IO. This product has seen many changes and has been upgraded many times.
And yes, I’ve worked with this product quite a few times. Basically, any device with an SNMP stack (or NetFlow or sFlow for that matter) can be monitored deeply by Jalasoft Xian Network Manager IO. The integration with SCOM is pretty good and delivers a level of network monitoring which is unprecedented.
Meet the newest version
The latest release, Jalasoft Xian NM v7, is totally revamped. It now runs on Ubuntu and PostgreSQL, whereas previous versions ran on Windows Server and SQL Server.
As Jalasoft states: ‘…This means that users no longer need expensive software and hardware to be able to run an enterprise class monitoring tool. In addition, the hardware requirements have been brought down to a more acceptable level in order to reach a broader scope of users…’.
No SCOM support anymore…
And it doesn’t end here. Also with the latest release, the support for SCOM is thrown out of the window: ‘…Xian NM will no longer support Microsoft System Center Operations Manager…’.
So one would say: I’ll keep on using the previous version, Xian NM 2012. But that version is end of life after December 31st 2016: ‘…The 2012 version will be fully supported until December 31st 2016, after that date no more public updates will be distributed and as of that moment existing customers will be supported on case to case basis according to their SSA…’
Reason stated by Jalasoft for this change
Jorge Lopez, CEO of Jalasoft states: ‘…we feel that it is time that customers all over the world have access to high end software solutions for a fair price. By disconnecting from the expensive Microsoft solutions we are creating that option…’
My personal opinion
Let me be straightforward here: YES, I understand Jalasoft’s decision to move on and drop support for SCOM, but NOT for the reasons stated by Jalasoft. Why?
- As we all know Microsoft is cloud and mobile first;
- OMS is one of the show cases for this new mantra;
- OMS evolves super fast. No evolution but more like revolution;
- It’s only a matter of time before network monitoring will be added. The availability of the OMS Solution Wire Data is already announced.
- Looking at the rate of growth, it’s to be expected that this is only the beginning. Meaning: the Solution Wire Data will be extended and improved, and/or additional network monitoring capabilities will be added.
- And because Jalasoft NM (2012) costs a lot of $$$ or €€€, many potential customers will opt for OMS instead of another third party solution.
- Therefore it’s better for Jalasoft to change direction and revamp their flagship product.
So far so good.
IMHO it is a bit sad to see Jalasoft kicking at SCOM because of its price, and naming that as the main reason to drop support for it. Without SCOM, not many people would have known Jalasoft at all. So to me this kind of goodbye feels impolite and unreal. Even more so because Xian NM itself is quite expensive.
A new world for Xian NM
In the days when Jalasoft relied on SCOM, Xian NM carried a premium price, causing many of my customers to choose OTHER network monitoring solutions.
With Xian NM v7 Jalasoft steps into an existing arena where some big parties already deliver top notch network device monitoring solutions, like (but not limited to): SolarWinds, PRTG, ManageEngine and Nagios.
Since the CEO of Jalasoft is so focused on the price of it all, I am curious what the new pricing model will be like. Simply because with the previous versions of Xian NM, Jalasoft had the in-depth SCOM integration as its USP, allowing them to go for the premium price model.
Now with Xian NM v7 this USP is gone, which ‘reduces’ Xian NM to Y.A.N.M.T. (Yet Another Network Monitoring Tool), thus removing the justification for the premium price model. In this new setup Xian NM has to compete with existing products offering the same functionality.
I seriously wonder on what elements (besides the price model) Jalasoft will compete with the other tools available. Only time will tell.
Wednesday, January 6, 2016
The why & how of this posting
Even though SCCM 1511 has almost the same ‘look & feel’ as SCCM 2012x, there are quite some differences. One of those differences is the way Automatic Deployment Rules (ADR) are deployed. Whereas ADRs in SCCM 2012x have a one-to-one relationship with a Deployment, in SCCM 1511 this is changed into a one-to-many (Deployments) relationship.
This makes sense, since many times one ADR can serve multiple Collections. So why create yet another ADR? And yes, in SCCM 2012x one could work around the one-to-one relationship, by deploying the Software Update Group (SUG), created by the ADR, to other Collections as well. But this approach was pushing the boundaries of the SCCM UI and resulted in a messy overview which made it hard to see which Collections a particular ADR was targeted at. Because of this one could by mistake create multiple ADRs with the same functionality, ‘hitting’ the same Collections multiple times…
So in SCCM 1511 (and its future successors) this has been changed. As a result, many of the screens related to the ADR in SCCM 1511 are different compared to SCCM 2012x. I’ve noticed that these changes do raise some questions among people used to the SCCM 2012x ADRs.
Hence this posting in order to show the differences between both versions and the explanation behind it. For this purpose I’ve made two ADRs. One in my SCCM 2012 R2 SP1 test lab, titled SCCM 2012R2 ADR. And another in my SCCM 1511 test lab, titled SCCM 1511 ADR.
But where are the tabs Deployment Schedule, User Experience & Download Settings? These are moved to a new ADR entity in the SCCM 1511 Console, titled Deployment Settings. More about that later in this posting.
When looking in the related SCCM Consoles (\Software Library\Overview\Software Updates\Automatic Deployment Rules), you’ll notice some differences.
SCCM 1511 only: Deployment Properties screen
When opening the Properties screen of the selected Deployment (in this example for the Collection ‘All Mobile Devices’), you’ll find the tabs which were previously missing from the Properties screen of the ADR:
So there are the tabs Deployment Schedule, User Experience & Download Settings.
This makes perfect sense, because in SCCM 1511 the relationship between an ADR and a Deployment is one-to-many. So the properties related to the specifics of a Deployment are gone from the basic properties screen of the ADR itself and brought into the new ADR entity, Deployment Settings.
This enables you to deploy a single ADR to other Collections, each with its own required settings, like (but not limited to) whether or not to automatically reboot the devices in the specific Collection when the updates are installed, what to do when the installation deadline is reached and so on.
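For those who think in data models, the one-to-many relationship looks roughly like the Python sketch below. It is only an illustration: the class and field names (`suppress_reboot`, `deadline_days`) are my own inventions and NOT the names SCCM uses internally.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Per-Collection settings that used to live on the ADR itself."""
    collection: str
    suppress_reboot: bool = False  # hypothetical User Experience setting
    deadline_days: int = 7         # hypothetical Deployment Schedule setting


@dataclass
class AutomaticDeploymentRule:
    """One ADR owning many Deployments (SCCM 1511 style)."""
    name: str
    deployments: list = field(default_factory=list)

    def add_deployment(self, deployment):
        self.deployments.append(deployment)


adr = AutomaticDeploymentRule("SCCM 1511 ADR")
adr.add_deployment(Deployment("All Mobile Devices"))
adr.add_deployment(
    Deployment("Windows Server 2012", suppress_reboot=True, deadline_days=14)
)

for d in adr.deployments:
    print(f"{adr.name} -> {d.collection}: "
          f"suppress_reboot={d.suppress_reboot}, deadline_days={d.deadline_days}")
```

In the SCCM 2012x way of working the same model would have a single Deployment per ADR, so the second Collection would need its own ADR (or the SUG workaround).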
What happens when an existing SCCM 1511 ADR is deployed to another Collection?
Let’s target the already created ADR SCCM 1511 ADR to another Collection, Windows Server 2012 for example.
Behaviour of an SCCM 1511 ADR with added Deployments. Sometimes a bit erratic?
It’s important to know how an SCCM 1511 based ADR ‘responds’ when an additional Deployment is added, or better, how it shows itself in the SCCM Console under Software Update Groups (\Software Library\Overview\Software Updates\Software Update Groups).
In the same example the ADR SCCM 1511 ADR is targeted against the Collection All Mobile Devices, and saved. Before its first run the same ADR is deployed to another Collection as well, Windows Server 2012:
All is set now. Let’s run the ADR and check out the Software Update Groups view in the SCCM 1511 Console:
TWO SUGs are created?! Let’s check the details of each SUG, since the ADR is set to Add to an existing Software Update Group…
As you can see, per Collection a new SUG is created. Let’s see what happens when the ADR has run for a second time.
And it can get even more confusing when creating an ADR which is initially deployed against ONE Collection only and has run at least one time. With the default setting of the ADR (Add to an existing Software Update Group), only ONE SUG will be created.
When the ADR is modified in order to be deployed against other Collections as well, the SUG (after the ADR has run) will only show the original Collection it was initially deployed to…
It’s a good thing that an ADR can be deployed against multiple Collections. In SCCM 2012x this could be done as well, but it took much more interaction and was prone to errors because it wasn’t easy to see in the Console that a single ADR had multiple deployments.
So far the good news. However, when deploying an SCCM 1511 based ADR to multiple Collections the same SCCM 1511 Console might show some strange results, so be careful and double check it. Not only on the SCCM Console side of it all, but also on the side of the members of the Collections to which the ADR is deployed.
Excuse this play on words. But fact is that the well known Microsoft employee Ed Wilson, AKA The Microsoft Scripting Guy, has started a whole new blog ALL about OMS!
This blog is titled Operations Management Suite Blog.
Why does he, as a PowerShell guru, start this new blog? To quote his own words: ‘…OMS uses Windows PowerShell in a big way, so it is not like I am starting from scratch. In fact, for some things, OMS does Windows PowerShell better than Windows PowerShell does. This is especially true when it comes to Desired State Configuration (DSC)…’
Also his goals for this blog are totally awesome: ‘…We will have articles from the OMS team, from various MVPs, and from other community leaders. It is my hope that the OMS blog will become a focal point for all things related to Microsoft Operations Management Suite…’
So I am looking forward to following this blog and will add it to the blog roll of this blog.
Four persons I respect highly (Tao Yang, Stanislav Zhelyazkov, Pete Zerger and Anders Bengtsson) are writing an ebook, all about Microsoft Operations Management Suite (OMS).
For now it’s a preview release but still it covers all aspects of OMS:
- Chapter 1: Introduction and Onboarding
- Chapter 2: Searching and Presenting OMS Data
- Chapter 3: Alert Management
- Chapter 4: Configuration Assessment and Change Tracking
- Chapter 5: Working with Performance Data
- Chapter 6: Process Automation and Desired State Configuration
- Chapter 7: Backup and Disaster Recovery
- Chapter 8: Security Configuration and Event Analysis
- Chapter 9: Analyzing Network Data
- Chapter 10: Accessing OMS Data Programmatically
- Chapter 11: Custom MP Authoring
- Chapter 12: Cross Platform Management and Automation
For anyone interested in OMS, this book is a MUST read. It can be downloaded from the TechNet Gallery.