Kevin Holman's System Center Blog

My experience upgrading to OpsMgr R2 RTM


I upgraded my test lab from SP1 to R2-RTM this weekend.

 

My current test lab consists of the following servers:

OMRMS – Server 2003 - RMS role

OMMS3 – Server 2008 - MS role, Web Console

OMMS – Server 2003 - MS role, ACS collector

OMDB – Server 2003/SQL 2005 - OperationsManager Database

OMDW – Server 2003/SQL 2005 - OperationsManagerDW database, Reporting, SRS, ACSDB roles

 

There are 18 agents reporting to this management group.

 

So – I start – with a little light reading.

I begin with the release notes. These are available on the R2 CD, and on the web at Operations Manager 2007 R2 Release Notes. I don't see anything in there that is terribly applicable to me, but they are good to commit to short-term memory in case we hit a snag during or after the upgrade.

Next, I move on to the Upgrade Guide. This is available in the TechNet Library at Operations Manager 2007 Upgrade Guide. I need to spend a little time on this one, mapping out the pre-upgrade steps and then planning the order of my upgrade based on how my management group is deployed.

 

So – I start by running down the pre-upgrade checklist at: Preparing to Upgrade Operations Manager 2007

I record my service accounts, make sure my DBs have plenty of free space, and make sure my transaction logs are sized big enough. I also make sure the volume with TempDB has plenty of free disk space in case TempDB needs to auto-grow.
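Checking the TempDB volume for free space can be scripted. This is a minimal sketch, assuming Python is available on the SQL server; the threshold is yours to choose, and the path in the comment is a hypothetical placeholder, not a value from the Upgrade Guide.

```python
import shutil

BYTES_PER_GB = 1024 ** 3


def has_free_space(path: str, min_free_gb: float) -> bool:
    """Return True if the volume holding `path` has at least `min_free_gb` GB free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free >= min_free_gb * BYTES_PER_GB


# Example (hypothetical TempDB data path and threshold):
# has_free_space(r"D:\SQLData", min_free_gb=20)
```

The threshold is a parameter on purpose: how much headroom TempDB needs depends on your environment, and the post only says "plenty".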

Next, I map out my plan and order of operations for my management group, and share the plan with my team:

  1. Get the most recent backup of the databases and the encryption key, and export unsealed MPs for safekeeping.
  2. Go to Pending Actions and reject/remove anything in there.
  3. Verify free space in the SQL databases and validate that log size is appropriate.
  4. I need to uninstall the agent from OMTERM, my terminal server, which has only a console and an agent. I decide to uninstall the agent, the console, and the SP1 Authoring Console as well, since I will be replacing it with the R2 Authoring Console. I will reinstall the agent and consoles when the upgrade is complete for the management group.
  5. I need to disable all my notification subscriptions and disable my product connectors. I am running a custom internal product connector, which runs as a service and updates alert properties, so I will stop and disable that service for the duration of the upgrade.
  6. I see a section on Improving Upgrade Performance, so I will add that step here, right before I upgrade the first component.
  7. I am now ready to establish the upgrade order for my management group, which is available at: Planning your Operations Manager 2007 Upgrade
  8. RMS (OMRMS)
  9. Reporting Server (OMDW)
  10. Stand Alone Consoles (None – I uninstalled this already in my case)
  11. Management Servers (OMMS3, OMMS)
  12. Gateway Servers (None)
  13. Agents
  14. Web Console (on OMMS3)
  15. Post-Upgrade validation steps

Ok – that's my plan.  Time to get rolling.

The SP1 to R2 steps are outlined here:  Upgrading from Operations Manager 2007 SP1 to R2

I know from experience with customers that the success of your upgrade HINGES on how well you read AND follow the upgrade steps, VERBATIM. The majority of issues we see (especially on clustered RMS) occur when a customer does not follow the steps exactly as written, in the correct order.

 

I complete steps 1-7 in the plan above, and then start the RMS upgrade at step 8. I run “SetupOM.exe” and kick off the pre-req checker before starting the install, where I hit my first snag: I need to install WS-Management v1.1, because I do plan on monitoring Unix/Linux machines with this management group in the future. (This was documented in the release notes and in the upgrade guide, so I was expecting it. I should have added it to my plan.) So I install WS-Management from the link provided in the pre-req checker, which takes just a few minutes. Now the pre-req checker looks much better:

[Screenshot: pre-req checker results]

 

The install instructions provided on TechNet are very straightforward. The install took about 20 minutes for my small environment, with the longest wait on the “Loading Management Packs” screen. It finally ended with an error:

 

[Screenshot: setup error dialog]

 

The guide has a note about this: you might get a warning that a service failed to start, and you should click OK. However, this is a different error; this is a service failing to stop. I click OK, and a few minutes later setup completes. I uncheck the boxes to start the console and to back up the encryption key.

 

I then run the RMS upgrade validation steps, checking the registry and the services. The setup version in the registry shows me all is good.

***Note:  We have changed the service display names for R2.  See below:

[Screenshot: R2 service display names]

 

I moved on to Reporting.  My SRS, Reporting, and DataWarehouse are all shared on a single server – OMDW.

 

As I read the guide at Upgrading from Operations Manager 2007 SP1 to R2, I notice this little tidbit, which needs STRONG attention before I kick off the upgrade:

Prior to running the upgrade on the Reporting server, you must remove the Operations Manager 2007 agent; the upgrade will fail if this is not done.

So – I kick off the uninstall of the agent on the Reporting/SRS server (OMDW in my case) from Add/Remove programs – before I start the upgrade.  Missing little steps like this will drive you nuts if you aren't methodical.

After the agent uninstall – I pick back up on the guide – and kick off “SetupOM.exe”.  Since I am a freak – I go ahead and run a pre-req check just to make sure all is good:

[Screenshot: pre-req checker results]

 

Moving on, I start the install according to the guide. The install goes without a hitch and takes about 10 minutes to complete.

 

Next up: management servers. I start with OMMS3. I run the pre-req check, notice I already have WS-Management installed, and away I go. The installer immediately fails with a pre-req failure. I realize I have the web console installed on this management server, and I forgot to include that role when running the pre-req check manually. When I do, I see:

[Screenshot: pre-req checker flagging the missing ASP.NET AJAX Extensions]

 

So I need to grab the ASP.NET AJAX Extensions, which support the cool new Health Explorer in the Web Console. I click “More” on the pre-req check, which gives me a link to the download.

After this little hurdle, the management servers upgrade very quickly. Once again, I get the now-expected error about a failure to stop a service.

 

[Screenshot: service failed to stop error]

 

I click OK and setup completes. I repeat this upgrade on the other management server (OMMS), and these are done. A quick check of the registry shows the setup version is indeed 6.1.7221.0.
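The registry check after each server upgrade boils down to comparing dotted version strings. This is a minimal sketch, assuming you have already read the setup version string from the registry; the post doesn't give the key path, so it is not hard-coded here. Comparing integer tuples avoids the plain-string trap where a hypothetical "6.1.10000.0" would sort before "6.1.7221.0".

```python
from __future__ import annotations

R2_RTM_VERSION = "6.1.7221.0"  # R2 RTM setup version, as observed in this post


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version like '6.1.7221.0' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least_r2_rtm(setup_version: str) -> bool:
    """True if the setup version read from the registry is R2 RTM or newer."""
    return parse_version(setup_version) >= parse_version(R2_RTM_VERSION)
```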

 

I don't have any gateways in this lab – so next up is agents.

 

Lucky me – all 18 agents show up in pending actions for an update.  I will approve them all – and let the management server push the update down and upgrade them. 

***Note – do not upgrade more than 299 agents in this manner at a time.  This is documented in the Upgrade Guide.
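If more agents than that are waiting in Pending Actions, approve them in batches. This is a minimal sketch of the batching logic only, assuming you already have the list of agent names; how you approve each batch (in the console or otherwise) is up to you.

```python
from __future__ import annotations

from typing import Iterator, Sequence

MAX_AGENTS_PER_BATCH = 299  # limit stated in the Upgrade Guide


def agent_batches(agents: Sequence[str], size: int = MAX_AGENTS_PER_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of `agents`, each no larger than `size`."""
    if not 1 <= size <= MAX_AGENTS_PER_BATCH:
        raise ValueError(f"batch size must be between 1 and {MAX_AGENTS_PER_BATCH}")
    for start in range(0, len(agents), size):
        yield list(agents[start:start + size])
```

For example, 700 pending agents come out as batches of 299, 299, and 102.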

All my agents upgraded successfully except for two. Both failures were on the two servers from which I had manually removed the SP1 agent: OMTERM and OMDW. (I forgot to delete their “Agent Managed” objects from the management group.) Each has a different error. OMTERM fails to install with a push failure for MOMAgentInstaller. I have had trouble with this agent before, possibly because of the TS role, so I just do a manual agent install there. OMDW is different: the console push reports success, but the System Center Management service (HealthService) will not start, and logs this error:

Event Type:    Error
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7024
Date:        5/23/2009
Time:        1:09:37 AM
User:        N/A
Computer:    OMDW
Description:
The System Center Management service terminated with service-specific error 2147500037 (0x80004005).
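The service-specific error in event 7024 is logged in decimal, with the hex form in parentheses. 0x80004005 is E_FAIL, the generic "unspecified error" HRESULT, which is why the event alone says little about the root cause. Some tools print the same code as a negative signed integer instead; a small helper normalizes either form:

```python
def to_hex_code(code: int) -> str:
    """Format a Windows error code as an 8-digit hex HRESULT (works for signed or unsigned input)."""
    return f"0x{code & 0xFFFFFFFF:08X}"
```

Both `to_hex_code(2147500037)` and `to_hex_code(-2147467259)` give "0x80004005".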

I run a repair action from the console but get the same error. So I manually uninstall the broken agent, delete the agent from the Agent Managed section of the console, and re-push the agent. I had a little trouble getting these two to come into the management group, but after a couple of delete/reinstall cycles they finally appear to be working OK. Next time, I'd recommend uninstalling them from the console, since that removes both the agent and the computer object from the console.

 

Next on the list:  Web Console

From the upgrade guide I see this note….

If your Web console server is on the same computer as a management server, the Web console server is upgraded when the management server is upgraded, rendering this upgrade procedure unnecessary. You can still run the verification procedure to ensure that the Web console server upgrade was successful.

Good – my web console is not a stand-alone – it was running on a management server (OMMS3) so that is already taken care of.

Aha, I found something we forgot in the plan: the ACS Collector. This role is missing from the table at Planning your Operations Manager 2007 Upgrade, so I completely missed it as a planning step. However, the process is documented at Upgrading from Operations Manager 2007 SP1 to R2. I will do it last, since it is last in the detailed upgrade steps. Following the guide, I walk through the steps with no issues.

 

Looks like we are done!  I will now start the post-upgrade validation steps to make sure my management group is actually working as it should without any major issues.

There is a list of post-upgrade checks at Completing the Post-Upgrade Tasks

 

I am going to walk through those here:

1.  I open Discovered Inventory, change the target to “Health Service Watcher”, and compare this to the list I had before the upgrade. These are agents that have a problem from the management server's perspective, which causes them to appear “grey” in all other views. My list is the same as before I started: I have 6 in this list as critical. 5 of them are agents on VMs that are currently down, so this is expected. 1 of them is an old management server; for some reason we don't groom these out of the view/database, and they seem to stick around forever in this view.

2.  I review the event logs on the RMS and all MS roles. I am seeing some warnings and errors like the ones below:

Event Type:    Warning
Event Source:    HealthService
Event Category:    Health Service
Event ID:    2120
Date:        5/23/2009
Time:        10:02:15 AM
User:        N/A
Computer:    OMRMS
Description:
The Health Service has deleted one or more items for management group "OPS" which could not be sent in 1440 minutes.

This is normal – it happens when you have agents that are down in your environment.

Event Type:    Error
Event Source:    Health Service Modules
Event Category:    Data Warehouse
Event ID:    31552
Date:        5/23/2009
Time:        10:03:38 AM
User:        N/A
Computer:    OMRMS
Description:
Failed to store data in the Data Warehouse.
Exception 'SqlException': Sql execution failed. Error 777971002, Level 16, State 1, Procedure StandardDatasetGroom, Line 303, Message: Sql execution failed. Error 2812, Level 16, State 62, Procedure StandardDatasetGroom, Line 145, Message: Could not find stored procedure 'KMS_EventGroom'.

One or more workflows were affected by this. 

Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
Instance name: KMS Activation Event Data Set
Instance ID: {800D8126-6F72-CA84-A76B-A94F7E3C93CF}
Management group: OPS

This is not normal. It looks like an issue with the KMS MP, and R2's advanced logging is picking up an error that has been there all along; I just didn't know it.

That is all from the RMS, which is pretty clean. On the management servers I found a bit more, but it was all due to the problems I was having with a handful of agents. Once I removed and fixed those agents, the MS logs were clean.

3.  No cluster in this lab – so nothing to test there.

4.  Review alerts in the console. I sort by Repeat Count and Last Modified (I add these columns to all my alert views) and look for anything that stands out as repeating a LOT, or anything new that looks like a problem. I don't see anything here, so that is good!

5.  DB server in perfmon looks good. I examine % Processor Time, and the Logical Disk counters Avg. Disk sec/Read and Avg. Disk sec/Write. Both average under 15 ms (0.015 s) on the DB and log volumes, so that looks good. CPU averages under 25%.

6.  Check all the console views.  Much snappier than in SP1.  Nice.

7.  I opened up reporting – and ran the “Microsoft ODR Report Library > Most Common Alerts” report – to test out reporting.  It ran with no issues.  I test a few of my saved custom and favorite reports – no errors – all good.

8.  Authoring pane looks good – I can see my groups, monitors, rules – and wow – they open a LOT faster than before.  Very nice.

9.  I check my MP versions. The install upgraded all my core MPs to 6.1.7221.0. I was already pretty current on my MPs, so there is not much here that needs my immediate attention.

10.  Re-enable notification subscriptions and product connectors. I turn my subscriptions back on and fire off a test event that I use to generate an alert and email me a notification. Works great. Next, I go to my custom product connector service, enable it, and start it back up again. I run some test alerts to make sure my product connector is taking all the necessary actions on the alerts and forwarding them as appropriate. All good.

11.  Review My Workspace.  Yep – all my old custom views are there.

12.  Re-deploy agents.  I already did this.  Perhaps I should have waited on this step…. because I spent so much time troubleshooting those last few pesky agents that seem to have trouble.

13.  Oh, the BIG ONE. This step is a bit odd: we tell you to go run a SQL query. LET ME WARN YOU: this is not a “quick job”. This is the script that is documented and discussed in my blog post: Does your OpsDB keep growing? Is your localizedtext table using all the space? Don't take this step lightly; running this script could take several hours, so plan accordingly. Read that post for the details, and consider skipping this step for now, until you are sure you are ready to execute it. Do some calculations based on that post: how long it will take, and how severely you are impacted (the row count of your localizedtext table). Make sure you have a LOT of free space for the TempDB data and log files to grow if needed. My LT table was already really small, so there were no issues for me running this; it completed in less than a minute.
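The Health Service Watcher comparison in step 1 is a before/after diff. This is a minimal sketch, assuming you have captured the critical watcher names as two lists (for example, copied out of the console before and after the upgrade); the names in the usage line are hypothetical.

```python
from __future__ import annotations


def diff_watchers(before: list[str], after: list[str]) -> dict[str, list[str]]:
    """Compare critical Health Service Watcher names captured before and after the upgrade."""
    # Computer names are case-insensitive, so normalize before comparing.
    before_set = {name.strip().lower() for name in before}
    after_set = {name.strip().lower() for name in after}
    return {
        "new_problems": sorted(after_set - before_set),  # went critical during the upgrade
        "resolved": sorted(before_set - after_set),      # no longer critical
        "unchanged": sorted(before_set & after_set),     # e.g. VMs that were already down
    }
```

Usage: `diff_watchers(["vm1.lab", "OldMS.lab"], ["vm1.lab", "oldms.lab", "omdw.lab"])` reports "omdw.lab" as the only new problem. Anything in "new_problems" is worth chasing before you move on.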

Done!  (with the “official” steps)
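The perfmon check in step 5 can also be run against exported samples. This is a minimal sketch, assuming you have already collected the counter samples as numbers (for example, from a perfmon log exported to CSV). Avg. Disk sec/Read and Avg. Disk sec/Write are in seconds, so 15 ms is 0.015. The 25% CPU figure is what this post observed, used here only as an example threshold.

```python
from __future__ import annotations

from statistics import mean

DISK_LATENCY_LIMIT_S = 0.015  # 15 ms, the disk latency bar used in step 5
CPU_LIMIT_PCT = 25.0          # example threshold taken from the observed CPU average


def db_server_looks_healthy(
    sec_per_read: list[float],
    sec_per_write: list[float],
    cpu_pct: list[float],
) -> bool:
    """True when average disk latency and average CPU stay under the thresholds above."""
    return (
        mean(sec_per_read) < DISK_LATENCY_LIMIT_S
        and mean(sec_per_write) < DISK_LATENCY_LIMIT_S
        and mean(cpu_pct) < CPU_LIMIT_PCT
    )
```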

 

Now – I just have a couple cleanup steps I need to do – like go back and install the Ops Console and the Auth Console back on my terminal server.  Did that without issue.  All looks good.

 

And then I realized we are missing another step in our plan under the post-upgrade tasks: make sure the web console is working! I saw lots of items in the release notes about how this might break, and I imagine someone will complain rather quickly if it isn't working, so we had better go check that out.

Sweet! I hit the web console and it is all good. I check out several of the new views and run Health Explorer from the web console. I have tasks, maintenance mode, and Health Explorer. Very cool. I even execute some of my favorite reports under “My Workspace” just to make sure those are good. Ouch, not working. I will have to look into that one.

 

Ok – that’s enough for today.  All in all – a successful upgrade.  A good plan written out at the beginning, based on the upgrade guide - makes all the difference.


How to force the Web Console to open a specific view, instead of the default Monitoring Overview


 

When you open the Web Console – the default view will open to a “Monitoring Overview” pane.  This view, in very large environments, can take considerable time to finish loading, before you can select any other views.  Sometimes, loading this view may time out as well.

 

[Screenshot: Web Console Monitoring Overview]

 

 

Here is a way to load a specific view by default.  This helps when you have customers that only need to see a very specific pre-configured alert or state view.

The format is: 

 

http://webconsoleserver:51908/default.aspx?ViewID=8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98&ViewType=AlertView

 

Let's break that all down:

 

The first part of the URL is a constant, and should be self-explanatory:

http://webconsoleserver:51908/default.aspx?ViewID=

The next part is the ID of the view. These IDs are constant for the default built-in views and for views from Microsoft MPs. Your custom MPs will have their own unique view IDs. I will talk more about how to find these IDs below. My example ID is for the “Active Alerts” view at the top of the console view list.

8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98

The last part is the ViewType. This tells the console whether it is dealing with an AlertView, StateView, or PerformanceView:

&ViewType=AlertView
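Putting the three parts together is simple string assembly. This is a minimal sketch, assuming the default port 51908 and plain http as in the examples here (switch the scheme if you have set up SSL), and accepting only the three ViewType values named above.

```python
from urllib.parse import urlencode

VIEW_TYPES = {"AlertView", "StateView", "PerformanceView"}


def web_console_view_url(server: str, view_id: str, view_type: str, port: int = 51908) -> str:
    """Build a Web Console URL that opens directly on the given view."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown ViewType: {view_type!r}")
    # urlencode keeps the GUID's hyphens as-is and joins the two parameters with '&'.
    query = urlencode({"ViewID": view_id, "ViewType": view_type})
    return f"http://{server}:{port}/default.aspx?{query}"
```

Calling it with "webconsoleserver", the Active Alerts ID, and "AlertView" reproduces the example URL above.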

 

Here is a SQL query to get the view IDs for every view from the OperationsManager database (OpsDB):

-- List every view with its ID, view type, and the MP that contains it
select vv.id as 'View Id',
vv.displayname as 'View DisplayName',
vv.name as 'View Name',
vtv.DisplayName as 'ViewType',
mpv.FriendlyName as 'MP Name'
from ViewsView vv
inner join managementpackview mpv on mpv.id = vv.managementpackid
inner join viewtypeview vtv on vtv.id = vv.monitoringviewtypeid
-- Optional filters: uncomment at most ONE of the WHERE lines below
--where mpv.FriendlyName like '%default%'
--where vv.displayname like '%Service%'
order by mpv.FriendlyName, vv.displayname

 

Here are some examples of some built in views:

 

“Active Alerts” view at the top:

http://webconsoleserver:51908/default.aspx?ViewID=8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98&ViewType=AlertView

“Windows Computers” state view:

http://webconsoleserver:51908/default.aspx?ViewID=E3D720DE-F6DD-185C-6FDC-0832377D910A&ViewType=StateView

“Operating System Performance” view from the BaseOS (Microsoft Windows Server) MP:

http://webconsoleserver:51908/default.aspx?ViewID=9B216021-6E88-EF6D-2A97-9E3EA1D6AD3B&ViewType=PerformanceView

 

Now you can give a specific user this URL for their favorites, if they want to open the Web Console on this specific view.

Common Issues with the OpsMgr Web Console


This post is a collection of the most common issues and resolutions I see with the SCOM 2007 R2 Web Console:

 

Windows Auth on a non-RMS. The most common issue is where customers install the Web Console using Windows Authentication on a server that is not the RMS. When SCOM was initially released, this wasn't a supported configuration. Later, post SP1, instructions were released on how to make this work, but it required changes in Active Directory for Kerberos constrained delegation. This can be done now, but in real practice I have found it to be hit-or-miss. In some domains there are no issues; in others it breaks easily, or only works some of the time. In general, my recommendation is to use Forms Based authentication with SSL for Web Consoles that are not installed on the RMS. If you want to give constrained delegation a shot, there are several articles on the web, including mine:

http://blogs.technet.com/kevinholman/archive/2008/09/24/installing-the-web-console-on-a-2008-management-server-using-windows-authentication.aspx

 

Unexpected error immediately after install. Another common issue is that you install the Web Console using Forms Based authentication, but get an “unexpected error”:

Unexpected error
There was an error displaying the page you requested.

Try the following:

Restart the web browser
Refresh the page
If you are still not able to view the requested page, try contacting your administrator or Helpdesk


This is caused by installing the Web Console using Forms based authentication, but not having SSL set up yet.  Marnix wrote a great post on this topic:  http://thoughtsonopsmgr.blogspot.com/2010/03/scom-web-console-with-form-based.html

Bottom line: we recommend setting up SSL in IIS for the Web Console, but if you don't have this set up yet, you need to edit the Web.Config file and change:

<authentication mode="Forms">
  <forms requireSSL="true" />
</authentication>

to

<authentication mode="Forms">
  <forms requireSSL="false" />
</authentication>
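If you make this change on several Web Console servers, the edit can be scripted. This is a minimal sketch using Python's standard ElementTree, assuming the full Web.Config text is passed in and that the root configuration element has no default XML namespace. ElementTree drops XML comments and the XML declaration when it re-serializes, so back up Web.Config first and review the output before saving it over the original.

```python
import xml.etree.ElementTree as ET


def set_forms_require_ssl(config_xml: str, require_ssl: bool) -> str:
    """Set requireSSL on every <forms> element under <authentication mode="Forms">."""
    root = ET.fromstring(config_xml)
    forms_elements = root.findall(".//authentication[@mode='Forms']/forms")
    if not forms_elements:
        raise ValueError('no <authentication mode="Forms"><forms> element found')
    for forms in forms_elements:
        forms.set("requireSSL", "true" if require_ssl else "false")
    return ET.tostring(root, encoding="unicode")
```

Once SSL is in place on IIS, run it again with `require_ssl=True` to put the recommended setting back.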

 


Web Console takes a long time to load and ASP.NET logs Event ID: 1309

 

You might see the web console take a very long time to load, and eventually even time out with:

Event message: The request has been aborted.
Exception information:
Exception type: HttpException
Exception message: Request timed out.

You might also get the following event in your application event log:

Event Type: Warning
Event Source: ASP.NET 2.0.50727.0
Event Category: Web Event
Event ID: 1309
Date: 4/07/2010
Time: 1:18:54 AM
User: N/A
Computer:
Description:
Event code: 3001
Event message: The request has been aborted.
Event time: 4/07/2010 1:18:54 AM
Event ID: 0fde28a70ef54acf9e269f9bed4eb70f
Event sequence: 6
Event occurrence: 1
Event detail code: 0
Application information:
Application domain: /LM/W3SVC/2/ROOT-1-129114331599062251
Trust level: Full
Application Virtual Path: /
Application Path: C:\Program Files\System Center Operations Manager 2007\Web Console\
Machine name:
Process information:
Process ID: 4460
Process name: w3wp.exe
Account name: NT AUTHORITY\NETWORK SERVICE
Exception information:
Exception type: HttpException
Exception message: Request timed out.
Request information:
Request URL: http://omms1.opsmgr.net:51908/login.aspx?ReturnUrl=%2fDefault.aspx
Request path: /login.aspx
User host address:
User: Domain\Username
Is authenticated: True
Authentication Type: Negotiate
Thread account name: NT AUTHORITY\NETWORK SERVICE
Thread information:
Thread ID: 1
Thread account name: NT AUTHORITY\NETWORK SERVICE
Is impersonating: False
Stack trace:

This can be caused by the IIS configuration: the SCOM Web Console website is configured to use Anonymous AND Windows Integrated Authentication, when it should be configured to use ONLY Windows Integrated Authentication.

To resolve this, uncheck Anonymous Authentication and then run iisreset.

 

Cannot install/uninstall the web console.

Oftentimes this is an issue for Web Consoles that were upgraded from SP1 to R2, and many times it is due to old path information in the registry.

1. Open Registry Editor on the web console server.
2. Locate “My Computer\HKEY_CLASSES_ROOT\Installer\Products\DF6E5EFF035E66C49971553D96AA0E4D”.
3. Back up the registry key from step 2.
4. In that key, find the “Patches” value (REG_MULTI_SZ) and delete its entry.
5. Continue with the upgrade/uninstall/install.

 

Error message when you try to access the Web console: "HTTP Error 401.2 – Unauthorized. You are not authorized to view this page due to invalid authentication headers"

See:  http://support.microsoft.com/kb/970043

 

 

 


Other good articles:

 

How to set up SSL for your SCOM R2 Web Console?  How to configure the SCOM R2 Web Console to use SSL only

All common customization options for the Web Console via the Web.Config file:  http://blogs.technet.com/michaelpearson/archive/2009/11/30/opsmgr-r2-web-console-web-config-settings.aspx

How to change the Web Console to force it to open a specific default view, instead of the default overview:  http://blogs.technet.com/kevinholman/archive/2009/11/09/how-to-force-the-web-console-to-open-a-specific-view-instead-of-the-default-monitoring-overview.aspx

Really making a web console powerful – check out Savision LiveMaps:  http://thoughtsonopsmgr.blogspot.com/2009/11/savision-live-maps-version-41-for.html

How to install IIS on Server 2008 for the Web Console:  http://blogs.technet.com/kevinholman/archive/2008/09/26/how-to-install-iis-on-server-2008-to-support-opsmgr-web-console-and-reporting.aspx

Upgrade issues with the web console and upgrading to R2:  http://blogs.technet.com/kevinholman/archive/2009/05/23/my-experience-upgrading-to-opsmgr-r2-rtm.aspx

My experience upgrading to OpsMgr R2 RTM

$
0
0

I upgraded my test lab from SP1 to R2-RTM this weekend.

 

My current test lab consists of the following servers:

OMRMS – Server 2003 - RMS role

OMMS3 – Server 2008 - MS role, Web Console

OMMS – Server 2003 - MS role, ACS collector

OMDB – Server 2003/SQL 2005 - OperationsManager Database

OMDW – Server 2003/SQL 2005 - OperationsManagerDW database, Reporting, SRS, ACSDB roles

 

There are 18 agents reporting to this management group.

 

So – I start – with a little light reading.

I begin with the release notes.  These are available from the R2 CD, and on the web at Operations Manager 2007 R2 Release Notes  I dont see anything in there that is terribly applicable to me…. but these are good to commit to short term memory – in case we hit a snag during/after the upgrade.

Next – I move on to the Upgrade guide.  This is available on the Technet Library – at Operations Manager 2007 Upgrade Guide  I need to spend a little time on this one, mapping out the pre-upgrade steps, and then planning the order of my upgrade based on how my management group is deployed.

 

So – I start by running down the pre-upgrade checklist at: Preparing to Upgrade Operations Manager 2007

I record my service accounts, make sure my DB’s have plenty of free space, and my t-logs are sized big enough.  I make sure the volume with TempDB has plenty of free disk space in case TempDB needs to auto-grow. 

Next – I map out my plan– and order of operations, for my management group, and share the plan with my team:

  1. Get most recent backup of Database, Encryption key and Export unsealed MP’s for safekeeping.
  2. Go to pending actions – and reject/remove anything in there.
  3. Verify free space on SQL database and validate log size is appropriate.
  4. I need to uninstall the agent from OMTERM – my terminal server which has a console and an agent only.  I decide to go ahead and uninstall the agent, the console, and the SP1 authoring console as well, since I will be replacing it with the R2 auth console.  I will replace the agent and consoles when the upgrade is complete for the management group.
  5. I need to disable all my notification subscriptions, and disabled my product connectors.  I am running a custom internal product connector – which runs as a service and updates alert properties – so I will stop and disable that service for the duration of the upgrade.
  6. I see a section on Improving Upgrade Performance so I will add that step here – right before I upgrade the first component.
  7. I am now ready to establish the upgrade order for my management group – this is available at: Planning your Operations Manager 2007 Upgrade
  8. RMS (OMRMS)
  9. Reporting Server (OMDW)
  10. Stand Alone Consoles (None – I uninstalled this already in my case)
  11. Management Servers (OMMS3, OMMS)
  12. Gateway Servers (None)
  13. Agents
  14. Web Console (on OMMS3 and OMMS)
  15. Post-Upgrade validation steps

Ok – that's my plan.  Time to get rolling.

The SP1 to R2 steps are outlined here:  Upgrading from Operations Manager 2007 SP1 to R2

I know from experience with customers – the success of your upgrade HINGES on how well you read AND follow the upgrade steps – VERBATIM.  The majority of issues we see (especially on clustered RMS) are when a customer does not follow the steps exactly as written, in the correct order.

 

I complete steps 1-7 in the plan above, and then start the RMS upgrade at step 8.  I run “SetupOM.exe” and kick off the pre-req checker before starting the install, where I hit my first snag.  I need to install WS-Management v1.1, because I do plan on monitoring Unix/Linux machines in the future with this management group.  (This was documented in the release notes, and in the upgrade guide – so I was expecting this… I should have added this to my plan)  So I install WS-man from the link provided in the pre-req, which just takes a few minutes.  Now – it looks much better in the pre-req checker:

image

 

The install instructions provided on TechNet are very straightforward.  The install took about 20 minutes for my small environment.  It waited the longest on “Loading Management Packs” on the screen in my environment.  It finally ended with an error:

 

image

 

The guide has a note on this– about the fact you might get a warning that a service failed to start – and to hit OK.  However – this is a different error – this is a service failing to stop…   I click OK, and then a few minutes later – setup completes.  I uncheck the box to start the console and to backup the encryption key.

 

I then ran the RMS upgrade validation steps – checking the registry and the services.  Registry setup version shows me all is good. 

***Note:  We have changed the service display names for R2.  See below:

image

 

I moved on to Reporting.  My SRS, Reporting, and DataWarehouse are all shared on a single server – OMDW.

 

As I read the guide at Upgrading from Operations Manager 2007 SP1 to R2 I notice this little tidbit – which needs to be given STRONG attention before I kick off the upgrade:

Prior to running the upgrade on the Reporting server, you must remove the Operations Manager 2007 agent; the upgrade will fail if this is not done.

So – I kick off the uninstall of the agent on the Reporting/SRS server (OMDW in my case) from Add/Remove programs – before I start the upgrade.  Missing little steps like this will drive you nuts if you aren't methodical.

After the agent uninstall – I pick back up on the guide – and kick off “SetupOM.exe”.  Since I am a freak – I go ahead and run a pre-req check just to make sure all is good:

image

 

Moving on…. I start the install according to the guide.  The install goes without a hitch, and took about 10 minutes to complete.

 

Next up – Management servers.  I start with OMMS3.  I hit the pre-req check – and I notice I already have WS-Man installed – so away I go.  The installer immediately failed with a pre-req failure.  I realized – I have the web console installed on this management server, and I forgot to add that when running the pre-req check manually.  When I do – I see: 

image

 

So – I need to grab the ASP.NET Ajax extensions…. this is to support the new cool health explorer in the Web Console.  I click “More” on the pre-req check – which gives me a link to the download.

After this little hurdle – the management servers upgraded very quickly.  Once again – I got an expected error about a failure to stop a service.

 

image

 

Click ok and setup completes.  I repeat this upgrade on the other management server (OMMS) and these are done.  A quick check of the registry – and the setup version is indeed 6.1.7221.0

 

I don't have any gateways in this lab – so next up is agents.

 

Lucky me – all 18 agents show up in pending actions for an update.  I will approve them all – and let the management server push the update down and upgrade them. 

***Note– do not upgrade more than 299 agents in this manner at a time.  This is documented in the Upgrade Guide.

All my agents upgraded successfully except for two.  BOTH that failed happened to be the two servers that I manually removed the SP1 agent from – OMTERM and OMDW.  (I forgot to delete their “agent managed” object from the management group)  Both have a different error.  OMTERM is failing to install with a push failure for MOMAgentInstaller.  I have had trouble with this agent before – possibly because of the TS role - so I just do a manual agent install here.  OMDW is different – the console push said it was a success – however – the System Center Management Service (HealthService) will not start – it gives an error:

Event Type:    Error
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7024
Date:        5/23/2009
Time:        1:09:37 AM
User:        N/A
Computer:    OMDW
Description:
The System Center Management service terminated with service-specific error 2147500037 (0x80004005).
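The service-specific error is one number printed twice: 2147500037 is the decimal form of the HRESULT 0x80004005 (E_FAIL, a generic "unspecified error"), which is why it says very little on its own. A small Python sketch that splits an HRESULT into its standard severity, facility, and code fields; `decode_hresult` is a hypothetical helper, not part of any OpsMgr tooling:

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code fields."""
    value &= 0xFFFFFFFF  # normalize a signed (negative) value to unsigned 32-bit
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # bit 31: 1 means failure
        "facility": (value >> 16) & 0x7FF,  # bits 16-26
        "code": value & 0xFFFF,             # bits 0-15
    }


if __name__ == "__main__":
    print(decode_hresult(2147500037))
    # {'hex': '0x80004005', 'failure': True, 'facility': 0, 'code': 16389}
```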

I ran a repair action from the console, but got the same error. So I manually uninstalled the broken agent, deleted the agent from the Agent Managed section of the console, and re-pushed the agent. I had a little trouble getting these two to come into the management group, but after a couple of delete/reinstall cycles they finally appear to be working OK. Next time, I’d recommend uninstalling them from the console, which removes both the agent and the computer object from the console.

 

Next on the list:  Web Console

From the upgrade guide I see this note….

If your Web console server is on the same computer as a management server, the Web console server is upgraded when the management server is upgraded, rendering this upgrade procedure unnecessary. You can still run the verification procedure to ensure that the Web console server upgrade was successful.

Good: my web console is not stand-alone. It runs on a management server (OMMS3), so that is already taken care of.

Aha – I found something we forgot in the plan: the ACS Collector. This role is missing from the table at Planning your Operations Manager 2007 Upgrade, so I completely missed it as a planning step. However, the process is documented at Upgrading from Operations Manager 2007 SP1 to R2. So we need to do this; I will assume it goes last, since it is last in the detailed upgrade steps. Following the guide, I walk through the steps with no issues.

 

Looks like we are done!  I will now start the post-upgrade validation steps to make sure my management group is actually working as it should without any major issues.

There is a list of post-upgrade checks at Completing the Post-Upgrade Tasks.

 

I am going to walk through those here:

1.  I open up Discovered Inventory, change the target to “Health Service Watcher”, and compare this to the list I had before the upgrade. These are agents that have a problem from the management server perspective, which causes them to appear “grey” in all other views. My list is the same as before I started: I have 6 in this list as critical. 5 of them are agents on VMs that are currently down, so this is good. 1 of them is an old management server; for some reason we don't groom these out of the view/database, and they seem to stick around forever in this view.

2.  I review the event logs on the RMS and all MS roles. I am seeing some warnings and errors like the ones below:

Event Type:    Warning
Event Source:    HealthService
Event Category:    Health Service
Event ID:    2120
Date:        5/23/2009
Time:        10:02:15 AM
User:        N/A
Computer:    OMRMS
Description:
The Health Service has deleted one or more items for management group "OPS" which could not be sent in 1440 minutes.

This is normal – it happens when you have agents that are down in your environment.

Event Type:    Error
Event Source:    Health Service Modules
Event Category:    Data Warehouse
Event ID:    31552
Date:        5/23/2009
Time:        10:03:38 AM
User:        N/A
Computer:    OMRMS
Description:
Failed to store data in the Data Warehouse.
Exception 'SqlException': Sql execution failed. Error 777971002, Level 16, State 1, Procedure StandardDatasetGroom, Line 303, Message: Sql execution failed. Error 2812, Level 16, State 62, Procedure StandardDatasetGroom, Line 145, Message: Could not find stored procedure 'KMS_EventGroom'.

One or more workflows were affected by this. 

Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
Instance name: KMS Activation Event Data Set
Instance ID: {800D8126-6F72-CA84-A76B-A94F7E3C93CF}
Management group: OPS

This is not normal. It looks like an issue with the KMS MP, and R2’s advanced logging is picking up an error that has been there all along; I just didn't know about it.

That is all from the RMS, so it is pretty clean. On the management servers I found a bit more, but it was all due to the problems I was having with a handful of agents. Once I removed and fixed those agents, the MS logs are clean.

3.  No cluster in this lab, so nothing to test there.

4.  Review alerts in the console. I sort by Repeat Count and Last Modified (I add these columns to all my alert views) and look for anything that stands out as repeating a LOT, or anything new that looks like a problem. I don't see anything here, so that is good!

5.  The DB server looks good in Perfmon. I examine % Processor Time, plus the Logical Disk counters Avg. Disk sec/Read and Avg. Disk sec/Write. Both disk counters average under 15 ms (0.015) on the DB and log volumes, so that looks good. CPU averages under 25%.

6.  Check all the console views.  Much snappier than in SP1.  Nice.

7.  I open up Reporting and run the “Microsoft ODR Report Library > Most Common Alerts” report to test out reporting. It runs with no issues. I test a few of my saved custom and favorite reports: no errors, all good.

8.  The Authoring pane looks good. I can see my groups, monitors, and rules, and wow, they open a LOT faster than before. Very nice.

9.  I check out my MP versions. The install upgraded all my core MP’s to 6.1.7221.0. I was already pretty current on my MP’s, so nothing here needs my immediate attention.

10.  Re-enable notification subscriptions and product connectors. I turn my subscriptions back on and fire off a test event that I use to generate an alert and email me a notification. Works great. Next, I go to my custom product connector, enable the service, and start it back up again. I run some test alerts to make sure my product connector is taking all the necessary actions on the alerts and forwarding them appropriately. All good.

11.  Review My Workspace.  Yep – all my old custom views are there.

12.  Re-deploy agents. I already did this. Perhaps I should have waited on this step, because I spent so much time troubleshooting those last few pesky agents.

13.  Oh – the BIG ONE. This step is a bit odd: we tell you to go run a SQL query. LET ME WARN YOU – this is not a “quick job”. This is the script documented and discussed in my blog post: Does your OpsDB keep growing? Is your localizedtext table using all the space? Don't take this step lightly; running this script could take several hours, so plan accordingly. Read the link above for the details, and consider skipping this step for now until you are sure you are ready to execute it. Use the blog post above to estimate how long it will take and how severely you are impacted (the row count of your localizedtext table), and make sure you have a LOT of free space for tempdb and the tempdb log to grow if needed. My LT table was already really small, so running this was no issue for me; it completed in less than a minute.
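The Perfmon check in step 5 is easy to repeat the same way after every upgrade. A minimal Python sketch, assuming you have exported the counter samples as plain lists of numbers (disk latency in seconds, CPU in percent); the 15 ms and 25% limits are simply the values I looked at above, not official thresholds, and `check_db_server` is a hypothetical helper:

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

DISK_LATENCY_LIMIT_S = 0.015  # 15 ms, the disk latency figure from step 5
CPU_LIMIT_PCT = 25.0          # the average CPU figure from step 5


def check_db_server(read_s: Sequence[float], write_s: Sequence[float],
                    cpu_pct: Sequence[float]) -> Dict[str, Tuple[float, bool]]:
    """Average each counter's samples and report whether that average is under its limit."""
    checks = {
        "Avg. Disk sec/Read": (read_s, DISK_LATENCY_LIMIT_S),
        "Avg. Disk sec/Write": (write_s, DISK_LATENCY_LIMIT_S),
        "% Processor Time": (cpu_pct, CPU_LIMIT_PCT),
    }
    results = {}
    for name, (samples, limit) in checks.items():
        average = mean(samples)
        results[name] = (average, average < limit)
    return results


if __name__ == "__main__":
    # Hypothetical samples, not data from this lab
    for counter, (avg, ok) in check_db_server([0.004, 0.009], [0.006, 0.011], [12.0, 31.0]).items():
        print(f"{counter}: {avg:.4f} {'OK' if ok else 'CHECK'}")
```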

Done!  (with the “official” steps)
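For step 1, comparing two lists by eye works for 18 agents but gets tedious in larger management groups. A minimal Python sketch, assuming you have saved the names of the critical Health Service Watcher objects before and after the upgrade (for example, copied out of the Discovered Inventory view) into two lists; `compare_watcher_lists` is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def compare_watcher_lists(before: Iterable[str], after: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (new_problems, cleared) between two snapshots of critical watcher names."""
    # Normalize case and whitespace so "OMDW" and "omdw " count as the same agent
    before_set = {name.strip().lower() for name in before}
    after_set = {name.strip().lower() for name in after}
    new_problems = sorted(after_set - before_set)  # critical now, but not before the upgrade
    cleared = sorted(before_set - after_set)       # critical before, no longer critical
    return new_problems, cleared


if __name__ == "__main__":
    pre = ["vm1.lab", "vm2.lab", "oldms.lab"]   # hypothetical names
    post = ["vm1.lab", "vm2.lab", "oldms.lab"]
    print(compare_watcher_lists(pre, post))
```

An empty first list is what you want to see; anything in it is an agent that went grey during the upgrade.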

 

Now – I just have a couple cleanup steps I need to do – like go back and install the Ops Console and the Auth Console back on my terminal server.  Did that without issue.  All looks good.

 

And then I realize we are missing another step in our plan, under the post-upgrade tasks: make sure the web console is working! I saw lots of items in the release notes about how this might break, and I imagine someone will complain rather quickly if it isn't working, so we better go check that out.

Sweet! I hit up the web console and it is all good. I check out several of the new views and run Health Explorer from the web console. I have tasks, maintenance mode, and Health Explorer. Very cool. I even execute some of my favorite reports under “My Workspace” just to make sure those are good. Ouch, not working. I will have to look into that one.

 

Ok – that’s enough for today.  All in all – a successful upgrade.  A good plan written out at the beginning, based on the upgrade guide - makes all the difference.

How to force the Web Console to open a specific view, instead of the default Monitoring Overview


 

When you open the Web Console – the default view will open to a “Monitoring Overview” pane.  This view, in very large environments, can take considerable time to finish loading, before you can select any other views.  Sometimes, loading this view may time out as well.

 

image

 

 

Here is a way to load a specific view by default.  This helps when you have customers that only need to see a very specific pre-configured alert or state view.

The format is: 

 

http://webconsoleserver:51908/default.aspx?ViewID=8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98&ViewType=AlertView

 

Let's break that all down:

 

The first part of the URL is a constant; this should be self-explanatory:

http://webconsoleserver:51908/default.aspx?ViewID=

The next part is the ID of the view. These IDs are constant for default built-in views and for views from Microsoft MP’s. Your custom MP’s will have their own unique view IDs. I will talk more about how to find these IDs below. My example ID is for the “Active Alerts” view at the top of the console view list.

8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98

The last part is the ViewType. This tells the console whether we are dealing with an AlertView, StateView, or PerformanceView:

&ViewType=AlertView
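If you hand out these links to many users, assembling them by hand is error-prone. A minimal Python sketch that puts the three parts together, assuming the default port 51908 used in the examples here; `web_console_view_url` is a hypothetical helper, and it only accepts the three view types described above:

```python
import uuid
from urllib.parse import urlencode

# The three view types described above; other view types are not covered by this sketch.
VIEW_TYPES = {"AlertView", "StateView", "PerformanceView"}


def web_console_view_url(server: str, view_id: str, view_type: str, port: int = 51908) -> str:
    """Build a Web Console URL that opens directly on the given view."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unsupported ViewType: {view_type!r}")
    uuid.UUID(view_id)  # raises ValueError if view_id is not a well-formed GUID
    query = urlencode({"ViewID": view_id.upper(), "ViewType": view_type})
    return f"http://{server}:{port}/default.aspx?{query}"


if __name__ == "__main__":
    print(web_console_view_url("webconsoleserver", "8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98", "AlertView"))
    # http://webconsoleserver:51908/default.aspx?ViewID=8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98&ViewType=AlertView
```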

 

Here is a SQL query to get the IDs of all views from the OperationsManager database (OpsDB):

-- List every view in the OpsDB with its ID, view type, and owning MP
select vv.id as 'View Id',
vv.displayname as 'View DisplayName',
vv.name as 'View Name',
vtv.DisplayName as 'ViewType',
mpv.FriendlyName as 'MP Name'
from ViewsView vv
inner join managementpackview mpv on mpv.id = vv.managementpackid
inner join viewtypeview vtv on vtv.id = vv.monitoringviewtypeid
-- Optional filters: uncomment ONE (use AND instead of a second WHERE to combine them)
--where mpv.FriendlyName like '%default%'
--where vv.displayname like '%Service%'
order by mpv.FriendlyName, vv.displayname

 

Here are some examples for built-in views:

 

“Active Alerts” view at the top:

http://webconsoleserver:51908/default.aspx?ViewID=8DB1F5A7-F3F3-2646-6C6B-E34672F7ED98&ViewType=AlertView

“Windows Computers” state view:

http://webconsoleserver:51908/default.aspx?ViewID=E3D720DE-F6DD-185C-6FDC-0832377D910A&ViewType=StateView

“Operating System Performance” view from the BaseOS (Microsoft Windows Server) MP:

http://webconsoleserver:51908/default.aspx?ViewID=9B216021-6E88-EF6D-2A97-9E3EA1D6AD3B&ViewType=PerformanceView

 

Now you can give a specific user this URL for their Favorites, if they want to open the Web Console on this specific view.

Common Issues with the OpsMgr Web Console


This post is a collection of the most common issues and resolutions I see with the SCOM 2007 R2 Web Console:

 

Windows Auth on a non-RMS server. The most common issue is typically where customers install the Web Console using Windows Authentication on some server that is not the RMS. Initially, when SCOM was released, this wasn't a supported configuration. Later, post SP1, instructions were released on how to make this work, but it required changes in Active Directory for Kerberos constrained delegation. This can be done now, but in real practice, I have found it to be hit-or-miss. In some domains there are no issues; in others, it breaks easily, or only works some of the time. In general, my recommendation is to use Forms Based authentication with SSL for Web Consoles that are not installed on the RMS. If you want to give constrained delegation a shot, there are several articles on the web, including mine:

http://blogs.technet.com/kevinholman/archive/2008/09/24/installing-the-web-console-on-a-2008-management-server-using-windows-authentication.aspx

 

image

Unexpected error immediately after install. Another common issue is that you install the Web Console using Forms Based authentication, but get an “unexpected error”:

Unexpected error
There was an error displaying the page you requested.

Try the following:

Restart the web browser
Refresh the page
If you are still not able to view the requested page, try contacting your administrator or Helpdesk

image

This is caused by installing the Web Console using Forms based authentication, but not having SSL set up yet.  Marnix wrote a great post on this topic:  http://thoughtsonopsmgr.blogspot.com/2010/03/scom-web-console-with-form-based.html

Bottom line: we recommend setting up SSL on IIS for the Web Console, but if you don't have this set up yet, you need to edit the Web.Config file and change:

<authentication mode="Forms">
  <forms requireSSL="true" />
</authentication>

to

<authentication mode="Forms">
  <forms requireSSL="false" />
</authentication>
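If you need to make this change on several Web Console servers, it can be scripted. A minimal Python sketch, assuming the `<configuration>` element has no XML namespace and that you keep a backup of the original Web.Config; `set_forms_require_ssl` is a hypothetical helper. ElementTree may not preserve the original whitespace or XML declaration exactly, which is another reason to keep that backup. Comments are kept (this needs Python 3.8 or later):

```python
import xml.etree.ElementTree as ET


def set_forms_require_ssl(config_xml: str, require_ssl: bool) -> str:
    """Return the Web.Config text with requireSSL set on every Forms authentication <forms> element."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))  # keep <!-- comments -->
    root = ET.fromstring(config_xml, parser=parser)
    forms = root.findall(".//authentication[@mode='Forms']/forms")
    if not forms:
        raise ValueError("no <authentication mode='Forms'><forms> element found")
    for element in forms:
        element.set("requireSSL", "true" if require_ssl else "false")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<configuration><system.web><authentication mode="Forms">'
              '<forms requireSSL="true" /></authentication></system.web></configuration>')
    print(set_forms_require_ssl(sample, False))
```

To use it against a real server, read Web.Config into a string, call the function, and write the result back after copying the original aside. Set it back to true once SSL is configured in IIS.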

 

image

Web Console takes a long time to load and ASP.NET logs Event ID: 1309

 

You might see the web console take a very long time to load, and eventually even time out with:

Event message: The request has been aborted.
Exception information:
Exception type: HttpException
Exception message: Request timed out.

You might also get the following event in your application event log:

Event Type: Warning
Event Source: ASP.NET 2.0.50727.0
Event Category: Web Event
Event ID: 1309
Date: 4/07/2010
Time: 1:18:54 AM
User: N/A
Computer:
Description:
Event code: 3001
Event message: The request has been aborted.
Event time: 4/07/2010 1:18:54 AM
Event ID: 0fde28a70ef54acf9e269f9bed4eb70f
Event sequence: 6
Event occurrence: 1
Event detail code: 0
Application information:
Application domain: /LM/W3SVC/2/ROOT-1-129114331599062251
Trust level: Full
Application Virtual Path: /
Application Path: C:\Program Files\System Center Operations Manager 2007\Web Console\
Machine name:
Process information:
Process ID: 4460
Process name: w3wp.exe
Account name: NT AUTHORITY\NETWORK SERVICE
Exception information:
Exception type: HttpException
Exception message: Request timed out.
Request information:
Request URL: http://omms1.opsmgr.net:51908/login.aspx?ReturnUrl=%2fDefault.aspx
Request path: /login.aspx
User host address:
User: Domain\Username
Is authenticated: True
Authentication Type: Negotiate
Thread account name: NT AUTHORITY\NETWORK SERVICE
Thread information:
Thread ID: 1
Thread account name: NT AUTHORITY\NETWORK SERVICE
Is impersonating: False
Stack trace:

This can be caused by the IIS configuration: the SCOM Web Console website is configured to use Anonymous AND Windows Integrated Authentication, when it should be configured to use ONLY Windows Integrated Authentication.

To resolve – you can uncheck Anonymous Authentication and then perform an iisreset.

 

Cannot install/uninstall the web console.

Oftentimes this is an issue for Web Consoles that were upgraded from SP1 to R2. Many times, it is due to old path information in the registry.

1. Go to registry editor on web console server.
2. Locate “My Computer\HKEY_CLASSES_ROOT\Installer\Products\DF6E5EFF035E66C49971553D96AA0E4D”.
3. Back up the registry key in step 2
4. Go to the Patches value and delete the entry for “Patches” (REG_MULTI_SZ).
5. Continue with the upgrade/uninstall/install.

 

Error message when you try to access the Web console: "HTTP Error 401.2 – Unauthorized. You are not authorized to view this page due to invalid authentication headers"

See:  http://support.microsoft.com/kb/970043

 

 

 

image

Other good articles:

 

How to set up SSL for your SCOM R2 Web Console: How to configure the SCOM R2 Web Console to use SSL only

All common customization options for the Web Console via the Web.Config file:  http://blogs.technet.com/michaelpearson/archive/2009/11/30/opsmgr-r2-web-console-web-config-settings.aspx

How to change the Web Console to force it to open a specific default view, instead of the default overview:  http://blogs.technet.com/kevinholman/archive/2009/11/09/how-to-force-the-web-console-to-open-a-specific-view-instead-of-the-default-monitoring-overview.aspx

Really making a web console powerful – check out Savision LiveMaps:  http://thoughtsonopsmgr.blogspot.com/2009/11/savision-live-maps-version-41-for.html

How to install IIS on Server 2008 for the Web Console:  http://blogs.technet.com/kevinholman/archive/2008/09/26/how-to-install-iis-on-server-2008-to-support-opsmgr-web-console-and-reporting.aspx

Upgrade issues with the web console and upgrading to R2:  http://blogs.technet.com/kevinholman/archive/2009/05/23/my-experience-upgrading-to-opsmgr-r2-rtm.aspx

OpsMgr 2012: Web Console issue immediately after upgrade to SP1


 

Had an interesting call with a customer.  He had a working SCOM 2012 RTM environment, and applied SP1, and the service pack upgrade appeared to immediately break the web console with the following error:

 

image

 

In the Application log on the web console server, we saw the event shown at the end of this article, dealing with “Could not load type 'System.ServiceModel.Activation.HttpModule'”.

 

This was caused by a prerequisite in SP1 that was not a blocking prereq in RTM. When he applied the SP1 upgrade, he was prompted to add “HTTP Activation” to the role services of the OS. Once it was added, he was able to continue the upgrade.

HOWEVER – this leaves IIS in a semi-broken state, and requires re-registering ASP.NET in IIS to correct.

On Server 2008 R2 – run the following in an elevated CMD:  C:\Windows\Microsoft.NET\Framework64\v4.0.30319>aspnet_regiis.exe -i -enable  

On Server 2012 - run the following in an elevated CMD:  C:\Windows\Microsoft.NET\Framework64\v4.0.30319>aspnet_regiis.exe -r  
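If you fix several web console servers this way, a small lookup keeps the right switches with the right OS. A minimal Python sketch that only encodes the two commands listed above; `aspnet_regiis_command` and the OS labels are hypothetical, and no other OS versions are covered:

```python
from pathlib import PureWindowsPath
from typing import List

# .NET 4.0 x64 framework folder from the commands above
FRAMEWORK64_V4 = PureWindowsPath(r"C:\Windows\Microsoft.NET\Framework64\v4.0.30319")

# Switches per OS, exactly as listed above
ASPNET_REGIIS_ARGS = {
    "Server 2008 R2": ["-i", "-enable"],
    "Server 2012": ["-r"],
}


def aspnet_regiis_command(os_name: str) -> List[str]:
    """Return the aspnet_regiis.exe command line, as an argument list, for the given OS."""
    try:
        switches = ASPNET_REGIIS_ARGS[os_name]
    except KeyError:
        raise ValueError(f"no command listed for {os_name!r}") from None
    return [str(FRAMEWORK64_V4 / "aspnet_regiis.exe"), *switches]


if __name__ == "__main__":
    print(" ".join(aspnet_regiis_command("Server 2008 R2")))
```

On the web console server itself, from an elevated session, the returned list could be passed to `subprocess.run(..., check=True)`.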

 

 

Offending event:

 

 

Log Name:      Application
Source:        ASP.NET 4.0.30319.0
Date:          1/11/2013 10:33:19 AM
Event ID:      1310
Task Category: Web Event
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      SERVERNAME.DOMAIN.COM
Description:
Event code: 3008
Event message: A configuration error has occurred.
Event time: 1/11/2013 10:33:19 AM
Event time (UTC): 1/11/2013 4:33:19 PM
Event ID: 3c5b3b4438db4c52992734b9f5ef157b
Event sequence: 1
Event occurrence: 1
Event detail code: 0
Application information:
    Application domain: /LM/W3SVC/1/ROOT/OperationsManager-2-130023955997091166
    Trust level: Full
    Application Virtual Path: /OperationsManager
    Application Path: C:\Program Files\System Center 2012\Operations Manager\WebConsole\WebHost\
    Machine name: SERVERNAME
Process information:
    Process ID: 4600
    Process name: w3wp.exe
    Account name: IIS APPPOOL\OperationsManager
Exception information:
    Exception type: ConfigurationErrorsException
    Exception message: Could not load type 'System.ServiceModel.Activation.HttpModule' from assembly 'System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'.
   at System.Web.Configuration.ConfigUtil.GetType(String typeName, String propertyName, ConfigurationElement configElement, XmlNode node, Boolean checkAptcaBit, Boolean ignoreCase)
   at System.Web.Configuration.Common.ModulesEntry.SecureGetType(String typeName, String propertyName, ConfigurationElement configElement)
   at System.Web.Configuration.Common.ModulesEntry..ctor(String name, String typeName, String propertyName, ConfigurationElement configElement)
   at System.Web.HttpApplication.BuildIntegratedModuleCollection(List`1 moduleList)
   at System.Web.HttpApplication.GetModuleCollection(IntPtr appContext)
   at System.Web.HttpApplication.RegisterEventSubscriptionsWithIIS(IntPtr appContext, HttpContext context, MethodInfo[] handlers)
   at System.Web.HttpApplication.InitSpecial(HttpApplicationState state, MethodInfo[] handlers, IntPtr appContext, HttpContext context)
   at System.Web.HttpApplicationFactory.GetSpecialApplicationInstance(IntPtr appContext, HttpContext context)
   at System.Web.Hosting.PipelineRuntime.InitializeApplication(IntPtr appContext)
Could not load type 'System.ServiceModel.Activation.HttpModule' from assembly 'System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'.
   at System.RuntimeTypeHandle.GetTypeByName(String name, Boolean throwOnError, Boolean ignoreCase, Boolean reflectionOnly, StackCrawlMarkHandle stackMark, Boolean loadTypeFromPartialName, ObjectHandleOnStack type)
   at System.RuntimeTypeHandle.GetTypeByName(String name, Boolean throwOnError, Boolean ignoreCase, Boolean reflectionOnly, StackCrawlMark& stackMark, Boolean loadTypeFromPartialName)
   at System.Type.GetType(String typeName, Boolean throwOnError, Boolean ignoreCase)
   at System.Web.Compilation.BuildManager.GetType(String typeName, Boolean throwOnError, Boolean ignoreCase)
   at System.Web.Configuration.ConfigUtil.GetType(String typeName, String propertyName, ConfigurationElement configElement, XmlNode node, Boolean checkAptcaBit, Boolean ignoreCase)
Request information:
    Request URL: http://localhost/OperationsManager
    Request path: /OperationsManager
    User host address: ::1
    User: 
    Is authenticated: False
    Authentication Type: 
    Thread account name: IIS APPPOOL\OperationsManager
Thread information:
    Thread ID: 10
    Thread account name: IIS APPPOOL\OperationsManager
    Is impersonating: False
    Stack trace:    at System.Web.Configuration.ConfigUtil.GetType(String typeName, String propertyName, ConfigurationElement configElement, XmlNode node, Boolean checkAptcaBit, Boolean ignoreCase)
   at System.Web.Configuration.Common.ModulesEntry.SecureGetType(String typeName, String propertyName, ConfigurationElement configElement)
   at System.Web.Configuration.Common.ModulesEntry..ctor(String name, String typeName, String propertyName, ConfigurationElement configElement)
   at System.Web.HttpApplication.BuildIntegratedModuleCollection(List`1 moduleList)
   at System.Web.HttpApplication.GetModuleCollection(IntPtr appContext)
   at System.Web.HttpApplication.RegisterEventSubscriptionsWithIIS(IntPtr appContext, HttpContext context, MethodInfo[] handlers)
   at System.Web.HttpApplication.InitSpecial(HttpApplicationState state, MethodInfo[] handlers, IntPtr appContext, HttpContext context)
   at System.Web.HttpApplicationFactory.GetSpecialApplicationInstance(IntPtr appContext, HttpContext context)
   at System.Web.Hosting.PipelineRuntime.InitializeApplication(IntPtr appContext)
Custom event details:

System Center Universe is coming – January 19th!


 

REGISTER NOW HERE:  http://www.systemcenteruniverse.com/

image

 

Read Cameron Fuller’s blog post on this here:  http://blogs.catapultsystems.com/cfuller/archive/2015/12/17/scuniverse-returns-to-dallas-tx-and-the-world-on-january-19th-2016/

 

 

SCU is an awesome day of sessions covering Microsoft System Center, Windows Server, and Azure technologies from top speakers including Microsoft experts and MVP’s in the field.

There are two tracks depending on your interests – Cloud and Datacenter Management, and Enterprise Client Management.

The sponsors for 2016 include:

  • Catapult Systems
  • Microsoft
  • Veeam
  • Adaptiva
  • Secunia
  • Heat Software
  • MPx Alliance
  • Squared Up
  • Cireson

If you cannot attend in person – you can still attend via simulcast!  If you want to attend virtually, there are user group based simulcast locations around the world. Registration is available at: http://www.systemcenteruniverse.com/venue.htm

Simulcast event locations include:

  • Austin, TX
  • Denver, CO
  • Houston, TX
  • Omaha, NE
  • Phoenix, AZ
  • San Antonio, TX
  • Seattle, WA
  • Tampa, FL
  • Amsterdam
  • Germany
  • Vienna
  • And of course our event location in Dallas, TX!

If you want to attend in person, the event is in Dallas, Texas, and registration is available at: https://www.eventbrite.com/e/scu-2016-live-tickets-7970023555


UR8 for SCOM 2012 R2 – Step by Step


 

image

 

NOTE:  I get this question every time we release an update rollup: ALL SCOM Update Rollups are CUMULATIVE. This means you do not need to apply them in order; you can always just apply the latest update. If you have deployed SCOM 2012 R2 and never applied an update rollup, you can go straight to the latest one available. If you applied an older one (such as UR3), you can always go straight to the latest one!
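Put another way, the only question is whether a newer rollup exists. A tiny Python sketch of that rule, assuming rollups are identified by their UR number; `rollups_to_apply` is a hypothetical helper:

```python
from typing import Iterable, List


def rollups_to_apply(installed: int, available: Iterable[int]) -> List[int]:
    """Return the single rollup to install, given that SCOM update rollups are cumulative.

    `installed` is the UR number currently applied (0 means none);
    `available` lists the released UR numbers.
    """
    newer = [ur for ur in available if ur > installed]
    return [max(newer)] if newer else []  # skip every intermediate rollup


if __name__ == "__main__":
    print(rollups_to_apply(3, range(1, 9)))
```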

 

 

KB Article for OpsMgr:  https://support.microsoft.com/en-us/kb/3096382

KB Article for all System Center components:  https://support.microsoft.com/en-us/kb/3096378

Download catalog site:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=3096382

 

Key fixes:

  • Slow load of alert view when it is opened by an operator
    Sometimes when operators switch between alert views, the views take up to two minutes to load. After this update rollup is installed, the reported performance issue is resolved. Alert view load time for the Operator role is now almost the same as for an Admin role user.
  • SCOMpercentageCPUTimeCounter.vbs causes enterprise wide performance issue
    Health Service encountered slow performance every five to six (5-6) minutes in a cyclical manner. This update rollup resolves this issue.
  • System Center Operations Manager Event ID 33333 Message: The statement has been terminated.
    This change filters out "statement has been terminated" warnings that SQL Server throws. These warning messages cannot be acted on. Therefore, they are removed.
  • System Center 2012 R2 Operations Manager: Report event 21404 occurs with error '0x80070057' after Update Rollup 3 or Update Rollup 4 is applied.
    In Update Rollup 3, a design change was made in the agent code that regressed and caused SCOM agent to report error ‘0x80070057’ and MonitoringHost.exe to stop responding/crash in some scenarios. This update rollup rolls back that UR3 change.
  • SDK service crashes because of Callback exceptions from event handlers being NULL
    In a connected management group environment in certain race condition scenarios, the SDK of the local management group crashes if there are issues during the connection to the different management groups. After this update rollup is installed, the SDK of the local management group should no longer crash.
  • Run As Account(s) Expiring Soon — Alert does not raise early enough
    The 14-day warning for the RunAs account expiration was not visible in the SCOM console. Customers received only an Error event in the console three days before the account expiration. After this update rollup is installed, customers will receive a warning in their SCOM console 14 days before the RunAs account expiration, and receive an Error event three (3) days before the RunAs account expiration.
  • Network Device Certification
    As part of Network device certification, we have certified the following additional devices in Operations Manager to make extended monitoring available for them:
    • Cisco ASA5515
    • Cisco ASA5525
    • Cisco ASA5545
    • Cisco IPS 4345
    • Cisco Nexus 3172PQ
    • Cisco ASA5515-IPS
    • Cisco ASA5545-IPS
    • F5 Networks BIG-IP 2000
    • Dell S4048
    • Dell S3048
    • Cisco ASA5515sc
    • Cisco ASA5545sc
  • French translation of APM abbreviation is misleading
    The French translation of “System Center Management APM service” is misleading. APM abbreviation is translated incorrectly in the French version of Microsoft System Center 2012 R2 Operations Manager. APM means “Application Performance Monitoring” but is translated as “Advanced Power Management." This fix corrects the translation.
  • p_HealthServiceRouteForTaskByManagedEntityId does not account for deleted resource pool members in System Center 2012 R2 Operations Manager
    If customers use Resource Pools and take some servers out of the pool, discovery tasks start failing in some scenarios. After this update rollup is installed, these issues are resolved.
  • Exception in the 'Managed Computer' view when you select Properties of a managed server in Operations Manager Console
    In the Operations Manager Server “Managed Computer” view on the Administrator tab, clicking the “Properties” button of a management server causes an error. After this update rollup is installed, a dialog box that contains a “Heart Beat” tab is displayed.
  • Duplicate entries for devices when network discovery runs
    When customers run discovery tasks to discover network devices, duplicate network devices that have alternative MAC addresses are discovered in some scenarios. After this update rollup is installed, customers will not receive any duplicate devices discovered in their environments.
  • Preferred Partner Program in Administration Pane
    This update lets customers view certified System Center Operations Manager partner solutions directly from the console. Customers can obtain an overview of the partner solutions and visit the partner websites to download and install the solutions.
There are no updates for Linux, and there are no updated MP’s for Linux in this update.

 

Let's get started.

From reading the KB article – the order of operations is:

  1. Install the update rollup package on the following server infrastructure:
    • Management servers
    • Gateway servers
    • Web console server role computers
    • Operations console role computers
  2. Apply SQL scripts.
  3. Manually import the management packs.
  4. Update Agents

Now, NORMALLY we need to add another step if we are using Xplat monitoring: update the Linux/Unix MP’s and agents. However, in UR8 for SCOM 2012 R2, there are no updates for Linux.

 

 

 

1.  Management Servers

image

Since there is no RMS anymore, it doesn’t matter which management server I start with. There is no need to begin with whichever server holds the RMSE role. I simply make sure I only patch one management server at a time, to allow for agent failover without overloading any single management server.
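The one-at-a-time rule is the important part, whichever server goes first. A minimal Python sketch of that loop, assuming a hypothetical `patch` callable that installs the update on one server, reboots it, and returns True only once that server is healthy again; only SCOM01 comes from this walkthrough, the other server names are hypothetical:

```python
from typing import Callable, Iterable, List


def patch_one_at_a_time(servers: Iterable[str], patch: Callable[[str], bool]) -> List[str]:
    """Patch management servers strictly one after another.

    The next server is not touched until `patch` reports success for the previous one,
    so at most one management server is down at any moment and agents can fail over.
    """
    completed: List[str] = []
    for server in servers:
        if not patch(server):
            raise RuntimeError(f"update failed on {server}; stopping before the next server")
        completed.append(server)
    return completed


if __name__ == "__main__":
    def pretend_patch(name: str) -> bool:  # stand-in for the real install/reboot/health check
        print(f"patching {name}")
        return True

    print(patch_one_at_a_time(["SCOM01", "SCOM02", "SCOM03"], pretend_patch))
```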

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 3 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

The first thing I do when I download the updates from the catalog is copy the cab files for my language to a single location:

image

Then extract the contents:

image

Once I have the MSP files, I am ready to start applying the update to each server by role.

***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator (SA) role to the database instances that host your OpsMgr databases.

My first server is a management server, hosts the web console, and has the OpsMgr console installed, so I copy those update files locally and execute them per the KB, from an elevated command prompt:

image

This launches a quick UI which applies the update. It will bounce the SCOM services as well. The update usually does not provide any feedback about whether it succeeded or failed.

I got a prompt to restart:

image

I choose yes and allow the server to restart to complete the update.

 

You can check the application log for the MsiInstaller events to show completion:

Log Name:      Application
Source:        MsiInstaller
Event ID:      1036
Level:         Information
Computer:      SCOM01.opsmgr.net
Description:
Windows Installer installed an update. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2012 R2 Operations Manager UR8 Update Patch. Installation success or error status: 0.
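Because the update itself gives little feedback, this event is the easiest place to confirm the result; a status of 0 means success. A minimal Python sketch that pulls the update name and status out of the event description text, assuming the wording matches the event above; `parse_msi_update_event` is a hypothetical helper and does not read the event log itself:

```python
import re

_STATUS = re.compile(r"Installation success or error status: (\d+)\.")
_UPDATE = re.compile(r"Update Name: (.+?)\. Installation")


def parse_msi_update_event(description: str) -> dict:
    """Extract the update name and status code from an MsiInstaller 1036 description."""
    status = _STATUS.search(description)
    if status is None:
        raise ValueError("not an MsiInstaller update event description")
    update = _UPDATE.search(description)
    code = int(status.group(1))
    return {
        "update": update.group(1) if update else None,
        "status": code,
        "succeeded": code == 0,  # 0 means the update installed successfully
    }
```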

You can also spot check a couple DLL files for the file version attribute. 

image

Next up – run the Web Console update:

image

This runs much faster.   A quick file spot check:

image

Lastly – install the console update (make sure your console is closed):

image

A quick file spot check:

image

 

 

Secondary Management Servers:

image

I now move on to my secondary management servers, applying the server update, then the console update. 

On this next management server, I will use the example of Windows Update as opposed to manually installing the MSP files.  I check online, and make sure that I have configured Windows Update to give me updates for additional products.

Apparently the catalog was broken when I tried this, because none of the System Center updates were showing up in Windows Update.

Because of this, I elect to do manual updates like I did above.

I apply these updates, and reboot each management server, until all management servers are updated.

 

 

 

Updating Gateways:

image

I can use Windows Update or manual installation.

image

The update launches a UI and quickly finishes.

Then I will spot check the DLL’s:

image

I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

image

 

 

 

2. Apply the SQL Scripts

In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(Note: your path may vary slightly depending on whether you have an upgraded environment or a clean install.)

image

First – let’s run the script to update the OperationsManager database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file.  Make sure it is pointing to your OperationsManager database, then execute the script.

You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.  I have had customers state this takes from a few minutes to as long as an hour. In MOST cases, you will need to shut down the SDK, Config, and Monitoring Agent (HealthService) services on ALL your management servers for this script to run successfully.

You will see the following (or similar) output:

image47

or

image

IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment, you almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.
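To stop the services on every management server before running the script, and start them again afterwards, you can generate `sc.exe \\server stop|start <service>` commands. The short service names used here are my assumption and should be verified with `sc query` on your servers: OMSDK for SDK (Data Access), cshost for Config, and HealthService for Monitoring Agent. The server names in the usage lines are hypothetical.

```python
# Assumed short names: OMSDK (SDK / Data Access), cshost (Config), HealthService (Monitoring Agent).
SCOM_SERVICES = ("OMSDK", "cshost", "HealthService")

def service_commands(management_servers, action):
    """Build sc.exe commands to stop or start the SCOM services on every management server."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [["sc.exe", "\\\\" + server, action, service]
            for server in management_servers
            for service in SCOM_SERVICES]

# Hypothetical server names.
for cmd in service_commands(["SCOM01", "SCOM02"], "stop"):
    print(" ".join(cmd))
```

Note that `sc.exe stop` returns as soon as the stop request is sent. Check that the services actually reached the stopped state before you execute the SQL script.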

Technical tidbit:   Even if you previously ran this script in UR1, UR2, UR3, UR4, UR5, UR6, or UR7, you should run this again for UR8, as the script body can change with updated UR’s.

image

Next, we have a script to run against the warehouse DB.  Do not skip this step under any circumstances.    From:

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(Note: your path may vary slightly depending on whether you have an upgraded environment or a clean install.)

Open a SQL management studio query window, connect it to your OperationsManagerDW database, and then open the script file UR_Datawarehouse.sql.  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

If you see a warning about line endings, choose Yes to continue.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image

 

 

 

3. Manually import the management packs

image

There are 26 management packs in this update!

The path for these is on your management server, after you have installed the “Server” update:

\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups

However, the majority of them are Advisor/OMS and language-specific.  Only import the ones you need, and that are correct for your language.  I will remove all the Advisor MP’s for other languages, and I am left with the following:

image
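Weeding out the other languages can be scripted, as long as the language-specific files carry a three-letter language code just before the extension (for example `.DEU.mpb`). That naming pattern is an assumption, and the code set below is a hypothetical subset, so review the result before importing anything.

```python
import re

# Hypothetical subset of three-letter language codes used in localized file names.
LANGUAGE_CODES = {"CHS", "CHT", "CSY", "DEU", "ENU", "ESN", "FRA", "HUN", "ITA",
                  "JPN", "KOR", "NLD", "PLK", "PTB", "PTG", "RUS", "SVE", "TRK"}
LANG_SUFFIX = re.compile(r"\.([A-Za-z]{3})\.mpb?$")

def keep_for_language(filename, language="ENU"):
    """Keep language-neutral MPs plus the ones for the chosen language."""
    match = LANG_SUFFIX.search(filename)
    if match and match.group(1).upper() in LANGUAGE_CODES:
        return match.group(1).upper() == language
    return True  # no language code before the extension: not language-specific

files = ["Microsoft.SystemCenter.Advisor.Resources.ENU.mpb",
         "Microsoft.SystemCenter.Advisor.Resources.DEU.mpb",
         "Microsoft.SystemCenter.Visualization.Library.mpb"]
print([f for f in files if keep_for_language(f)])
```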

The TFS MP bundles are only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

The Advisor MP’s are only needed if you are using the Microsoft Operations Management Suite cloud service (previously known as Advisor and Operational Insights).

However, the Image and Visualization libraries deal with Dashboard updates, and these always need to be updated.

I import all of these shown without issue.

 

 

4.  Update Agents

image43_thumb

Agents should be placed into pending actions by this update (mine worked great) for any agent that was not manually installed (remotely manageable = yes).  On the management servers where I used Windows Update to patch them, their agents did not show up in this list.  Only agents whose management server I patched manually showed up in this list.  FYI.

image

If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or having manually installed agents which will not be placed into pending.

In this case – my agents that were reporting to a management server that was updated using Windows Update – did NOT place agents into pending.  Only the agents reporting to the management server for which I manually executed the patch worked.

You can approve these – which will result in a success message once complete:

image

Soon you should start to see PatchList getting filled in within the Agents By Version view, under the Operations Manager monitoring folder in the console:

image
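If you export that view, for example as agent name and PatchList pairs, a short filter can list the agents that still don't show the rollup. This is a sketch only. The data shape and the PatchList text in the usage lines are hypothetical, and it simply looks for the UR label as a whole word.

```python
import re

def agents_missing_update(patch_lists, marker="UR8"):
    """Return agents whose PatchList text does not mention the marker as a whole word."""
    pattern = re.compile(r"\b" + re.escape(marker) + r"\b")
    return sorted(agent for agent, patches in patch_lists.items()
                  if not pattern.search(patches or ""))

# Hypothetical export: agent name -> PatchList text (None or "" when empty).
exported = {"web01.opsmgr.net": "System Center 2012 R2 Operations Manager UR8 Update Patch.",
            "sql01.opsmgr.net": ""}
print(agents_missing_update(exported))
```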

 

 

 

5.  Update Unix/Linux MPs and Agents

image

There are no updates for Linux in UR8.  Please see the instructions for UR7 if you are not updating from UR7 directly:

http://blogs.technet.com/b/kevinholman/archive/2015/08/17/ur7-for-scom-2012-r2-step-by-step.aspx

 

 

6.  Update the remaining deployed consoles

image

This is an important step.  I have consoles deployed around my infrastructure: on my Orchestrator server, my SCVMM server, my personal workstation, the workstations of all the other SCOM admins on my team, a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

 

 

 

Review:

Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

image

Known issues:

See the existing list of known issues documented in the KB article.

1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop all the SCOM services on the management servers, and/or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

——————————————————
(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
——————————————————–
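The advice above is to re-run the script until it completes without errors. That amounts to a retry loop, sketched below. `run_script` is a hypothetical callable that executes the SQL script and returns its text output, for example by wrapping sqlcmd. Any line starting with `Msg ` is treated as an error, matching the output shown above.

```python
import time

def run_until_clean(run_script, max_attempts=5, delay_seconds=60, sleep=time.sleep):
    """Re-run run_script() until its output has no SQL 'Msg' error lines.

    Returns the attempt number that succeeded; raises RuntimeError if every attempt fails.
    """
    first_error = None
    for attempt in range(1, max_attempts + 1):
        output = run_script()
        errors = [line for line in output.splitlines() if line.startswith("Msg ")]
        if not errors:
            return attempt
        first_error = errors[0]
        if attempt < max_attempts:
            sleep(delay_seconds)  # time to stop services or let other sessions finish
    raise RuntimeError(f"still failing after {max_attempts} attempts: {first_error}")
```

In a busy management group, stop the SCOM services first, as described above. Otherwise each retry is likely to deadlock again.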

Writing a service recovery script – Cluster service example


 

I had a customer request the ability to monitor the cluster service on clusters, and ONLY alert when a recovery attempt failed.

This is a fairly standard request for service monitoring when we use recoveries – we generally don’t want an alert to be generated from the Service Monitor, because that will be immediate upon service down detection.  We want the service monitor to detect the service down, then run a recovery, and then if the recovery fails to restore service, generate an alert.
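The desired behavior can be modeled as a small decision function: detect, recover, and alert only if the recovery failed. This is an illustrative sketch of the pattern, not how SCOM implements monitors. `is_running`, `try_recovery` and `raise_alert` are hypothetical callables standing in for the service check, the recovery script and the alert rule.

```python
def handle_service_down(is_running, try_recovery, raise_alert):
    """Detect -> recover -> alert only if the recovery did not restore the service."""
    if is_running():
        return "healthy"      # nothing to do
    try_recovery()            # e.g. the recovery script starting the service
    if is_running():
        return "recovered"    # service is back: no alert
    raise_alert("Service is down and recovery failed; manual intervention required")
    return "alerted"
```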

Here is an example of that.

The cluster service monitor is unique in that it already has a built-in recovery.  However, it is too simple for our needs, as it only runs NET START.

image

 

So the first thing we need to do is create an override disabling this built-in recovery:

image

 

Next – override the “Cluster service status” monitor to not generate alerts:

image

 

Now we can add our own script-based recovery to the monitor:

image

 

image

 

Then paste in the following script:

'==========================================================================
'
' COMMENT: This is a recovery script to recover the Cluster Service
'
'==========================================================================
Option Explicit
SetLocale("en-us")

Dim StartTime,EndTime,sTime
'Capture script start time
StartTime = Now 'Time that the script starts so that we can see how long it has been watching to see if the service stops again.
Dim strTime
strTime = Time

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")
Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3750,0,"Service Recovery script is starting")

Dim strComputer, strService, strStartMode, strState, objCount, strClusterService
'The script will always be run on the machine that generated the monitor error
strComputer = "."
strClusterService = "ClusSvc"

'Record the current state of each service before recovery in an event
Dim strClusterServicestate
ServiceState(strClusterService)
strClusterServicestate = strState
Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3751,0,"Current service state before recovery is: " & strClusterService & " : " & strClusterServicestate)

'Stop script if all services are running
If (strClusterServicestate = "Running") Then
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3752,2,"All services were found to be already running, recovery should not run, ending script")
    Wscript.Quit
End If

'Check to see if a specific event has been logged previously that means this recovery script should NOT run if event is present
'This section optional and not commonly used
Dim dtmStartDate, iCount, colEvents, objWMIService, objEvent
' Const CONVERT_TO_LOCAL_TIME = True
' Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
' dtmStartDate.SetVarDate dateadd("n", -60, now)' CONVERT_TO_LOCAL_TIME
'
' iCount = 0
' Set objWMIService = GetObject("winmgmts:" _
' & "{impersonationLevel=impersonate,(Security)}!\\" _
' & strComputer & "\root\cimv2")
' Set colEvents = objWMIService.ExecQuery _
' ("Select * from Win32_NTLogEvent Where Logfile = 'Application' and " _
' & "TimeWritten > '" & dtmStartDate & "' and EventCode = 100")
' For Each objEvent In colEvents
' iCount = iCount+1
' Next
' If iCount => 1 Then
' EndTime = Now
' sTime = DateDiff("s", StartTime, EndTime)
' Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3761,2,"script found event which blocks execution of this recovery. Recovery will not run. Script ending after " & sTime & " seconds")
' WScript.Quit
' ElseIf iCount < 1 Then
' Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3762,0,"script did not find any blocking events. Script will continue")
' End If

'At least one service is stopped to cause this recovery, stopping all three services so we can start them in order
'You would only use this section if you had multiple services and they needed to be started in a specific order
' Call oAPI.LogScriptEvent("ServiceRecovery.vbs",3753,0,"At least one service was found not running. Recovery will run. Attempting to stop all services now")
' ServiceStop(strService1)
' ServiceStop(strService2)
' ServiceStop(strService3)

'Check to make sure all services are actually in stopped state
' Optional Wait 15 seconds for slow services to stop
' Wscript.Sleep 15000
ServiceState(strClusterService)
strClusterServicestate = strState
'Stop script if all services are not stopped
If (strClusterServicestate <> "Stopped") Then
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3754,2,"Recovery script found service is not in stopped state. Manual intervention is required, ending script. Current service state is: " & strClusterService & " : " & strClusterServicestate)
    Wscript.Quit
Else
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3755,0,"Recovery script verified all services in stopped state. Continuing.")
End If

'Start services in order.
Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3756,0,"Attempting to start all services")
Dim errReturn
'Restart Services and watch to see if the command executed without error
ServiceStart(strClusterService)
Wscript.sleep 5000

'Check service state to ensure all services started
ServiceState(strClusterService)
strClusterServicestate = strState

'Log success or fail of recovery
If (strClusterServicestate = "Running") Then
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3757,0,"All services were successfully started and then found to be running")
Else
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3758,2,"Recovery script failed to start all services. Manual intervention is required. Current service state is: " & strClusterService & " : " & strClusterServicestate)
End If

'Check to see if this recovery script has been run three times in the last 60 minutes for loop detection
Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
dtmStartDate.SetVarDate dateadd("n", -60, now)' CONVERT_TO_LOCAL_TIME
iCount = 0
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate,(Security)}!\\" _
    & strComputer & "\root\cimv2")
Set colEvents = objWMIService.ExecQuery _
    ("Select * from Win32_NTLogEvent Where Logfile = 'Operations Manager' and " _
    & "TimeWritten > '" & dtmStartDate & "' and EventCode = 3750")
For Each objEvent In colEvents
    iCount = iCount+1
Next
If iCount => 3 Then
    EndTime = Now
    sTime = DateDiff("s", StartTime, EndTime)
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3759,2,"script restarted " & strClusterService & " service 3 or more times in the last hour, script ending after " & sTime & " seconds")
    WScript.Quit
ElseIf iCount < 3 Then
    EndTime = Now
    sTime = DateDiff("s", StartTime, EndTime)
    Call oAPI.LogScriptEvent("ClusterServiceRecovery.vbs",3760,0,"script restarted " & strClusterService & " service less than 3 times in the last hour, script ending after " & sTime & " seconds")
End If
Wscript.Quit

'==================================================================================
' Subroutine: ServiceState
' Purpose: Gets the service state and startmode from WMI
'==================================================================================
Sub ServiceState(strService)
    Dim objWMIService, colRunningServices, objService
    Set objWMIService = GetObject("winmgmts:" _
        & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colRunningServices = objWMIService.ExecQuery _
        ("Select * from Win32_Service where Name = '"& strService & "'")
    For Each objService in colRunningServices
        strState = objService.State
        strStartMode = objService.StartMode
    Next
End Sub

'==================================================================================
' Subroutine: ServiceStart
' Purpose: Starts a service
'==================================================================================
Sub ServiceStart(strService)
    Dim objWMIService, colRunningServices, objService, colServiceList
    Set objWMIService = GetObject("winmgmts:" _
        & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colServiceList = objWMIService.ExecQuery _
        ("Select * from Win32_Service where Name='"& strService & "'")
    For Each objService in colServiceList
        errReturn = objService.StartService()
    Next
End Sub

'==================================================================================
' Subroutine: ServiceStop
' Purpose: Stops a service
'==================================================================================
Sub ServiceStop(strService)
    Dim objWMIService, colRunningServices, objService, colServiceList
    Set objWMIService = GetObject("winmgmts:" _
        & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colServiceList = objWMIService.ExecQuery _
        ("Select * from Win32_Service where Name='"& strService & "'")
    For Each objService in colServiceList
        errReturn = objService.StopService()
    Next
End Sub
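The loop-detection block at the end of the script counts the "recovery starting" events (ID 3750) written to the Operations Manager log in the last 60 minutes. It logs warning event 3759 when there are three or more. The Python sketch below restates that rule so the threshold is easy to reason about. It is not part of the management pack. Like the WMI query, it uses a strict "written after the cutoff" comparison, and the count includes the current run's own 3750 event.

```python
from datetime import datetime, timedelta

def recovery_loop_detected(start_event_times, now, window_minutes=60, threshold=3):
    """True when `threshold` or more recovery-start events fall inside the window."""
    cutoff = now - timedelta(minutes=window_minutes)
    recent = [t for t in start_event_times if t > cutoff]  # strict, like TimeWritten > cutoff
    return len(recent) >= threshold

now = datetime(2016, 1, 27, 10, 0)
print(recovery_loop_detected([now - timedelta(minutes=m) for m in (5, 20, 59)], now))
```

As written, the script only logs event 3759 when it detects a loop; it does not block later recoveries. Any action on that event, such as an alert rule, has to be added separately.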

 

Here it is inserted into the UI.  I provide a 3-minute timeout for this one:

 

image

 

Here is how it will look once added:

image

 

Now – we need to generate an alert when the script detects that it failed to start the service:

image

 

Provide a name, and target the same class as the service monitor:

image

 

For the expression, the ID comes from the event generated by the recovery script, and the string search makes sure we only alert on a Cluster service recovery.  If we reuse the script for other services, we need to be able to distinguish their events from this one:

image
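The expression combines two checks: the event ID and a string match on the description. The sketch below restates that logic. The exact ID and string are in the screenshot. Here I assume event 3758, the "failed to start" event the script logs, and the service name ClusSvc, which the script writes into that description. Both defaults are assumptions, and the string match is case-sensitive.

```python
def should_alert(event_id, description, failure_event_id=3758, service_name="ClusSvc"):
    """Alert only for the recovery-failed event that names this service."""
    return event_id == failure_event_id and service_name in description

msg = ("Recovery script failed to start all services. Manual intervention is required. "
       "Current service state is: ClusSvc : Stopped")
print(should_alert(3758, msg))
```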

 

 

Let’s test!

If we simply stop the Cluster service, the recovery kicks in, and we see evidence in the state changes and the event log:

 

image

 

I like REALLY verbose logging in the scripts I write.  More is MUCH better than less, especially when troubleshooting, and recoveries should not be running often enough to clog up the logs.

image

image

image

image

 

image

image

 

 

If the recovery fails to start the service – the script detects this – drops a very specific event, and then an alert is generated for the service being down and manual intervention required:

 

image

 

image

 

 

There we have it – we only get alerts if the service is not recoverable.  This makes SCOM more actionable.  If we want a record of this for reporting – we can collect the events for recovery starting, and then report on those events.

You can download this example MP at:

https://gallery.technet.microsoft.com/Cluster-Service-Recovery-270ca2cd

UR9 for SCOM 2012 R2 – Step by Step


 

 

 

image48

 

NOTE:  I get this question every time we release an update rollup:   ALL SCOM Update Rollups are CUMULATIVE.  This means you do not need to apply them in order; you can always just apply the latest update.  If you have deployed SCOM 2012 R2 and never applied an update rollup, you can go straight to the latest one available.  If you applied an older one (such as UR3), you can always go straight to the latest one!

 

 

KB Article for OpsMgr:  https://support.microsoft.com/en-us/kb/3129774

Download catalog site:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=3129774

 

Key fixes:

  • SharePoint workflows fail with an access violation under APM
    A certain sequence of events may trigger an access violation in APM code when it tries to read data from the cache during the Application Domain unload. This fix resolves this kind of behavior.
  • Application Pool worker process crashes under APM with heap corruption
    During the Application Domain unload two threads might try to dispose of the same memory block leading to DOUBLE FREE heap corruption. This fix makes sure that memory is disposed of only one time.
  • Some Application Pool worker processes become unresponsive if many applications are started under APM at the same time
    Microsoft Monitoring Agent APM service has a critical section around WMI queries it performs. If a WMI query takes a long time to complete, many worker processes wait for the active one to complete the call. Those application pools may become unresponsive, depending on the wait duration. This fix eliminates the need for the WMI query and significantly improves the performance of this code path.
  • MOMAgent cannot validate RunAs Account if only RODC is available
    If only a read-only domain controller (RODC) is available, the MonAgent cannot validate the RunAs account. This fix resolves this issue.
  • Missing event monitor does not warn within the specified time range in SCOM 2012 R2 the first time after restart
    When you create a monitor for a missed event, the first alert takes twice the time specified in the monitor. This fix resolves the issue, and the alert is generated in the time specified.
  • SCOM cannot verify the User Account / Password expiration date if it is set by using Password Setting object
    Fine grained password policies are stored in a different container from the user object container in Active Directory. This fix resolves the problems in computing resultant set of policy (RSOP) from these containers for a user object.
  • SLO Detail report displays histogram incorrectly
    In some specific scenarios, the representation of the downtime graph is not displayed correctly. This fix resolves this kind of behavior.
  • APM support for IIS 10 and Windows Server 2016
    Support of IIS 10 on Windows Server 2016 is added for the APM feature in System Center 2012 R2 Operations Manager. An additional management pack, Microsoft.SystemCenter.Apm.Web.IIS10.mp, is required to enable this functionality. This management pack is located in %SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups, alongside its dependencies, after the installation of Update Rollup 9.
    Important note: One dependency is not included in Update Rollup 9 and should be downloaded separately:

    Microsoft.Windows.InternetInformationServices.2016.mp

  • APM Agent Modules workflow fail during workflow shutdown with Null Reference Exception
    The Dispose() method of Retry Manager of APM connection workflow is executed two times during the module shutdown. The second try to execute this Dispose() method may cause a Null Reference Exception. This fix makes sure that the Dispose() method can be safely executed one or more times.
  • AEM Data fills up SCOM Operational database and is never groomed out
    If you use SCOM’s Agentless Exception Monitoring to examine application crash data and report on it, the data never grooms out of the SCOM Operational database. The problem is that the SCOM environment soon becomes overloaded with all the instances and relationships of the applications, error groups, and Windows-based computers, all of which are hosted by the management servers. This fix resolves this issue. Additionally, the following management packs must be imported in the following order:
    • Microsoft.SystemCenter.ClientMonitoring.Library.mp
    • Microsoft.SystemCenter.DataWarehouse.Report.Library.mp
    • Microsoft.SystemCenter.ClientMonitoring.Views.Internal.mp
    • Microsoft.SystemCenter.ClientMonitoring.Internal.mp
  • The DownTime report from the Availability report does not handle the Business Hours settings
    In the downtime report, the downtime table was not considering the business hours. This fix resolves this issue and business hours will be shown based on the specified business hour values.
    The updated RDL files are located in the following location:

    %SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Reporting

    To update the RDL file, follow these steps:

    1. Go to http://MachineName/Reports_INSTANCE1/Pages/Folder.aspx, where MachineName is your Reporting Server.
    2. On this page, go to the folder to which you want to add the RDL file. In this case, click Microsoft.SystemCenter.DataWarehouse.Report.Library.
    3. Upload the new RDL files by clicking the upload button at the top. For more information, see https://msdn.microsoft.com/en-us/library/ms157332.aspx.
  • Adding a decimal sign in an SLT Collection Rule SLO in the ENU Console on a non-ENU OS does not work
    You run the System Center 2012 R2 Operations Manager Console in English on a computer that has the language settings configured to use a non-English (United States) language that uses a comma (,) as the decimal sign instead of a period (.). When you try to create Service Level Tracking, and you want to add a Collection Rule SLO, the value you enter as the threshold cannot be configured by using a decimal sign. This fix resolves the issue.
  • SCOM Agent issue while logging Operations Management Suite (OMS) communication failure
    An issue occurs when OMS communication failures are logged. This fix resolves this issue.

 

There are no updates for Linux, and there are no updated MP’s for Linux in this update as of this time.  The most current Linux MP’s are available below in the Linux section.

 

Let’s get started.

From reading the KB article – the order of operations is:

  1. Install the update rollup package on the following server infrastructure:
    • Management servers
    • Gateway servers
    • Web console server role computers
    • Operations console role computers
  2. Apply SQL scripts.
  3. Manually import the management packs.
  4. Update Agents

Now, NORMALLY we need to add another step: if we are using Xplat monitoring, we need to update the Linux/Unix MP’s and agents.  However, in UR8 and UR9 for SCOM 2012 R2, there are no updates for Linux.

 

 

 

1.  Management Servers

image

Since there is no RMS anymore, it doesn’t matter which management server I start with.  There is no need to begin with whichever server holds the RMSe role.  I simply make sure I only patch one management server at a time, to allow for agent failover without overloading any single management server.

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 3 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

After I download the updates from the catalog, the first thing I do is copy the CAB files for my language to a single location:

Then extract the contents:

image

Once I have the MSP files, I am ready to start applying the update to each server by role.

***Note:  You MUST log on to each server role as a Local Administrator and a SCOM Admin, AND your account must also hold the System Administrator (SA) role on the database instances that host your OpsMgr databases.

My first server is a management server, hosts the web console, and has the OpsMgr console installed, so I copy those update files locally and execute them per the KB from an elevated command prompt:

image

This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update usually gives no feedback on whether it succeeded or failed.

I got a prompt to restart:

image

I choose yes and allow the server to restart to complete the update.

 

You can check the Application log for the MsiInstaller events to confirm completion:

Log Name:      Application
Source:        MsiInstaller
Date:          1/27/2016 9:37:28 AM
Event ID:      1036
Description:
Windows Installer installed an update. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2012 R2 Operations Manager UR9 Update Patch. Installation success or error status: 0.

You can also spot check a couple of DLL files for the file version attribute.

image
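When spot-checking file versions across several servers, compare the versions numerically, part by part, not as strings. As strings, "…1000" sorts before "…999". A minimal sketch follows. The version numbers in the usage line are made up, not the actual UR9 file versions.

```python
def parse_version(text):
    """Turn a dotted file version such as '7.1.10226.1000' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def is_at_least(file_version, expected_minimum):
    """True when file_version is the same as or newer than expected_minimum."""
    return parse_version(file_version) >= parse_version(expected_minimum)

# Made-up versions: numeric comparison gets this right, string comparison would not.
print(is_at_least("7.1.10226.1000", "7.1.10226.999"))
```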

Next up – run the Web Console update:

image

This runs much faster.   A quick file spot check:

image

Lastly – install the console update (make sure your console is closed):

image

A quick file spot check:

image

 

 

Additional Management Servers:

image

I now move on to my additional management servers, applying the server update, then the console update and web console update where applicable.

On this next management server, I will use the example of Windows Update as opposed to manually installing the MSP files.  I check online, and make sure that I have configured Windows Update to give me updates for additional products: 

image

The applicable updates show up under optional – so I tick the boxes and apply these updates.

After a reboot – go back and verify the update was a success by spot checking some file versions like we did above.

 

 

Updating Gateways:

image

I can use Windows Update or manual installation.

image

The update launches a UI and quickly finishes.

Then I will spot check the DLL’s:

image

I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

image

 

***NOTE:  You can delete any older UR update files from the \AgentManagement directories.  The UR’s do not clean these up, and they serve no purpose being present any longer.
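To find the leftovers, you can list the `.msp` files whose KB number differs from the current rollup (3129774 for UR9, per the KB link above) and review them before deleting anything. This sketch assumes the agent update files carry "KB<number>" in their names. The older file name in the usage line is hypothetical.

```python
import re

KB_PATTERN = re.compile(r"KB(\d+)", re.IGNORECASE)

def stale_update_files(filenames, current_kb="3129774"):
    """Return .msp files whose KB number differs from the current rollup (for review, not deletion)."""
    stale = []
    for name in filenames:
        match = KB_PATTERN.search(name)
        if name.lower().endswith(".msp") and match and match.group(1) != current_kb:
            stale.append(name)
    return sorted(stale)

# Hypothetical directory listing.
print(stale_update_files(["KB3129774-amd64-Agent.msp", "KB1111111-amd64-Agent.msp", "MOMAgent.msi"]))
```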

 

 

 

2. Apply the SQL Scripts

In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(Note: your path may vary slightly depending on whether you have an upgraded environment or a clean install.)

image

First – let’s run the script to update the OperationsManager database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file.  Make sure it is pointing to your OperationsManager database, then execute the script.

You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.  I have had customers state this takes from a few minutes to as long as an hour. In MOST cases, you will need to shut down the SDK, Config, and Monitoring Agent (HealthService) services on ALL your management servers for this script to run successfully.

You will see the following (or similar) output:

image47

or

image

IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment, you almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.

Technical tidbit:   Even if you previously ran this script in UR1, UR2, UR3, UR4, UR5, UR6, UR7, or UR8, you should run this again for UR9, as the script body can change with updated UR’s.

image

Next, we have a script to run against the warehouse DB.  Do not skip this step under any circumstances.    From:

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(Note: your path may vary slightly depending on whether you have an upgraded environment or a clean install.)

Open a SQL management studio query window, connect it to your OperationsManagerDW database, and then open the script file UR_Datawarehouse.sql.  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

If you see a warning about line endings, choose Yes to continue.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image
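As an alternative to SQL Management Studio, the same scripts can be run with `sqlcmd`. The flags used here are `-S` (instance), `-d` (database), `-E` (Windows authentication), `-b` (exit with an error level when a batch fails) and `-i` (input file). This is a sketch that builds the command for the warehouse script named above. The SQL instance name is hypothetical, and the script folder is the default path given above, which may differ on your system. The OperationsManager script can be run the same way against the OperationsManager database.

```python
from pathlib import PureWindowsPath

SCRIPT_DIR = PureWindowsPath(r"C:\Program Files\Microsoft System Center 2012 R2"
                             r"\Operations Manager\Server\SQL Script for Update Rollups")

def sqlcmd_command(sql_instance, database, script_path):
    """Build a sqlcmd call: -E Windows auth, -b non-zero exit on error, -i input script."""
    return ["sqlcmd", "-S", sql_instance, "-d", database, "-E", "-b", "-i", str(script_path)]

# Hypothetical instance name; UR_Datawarehouse.sql is the warehouse script from the steps above.
dw_cmd = sqlcmd_command(r"SQLDW01\INSTANCE1", "OperationsManagerDW", SCRIPT_DIR / "UR_Datawarehouse.sql")
print(dw_cmd)
```

Because of `-b`, a failed run (such as the deadlock described in the known issues) returns a non-zero exit code that a wrapper script can detect.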

 

 

 

3. Manually import the management packs

image

There are 55 management packs in this update!   Most of these we don’t need – so read carefully.

The path for these is on your management server, after you have installed the “Server” update:

\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups

However, the majority of them are Advisor/OMS and language-specific.  Only import the ones you need, and that are correct for your language.  I will remove all the MP’s for other languages (keeping only ENU), and I am left with the following:

image

 

What NOT to import:

The Advisor MP’s are only needed if you are using the Microsoft Operations Management Suite cloud service (previously known as Advisor and Operational Insights).

The APM MP’s are only needed if you are using the APM feature in SCOM.

Note the APM MP with a red X.  This MP requires the IIS MP’s for Windows Server 2016 which are in Technical Preview at the time of this writing.  Only import this if you are using APM *and* you need to monitor Windows Server 2016.  If so, you will need to download and install the technical preview editions of that MP from https://www.microsoft.com/en-us/download/details.aspx?id=48256

The TFS MP bundle is only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

However, the Image and Visualization libraries deal with Dashboard updates, and these always need to be updated.

I import all of the ones shown, without issue.
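Here is a minimal Python sketch of the filtering rules above. It rests on two assumptions of mine, not on anything in the UR documentation. First, localized MPs carry a three-letter Microsoft language code such as `DEU` or `JPN` as a dot-separated segment of the file name. Second, the optional feature MPs can be recognized by an `Advisor`, `Apm...` or `TFS...` segment. The file names in the test are illustrative and are not the actual UR9 file list, so always check the real folder.

```python
# Hypothetical helper: decide which UR management pack files to import.
# Assumptions (not from the UR docs): localized MPs carry a three-letter
# Microsoft language code as a dotted segment of the file name, and the
# optional feature MPs have an "Advisor", "Apm..." or "TFS..." segment.

LANGUAGE_CODES = {"CHS", "CHT", "CSY", "DEU", "ENU", "ESN", "FRA", "HUN",
                  "ITA", "JPN", "KOR", "NLD", "PLK", "PTB", "PTG", "RUS",
                  "SVE", "TRK"}  # illustrative subset


def name_segments(filename):
    """Split 'A.B.C.mpb' into ['a', 'b', 'c'] (extension dropped, lower-cased)."""
    stem = filename.rsplit(".", 1)[0]
    return [part.lower() for part in stem.split(".")]


def mp_language(filename):
    """Return the language code found in the file name, or None if neutral."""
    for part in name_segments(filename):
        if part.upper() in LANGUAGE_CODES:
            return part.upper()
    return None


def select_mps(filenames, keep_language="ENU",
               use_oms=False, use_apm=False, use_tfs=False):
    """Apply the 'what NOT to import' rules and return the files to import."""
    selected = []
    for name in filenames:
        parts = name_segments(name)
        language = mp_language(name)
        if language is not None and language != keep_language:
            continue  # resource pack for another language
        if "advisor" in parts and not use_oms:
            continue  # only needed with OMS / Advisor
        if any(p.startswith("apm") for p in parts) and not use_apm:
            continue  # only needed if you use APM
        if any(p.startswith("tfs") for p in parts) and not use_tfs:
            continue  # only for TFS / DevOps integration
        selected.append(name)  # e.g. the Image and Visualization libraries
    return selected
```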

 

 

4.  Update Agents

image43_thumb

Agents should be placed into pending actions by this update for any agent that was not manually installed (remotely manageable = yes):  

 

On the management servers that I patched using Windows Update, the agents did not show up in this list.  Only agents reporting to a management server that I patched manually showed up.  FYI.   The experience is NOT the same when using Windows Update vs manual.  If yours don’t show up, you can try running the update for that management server again, manually.

image

If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or having manually installed agents which will not be placed into pending.

In my case, the agents reporting to a management server that was updated using Windows Update were NOT placed into pending.  Only the agents reporting to the management server that I patched manually were.

I re-ran the server MSP file manually on these management servers, from an elevated command prompt, and they all showed up:

image

 

 

You can approve these – which will result in a success message once complete:

image

Soon you should start to see PatchList getting filled in, in the Agents By Version view under the Operations Manager monitoring folder in the console:

image

 

 

 

5.  Update Unix/Linux MPs and Agents

image

There are no updates for Linux in UR9 at the time of this writing.   The current Linux MP’s can be downloaded from:

https://www.microsoft.com/en-us/download/details.aspx?id=29696

Version 7.5.1045.0 is current at this time for SCOM 2012 R2, and it shipped with UR7.  If you are already running version 7.5.1045.0 of the Linux MP’s and agents, no update is necessary.
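When you check whether you are already on 7.5.1045.0, compare version numbers component by component, not as plain strings. As text, "7.5.999.0" sorts after "7.5.1045.0". Here is a small sketch; the helper names and the 7.5.999.0 version are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '7.5.1045.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(installed, current="7.5.1045.0"):
    """True if the installed MP/agent version is older than `current`.

    Tuples compare element by element, so 1045 > 999 as numbers,
    unlike a plain string comparison.
    """
    return parse_version(installed) < parse_version(current)
```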

****Note – take GREAT care when downloading to select the correct download for R2.  You must scroll down in the list and select the MSI for 2012 R2:

image

Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

Update any MP’s you are already using.   These are mine for RHEL, SUSE, and the Universal Linux libraries. 

image

You will likely observe VERY high CPU utilization of your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the process of the import and MPB deployments.

Next up, you would upgrade the agents on your Unix/Linux monitored systems.  You can now do this straight from the console:

image

image

You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.

Mine FAILED, with an SSH exception about copying the new agent.  It turns out my files were not updated on the management server – see pic:

image

I had to restart the Healthservice on the management server, and within a few minutes all the new files were there.

Finally:

image

 

 

6.  Update the remaining deployed consoles

image

This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

 

 

 

Review:

Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

image

Known issues:

See the existing list of known issues documented in the KB article.

1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop all the SCOM services on the management servers, and/or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

——————————————————
(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
——————————————————–

Removing / Migrating old Management Servers to new ones


 

This is a common practice for rotating old physical servers coming off lease, or when moving VM based management servers to a new operating system. 

 

There are some generic instructions on TechNet here:  https://technet.microsoft.com/en-us/library/hh456439.aspx  However, these don’t paint the whole picture of what should be checked first.  Customers sometimes run into orphaned objects, or management servers they cannot delete because the MS is hosting remote monitoring activities.

Here is a checklist I have put together.  The steps are not necessarily enforced in this order, so you can rearrange much of this as you see fit.

 

  • Install new management server(s)
  • Apply any registry modifications that are in place on the existing management servers to the new MS
  • Patch new MS with current UR to bring parity with other management servers in the management group.
  • If you have gateways reporting to old management servers, install certificates from the same trusted publisher on the new MS, and then use PowerShell to change GW to MS assignments.
  • Inspect Resource pools. Make sure old management server is removed from any Resource pools with manual membership, and place new management servers in those resource pools.
  • If you have any 3rd party service installations, ensure they are installed as needed on the new MS (connector services, hardware monitoring add-ons).
  • If you have any hard coded script or EXE paths in place for notifications or scheduled tasks, ensure those are moved.
  • If you run the Exchange 2010 Correlation engine – ensure it is moved to a new MS.
  • If you use any URL watcher nodes hard coded to a management server – ensure those are moved to a new MS. (Web Transaction Monitoring)
  • If you have any other watcher nodes – migrate those templates (OLEDB probe, port, etc.)
  • If you have any custom registry keys in place on a MS, to discover it as a custom class for any reason, ensure these are migrated.
  • If you have any special roles, such as the RMSe - migrate them.
  • Ensure the new MS will host optional roles such as web console or console roles if required.
  • Migrate any agent assignments in the console or AD integration.
  • Ensure you have BOTH management servers online for a considerable time to allow all agents to get updated config – otherwise you will orphan the agents until they know about the new management server.
  • If you perform UNIX/LINUX monitoring, these should migrate with resource pools. You will need to export and import the SCX certs for the new management servers that will take part in the pool.
  • If you use IM notifications, ensure the prerequisites are installed on the new MS.
  • Ensure any new management servers are allowed to send email notifications to your SMTP server if it uses an access list.
  • If you have any network devices, ensure the discovery is moved to another MS for any MS that is being removed.
  • If you are using AEM, ensure this role is reconfigured for any retiring MS.
  • If you are using ACS and the collector role needs to be migrated, perform this and update the forwarders to their new collector.
  • If you have customized heartbeat settings for the management server, ensure these are consistent on the new MS.
  • If you have any agentless monitored systems (rare) move their proxy server.
  • If you were running a hardware load balancer for the SDK service connections – remove the old management servers and add new ones.
  • Review event logs on new management servers and ensure there aren't any major health issues.
  • Uninstall old management server gracefully.
  • Delete management server object in console if required post-uninstall.

 

If you have any additional steps you feel should be part of this list – feel free to comment.

Event 18054 errors in the SQL application log – in SCOM 2012 R2 deployments


 

I wrote about this issue for SCOM 2007 here:

http://blogs.technet.com/b/kevinholman/archive/2010/10/26/after-moving-your-operationsmanager-database-you-might-find-event-18054-errors-in-the-sql-server-application-log.aspx

When SCOM is installed, it doesn’t just create the databases on the SQL instance; it also adds messages for different error scenarios to the sysmessages view in the master database for the instance.

This is why, after moving a database or restoring a DB backup to a rebuilt SQL server, we might end up missing this data.

These messages are important because they give detailed information about the error and how to resolve it.  If you see these events, you need to update your SQL instance with some scripts.

Examples of these events on the SQL server:

Log Name:      Application
Source:        MSSQL$I01
Date:          10/23/2010 5:40:14 PM
Event ID:      18054
Task Category: Server
Level:         Error
Keywords:      Classic
User:          OPSMGR\msaa
Computer:      SQLDB1.opsmgr.net
Description:
Error 777980007, severity 16, state 1 was raised, but no message with that error number was found in sys.messages. If error is larger than 50000, make sure the user-defined message is added using sp_addmessage.

You might also notice some truncated events in the OpsMgr event log, on your RMS or management servers:

Event Type:    Warning
Event Source:    DataAccessLayer
Event Category:    None
Event ID:    33333
Date:        10/23/2010
Time:        5:40:13 PM
User:        N/A
Computer:    OMMS3
Description:
Data Access Layer rejected retry on SqlError:
Request: p_DiscoverySourceUpsert — (DiscoverySourceId=f0c57af0-927a-335f-1f74-3a3f1f5ca7cd), (DiscoverySourceType=0), (DiscoverySourceObjectId=74fb2fa8-94e5-264d-5f7e-57839f40de0f), (IsSnapshot=True), (TimeGenerated=10/23/2010 10:37:36 PM), (BoundManagedEntityId=3304d59d-5af5-ba80-5ba7-d13a07ed21d4), (IsDiscoveryPackageStale=), (RETURN_VALUE=1)
Class: 16
Number: 18054
Message: Error 777980007, severity 16, state 1 was raised, but no message with that error number was found in sys.messages. If error is larger than 50000, make sure the user-defined message is added using sp_addmessage.

Event Type:    Error
Event Source:    Health Service Modules
Event Category:    None
Event ID:    10801
Date:        10/23/2010
Time:        5:40:13 PM
User:        N/A
Computer:    OMMS3
Description:
Discovery data couldn't be inserted to the database. This could have happened because  of one of the following reasons:

     – Discovery data is stale. The discovery data is generated by an MP recently deleted.
     – Database connectivity problems or database running out of space.
     – Discovery data received is not valid.

The following details should help to further diagnose:

DiscoveryId: 74fb2fa8-94e5-264d-5f7e-57839f40de0f
HealthServiceId: bf43c6a9-8f4b-5d6d-5689-4e29d56fed88
Error 777980007, severity 16, state 1 was raised, but no message with that error number was found in sys.messages. If error is larger than 50000, make sure the user-defined message is added using sp_addmessage..
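To see which message numbers are missing, you can extract them from the 18054 text. Here is a minimal sketch that parses descriptions like the ones shown above. The regex is written against this exact wording; whether other SQL versions word the message identically is an assumption. The 50000 threshold comes from the message itself: numbers above it are user-defined messages, which are added with `sp_addmessage`. The helper names are hypothetical.

```python
import re

# Wording of the SQL Server 18054 message, as shown in the events above.
MISSING_MESSAGE = re.compile(
    r"Error (\d+), severity (\d+), state (\d+) was raised, but no message "
    r"with that error number was found in sys\.messages")


def missing_message_numbers(event_text):
    """Return the distinct error numbers SQL Server could not find in sys.messages."""
    return sorted({int(m.group(1)) for m in MISSING_MESSAGE.finditer(event_text)})


def is_user_defined(error_number):
    """Per the 18054 text: numbers above 50000 are user-defined messages that
    have to be added with sp_addmessage."""
    return error_number > 50000
```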

 

I have created some SQL scripts which are taken from the initial installation files, and you can download them below.  You simply run them in SQL Management studio to get this data back.

These are for SCOM 2012 R2 ONLY!!!!

 

Download link:   https://gallery.technet.microsoft.com/SQL-to-fix-event-18054-c4375367

Alert Lifecycle Management


 

Sometimes this is almost a dirty word in some companies.  It means applying an ITSM process around monitoring, to ensure alerts are real, actionable, assigned, accountable, and reportable.

In my travels, I see companies with an excellent process around this.  I also see companies with ZERO process. 

My colleague Nathan Gau has a 3-part series on this topic – check it out over here:

 

http://blogs.technet.com/b/nathangau/archive/2016/02/04/the-anatomy-of-a-good-scom-alert-management-process-part-1-why-is-alert-management-necessary.aspx

The impact of moving databases in SCOM


 

I recently had an interesting customer issue.

We were deploying a new management group to do some performance testing of the impact to SCOM performance as we scale up agents.  This particular management group only had the default MP’s from installing SCOM, and the Base OS MP’s.  Nothing more.

When we scaled up to ~2000 agents, we took a performance checkpoint.  The console was zippy, and the management servers were having no issues.  However, when we analyzed performance on the database, we saw really high CPU.

image

 

Zooming into a smaller time chunk – the CPU was pretty wild:

 

image

 

What we found was that the customer had moved the SCOM databases to a different server than the one they were originally installed to.  When they did this, they did not fully follow the TechNet instructions, which include ensuring that SQL Broker is enabled and CLR is enabled.

You can check both settings with these queries:

SQL Broker:

SELECT is_broker_enabled FROM sys.databases WHERE name='OperationsManager'

CLR:

SELECT * FROM sys.configurations WHERE name = 'clr enabled'

Both should return a value of “1” to show they are enabled.
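If you check several environments, it helps to make the pass/fail rule explicit. This sketch evaluates the results of the two queries. It assumes you read the `value_in_use` column from `sys.configurations`, which holds the running value and can differ from `value` until `RECONFIGURE` is run. `settings_to_fix` is a hypothetical helper and does not connect to SQL Server itself.

```python
# The two checks from above, for reference (run them in SQL Management Studio).
BROKER_QUERY = "SELECT is_broker_enabled FROM sys.databases WHERE name='OperationsManager'"
CLR_QUERY = "SELECT * FROM sys.configurations WHERE name = 'clr enabled'"


def settings_to_fix(is_broker_enabled, clr_value_in_use):
    """Return a list of settings that are still off; both values should be 1.

    clr_value_in_use is assumed to come from the value_in_use column of CLR_QUERY.
    """
    problems = []
    if int(is_broker_enabled) != 1:
        problems.append("SQL Broker is not enabled on the OperationsManager database")
    if int(clr_value_in_use) != 1:
        problems.append("CLR is not enabled on the SQL instance")
    return problems
```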

Changing these values is covered here:  https://technet.microsoft.com/en-ca/library/hh278848.aspx

Always make sure you handle the other changes necessary when moving a database, and don’t forget to add the sysmessages back, documented here:  Event 18054 errors in the SQL application log – in SCOM 2012 R2 deployments

 

After making these changes – the impact was significant, going from 50% avg CPU consumption, to 11%.
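As a quick check on the size of that change, here is the relative reduction in average CPU. This is simple arithmetic on the two averages quoted above, not an additional measurement.

```latex
\frac{50\% - 11\%}{50\%} = \frac{39}{50} = 0.78 = 78\%
```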

 

24 hour snapshot:

image

One hour snapshot:

image

 

Whenever you visit a SCOM customer, or inherit a SCOM environment whose full history you don’t know, these settings might not be optimized, and the owners might not even be aware they are impacted, especially if their agent count is low.  There are other symptoms you’d see, such as regular expressions failing in the logs without CLR enabled, and agent discovery not working without SQL Broker.  Either way, these settings are always worth inspecting when reviewing the health of a deployment.


Windows 10 Client MP’s are available


 

image

 

Download here:    https://www.microsoft.com/en-us/download/details.aspx?id=51189

The client OS MP’s are available when you need to monitor Windows clients in your SCOM management group.  These might be “light” monitoring of desktops and laptops in the organization, or these might be for mission critical roles such as Kiosks and ATM type machines running a Windows client OS.

 

image

The MP’s will upgrade your base client library (its name still references SCOM 2007, but it applies to SCOM 2012) and will import additional MP’s specific to discovering and monitoring Windows 10 clients.

 

image

If you are importing this MP for Windows 10 clients, and you also already monitor Windows 8 clients, make SURE you update your Windows 8 MP’s to the latest version, 6.0.7251.0, available here:  https://www.microsoft.com/en-us/download/details.aspx?id=38434  The 6.0.7251.0 MP’s contain a fix to stop discovering a Win10 client as a Windows 8 client; otherwise you will get duplicate monitoring and overload your Win10 clients unnecessarily.  Make sure you upgrade the Windows 8 MP’s FIRST, before installing the agents on any Windows 10 clients.  If you still have duplicate instances of Windows 8 Computer for a Windows 10 client, you need to delete the agent from Agent Managed in SCOM, then approve it again; this will clean up the old discovered objects from the Windows 8 client MP’s.

 

Individual workflows are enabled on every client computer, to discover and monitor disks, memory, CPU, etc.  However, the monitors are all set to not generate alerts via overrides.  You have to put clients in a “Business Critical” group in order to see alerts for these clients.  However, the monitors will still show health state for all clients.  Just not alerts.

Same goes for performance collection rules.  There are overrides to enable these (all disabled out of the box) and collect performance data for business critical computers.
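The net effect of those default overrides can be modeled in a few lines. This is a simplified sketch of the behavior described above: health state for every client, but alerts and performance collection only for the "Business Critical" group. It is not the MP's actual override logic, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientComputer:
    name: str
    disk_unhealthy: bool       # result of a disk monitor (evaluated on every client)
    business_critical: bool    # member of the "Business Critical" group?


def client_outcome(client):
    """Simplified model of the default overrides described above.

    Returns (health state, raise alert?, collect performance data?).
    """
    state = "Unhealthy" if client.disk_unhealthy else "Healthy"  # shown for all clients
    alert = client.disk_unhealthy and client.business_critical   # alerts only if critical
    collect_perf = client.business_critical                      # perf rules enabled only there
    return state, alert, collect_perf
```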

The guide also discusses the use of aggregate client monitoring.  These load special workflows that fill the data warehouse with trending reports, and run SQL queries against the warehouse on a regular basis.  Make sure you DON’T import the Aggregate MP’s if you don’t want or need this type of monitoring, as it is optional.

See the MP guide for advanced details on how to configure this MP, and other client OS management packs.

Base OS MP’s have been updated – version 6.0.7303.0


 

***WARNING***  There are some significant issues in this release of the Base OS MP, I do not recommend applying this one until an updated version comes out.

Issues:

  • Cluster Disks on Server 2008R2 clusters are no longer discovered as cluster disks.
  • Cluster Disks on Server 2008 clusters are not discovered as logical disks.
  • Quorum (or small size) disks on clusters that ARE discovered as Cluster disks, do not monitor for free space correctly.
  • Cluster shared volumes are discovered twice, once as a Cluster Shared Volume instance, and once as a Logical disk instance, with the latter likely caused by enabling mounted disk discovery.
  • On Hyper-V servers, I discover an extra disk, which has no properties:

image

 

 

What was changed?

 

From the guide:

MP used to discover physical CPU, which performance monitor instance name property was not correlated with Windows PerfMon object (expecting instance name in (socket, core) format). That affected related rules and monitors. With this release, MP discovers logical processors, rather than physical, and populates performance monitor instance name in proper format

That was a real problem for anyone trying to monitor individual CPU’s in the past – we actually discovered “sockets”, not cores – so this didn’t jibe with Perfmon at all.  I look forward to testing this.

Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp and Microsoft.Windows.Server.Library.mp scripts code migration to PowerShell in scope of Windows Server 2016 Nano support (relevantly introduced in Windows Server 2016 MP version 10.0.1.0).

It is these changes that likely broke cluster disk discovery.

Updated Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.ClusterSharedVolume.Monitoring.State monitor alert properties and description. The fix resolved property replacement failure warning been generated on monitor alert firing.

Exchange 2013 Addendum MP – for Exchange 2013 and 2016


 

image

 

 

 

The Exchange 2013 MP has been released for some time now.  The current version at this writing is 15.0.666.19 which you can get HERE

This MP can be used to discover and monitor Exchange Server 2013 and 2016.

 

 

 

 

However, one of the things I always disliked about this MP is that it does not use a seed class discovery.  Therefore, it runs a PowerShell script every 4 hours on EVERY machine in your management group, looking for Exchange servers.  The problem with this is that it doesn’t follow best practices.  As a general best practice, we should NOT run scripts on all servers unless truly necessary.  Another issue: many customers have servers running Windows Server 2003 and 2008 that DON’T have PowerShell installed!  You will see nuisance events like the following:

 

Event Type:    Error
Event Source:    Health Service Modules
Event Category:    None
Event ID:    21400
Date:        3/2/2016
Time:        3:29:26 AM
User:        N/A
Computer:    WINS2003X64
Description:
Failed to create process due to error '0x80070003 : The system cannot find the path specified.
', this workflow will be unloaded.
Command executed:    "C:\WINDOWS\System32\WindowsPowerShell\v1.0\powershell.exe" -PSConsoleFile "bin\exshell.psc1" -Command "& '"C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 85\26558\MicrosoftExchangeDiscovery.ps1"'" 0 '{3E7D658E-FA5E-924E-334E-97C84E068C4A}' '{B21B34F9-2817-4800-73BD-012E79609F7E}' 'wins2003x64.dmz.corp' 'wins2003x64' 'Default-First-Site-Name' 'dmz.corp' '' '' '0' 'false'
Working Directory:    C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 85\26558\
One or more workflows were affected by this. 
Workflow name: Microsoft.Exchange.15.Server.DiscoveryRule
Instance name: wins2003x64.dmz.corp
Instance ID: {B21B34F9-2817-4800-73BD-012E79609F7E}
Management group: OMMG1

 

 

So, I have created an addendum MP which should resolve this.  My MP creates a class and discovery, looking for “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v15\Setup\MsiInstallPath” in the registry.  If it finds the registry path, SCOM will add it as an instance of my seed class.

image

 

Then, I created a group of Windows Computer objects that “contain” an instance of the seed class. 

image

 

Next, I added an override to disable the main script discovery in the Exchange 2013 MP.

Finally, I added an override to enable this same discovery for my custom group.  This should have the effect that our Exchange discovery script ONLY runs on servers that actually have Exchange installed (based on the registry key).
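Here is a rough model of why this helps. It assumes the usual SCOM override precedence, where an override targeted at a group wins over a class-wide one; the override pair above relies on that outcome, but treat it as an assumption in this sketch. The computer names are made up. The 4-hour interval is the one mentioned earlier, which works out to 6 script runs per computer per day.

```python
# Hypothetical model of the addendum MP. Computer names are made up.
EXCHANGE_VALUE = r"SOFTWARE\Microsoft\ExchangeServer\v15\Setup\MsiInstallPath"


def seed_group(registry_by_computer):
    """Registry seed discovery: computers with the Exchange setup value become
    seed-class instances, and therefore members of the group."""
    return {name for name, values in registry_by_computer.items()
            if EXCHANGE_VALUE in values}


def discovery_enabled(computer, group_members):
    """Effective 'Enabled' for the Exchange script discovery after both overrides.

    Class-wide override: Enabled=false. Group override: Enabled=true.
    Assumes the group-targeted override takes precedence over the class-wide one.
    """
    return computer in group_members


def script_runs_per_day(computers, interval_hours=4):
    """Number of discovery script executions per day across `computers`."""
    return len(computers) * (24 // interval_hours)
```

With hypothetical numbers, a management group of 500 servers containing 5 Exchange servers would see 3,000 script runs a day before the overrides and 30 after.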

image

 

 

This works for discovering Exchange 2013 and Exchange 2016 with the current Exchange 2013 MP.

 

You can download this sample MP at the following location:

https://gallery.technet.microsoft.com/Exchange-Server-2013-and-cfdfcf2f

How to generate an alert and make it look like it came from someone else


 

This capability has been around forever, but I have never seen it documented.  This is a really cool way to target a workflow at one agent, but generate alerts as if they came from other agents.

Suppose a scenario:  You have a client/server application (such as a backup program) where a central server logs all the events about success or failed jobs from clients.

In this scenario, we could simply generate alerts by targeting the central server, reading its event log, and bubbling up the broken client name from the logs into the alert.  The challenge becomes: what if some agents are test, or dev, and some are prod?  What if we have already put in place “tiering” of servers by groupings, and we use this to filter which alerts from which servers get ticketed?

There is actually a way to target one instance of a class with a workflow, but to generate alerts as if they came from a different instance of a different class, EVEN if that instance is a different agent altogether!

Let me demonstrate:

The most common write action for generating alerts in rules is System.Health.GenerateAlert, which is the one used in just about every alert-generating rule you typically come across.  It is documented here:  https://msdn.microsoft.com/en-us/library/ee809352.aspx

HOWEVER – there is another write action you can use:  System.Health.GenerateAlertForType. 

This is documented here:  https://msdn.microsoft.com/en-us/library/jj130310.aspx  While we document the modules and a sample XML example, we don’t really give much guidance anywhere on use cases.

This is a really cool write action, which allows us to generate alerts “on behalf” of a different object type, or even a different object type from a different computer!  Let me show the difference:

A typical System.Health.GenerateAlert looks like this:

<WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
  <Priority>1</Priority>
  <Severity>1</Severity>
  <AlertMessageId>$MPElement[Name="Demo.Rule.AlertMessage"]$</AlertMessageId>
  <AlertParameters>
    <AlertParameter1>$Data/EventDescription$</AlertParameter1>
  </AlertParameters>
</WriteAction>

As you can see – very simple.  It sets the priority and severity of the alert, references the Alert Message ID (which is the alert name and description configuration) and contains any alert parameters we want to use in the display output (in this case, Event Description is very common).

 

Now, see the System.Health.GenerateAlertForType:

<WriteAction ID="GenerateAlertForTypeWA" TypeID="Health!System.Health.GenerateAlertForType">
  <Priority>1</Priority>
  <Severity>1</Severity>
  <ManagedEntityTypeId>$MPElement[Name="Example.Client.Class"]$</ManagedEntityTypeId>
  <KeyProperties>
    <KeyProperty>
      <PropertyId>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PropertyId>
      <IsCaseSensitive>false</IsCaseSensitive>
      <Value>servername.fqdn.local</Value>
    </KeyProperty>
    <KeyProperty>
      <PropertyId>$MPElement[Name="Example.Client.Class"]/ClientName$</PropertyId>
      <IsCaseSensitive>false</IsCaseSensitive>
      <Value>servername.fqdn.local</Value>
    </KeyProperty>
  </KeyProperties>
  <AlertMessageId>$MPElement[Name="Demo.Rule.AlertMessage"]$</AlertMessageId>
  <AlertParameters>
    <AlertParameter1>$Data/EventDescription$</AlertParameter1>
  </AlertParameters>
</WriteAction>

The key section here is <ManagedEntityTypeId> and then some <KeyProperties>

In the <ManagedEntityTypeId> we need to reference the CLASS that we want the alert to appear to be coming FROM.

Then, in the <KeyProperties> we need two sections:

The first key property is mapping the Windows Computer principal name to the fqdn of the agent we want the alert to “appear to be from”.  This part is easy.

The second key property is mapping the SAME fqdn, to a matching property on the CLASS we referenced in <ManagedEntityTypeId>, or a parent base class that has the key property defined.

The second key property is the tough one.  The criteria for this to work (from my testing) are that we MUST have a class with a key property first, and that key property MUST be the fqdn of the agent/server for each instance (or whatever value we are “matching” on).

In most of the classes I create, I don’t create key properties.  Key properties aren't required unless I have a class that will discover multiple instances on the same healthservice (agent).  For stuff I do, this is rarely the case.  However, it is EASY to create a key property for your custom classes, and many Microsoft classes already have key properties.  The big “gotcha” here is that in order to generate an alert for another instance of a class (not the targeted instance), the class we specify MUST have a key property defined for this to work.

So – I simply added a key property of “ClientName” to my custom class, and then to discover it, all I have to do is add some simple code to the discovery which maps the hosting Windows Computer principal name to the property.
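Because the two key properties must carry the same value, it can help to generate the `<KeyProperties>` block rather than edit it by hand. Here is a sketch using Python's `xml.etree.ElementTree`. The helper name `key_properties` is hypothetical, and the output is only the `<KeyProperties>` fragment of the GenerateAlertForType example earlier, not a complete write action.

```python
import xml.etree.ElementTree as ET

COMPUTER_PRINCIPAL = '$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$'


def key_properties(class_id, key_property, value):
    """Build the <KeyProperties> block for System.Health.GenerateAlertForType:
    the Windows Computer principal name plus the class's own key property,
    both set to the same value (the FQDN the alert should appear to come from)."""
    root = ET.Element("KeyProperties")
    for prop_id in (COMPUTER_PRINCIPAL, f'$MPElement[Name="{class_id}"]/{key_property}$'):
        kp = ET.SubElement(root, "KeyProperty")
        ET.SubElement(kp, "PropertyId").text = prop_id
        ET.SubElement(kp, "IsCaseSensitive").text = "false"
        ET.SubElement(kp, "Value").text = value
    return ET.tostring(root, encoding="unicode")
```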

Ok…. I know…. I probably lost a lot of you up to this point….. but it is easier to just do it, than it is to understand it.  That’s why I will post my XML examples at a link below.

 

Here is an example of me adding a custom key property to my custom class:

<ClassType ID="Example.AlertFromAnotherInstance.Client.Class" Accessibility="Public" Abstract="false" Base="Windows!Microsoft.Windows.LocalApplication" Hosted="true" Singleton="false" Extension="false">
  <Property ID="ClientName" Type="string" AutoIncrement="false" Key="true" CaseSensitive="false" MaxLength="256" MinLength="0" Required="false" Scale="0" />
</ClassType>

And here is part of the discovery that I will use to map “ClientName” to the hosting Windows Computer principal name:

 

<Discovery ID="Example.AlertFromAnotherInstance.Client.Class.Discovery" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="false" Remotable="true" Priority="Normal">
  <Category>Discovery</Category>
  <DiscoveryTypes>
    <DiscoveryClass TypeID="Example.AlertFromAnotherInstance.Client.Class" />
  </DiscoveryTypes>
  <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.FilteredRegistryDiscoveryProvider">
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</ComputerName>
    <RegistryAttributeDefinitions>
      <RegistryAttributeDefinition>
        <AttributeName>ClientExists</AttributeName>
        <Path>SOFTWARE\Demo\Client</Path>
        <PathType>0</PathType>
        <AttributeType>0</AttributeType>
      </RegistryAttributeDefinition>
    </RegistryAttributeDefinitions>
    <Frequency>86400</Frequency>
    <ClassId>$MPElement[Name="Example.AlertFromAnotherInstance.Client.Class"]$</ClassId>
    <InstanceSettings>
      <Settings>
        <Setting>
          <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Name>
          <Value>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
        </Setting>
        <Setting>
          <Name>$MPElement[Name="System!System.Entity"]/DisplayName$</Name>
          <Value>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
        </Setting>
        <Setting>
          <Name>$MPElement[Name="Example.AlertFromAnotherInstance.Client.Class"]/ClientName$</Name>
          <Value>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
        </Setting>
      </Settings>
    </InstanceSettings>

 

So – now all I need to do is write a rule, and use our new write action.

You can write the event rule (as my example does) using the console or any other tool, then simply modify the write action section in XML.

Here is my simple rule:

 

<Rule ID="Example.AlertFromAnotherInstance.Server.Event.Rule" Enabled="true" Target="Example.AlertFromAnotherInstance.CentralServer.Class" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>EventCollection</Category>
  <DataSources>
    <DataSource ID="Microsoft.Windows.EventCollector" TypeID="Windows!Microsoft.Windows.EventCollector">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Application</LogName>
      <AllowProxying>false</AllowProxying>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">999</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">PublisherName</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">TEST</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </And>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="GenerateAlertForTypeWA" TypeID="Health!System.Health.GenerateAlertForType">
      <Priority>1</Priority>
      <Severity>2</Severity>
      <ManagedEntityTypeId>$MPElement[Name="Example.AlertFromAnotherInstance.Client.Class"]$</ManagedEntityTypeId>
      <KeyProperties>
        <KeyProperty>
          <PropertyId>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PropertyId>
          <IsCaseSensitive>false</IsCaseSensitive>
          <Value>$Data/Params/Param[1]$</Value>
        </KeyProperty>
        <KeyProperty>
          <PropertyId>$MPElement[Name="Example.AlertFromAnotherInstance.Client.Class"]/ClientName$</PropertyId>
          <IsCaseSensitive>false</IsCaseSensitive>
          <Value>$Data/Params/Param[1]$</Value>
        </KeyProperty>
      </KeyProperties>
      <AlertMessageId>$MPElement[Name="Example.AlertFromAnotherInstance.Server.Event.Rule.AlertMessage"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/Params/Param[1]$</AlertParameter1>
        <AlertParameter2>$Data/EventDescription$</AlertParameter2>
      </AlertParameters>
    </WriteAction>
  </WriteActions>
</Rule>

 

The rule is simple. It looks in the Application event log for event ID 999 with an event source of “TEST”, and if it finds a match, it runs the write action. The write action is the part further down in the rule, which I will explain:

 

In my rule, I am targeting the workflow to run on the “Server” class. However, in my write action, I want the alert to be generated by instances of the “Client” class. So on my <ManagedEntityTypeId> line, I am using Example.AlertFromAnotherInstance.Client.Class, which is my client class ID.

Next, I map the key property of Windows Computer (PrincipalName) to the machine that I want the alert to appear to come from. In this case, the name of the affected machine is in Param 1 of my test event, so I map whatever name is in Param 1 of the event to that key property.

Next, I map the key property of my custom class (ClientName) to the SAME FQDN value.

That’s it!
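To make the data flow concrete, here is a minimal Python sketch (not SCOM code) that mimics what the rule does with a single event: the <And> filter from the data source, then the two key properties the write action fills in from Param 1. The event dictionary and its field names are hypothetical stand-ins modeled on the XPath queries in the rule, and exact, case-sensitive matching of PublisherName is an assumption of the sketch.

```python
# Offline sketch of the rule's logic; this is NOT how SCOM evaluates workflows.
# The event dict is a hypothetical stand-in for an Application log event.


def rule_matches(event: dict) -> bool:
    """Mimic the <And> of the two SimpleExpressions in the data source."""
    # EventDisplayNumber (UnsignedInteger) Equal 999
    id_ok = event.get("EventDisplayNumber") == 999
    # PublisherName (String) Equal "TEST" (exact match assumed)
    source_ok = event.get("PublisherName") == "TEST"
    return id_ok and source_ok


def param(event: dict, n: int) -> str:
    """Mimic $Data/Params/Param[n]$; XPath positions are 1-based."""
    return event["Params"][n - 1]


def key_properties(event: dict) -> dict:
    """The two <KeyProperty> values the write action fills in."""
    client_fqdn = param(event, 1)
    return {
        # Key property of the hosting Windows Computer
        "Microsoft.Windows.Computer/PrincipalName": client_fqdn,
        # Key property of the custom Client class: the SAME value
        "Example.AlertFromAnotherInstance.Client.Class/ClientName": client_fqdn,
    }


if __name__ == "__main__":
    # Hypothetical event logged on the Server, naming the client in Param 1
    event = {
        "EventDisplayNumber": 999,
        "PublisherName": "TEST",
        "Params": ["RD01.opsmgr.net"],
    }
    if rule_matches(event):
        print(key_properties(event))
```

Both keys receive the same FQDN, which is what points the alert at the Client instance on that computer rather than at the Server where the event was logged.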

 

In this example, I create an event on my “Server”, and Param 1 of the event holds the name of the client I want the alert to come from:

[Image: the test event (ID 999, source TEST) logged on the Server]

 

Note: in the image above, the event was logged on a server named “STORAGE.opsmgr.net”, but Param 1 contained the name “RD01.opsmgr.net”.

As long as RD01.opsmgr.net hosts an instance of my “Client” class, the alert is generated as if it came from that machine:

 

[Image: the resulting alert, raised against the Client instance on RD01.opsmgr.net]
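The condition "as long as RD01.opsmgr.net hosts an instance of my Client class" amounts to a lookup by key properties. Here is a minimal Python sketch of that lookup, assuming a hypothetical in-memory inventory of discovered Client instances. Both keys are compared case-insensitively because the rule sets <IsCaseSensitive>false</IsCaseSensitive>. The post only describes the case where a matching instance exists, so the sketch simply returns None otherwise rather than guessing what SCOM does.

```python
# Hypothetical inventory of discovered Client instances, keyed the way the
# write action keys them: (Windows Computer PrincipalName, ClientName),
# both stored in lower case.
DISCOVERED_CLIENTS = {
    ("rd01.opsmgr.net", "rd01.opsmgr.net"): "Client instance hosted on RD01.opsmgr.net",
}


def resolve_alert_source(param1: str):
    """Return the Client instance whose key properties both equal Param 1.

    Both keys use <IsCaseSensitive>false</IsCaseSensitive>, so compare in
    lower case. Returns None when nothing matches (not covered in the post).
    """
    key = (param1.lower(), param1.lower())
    return DISCOVERED_CLIENTS.get(key)


print(resolve_alert_source("RD01.opsmgr.net"))     # Client instance hosted on RD01.opsmgr.net
print(resolve_alert_source("STORAGE.opsmgr.net"))  # None
```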

 

 

If you want to try my example XML in your own environment, simply create the following registry keys so that the “Server” and “Client” instances are discovered:

HKEY_LOCAL_MACHINE\SOFTWARE\Demo\Server

HKEY_LOCAL_MACHINE\SOFTWARE\Demo\Client
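If you would rather import a file with regedit than create the keys by hand, a small Python sketch can generate one. It assumes the "Windows Registry Editor Version 5.00" .reg format (UTF-16 text with CRLF line endings, matching what regedit itself exports), and it assumes the Server key goes on the machine you want discovered as the Server and the Client key on each client. The output file names are hypothetical. Importing into HKEY_LOCAL_MACHINE requires running regedit as an administrator.

```python
# Generate .reg files that create the empty discovery keys used by the sample MP.
# Assumption: Server key on the "Server" machine, Client key on each "Client".

KEYS = {
    "demo-server.reg": r"HKEY_LOCAL_MACHINE\SOFTWARE\Demo\Server",  # hypothetical file name
    "demo-client.reg": r"HKEY_LOCAL_MACHINE\SOFTWARE\Demo\Client",  # hypothetical file name
}


def reg_file(keys: list) -> str:
    """Build .reg text that creates each key (with no values) when imported."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines += [f"[{key}]", ""]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    for file_name, key in KEYS.items():
        # newline="" stops Python from translating our CRLFs again on Windows;
        # the utf-16 codec writes a byte-order mark, like regedit's own exports.
        with open(file_name, "w", encoding="utf-16", newline="") as f:
            f.write(reg_file([key]))
```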

 

The example management pack is available for download at:  https://gallery.technet.microsoft.com/Management-pack-sample-How-8b6741e3

How to remove OMS and Advisor management packs


 

When testing OMS (previously called Advisor) with SCOM, there is one side effect: once connected, the OMS rules import management packs into your management group with no notification or change control process for you. Furthermore, if you want to remove the OMS management packs from a SCOM management group, there is a rule that will actually re-download them while you are trying to delete them! This makes OMS very difficult to remove by default.

Brian Wren posted a method to control this behavior at the link below, and I will demonstrate the same approach here.

https://blogs.technet.microsoft.com/msoms/2016/03/16/control-management-pack-updates-between-ms-oms-and-operations-manager/

 

First, create a new management pack to store our temporary overrides, called “OMS Temp Overrides”.

Then, in the console, go to Authoring > Rules, and set your scope to only “Operations Manager Management Group”.

Using overrides saved to the “OMS Temp Overrides” management pack, disable the following two rules:

[Image: the two rules to disable]

 

This will stop new OMS/Advisor packs from coming down automatically.

 

Now you can start removing the packs from your management group as needed. You can use PowerShell to do this in bulk, but the removal will fail for any MP that other MPs still depend on. Here is a simple example:

Get-SCOMManagementPack -Name "*advisor*" | Remove-SCOMManagementPack

Get-SCOMManagementPack -Name "*IntelligencePack*" | Remove-SCOMManagementPack

Get-SCOMManagementPack -Name "Microsoft.EnterpriseManagement.Mom.Modules.AggregationModuleLibrary" | Remove-SCOMManagementPack

Be VERY careful using the above statements – they are provided as examples only.  Make SURE they return only the ones you wish to remove and not any custom packs you created that happen to match the naming scheme.
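One way to follow that advice is to look at what a pattern matches before piping anything to Remove-SCOMManagementPack, for example by running just the Get-SCOMManagementPack part on its own first. The Python sketch below mimics that preview offline. It assumes the -Name wildcards match case-insensitively, as PowerShell wildcards usually do, and the management pack names in it are hypothetical. Note how a custom pack that merely contains "Advisor" in its name gets caught.

```python
from fnmatch import fnmatchcase

# The wildcard patterns from the Get-SCOMManagementPack examples above
PATTERNS = ["*advisor*", "*IntelligencePack*"]


def matches(name: str, patterns=PATTERNS) -> bool:
    """Case-insensitive wildcard match (assumed to mirror -Name behavior)."""
    return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)


def preview(names, patterns=PATTERNS) -> list:
    """Return, sorted, the pack names a bulk removal would hit."""
    return sorted(n for n in names if matches(n, patterns))


if __name__ == "__main__":
    # Hypothetical management group contents
    installed = [
        "Example.Advisor.Connector",
        "Example.IntelligencePack.Sample",
        "Contoso.AdvisorDashboard",  # a custom pack you probably want to keep!
        "Contoso.Custom.Library",
    ]
    for name in preview(installed):
        print("would remove:", name)
```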

At this point, you should be left with just the following MPs:

 

[Image: the management packs that remain]

 

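Delete the temporary override MP you created, then quickly delete the MPs above, in the order shown. Speed matters because deleting the override MP re-enables the two rules, which can then start downloading the packs again.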

That’s it.
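Why the order matters: Remove-SCOMManagementPack fails for a pack that another imported pack still references. That is also why the temporary override MP, which references the MP containing the rules it overrides, has to go first. Here is a minimal Python sketch of working out a safe removal order with the standard library's graphlib, assuming you already have each pack's list of references; the pack names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def removal_order(references: dict) -> list:
    """references maps each pack to the set of packs it references.

    TopologicalSorter yields referenced packs before the packs that reference
    them (a valid import order); reversing that gives a removal order in which
    no pack is deleted while a remaining pack still references it.
    """
    import_order = list(TopologicalSorter(references).static_order())
    return import_order[::-1]


if __name__ == "__main__":
    # Hypothetical reference chain
    refs = {
        "Example.Advisor.Solution": {"Example.Advisor.Core"},
        "Example.Advisor.Core": {"Example.Aggregation.Library"},
    }
    print(removal_order(refs))
    # ['Example.Advisor.Solution', 'Example.Advisor.Core', 'Example.Aggregation.Library']
```

TopologicalSorter raises CycleError if the references ever form a loop, which is a useful signal that the input is wrong.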

 

If you want to bring OMS back into a management group, simply import the Advisor packs from whatever Update Rollup (UR) you are currently on, such as these from UR9:

[Image: the Advisor management packs from UR9]
