Monday, November 29, 2010

SCOM R2 MP updated

A new version of the Operations Manager management pack has been available since last week.
You can find version 6.1.7695.0 here:
http://pinpoint.microsoft.com/en-us/applications/operations-manager-2007-r2-management-pack-12884901986
Here you’ll find the updates grabbed from the MP guide:
Version 6.1.7695.0 of the Operations Manager Management Pack for Operations Manager 2007 R2 includes the following changes:
  • Added the “Agents by Health State” report which will list all agents, management servers, gateway servers and the root management server grouped by their current health state (i.e. unavailable, error, warning or success). For more information, see Appendix: Reports.
  • Added the “An alert subscription has been automatically disabled due to invalid configuration” rule to generate an alert when an alert subscription is disabled due to invalid configuration, such as when the account that created the subscription is deleted.
  • Added the “WMI Service Availability” aggregate monitor and the “Windows Management Instrumentation Service” unit monitor to monitor the state of the Windows Management Instrumentation (WMI) service (winmgmt) on agents. By default, the unit monitor samples the WMI service every 125 seconds and generates an alert when the WMI service is not running for 3 consecutive samples. These settings can be changed by using overrides.
  • Added rules that can be enabled in place of monitors that require manual reset of the health state. For more information, see Manual Reset Monitors.
  • Updated product knowledge for some workflows.
  • Changed the "Computer Verification: Verification Error" event collection rule to be disabled by default. The alert from this rule would only be generated when running the discovery wizard, when the user would directly observe that one or more computer verifications failed. The alert is an unnecessary duplication.
  • Changed the “Collect Configuration Parse Error Events” rule to be disabled by default.
  • Changed the parameter used for alert suppression for the following rules:
      • Alert generation was temporarily suspended due to too many alerts
      • Workflow Runtime: Failed to access a Windows event log
      • Workflow Initialization: Failed to initialize access to an event log
      • An error occurred during computer verification from the discovery wizard
      • A generic error occurred during computer verification from the discovery wizard
  • Removed alerting from the “Data Integrity” aggregate monitor and added alerting to its unit monitors:
      • Repeated Event Raised
      • Spoofed Data Check
      • Root Connector Data Validity Check
The Operational Data Reporting Management Pack has not changed in this release. The version number of the Operational Data Reporting Management Pack was updated to keep the versions the same across all of the management packs in this package.
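To illustrate the "3 consecutive samples" behaviour described above for the WMI service unit monitor, here is a minimal Python sketch. It is not how the management pack is implemented; it only models the logic of counting consecutive failed samples. The function name `first_alert_sample` and the constants are made up for this illustration; the values 125 and 3 are the defaults quoted from the MP guide.

```python
from typing import Iterable, Optional

# Defaults quoted from the MP guide; both can be changed with overrides.
SAMPLE_INTERVAL_SECONDS = 125
CONSECUTIVE_FAILURES = 3


def first_alert_sample(samples: Iterable[bool],
                       threshold: int = CONSECUTIVE_FAILURES) -> Optional[int]:
    """Return the 1-based index of the sample that triggers the alert.

    Each item in `samples` is True if the service was running at that sample.
    Returns None if the threshold of consecutive failures is never reached.
    (Hypothetical helper for illustration only.)
    """
    failures = 0
    for index, running in enumerate(samples, start=1):
        # A healthy sample resets the counter, a failed one increments it.
        failures = 0 if running else failures + 1
        if failures >= threshold:
            return index
    return None


if __name__ == "__main__":
    # Two failures, a recovery, then three failures in a row.
    history = [True, False, False, True, False, False, False]
    print(first_alert_sample(history))
```

With the default values, a single failed sample (or two in a row) does not raise an alert. Under this simplified model the third failed sample comes two intervals, i.e. 250 seconds, after the first one, so a short service restart should not cause alert noise.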
All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Sunday, November 28, 2010

Why is monitoring necessary?

This week a colleague of mine asked me to be a co-presenter at a Microsoft event on Lync 2010 (Office Communications Server 14, find the event here), where he wants me to talk about the SCOM implementation for Lync. So I thought about what might be a good opener to loosen up the audience.
 
I asked myself (once again):
Why are you doing system management?
What are the benefits of monitoring?
What is the business value of being proactive?
And how do you measure ROI?
 
From time to time most of us go for a preventive medical health checkup (even those who live an active, healthy lifestyle).

We do that to know the health state of our own body, and to learn how we can prevent illnesses like hypertension, circulatory disturbance, blood glucose problems and so on.
I compare that with technical proactive monitoring because things can go wrong without anybody being aware of it.
 
We also have other kinds of health checkups for more serious conditions such as cancer, heart insufficiency, osteoporosis, and so on.
In such cases the system (yes, your body's system too) is in an unhealthy state even though all services still work as expected. To prevent an unwanted outage, you have to know about the problem as early as possible so that you can repair it with less impact and less subsequent damage.
 
By being proactive, if we do have acute health problems, we can go to the doctor or even to the hospital to determine the cause and get the correct medical treatment immediately.
 
In my opinion, servers and applications should do the same, to give us the chance for correct diagnosis, analysis and recovery, and to minimize downtime.
 
Does this make sense to you?
I guess so, because it is necessary to know your body's health state. And I think it's also necessary to know the health state of your datacenter environment - at any time!
 
This is from the technical perspective.
On the other hand there is always the business perspective. Unfortunately it is not that easy to determine the ROI for that kind of software.
 
How should you declare the costs saved for service downtime that never happens? Or the costs saved because of much faster response and service recovery in case of an issue?
 
What will you consider in your ROI calculation? Do you observe file system thresholds too (because no more space available = no more service available)? Or do you only observe real service downtime? Is this the whole truth?
 
What about performance issues? Do you consider the costs saved because users can work faster (or even more smoothly) after you start up an additional server in your farm/cloud?
And on the other hand, what about the costs saved on power, cooling and hardware wear (MTBF) when you shut down a server because the workload in your farm/cloud is decreasing?
 
You see, calculating the costs is partly a philosophical question.
 
But hopefully you will keep in mind that proactive monitoring is essential. So: call the doctor you trust to get an appointment for your medical health check. And call the consultant you trust to implement useful (!) monitoring.
 