Tuesday, December 21, 2010

Quest QMX and SDK running with Local System

When you install QMX on an additional server (as recommended) and run the SCOM SDK service (System Center Data Access) as Local System, you have to add the computer account of the QMX server to the SCOM Administrators group instead of a user account as described in the guide. So use domain\server$, and if you want to verify it with Check Names, make sure that the Computers object type is selected.

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Friday, December 17, 2010

How to use Published Data where no subscription is possible

During an Opalis project I had a little problem when I wanted to work with Published Data within the foundation objects, without any additional Integration Packs (IPs).

Of course this particular issue could have been solved in another way, but this solution should work for other objects too.

My scenario: I have a workflow that grabs some lines from a CSV file. I wanted to add additional information to each line, but because of the project specifications I had to use foundation objects only and not the Data Manipulation IP (you can find it on CodePlex).

So imagine something like this:

original text: World,Universe
wanted text: Hello World,Universe

To do that, I use the Read Line object to get the data out of the CSV file (you can use any other object that makes sense to you; it doesn’t matter) and the Map Published Data object to manipulate the lines.

In the properties of the Map Published Data object I added a data mapping. Unfortunately, you can’t subscribe to published data in the Map to field by right-click, Subscribe, … . So I solved it this way:

I used the following values:

Output Published Data: myWantedText *
Source Data: {Line text from “Read Line”} **
Pattern: World* ***
Map to: Hello \`d.T.~Ed/{A610DBE8-A278-45FF-96CB-3C35191332DE}.LineText\`d.T.~Ed/ ****

To explain it briefly:

* any name you like to use later on the data bus
** the source data from the normal data bus subscription
*** the pattern you are searching for
**** the additional text, including my (!) copied-and-pasted source data field

So you simply have to copy your subscription from the Source Data field and paste it into the Map to field, so that it looks like the following screenshot:

[Screenshot: Map Published Data properties with the mapping described above]

I haven’t double-checked it yet, but I can imagine that this solution works for all other objects too. Anyway, you can very easily try it out yourself…
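Outside of Opalis, the logic of this mapping can be shown in a minimal Python sketch. This is my own illustration, not Opalis code; it assumes the Pattern field behaves like a simple wildcard match, and the function name map_published_data is made up:

import fnmatch

def map_published_data(line_text, pattern="World*", extra_text="Hello "):
    # If the source data matches the pattern, the mapped value is the
    # additional text plus the original published data (the copied token).
    if fnmatch.fnmatch(line_text, pattern):
        return extra_text + line_text
    return line_text

# One line read from the CSV file:
print(map_published_data("World,Universe"))  # -> Hello World,Universe

If the line matches the pattern, the result is simply the additional text followed by the original published data, which is exactly what pasting the subscription token into the Map to field achieves.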

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Tuesday, December 14, 2010

How to create a refreshing web page from an Opalis workflow

Of course you know the Write Web Page object, which creates a website from a workflow.
What might be really cool is to create a web page from a template that refreshes automatically. That way you can build your own dashboard views to provide information on a central dashboard screen in an office.
First, create a file called template.html with the following content:
<HTML>
<HEAD>
<TITLE><DOC-TITLE></TITLE>
<META HTTP-EQUIV="Refresh" CONTENT="5">
</HEAD>
<BODY>
<BR>
<HR>
<H1>This is my Opalis dashboard web page</H1>
<H1><DOC-TEXT></H1>
<HR>
<BR>
Your browser should automatically refresh every five seconds.
<P>
</BODY>
</HTML>

The important parts are the <DOC-TITLE> and <DOC-TEXT> tags. These tags will be replaced by your workflow.

In your workflow, add a Write Web Page object and enter the template’s path and file name into the Template field. The tags above will then be replaced by the Content Title and Text of this object.
If you are getting more than one line from a previous object, you can flatten that object’s published data.

Hint 1: to get a separate line on the web page for every single entry, simply use <BR> as the separator when flattening.

Hint 2: if you like to use different styles in the body, you can add HTML tags to the Text field just as you would in normal HTML development.
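To make the replacement mechanics tangible, here is a minimal Python sketch of roughly what the Write Web Page object does with the template, assuming plain string substitution (the function name write_web_page and the output file name dashboard.html are my own choices, not part of Opalis):

from pathlib import Path

def write_web_page(template_file, output_file, title, text):
    # Fill the <DOC-TITLE> and <DOC-TEXT> placeholders and write the page.
    html = Path(template_file).read_text()
    html = html.replace("<DOC-TITLE>", title)
    html = html.replace("<DOC-TEXT>", text)
    Path(output_file).write_text(html)

# Flattened input with <BR> as separator renders as one line per entry:
write_web_page("template.html", "dashboard.html",
               "My Opalis dashboard",
               "Service A: OK<BR>Service B: OK<BR>Service C: WARNING")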

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Monday, December 13, 2010

Now it really rooocks!

You can now reach my blog at http://www.systemcenterrocks.com/

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Opalis UI generator to trigger workflows

I just want to post a link to another blog with a very useful tool that can be used to create UIs for triggering workflows in a user-friendly way.

Check it out:
http://blogs.technet.com/b/yasc/archive/2010/11/06/need-to-trigger-opalis-policies-remotely-in-a-custom-and-user-friendly-way-here-is-the-opalis-ui-generator.aspx

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Tuesday, December 7, 2010

This time I will DO IT!

Humans (= admins) are lazy. Sorry, I don’t mean to offend anyone! I will come back to that point later, so don’t curse me just yet!

But to be honest: did you ever say, “next time… next time I will do it better”? I guess so.

Did you ever say, “we will do everything to make sure we don’t get this failure again”? Of course you did.

And did it ever happen that nothing (or at least not enough) happened afterwards? Probably yes (be honest, nobody will find out (and I won’t tell anyone ;-) )).

Believe me: in system management it’s just the same. You can’t build a 100% monitoring solution from scratch. But you can implement a best-practice solution (or have somebody implement it for you). However, even this solution will not cover 100%.

But you can learn from every unexpected situation that decreases your service availability. When that happens, it would be good to adapt your monitoring so that it alerts as soon as possible before a crash happens again, or at least as soon as possible after a crash has happened again, with the best possible information about the problem. In the worst case this affects the users, yes. But you can immediately begin with the right recovery, and your helpdesk can tell your users what the problem is and that you are already working on a solution (and that is also very (!) important).

From my experience (from both the inside and the outside perspective of datacenters), the motivation to extend monitoring for a specific service/problem can be visualized as in the chart below.

Normally, immediately after an unwanted service downtime, the motivation to do everything to prevent that issue in the future is at nearly 100%. Take advantage of this timeframe to extend your monitoring!

[Chart: motivation to improve monitoring vs. level of monitoring implemented, before and after a crash]

Legend:

vertical axis: % motivation and % monitoring integrated
horizontal axis: time before/after the crash (any unit)
blue line: motivation to improve monitoring
red line: level of monitoring already implemented
orange area: when the crash happens
green area: time of highest motivation to become better

Nobody (!) can monitor all services for every failure that could ever happen. But it is mandatory to learn from every problem that occurs, and most of the time there is not much time to improve your monitoring configuration: everybody has a heavy workload, and the motivation to create monitoring for this particular problem decreases quickly.

The good thing to know is that it does not have to depend on whether the system management admins are lazy or not. In the real world it very often happens that exactly these guys are not part of the review process, while the admins or operators who are involved do not care about monitoring (I will write a separate post about that, because it is a very common problem).

Read this post for some more thoughts: Why monitoring is necessary?

Credits to my friend Alexander Edelmann („Das Regenschirm Prinzip“ ISBN-13: 978-3639098624) for inspiring me.

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Wednesday, December 1, 2010

Opalis: Cannot connect to the Management Server

After installing Windows and SQL Server, you installed the Management Server (step 1) using a new domain account and configured the datastore (step 2). But when trying to import the license key (step 3), you receive an error saying that you can’t connect to the Management Server service:

[Screenshot: error message while importing the license key]

Investigating the logs under %ProgramFiles%\Opalis Software\Opalis Integration Server\Management Service\Logs and analyzing the latest OpalisManagementService.* log file, you find an error like “Cannot open DB connection”.

In that case, most likely your OpalisManagementService account is not allowed to connect to SQL Server. So add the account as a new login and grant it access to the Opalis database.
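If you prefer a script over the SQL Server Management Studio GUI, here is a hypothetical sketch of the idea in Python via pyodbc. Everything in it is my assumption rather than the official Opalis procedure: the server, domain, account, and database names are placeholders, and db_owner is the blunt option, so grant only the rights your setup actually needs.

import pyodbc

# Connect to the SQL Server instance hosting the Opalis datastore.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=SQLSERVER01;Trusted_Connection=yes",
    autocommit=True,
)
cur = conn.cursor()
# Server-level login for the Windows service account (placeholder names)...
cur.execute(r"CREATE LOGIN [DOMAIN\OpalisSvc] FROM WINDOWS")
# ...mapped to a user in the Opalis datastore...
cur.execute(r"USE [Opalis]")
cur.execute(r"CREATE USER [DOMAIN\OpalisSvc] FOR LOGIN [DOMAIN\OpalisSvc]")
# ...with rights on the database (db_owner is an assumption, see above).
cur.execute(r"EXEC sp_addrolemember 'db_owner', 'DOMAIN\OpalisSvc'")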

(Yes, you caught me: I created the screenshot above after I had finished the setup ;-) , as you can see from all 4 steps having completed successfully.)

Update: see also the information here:

http://technet.microsoft.com/en-us/library/gg440635.aspx

All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Monday, November 29, 2010

SCOM R2 MP updated

A new version of the Operations Manager management pack has been available since last week.
Find the version 6.1.7695.0 here:
http://pinpoint.microsoft.com/en-us/applications/operations-manager-2007-r2-management-pack-12884901986
Here are the changes, taken from the MP guide:
Version 6.1.7695.0 of the Operations Manager Management Pack for Operations Manager 2007 R2 includes the following changes:
  • Added the “Agents by Health State” report which will list all agents, management servers, gateway servers and the root management server grouped by their current health state (i.e. unavailable, error, warning or success). For more information, see Appendix: Reports.
  • Added the “An alert subscription has been automatically disabled due to invalid configuration” rule to generate an alert when an alert subscription is disabled due to invalid configuration, such as when the account that created the subscription is deleted.
  • Added the “WMI Service Availability” aggregate monitor and the “Windows Management Instrumentation Service” unit monitor to monitor the state of the Windows Management Instrumentation (WMI) service (winmgmt) on agents. By default, the unit monitor samples the WMI service every 125 seconds and generates an alert when the WMI service is not running for 3 consecutive samples. These settings can be changed by using overrides. [A sketch of this sampling logic follows after the list.]
  • Added rules that can be enabled in place of monitors that require a manual reset of the health state. For more information, see Manual Reset Monitors.
  • Updated product knowledge for some workflows.
  • Changed the "Computer Verification: Verification Error" event collection rule to be disabled by default. The alert from this rule would only be generated when running the discovery wizard, when the user would directly observe that one or more computer verifications failed. The alert is an unnecessary duplication.
  • Changed the “Collect Configuration Parse Error Events” rule to be disabled by default.
  • Changed the parameter used for alert suppression for the following rules:
      • Alert generation was temporarily suspended due to too many alerts
      • Workflow Runtime: Failed to access a Windows event log
      • Workflow Initialization: Failed to initialize access to an event log
      • An error occurred during computer verification from the discovery wizard
      • A generic error occurred during computer verification from the discovery wizard
  • Removed alerting from the “Data Integrity” aggregate monitor and added alerting to its unit monitors:
      • Repeated Event Raised
      • Spoofed Data Check
      • Root Connector Data Validity Check
    The Operational Data Reporting Management Pack has not changed in this release. The version number of the Operational Data Reporting Management Pack was updated to keep the versions the same across all of the management packs in this package.
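A side note on the new WMI unit monitor listed above: the “alert after n consecutive bad samples” mechanism it describes can be sketched in a few lines of Python. This is just my illustration of the logic, not SCOM code, and the service check via sc query is my own stand-in:

import subprocess
import time

SAMPLE_INTERVAL = 125   # seconds between samples (default from the guide)
THRESHOLD = 3           # consecutive bad samples before alerting (default)

def service_is_running(name="winmgmt"):
    # Stand-in check via the Windows service controller (Windows only).
    out = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return "RUNNING" in out.stdout

def monitor():
    bad_samples = 0
    while True:
        if service_is_running():
            bad_samples = 0        # a healthy sample resets the counter
        else:
            bad_samples += 1
            if bad_samples == THRESHOLD:
                print("ALERT: WMI service not running for 3 consecutive samples")
        time.sleep(SAMPLE_INTERVAL)

Both defaults correspond to the values quoted from the guide and would be changed via overrides in SCOM itself.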
All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.

Sunday, November 28, 2010

Why monitoring is necessary?

This week a colleague of mine asked me to be a co-presenter at a Microsoft event on Lync 2010 (Office Communication Server 14; find the event here), where he wants me to talk about the SCOM implementation for Lync. So I thought about what might be a good opener to loosen up the audience.
 
I asked myself (once again):
Why are you doing system management?
What are the benefits of monitoring?
What is the business value of being proactive?
And how do you measure ROI?
 
From time to time most of us go for a preventive medical health check-up (even those who live an active, healthy lifestyle).

We do that to know the health state of our own body, and to learn how we can prevent illnesses like hypertension, circulatory disturbances, blood glucose disorders, and so on.
I compare that with technical proactive monitoring because things can go wrong without anybody being aware of it.
 
We also have other kinds of health check-ups for more serious conditions like cancer, heart insufficiency, osteoporosis, and so on.
In those cases the system (yes, your body’s system too) is in an unhealthy state, but all services are still working as expected. To prevent an unwanted breakdown you have to know about the problem as soon as possible, so you can repair the issue with less impact and subsequent damage.
 
And if we do have acute health problems, we can go to the doctor or even to the hospital to determine the cause and receive the correct medical treatment immediately.
 
My opinion is that servers and applications should do the same, giving us the basis for the correct diagnosis, analysis, and recovery to minimize the downtime.
 
Does this make sense to you?
I guess so, because it is necessary to know your body’s health state. And I think it’s also necessary to know the health state of your datacenter environment - at any time!
 
This is from the technical perspective.
On the other hand, there is always the business perspective. Unfortunately, it is not that easy to determine the ROI for this kind of software.
 
How do you declare the costs saved by a service downtime that never happens? Or the costs saved because of a much faster response and service recovery in case of an issue?
 
What will you consider in your ROI calculation? Do you count file system thresholds too (because no more space available = no more service available)? Or do you only count real service downtime? And is that the whole truth?
 
What about performance issues? Do you consider the costs saved because users can work faster (or even smoother) after you start up an additional server in your farm/cloud?
And, on the other hand, what about the costs saved on power, cooling, and a lower MTBF by shutting down a server when the workload in your farm/cloud decreases?
 
You see, when it comes to calculating the costs, this is partially an absolutely philosophical question.
 
But hopefully you keep in mind that it is essential to do proactive monitoring. So: call the doctor you trust to get an appointment for your medical health check. And call the consultant you trust to implement useful (!) monitoring.
 
All information is provided "as is" without any warranty! Try in lab before. Handle with care in production.