First Fault Problem Resolution Technologies LLC
Book information: First Fault Software Problem Solving: A Guide for Engineers, Managers and Users
Book review in the German computer magazine c't (www.ct.de), published March 29, 2010. A recent translation:


No large software package comes without errors. Customers seem simply to accept this, waiting patiently and hoping for patches or updates. Skwire argues for a more targeted approach: software will never be faultless, but it would be a great improvement if flaws were solved on their first occurrence ("first fault") rather than only after a long analysis ("second fault").

The advantages are obvious. However, the correspondingly rigorous system architecture that is common on mainframes, such as IBM's z/OS, never became prevalent in the PC market.

Skwire outlines the types of errors and the strategies for resolving them in full detail. His 40 years of experience, including time at IBM, show through again and again. He takes care that the reader understands the terminology he uses: "What is a problem in the first place?", "What is a service point?" In some cases he also explains specific metrics such as the "serviceability rating".

His tool classification includes instructive tips, e.g. on the structure of a protocol in case of errors, or on tracking how often an error must occur before a solution has to be pursued. His suggestions address developers, designers, testers, managers and end users alike. In his last chapter he presents and reviews commercial tools for the first-fault and second-fault environments.

Skwire addresses a topic that is unfortunately much neglected, and this alone makes his book worth a look (***). Short quotations and humorous drawings lighten the technical subject matter. If you are looking for an overview, this book will serve you well. However, if you are a software developer looking for source code samples, you will search in vain. Skwire has released the book through a print-on-demand process. You will find it on Amazon, for example.

(Tobias Engler/fm)

 

Here is a z/OS mainframe-centric book review from Alan Radding. Alan Radding, the author of DancingDinosaur, is a 20-year IT industry analyst and journalist covering mainframe, midrange, PC, web, and cloud computing. Feel free to check out his website: http://www.technologywriter.com.


z/OS Problem Solving for Private Clouds

May 15, 2011 by dancingdinosaur

First fault software problem solving (FFSPS) is an old mainframe approach that calls for solving problems as soon as they occur. It’s an approach that has gone out of favor except in classic mainframe data centers, but it may be worth reviving as the IT industry moves toward cloud computing and especially private clouds, for which the zEnterprise (z196 and zBX) is particularly well suited.

The point of Dan Skwire’s book First Fault Software Problem Solving: Guide for Engineers, Managers, and Users, is that FFSPS is an effective approach even today. Troubleshooting after a problem has occurred is time-consuming, more costly, inefficient, and often unsuccessful. A lack of information typically complicates troubleshooting further. As Skwire notes: if you have to start troubleshooting after the problem occurs, the odds indicate you will not solve the problem, and along the way, you consume valuable time, extra hardware and software, and other measurable resources.

The FFSPS trick is to capture problem-solving data from the start. This is what mainframe data centers did routinely. Specifically, they used trace tables and included recovery routines. This continues to be the case with modern z/OS today.
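To make the idea concrete, here is a minimal sketch of an in-memory trace table paired with a recovery routine. It is not taken from Skwire's book or from z/OS; the names `TraceTable`, `run_with_recovery`, and `divide` are hypothetical, and a simple ring buffer stands in for whatever a real system would use.

```python
from collections import deque
import time


class TraceTable:
    """Fixed-size in-memory ring buffer of recent events (hypothetical design)."""

    def __init__(self, capacity=64):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._entries = deque(maxlen=capacity)

    def record(self, event, **details):
        self._entries.append((time.time(), event, details))

    def snapshot(self):
        return list(self._entries)


def run_with_recovery(trace, operation, *args):
    """Run operation; on failure, return the trace captured at the first fault."""
    trace.record("enter", operation=operation.__name__, args=args)
    try:
        result = operation(*args)
        trace.record("exit", operation=operation.__name__)
        return result, None
    except Exception as exc:
        trace.record("fault", operation=operation.__name__, error=repr(exc))
        # The recovery routine hands back the recent history so the failure
        # can be analyzed without having to re-create it.
        return None, trace.snapshot()


def divide(a, b):
    return a / b


trace = TraceTable(capacity=4)
ok_result, ok_dump = run_with_recovery(trace, divide, 6, 3)
bad_result, bad_dump = run_with_recovery(trace, divide, 1, 0)
```

Because tracing is always on, the events leading up to the failed call are already in `bad_dump` the first time the error happens, which is the essence of the first-fault approach described above.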

So why should IT managers today care about mainframe disciplines like FFSPS? Skwire’s answer: there surely will be greater customer satisfaction if you solve and repair the customer’s problem, or if he is empowered to solve and repair his own problem rapidly.

Another reason is risk minimization. As classic mainframe shops have become increasingly heterogeneous, the mainframe disciplines that kept the mainframe rock solid have not been enforced across the new platforms.

Skwire also likes to talk about System Yuk. You probably have a few System Yuks in your shop. What’s System Yuk? As Skwire explains, System Yuk is very complex. It makes many decisions, and analyzes much data. However, the only means it has of conveying an error is the single message to the operator console: SYSTEM HAS DETECTED AN ERROR, which is not particularly helpful.

System Yuk has no trace table or FFSPS tools. To diagnose problems in Yuk you must re-create the environment in your Yuk test-bed, and add instrumentation (write statements, traces, etc.) and various tools to get a decent explanation of problems with Yuk, or set up some second-fault tool to capture more and better data on the production System Yuk, which is high risk.

Toward the end of the book Skwire gets into what you can do about System Yuk. It amounts to a call for defensive programming. He then introduces a variety of tools to troubleshoot and fix software problems. These include: ServiceLink by Axeda, AlarmPoint Systems, and LogLogic. Of course, mainframe shops have long relied on management tools from IBM, CA, BMC, and others to enable FFSPS.

With the industry gravitating toward private clouds as a way to efficiently deliver IT as a flexible service, the disciplined methodologies that continue to keep the mainframe a still-critical platform in large enterprises will be worth adopting. FFSPS should be one in particular to keep in mind.

Alan posted this book review at http://dancingdinosaur.wordpress.com/

Here is Alan Radding's evaluation of the book as it relates to universal software environments, with a view toward aiding problem solving in private clouds, one of the many foci of 'First Fault Software Problem Solving'.

Software Problem Solving for Private Clouds

Posted by dancingdinosaur in bottomlineIT on May 15, 2011

First fault software problem solving (FFSPS) is an old mainframe approach that calls for solving problems as soon as they occur. It’s an approach that has gone out of favor except in classic mainframe data centers, but it may be worth reviving as the IT industry moves toward cloud computing and especially private clouds, for which the zEnterprise (z196 and zBX) is particularly well suited.

The point of Dan Skwire’s book First Fault Software Problem Solving: Guide for Engineers, Managers, and Users, is that FFSPS is an effective approach even today. Troubleshooting after a problem has occurred is time-consuming, more costly, inefficient, and often unsuccessful. A lack of information typically complicates troubleshooting further. As Skwire notes: if you have to start troubleshooting after the problem occurs, the odds indicate you will not solve the problem, and along the way, you consume valuable time, extra hardware and software, and other measurable resources.

The FFSPS trick is to capture problem-solving data from the start. This is what mainframe data centers did routinely. Specifically, they used trace tables and included recovery routines. This continues to be the case with z/OS today. Full disclosure: I’m a fan of mainframe computers and Power Systems and follow both regularly in my independent mainframe blog, DancingDinosaur.

So why should IT managers today care about mainframe disciplines like FFSPS? Skwire’s answer: there surely will be greater customer satisfaction if you solve and repair the customer’s problem, or if he is empowered to solve and repair his own problem rapidly. Another reason is risk minimization.

Skwire also likes to talk about System Yuk. You probably have a few System Yuks in your shop. What’s System Yuk? As Skwire explains, System Yuk is very complex. It makes many decisions, and analyzes much data. However, the only means it has of conveying an error is the single message to the operator console: SYSTEM HAS DETECTED AN ERROR, which is not particularly helpful. System Yuk has no trace table or FFSPS tools. To diagnose problems in Yuk you must re-create the environment in your Yuk test-bed, and add instrumentation (write statements, traces, etc.) and various tools to get a decent explanation of problems with Yuk, or set up some second-fault tool to capture more and better data on the production System Yuk, which is high risk.

Toward the end of the book Skwire gets into what you can do about System Yuk. It amounts to a call for defensive programming. He then introduces a variety of tools to troubleshoot and fix software problems. These include: ServiceLink by Axeda, AlarmPoint Systems, LogLogic, IBM Tivoli Performance Analyzer, and CA Technologies’ Wily Introscope.

With the industry gravitating toward private clouds as a way to efficiently deliver IT as a flexible service, the disciplined methodologies that continue to keep the mainframe a still-critical platform in large enterprises will be worth adopting. FFSPS should be one in particular to keep in mind.

 
The article was captured from the post at http://bottomlineit.wordpress.com/
