|
Here is a z/OS mainframe-centric book review from Alan Radding. Alan Radding, the author
of DancingDinosaur, is a 20-year IT industry analyst and journalist covering mainframe, midrange, PC, web, and cloud computing.
Feel welcome to check out his website: http://www.technologywriter.com.

z/OS Problem Solving for Private Clouds
May 15, 2011 by dancingdinosaur

First fault software problem solving (FFSPS) is an old mainframe approach that
calls for solving problems as soon as they occur. It’s an approach that has gone out of favor except in classic mainframe
data centers, but it may be worth reviving as the IT industry moves toward cloud computing and especially private clouds,
for which the zEnterprise (z196 and zBX) is particularly well suited.

The point of Dan Skwire’s book, First Fault Software Problem Solving: Guide for Engineers, Managers, and Users, is that FFSPS is an effective approach even today. Troubleshooting after a problem has occurred is time-consuming,
costly, inefficient, and often unsuccessful. A lack of information typically complicates troubleshooting further. As Skwire notes:
if you have to start troubleshooting after the problem occurs, the odds indicate you will not solve the problem, and along
the way, you consume valuable time, extra hardware and software, and other measurable resources. The FFSPS trick is
to capture problem-solving data from the start. This is what mainframe data centers did routinely. Specifically, they used trace
tables and included recovery routines. This continues to be the case with z/OS today.

So why should IT managers
today care about mainframe disciplines like FFSPS? Skwire’s answer: there surely will be greater customer satisfaction
if you solve and repair the customer’s problem, or if he is empowered to solve and repair his own problem rapidly. Another
reason is risk minimization. As classic mainframe shops have become increasingly heterogeneous, the mainframe disciplines
that kept the mainframe rock solid have not been enforced across the new platforms. Skwire also likes to talk about
System Yuk. You probably have a few System Yuks in your shop. What’s System Yuk? As Skwire explains, System Yuk is very
complex. It makes many decisions, and analyzes much data. However, the only means it has of conveying an error is the single
message to the operator console: SYSTEM HAS DETECTED AN ERROR, which is not particularly helpful. System Yuk has no
trace table or FFSPS tools. To diagnose problems in Yuk you must re-create the environment in your Yuk test-bed, and add instrumentation
(write statements, traces, etc.) and various tools to get a decent explanation of problems with Yuk, or set up some second-fault
tool to capture more and better data on the production System Yuk, which is high risk. Toward the end of the book Skwire
gets into what you can do about System Yuk. It amounts to a call for defensive programming. He then introduces a variety of
tools to troubleshoot and fix software problems. These include: ServiceLink by Axeda, AlarmPoint Systems, and LogLogic. Of course, mainframe shops have long relied on management tools from IBM, CA, BMC, and others to enable FFSPS.

With
the industry gravitating toward private clouds as a way to efficiently deliver IT as a flexible service, the disciplined methodologies
that continue to keep the mainframe a still critical platform in large enterprises will be worth adopting. FFSPS should be
one in particular to keep in mind.

Alan posted this book review at http://dancingdinosaur.wordpress.com/.
Here is Alan Radding's evaluation of the book as it relates to universal software environments,
with a view towards helping problem solving in Private Clouds, one of the many foci of 'First Fault Software Problem Solving'.

Software Problem Solving for Private Clouds
Posted by dancingdinosaur in bottomlineIT on May 15, 2011

First fault software problem solving (FFSPS) is an old mainframe approach that calls for solving problems
as soon as they occur. It’s an approach that has gone out of favor except in classic mainframe data centers, but it
may be worth reviving as the IT industry moves toward cloud computing and especially private clouds, for which the zEnterprise
(z196 and zBX) is particularly well suited.

The point of Dan Skwire’s book, First Fault Software Problem Solving: Guide for Engineers, Managers, and Users, is that FFSPS is an effective approach even today. Troubleshooting after a problem has occurred is time-consuming,
costly, inefficient, and often unsuccessful. A lack of information typically complicates troubleshooting further. As Skwire notes:
if you have to start troubleshooting after the problem occurs, the odds indicate you will not solve the problem, and along
the way, you consume valuable time, extra hardware and software, and other measurable resources. The FFSPS trick is
to capture problem-solving data from the start. This is what mainframe data centers did routinely. Specifically, they used trace
tables and included recovery routines. This continues to be the case with z/OS today. Full disclosure: I’m a fan of
mainframe computers and Power Systems and follow both regularly in my independent mainframe blog, DancingDinosaur.

So why should IT managers today care about mainframe disciplines like FFSPS? Skwire’s answer: there surely will
be greater customer satisfaction if you solve and repair the customer’s problem, or if he is empowered to solve and
repair his own problem rapidly. Another reason is risk minimization. Skwire also likes to talk about System Yuk. You
probably have a few System Yuks in your shop. What’s System Yuk? As Skwire explains, System Yuk is very complex. It
makes many decisions, and analyzes much data. However, the only means it has of conveying an error is the single message to
the operator console: SYSTEM HAS DETECTED AN ERROR, which is not particularly helpful. System Yuk has no trace table or FFSPS
tools. To diagnose problems in Yuk you must re-create the environment in your Yuk test-bed, and add instrumentation (write
statements, traces, etc.) and various tools to get a decent explanation of problems with Yuk, or set up some second-fault tool
to capture more and better data on the production System Yuk, which is high risk. Toward the end of the book Skwire
gets into what you can do about System Yuk. It amounts to a call for defensive programming. He then introduces a variety of
tools to troubleshoot and fix software problems. These include: ServiceLink by Axeda, AlarmPoint Systems, LogLogic, IBM Tivoli Performance Analyzer, and CA Technologies’ Wily Introscope. With the industry gravitating toward private clouds as a way to efficiently deliver IT as a flexible service, the
disciplined methodologies that continue to keep the mainframe a still critical platform in large enterprises will be worth
adopting. FFSPS should be one in particular to keep in mind.

The article was captured from the post at http://bottomlineit.wordpress.com/
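
For readers who have not worked with trace tables and recovery routines, the idea the review describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TraceTable`, `run_with_recovery`, and `parse_quantity` are hypothetical names invented here, not z/OS facilities and not code from Skwire's book. The sketch keeps a small in-memory ring buffer of recent events and, when a step fails, returns that history together with the error, so the first occurrence of the fault already carries diagnostic data.

```python
from collections import deque
import time


class TraceTable:
    """Fixed-size in-memory ring buffer of recent events (hypothetical helper)."""

    def __init__(self, capacity=64):
        # A deque with maxlen drops the oldest entry once it is full.
        self._entries = deque(maxlen=capacity)

    def record(self, component, event, **details):
        self._entries.append((time.time(), component, event, details))

    def dump(self):
        return list(self._entries)


def run_with_recovery(trace, step, *args):
    """Run one unit of work; on failure, return the trace with the error."""
    trace.record("recovery", "enter", step=step.__name__)
    try:
        result = step(*args)
    except Exception as exc:
        # Recovery routine: keep the data from the first failure instead of
        # reporting only "SYSTEM HAS DETECTED AN ERROR".
        trace.record("recovery", "failed", error=repr(exc))
        return {"ok": False, "error": repr(exc), "trace": trace.dump()}
    trace.record("recovery", "exit", step=step.__name__)
    return {"ok": True, "result": result, "trace": None}


def parse_quantity(trace, text):
    # Illustrative workload: record the input before acting on it.
    trace.record("parser", "input", text=text)
    return int(text)


trace = TraceTable(capacity=4)
good = run_with_recovery(trace, parse_quantity, trace, "12")
bad = run_with_recovery(trace, parse_quantity, trace, "twelve")
```

Here `bad["trace"]` holds the last four recorded events, including the offending input, which is the kind of information a System Yuk style single console message would not provide. The fixed capacity is a deliberate choice in this sketch: tracing stays cheap enough to leave on in production, at the cost of keeping only recent history.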
|