
AKC11: Failure History (failure patterns)

Description:

A record of recent application failures. It helps correlate a current failure with those that have occurred in the last 6 to 12 months, so the support analyst can review the failure history to identify patterns or determine the root cause of an existing failure.

Provides an understanding of which components may fail again (based on previous experience) and helps fix a problem quickly by reviewing previous solution(s).

Provides information such as:

  • Date and Time the failure occurred
  • Job or Online Screen that failed
  • Job Step where the failure occurred
  • Component that failed
  • Failure Code or Error Message if there is no Failure Code (for example, JCL ERR)
  • Cause of the failure and a synopsis of the Problem Solving activities that took place to resolve it
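
As an illustration only, the fields listed above could be captured in a small record type. This is a minimal sketch, not a prescribed format: the class name `FailureRecord`, the field names, and the sample values (job `NIGHTLY01`, step `STEP030`, component `LOADER`) are all hypothetical and chosen just to mirror the bullet list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FailureRecord:
    """One entry in the failure history (hypothetical layout)."""
    occurred_at: datetime            # date and time the failure occurred
    job_or_screen: str               # job or online screen that failed
    job_step: Optional[str]          # job step, if the failure was in a batch job
    component: str                   # component that failed
    failure_code: Optional[str]      # failure code, when one exists
    error_message: Optional[str]     # error message, used when there is no code
    cause: str                       # cause of the failure
    resolution_synopsis: str         # summary of the problem-solving activities

    def identifier(self) -> str:
        """Return the failure code, or the error message if there is no code."""
        return self.failure_code or self.error_message or "UNKNOWN"


# Made-up example entry; none of these values come from a real system.
example = FailureRecord(
    occurred_at=datetime(2024, 3, 4, 2, 20),
    job_or_screen="NIGHTLY01",
    job_step="STEP030",
    component="LOADER",
    failure_code=None,
    error_message="JCL ERR",
    cause="Dataset name misspelled in the job control statements",
    resolution_synopsis="Corrected the dataset name and reran the job",
)
```

Keeping the failure code and the error message as separate optional fields follows the rule in the list: the message is recorded only when no code is available.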

How is it used?

The support analyst reviews the failure history to identify patterns or to determine the root cause of an existing failure.
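
One way this kind of analysis might look in practice is sketched below. It assumes the history is held as a simple list of records; the `Failure` type, the function names `recurring_patterns` and `previous_solutions`, the 365-day default window, and the sample data are all assumptions made for illustration, not part of any particular tool.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import NamedTuple


class Failure(NamedTuple):
    """Trimmed-down failure history entry (hypothetical layout)."""
    occurred_at: datetime
    job: str
    component: str
    failure_code: str
    resolution: str


def recurring_patterns(history, now, window_days=365, min_count=2):
    """Count (component, failure_code) pairs seen inside the look-back window
    and keep only those that occurred at least min_count times."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter(
        (f.component, f.failure_code)
        for f in history
        if f.occurred_at >= cutoff
    )
    return {key: n for key, n in counts.items() if n >= min_count}


def previous_solutions(history, component, failure_code):
    """Return earlier resolutions for the same component and code, newest first."""
    matches = [
        f for f in history
        if f.component == component and f.failure_code == failure_code
    ]
    matches.sort(key=lambda f: f.occurred_at, reverse=True)
    return [f.resolution for f in matches]


# Made-up sample history; jobs, components, and codes are illustrative only.
HISTORY = [
    Failure(datetime(2024, 1, 10, 2, 15), "NIGHTLY01", "LOADER", "JCL ERR",
            "Corrected dataset name in JCL"),
    Failure(datetime(2024, 3, 4, 2, 20), "NIGHTLY01", "LOADER", "JCL ERR",
            "Restored missing DD statement"),
    Failure(datetime(2024, 3, 20, 14, 5), "ORDERSCR", "VALIDATOR", "E100",
            "Patched date validation"),
    Failure(datetime(2022, 5, 1, 1, 0), "NIGHTLY01", "LOADER", "JCL ERR",
            "Reran job"),
]

patterns = recurring_patterns(HISTORY, now=datetime(2024, 6, 1))
solutions = previous_solutions(HISTORY, "LOADER", "JCL ERR")
```

With this sample data, `patterns` flags the `LOADER` / `JCL ERR` pair as recurring within the window, and `solutions` lists what was done about it on earlier occasions, most recent first.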

Why is it important?

(1) Provides a list of recent "Production Defects", which enables the support analyst to look for trends or to help diagnose a current production incident by identifying whether a similar incident has occurred in the recent past.

(2) Provides an understanding of which components may fail again (based on previous experience) and helps fix a problem quickly by reviewing previous solution(s).
