Published 6th October, 2020
Failure, Investigation and Learning – the Anatomy of a Loss, a report on an event at the Institution of Mechanical Engineers on 3 December 2019
Our December 2019 event, which, unknown to us at the time, was our last physical meeting before Covid-19 forced the March 2020 shutdown in the United Kingdom, was well attended. It was held at the London offices of the IMechE, sponsored by the IChemE and chaired by the IChemE President, Stephen Richardson. The event aimed to introduce members to some of the parties that may be involved in the all-important forensic analysis after a loss or near-miss event.
In the aftermath of a loss, the focus turns to identifying the root cause and the causal factors associated with the loss. Many parties have an interest in this activity, often with different agendas. These range from the Operator’s desire to learn and take action so that the event, or similar events, are not repeated within the company; through the regulators’ and standards providers’ need to make the learnings available to a wider audience and, as necessary, to change regulation or practice; and the insurers’ need to establish whether the loss is covered and its quantum; to the lawyers’ need to gather evidence to support criminal or civil actions arising from the loss.
With such different needs, an industry has developed to support all of these parties, allowing detailed forensic analysis to be conducted and the truth to be reasonably established.
The event looked at three aspects of this process:
- Material Failure Investigation & Analysis
- Incident Investigation
- Forensic Accounting
Material Failure Investigation & Analysis
Andrew Piercy, a Principal Engineer in the Failure Investigation and Consultancy Team of Intertek Production & Integrity Assurance, spoke on the role of material forensic analysis within failure analysis and incident investigation. Andrew has more than 30 years’ experience of corrosion and metallurgical investigations and testing. His main areas of expertise are corrosion and corrosion-related failure analysis, metallurgy and mechanical failure analysis.
Andrew first put the concept of material failure into perspective: it does not necessarily mean a complete collapse or loss of containment (although this may well be the case), but can also refer to a loss of operability, functionality or reliability such that continued use is impossible or unsafe. He emphasised that failure analysis is not aimed at determining a root cause (this comes later in the investigation), but at answering two main questions:
- What is the mechanism of the failure (How)?
- What was the immediate cause of the failure (Why or What)?
Emphasis was placed on the need for a structured investigation that broadly follows the process depicted in the following diagram, which also provided the structure for the body of the presentation.
The importance of visual inspection, ideally including a site inspection to provide context, was stressed, as was taking as many photographs as possible. Just as at a crime scene, you only get one chance to capture the ‘as is’ evidence.
The presentation then provided an overview of the laboratory techniques that could be used to support the failure analysis investigation.
Once a range of test results is available, they are reviewed alongside the other evidence gathered. This puts the results into context with the environmental and physical conditions, the mode of use and the design specification of the material, and checks whether they tally with the information provided by the owner or user.
It was acknowledged that, at the reporting stage, you may find there is no ‘smoking gun’ and that a “Sherlock Holmes” approach may be needed: “It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth.”
A material failure investigation often concludes that further analysis is needed.
It was also noted that failure investigation work can become part of legal proceedings, and must therefore be able to stand up to the processes and techniques used in such proceedings.
Incident Investigation
Roger Stokes of BakerRisk Europe Ltd presented a high-level review of the key stages of the incident investigation process, and showed how the work in the failure investigation stage is used to drive towards a root cause and, most importantly, towards the lessons that can be learned from the incident.
Roger has close to 40 years of post-graduate experience in the process industries, spanning chemical manufacturing, loss adjusting in the insurance context and, more recently, the Process Safety Group, where his work currently focuses on incident investigations, insurance risk engineering and process safety management. In 2018, he co-authored a number of sections of the latest (3rd) edition of the CCPS book Guidelines for Investigating Process Safety Incidents.
Roger started with some useful cause definitions, as illustrated in the diagram opposite. He then stressed the need for a cooperative and collaborative approach to investigation that is sensitive to the needs of the various parties involved, and emphasised that incident investigation is an essential element of a risk-based process safety management system.
The need for systematic evidence collection and cause determination was again emphasised, and a distinction was drawn between time-sensitive and less time-sensitive evidence, which can help give structure to the early part of an investigation. The scientific method, as described in NFPA 921, was also illustrated, and a range of root cause determination techniques was offered, depending on the depth of analysis warranted.
The presentation concluded with a case study of a pipe rupture, used to demonstrate the hypothesis matrix approach shown opposite. Using the matrix, the investigation was able to eliminate scenarios systematically and rapidly, and to home in on the most likely scenario to take forward for more in-depth root cause analysis and, ultimately, lessons learned.
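The elimination logic of a hypothesis matrix can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the matrix used in the case study: the scenario names, the evidence items and the rule that a single piece of contradicting evidence eliminates a scenario are all hypothetical simplifications.

```python
from enum import Enum


class Fit(Enum):
    """How one evidence item relates to one candidate scenario."""
    CONSISTENT = "+"
    INCONSISTENT = "-"
    NEUTRAL = "0"  # neither supports nor refutes the scenario


# Hypothetical pipe-rupture matrix: rows are scenarios, columns are evidence.
# These entries are illustrative only and are not taken from the case study.
MATRIX: dict[str, dict[str, Fit]] = {
    "overpressure": {
        "ductile overload fracture": Fit.CONSISTENT,
        "pressure record within design limits": Fit.INCONSISTENT,
        "local wall thinning found": Fit.NEUTRAL,
    },
    "internal corrosion": {
        "ductile overload fracture": Fit.CONSISTENT,
        "pressure record within design limits": Fit.NEUTRAL,
        "local wall thinning found": Fit.CONSISTENT,
    },
    "fatigue cracking": {
        "ductile overload fracture": Fit.INCONSISTENT,
        "pressure record within design limits": Fit.NEUTRAL,
        "local wall thinning found": Fit.NEUTRAL,
    },
}


def surviving(matrix: dict[str, dict[str, Fit]]) -> list[str]:
    """Return the scenarios that no evidence item contradicts."""
    return [
        scenario
        for scenario, row in matrix.items()
        if Fit.INCONSISTENT not in row.values()
    ]


def rank(matrix: dict[str, dict[str, Fit]]) -> list[str]:
    """Order surviving scenarios by how much evidence supports them, most first."""
    return sorted(
        surviving(matrix),
        key=lambda s: sum(f is Fit.CONSISTENT for f in matrix[s].values()),
        reverse=True,
    )


if __name__ == "__main__":
    for scenario in rank(MATRIX):
        print(scenario)
```

With the made-up entries above, only "internal corrosion" survives. In a real investigation a surviving scenario would still be tested further rather than accepted outright, and a single entry may deserve more weight than a simple count gives it.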
Forensic Accounting
Justin then gave a useful ‘101’ on business interruption insurance cover, emphasising the difference between accounting gross profit and insurance gross profit (which provides the basis of recovery under many business interruption policies). He stressed the importance of contractual commitments and the impact they may have on business interruption: all behaviour is governed by contract, the insurer has to work within the restrictions of that contract, and some costs thought to be variable may in fact be fixed (for example, under take-or-pay contracts).
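To make the gross profit distinction concrete, the sketch below compares the two figures for a made-up business. It assumes the common UK ‘difference basis’ wording, in which insurance gross profit is turnover plus closing stock, less opening stock and the working expenses that the policy lists as uninsured (typically purchases). Actual policy definitions vary, and every figure and name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accounts:
    """Hypothetical annual figures for a small manufacturer (all in £k)."""
    turnover: float
    opening_stock: float
    closing_stock: float
    purchases: float          # raw materials bought in the year
    direct_labour: float      # assumed retained after a loss, so effectively fixed
    factory_overheads: float  # rent, depreciation etc.: fixed
    take_or_pay: float        # feedstock payable whether or not it is used


def accounting_gross_profit(a: Accounts) -> float:
    """Turnover less cost of sales; cost of sales here includes production costs."""
    cost_of_sales = (a.opening_stock + a.purchases + a.take_or_pay
                     + a.direct_labour + a.factory_overheads - a.closing_stock)
    return a.turnover - cost_of_sales


def insurance_gross_profit(a: Accounts, take_or_pay_is_uninsured: bool) -> float:
    """Difference-basis gross profit (assumed wording):
    (turnover + closing stock) - (opening stock + uninsured working expenses)."""
    uninsured = a.purchases + (a.take_or_pay if take_or_pay_is_uninsured else 0.0)
    return (a.turnover + a.closing_stock) - (a.opening_stock + uninsured)


if __name__ == "__main__":
    acc = Accounts(turnover=1000.0, opening_stock=50.0, closing_stock=60.0,
                   purchases=300.0, direct_labour=200.0,
                   factory_overheads=150.0, take_or_pay=100.0)
    print(accounting_gross_profit(acc))
    print(insurance_gross_profit(acc, take_or_pay_is_uninsured=True))
    print(insurance_gross_profit(acc, take_or_pay_is_uninsured=False))
```

On these invented numbers the accounting gross profit is 260, while the insurance gross profit is 610 if the take-or-pay feedstock is treated as an uninsured (variable) cost, and 710 if it is recognised as effectively fixed. That gap is the kind of contractual detail that can affect recovery.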
Justin went on to draw on the many hundreds of claims he has helped to adjust, setting out the key lessons learned about getting a business back on its feet and, in particular, how these should be captured in disaster recovery plans. He illustrated this with three case studies, which emphasised the importance of recovery plans being in place and subject to detailed analysis and testing. Further, pre-loss reviews should be undertaken involving both internal parties (with an intimate knowledge of operations) and external experts. Loss events should be ‘desk-top’ tested, drawing on actual experience either within the company or in the wider industry. This should highlight gaps both in the plans as presented and in potential insurance recovery before an actual event arises.