by Sean McLinden
The following alleged account of the HBGary incident, while tinged with mildly satirical comments, is nonetheless one of the most thought-provoking, if not accurate, descriptions of the surrounding events. It can be found here:
and it got me thinking.
Many years ago, I had a horrific experience at Chicago O’Hare International Airport when, while looking out the window of the concourse, I witnessed American Airlines DC-10 Flight 191 take off, roll to the left, and plunge to the ground along with its 258 passengers and 13 crew. There is something unnerving about air crashes, even though they kill far fewer people than automobile accidents, and I suspect part of the reason is that they often seem to lack a reasonable explanation when they happen.
But one difference between airplane accidents and digital incident investigations is that the former is a public process, one through which we learn what failed and why. And American Airlines Flight 191, like the account described in the link above, is a perfect example of what can be learned through a public process.
I don’t have the space to describe all of the events that culminated in the destruction of Flight 191, but I will say a few things relevant to the issue that I raise.
First, and most disturbing, was the fact that reconstruction of the events of Flight 191 concluded that the damage to the aircraft was survivable. What caused the aircraft to fail was the physical loss of an engine and, more importantly, the loss of the engine pylon, which should not have failed but did, because an aircraft maintenance worker failed to follow established procedures for engine maintenance. But the crew and passengers didn’t survive, because the pilots were unaware of the exact cause of the engine failure and did the opposite of what they needed to do to save the aircraft.
They didn’t recognize what was wrong because the loss of the pylon also damaged critical hydraulics and the electronic systems that would have detected the hydraulic failure. And because of the sweep of the wing, they could not see for themselves what had happened. Perhaps most importantly, because the damage caused by the pylon loss had never been anticipated, pilots had never been trained to consider it. In the 20 seconds that they had to make the right decision, they decided based upon training and experience.
Complex systems are a result of “multiagent planning,” in which various parts of the design are handled by domain-specific experts. Often, these experts fail to communicate the limitations of their understanding and the design assumptions which should not be taken as fact. The DC-10 engineers created an instruction manual which stated how the DC-10 engines should be removed for maintenance, but they failed to communicate how a variance from that process could compromise the airworthiness of the aircraft. The pilots were never trained to anticipate such a failure because it was assumed that they would never need to face it.
There was a failure in the process by which the aircraft design was approved as well. At that time, aircraft were required to be capable of flying and landing after “any combination of failures not shown to be extremely improbable.” The combination of failures which occurred surrounding the failed engine and pylon was considered mathematically highly improbable, but that assessment was based upon the assumption that normal maintenance procedures were being followed.
Not considered were the many ways in which human failures might compromise the design. But what was important in the case of Flight 191 was that the openness of the investigation led to a better understanding of this, and to new design and training procedures intended to address these failures, which prevented further incidents of this type.
If the above account of events surrounding the HBGary incident is even close to being factual, then what is disturbing, to me, is that this sequence of events is all too common. In my own practice, I could have changed the names of the victims and perpetrators, left everything else the same, and the same script would have applied to other clients as well.
Perhaps it is time for more disclosure regarding incidents involving digital data. Perhaps it is time for a public process in which the response is not only to determine the cause of the problem, but also to ensure that the failures that created it cannot happen again, to anyone.
Sean McLinden, MD, is the President and CEO of Outcome Technology Associates, Inc. (OTA), a provider of digital forensics, incident response, eDiscovery and litigation support services to clients in the US and abroad. Trained as a neurologist, McLinden applies the same methodologies he uses as a diagnostician to problems in digital forensics, including the use of a probabilistic approach in determining the strategy by which to conduct an investigation. McLinden lives with his wife, also a forensic investigator, and son in a sleepy little Ohio River community near Pittsburgh, PA where, when he is not dabbling in forensics, he relaxes with his family on a vintage (1928) sternwheel paddleboat.