So far at this symposium we’ve heard about some exceptional technical developments that are underway. Now, I’m going to talk about a less-discussed, but critical, component of technology development and operations – the human component. Specifically, I’ll share with you the results of a study we undertook at SSP, and how the findings apply to you and your organizations.
A Challenge of Complex Technology
First, some context. As the Director of SSP, it is my responsibility to maintain the safety, security, and effectiveness of the Trident II Strategic Weapon System (SWS). As you know, the D5 SWS is a highly complex weapon system. We characterize it as a public risk technology because though the likelihood of a disaster with the SWS is small, the consequences are enormous, and they could well affect the public domain. Clearly we must do all we can to prevent such an event.
SSP has been successful at providing the benefits of this technology in a safe and secure manner. However, history is replete with examples of organizations that thought themselves successful yet still suffered catastrophic failures. Think of NASA before Challenger, or the Deepwater Horizon, which had been recognized with an industry safety award just before its spectacular failure.
I find we do well at SSP with the familiar. New endeavors are more challenging, and we sometimes find ourselves wondering, “How could they do that?”
The nuclear disaster at Fukushima was a wake-up call for us, delivered in the form of the Executive Summary of a special commission established to investigate the causes of the disaster.
The message applies broadly; our submarine community, all of us, can never permit ourselves to fall into the traps that drove the outcomes at Fukushima.
In Japan, before Fukushima, the national narrative was that nuclear power was 100% safe. Earthquakes and tsunamis are, of course, beyond human control. The reactor disaster, on the other hand, was an entirely different story.
The event was part of the series of stimuli that have driven change in Japan. At the same time, the public sense of betrayal by the government and the nuclear power industry led to a complete shutdown at odds with the energy demands of a dynamic recovery. We all know that Japan’s desperation for energy resources was a principal contributor to its decision to attack Pearl Harbor in 1941; little had changed to solve that problem other than the advent of nuclear power.
Following the disaster, the Japanese Diet, their legislative body, launched an unprecedented investigation to find the root causes of the failure. The report is replete with technical explanation, but the Chairman, in his one-page summary, took another step in root cause analysis. He pointed the finger inward, at the culture of the responsible organizations and of Japan itself.
“The fundamental causes are to be found in the in-grained conventions of Japanese culture: our reflexive obedience; our reluctance to question authority; our devotion to ‘sticking with the program’; our groupism; and our insularity.”
Kiyoshi Kurokawa, Chairman
When I read those words, I was taken with the succinct relation of culture to catastrophe, and, more troubling, I recognized some of the Chairman’s issues as potential problems in my organization. It was this incident that led to our study at SSP. We wanted to know what human weaknesses have spooled together in ways that have led to high consequence events, so that we might help our people develop the strengths that should act to prevent such outcomes.
Strategic Weapons System Elements
As I said before, the SWS is a complex, public risk technology that also delivers a uniquely important national security capability. We must maintain the trust of the public so we can continue to benefit from this technology.
But our system is more than the hardware and software represented by the eight sub-systems of the SWS. The people who engage the system every day are just as much a part of the system operation, and I believe that Fukushima tells us that the strong technical elements of a system can be rendered useless if the human elements of that system are not equally strong.
The reactor disaster at Fukushima was not delivered by the operators who engaged the system after the disruption of the tsunami. They were set up by the woeful shortfalls in design and sustainment of the plants through their lifecycle. The technology developers failed them, leading directly to the operational outcomes witnessed by the world.
So it is for all of us. The outcomes that we attain in our systems are directly a function of the way we engage our systems across their lifecycle. Safety and security of the SWS, my top priority, are clearly one of the outcomes driven by how we engage.
Most, if not all of you, work on systems with similar complexities and histories of success. And while the technical capabilities of your efforts are key, the human element is equally important to your successes.
So what do we do? A safety engineer named Heinrich wrote about causality of industrial accidents in the early 20th century. His views and findings bear on our challenge.
Heinrich described a triangle relating data he analyzed from industry. At its pinnacle were major accidents (a fatality); beneath those, minor accidents (an injury); and then near-miss events in which no injury occurred. We can add another type above major accidents: Apex Events like Fukushima, incidents so massive that they eclipse what any organization would call a major accident.
Heinrich found, from the data that he had available, that the types of events occurred in a rough proportion (1:29:300) that seemed to hold up over time. He acknowledged that the ratio was likely to be industry dependent, but expected that similar proportionality would be found. He also found that most of the events were caused by human error, and that the sorts of errors which occurred were common across the categories of events.
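Heinrich's rough proportionality can be sketched in a few lines. The 600-near-miss figure below is purely hypothetical, and, as Heinrich himself cautioned, the 1:29:300 ratio is likely industry-dependent:

```python
# Heinrich's triangle: rough proportion of major accidents to minor
# accidents to near misses (1 : 29 : 300). Industry-dependent.
HEINRICH_RATIO = {"major": 1, "minor": 29, "near_miss": 300}

def implied_event_counts(near_misses_observed: int) -> dict:
    """Scale the whole triangle from an observed near-miss count."""
    scale = near_misses_observed / HEINRICH_RATIO["near_miss"]
    return {kind: scale * weight for kind, weight in HEINRICH_RATIO.items()}

# Hypothetical example: an organization logging 600 near misses would,
# at Heinrich's proportion, expect roughly 2 major and 58 minor accidents.
counts = implied_event_counts(600)
```

The point of the arithmetic is Heinrich's argument in reverse: near misses are plentiful and cheap to learn from, and driving their causes down shrinks the whole triangle.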
He suggested that organizations could reduce the frequency (or likelihood) of events by paying close attention to the human errors leading to near misses, taking action to learn from those errors and strengthen the work force.
Success in strengthening the human element would have the natural result of suppressing all types of events, acting to move an organization away from an Apex Event.
It is this idea that we wanted to understand at SSP: what does it mean to have a strong human element, whose culture and habits of operation act to move us away from the Apex?
Habits and Technology
This is where the rubber meets the road in organizational culture. It starts with the TRIAD of responsibility, authority and accountability, which when clearly communicated to our people, empowers them to do the jobs that we want from them.
Foundational values like Respect and Commitment are key to proper organizational function. As stewards of complex or innovative systems, we must have strong technical skills married with strong human habits (human element strengths). We have to deploy our technical and human element strengths habitually to succeed in our complex system operations. These two concepts are the front end of an organization’s culture, the parts that we actively use day-to-day. I describe all this because every organization has a culture. The difference is whether that culture is deliberately designed or allowed to form without any control.
The latter is where problems arise. An organization without an actively developed culture can atrophy, in both its technical and human elements, which can lead to a high consequence or Apex Event.
And that’s how this relates to everyone in this room. The habits with which you engage your systems and technology development can echo for years.
High Consequence Events
Typically when we talk about high consequence events, there is significant discourse regarding different actions that operators could have taken to prevent a disaster. But many events have their roots in actions or decisions taken years or decades before the event.
Here are the 12 events we examined to understand what human weaknesses lead to high consequence events. These are incidents like Fukushima, the explosion of the Deepwater Horizon in the Gulf of Mexico, and the crash of a B-2 bomber in Guam. (Note: these include two NASA events and two Air Force nuclear enterprise events.)
While there were weaknesses displayed by operators in many of these events, the design and operational decisions by developers, technical authorities, and operational managers were central in most of them.
As designers and developers, you are part of that headquarters and management influence. It is critical you understand how your actions can potentially affect not just the operators of your systems, but also the public at large.
Characterizing Human Element Weaknesses
When we studied the 12 events on the preceding graphic, we identified 22 human weaknesses, displayed by operators, engineers, and management, that led to the various outcomes.
No single event suffered from all of the weaknesses, but each event was founded in multiple weaknesses. We categorized the weaknesses as follows:
– Looking Up Weaknesses are habits that result in failures of subordinates to engage their leadership in ways that would help the organization succeed.
– Looking Down Weaknesses relate to habits of leaders, as they engage subordinates, that set a tone of operation counter to organizational success.
– Looking Across Weaknesses are habits of team engagement that are not supportive of system or organizational effectiveness.
– And finally, Looking Within Weaknesses are failures of personal ethics or integrity that can feed system failure.
By learning what weaknesses fed high consequence events, we can understand the corresponding strengths for engaging our system responsibilities that give us the best opportunity for system success, be it safely drilling for oil or reliably providing the sea-based strategic deterrent.
We are using these ideas today at SSP to help us build on our record of success. History tells us that we cannot count on past success to be a promise for future results.
Here is the full listing of what we characterized as human element strengths. We’ve labeled these strengths in such a way that their definitions are usually self-evident.
You’ll see here the Looking Up and Looking Down strengths, those that relate to how supervisors and subordinates interact with each other and the broader system.
And the remaining two categories, Looking Across that relate to team interactions, and Looking Within strengths that concern personal integrity.
We use the weaknesses to help us identify the true root causes of problems, and thereby administer the best corrective action based on the corresponding strength.
You’ll also see a new arrow, corresponding to how to turn these concepts into action. Implementation depends upon personnel empowerment and leadership to make these ideas something people use every day.
Relevant Weaknesses for Developers
Now I’m going to touch on three weaknesses from the 22 we found that are especially applicable to each of you. What’s more, I’m going to describe them in the context of how they contributed to one of the 12 failures we studied.
The first weakness is that of a Culture of Production. Many of you may consider production the ultimate goal of your organization, and rightly so. As a developer, you want to produce something on the other end.
– However, this becomes a problem when production is more important than risk evaluation, as happened during the Deepwater Horizon incident. There, management placed a priority on capping the well quickly and efficiently so the rig could be moved to the next job. Their emphasis was not on safety and effectiveness, which resulted in poor risk decisions regarding completion quality.
– The aggregate result of a series of such decisions was failure of the well capping job, resulting in an explosion that destroyed the rig, killed 11 crew aboard, and caused the worst maritime oil spill in history.
– This weakness of a Culture of Production must be countered by the strength of a Culture of Risk Evaluation. This can be characterized as a keystone habit. A Culture of Risk Evaluation allows an organization to identify where their weaknesses are and take action to fix them.
The second weakness is Sticking to Past Program Decisions. This occurs when an organization allows previous assumptions or decisions to control how it does business.
– For instance, when the Fukushima power plants were built in 1967, theories of plate tectonics were virtually unknown. The safety case was predicated on the then-current understanding of seismic vulnerability, later shown to be demonstrably wrong.
– As years progressed and the risks of earthquakes and tsunamis became better understood, an engineering assessment revealed that the design basis of the plants’ safety systems was inadequate for the tsunami potential. We now know that the risk of such an enormous tsunami, certain to disable the installed emergency power supplies, was about 1 in 20 over the anticipated life of the Fukushima plants.
– The utility and regulator knew these risks, and yet neither took strong action to change the plant’s design or alert the public to the increased risk.
– To counter this weakness, organizations must consider carefully when there is a need to review past assumptions and decisions. Such reviews should happen when you realize you’re importing previous assumptions into a new system design, when new relevant information is learned, or when using existing systems in a new way. And certainly decision review must happen when a near-miss or accident occurs that shows your assumptions may not be valid any longer.
– As designers, the requirements and assumptions from which you begin must be examined with intense scrutiny. In the case of Fukushima, the decisions of the late 1960s were not challenged for over 40 years, and the consequences were devastating.
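The arithmetic behind that 1-in-20 lifetime figure is worth pausing on. As a hedged sketch, assuming a 40-year service life (my illustrative assumption, not a figure from the investigation) and independent years, the stated lifetime risk implies a small but far-from-negligible annual probability:

```python
# Converting a stated lifetime risk into an implied annual probability.
# Assumptions (illustrative only): a 40-year plant life and independent
# years. The 1-in-20 lifetime figure is from the talk.
lifetime_risk = 1 / 20       # ~5% chance over the plant's life
plant_life_years = 40        # assumed service life, not from the source

# If each year carries independent probability p, then
#   1 - (1 - p) ** plant_life_years == lifetime_risk, so:
annual_p = 1 - (1 - lifetime_risk) ** (1 / plant_life_years)
# annual_p is roughly 0.0013, i.e. about a 1-in-780 chance per year
```

A 1-in-780 annual chance is easy to dismiss in any single budget cycle, which is exactly how a known, quantified risk can go unaddressed for four decades.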
The last weakness I’ll talk about today is what we call Not My Problem. This is where an individual or a team defines their responsibility as a narrow portion of the overall system, ignoring problems with other areas at the detriment of system success.
– This weakness was evident in the crash of a B-2 Spirit in Guam. Normally hangared in environmentally controlled facilities, the B-2 began conducting deployments to tropical Guam in 2006.
– Early in the deployment, an aircraft failed a pre-flight check. The maintenance crew consulted with a support contractor to develop a technically sound work-around to this failure. But the crew did not inform supervisors and did not push this knowledge to the rest of the B-2 maintenance community.
– In 2008, a new maintenance crew ran into the same problem. They did not know about the work-around and followed the pre-flight check procedure as written.
– The result was that, when the B-2 took off, its pilots and on-board computer saw the wrong air speed. The aircraft stalled just after take-off, and while the air crew successfully ejected, the B-2 crashed. Fortunately, the crash did not kill anyone; however, the $1.4 billion aircraft was destroyed.
– What the original maintenance crews needed to display was a habit of Broad System Ownership. Had they considered the ramifications of the technical problem across the entire fleet of B-2s, the manufacturers could have made the work-around into a permanent procedure.
As developers, it is essential you consider yourselves not only responsible for the areas you control, but also for the entire success of the system.
Risk-Ignorant/Cavalier Organizations
The purpose of understanding these habits is so we can strengthen our human element and ensure we continue reaping the benefit of technologies like the strategic weapon system, and systems like it that many of you are developing.
When we talk about managing risk, most people intuitively understand the hazards of operating without due regard or in complete ignorance of the risks in their operations and organization.
Nevertheless, organizations seem to succumb to production, budget, or schedule pressures. Despite that intuitive understanding, they end up ignorant of, or cavalier about, the risks.
This is clearly a losing proposition that can lead to a high consequence event.
Dangers of Risk-Averse Organizations
However, fear of taking risk is not a winning proposition either. Demanding impossibly high standards of operation, operating without regard to budgets or schedules, or armoring an organization against even the possibility of failure are all indicators of risk aversion.
Ultimately, this losing proposition prevents us from obtaining the benefits of the technology, and worse, it wastes taxpayer money.
The Risk-Aware Organization
What we need is a risk aware organization. In this, each individual is empowered to engage his or her system with conscious competence, applying the collection of habits that we are emphasizing at SSP.
No one can predict how the strengths will gang together to allow us to avoid a high consequence event, but surely we are better able to balance between the pressures of production and the pressures to avoid any adverse outcome by arming ourselves with these strengths.
As you continue your efforts to improve our undersea capabilities, keep in mind that as our systems grow in complexity, the potential for error grows with it. Furthermore, these errors are not just an operator problem. Their foundations can be laid at any point in the lifecycle of the system.
To ensure our technology operates as intended in a safe and effective manner, you and your teams must have both technical and human element strength.
And finally, the basis of that human strength must be a culture of risk evaluation.
It is incumbent upon each of us, as stewards of incredibly useful yet potentially dangerous technology, to never satisfy ourselves that our past success will carry us forward.
“Success is a lousy teacher. It seduces smart people into thinking they can’t lose. And it’s an unreliable guide to the future.”
Bill Gates
We must be vigilant for new problems, and thereby ensure that our people will be ready to solve them.