A significant shift is occurring in the process the Submarine Force uses for tactical training. This shift was marked by decisions by both Submarine Force Type Commanders to quantitatively measure mission performance against defined standards. Although initiated through the Tactical Readiness Evaluation (TRE) process, it promises to have a profound impact not only on all aspects of submarine training but on the process for technology acquisition and determination of readiness metrics as well. The pervasiveness and significance of these impacts warrant labeling this process shift transformational.
Although developed independently, this new process is not only fully consistent with the Navy’s recently announced Revolution in Training, but is a necessary step toward fulfilling the revolution.
Limitations of Previous Methods
The previous methodology used for evaluation consisted of two phases. The first phase involved monitoring an event, observing the behavior and actions of the crew, and recording the environment in which they were acting. This process typically included evaluators with steno pads writing down orders given, reports made, and actions taken. Additionally, data such as distances to contacts, status of equipment, and time of message receipt were recorded. Logs and records served to complete the picture.
The second phase consisted of comparing the actions taken by the crew to the prescribed procedures. For example, in piloting, if sounding data were not reported to the bridge as specified, this was noted as a deficiency. The deficiencies were then considered and a grade determined.
The primary limitation of this method is that instead of focusing on the ends (keeping the ship in the center of the channel, or putting the fire out), crews trained on the means (executing the procedures that have been designed to accomplish those ends). This diverted effort from the main objective. Additionally, since the focus was on following the steps of the procedure, innovative and creative methods of accomplishing the objectives were not encouraged.
A subtler disadvantage derived from the process of determining the grade based upon the relative number and significance of deficiencies. As opposed to standards-based grading, this practice pitted ships against each other.
Finally, this process was poorly suited to identify overall force weaknesses or contribute to decisions about the value of particular training or the acquisition of new technology.
A Better Way: The New Process (5-step)
The new process strives to quantitatively measure mission accomplishment against defined standards. Mechanically, this is accomplished by the development of attribute sheets that populate a database when completed. These attribute sheets have been published for the Force to use.
The 5-step process for quantitatively measuring mission accomplishment is as follows:
1. List the attributes and identify the critical attributes
2. Define the standard
3. Measure performance
4. Analyze the data
5. Determine the appropriate response
1. List the attributes and identify the critical attributes. The 5-step process begins with listing all the attributes for a particular mission or event, and identifying the critical attributes that best measure the effectiveness of the team in accomplishing its mission. For example, for a fire, the Measure of Effectiveness (MOE) is putting the fire out and the critical attributes would include the time the portable fire extinguisher arrives, the time the pressurized fire hose arrives, and the gap, if any, between application of extinguishing agents.
For approach and attack, the length of time contact is held before an attack is launched and the length of time the ship spends within a certain range of the target could measure risk of counter detection and loss of tactical control.
We have found that the determination of these critical attributes, although sometimes difficult to discern, is a supremely valuable effort. For it is by identifying the critical attributes that we convey to the Force what is important for a particular event.
2. Define the standard. These standards tend to be defined as times, distances, yes/no, or number or percent of defects. For example: 2 minutes for a fire hose, 100 yards for a radar fix, report to the operational commander made/not made, number of Interior Communication violations.
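As a rough illustration of how such standards might be recorded and checked, the sketch below represents a few of the examples above in code. The class, field names, and sheet layout are hypothetical and are not drawn from the published attribute sheets; only the 2-minute and 100-yard values come from the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Standard:
    """One critical attribute and the standard it is measured against.

    Hypothetical structure for illustration; not the actual attribute sheet format.
    """
    attribute: str
    kind: str            # "max_value" (times, distances, defect counts) or "yes_no"
    limit: float = 0.0   # largest acceptable value; unused for "yes_no"
    unit: str = ""

    def is_met(self, measured: Union[float, bool]) -> bool:
        if self.kind == "yes_no":
            return bool(measured)
        return float(measured) <= self.limit


# Limits taken from the examples in the text; the names are illustrative.
FIRE_HOSE = Standard("fire hose arrival", "max_value", 120.0, "seconds")
RADAR_FIX = Standard("radar fix error", "max_value", 100.0, "yards")
OPCON_REPORT = Standard("report to operational commander", "yes_no")


def score_sheet(standards, measurements):
    """Return met/not met for each attribute on a completed sheet."""
    return {s.attribute: s.is_met(measurements[s.attribute]) for s in standards}
```

A completed sheet then reduces to a dictionary of measured values, and each entry is judged against the requirement rather than against other ships.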
Where do the standards come from? In some cases, such as the fire example, the standard is based upon empirical studies and modeling. In this case, as reported in Naval Ship’s Technical Manual (NSTM) 555, if there is longer than a 2-minute delay in attacking the fire with a fire hose, untenable conditions and significant damage become probable. Thus, the standard is defined by what is required, not by what is achievable with current methods.
In other cases, standards have been specified by higher fleet commanders. In the case of Tomahawk strike, for example, the fleet commanders have specified certain time requirements for various responses. These higher fleet requirements have been incorporated into the attribute sheets so a submarine meeting the standards of the attribute sheets is de facto meeting the standards demanded by the overseas fleet commander.
There are many areas where neither well-defined modeling nor specifications from the warfighters exist to help us determine the standards. Radar piloting is one. No defined standard exists. The designed capabilities of the installed radars might help us, but this is also a trap. We should ask, “how good do we need to be at radar navigation?” rather than asking “how good can our currently installed equipment let us be?” In these cases, consensus opinion among experts can be used to determine the standard.
Note: the currently identified standards are evolving and have been determined by a collaborative effort among the Tactical Readiness Teams, Squadron Deputies, and Training Centers on both coasts.
Taken together, steps 1 and 2 fill in the first quadrant, Define Requirements, of the 4-Quadrant Human Performance System Model defined by Task Force EXCEL, now the Naval Personnel Development Command.
3. Measure Performance. As with the previous method, the ship/watch team is observed and performance is measured. Although measuring the accomplishment of critical attributes is key, it is not enough. The previous practice of watching the behavior of the crew and recording that as best as possible is still relevant. Why? This is because when a standard is not met, only by observation of the behavior (process) is it possible to determine why.
4. Analyze the Data. One of the strengths of the new system is its disciplined and repetitive development of quantitative data that can be analyzed. Let’s assume that the figure below reflects a histogram of radar fix accuracy. Radar fixes for a large sample of the population are measured against the ship’s actual position. Fixes are counted in 15-yard bins: for example, how many fixes were accurate to within 15 yards, 15-30 yards, etc.
This distribution has measurable characteristics that would be useful for Submarine Force decision makers. To start with, we could determine the mean (average) error. Additionally, we could determine the proportion of fixes that fell outside a certain standard. Based on well-defined statistical principles and knowing the sample size, we could derive the corresponding parameters for the entire population.
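The calculations described above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, the 100-yard standard is borrowed from the earlier radar fix example, and the population estimate uses a simple normal-approximation interval for a proportion, which is one of several possible methods and is unreliable for small samples.

```python
import math
from collections import Counter
from statistics import mean


def bin_errors(errors_yd, bin_width=15):
    """Count fixes per bin; key 0 means 0 to <15 yards, key 15 means 15 to <30, etc."""
    counts = Counter(int(e // bin_width) * bin_width for e in errors_yd)
    return dict(sorted(counts.items()))


def summarize(errors_yd, standard_yd=100.0, z=1.96):
    """Mean error, fraction of fixes outside the standard, and an approximate 95% interval."""
    n = len(errors_yd)
    p = sum(e > standard_yd for e in errors_yd) / n
    # Normal-approximation (Wald) interval for the population proportion.
    half_width = z * math.sqrt(p * (1 - p) / n)
    return {
        "mean_error": mean(errors_yd),
        "fraction_outside": p,
        "ci_95": (max(0.0, p - half_width), min(1.0, p + half_width)),
    }


# Invented sample of radar fix errors in yards, for illustration only.
sample = [5, 12, 20, 28, 33, 47, 60, 75, 110, 130]
histogram = bin_errors(sample)
summary = summarize(sample)
```

With a sample this small the interval is very wide, which is itself a useful reminder that the sample size governs how much can be said about the whole Force.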
However, let’s say we look at the data more carefully, and we find that the data actually consists of 2 groups, which I’ve labeled group 1 and group 2, and shown in the figure below. Again, we can quantitatively measure the difference in performance between these two groups.
Consider that group 1 consisted of ships with the BPS-15H radar, an improved radar recently installed in many ships, and that group 2 consisted of ships with the older radar. We now can determine quantitatively the benefits from this additional technology. Groups 1 and 2 could also be a comparison of operators who have attended a certain course with those who have not, or of Navigators with more than 1 year of experience with those with less than 1 year.
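One way to quantify such a difference is to compare the group means and scale the difference by its standard error. The sketch below uses Welch's t statistic, which is my choice for illustration and not a method named in the attribute sheets; the function name and the data are invented.

```python
import math
from statistics import mean, variance


def compare_groups(group_1, group_2):
    """Difference in mean error between two groups, with Welch's t statistic."""
    m1, m2 = mean(group_1), mean(group_2)
    # Standard error of the difference, allowing unequal variances and sample sizes.
    se = math.sqrt(variance(group_1) / len(group_1) + variance(group_2) / len(group_2))
    return {
        "mean_1": m1,
        "mean_2": m2,
        "difference": m2 - m1,
        "welch_t": (m2 - m1) / se,
    }


# Invented radar fix errors in yards: improved radar (group 1), older radar (group 2).
improved = [10, 20, 30, 40]
older = [40, 50, 60, 70]
comparison = compare_groups(improved, older)
```

A large statistic suggests the difference is unlikely to be chance alone, but it does not by itself rule out other explanations, such as differences in scenario difficulty or crew experience between the two groups.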
When one considers that through this process, data such as this will be collected on everything from firehose arrival times to range errors at time-of-fire, one can see the potential power of this process to warfighters, trainers, and acquisition decision makers.
The ability of this process to generate data that shows how forces are meeting defined standards is a key requirement to fully embracing the Navy’s Revolution in Training. Quadrant 4 of the 4-Quadrant Human Performance Model, Execute and Measure, requires a disciplined and rigorous process for measuring performance against actual standards. The previous methods of counting deviations from procedures will not be effective in supporting this model.
5. Determine the appropriate response. For the ship, the appropriate response will generally be to conduct training on identified shortfalls. This may involve an intimate look at the process used, watch bills, etc. However, more training is not always the answer. It may be that with currently installed equipment, the ship is doing the best it reasonably can. In this case, the appropriate response may be to investigate new equipment and technologies for acquisition.
Analogous to Statistical Process Control Application to Industry
In many ways, this transition directly parallels the revolutionary changes in manufacturing processes brought about by the application of statistical process control procedures over the past 3 decades. The application of rigorous statistical methods has been responsible for a revolution in quality. Monitoring for procedural compliance without measuring the objective is akin to evaluating a machine operator’s behavior but not measuring the dimensions of the finished part. In the same way, mission performance will be revolutionized.
Having said that, these complex human processes, with their built-in causes of variation (a course change, the onset of reduced visibility), will rarely be in statistical control. This will complicate the statistical analysis.
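For readers unfamiliar with the manufacturing analogy, the sketch below shows one common statistical process control tool, a Shewhart individuals chart, applied to invented drill times. It is offered only to make the analogy concrete; the function names and data are hypothetical, and nothing here is part of the TRE process.

```python
from statistics import mean


def individuals_chart_limits(baseline):
    """Lower limit, centre line, and upper limit for a Shewhart individuals chart."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    centre = mean(baseline)
    # 1.128 is the standard d2 constant for moving ranges of two consecutive points.
    sigma = mean(moving_ranges) / 1.128
    return centre - 3 * sigma, centre, centre + 3 * sigma


def out_of_control(values, lower, upper):
    """Indexes of observations falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Invented fire hose arrival times in seconds, for illustration only.
baseline_times = [100, 110, 90, 105, 95, 100]
lower, centre, upper = individuals_chart_limits(baseline_times)
flagged = out_of_control([102, 140, 98, 60], lower, upper)
```

Because of the built-in causes of variation noted above, a flagged point in this setting may reflect a change in the scenario and not a change in the team, so each one would need to be examined against the observed behavior before any conclusion is drawn.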
Benefits of the New Process
This process will provide rigorous, quantifiable information about the force’s performance in assigned mission areas relative to external standards or requirements. Additionally, it will provide quantifiable comparative information about the benefits of different technologies, procedures, and training courses. When integrated with current initiatives in monitoring officer experience, this process will be able to determine the correlation between experience level and mission performance. Finally, since performance is measured against a standard, and there is no limit to the number of ships that can be evaluated as above standards, the competition among ships is replaced by collaboration.
Role of the Training Centers
At this point, I’d like to discuss the unique role the training centers can play. In our example, I’ve conveniently avoided a discussion of how the truth was determined. How do we decide where the ship or target really was? For the TRE teams, this consists of reconstructing the track and using precise GPS or instrumented range data that may or may not have been available to the piloting or fire control tracking team.
But, in reality, this is only where the TRE team thought the ship or target was, and is subject to errors. Here is where the training centers play a special role because in their trainers, they actually do know the truth: the range to a contact; the actual position of the ship. Hence, data collected by training centers plays an important role in formulating the picture of force performance.
Additionally, by measuring the same attributes, the training centers reinforce a common picture of what the critical attributes of a mission are.
Current Obstacles
Having used this process for a year now, we are in a position to identify some of the problems encountered. I would advocate that we should look at these as issues to resolve rather than reasons not to continue down this path.
The first problem is how to deal with material problems. In this reality-based regime, since performance is what counts, material differences or casualties will impact mission accomplishment. Take, for example, a ship that has their high-frequency (HF) active sonar in a significantly degraded condition. This ship will be unable to detect potential mines, and has lost the capability to perform the mission area of minefield detection and avoidance. The ship may be able to demonstrate an intimate understanding of the procedures for this mission area, plan a mission, and even execute a simulated mission, but the bottom line is that they cannot perform the mission.
Assigning a score of zero here seems unjust. Accepting that each ship is primarily responsible for their material condition, there are some things that are beyond their control. While assigning a score of zero does not capture the training capability of the ship, it does reflect their ability to perform this mission. Alternatively, assigning a higher score would send a false picture of the submarine’s capability to other stakeholders.
The next problem deals with accounting for differences in scenario difficulty. A ship conducting an approach and attack against an unaugmented 688 simulating a modern adversary would be expected to have a shorter detection range, engagement range, and greater chance of losing tactical control than a ship conducting an approach and attack against a less capable adversary. If range at CPA is taken as a measure of tactical control, the first ship will do worse unless there is some accounting for the degree of difficulty. How this is accomplished, in the database and in the grading, needs to be resolved.
Where Do We Go From Here?
The Submarine Force has taken a significant step through adoption of this standards-based, quantitative system of measuring mission accomplishment. This process has the potential to transform current tactical training for our Force, as well as having far-reaching impacts on training centers and acquisition processes. The next steps involve developing a common, accessible database; widespread use of the attribute sheets with healthy feedback to the sheet owners; and flexibility in development of the sheets and adjustment of point values.