Reliability Edge, Volume 3, Issue 3

Reliability Growth Test Planning and Management

An effective reliability growth test planning and management strategy can contribute greatly to successful product design and development through its impact on the ability of the design/development team to meet desired reliability goals on time and within the project budget. An effective reliability growth management program both produces and utilizes important information about the reliability of the product design, such as the demonstrated MTBF through testing, the growth in MTBF that has been achieved through implementation of corrective actions, the maximum potential MTBF that can likely be achieved for the product design and estimates regarding latent failure modes that have not yet been uncovered through testing. 

This article presents a brief conceptual overview of a reliability growth test planning/management strategy and data analysis methodology that provide information that can be instrumental to various management decisions for product design/development. Dr. Larry H. Crow, a leading practitioner in the field of Reliability Growth Analysis for over 30 years, developed the approach described in this article and has cooperated with design/development teams in both the military and the private sector to implement, validate and refine the relevant techniques. This article has been written with cooperation from Dr. Crow based on his lectures on the subject and published standards for reliability growth analysis.

Also, as noted in previous articles by Dr. Crow, a comprehensive reliability growth program actually begins in early design, where potential problem failure modes are identified and mitigated before formal testing. This mitigation of potential failure modes in design is highly productive when managed with Failure Mode and Effects Analysis (FMEA), System Reliability Block Diagram (RBD) Analysis and/or Fault Tree Analysis (FTA). The objective of these analyses is to increase the reliability before testing begins. 

Background and Assumptions
The reliability growth test planning and management strategy described in this article assumes that as the product design matures, the design/development team identifies potential failure modes for the product through controlled testing in a series of phases. The design/development team then decides to implement corrective measures or "fixes" for some or all of the failure modes that have been identified in order to reduce the likelihood that the revised product design will fail due to the particular failure modes that have been identified. The corrective actions that are actually implemented, the effectiveness of these corrective actions and when the corrective actions are implemented determine the reliability growth management strategy. There are three basic approaches for implementing corrective actions into the design and the approach used will affect the analysis and decision-making process. These three approaches are: 

  • Test-Find-Test: Failure modes are identified but the fixes are not implemented until after the completion of the testing phase. In this case, the reliability growth due to the implementation of fixes takes place after the completion of a given testing phase and the improved product design is in place for the beginning of the next testing phase. 
  • Test-Fix-Test: Fixes are implemented during the test after the failure modes have been identified and the corrective actions have been determined. The testing may be stopped until the corrective action is implemented, but it is not necessary. The testing continues with the revised product design. In this case, the reliability growth is due to the implementation of the fixes during the given testing phase. 
  • Test-Fix-Test with Delayed Fixes: Some fixes are implemented during the test while other corrective actions are delayed until the completion of the test phase. In this case, the reliability growth due to the implementation of fixes takes place both during and after the completion of the given testing phase. 

Based on the results of each reliability growth testing phase and the subsequent analysis, the project manager may wish to make changes to the design/development approach. Specifically, he/she may choose to revise the program schedule, change the number of products tested and/or the duration of the test and/or increase, decrease or reallocate the program budget and resources. In addition, the design/development team may reevaluate the criteria used to determine which failure modes will receive corrective actions and institute any necessary changes. That is, it may be appropriate to change the management strategy. 

Analysis Procedure 
The analysis and management approach described here is an iterative process that begins with the data capture from the first testing phase and continues through subsequent testing phases until reliability goals have been achieved and the product is released. Before the first testing phase begins, the design/development team will have completed a number of important steps to prepare the groundwork for subsequent analysis and decision-making. These important activities include the analysis of previous programs to identify any relevant reliability growth patterns that are likely to appear for the new design, the development of a reliability growth testing plan (including decisions as to the duration of the test, sample size, policies for implementing fixes, etc.) and the creation of a planned reliability growth curve to provide the team with a general outline of what they can expect over the course of each testing phase. Once this has been completed, the following analysis procedures can be implemented. 

Reliability Growth Testing: Test a sample of units according to the test plan that has been established and record failure information for the units under test. In practice, the units may start the test at different times, but it is generally assumed that the test units have the same design configurations at any point in the testing. The methods also apply to discrete (one-shot) success/failure events. 

Categorize Observed Failures: Categorize each observed failure according to whether corrective action will be performed to address the problem that caused the failure. In a "Test-Find-Test" scenario, one of two categories can be assigned to each failure mode: Category A or Category B. 

  • Category A: Corrective actions will not be performed. A failure mode may be assigned to Category A for a variety of reasons including, but not limited to: 
      • The failure mode occurs in existing technology for which re-design is not possible or cost-effective.
      • The likelihood of occurrence for the failure mode is not large enough to justify the cost of corrective action.
      • The severity of the potential effect of failure is not serious enough to justify the cost of corrective action.
      • The budget for corrective actions does not permit corrective actions to be performed.
      • Other reasons to be determined based on the particular situation and the organization's design/development and reliability growth management strategy. 
  • Category B: Corrective actions to eliminate or mitigate the cause of failure will be performed after the current test phase has been completed. Corrective actions for Category B failure modes are often called "delayed corrective actions" or "delayed fixes." 

Characterize Category B Failure Modes: Identify and characterize the failure mode for each Category B failure. The failure mode description typically provides information about the specific physical cause of the problem. For example, "leaking actuator, worn seal" and "leaking actuator, flange radius crack from fatigue" are two unique failure modes. In this case, the phrase "leaking actuator" is not sufficiently descriptive of the failure mode because there is more than one physical cause that can result in the failure of the item via a leaking actuator. 

For bookkeeping purposes, it can be helpful to assign an alphanumeric code to each Category B failure mode according to the sequence in which the unique modes are identified. For example, the first Category B failure mode can be identified as B1, the second as B2, and so on. If another failure occurs due to a failure mode that has already been identified, it is given the same code as the first instance of that failure mode. 
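The bookkeeping rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the tuple layout and the "connector corrosion" entry are assumptions made for the example, while the two "leaking actuator" descriptions are the ones used in the text.

```python
def assign_mode_codes(failures):
    """Label failures in the order observed.

    failures: list of (category, mode_description) tuples (hypothetical layout).
    Category B failures get B1, B2, ... by first appearance of each unique mode;
    a repeat of a known mode reuses its code. Other categories keep their letter.
    """
    codes = {}      # mode description -> assigned code
    labelled = []
    for category, description in failures:
        if category != "B":
            labelled.append(category)
            continue
        if description not in codes:
            codes[description] = f"B{len(codes) + 1}"
        labelled.append(codes[description])
    return labelled


observed = [
    ("B", "leaking actuator, worn seal"),
    ("A", "connector corrosion"),  # hypothetical Category A failure
    ("B", "leaking actuator, flange radius crack from fatigue"),
    ("B", "leaking actuator, worn seal"),  # repeat of the first mode
]
print(assign_mode_codes(observed))  # ['B1', 'A', 'B2', 'B1']
```

Keying the codes on the full mode description, rather than on the symptom alone, mirrors the point made above that "leaking actuator" by itself does not identify a unique failure mode.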

Quantify Effectiveness of Corrective Actions: For each unique Category B failure mode, examine the likely effectiveness of the corrective action. The effectiveness factor is a number between 0 and 1 that represents the fractional decrease in the failure mode's failure rate due to the corrective action. For example, if the corrective action is expected to reduce the failure rate due to a given mode by 75%, then the effectiveness factor for the corrective action is 0.75. If this mode would be expected to cause 8 failures over a given period without the fix, then after the corrective action has been performed, we would expect to observe 2 failures due to this mode over the same period. Numerically, this would be 
8 * (1 - 0.75) = 2. 

Effectiveness factors are assigned based on engineering judgment, so the quality of the predictions that depend on them is limited by the quality of this assessment. Based on past experience with reliability growth testing, the average effectiveness factor for all modes is likely to be in the range of 0.65 to 0.75. An individual effectiveness factor may be smaller or larger than this average, but historical data suggest that the average over a large number of effectiveness factors in a test is likely to fall in this range. 
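As a minimal sketch of the arithmetic above, the following Python snippet applies an effectiveness factor to an expected failure count and averages a set of factors so that the result can be compared with the 0.65 to 0.75 range mentioned in the text. The function names and the list of sample factors are assumptions made for illustration.

```python
def expected_failures_after_fix(failures_before, effectiveness_factor):
    """Expected failures from one mode once the fix is in place."""
    if not 0.0 <= effectiveness_factor <= 1.0:
        raise ValueError("effectiveness factor must be between 0 and 1")
    return failures_before * (1.0 - effectiveness_factor)


def mean_effectiveness(factors):
    """Average effectiveness factor over all Category B modes."""
    return sum(factors) / len(factors)


# The example from the text: 8 failures, effectiveness factor 0.75.
print(expected_failures_after_fix(8, 0.75))  # 2.0

# Hypothetical factors assigned by engineering judgment.
assigned = [0.6, 0.7, 0.8]
average = mean_effectiveness(assigned)
print(0.65 <= average <= 0.75)  # True
```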

Apply Statistical Model: The Crow (AMSAA) projection model uses a nonhomogeneous Poisson process (NHPP) statistical model to analyze reliability growth data and incorporate the failure classifications and effectiveness factors. This model can be used to obtain a variety of plots and results, including the reliability that has been demonstrated during the test and the expected reliability of the design after the delayed fixes for Category B failure modes have been implemented. These results are presented graphically in Figure 1, which shows the demonstrated MTBF of the current design as a straight line at 9.55 and the projection for the new design (which incorporates the delayed fixes) as a point at an MTBF of 15.13. The projection of 15.13 estimates the impact of the proposed delayed corrective actions and their effectiveness factors on the system reliability. 

Figure 1: Demonstrated and projected MTBF
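The article does not state the equations behind these results. The Python sketch below uses the form of the Crow projection commonly given in the reliability growth literature, offered here as an assumption rather than as the exact method behind Figure 1: the demonstrated MTBF is T/N, and the projected failure intensity is the Category A intensity, plus the residual intensity of the fixed Category B modes, plus the average effectiveness factor multiplied by the rate at which new Category B modes are still appearing at time T. The sketch uses the maximum likelihood estimate of the shape parameter (some treatments apply a bias correction instead). The data set is invented and does not reproduce the 9.55 and 15.13 values.

```python
import math


def crow_projection(total_time, n_cat_a, b_modes):
    """Return (demonstrated MTBF, projected MTBF) for a Test-Find-Test phase.

    b_modes: list of (first_occurrence_time, failure_count, effectiveness_factor),
             one entry per unique Category B mode (hypothetical layout).
    """
    m = len(b_modes)
    n_b = sum(count for _, count, _ in b_modes)

    # No fixes are made during the test, so the demonstrated MTBF is T / N.
    demonstrated_mtbf = total_time / (n_cat_a + n_b)

    # Power law fit to the first-occurrence times of the B modes (MLE form).
    beta = m / sum(math.log(total_time / first) for first, _, _ in b_modes)
    discovery_rate = m * beta / total_time  # rate of new B modes at time T

    mean_ef = sum(d for _, _, d in b_modes) / m
    residual_b = sum((1.0 - d) * count for _, count, d in b_modes) / total_time

    projected_intensity = n_cat_a / total_time + residual_b + mean_ef * discovery_rate
    return demonstrated_mtbf, 1.0 / projected_intensity


# Invented example: 400 test hours, 10 Category A failures, 4 Category B modes.
modes = [(25.0, 3, 0.7), (80.0, 2, 0.8), (150.0, 1, 0.6), (300.0, 2, 0.7)]
demonstrated, projected = crow_projection(400.0, 10, modes)
print(f"demonstrated MTBF: {demonstrated:.2f}, projected MTBF: {projected:.2f}")
```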

Evaluate and Adjust Management Strategy: In addition to the demonstrated and projected MTBF results, the Crow (AMSAA) projection model supports the generation of other results and plots that can be invaluable for evaluating the current design/development management strategy and making any necessary adjustments. The growth potential metric and the analysis of unseen failure modes are important metrics for this purpose. 

The growth potential is an estimate of the maximum system MTBF that can be attained with the product design and reliability growth management strategy. This can be displayed with a straight line on the MTBF vs. Test Time plot, as shown in Figure 2, where the growth potential is identified at an MTBF of 22.45. This metric can help to confirm the manager's expectation that the ultimate reliability goal for the design is feasible, but it can also provide a clear warning if the reliability goal cannot be achieved for the current design under the given conditions. Management can then respond to this warning by making changes to the management strategy, such as converting some Category A failure modes to Category B failure modes, changing the criteria for the classification of newly uncovered modes, or adding redundancy.

Figure 2: MTBF vs. Time with growth potential line
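A sketch of the growth potential calculation follows, again under an assumption the article does not spell out: the growth potential failure intensity is taken to be the Category A intensity plus the residual intensity of the Category B modes seen so far after their fixes, which is the limit of the projection as the discovery of new modes falls to zero. The function name, data and MTBF goal are invented for illustration.

```python
def growth_potential_mtbf(total_time, n_cat_a, b_modes):
    """Growth potential MTBF.

    b_modes: list of (failure_count, effectiveness_factor) per unique B mode.
    """
    residual_failures = sum((1.0 - d) * count for count, d in b_modes)
    return total_time / (n_cat_a + residual_failures)


goal_mtbf = 40.0  # hypothetical reliability goal

# Current strategy: 10 Category A failures and four Category B modes.
current = growth_potential_mtbf(400.0, 10, [(3, 0.7), (2, 0.8), (1, 0.6), (2, 0.7)])

# Revised strategy: a mode responsible for 4 of the Category A failures is
# reclassified as Category B with an assumed effectiveness factor of 0.7.
revised = growth_potential_mtbf(
    400.0, 6, [(3, 0.7), (2, 0.8), (1, 0.6), (2, 0.7), (4, 0.7)]
)

print(f"current: {current:.1f}, goal feasible: {current >= goal_mtbf}")
print(f"revised: {revised:.1f}, goal feasible: {revised >= goal_mtbf}")
```

In this invented case the goal is out of reach under the current classification and becomes feasible only after the reclassification, which is the kind of warning and response described above.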

Analysis of the unseen failure modes provides another important set of metrics for evaluating the product design and the reliability growth management strategy. Based on the failure modes that have been uncovered during the test, the Crow (AMSAA) projection model can be used to provide estimates about the failure modes that have not yet occurred. Such metrics include the current rate of uncovering new Category B failure modes, the estimated number of unseen Category B failure modes and the estimated failure rate for unseen Category B failure modes. This analysis can provide an indicator of how many problems are yet to be discovered in the design and how much test time will be required to identify and correct those latent causes of failure. The pie chart in Figure 3 represents one method to display this information graphically. The pie chart illustrates the quantity and ratio of seen and unseen failure modes after the completion of a particular phase of testing. 

Figure 3: Seen and unseen failure modes
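One way to relate unseen failure modes to additional test time is sketched below. It assumes that the first occurrences of distinct Category B modes follow a power law NHPP with shape parameter beta, which is the usual basis of the projection model but is not stated in the article. Under that assumption, with M modes seen by time T, the expected number of new modes in a further interval is M * (((T + extra) / T) ** beta - 1). The function name and inputs are illustrative.

```python
def expected_new_modes(modes_seen, beta, time_now, extra_time):
    """Expected number of not-yet-seen B modes uncovered in extra_time.

    Assumes a power law NHPP for first occurrences of distinct B modes,
    with its scale parameter estimated as modes_seen / time_now ** beta.
    """
    return modes_seen * (((time_now + extra_time) / time_now) ** beta - 1.0)


# Hypothetical case: 4 modes seen in 400 hours, estimated beta of 0.7.
for extra in (100.0, 400.0, 1200.0):
    new_modes = expected_new_modes(4, 0.7, 400.0, extra)
    print(f"{extra:6.0f} more hours -> about {new_modes:.1f} new modes")
```

With beta below 1, each additional block of test time uncovers fewer new modes than the last, which is why this estimate is useful when weighing the cost of extended testing.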

Incorporating Category C Failures 
Although the previous discussion assumed that failure modes would not be corrected (Category A) or that corrective actions would be performed at the end of the testing phase (Category B), it is also possible to implement some corrective actions during the test and then continue testing with the corrective action in place (i.e., a Test-Fix-Test or Test-Fix-Test with Delayed Fixes approach). These failure modes are classified as Category C. Because it is assumed that the effect of the corrective action will be demonstrated empirically as the corrected units continue to operate in the test, there is no need to assign an effectiveness factor to Category C failure modes. The Crow (AMSAA) model (MIL-HDBK-189) is widely used to evaluate the reliability growth in the presence of Category A and Category C failure modes. This approach will likely result in a gradual increase in the reliability of the product during the test time. 
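For reference, the Crow (AMSAA) model treats the failures of a system under Test-Fix-Test as a power law NHPP. The Python sketch below shows the standard maximum likelihood estimates for a time-terminated test on a single system. The failure times are invented, and the function name is an assumption made for the example.

```python
import math


def crow_amsaa_fit(failure_times, total_time):
    """MLE fit of the power law NHPP for a time-terminated test.

    failure_times: cumulative test times at which failures occurred.
    Returns (beta, lam, instantaneous MTBF at total_time).
    """
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    intensity_now = lam * beta * total_time ** (beta - 1.0)
    return beta, lam, 1.0 / intensity_now


# Invented data: failures become less frequent as fixes are incorporated.
times = [5.0, 20.0, 60.0, 150.0]
beta, lam, mtbf_now = crow_amsaa_fit(times, 400.0)
cumulative_mtbf = 400.0 / len(times)

print(f"beta = {beta:.2f} (values below 1 indicate reliability growth)")
print(f"cumulative MTBF = {cumulative_mtbf:.1f}, instantaneous MTBF = {mtbf_now:.1f}")
```

An instantaneous MTBF above the cumulative MTBF reflects the gradual increase in reliability during the test that is described above.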

If the test also includes Category B failure modes, then this gradual increase will also be accompanied by a jump in reliability when the Category B corrective actions are implemented at the end of the test phase. The Generalized Crow Projection model accommodates Category A, B and C failure modes, and Figure 4 displays the MTBF vs. Time plot for such analyses. This plot is similar to the ones shown in Figures 1 and 2, except that it includes a gradual increase in the reliability observed during the test, due to the implementation of fixes for some failure modes while the test was in progress. 

Figure 4: Incorporating category C failure modes

The reliability growth planning/management strategy and data analysis methodology described in this article will be supported by the next version of ReliaSoft's reliability growth analysis software, RGA++. This software is currently under development, with cooperation from Dr. Crow and other partners from the military and commercial sectors to determine the functional requirements. The RGA++ software is anticipated for release in 2Q 2003 and will provide a complete array of analysis options for both continuous (time-to-failure) and discrete (one-shot, success/failure) data sets, including the incorporation of the Crow (AMSAA) projection model and related analyses described in this article. 

Dr. Larry H. Crow has developed and implemented the management and analysis approach described in this article and the article has been written with his cooperation and review. This general presentation of the basic concepts of the approach is based largely on lecture notes, discussions and other information provided by Dr. Crow. In addition, the following documents are also relevant to this discussion: 

United States Department of Defense. MIL-HDBK-189: Reliability Growth Management, February 13, 1981. 

International Electrotechnical Commission. IEC 61164: Reliability Growth - Statistical Test and Estimation Methods, June 1995. 

NOTE: Two IEC publications on reliability growth, IEC 61164 and IEC 61014, are currently undergoing revision. For more information, search for works in progress at


Dr. Larry H. Crow

Larry H. Crow is Vice President, Reliability and Sustainment Programs at Alion Science and Technology, Huntsville, Alabama. He held this position at IIT Research Institute before Alion was established in 2002 by 1600 former IITRI employees. Previously, Dr. Crow was Director, Reliability at General Dynamics Advanced Technology Systems (formerly Bell Laboratories ATS). Before joining Bell Laboratories in 1985, Dr. Crow was chief of the Reliability Methodology Office at the US Army Materiel Systems Analysis Activity (AMSAA). He developed the Crow (AMSAA) model and the Crow Projection model, which have been incorporated into US DoD military handbooks as well as national and international standards and service regulations on reliability. Dr. Crow chaired the Tri-Service Committee to develop US MIL-HDBK-189, Reliability Growth Management, and is the principal author of that document. He is also the principal author of IEC 61164, Reliability Growth - Statistical Tests and Estimation Methods. He developed the widely used NHPP Power Law model for analyzing repairable systems reliability, which is featured in a new IEC 61710, Goodness-of-Fit and Estimation Methods for the Power Law Model. Dr. Crow is an elected Fellow of the American Statistical Association and the Institute of Environmental Sciences and Technology and is on the Board of Directors of the Annual Reliability and Maintainability Symposium (RAMS). He is the recipient of The Florida State University "Grad Made Good" Award for the Year 2000, the highest honor given to a graduate by Florida State University.

Copyright © 1992 - ReliaSoft Corporation. All Rights Reserved.