Reliability Edge Newsletter

Volume 2, Issue 3

Limitations of the Exponential Distribution
for Reliability Analysis
(with Exponential and Preventive Maintenance Example)

Many years ago, science held and defended the theory that the earth was flat. Once that theory was overturned, great scientific strides were made, leading us to new theories that better describe and model the physical world we live in. Today, even though not widely defended, the unsupported assumption that most reliability engineering problems can be modeled well by the exponential distribution is still widely held. In a quest for simplicity and solutions that we can grasp, derive and easily communicate, many practitioners have embraced simple equations derived from the underlying assumption of an exponential distribution for reliability prediction, accelerated testing, reliability growth, maintainability and system reliability analyses. This practice is perpetuated by some reliability authors and lecturers, some reliability software makers, and most military standards that deal with reliability.

So what is wrong with the widespread use of the exponential distribution for reliability analysis? To answer that question, we need to understand the basic constant failure rate assumption of the exponential distribution and examine whether it is supported in most real world applications. The exponential distribution models the behavior of units that fail at a constant rate, regardless of the accumulated age. Although this property greatly simplifies analysis, it makes the distribution inappropriate for most “good” reliability analyses because it does not apply to most real world applications.
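The constant failure rate property can be written out in a few lines. The notation below (lambda for the failure rate, R(t) for the reliability function, f(t) for the probability density, h(t) for the failure rate function) is standard textbook notation assumed here for illustration.

```latex
R(t) = e^{-\lambda t}, \qquad
h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda, \qquad
R(x \mid t) = \frac{R(t + x)}{R(t)} = \frac{e^{-\lambda (t + x)}}{e^{-\lambda t}} = e^{-\lambda x} = R(x)
```

The last expression says that a unit that has already survived to age t has the same chance of surviving a further x hours as a brand new unit, which is what "regardless of the accumulated age" means.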

Inapplicability of the Constant Failure Rate Assumption
Like the theory that the world is flat, the hypothesis of a constant failure rate provides mathematical models that can be easily implemented and explained, yet leads us away from the benefits that can be gained by adopting models that more accurately represent real world conditions. Like Galileo, who studied the phases of Venus through his telescope and made observations that contradicted Aristotelian and Ptolemaic astronomy, we can also look around the physical world we live in and find irrefutable evidence that the failure rates of most, if not all, real world products are not constant.

A simple analysis of human mortality data obtained from the Sunday newspaper provides an illustration of the erroneous conclusions we can reach through the assumption of a constant failure rate when analyzing real world data. Using the Weibull++ software to analyze the human mortality data with an exponential distribution, we find that if the human mortality rate (failure rate) were constant, a significant percentage of the population (10% based on the data sample used) would be dead by age 10, while another 10% would be alive and well beyond 175 years of age, and a lucky 1% of us would continue to live well past 350 years of age! These calculations clearly disagree with our observation of human mortality in the real world and Figure 1 demonstrates the discrepancy. This graph displays the human mortality data analyzed with both the exponential and Weibull distributions. It shows that the Weibull distribution models the behavior better, while the exponential distribution overestimates the initial failure rate and significantly underestimates the rate in later stages of life.
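The ages quoted above follow directly from the exponential distribution once a mean life is fixed. The sketch below is not the Weibull++ analysis. It assumes a hypothetical mean life of 76 years, chosen for illustration because the data sample itself is not given here, and the function name is likewise made up for this example.

```python
import math


def exponential_age_at_fraction_failed(mean_life, fraction_failed):
    """Age by which `fraction_failed` of the population has failed,
    assuming a constant failure rate of 1 / mean_life."""
    # Solve 1 - exp(-t / mean_life) = fraction_failed for t.
    return -mean_life * math.log(1.0 - fraction_failed)


# Hypothetical mean life in years (an assumption, not taken from the article's data).
MEAN_LIFE_YEARS = 76.0

for fraction in (0.10, 0.90, 0.99):
    age = exponential_age_at_fraction_failed(MEAN_LIFE_YEARS, fraction)
    print(f"{fraction:.0%} of the population dead by age {age:.1f}")
```

Under this assumed mean, the 10%, 90% and 99% points fall at roughly 8, 175 and 350 years, the same kind of implausible spread described above.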

Figure 1: Human mortality rate analyzed with the exponential and Weibull distributions

Similar examples are abundant among manufactured products as well. If cars exhibited a constant failure rate, then the vehicle’s mileage would not be a factor in the price of a used car because it would not affect the subsequent reliability of the vehicle. In other words, if a product can wear out over time, then that product does not have a constant failure rate. Unfortunately, most items in this world do wear out, even electronic components and non-physical products such as computer software. Electronic components have been shown to exhibit degradation over time and computer software also exhibits wear-out mechanisms. For example, a freshly rebooted PC is less likely to crash than a PC that has been running for a while, indicating an increasing software failure rate during each run.
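Wear-out can be made concrete with the Weibull failure rate function, which reduces to the exponential case when its shape parameter equals 1. The sketch below uses the standard two-parameter Weibull form. The parameter values (beta = 3, eta = 100) are hypothetical and are not fitted to any data from this article.

```python
def weibull_failure_rate(t, beta, eta):
    """Failure rate h(t) of a two-parameter Weibull distribution.

    beta is the shape parameter and eta the scale parameter.
    beta = 1 gives a constant rate of 1 / eta (the exponential case);
    beta > 1 gives a rate that increases with age (wear-out).
    """
    return (beta / eta) * (t / eta) ** (beta - 1)


# Hypothetical parameters, for illustration only.
ETA = 100.0
for age in (10.0, 50.0, 100.0):
    constant_rate = weibull_failure_rate(age, beta=1.0, eta=ETA)
    wearout_rate = weibull_failure_rate(age, beta=3.0, eta=ETA)
    print(f"age {age:5.0f}: beta=1 rate {constant_rate:.4f}, beta=3 rate {wearout_rate:.4f}")
```

With beta = 1 the rate is the same at every age, so accumulated mileage would carry no information. With beta = 3 the rate climbs steadily, which is the behavior a used-car buyer implicitly prices in.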

Persistence in Reliability Analysis of the Exponential Assumption
Despite the inadequacy of the exponential distribution to accurately model the behavior of most products in the real world, it is still widely used in today’s reliability practices, standards and methods. As an example, the first term learned by most people when they are introduced to reliability is MTBF (mean time between failures). This term is so ingrained in current reliability science that it forms the basis for many comparisons and most reliability standards, is widely used in reliability specifications and is the desired result of many reliability, maintainability and availability analyses.

What many people fail to understand, however, is that the sole use of the MTBF reliability metric almost always implies that the exponential distribution was used to analyze the data. Under an exponential distribution assumption, the mean completely characterizes the distribution and is a sufficient metric. However, if the data are modeled by another distribution, then the mean is not sufficient to describe the data and is, in many cases, a poor reliability metric. In addition, the term itself has led to many erroneous assumptions and much confusion about its relationship to other terms, such as MTTF (mean time to failure, i.e., the mean of the data based on the assumed distribution). The reason for the confusion is that the two metrics are equal if you assume a constant failure rate. In other words, if a product fails at a constant rate of one failure per hour, the MTTF is one hour and so is the MTBF. However, if the failure rate is not constant, then each metric has a different meaning and a different result. Thus, in the majority of cases, most practitioners are really looking for and solving for the MTTF, regardless of what they choose to call it. In the analysis of repairable systems, one might argue that the MTBF is a valuable metric because we record the times between failures for the system (the random variable) and compute the mean of these times. However, in reality, is this not the same as computing the distribution mean (i.e., the MTTF) utilizing times between failures as our random variable instead of times-to-failure? [Note: Reliability Edge Volume 1, Issue 1 presents a more complete discussion of issues with the MTBF metric, on the Web at http://www.ReliaSoft.com/newsletter.]
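To see why the mean alone is a poor metric outside the exponential case, consider two products with the same mean life but different distributions. The sketch below is a minimal illustration under assumed values: a mean of 100 hrs and a Weibull shape parameter of 3 are hypothetical, and the function names are invented for this example.

```python
import math


def exponential_reliability(t, mean):
    """Reliability at time t for an exponential distribution with the given mean."""
    return math.exp(-t / mean)


def weibull_eta_for_mean(mean, beta):
    """Weibull scale parameter eta that yields the given mean for shape beta,
    using mean = eta * Gamma(1 + 1 / beta)."""
    return mean / math.gamma(1.0 + 1.0 / beta)


def weibull_reliability(t, beta, eta):
    """Reliability at time t for a two-parameter Weibull distribution."""
    return math.exp(-((t / eta) ** beta))


MEAN_HRS = 100.0  # hypothetical mean life shared by both products
BETA = 3.0        # hypothetical Weibull shape parameter (wear-out)

eta = weibull_eta_for_mean(MEAN_HRS, BETA)
for t in (10.0, 50.0, 100.0):
    r_exp = exponential_reliability(t, MEAN_HRS)
    r_wbl = weibull_reliability(t, BETA, eta)
    print(f"t = {t:5.0f} hrs: exponential R = {r_exp:.4f}, Weibull R = {r_wbl:.4f}")
```

Both products would be quoted with the same mean of 100 hrs, yet under these assumptions their reliabilities at 50 hrs are about 61% and 91% respectively. The mean by itself cannot distinguish between them.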

Another reason for the extensive use of the exponential distribution is a reliance by some practitioners on antiquated techniques of reliability prediction, which are not based on actual life data for the products. Instead, they utilize compiled tables of generic failure rates (exponential failure rates) and simplistic multiplication factors (e.g., MIL-HDBK-217). These analyses provide little, if any, insight into the true reliability of the products in the field. More often than not, this misuse averages out the true, time-varying failure rate. In the case of an increasing failure rate, the averaged rate overestimates the true rate early in life and underestimates it later. This may result in reliability estimates that are too low in the early stages of life and too high in later stages, as demonstrated in the human mortality graph in Figure 1.

The exponential distribution is also widely used, although inappropriately, in the development of preventive maintenance strategies. In many cases, the MTBF is used to determine a preventive maintenance interval for a component. However, the use of the MTBF metric implies that the data were analyzed with an exponential distribution, since the mean fully describes the distribution only when the exponential distribution is used for analysis. The use of the exponential distribution, in turn, implies that the component has a constant failure rate. This raises the question of why anyone would preventively replace a component that has a constant failure rate and does not experience wear-out over time! With a constant failure rate assumption, preventive maintenance actions do not improve the reliability of the component, but rather waste time and parts, as illustrated in the exponential and preventive maintenance example. [Note: More accurate methods for determining the optimum replacement interval for components with non-constant failure rates are presented in Reliability Edge Volume 1, Issue 1.]

Exponential Distribution’s Contribution to Reliability
Although it is not applicable to most real world applications, the exponential distribution still has some value in reliability analysis. As more of an exception than the norm, the distribution can be effectively incorporated into reliability analysis if the constant failure rate assumption can be justified. Additionally, prior efforts and standards that extensively utilized the exponential distribution should be commended for introducing and formalizing the reliability methods that formed the basis of more advanced analysis techniques and for applying more rigorous scientific approaches within the field. We should not underestimate the exponential distribution's contribution to the development of current reliability principles and theory. However, today's high product reliability goals require the use of more sophisticated analysis methods and metrics that more accurately reflect real world conditions. Such models have been developed, and computer technology now handles the more complex mathematical formulations they require.

Exponential and Preventive Maintenance Example 

This simple example demonstrates that preventive maintenance actions do not improve the overall reliability of components that fail at a constant rate (i.e., follow an exponential distribution).

Two components follow an exponential distribution with MTTF = 100 hrs (i.e., a failure rate of Lambda = 1/MTTF = 0.01 failures per hr). Component 1 is preventively replaced every 50 hrs, while component 2 is never maintained.

Preventive Maintenance for Two Components

Compare the reliabilities of the components from 0 to 60 hrs:

  • With PM: The reliability from 0 to 60 hrs is the reliability of the original component for 50 hrs, R(t=50)=60.65%, multiplied by the reliability of the replacement component for 10 hrs, R(t=10)=90.48%. The overall result is 60.65% x 90.48% = 54.88%.
  • Without PM: The reliability from 0 to 60 hrs is the reliability of the original component operating to 60 hrs, R(t=60)=54.88%.

Compare the reliabilities of the components from 50 to 60 hrs:

  • With PM: The reliability from 50 to 60 hrs is the reliability of the replacement component for 10 hrs, R(t=10)=90.48%.
  • Without PM: The reliability from 50 to 60 hrs is the conditional reliability of the original component operating to 60 hrs, given that it has already survived to 50 hrs, or RC(T=60|50)=R(60)/R(50)=54.88%/60.65%=90.48%.

In both comparisons the reliability is the same with and without PM, so replacing the component at 50 hrs gains nothing.
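The arithmetic in this example can be checked with a short script. This is a minimal sketch: the function names are invented here, and the Weibull comparison at the end uses hypothetical parameters (beta = 3, eta = 100 hrs) that do not appear in the example above. They are included only to show what changes when the failure rate increases with age.

```python
import math


def exponential_reliability(t, mttf=100.0):
    """R(t) for the example's components (MTTF = 100 hrs, Lambda = 0.01)."""
    return math.exp(-t / mttf)


def weibull_reliability(t, beta=3.0, eta=100.0):
    """R(t) for a hypothetical wear-out component (parameters are assumptions)."""
    return math.exp(-((t / eta) ** beta))


def reliability_with_pm(reliability, t, pm_interval):
    """Probability of no failure over [0, t] when the component is replaced
    by a new one every `pm_interval` hours."""
    completed_intervals, remainder = divmod(t, pm_interval)
    # Each completed interval starts with a new component, as does the remainder.
    return reliability(pm_interval) ** completed_intervals * reliability(remainder)


# The example: PM every 50 hrs, mission from 0 to 60 hrs.
with_pm = reliability_with_pm(exponential_reliability, 60, 50)
without_pm = exponential_reliability(60)
print(f"Exponential, 0 to 60 hrs: with PM {with_pm:.4f}, without PM {without_pm:.4f}")

# Conditional reliability from 50 to 60 hrs without PM, versus a new component for 10 hrs.
conditional = exponential_reliability(60) / exponential_reliability(50)
print(f"Exponential, 50 to 60 hrs: new {exponential_reliability(10):.4f}, aged {conditional:.4f}")

# Hypothetical contrast: the same PM policy applied to a component that wears out.
wbl_with_pm = reliability_with_pm(weibull_reliability, 60, 50)
wbl_without_pm = weibull_reliability(60)
print(f"Weibull (assumed), 0 to 60 hrs: with PM {wbl_with_pm:.4f}, without PM {wbl_without_pm:.4f}")
```

For the exponential components, the values with and without PM come out identical, matching the 54.88% and 90.48% figures above. Under the assumed Weibull parameters, the preventively replaced component shows a higher reliability than the unmaintained one, which is the only situation in which a replacement policy can pay off.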
 
