Design for Reliability (DFR) is not a new concept, but it has begun to receive a great deal of attention in recent years. What is DFR? What are the ingredients for designing for reliability, and what is involved in implementing DFR? Should DFR be part of a Design for Six Sigma (DFSS) program, and is DFR the same as DFSS? In this article, we will try to answer these questions and, at the same time, we will propose a general DFR process that can be adopted and deployed with a few modifications across different industries in a way that will fit well into the overall Product Development Process.
What is Design for Reliability (DFR)?
All reliability professionals are familiar with the terms Weibull Analysis and/or Life Data Analysis. In fact, for many, these analysis techniques have become almost synonymous with reliability and achieving high reliability. The reality, though, is that although life data analysis is an important piece of the pie, performing just this type of analysis is not enough to achieve reliable products. Rather, there are a variety of activities involved in an effective reliability program and in arriving at reliable products. Achieving the organization’s reliability goals requires strategic vision, proper planning, sufficient organizational resource allocation and the integration and institutionalization of reliability practices into development projects.
Design for Reliability, however, is more specific than these general ideas. It is actually a process. Specifically, DFR describes the entire set of tools that support product and process design (typically from early in the concept stage all the way through to product obsolescence) to ensure that customer expectations for reliability are fully met throughout the life of the product with low overall life-cycle costs. In other words, DFR is a systematic, streamlined, concurrent engineering program in which reliability engineering is woven into the total development cycle. It relies on an array of reliability engineering tools, along with a proper understanding of when and how to use these tools throughout the design cycle. This process encompasses a variety of tools and practices and describes the overall order of deployment that an organization needs to follow in order to design reliability into its products.
Why is DFR Important?
Why should a company commit resources for deploying a DFR process? The answer to this question is quite simple: warranty costs and customer satisfaction. Field failures are very costly. One case in point is the recently publicized Xbox issue, which has cost Microsoft more than a billion dollars in warranty expenses (aside from lost business and market share). Clearly, in order to be profitable, an organization's products must be reliable, and reliable products require a formal reliability process. Three important statements summarize the best practice reliability philosophy of successful companies:
1) Reliability must be designed into products and processes using the best available science-based methods.
2) Knowing how to calculate reliability is important, but knowing how to achieve reliability is equally, if not more, important.
3) Reliability practices must begin early in the design process and must be well integrated into the overall product development cycle.
Understanding when, what and where to use the wide variety of reliability engineering tools available will help to achieve the reliability mission of an organization. And this is becoming more and more important with the increasing complexity of systems as well as the complexity of the methods available for determining their reliability. System interactions, interfaces, complex usage and stress profiles need to be addressed and accounted for. With such increasing complexity in all aspects of product development, it becomes a necessity to have a well-defined process for incorporating reliability activities into the design cycle. Without such a process, trying to implement all of the different reliability activities involved in product development can become chaotic, with different reliability tools deployed too late, at random or not at all, resulting in wasted time and resources as well as problems in the field.
Managers and engineers have come to this realization and a push for a more structured process has been seen in recent years. The circumstances are very similar to what happened with the "Quality Assurance" discipline back in the 1980s, which spawned successful processes such as Six Sigma and Design for Six Sigma (DFSS). It is thus only natural for organizations to look to these existing processes and sometimes even try to include reliability in them. However, although Six Sigma and DFSS have been quite successful in achieving higher quality, reducing variation and cutting down the number of non-conforming products, the methodologies are primarily focused on product quality and many organizations are starting to realize that they do not adequately support the achievement of high reliability. Therefore, these organizations are starting to put more emphasis on the separate, although often complementary, techniques of Design for Reliability.
Since the distinctions between reliability and quality, and consequently between DFR and DFSS, are often still poorly understood, it is worthwhile to address this topic briefly in the next few sections before presenting the overall process and the specific techniques that comprise DFR.
Distinction Between Reliability and Quality
First, let us start with some basic clarifications. Traditional quality control assures that the product will work after assembly and as designed, whereas reliability is the probability that an item will perform its intended function for a designated period of time without failure under specified conditions. In other words, reliability looks at how long the product will work as designed, which is a very different objective than that of traditional quality control. Therefore, different tools and models apply to reliability that do not necessarily apply to quality, and vice versa. This is exemplified by the following comparison of DFSS (focused on quality) and DFR (focused on reliability).
Distinction Between DFSS and DFR
Design for Six Sigma emerged from the Six Sigma and the Define-Measure-Analyze-Improve-Control (DMAIC) quality methodologies, which were originally developed by Motorola to systematically improve processes by eliminating defects. Unlike its traditional Six Sigma/DMAIC predecessors, which are usually focused on solving existing manufacturing issues (i.e., "fire fighting"), DFSS aims at avoiding manufacturing problems by taking a more proactive approach to problem solving and engaging the company efforts at an early stage to reduce problems that could occur (i.e., "fire prevention"). The primary goal of DFSS is to achieve a significant reduction in the number of nonconforming units and production variation. It starts from an understanding of the customer expectations, needs and Critical to Quality issues (CTQs) before a design can be completed. Typically in a DFSS program, only a small portion of the CTQs are reliability-related, and therefore, reliability does not get center stage attention in DFSS. DFSS rarely looks at the long-term (after manufacturing) issues that might arise in the product.
On the other hand, Design for Reliability is a process specifically geared toward achieving high long-term reliability. This process attempts to identify and prevent design issues early in the development phase, instead of having these issues found in the hands of the customer. As mentioned previously, a variety of tools are used in order to accomplish this objective. These tools are different than those used in DFSS, even though there is some overlap. Figure 1 illustrates the different tools used in DFSS and DFR, as well as the overlap between the two. As you can see from this graphic, the types of tools used in DFR are based on modeling the life of the product, understanding the operating stresses and the physics of failure. The common area between DFSS and DFR includes tools such as Voice of the Customer (VOC), Design of Experiments (DOE) and Failure Modes and Effects Analysis (FMEA), which are essential elements in any kind of product improvement program.
Proper Relationship Between Reliability and Quality
As the previous sections demonstrate, there is a clear distinction between the goals and tools employed to assure quality versus those employed to analyze and improve reliability. Of course, there are also many natural affinities between the two disciplines and it is understandable that many organizations have traditionally combined both quality and reliability under the same umbrella. In some cases, when the organization clearly understands the distinction between quality and reliability and applies the appropriate tools for both objectives, this combination can be appropriate and effective. However, when there is not a clear understanding of the essential differences in the tools involved, this can lead to very poor outcomes resulting from the improper use of tools and data. The rest of this article attempts to distinguish the specific processes and techniques that are necessary to ensure a product’s reliability by presenting a high-level overview of a general DFR process.
The DFR Process
The Stress-Strength Interference principle states that a product fails when the stress experienced by the product exceeds its strength (as shown in Figure 2). In order to reduce the failure probability (and thus increase the reliability), we must reduce the interference between stress and strength. A structured process, such as the DFR process presented in this article, is needed in order to achieve this. The proposed process can be used as a guide to the sequence of deploying the different tools and methods involved in a program to ensure high reliability. This process can be adapted and customized based on your specific industry, your corporate culture and other existing processes within your company (such as Six Sigma and/or DFSS). In addition, the sequence of the activities within the DFR process will vary based on the nature of the product and the amount of information available. It is important to note that even though this process is presented in a linear sequence, in reality some activities would be performed in parallel and/or in a loop based on the knowledge gained as a project moves forward. Figure 3 presents a summary of the full process and the ways in which techniques may interact.
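Under the common simplifying assumption that stress and strength are independent and normally distributed, the interference probability has a closed form. The sketch below (with illustrative values, not taken from the article) computes the resulting reliability using only the standard library:

```python
import math

def interference_reliability(mu_stress, sd_stress, mu_strength, sd_strength):
    """Reliability = P(strength > stress) for independent normal stress
    and strength distributions (classic interference model)."""
    # Safety margin in standard deviations of the difference distribution
    z = (mu_strength - mu_stress) / math.hypot(sd_stress, sd_strength)
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: stress ~ N(30, 4), strength ~ N(50, 3)
r = interference_reliability(30.0, 4.0, 50.0, 3.0)
```

Increasing the mean strength or reducing the spread of either distribution raises `z` and therefore the reliability, which is the quantitative meaning of "reducing the interference."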
In order to make this DFR process general enough, and applicable to different industries, we decided to break the process down into six key activities, which are: 1) Define, 2) Identify, 3) Analyze and Assess, 4) Quantify and Improve, 5) Validate and 6) Monitor and Control. By dividing the process into these activities, we can identify and group the different tools, and provide a roadmap that can easily be followed, as well as easily mapped into a Product Development Process (Concept, Design, Assurance, Manufacturing and Launch).
Define

The purpose of this stage is to clearly and quantitatively define the reliability requirements and goals for a product as well as the end-user product environmental/usage conditions. These can be at the system level, assembly level, component level or even down to the failure mode level.
Determining the usage and environmental conditions is an important early step of a DFR program. Companies need to know what it is that they are designing for and what types of stresses their products are supposed to withstand. The conditions can be determined based on customer surveys, environmental measurement and sampling.
Requirements can be determined in many different ways, or through a combination of those different ways. Requirements can be based on contracts, benchmarks, competitive analysis, customer expectations, cost, safety, best practices, etc. Some of the tools worth mentioning that help in quantifying the "voice of the customer" include KANO models, affinity diagrams and pair-wise comparisons. Of particular interest to DFR are the requirements that are Critical to Reliability (CTR). While the emphasis of DFSS is on satisfying the Critical to Quality issues (CTQs), which are typically not reliability-related, DFR focuses specifically on the reliability aspects of a product.
The system reliability requirement goal can be allocated to the assembly level, component level or even down to the failure mode level. Different allocation techniques are available, such as Equal, AGREE, Feasibility, ARINC, Repairable Systems Allocation and the cost-based RS-Allocation methods (which are supported by ReliaSoft’s BlockSim software).
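As a minimal illustration of allocation, the equal apportionment method gives every one of n series subsystems the same target, while the other methods mentioned (AGREE, ARINC, feasibility, cost-based) weight the targets by complexity, current failure rates or cost instead. A sketch with hypothetical numbers:

```python
def equal_allocation(system_goal, n_subsystems):
    """Equal apportionment: each of n subsystems in series receives the
    same reliability target R_i, chosen so that R_i**n == system_goal."""
    return system_goal ** (1.0 / n_subsystems)

# Hypothetical example: a 90% system goal allocated over 3 series subsystems
subsystem_target = equal_allocation(0.90, 3)
```

Note that each subsystem must be substantially more reliable than the system goal itself, which is why allocation is done early, before detailed design begins.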
Once the requirements have been defined, they must be translated into design requirements and then into manufacturing requirements. A commonly used methodology is the Quality Function Deployment (QFD) approach using what is commonly called the House of Quality tool. This is a systematic tool to translate customer requirements into functional requirements, physical characteristics and process controls.
Identify

In this stage, a clearer picture about what the product is supposed to do starts developing. It is important to understand how much change is introduced with this new product. A product can be an upgrade of an existing product, an existing product that is introduced to a new market or application, a product that is not new to the market but is new to the company or it could be a completely new product that does not exist in the market. With more design or application change, more reliability risks are introduced to the success of the product and company.
A formal methodology called Change Point Analysis can be used to examine what changes, if any, have taken place. A thorough change point analysis should reveal changes in design, materials, parts, manufacturing, supplier design or process, usage environment, the system's interface points, the system's upstream and downstream parts, specifications, interfaces between internal departments, performance requirements, etc. The purpose of this exercise is to identify and prioritize the Key Reliability Risk items and their corresponding Risk Reduction Strategies. Designers should consider reducing design complexity and maximizing the use of standard (proven) components.
A good tool to assess risk early in the DFR program is the FMEA. FMEAs identify potential failure modes for a product or process, assess the risk associated with those failure modes, prioritize issues for corrective action and identify and carry out corrective actions to address the most serious concerns. A properly applied Design FMEA (DFMEA) takes requirements, customer usage and environment information as inputs and, through its findings, initiates and/or informs many reliability-centered activities such as Physics of Failure, System Analysis, Reliability Prediction, Life Testing and Accelerated Life Testing.
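The risk-ranking step of an FMEA is commonly quantified with the Risk Priority Number (RPN), the product of the severity, occurrence and detection rankings. A minimal sketch, with invented failure modes and rankings:

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: each factor is typically ranked 1-10;
    a higher RPN means higher priority for corrective action."""
    return severity * occurrence * detection

# Hypothetical DFMEA entries: (failure mode, severity, occurrence, detection)
failure_modes = [
    ("solder joint crack", 8, 4, 6),
    ("connector corrosion", 6, 3, 2),
]

# Rank failure modes by RPN, highest priority first
ranked = sorted(failure_modes, key=lambda m: rpn(*m[1:]), reverse=True)
```

In practice, high-severity items are often addressed regardless of their RPN, so the ranking guides rather than dictates the corrective action plan.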
Figure 3: DFR Process
Analyze and Assess
It is highly important to estimate the product's reliability, even with a rough first cut estimate, early in the design phase. This can be done with estimates based on engineering judgment and expert opinion, Physics of Failure (PoF) analysis, simulation models, prior warranty and test data from similar products/components (using life data analysis techniques) or Standards Based Reliability Prediction (using common military or commercial handbooks, such as MIL-HDBK-217 and Bellcore/Telcordia, to come up with rough MTBF estimates or to compare different design concepts when failure data is not yet available).
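Handbook predictions of this kind typically assume constant failure rates, under which the part failure rates of a series configuration simply add and the MTBF is the reciprocal of the total. A sketch with hypothetical part data (the rates below are invented, not handbook values):

```python
# Hypothetical part failure rates in failures per 10^6 hours
part_lambdas = {"capacitor": 0.5, "resistor": 0.1, "ic": 2.0, "connector": 0.4}

# Parts-count roll-up for a series configuration: rates add
lambda_total = sum(part_lambdas.values())

# MTBF in hours under the constant-failure-rate (exponential) assumption
mtbf_hours = 1e6 / lambda_total
```

Such estimates are rough by design; their value at this stage is in comparing design concepts and flagging dominant contributors, not in predicting field life precisely.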
PoF analysis provides much needed insights into the failure risks and the mechanisms that lead to them (especially when actual test data is not available yet). PoF utilizes knowledge of the life-cycle load profile, package architecture, material properties, relevant geometry, processes, technologies, etc., to identify potential Key Process Indicator Variables (KPIVs) for failure mechanisms. It can also be used to identify design margins and failure prevention actions as well as to focus reliability testing.
Quantify and Improve
In this stage, we will start quantifying all of the previous work based on test results. By this stage, prototypes should be ready for testing and more detailed analysis. Typically, this involves an iterative process where different types of tests are performed, the results are analyzed, design changes are made, and tests are repeated. A wide array of tools is available for the reliability engineer to uncover product weaknesses, predict life and manage the reliability improvement efforts. The following is a summary of the most commonly used tools.
Design of Experiments (DOE) provides a methodology to create organized test plans to identify important variables, to estimate their effect on a certain product characteristic and to optimize the settings of these variables to improve the design robustness. Within the DFR concept, we are mostly interested in the effect of stresses on our test units. DOEs play an important role in DFR because they assist in identifying the factors that are significant to the life of the product, especially when the physics of failure are not well understood. Knowing the significant factors results in more realistic reliability tests and more efficient accelerated tests (since resources are not wasted on including insignificant stresses in the test).
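As a minimal illustration of the idea, a two-level full-factorial experiment estimates a factor's main effect as the average response at its high level minus the average at its low level. A sketch assuming a 2^2 design with invented life data:

```python
# Full-factorial 2^2 design: factors A and B coded as -1/+1,
# response = observed life in hours for each run (illustrative data)
runs = [
    (-1, -1, 105.0),
    (+1, -1, 152.0),
    (-1, +1, 98.0),
    (+1, +1, 160.0),
]

def main_effect(runs, idx):
    """Average response at the high level of factor idx
    minus the average response at its low level."""
    hi = [y for *x, y in runs if x[idx] == +1]
    lo = [y for *x, y in runs if x[idx] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)
```

Here factor A has a large effect on life and factor B a negligible one, so a subsequent accelerated test could safely concentrate on the stress behind factor A.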
With testing comes data, such as failure times and censoring times. Test results can be analyzed with Life Data Analysis (LDA) techniques to statistically estimate the reliability of the product and calculate various reliability-related metrics with a certain confidence interval. Applicable metrics may include reliability after a certain time of use, conditional reliability, B(X) information, failure rate, MTBF, median life, etc. These calculations can help in verifying whether the product meets its reliability goals, comparing designs, projecting failures and warranty returns, etc.
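One simple LDA approach is median-rank regression for a two-parameter Weibull distribution. The sketch below assumes complete (uncensored) failure data and uses Benard's approximation for the median ranks; real analyses usually must also handle censored units:

```python
import math

def weibull_rank_regression(failure_times):
    """Fit a 2-parameter Weibull by median-rank regression: regress
    ln(-ln(1-F)) on ln(t). Returns (beta, eta)."""
    t = sorted(failure_times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)                  # Benard's median rank
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx                               # slope = shape parameter
    eta = math.exp(mx - my / beta)                 # scale parameter
    return beta, eta

def reliability(t, beta, eta):
    """Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))

def b_life(p, beta, eta):
    """Time by which fraction p of the population has failed (e.g. B10)."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

# Illustrative failure times in hours
beta, eta = weibull_rank_regression([50.0, 80.0, 120.0, 160.0, 220.0])
```

From the fitted parameters, metrics such as R(t) at the mission time, B10 life or the median life follow directly and can be compared against the goals set in the Define stage.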
As an alternative to testing under normal use conditions and LDA, Quantitative Accelerated Life Testing (QALT) can also be employed to cut down on the testing time. By carefully elevating the stress levels applied during testing, failures occur faster and thus failure modes are revealed (and statistical life data analysis can be applied) more quickly.
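For a thermally driven failure mechanism, the Arrhenius model is a common way to relate an elevated test temperature to use conditions. A sketch (the activation energy and temperatures below are illustrative, not from the article):

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV per kelvin

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor between use and elevated temperature for a
    thermally activated mechanism: AF = exp((Ea/k)*(1/T_use - 1/T_stress)),
    with temperatures in kelvin."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

# Hypothetical example: Ea = 0.7 eV, use at 55 C, test at 125 C
af = arrhenius_af(0.7, 55.0, 125.0)
```

An acceleration factor of this size means each test hour at the elevated temperature stands in for many hours of use, provided the elevated stress does not introduce failure modes that would never occur in the field.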
Highly Accelerated Tests (HALT/HASS) are qualitative accelerated tests used to reveal possible failure modes and complement the physics of failure knowledge about the product. However, data from qualitative tests cannot be used to quantitatively project the product's reliability.
A very important aspect of the DFR process also includes performing Failure Analysis (FA) or Root Cause Analysis (RCA). FA relies on careful examination of failed devices to determine the root cause of failure and to improve product reliability. This is where the engineers come face-to-face with the failure, see what a failure actually looks like and study the processes that lead to it. FA provides a better understanding of the physics of failure and can uncover issues not foreseen by techniques used prior to testing (such as FMEA). FA helps with developing tests focused on problematic failure modes. It can also help with selecting better materials, designs and processes, and with implementing appropriate design changes to make the product more robust.
System Reliability Analysis with Reliability Block Diagrams (RBDs) can be used in lieu of testing an entire system by relying on the information and probabilistic models developed on the component or subsystem level to model the overall reliability of the system. It can also be used to identify weak areas of the system, find optimum reliability allocation schemes, compare different designs and to perform auxiliary analysis such as availability analysis (by combining maintainability and reliability information).
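For independent blocks, the two basic RBD building rules are that series reliability is the product of the block reliabilities, while parallel (redundant) reliability is one minus the product of the block unreliabilities. A sketch with hypothetical block values:

```python
def series(*rs):
    """All blocks must survive: R = product of block reliabilities."""
    out = 1.0
    for r in rs:
        out *= r
    return out

def parallel(*rs):
    """System survives if any block survives: R = 1 - product of (1 - R_i)."""
    out = 1.0
    for r in rs:
        out *= (1.0 - r)
    return 1.0 - out

# Hypothetical system: a power supply in series with two redundant controllers
r_system = series(0.99, parallel(0.95, 0.95))
```

Nesting these two rules lets arbitrarily deep series-parallel diagrams be evaluated from component-level estimates, which is exactly how an RBD substitutes for full-system testing.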
Fault Tree Analysis (FTA) may be employed to identify defects and risks and the combination of events that lead to them. This may also include an analysis of the likelihood of occurrence for each event.
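Assuming independent basic events, AND and OR gate probabilities combine just as parallel and series reliability blocks do, but from the failure-event perspective. A sketch with invented event probabilities:

```python
def and_gate(*ps):
    """Output event occurs only if all input events occur (independence assumed)."""
    out = 1.0
    for p in ps:
        out *= p
    return out

def or_gate(*ps):
    """Output event occurs if any input event occurs (independence assumed)."""
    out = 1.0
    for p in ps:
        out *= (1.0 - p)
    return 1.0 - out

# Hypothetical tree: top event = power loss OR (controller fault AND backup fault)
p_top = or_gate(0.001, and_gate(0.01, 0.02))
```

The AND gate shows the value of redundancy: both the controller and its backup must fail before that branch contributes, so its contribution to the top event is small.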
Reliability Growth (RG) testing and analysis is an effective methodology to discover defects and improve the design during testing. Different strategies can be employed within the reliability growth program, namely: test-find-test (to discover failures and plan delayed fixes), test-fix-test (to discover failures and implement fixes during the test) and test-fix-find-test (to discover failures, fix some and delay fixes for some). RG analysis can track the effectiveness of each design change and can be used to decide if a reliability goal has been met and whether, and how much, additional testing is required.
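Growth during such a program is often tracked with the Crow-AMSAA (NHPP power-law) model. The sketch below implements its maximum-likelihood estimates for a time-truncated test, using invented failure times; a fitted shape parameter below 1 indicates improving reliability:

```python
import math

def crow_amsaa(failure_times, total_time):
    """MLE for the Crow-AMSAA (NHPP power-law) model on a time-truncated
    test: N(t) = lambda * t**beta. Returns (beta, lambda, instantaneous
    MTBF at the end of the test). beta < 1 indicates reliability growth."""
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    # Instantaneous failure intensity at T is lam * beta * T**(beta - 1)
    mtbf_inst = 1.0 / (lam * beta * total_time ** (beta - 1.0))
    return beta, lam, mtbf_inst

# Illustrative test-fix-test data: inter-failure gaps lengthen over the test
beta, lam, mtbf_now = crow_amsaa([25.0, 70.0, 160.0, 350.0, 800.0], 1000.0)
```

Comparing the instantaneous MTBF at the end of the test against the goal is one common way to decide whether further growth testing is required.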
Validate

The activities described thus far should continue until the design is considered to be "acceptable." In the Validate stage, a Demonstration Test can be used to make sure that the product is ready for high volume production. Statistical methods (such as Parametric Binomial and Non-Parametric Binomial) can be used to develop a test plan (i.e., a combination of test units, test time and acceptable failures) that will demonstrate the desired goal with the least expenditure of resources.
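For the zero-or-few-failures case, the non-parametric binomial method finds the smallest sample size whose probability of passing with at most the allowed number of failures demonstrates the goal at the stated confidence. A sketch (each unit is assumed to be tested for the full mission time):

```python
import math

def demo_sample_size(reliability_goal, confidence, allowed_failures=0):
    """Smallest n such that observing at most f failures among n units
    demonstrates the reliability goal at the stated confidence level
    (non-parametric binomial demonstration test)."""
    f, r = allowed_failures, reliability_goal
    n = f + 1
    while True:
        # P(X <= f) for X ~ Binomial(n, 1 - r): chance of passing
        # even if the true reliability were only at the goal
        tail = sum(math.comb(n, i) * (1.0 - r) ** i * r ** (n - i)
                   for i in range(f + 1))
        if tail <= 1.0 - confidence:
            return n
        n += 1
```

For example, demonstrating 90% reliability at 90% confidence with zero allowed failures requires 22 units, and allowing one failure raises the required sample size considerably.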
If the design has been "demonstrated," the product can go into production. When reaching the manufacturing stage, the DFR efforts should focus primarily on reducing or eliminating problems introduced by the manufacturing process. Manufacturing introduces variations in material, processes, manufacturing sites, human operators, contamination, etc. The product's reliability should be reevaluated in light of these additional variables. Design modifications might be necessary to improve robustness. For example, a design should require the minimal possible amount of non-value-added manual work and assembly. Whenever possible, it should use common parts and materials to facilitate manufacturing/assembling. It should also avoid tight design tolerances beyond the natural capability of the manufacturing processes.
Suppliers also present another area of risk that needs to be addressed in a DFR program and, therefore, procedures should be developed to assist and control them. Continuous sampling of units for testing, combined with QALT and LDA, is highly desirable throughout manufacturing to estimate the reliability of the product and assess whether the reliability goal is still expected to be met.
Monitor and Control
Process FMEAs (PFMEAs) can be used to examine the ways the reliability and quality of a product or service can be jeopardized by the manufacturing and assembly processes. Control Plans can be used to describe the actions that are required at each phase of the process to assure that all process outputs will be in a state of control. Factory Audits are necessary to ensure that manufacturing activities (such as inspections, supplier control, routine tests, storing finished products, Measurement System Analysis and record keeping) are being implemented according to requirements.
The manufacturing process is also prone to deviations. The reliability engineer ought to communicate to the production engineer the specification limits on the KPIVs that would define a "reliability conforming" unit. The production engineer is then responsible for ensuring that the manufacturing process does not deviate from the specifications. Here, we start seeing more aspects of reliability engineering discipline merge with quality engineering. Statistical Process Control (SPC) methods can be useful in this regard.
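As a minimal SPC illustration, 3-sigma X-bar chart limits flag subgroup means of a KPIV that drift outside the expected band. The subgroup data, process sigma and subgroup size below are invented:

```python
import math

def xbar_limits(subgroup_means, sigma, n):
    """3-sigma control limits for an X-bar chart, given the process
    standard deviation sigma and the subgroup size n."""
    grand_mean = sum(subgroup_means) / len(subgroup_means)
    margin = 3.0 * sigma / math.sqrt(n)
    return grand_mean - margin, grand_mean + margin

# Hypothetical subgroup means of a reliability-critical dimension (mm)
means = [10.1, 9.9, 10.0, 10.2]
lcl, ucl = xbar_limits(means, sigma=0.3, n=5)

# Any subgroup mean outside the limits signals the process has deviated
out_of_control = [m for m in means if not (lcl <= m <= ucl)]
```

A point outside the limits would prompt the production engineer to stop and investigate before non-conforming (and potentially unreliable) units accumulate.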
Burn-in and Screening are DFR tools that can be useful in preventing infant mortality failures, which are typically caused by manufacturing-related problems, from happening in the field. Deciding on the appropriate burn-in time can be derived from QALT and/or LDA. Also, manufacturability challenges might force some design changes that would trigger many of the DFR activities already mentioned.
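If infant mortality follows a Weibull distribution with shape parameter beta < 1 (a decreasing failure rate), one way to pick a burn-in duration from QALT/LDA results is to run units until the failure rate falls below a target. A sketch with illustrative parameters:

```python
def burnin_time(beta, eta, target_hazard):
    """Burn-in duration after which a Weibull failure rate with beta < 1
    (infant mortality) drops below target_hazard, by inverting
    h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    if beta >= 1.0:
        raise ValueError("burn-in only helps when the failure rate is decreasing")
    return eta * (target_hazard * eta / beta) ** (1.0 / (beta - 1.0))

# Hypothetical fitted infant-mortality parameters and a target failure rate
t_burnin = burnin_time(beta=0.5, eta=1000.0, target_hazard=0.001)
```

The trade-off is economic: longer burn-in removes more weak units but consumes useful life and test capacity, so the target hazard should reflect the field failure cost.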
Does the DFR process end here, though? The answer is a definite No. Continuous monitoring and field data analysis are necessary in order to observe the behavior of the product in its actual use (and abuse) conditions, and use the gained knowledge for further improvements or in future projects. In other words, we need to close the loop, review the successful activities as well as the mistakes, and ensure that the lessons learned are not lost in the process. Tools such as Failure Reporting, Analysis and Corrective Action Systems (FRACAS) can assist in capturing the knowledge gained, as well as the necessary data, and can be deployed throughout the Product Development Cycle.
In this article, we attempted to give an overall picture as to what Design for Reliability is, and we proposed a process to follow for implementing DFR. The proposed process is general enough to be easily adopted by different kinds of industries and to fit into the overall Product Development Process. It is important to note that certain methods, tools and/or principles are called upon in multiple parts of this process. A stage might require different tools; also, a specific tool may be used in multiple stages.
In general, the DFR methodology can bring a reliable product to market using a process focused on designing out or mitigating potential failure modes prior to production release, based on an understanding of the physics of failure, testing to discover issues and statistical analysis methods for reliability prediction. DFR can open up many opportunities for companies that want to move beyond securing a basic offering to the marketplace to creating a true competitive advantage in which reliability plays a critical role in customer satisfaction.
NOTE: Many of the techniques described very briefly in this article are presented in detail in other ReliaSoft publications published on the Web via www.weibull.com.