Using Simulation to Determine the Optimal Interval for On-Condition Inspections
The results of a Reliability Centered Maintenance (RCM) analysis yield a set of maintenance tasks required for a given item. Optimizing the scheduling and packaging of the required tasks is a relatively new and challenging concept. This article demonstrates how flowchart simulation can be used to determine the optimal interval at which to perform an on-condition inspection for a single item. Additional examples expand the model to consider the optimal interval for multiple tasks that may be combined in the same maintenance package.
Background: On-Condition Tasks
The belief that most items exhibit wearout with age is diminishing. This has led to the rise of on-condition inspections and the declining use of "Hard Time" (also referred to as "High Time") overhauls or replacements of expensive components. The result is that on-condition tasks are becoming increasingly important as a fundamental part of RCM analyses.
An on-condition task is a periodic or continuous inspection designed to detect a potential failure condition prior to functional failure. If the inspection reveals a potential failure condition, corrective action (or preventive maintenance) must be taken. If a potential failure condition is not detected, nothing is done and the item continues in service. The inspection avoids the considerable costs associated with replacing an item for which the useful life is not yet expended.
RCM theory states that on-condition inspections related to safety should be performed at an interval that will reduce the probability of experiencing a failure to an acceptable level. The factors that determine this interval include the acceptable probability of failure (dependent on the severity of the failure), the detection probability (dependent on the type of inspection) and the potential-to-functional-failure (P-F) interval. In contrast, the intervals for inspections that are not related to safety are cost-based decisions meant to maximize the useful life of the equipment. The costs of performing inspections (and any necessary corrective actions) are balanced against the costs associated with allowing an item to fail. If the inspection interval is lengthened considerably, inspection costs fall, but more failures can be expected because fewer inspections are performed. Conversely, frequent inspections will most likely produce fewer failures, but at a much higher inspection cost. Therefore, an optimal inspection interval that minimizes the total expected cost should be determined.
Using RENO to Find the Optimal Task Interval
ReliaSoft’s RENO software provides a flexible platform for visualizing and dynamically analyzing and/or simulating nearly any kind of probabilistic or deterministic scenario. Instead of writing customized computer code to analyze a particular problem, you can use RENO to build a flowchart model and then execute the model via simulation in order to obtain a wide variety of results and plots.
In order to use RENO to find the optimal on-condition interval for a required task, several variables and constants must be defined. For example, some constants include: fixed preventive maintenance costs associated with repairing potential failures, the cost of downtime per hour, fixed inspection costs and P-F intervals. Equation variables that must be defined include: corrective costs associated with repairing functional failures, the time when the last inspection takes place and the total inspection costs. Defined random variables include: the distribution for repair times of the corrective actions and failure distributions. Note that some of the equation variables and random variables could be defined as constants (or vice-versa), depending on the situation.
Figure 1 shows the flowchart model built to obtain the optimal inspection interval for a single item. To build this model in RENO, a standard block is first added to the flowchart to determine the start of the wearout region of the item (i.e., the start of the P-F interval). As shown in the following pictures, the block (A) automatically generates the failure time from the failure distribution (defined as a random variable - B). If the generated failure time is greater than the P-F interval (defined as a constant - C), then the block subtracts the P-F interval from the failure time in order to determine the start of the wearout region. If the generated failure time is smaller than the P-F interval, then the failure time is simply used.
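The logic of block (A) can also be written out in a few lines of code. The following is only a minimal sketch of that logic, not the RENO model itself: the function name, the Weibull parameters and the 100-hour P-F interval are hypothetical values chosen for illustration.

```python
import random


def wearout_start(failure_time: float, pf_interval: float) -> float:
    """Return the time at which the potential failure condition first exists.

    Mirrors the description of block (A): subtract the P-F interval from the
    failure time when the failure time is the larger of the two, otherwise
    use the failure time itself.
    """
    if failure_time > pf_interval:
        return failure_time - pf_interval
    return failure_time


# Hypothetical inputs: a Weibull failure distribution (scale 1000 hours,
# shape 2) and a 100-hour P-F interval. Neither value comes from the article.
rng = random.Random(1)
failure_time = rng.weibullvariate(1000.0, 2.0)  # arguments: scale, shape
print(failure_time, wearout_start(failure_time, 100.0))
```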
As shown in the following pictures, a conditional block (D) is then added to the flowchart to determine whether the last inspection before failure caught the potential failure condition. The "Last_Inspection_A" equation variable (E) determines how many inspections take place before the failure, converts the value into an integer and multiplies by the inspection interval (defined as a constant - F) in order to determine when the last inspection before failure will take place. For example, if the failure would occur at 500 hours (from the "FailureA" random variable - B), and an inspection takes place every 200 hours (from the "InspectionInterval" constant - F), then the output of the "Last_Inspection_A" equation (E) would be 400 hours. (Note that the initial value entered for the "InspectionInterval" constant is irrelevant because it will be varied during the optimization process.)
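The "Last_Inspection_A" calculation amounts to an integer division followed by a multiplication. A minimal sketch is shown below; the function name is hypothetical, and rounding down (rather than some other integer conversion) is an assumption that is consistent with the 500-hour example above.

```python
import math


def last_inspection(failure_time: float, inspection_interval: float) -> float:
    """Time of the last scheduled inspection at or before the failure time."""
    # Number of whole inspection intervals that fit before the failure.
    n_inspections = math.floor(failure_time / inspection_interval)
    return n_inspections * inspection_interval


print(last_inspection(500.0, 200.0))  # 400.0, matching the example in the text
```

If the failure occurs before the first scheduled inspection, the function returns 0, meaning that no inspection took place.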
Recall that the input for the conditional block was the time for the start of the wearout process. The conditional block determines whether the start of the wearout process occurs before or after the final inspection before failure. If the wearout occurs before the final inspection, it is assumed that a potential failure condition exists and that it would be detected during the inspection. In that case, preventive maintenance will be taken to prevent the functional failure of the item. Therefore, the cost of preventing a potential failure is incurred and the simulation follows the "true" path at the top of the flowchart.
The following pictures show the standard block (G) that calculates the preventive maintenance cost per unit time by taking the cost of each repair action (the "PMCost_A" constant - H) and dividing it by the cycle length (i.e., the time when the last inspection was done).
The next step is to add the cost of all inspections using the standard block, function and constant shown below. The "InspectionCostA_PUT" function (I) determines how many inspections take place before the failure. It then multiplies the number of inspections by the fixed cost of each inspection (a constant - J) to obtain the total inspection cost. As with the previous equation, it finds the cost per unit time by dividing by the cycle length (i.e., the time of the last inspection, which is passed to the function by the block - K). This equation is valid only if there was at least one inspection. Therefore, an IF statement is used to set the cost to 0 when there are no inspections. The "Add Inspection Cost of A" block adds the total inspection costs to the PM costs calculated by the previous block, which provides the total costs for all simulations that follow the "true" path of the flowchart.
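The cost calculation for the "true" path can be sketched as follows. This is an illustration under stated assumptions rather than the RENO equations themselves: the function names and all cost values are hypothetical, and the number of inspections is recovered from the time of the last inspection.

```python
def inspection_cost_put(n_inspections: int, inspection_cost: float,
                        cycle_length: float) -> float:
    """Total inspection cost per unit time; 0 when no inspection took place."""
    if n_inspections == 0:
        return 0.0
    return n_inspections * inspection_cost / cycle_length


def pm_path_cost_put(pm_cost: float, inspection_cost: float,
                     inspection_interval: float, last_insp: float) -> float:
    """Cost per unit time on the PM ("true") path.

    On this path the cycle ends at the last inspection, so last_insp is
    greater than zero and serves as the cycle length.
    """
    # last_insp is a whole multiple of the interval; round() guards against
    # floating-point error when recovering the inspection count.
    n_inspections = round(last_insp / inspection_interval)
    pm_cost_per_unit_time = pm_cost / last_insp
    return pm_cost_per_unit_time + inspection_cost_put(
        n_inspections, inspection_cost, last_insp)


# Hypothetical values: PM cost 500, inspection cost 20, inspections every
# 200 hours, last inspection at 400 hours (i.e., two inspections).
print(pm_path_cost_put(500.0, 20.0, 200.0, 400.0))
```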
If the conditional block (D) determines that the wearout occurred after the final inspection, then the inspections did not catch any potential failure conditions. Consequently, the item will experience a functional failure, and the cost of repair must be captured. In this case, the simulation follows the "false" path at the bottom of the flowchart. The following pictures show the standard block, random variable and constant that are used to add the cost of repairing a functional failure. The equation defined in the "Failure Costs A" block (L) takes the cost of the repair, which is dependent on the time of the repair (M x N), and divides it by the cycle length (i.e., the failure time) in order to get cost per unit time.
The next step is to add the cost of all the inspections (because the inspections took place even though they did not detect the approaching failure). This is handled the same way as described above for the PM ("true") path of the flowchart, except that the cycle length for the CM ("false") path of the flowchart is the failure time. To complete the flowchart model, the two paths of the conditional block must be combined with a Summing Gate. The total costs are captured and displayed through the use of a Result Storage block. (Recall that all the costs have already been normalized to determine the cost per unit time.)
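Putting both paths together, one pass through the flowchart can be sketched as a single function. The names and numeric values below are hypothetical, and the corrective cost is assumed to be the repair time multiplied by the cost of downtime per hour, which is one reading of the "M x N" term described above.

```python
import math


def cycle_cost_put(failure_time, repair_time, inspection_interval, pf_interval,
                   pm_cost, inspection_cost, downtime_cost_per_hour):
    """Cost per unit time for one simulated cycle of a single item."""
    n_inspections = math.floor(failure_time / inspection_interval)
    last_insp = n_inspections * inspection_interval
    if failure_time > pf_interval:
        start = failure_time - pf_interval  # start of the wearout region
    else:
        start = failure_time

    if start < last_insp:
        # "True" path: the last inspection catches the potential failure.
        cycle_length = last_insp
        action_cost = pm_cost
    else:
        # "False" path: the item runs to functional failure.
        cycle_length = failure_time
        action_cost = downtime_cost_per_hour * repair_time

    # Inspections are paid for on both paths; the term is 0 when none occurred.
    return (action_cost + n_inspections * inspection_cost) / cycle_length


# Failure at 500 h, inspections every 200 h, 8 h repair (hypothetical values).
print(cycle_cost_put(500.0, 8.0, 200.0, 150.0, 500.0, 20.0, 100.0))  # PM path
print(cycle_cost_put(500.0, 8.0, 200.0, 50.0, 500.0, 20.0, 100.0))   # CM path
```

In the first call the wearout region starts at 350 hours, before the 400-hour inspection, so the PM path is followed. In the second call it starts at 450 hours, after the last inspection, so the item fails at 500 hours.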
If the simulation is run for a single value only, the initial value entered for the "InspectionInterval" constant (F) will simply give the expected costs for that inspection interval. However, the objective is to vary the inspection interval in order to determine which interval produces the lowest expected costs. Using the Sensitivity Analysis option in RENO’s Simulation Console, you can choose to vary the inspection interval constant by specifying the lower and upper limits as well as the increment. The simulation settings for this example are shown next.
Once the simulations have been run, the resulting plot of Cost Per Unit Time vs. Inspection Interval (Figure 2) identifies the optimal inspection interval, i.e., the interval that minimizes the cost per unit time. In this example, the optimum is an inspection every 60 hours, with an expected cost per unit time of 0.093.
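The same kind of sensitivity analysis can be imitated outside RENO with a simple Monte Carlo sweep. The sketch below is not the article's model and will not reproduce the 60-hour result: every input value, the lognormal repair-time distribution and the function names are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical inputs (none of these values come from the article).
ETA, BETA = 1000.0, 2.0              # Weibull scale and shape for failure time
PF_INTERVAL = 100.0                  # hours
PM_COST = 500.0
INSPECTION_COST = 20.0
DOWNTIME_COST_PER_HOUR = 100.0
REPAIR_MU, REPAIR_SIGMA = 2.0, 0.5   # lognormal repair time (assumed)


def cycle_cost_put(failure_time, repair_time, interval):
    """Cost per unit time for one simulated cycle (both flowchart paths)."""
    n_inspections = math.floor(failure_time / interval)
    last_insp = n_inspections * interval
    start = (failure_time - PF_INTERVAL
             if failure_time > PF_INTERVAL else failure_time)
    if start < last_insp:            # potential failure caught: PM path
        cycle_length, action_cost = last_insp, PM_COST
    else:                            # run to failure: CM path
        cycle_length = failure_time
        action_cost = DOWNTIME_COST_PER_HOUR * repair_time
    return (action_cost + n_inspections * INSPECTION_COST) / cycle_length


def expected_cost_put(interval, n_sims=2000, seed=0):
    """Average cost per unit time over n_sims simulated cycles."""
    # Re-seeding per interval reuses the same random draws for every
    # interval, which makes the comparison between intervals less noisy.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        failure_time = rng.weibullvariate(ETA, BETA)
        repair_time = rng.lognormvariate(REPAIR_MU, REPAIR_SIGMA)
        total += cycle_cost_put(failure_time, repair_time, interval)
    return total / n_sims


def sweep(lower, upper, step, n_sims=2000):
    """Vary the inspection interval from lower to upper in fixed increments."""
    return {t: expected_cost_put(t, n_sims)
            for t in range(lower, upper + 1, step)}


results = sweep(20, 200, 20)
best = min(results, key=results.get)
print(best, round(results[best], 4))
```

The interval with the lowest average cost per unit time corresponds to the minimum of the plotted curve.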
The previous example demonstrated RENO's ability to support complex maintenance decisions through the use of flowcharts. Although this was a simple example, more complex flowcharts and decision logic can be applied to the basic model. For instance, in this example it was assumed that a potential failure would always be detected, given that the potential failure condition existed. Alternatively, a specific probability of detection (such as 90%) could be assumed for a given task, which would make the flowchart more accurate for certain types of inspections. In that case, 90% of the potential failure conditions would be detected and corrective actions would be taken. In the other 10% of cases, the item would run to failure. The addition of one simple conditional block within the flowchart would make this analysis possible.
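One way such an extra conditional could look in code is sketched below. It assumes a single detection opportunity at the last inspection before failure, in keeping with the basic model; the function name, the parameter names and the numeric values are hypothetical.

```python
import math
import random


def cycle_cost_put_with_detection(failure_time, repair_time, interval,
                                  pf_interval, pm_cost, inspection_cost,
                                  downtime_cost_per_hour, detection_prob, rng):
    """Cost per unit time for one cycle with imperfect inspections."""
    n_inspections = math.floor(failure_time / interval)
    last_insp = n_inspections * interval
    start = (failure_time - pf_interval
             if failure_time > pf_interval else failure_time)

    condition_present = start < last_insp
    # Extra conditional: the condition is only caught with probability
    # detection_prob (e.g., 0.9) when it is present at the last inspection.
    detected = condition_present and rng.random() < detection_prob

    if detected:
        cycle_length, action_cost = last_insp, pm_cost
    else:
        cycle_length = failure_time
        action_cost = downtime_cost_per_hour * repair_time
    return (action_cost + n_inspections * inspection_cost) / cycle_length


rng = random.Random(0)
print(cycle_cost_put_with_detection(500.0, 8.0, 200.0, 150.0,
                                    500.0, 20.0, 100.0, 0.9, rng))
```

Setting detection_prob to 1.0 recovers the original assumption that an existing potential failure condition is always detected.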
Other possible enhancements to the model would involve adding complexity to the cost calculations. For example, instead of having fixed costs (constants) for inspections and corrective actions, we could make them time-dependent, which would require the use of more complicated equation variables. Alternatively, distributions could be applied for the costs instead of equations, where applicable.
As mentioned previously, the result of an RCM analysis is a set of recommendations for a series of discrete maintenance tasks. Task packaging is the process of combining those discrete tasks into an efficient, effective and executable maintenance program. If used properly, task packaging can greatly reduce downtime and optimize the use of resources. The previous example demonstrated how RENO found the optimal inspection interval for a single task. However, if two or more maintenance tasks are to be packaged together, the basic flowchart could be replicated in order to find the optimal interval at which to perform the set of tasks.
Figure 3 presents a flowchart for two tasks. Obviously, each task in this flowchart would possess its own characteristics such as failure distribution, repair time, costs, P-F interval, etc. However, there would be only one inspection interval and it would apply to both of the tasks being packaged. Combining the flowchart for each task into a single summing gate would enable the user to minimize the expected cost per unit time for performing both tasks together.
A more efficient way of packaging multiple tasks (and making the application much more dynamic) is to make the model more generic and incorporate tables within the RENO project in order to capture the characteristics for each task. Figure 4 shows a flowchart that accomplishes this. The general model begins with a start flag, followed by a counter block (O) that records the task that is currently running through the simulation. Every time the simulation is run, the counter block is incremented by 1. The next block is a conditional block (P) that determines whether the simulation has evaluated all of the tasks in the model. It compares the value of the counter block to the total number of tasks in the model (which is a defined constant). The following pictures show the properties of these counter and conditional blocks.
If all of the tasks have been evaluated, the sum of all the costs is captured using a storage block. If all the tasks have not been evaluated, the simulation runs through a model that is very similar to the original model for optimizing a single task. A few extra blocks are added to store how many inspections are performed and the time for the last inspection. The end of the flowchart is a "Go to" flag that links back to the start flag.
In order to capture the characteristics of each task, an input table is created to store the data. The table saves the user from having to add random variables and constants every time a task is added to the model. That is because the functions and constants are already defined in the table. The following picture (Q) shows the table with inputs for two maintenance tasks that need to be packaged.
The use of the table requires function variables to be created for this model. As an example, the following pictures show the function variable for the failure distribution and the standard block from the flowchart that calls the function. The function (R) states that the items follow a Weibull distribution, and the parameters are taken from the "Input_Data" table (Q). In the flowchart, the "Get Failure" block (S) references the function, which in turn references the table in order to obtain the failure time. Similar logic is used in a function that obtains the repair time.
Most of the blocks in the flowchart reference the table in a similar manner. The result is a more complicated, but more efficient, model that can easily add tasks to the optimization equation. For example, if a third task were to be added, the flowchart would not need any additional blocks. Also, no new variables or constants would need to be added. The characteristics of the task would simply be added to the table. Although acquiring the skills to master such a complex flowchart would take some practice, the result is a greatly improved model that minimizes the overall time and effort required to solve a complex problem with many tasks.
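The table-driven approach can be illustrated with a short sketch in which a list of rows stands in for the "Input_Data" table and a counter loop stands in for the counter and conditional blocks. The column names, the values in each row and the lognormal repair-time assumption are all hypothetical; only the use of a Weibull failure distribution with parameters read from the table follows the description above.

```python
import math
import random

# Stand-in for the "Input_Data" table: one row per task (hypothetical values).
INPUT_DATA = [
    {"task": "A", "beta": 2.0, "eta": 1000.0, "pf": 100.0, "pm_cost": 500.0,
     "insp_cost": 20.0, "downtime_rate": 100.0, "rep_mu": 2.0, "rep_sigma": 0.5},
    {"task": "B", "beta": 3.0, "eta": 1500.0, "pf": 200.0, "pm_cost": 800.0,
     "insp_cost": 35.0, "downtime_rate": 150.0, "rep_mu": 2.5, "rep_sigma": 0.4},
]


def get_failure(row, rng):
    """Weibull failure time with parameters taken from the table row."""
    return rng.weibullvariate(row["eta"], row["beta"])  # scale, shape


def get_repair_time(row, rng):
    """Repair time for the row (a lognormal distribution is assumed here)."""
    return rng.lognormvariate(row["rep_mu"], row["rep_sigma"])


def task_cost_put(row, interval, rng):
    """Cost per unit time for one cycle of one task."""
    failure_time = get_failure(row, rng)
    n_inspections = math.floor(failure_time / interval)
    last_insp = n_inspections * interval
    start = failure_time - row["pf"] if failure_time > row["pf"] else failure_time
    if start < last_insp:
        cycle_length, action_cost = last_insp, row["pm_cost"]
    else:
        cycle_length = failure_time
        action_cost = row["downtime_rate"] * get_repair_time(row, rng)
    return (action_cost + n_inspections * row["insp_cost"]) / cycle_length


def package_cost_put(table, interval, rng):
    """Sum the cost per unit time of every task at one shared interval."""
    total = 0.0
    counter = 0
    while counter < len(table):      # plays the role of conditional block (P)
        total += task_cost_put(table[counter], interval, rng)
        counter += 1                 # plays the role of counter block (O)
    return total


print(package_cost_put(INPUT_DATA, 100, random.Random(0)))
```

Adding a third task in this sketch only requires appending another row to INPUT_DATA; none of the functions change.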
This article demonstrated how the RENO software can be used to help optimize required maintenance tasks derived from RCM analyses. The dynamic nature of this model allows maintenance planners to adapt to changing situations and adjust schedules when needed. As efficiency and flexibility demands have increased, tools such as these are becoming increasingly important in many industries. These simple examples show how utilization of advanced software applications can save a great deal of money and time by solving complex scheduling issues. When applied to multiple tasks at once, hundreds of difficult calculations can be solved in a matter of minutes.
In this article, we also discussed how more complexity can be added to the basic model when appropriate. The RENO software allows the flexibility required for different organizations to customize the tool according to their needs.
Download the RENO Models
All three of the flowchart models described in this article are posted here. If you do not have RENO installed on your computer, you can download a free trial version from http://RENO.ReliaSoft.com.
References
Moubray, John, Reliability-Centered Maintenance II, New York: Industrial Press, 1997.
Naval Air Systems Command. Naval Air Systems Command. 2008. http://www.navair.navy.mil/logistics/rcm/courses.cfm
SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance Processes, 1999.
SAE JA1012, A Guide to the Reliability-Centered Maintenance Standard, 2002.
WebRCM. WebRCM. 2008. http://www.webrcm.org
About the Author
David Sada is a Reliability Engineer and RCM analyst at Andromeda Systems Incorporated. He holds a B.S. and M.S. in Industrial and Systems Engineering from the University of Florida with a specialization in Operations Research. Mr. Sada has completed and implemented numerous RCM analyses on aircraft platforms, ground vehicles, data centers and industrial plant equipment. He has also developed optimization models that enabled the scheduling of maintenance tasks to be performed efficiently in a dynamic environment. Mr. Sada is experienced in conducting statistical studies to predict and analyze failure behaviors and developing cost-effective maintenance strategies for different programs. He is certified by the Naval Air Systems Command as a Level II RCM analyst and RCM Instructor, with experience in developing and instructing numerous courses. He has presented papers at various conferences and symposiums.
Mr. Sada can be reached via e-mail at firstname.lastname@example.org.