Reliability Edge, Volume 4, Issue 1

Reliability and Maintainability Analysis for a Remote Telecommunications System

This article presents a fictional example designed to demonstrate some useful techniques for system reliability, maintainability and availability analysis. The purpose is to investigate the reliability and maintainability of a telecommunications system that will be constructed in an uninhabited stretch of jungle. The BlockSim 6 software is used to model the system and perform the analysis.

Reliability-Wise System Configuration
The first step in the analysis is to model the reliability-wise configuration of the system, which consists of a transmitter and a receiver connected by six relay stations. The relays are situated so that the signal originating from one station can be picked up by the next two stations down the line. For example, a signal from the transmitter can be received by relays 1 and 2; a signal from relay 1 can be received by relays 2 and 3; and so forth. Thus, the relay chain fails only when two consecutive relays fail. (This arrangement is known as a consecutive-k-out-of-n: F system; here k = 2 and n = 6.)
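The consecutive-failure rule can be illustrated with a short Python sketch. It is not part of the BlockSim analysis: the function name is hypothetical, the relays are assumed to fail independently, and the relay reliability of 0.9 is an arbitrary illustrative value rather than a figure from Table 1. The sketch enumerates every working/failed combination of the six relays and sums the probability of the combinations that contain no run of two failed relays.

```python
from itertools import product

def consecutive_k_out_of_n_f(p_work, k=2):
    """Reliability of a linear consecutive-k-out-of-n:F system.

    p_work: per-component probabilities of working (assumed independent).
    The system fails when k or more consecutive components have failed.
    """
    n = len(p_work)
    reliability = 0.0
    # Enumerate every working/failed state vector (fine for small n such as 6).
    for state in product((True, False), repeat=n):
        run = longest = 0
        for works in state:
            run = 0 if works else run + 1
            longest = max(longest, run)
        if longest >= k:
            continue  # this state is a system failure
        prob = 1.0
        for works, p in zip(state, p_work):
            prob *= p if works else (1.0 - p)
        reliability += prob
    return reliability

relay_reliability = 0.9  # hypothetical value, not taken from Table 1
print(consecutive_k_out_of_n_f([relay_reliability] * 6))
```

Brute-force enumeration is used here only because it mirrors the verbal rule directly; for long chains a recursive formula would be more efficient.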

Figure 1 displays the reliability block diagram (RBD) that describes the reliability-wise configuration of the system. In addition, the transmitter and receiver are made up of three subassemblies each, while the relay stations have two subassemblies each (all in series). Specifically:

  • Subassembly SPS1 (solar power supply) is common to all. 

  • The transmitter has two additional subassemblies: TRC1 and TRC2. 

  • The receiver has two additional subassemblies: RCR1 and RCR2. 

  • Each relay station has one additional subassembly: RLYC1.

Figure 1: RBD for remote telecommunications system

These subassemblies are defined in BlockSim as subdiagrams to the master diagram (i.e., separate diagrams linked to blocks in the main diagram). The subdiagrams are presented in Figure 2. In addition, Table 1 presents the failure distributions and parameters that have been estimated from data collected for each subassembly.
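Because the subassemblies within each station are in series, a station's reliability is the product of its subassembly reliabilities. The following Python sketch shows that calculation. It assumes 2-parameter Weibull failure distributions purely for illustration, and every (beta, eta) pair below is a hypothetical placeholder, not a value from Table 1.

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta) for a 2-parameter Weibull distribution."""
    return math.exp(-((t / eta) ** beta))

def series_reliability(t, subassemblies):
    """Reliability of blocks in series: every subassembly must work."""
    r = 1.0
    for beta, eta in subassemblies:
        r *= weibull_reliability(t, beta, eta)
    return r

# Hypothetical (beta, eta in hours) pairs; NOT the values from Table 1.
SPS1 = (1.5, 20000.0)
TRC1 = (1.2, 30000.0)
TRC2 = (1.0, 40000.0)
RLYC1 = (1.3, 25000.0)

transmitter = [SPS1, TRC1, TRC2]  # three subassemblies in series
relay = [SPS1, RLYC1]             # two subassemblies in series

t = 1000.0  # hours
print("transmitter:", series_reliability(t, transmitter))
print("relay:", series_reliability(t, relay))
```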

Figure 2: Subdiagrams for the three types of components

Table 1: Distribution and Parameters to Describe the Failure Properties of Each Subassembly (in hours)

Basic Reliability Analysis
Once the analysts have modeled the reliability-wise configuration of the system and defined the reliability characteristics of the components, they can use BlockSim to calculate the reliability function for the system and answer questions of interest regarding the reliability of the system. For example, they can use the Analytical QCP to determine that the reliability of the system after 1000 hours of operation is 97.67%. 

In addition, the analysts can generate a Reliability Importance vs. Time plot and use it to determine whether different relays have different impacts on the reliability of the system when they fail, based on their position in the configuration. As shown in Figure 3, the position of the relays within the diagram does matter, even though they are reliability-wise identical. The failure of relay 2 or 5 has the greatest impact on the reliability of the system. Relays 3 and 4 have the second greatest impact and relays 1 and 6 have the smallest impact on system reliability.
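The effect of position can be reproduced with a small Python sketch. It assumes that reliability importance is measured in the Birnbaum sense (system reliability with the relay working minus system reliability with the relay failed), that the relays fail independently, and that each relay has a hypothetical reliability of 0.9; none of these values come from the article's data, and the function names are illustrative. The transmitter and receiver are left out because they scale every relay's importance by the same factor and so do not change the ranking.

```python
from itertools import product

def chain_works(state, k=2):
    """True unless k or more consecutive relays in `state` have failed."""
    run = 0
    for ok in state:
        run = 0 if ok else run + 1
        if run >= k:
            return False
    return True

def chain_reliability(p, forced=None):
    """Probability that the relay chain works, optionally forcing relay states.

    p: per-relay probabilities of working (assumed independent).
    forced: dict mapping relay index -> True (working) or False (failed).
    """
    forced = forced or {}
    total = 0.0
    for state in product((True, False), repeat=len(p)):
        if any(state[i] != value for i, value in forced.items()):
            continue  # state contradicts the forced relay condition
        prob = 1.0
        for i, ok in enumerate(state):
            if i not in forced:
                prob *= p[i] if ok else 1.0 - p[i]
        if chain_works(state):
            total += prob
    return total

def birnbaum_importance(p):
    """Importance of relay i = R(system | i works) - R(system | i failed)."""
    return [
        chain_reliability(p, {i: True}) - chain_reliability(p, {i: False})
        for i in range(len(p))
    ]

p = [0.9] * 6  # hypothetical relay reliability, not taken from Table 1
for relay, value in enumerate(birnbaum_importance(p), start=1):
    print(f"relay {relay}: {value:.5f}")
```

For this hypothetical value the ranking comes out as relays 2 and 5 first, relays 3 and 4 second, and relays 1 and 6 last. The end relays matter least because each has only one neighbor whose failure can combine with its own to break the chain.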

Figure 3: Reliability Importance vs. Time plot to compare the impact of the relays on the system reliability

Basic Maintainability and Availability Analysis
By expanding the analysis to include information on the maintenance plan for the system, the analysts can make important estimates regarding the maintainability and availability of the system. In this example, all of the components described in Table 1 are line replaceable units that can be 100% restored by replacing the failed component with a new one. To simplify the example, we will assume that each repair begins immediately upon the failure of a unit, that there are unlimited maintenance crews and spare parts to perform the maintenance, and that no logistical delays exist. In addition, the components do not continue to operate (i.e., they do not accumulate age) when the system is down. Table 2 presents the repair distributions and parameters that have been estimated from data collected for each subassembly. Note that the maintenance plan described here consists of corrective maintenance (CM) only and does not include preventive maintenance (PM) or inspections.

Table 2: Distributions and Parameters for Corrective Maintenance Durations for Each Subassembly (in hours)

When the maintenance characteristics for the system have been added to the model, the analysts can use BlockSim's simulation utility to obtain desired results regarding the maintainability and availability of the system. The results generated by completing 10,000 simulation runs for one year (8760 hours) of operation include: 

  • The point availability of the system after one year of operation, A(t = 8760), is 99.86%. This represents the probability that the system is operational at the given time. 

  • The average availability of the system after one year of operation is 99.93%. This is also called "operational availability" and it represents the total uptime divided by the total time (uptime plus downtime) over the period.

  • The mean time to first failure (MTTFF) of the system is 16,397 hours, or almost two years. 

  • The total system downtime is 6.16 hours per year. 
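To show how simulation produces results of this kind, here is a minimal Python sketch of a single repairable unit that alternates between operation and repair. It is not BlockSim's simulation engine and it does not model the full system. The exponential failure and repair distributions, the mean values, the seed and the function name are all assumptions chosen for illustration, not values from Table 1 or Table 2.

```python
import random

def simulate_unit(mission_hours, mean_ttf, mean_ttr, runs, seed=0):
    """Monte Carlo estimate of availability for one repairable unit.

    Assumes exponential times to failure and repair (illustrative only),
    immediate repair start, and restoration to as-good-as-new.
    """
    rng = random.Random(seed)
    total_down = 0.0
    up_at_end = 0
    for _ in range(runs):
        t = 0.0
        down = 0.0
        is_up_at_end = True
        while True:
            t += rng.expovariate(1.0 / mean_ttf)  # time of next failure
            if t >= mission_hours:
                break  # unit survives to the end of the mission
            repair = rng.expovariate(1.0 / mean_ttr)
            if t + repair >= mission_hours:
                down += mission_hours - t  # still under repair at mission end
                is_up_at_end = False
                break
            down += repair
            t += repair
        total_down += down
        up_at_end += is_up_at_end
    mean_down = total_down / runs
    return {
        "point_availability": up_at_end / runs,
        "mean_availability": 1.0 - mean_down / mission_hours,
        "mean_downtime_hours": mean_down,
    }

# Hypothetical inputs: 5000 h mean time to failure, 10 h mean repair time.
print(simulate_unit(8760.0, 5000.0, 10.0, runs=2000))
```

The point availability is the fraction of runs in which the unit is up at the end of the period, while the mean availability is computed from the average downtime per run.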

In addition, the analysts can rank the components according to their Failure Criticality Index (RS FCI), which represents the percentage of the system failures that were due to the failure of the given component. As shown in Figure 4, 99.9% of all system failures were due to the transmitter or the receiver. Among those failures, 54% were due to the transmitter and 20.5% of the transmitter failures were due to the solar power supply (SPS1) component. Therefore, improvement to the availability of the SPS1 component will have the greatest impact on the availability of the system.

Figure 4: Summary of selected RS FCI results

Finally, the analysts can estimate the number of spare parts required to maintain the system by looking at the expected number of failures for each component, presented in Figure 5. Because all maintenance in this example involves the replacement of a failed component, a spare part will be required for each failure. For SPS1, 0.9579 failures are expected per year, so maintenance personnel should plan on needing roughly one SPS1 spare per year on average. Of course, the choice as to whether to keep spare parts in stock is based on additional economic and logistic information (e.g., how quickly the part can be obtained and how much it costs to keep the part on hand).
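An expected number of failures and a probability of needing a spare are different quantities. As a rough sketch only, if the yearly failure count is assumed to follow a Poisson distribution (an assumption that is not made in the analysis above and that holds only approximately for non-exponential failure times), the chance that demand exceeds a given stock level can be computed as follows. The function name is hypothetical.

```python
import math

def prob_demand_exceeds_stock(expected_failures, stock):
    """P(N > stock) when N ~ Poisson(expected_failures) (assumed model)."""
    cdf = sum(
        math.exp(-expected_failures) * expected_failures ** n / math.factorial(n)
        for n in range(stock + 1)
    )
    return 1.0 - cdf

expected_sps1 = 0.9579  # expected SPS1 failures per year, from the article
for stock in range(4):
    p = prob_demand_exceeds_stock(expected_sps1, stock)
    print(f"stock = {stock}: P(demand > stock) = {p:.4f}")
```

Under this assumption, the probability of at least one SPS1 failure in the year (stock = 0) is about 0.62.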

Figure 5: Expected failures (spare parts required)

More Complex System Analysis
Although the analysis up to this point has been purposely simplified to consider only failure and repair distributions and system configuration, other factors may impact the reliability, maintainability and availability of a system in a real-world situation. These may include other maintenance approaches (PM and/or inspections), components that continue to operate when the system is down, dormant (hidden) failures, imperfect repairs (i.e., the component is less than 100% restored by the maintenance action), limitations on maintenance personnel and/or spare parts, etc.

To demonstrate two such factors, we will modify this example to suppose that a subcontractor has been engaged to repair the system when needed and that it takes an average of 36 hours (following a normal distribution with a standard deviation of 6 hours) for a technician to reach the site and begin the repair. Furthermore, we will assume that only two technicians are qualified to service the system and that the subcontractor keeps a single spare each for SPS1 and RLYC1 on hand. When one of these parts is used, another is ordered. On-hand spares are available immediately but other parts must be ordered and shipped when needed. The time of arrival for all parts that are ordered and shipped follows a normal distribution with a mean of 72 hours and a standard deviation of 12 hours.

Under this scenario, the analysts must expand the system model to include additional information on the resources that are required to perform repairs (i.e., maintenance personnel and spare parts). In BlockSim, this requires the assignment of a maintenance crew policy and spare parts policy to each component. The maintenance crew policy describes any limitations on the number of simultaneous repairs that can be performed on the system (two), any logistical delay time before the maintenance personnel can initiate the action (duration follows a normal distribution with Mean = 36 and Std = 6) and any costs associated with engaging the crew (none). 

The spare parts policy describes the number of parts in stock (1 each for SPS1 and RLYC1, 0 for the rest), any logistical delay time before an available part can be used for a maintenance action (none) and the conditions for ordering and shipping parts when needed (order 1 when the stock drops to 0, time for arrival follows a normal distribution with Mean = 72 and Std = 12).
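The effect of these two delays on the wait before a repair can start can be sketched in Python. This is not how BlockSim is configured; it is a simplified illustration that assumes the crew delay and the part delivery run concurrently, that a technician is always free, and that negative samples from the normal distributions are truncated to zero. The function name, sample count and seed are hypothetical.

```python
import random

def mean_wait_before_repair(spare_on_hand, samples=20000, seed=1):
    """Average hours between a failure and the start of its repair.

    Assumes the crew travel delay and the part delivery overlap, so the
    repair starts when the later of the two has arrived.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        crew = max(0.0, rng.gauss(36.0, 6.0))  # crew delay: Mean = 36, Std = 6
        if spare_on_hand:
            part = 0.0  # on-hand spare is available immediately
        else:
            part = max(0.0, rng.gauss(72.0, 12.0))  # shipping: Mean = 72, Std = 12
        total += max(crew, part)
    return total / samples

print("spare in stock:", mean_wait_before_repair(True))
print("part must be ordered:", mean_wait_before_repair(False))
```

Under these assumptions the wait is governed by the crew delay when a spare is in stock and by the shipping time when the part must be ordered.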

When the simulation is repeated for one year of operation according to the modified maintenance plan, new maintainability and availability results are generated. For example, the average availability after one year of operation is 99.6% with the personnel and parts limitations established by the subcontractor. This is slightly less than the 99.93% estimated for unlimited spares and maintenance crews. The expected system downtime is 36.21 hours per year, nearly six times the 6.16 hours estimated when maintenance resources were not taken into account.
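The relationship between the reported downtime and average availability figures can be checked with simple arithmetic, sketched here in Python. The function name is hypothetical; the downtime values and the 8760-hour year come from the article.

```python
HOURS_PER_YEAR = 8760.0

def mean_availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Average availability = uptime / total time over the period."""
    return (period_hours - downtime_hours) / period_hours

# Downtime figures reported for the two maintenance scenarios.
print(f"unlimited resources: {mean_availability(6.16):.4%}")
print(f"limited crews and spares: {mean_availability(36.21):.4%}")
```

Rounded, these values agree with the 99.93% and 99.6% average availability results quoted for the two scenarios.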

Conclusion
As this article demonstrates, there are many factors that can affect the performance of a repairable system. Flexible reliability block diagram (RBD) techniques along with powerful analysis and simulation engines enable analysts to model systems as realistically as possible in order to obtain reliability, maintainability and availability estimates that can be used to improve performance, reduce cost and avoid risk. ReliaSoft's BlockSim 6 software supports these and other analysis techniques, plots and results. Other examples are available on the Web at http://www.ReliaSoft.com/blocksim.

Copyright © 1992 - ReliaSoft Corporation. All Rights Reserved.