Reliability


You Cannot Call Out the AA at 40,000ft

by John Crocker

Combat aircraft are expensive, and so are their crews, so no operator wants to lose either. The current need to deploy large ground forces to maintain and support them is also expensive and potentially hazardous. It is therefore not surprising that operators are looking to the manufacturers to produce aircraft so reliable that they can go for weeks without any maintenance. The question is, however, can we achieve the necessary level of reliability, with sufficient confidence, at an affordable price, to meet this requirement?

We all recognise that the post-war scenario, in which hundreds of aircraft were deployed on both sides of the borders between the NATO and Eastern Bloc countries, has changed. It is far more likely that today’s combat aircraft will be deployed in relatively small numbers, in what seems like an ever-increasing number of trouble spots, to perform a ‘peace-keeping’ role, possibly by bombing the hell out of anyone who so much as threatens that peace.

This new type of scenario poses a number of problems: how many aircraft should be deployed, what equipment should accompany them, how many personnel will be needed and what spares should be taken. With the current aircraft, which it has to be admitted were not originally designed for this type of role, the answer to all of these questions is invariably, ‘too many’.

The ideal situation is to deploy only as many aircraft as need to be in the air at any one time, and to make these aircraft (and their various systems) sufficiently reliable that they will never fail during their deployment, so that there is no need for any spares. With no spares and no maintenance other than replenishment (of fuel, ammunition etc), there is no need for skilled mechanics or for the special equipment that would be needed to replace or repair any of the on-board systems. So instead of needing eight aircraft, 150 personnel and equipment that would fill 30 large cargo aircraft such as the C-130 Hercules, it might only be necessary to deploy two aircraft, ten personnel and one C-130, say.

This would allow the other six aircraft to perform similar roles elsewhere or, ultimately, allow the size of the air force to be reduced. It would put only a fraction of the lives at risk from enemy action and, often regarded as more important than any of the other considerations, it would significantly reduce the cost of the operation. Or would it?

There is no doubt that the marginal cost of this deployment would be significantly reduced, but nothing comes for free. To make an aircraft that can be operated for, say, 150 hours over a 30-day period without the need for any maintenance during that period is currently beyond our capabilities. It is, in fact, an impossibility. The best we could hope for is a high, say 95%, probability of surviving such a period without the need for any (non-replenishment) maintenance.

Current situation

Based on RAF figures, the Tornado can expect to suffer 800 faults per 1000 aircraft flying hours. It is not known how many of these faults would stop the aircraft from flying or, indeed, how many would stop it from performing the required missions. It is normal practice, in peacetime, at least, to rectify any reported faults as soon as possible so, in essence, we can assume that they are all critical.

At 800 faults per 1000 hours, we could expect to fly an average of 1.25 hours between faults. If we now assume that the time between faults is exponentially distributed, we can determine the probability of surviving 150 hours without a fault as 7.8 × 10⁻⁵³. Alternatively, we can calculate the length of time the aircraft would survive with a given probability, 95% say, as 3 min 51 sec.
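These figures can be reproduced with a few lines of Python (an illustrative sketch, not part of the original article), assuming exponentially distributed times between faults:

    import math

    fault_rate = 800 / 1000.0     # faults per flying hour (RAF Tornado figure)
    mtbf = 1.0 / fault_rate       # mean time between faults = 1.25 hours

    # Probability of surviving a 150 hour MFOP with no faults: R(t) = exp(-t/MTBF)
    print(math.exp(-150.0 / mtbf))        # about 7.8e-53

    # Time survived with 95% probability: solve exp(-t/MTBF) = 0.95 for t
    print(-mtbf * math.log(0.95) * 60)    # about 3.85 minutes, i.e. 3 min 51 sec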

With the new generation of aircraft, typified by the Typhoon, the target is to almost halve the number of faults per flying hour. This doubles the mean operational time between faults to 2.5 hours, with a corresponding increase in the probability of surviving 150 hours; the time the aircraft will survive with a given (95%) confidence rises to just over 7 minutes.


Requirements restated

Knowing the required probability of survival (95%) and the desired duration of the maintenance-free operating period (MFOP), we can calculate the minimum system mean (operating) time between failures (MTBF). Again assuming that the times between failures are exponentially distributed, the minimum MTBF for the system is 2924 hours (approximately 0.34 failures per 1000 hours, against the 420 faults per 1000 hours required of the Typhoon).
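The minimum MTBF follows directly from inverting the exponential survival function; a minimal Python sketch (mine, not the article’s) is:

    import math

    mfop = 150.0     # maintenance-free operating period, hours
    r_req = 0.95     # required probability of surviving the MFOP

    mtbf_min = -mfop / math.log(r_req)    # invert R = exp(-MFOP/MTBF)
    print(mtbf_min)                       # about 2924 hours
    print(1000.0 / mtbf_min)              # about 0.34 failures per 1000 hours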

If we assume that a typical aircraft is made up of five systems, viz airframe, armament, avionics, propulsion and ‘general’ (see Figure 1), then we can apportion a survival probability to each of these so as to ensure the aircraft as a whole achieves the required level. One way of doing this is to give equal weightings to each system. Another is to apportion the requirement in line with other, similar aircraft such that each system has to make a similar percentage improvement in its mean time to failure.

Figure 1: Aircraft configuration

Taking the probability of survival for the maintenance-free operating period to be 95%, each of the five systems will need to achieve very nearly a 99% probability of surviving the period. If we use the method of allocation based on an existing aircraft, the probabilities for the airframe, armament, avionics, propulsion and general systems are 98.75%, 99.5%, 98.6%, 99.72% and 98.4% respectively.
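Both allocations can be checked with a short Python sketch (illustrative only): equal weighting gives each system the fifth root of 95%, and the quoted unequal allocation multiplies back to roughly the 95% aircraft-level requirement.

    import math

    r_system = 0.95                  # required aircraft-level survival probability

    # Equal weighting across the 5 systems
    print(r_system ** (1.0 / 5.0))   # about 0.9898, i.e. very nearly 99%

    # Allocation based on an existing aircraft (airframe, armament, avionics,
    # propulsion, general); the product recovers roughly 95%
    allocation = [0.9875, 0.995, 0.986, 0.9972, 0.984]
    print(math.prod(allocation))     # about 0.95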

Taking the propulsion system as an example, 99% equates to an MTBF of 14,925 hours (or 67 [failures] per million [flying] hours). With the alternative apportionment method, the probability is 99.72%, which implies an MTBF of at least 53,500 hours, or 19 per million hours. Now the propulsion system is made up of two engines and two sets of accessories (e.g. control units, oil pumps and fuel pumps). If we arbitrarily assume that the accessories account for 90% of propulsion system failures, we can determine that the required probability of survival for an engine is 99.986%, giving an MTBF of 1,069,928 engine flying hours (EFH), or 0.93 per million EFH.
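The chain of reasoning can be set out in the same way in Python (a sketch; the 90% accessory share is the arbitrary assumption stated above):

    import math

    mfop = 150.0
    r_propulsion = 0.9972                    # apportioned propulsion survival probability

    mtbf_prop = -mfop / math.log(r_propulsion)
    print(mtbf_prop)                         # about 53,500 hours (~19 per million)

    # Assume accessories account for 90% of propulsion failures, so the two
    # engines share the remaining 10% of the propulsion failure rate
    lam_engines = 0.10 / mtbf_prop           # combined failure rate of both engines
    mtbf_engine = 2.0 / lam_engines          # per-engine MTBF (two engines running)
    print(mtbf_engine)                       # roughly 1.07 million EFH
    print(math.exp(-mfop / mtbf_engine))     # per-engine survival, about 99.986%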

Putting this into perspective

To put these numbers into some kind of perspective, the in-flight shut down (IFSD) rate on the Boeing 767 after its first 10,000,000 hours in service was around 20 per million. IFSDs actually represent only a small fraction of the engine arisings which require unscheduled maintenance. The B767 does not normally perform 9G turns, fly at Mach 2 at an altitude of 50 feet, climb vertically using reheat or perform any of the other aerobatics that a combat aircraft is expected to perform. Most of its flying is done at 35,000 ft (or thereabouts) at its cruise speed (approximately Mach 0.8) for several hours at a time. The difference is similar to that between a long-distance coach and a Formula 1 racing car: the coach’s engine typically lasts over a million miles, the racing car’s often less than 200 miles.

It has been suggested that punitive charges may be levied against the manufacturers if they fail to achieve the required levels of reliability. One possibility is that they may be expected to perform any unscheduled maintenance at their own cost. This could mean having to hire a Hercules (C-130) to transport a crew of skilled mechanics, along with any special equipment and a spare (engine), to the aircraft’s location, possibly at or near the front line.

Reliability demonstration

Given these considerations, it is reasonable to suppose that the manufacturers will want to be reasonably confident that their systems will meet the reliability requirements. The usual way of doing this is to run pre-production engines on test beds, to ensure there are no unexpected failures and to establish the MTTF. If we assume, again, that the times to failure are exponentially distributed, then we can use Bayes’ Theorem to determine the length of testing needed to estimate the MTTF with a given level of confidence. Using the above numbers (MTBF = 1,069,928 EFH), the amount of testing required to be 95% confident is just 2381 years. This would require over 75 million tonnes of aviation fuel and produce a similar amount of CO2.

The other major problem with a reliability demonstration is that test beds, by their very nature, tend to be static. It is not possible to run engines inverted, pulling +9G or –4G or stood on end with reheat. Similarly, test beds do not normally have birds flying through them, at least not Canada geese. They also do not tend to suffer from stones being thrown up off the runway. Engines are usually tested straight from the build shop so have had very little opportunity to corrode or get damaged. Testing therefore tends to be under almost ideal conditions and hence does not really measure in-service reliability but is, perhaps, more indicative of inherent reliability.

It is increasingly common practice to perform accelerated testing (ASMET) by condensing around 4 hours of in-service operation into 1 hour of testing. This is done by removing much of the "steady-state" flying (when little is happening in terms of accelerations and decelerations). This practice is good for identifying faults and potential points of weakness in the design, but the correlation between in-service stress and that measured on the test beds is not always very high and is inconsistent across different components.

Redundancy and fault tolerance

We have seen from the above that there are likely to be a number of difficulties in meeting the MFOP requirement simply by increasing MTBFs. There are, however, a number of other opportunities and approaches that at least need to be considered. One of these is redundancy, in particular cold redundancy: additional engines are carried but only started up if one of the operating engines fails.

If we assume that all failures are exponentially distributed and that the failure of one engine has no effect on any of the other engines, redundant or otherwise, then the number of spares required can be determined using the Poisson distribution. If we now assume that the aircraft requires two engines to be operational in order to fly, then for an MFOP of 150 hours the operating engines would accumulate 300 engine flying hours. Knowing the failure rate (or its reciprocal, the MTBF) for an engine, we can calculate the expected number of failures in this 300-hour period. This forms the mean of a Poisson distribution, which can then be used to determine the smallest number of failures for which the cumulative probability is greater than or equal to the desired value.
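The procedure just described can be sketched in Python (illustrative code, not from the article; the survival target is a parameter, and the figures quoted in Figure 2 are the author’s own):

    import math

    def spares_needed(mtbf, engine_hours=300.0, target=0.95):
        """Smallest number of cold spare engines such that the probability of
        seeing no more than that many failures over the MFOP meets the target."""
        mean_failures = engine_hours / mtbf       # Poisson mean for the period
        k, term = 0, math.exp(-mean_failures)     # probability of zero failures
        cumulative = term
        while cumulative < target:
            k += 1
            term *= mean_failures / k             # next Poisson probability
            cumulative += term
        return k

    for mtbf in (100, 1_000, 10_000, 100_000, 1_000_000):
        print(mtbf, spares_needed(mtbf))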

Figure 2 shows that, for a range of MTBFs from 100 to 10,000,000 hours, the number of redundant engines required to achieve a 150 hour MFOP at 99.986% probability of survival ranges from 10 down to 0. For military aircraft, the achievable range of MTBFs implies at least 4 redundant engines will be needed. With each engine, together with its accompanying accessories and fittings, weighing over a tonne, taking up probably in excess of 2 m³ and costing maybe £2 million, this would add very considerably to the all-up weight, size and price of the aircraft. It is likely that the increase would be so great that higher thrust-rated engines would be required, adding still further to the weight and size and possibly the cost.

Physics of failure

If we cannot achieve the requirement by increasing the mean time between failures or by adding redundancy then we must consider alternative approaches. In this section we consider how, with a better understanding of the causes of failure and of the distributions of the times to failure, together with the use of preventive and opportunistic maintenance, it might be possible to meet the requirement.

So far we have assumed that the times between failures are exponentially distributed. For complex, unreliable systems this may not be an unreasonable assumption at the system level, but it misses a major opportunity to improve the in-service reliability and possibly the inherent reliability. At the lower levels of indenture, components are much more likely to wear out: the probability of a component failing in its first hour will generally be very much lower than in, say, its thousandth hour. Knowing how these probabilities change with age means that we can, in many cases, prevent an engine failure by replacing a component before it has worn out, just as many of us do by having the cam belt on our cars replaced at the manufacturer’s recommended mileage.


Figure 2: Graph showing the number of redundant engines required against MTBF

To fit a time-to-failure distribution other than the exponential requires knowledge of the ages of the individual components at their times of failure. This means keeping track of a very large number of components and being able to mark them in some way that allows them to be individually identified. This is known in the trade as parts life tracking.

Figure 3 shows that, as the time-to-failure becomes increasingly age-related, we can achieve the required MFOP probability of survival with a lower mean time to failure (MTTF). For a shape of 1 (non-age-related, i.e. exponential) the required MTTF is over 1 million hours, but by the time the shape has increased to 5 this has dropped to under 1000 hours. Many component failure modes are certainly age-related, although the current evidence suggests it would be difficult to achieve a shape greater than 3. But even with a shape of 3, the required MTTF reduces from over 1 million hours to 2580 hours.
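The relationship behind Figure 3 can be reproduced with a two-parameter Weibull distribution (a sketch under that assumption, using the 99.986% per-engine requirement over a 150 hour MFOP):

    import math

    mfop = 150.0
    r_req = 0.99986     # required probability of surviving the MFOP

    for shape in (1.0, 2.0, 3.0, 5.0):
        # Weibull survival R(t) = exp(-(t/eta)^shape); solve for the scale eta
        eta = mfop / (-math.log(r_req)) ** (1.0 / shape)
        mttf = eta * math.gamma(1.0 + 1.0 / shape)    # Weibull mean time to failure
        print(shape, round(mttf))
    # shape 1: over a million hours; shape 3: about 2,580 hours; shape 5: under 1,000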

Although this MTTF should be very much more achievable, there is a disadvantage. At the end of the MFOP (150 hours), any components that survive will be some ten times more likely to fail in the next MFOP and hence fall below the required probability of survival. This means that any component whose MTTF only just meets the requirement will have to be replaced every MRP (maintenance recovery period). This will massively increase the number of engine removals and, as like as not, similarly increase the number of maintenance-induced failures, thereby defeating the object.
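The scale of this effect can be illustrated under the same Weibull assumption (illustrative figures only): for a shape of 3, the conditional probability of failing in the second MFOP, given survival of the first, is roughly seven times the probability of failing in the first; steeper shapes give considerably larger factors.

    import math

    mfop, shape = 150.0, 3.0
    eta = mfop / (-math.log(0.99986)) ** (1.0 / shape)   # scale that just meets the target

    def cdf(t):                                          # Weibull probability of failure by time t
        return 1.0 - math.exp(-(t / eta) ** shape)

    p_first = cdf(mfop)                                  # about 1.4e-4
    p_second = (cdf(2 * mfop) - p_first) / (1.0 - p_first)
    print(p_second / p_first)                            # roughly 7 for a shape of 3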

Naturally, if parts are replaced before they have actually failed then there will be an inevitable increase in parts’ consumption. The average time a part will spend in service will be less than the mean time to failure for that part. To make this policy cost-effective, it will be necessary to demonstrate that the greater convenience of being able to schedule maintenance, the reduction in the amount of secondary damage and the increased probability of an engine surviving a given period will more than offset the increase in parts’ usage. (Secondary damage is when a part fails and causes damage to other parts).


Figure 3: Graph showing how the MTTF varies with the Weibull Shape

Engine health monitoring and automatic inspection

Most modern airliners now have engine health monitoring systems. These not only record the number of low-cycle fatigue stress cycles that each of the safety-critical parts has endured during each flight but can also analyse these data and identify trends. This is a similar process to the one many drivers follow by keeping track, usually mentally, of the fuel consumption of their cars.

With airliners, which tend to fly in a very predictable way, this analysis can prove particularly successful. Unfortunately, military aircraft are seldom flown the same way from one sortie to the next, and it is proving very difficult to isolate the signal from the noise, even using such sophisticated methods as neural networks and Kalman filtering.

Metal detectors can pick up particles in the oil which, when analysed, indicate which component(s) are starting to wear excessively. Unfortunately, they cannot determine when a component has started to crack or corrode. It may, however, be possible to fit (electronic) equipment that could. A problem with this is that the inside of a gas turbine engine is far from an ideal environment for such equipment, with the result that more engine rejections could be caused by the monitoring equipment than by the engine it is monitoring.

The Future

Engine designers can predict, with a high level of accuracy and while the engine is still ‘on paper’, the thrust at a given specific fuel consumption. This is done by gas flow analysis using techniques such as finite element analysis. The state of the art in predicting engine and component reliability is still a very long way behind.

For military engines, the spares market is still around the same size as that for new engines, with typically higher profit margins. In the civil engine market, the volume of spares has already started to drop below that of new engine sales. From the manufacturer’s point of view, this trend gives cause for concern, as the sale of new (civil) engines is extremely competitive. Winning a (new engine) contract is critical because there is very little after-market business for the loser(s).

Improving reliability is something of a two-edged sword. With all three major engine manufacturers capable of producing engines with essentially the same thrust, power-to-weight ratio and specific fuel consumption, reliability will become increasingly important. The manufacturer who can produce a demonstrably more reliable engine is likely to have a competitive edge. The problem is that, having won the contract, the after-market income may not be sufficient to sustain that company for the life of the engine or, more particularly, until the next generation of aircraft is ordered.

We have already seen a trend towards competitive tendering for fixed-price support contracts. With reducing margins on both new business and the after-market, the possibility of punitive measures for failing to achieve unrealistic reliability levels, and the ever-increasing lives of aircraft, one wonders how aerospace companies will survive.

For the interested reader

  1. Kumar U D, Knezevic J and Crocker J (1999) Maintenance free operating period – an alternative measure to MTBF and failure rate for specifying reliability? Reliability Engineering & System Safety 64 127-131
  2. Sabbagh K (1996) 21st Century Jet: The Making of the Boeing 777, Macmillan, London

JOHN CROCKER has been a member of the OR Society since the early 1970s, when he worked for the British Steel Corporation. For the past 24 years he has been involved in OR in general, and logistics in particular, for Rolls-Royce plc. Two years ago he completed an MSc in Logistics Engineering under Dr Knezevic at the Centre for MIRCE, University of Exeter, and has now embarked on a PhD at the same Centre. He is also a recognised lecturer and is involved in writing a number of monographs and books for use on the various courses offered by the Centre. John is currently a regional representative on the Central Council of the OR Society, Book Review Editor for JORS and an assistant editor for Communications in Dependability and Quality Management.

First published to members of the Operational Research Society in OR Insight April - June 1999