You Cannot Call Out the AA at 40,000ft
by John Crocker
Combat aircraft are expensive and so are their crews, so
no operator wants to lose either. The current need to deploy large ground
forces to maintain and support them is also expensive and potentially hazardous.
It is therefore not surprising that the operators are looking to the manufacturers
to produce aircraft so reliable that they can go for weeks without any maintenance.
The question is, however, can we achieve the necessary level of reliability,
with sufficient confidence, at an affordable price, to meet this requirement?
-oo0oo-
We all recognise that the post-war scenario, in which hundreds of aircraft
were deployed on both sides of the borders between the NATO and Eastern Bloc
countries, has changed. It is far more likely that today’s combat aircraft
will be deployed in relatively small numbers, in what seems like an ever-increasing
number of trouble spots, to perform a ‘peace-keeping’ role, possibly by bombing
the hell out of anyone who so much as threatens that peace.
This new type of scenario poses a number of problems: how many aircraft should
be deployed, what equipment should accompany them, how many personnel will
be needed and what spares should be taken. With the current aircraft, which
it has to be admitted were not originally designed for this type of role, the
answer to all of these questions is invariably, ‘too many’.
The ideal situation is to deploy only as many aircraft as need to be in the
air at any one time, and to make these aircraft (and their various systems)
sufficiently reliable that they will never fail during their deployment, so
that there will be no need for any spares. With no spares and no maintenance,
other than replenishment (of fuel, ammunition etc), there is no need for skilled
mechanics or for special equipment that would be needed to replace/repair any
of the on-board systems. So instead of needing eight aircraft, 150 personnel
and equipment that would fill 30 large cargo aircraft eg the C130 (Hercules),
it would only be necessary to deploy two aircraft, ten personnel and one C130,
say.
This would allow the other six aircraft to perform similar roles elsewhere or,
ultimately, to reduce the size of the air force. It would put only a fraction
of the lives at risk from enemy action and, often regarded as more important
than any of the other considerations, it would significantly reduce the cost
of the operation. Or would it?
There is no doubt that the marginal cost of this deployment would be significantly
reduced but nothing is free. To make an aircraft that can be operated
for 150 hours over a 30 day period, say, without the need for any maintenance
during that period is currently beyond our capabilities. It is, in fact, an
impossibility. The best we could hope for is that there will be a high, say
95%, probability of surviving such a period without the need for any (non-replenishment)
maintenance.
Current situation
Based on RAF figures, the Tornado can expect to suffer 800 faults
per 1000 aircraft flying hours. It is not known how many of these faults would
stop the aircraft from flying or, indeed, how many would stop it from performing
the required missions. It is normal practice, in peacetime, at least, to rectify
any reported faults as soon as possible so, in essence, we can assume that
they are all critical.
At 800 faults per 1000 hours, we could expect to fly an average of 1.25 hours
between faults. If we now assume that the time between faults is exponentially
distributed, we can determine the probability of surviving 150 hours without
a fault as exp(-150/1.25), or about 7.7 x 10^-53. Alternatively, we can calculate
the length of time the aircraft would survive with a given probability, 95% say,
as 3 min 51 sec.
With the new generation of aircraft, typified by the Typhoon, the
target is to almost halve the number of faults per flying hour. This roughly doubles
the mean operating time between faults to about 2.5 hours, with a corresponding
increase in the probability of surviving 150 hours, and raises the time it will
survive with a given (95%) probability to just over 7 min.
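The two calculations above can be sketched in a few lines of Python. This is only an illustration under the article's assumption of exponentially distributed times between faults; the function names are hypothetical and the fault rates are those quoted in the text.

```python
import math

def survival_probability(mtbf_hours, duration_hours):
    """P(no fault during the period) for exponential times between faults."""
    return math.exp(-duration_hours / mtbf_hours)

def duration_at_probability(mtbf_hours, probability):
    """Longest period (hours) survived with at least the given probability."""
    return -mtbf_hours * math.log(probability)

# Figures from the text: 800 faults per 1000 h for the Tornado, and a mean
# time between faults of about 2.5 h for the Typhoon target.
for name, mtbf in [("Tornado", 1000 / 800), ("Typhoon target", 2.5)]:
    p150 = survival_probability(mtbf, 150)
    minutes = duration_at_probability(mtbf, 0.95) * 60
    print(f"{name}: MTBF {mtbf:.2f} h, P(150 h) = {p150:.2e}, "
          f"95% survival time = {minutes:.1f} min")
```

Both results scale directly with the mean time between faults, which is why halving the fault rate only moves the 95% survival time from under four minutes to a little over seven.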
Requirements restated
By knowing the required probability of survival (95%) and the desired duration
of the maintenance-free operating period (MFOP), we could calculate the minimum
system mean (operating) time between failures (MTBF). Again, assuming the times
between failures are exponentially distributed, the minimum MTBF for the system
is 2924 hours (or approximately 0.34 failures per 1000 hours versus the 420
requirement on the Typhoon).
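As a rough check on this figure, the exponential survival function can be inverted to give the minimum MTBF for a chosen MFOP and survival probability. The sketch below assumes nothing beyond the exponential model already stated; the function name is hypothetical.

```python
import math

def minimum_mtbf(mfop_hours, survival_probability):
    """Smallest MTBF (hours) giving the required survival probability over
    the MFOP, assuming exponentially distributed times between failures."""
    return -mfop_hours / math.log(survival_probability)

mtbf = minimum_mtbf(150, 0.95)
print(f"minimum MTBF: {mtbf:.0f} h "
      f"({1000 / mtbf:.2f} failures per 1000 flying hours)")
```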
If we assume that a typical aircraft is made up of 5 systems, viz airframe,
armament, avionics, propulsion and ‘general’ (see Figure 1) then we can calculate
the survival probability for each of these in order to ensure the system achieves
the required level. One way we could do this is to give equal weightings to
each system. Another would be to apportion the requirement in line with other
similar aircraft such that each system has to make a similar percentage improvement
in terms of its mean time to failure.

Figure 1: Aircraft configuration
Taking the probability of survival, for a given maintenance-free operating
period, to be 95% then each of the 5 systems will need to achieve very nearly
99% probability of surviving the period. If we use the method of allocation
based on an existing aircraft, the probabilities for the airframe, armament,
avionics, propulsion and general systems are 98.75%, 99.5%, 98.6%, 99.72% and
98.4% respectively.
Taking the propulsion system as an example, 99% equates to an MTBF of 14,925
hours (or 67 [failures] per million [flying] hours). With the alternative apportionment
method, the probability is 99.72% which implies an MTBF of at least 53,500
hours or 19 per million hours. Now the propulsion system is made up of two
engines and two sets of accessories (e.g. control units, oil pumps and fuel
pumps). If we arbitrarily assume that the accessories account for 90% of propulsion
system failures we can determine that the required probability of survival
for an engine is 99.986% giving an MTBF of 1,069,928 engine flying hours (EFH)
or 0.93 per million EFH.
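The apportionment arithmetic can be sketched as follows. It assumes, as the article does, exponential failures and systems in series, so that each item's share of the failures is also its share of the total failure rate. The split of the engines' 10% equally between the two engines is an assumption made here in order to recover the quoted figures, and the names are hypothetical.

```python
import math

MFOP_HOURS = 150.0

def mtbf_for(probability, period_hours=MFOP_HOURS):
    """MTBF (hours) needed to survive the period with the given probability."""
    return -period_hours / math.log(probability)

# Equal weighting: five systems in series must jointly achieve 95%.
equal_share = 0.95 ** (1 / 5)
print(f"equal apportionment per system: {equal_share:.4%}")
print(f"propulsion MTBF at 99%: {mtbf_for(0.99):,.0f} h")

# Alternative apportionment: propulsion is allocated 99.72%.
propulsion = 0.9972
print(f"propulsion MTBF at 99.72%: {mtbf_for(propulsion):,.0f} h")

# Accessories take 90% of propulsion failures; the remaining 10% is
# assumed to be shared equally by the two engines (5% each).
engine_fraction = (1 - 0.90) / 2
engine_probability = propulsion ** engine_fraction
engine_mtbf = mtbf_for(engine_probability)
print(f"per-engine survival: {engine_probability:.4%}, "
      f"MTBF: {engine_mtbf:,.0f} EFH "
      f"({1e6 / engine_mtbf:.2f} per million EFH)")
```

Raising the system probability to the power of an item's failure share works because, under the exponential model, the logarithms of the survival probabilities add in proportion to the failure rates.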
Putting this into perspective
To put these numbers into some kind of perspective, the in-flight shut down
(IFSD) rate on the Boeing 767 after its first 10,000,000 hours in service was
around 20 per million. These actually represent only a small fraction of the
engine arisings which require unscheduled maintenance. The 767 does not normally
perform 9G turns, fly at Mach 2 at an altitude of 50 feet, climb vertically
using reheat or any of the other aerobatics that a combat aircraft is expected
to perform. Most of its flying is done at 35,000 ft (or thereabouts) at its
cruise speed (approximately Mach 0.8) for several hours at a time. The difference
is similar to that between a long-distance coach and a Formula 1 racing car.
The coach’s engine typically lasts over a million miles, the racing car’s often
less than 200 miles.
It has been suggested that punitive charges may be levied against the manufacturers
if they fail to achieve the required levels of reliability. One possibility
is that they may be expected to perform any unscheduled maintenance at their
own cost. This could mean having to hire a Hercules (C130) to transport
a crew of skilled mechanics along with any special equipment and a spare (engine)
to the aircraft’s location, possibly at or near to the front line.
Reliability demonstration
Given these considerations, it is reasonable to suppose that the manufacturers
will want to be reasonably confident that their systems will meet the reliability
requirements. The usual way of doing this is to run pre-production engines
on test beds to ensure there are no unexpected failures and to establish the
MTTF. If we assume, again, that the times to failure are exponentially distributed
then we can use Bayes Theorem to determine the length of testing time to achieve
an estimate of the MTTF with a given level of confidence. Using the above numbers
(MTBF = 1,069,928 EFH), the amount of testing to be 95% confident is just 2381
years. This would require over 75 million tonnes of aviation fuel and produce
a similar amount of CO2.
The other major problem with a reliability demonstration is that test beds,
by their very nature, tend to be static. It is not possible to run engines
inverted, pulling +9G or –4G or stood on end with reheat. Similarly, test beds
do not normally have birds flying through them, at least not Canada geese.
They also do not tend to suffer from stones being thrown up off the runway.
Engines are usually tested straight from the build shop so have had very little
opportunity to corrode or get damaged. Testing therefore tends to be under
almost ideal conditions and hence does not really measure in-service reliability
but is, perhaps, more indicative of inherent reliability.
It is increasingly common practice to perform accelerated testing (ASMET)
by condensing around 4 hours of in-service operation into 1 hour of testing.
This is done by removing much of the "steady-state" flying (when there is little
happening in terms of accelerations and decelerations). This practice is good
for identifying faults and potential points of weakness in the design but,
the correlation between in-service stress and that measured on the test beds
is not always very high and is inconsistent across different components.
Redundancy and fault tolerance
We have seen from the above that there are likely to be a number of difficulties
in meeting the MFOP requirement by simply increasing the MTBF’s. There are,
however, a number of other opportunities and approaches that, at least, need
to be considered. One of these is redundancy, in particular cold redundancy.
This is when additional engines are carried but only started up if there is
a failure in one of the operating engines.
If we assume that all failures are exponentially distributed and that if an
engine fails it has no effect on any of the other redundant or non-redundant
engines then the number required can be determined using the Poisson distribution.
If we now assume that the aircraft requires two engines to be operational in
order for it to fly then for an MFOP of 150 hours, we would need to achieve
300 engine flying hours. By knowing the failure rate (or its reciprocal, the
MTBF) for an engine, we can calculate the expected number of failures in this
300 hour period. This then forms the mean of the Poisson distribution which
can then be used to determine the number of failures for which the cumulative
probability is greater than or equal to the desired value.
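A minimal sketch of this Poisson calculation is given below. It assumes ideal cold standby (spares do not fail while dormant and switch in perfectly), which goes a little beyond what the article states, and the function names are hypothetical. Because the exact inputs behind Figure 2 are not given, the counts it produces need not match the figure exactly.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)   # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i     # P(X = i) from P(X = i - 1)
        total += term
    return total

def redundant_engines_needed(mtbf_hours, engine_hours, target_probability):
    """Smallest number of spare engines such that the probability of
    having no more failures than spares meets the target."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    expected_failures = engine_hours / mtbf_hours
    spares = 0
    while poisson_cdf(spares, expected_failures) < target_probability:
        spares += 1
    return spares

# Two operating engines over a 150 hour MFOP give 300 engine flying hours.
for mtbf in (100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    print(f"MTBF {mtbf:>10,} h: "
          f"{redundant_engines_needed(mtbf, 300, 0.99986)} redundant engine(s)")
```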
Figure 2 shows that for a range of MTBF’s from 100 to 10,000,000, the number
of redundant engines required to achieve a 150 hour MFOP at 99.986% probability
of survival ranges from 10 to 0. For military aircraft, the achievable range
of MTBF’s implies at least 4 redundant engines will be needed. With each engine
weighing over a tonne with its accompanying accessories and fittings and taking
up probably in excess of 2 cubic metres and costing maybe £2 million this will add very
considerably to the all-up weight, size and price of the aircraft. It is likely
that the increase will be so great that higher thrust rated engines will be
required which will still further add to the weight and size and possibly cost.

Figure 2: Graph showing the number of redundant engines required
against MTBF
Physics of failure
If we cannot achieve the requirement by increasing the mean time between failures
or by adding in redundancy then we must consider alternative approaches. In
this section we will consider how with a better understanding of the causes
of failure, the distributions of the times to failure and the use of preventative
and opportunistic maintenance, it might be possible to meet the requirement.
So far we have assumed that the times between failures are exponentially distributed.
For complex unreliable systems, this may not be an unreasonable assumption,
at the system level but, such an assumption misses a major opportunity to improve
the in-service reliability and possibly the inherent reliability. At the lower
levels of indenture, components are much more likely to wear out. The probability
of them failing in the first hour will generally be very much lower than in
the thousandth hour, say. Knowing how these probabilities change with age means
that we can, in many cases, prevent an engine failure by replacing a component
before it has worn out just as many of us will do by having the cam belt on
our cars replaced at the manufacturer’s recommended mileage.
To fit a time-to-failure distribution other than an exponential requires a
knowledge of the ages of the individual components at their times of failure.
This means keeping track of a very large number of components and being able
to mark them in some way which allows them to be individually identified. This
is known in the trade as parts life tracking.
Figure 3 shows that as the time-to-failure becomes increasingly age-related
so we can achieve the required MFOP probability of survival with a lower mean
time to failure (MTTF). For a shape of 1 (non-age-related or exponential) the
MTTF is over 1 million hours but by the time the shape has been increased to
5, this has dropped to under 1000 hours. Many of the failure modes of components
are certainly age-related although the current evidence suggests it would be
difficult to achieve a shape greater than 3. But, even with a shape of 3 the
MTTF reduces from over 1 million hours to 2580 hours.

Figure 3: Graph showing how the MTTF varies with the Weibull Shape
Although this MTTF should be very much more achievable, there is a disadvantage.
At the end of the MFOP (150 hours), any that survive will be some ten times
more likely to fail in the next MFOP and hence be below the required probability
of survival. This means that any component which has an MTTF that only just
meets the requirement will have to be replaced every MRP (maintenance recovery
period). This will massively increase the number of engine removals and, as
like as not, similarly increase the number of maintenance induced failures
therefore defeating the object.
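The relationship between the Weibull shape and the MTTF needed to meet the MFOP requirement can be sketched as below. It assumes a two-parameter Weibull distribution for a new component and the per-engine survival probability of 99.986% over 150 hours used earlier; the function name is hypothetical.

```python
import math

def weibull_mttf(shape, mfop_hours, survival_probability):
    """MTTF of the two-parameter Weibull that just meets the required
    survival probability over the MFOP, starting from new."""
    # R(t) = exp(-(t / scale) ** shape), solved for the scale parameter.
    scale = mfop_hours / (-math.log(survival_probability)) ** (1.0 / shape)
    # The mean of a Weibull is scale * Gamma(1 + 1 / shape).
    return scale * math.gamma(1.0 + 1.0 / shape)

for shape in (1, 2, 3, 4, 5):
    print(f"shape {shape}: MTTF = {weibull_mttf(shape, 150, 0.99986):,.0f} h")
```

A shape of 1 reduces to the exponential case, which is why the first line of output is of the same order as the engine MTBF derived earlier.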
Naturally, if parts are replaced before they have actually failed then there
will be an inevitable increase in parts’ consumption. The average time a part
will spend in service will be less than the mean time to failure for that part.
To make this policy cost-effective, it will be necessary to demonstrate that
the greater convenience of being able to schedule maintenance, the reduction
in the amount of secondary damage and the increased probability of an engine
surviving a given period will more than offset the increase in parts’ usage.
(Secondary damage is when a part fails and causes damage to other parts).
Engine health monitoring and automatic inspection
Most of the modern airliners now have engine health monitoring systems. These
not only record the number of low-cycle fatigue stress cycles that each of
the safety-critical parts have endured during each flight but can analyse this
data and identify trends. This is a similar process to what many drivers do
by keeping track, usually mentally, of the fuel consumption of their cars.
With airliners, which tend to fly in a very predictable way, this analysis
can prove particularly successful. Unfortunately, military aircraft are seldom
flown the same way from one sortie to the next. It is proving very difficult to
isolate the signal from the noise even using such sophisticated methods as
neural networks and Kalman filtering.
Metal detectors can pick up particles in the oil which, when analysed, indicate
which component(s) are starting to wear excessively. Unfortunately, they cannot
determine when a component has started to crack or corrode. It may, however,
be possible to fit (electronic) equipment that could. A problem with this is
that the inside of a gas turbine engine is far from ideal conditions for such
equipment with the result that more engine rejections could result from the
monitoring equipment than from the engine it is monitoring.
The Future
Engine designers can predict with a high level of accuracy, whilst the engine
is still ‘on paper’, the thrust at a given specific fuel consumption. This
is done by gas flow analysis using techniques such as finite element analysis.
The state of the art when it comes to predicting engine/component reliability
is still a very long way behind.
For military engines, the spares market is still around the same size as for
new engines with profit margins typically higher. In the civil engine market,
the volume of spares has already started to drop below that of new engine sales.
From the manufacturer’s point of view, this trend does give cause for concern
as the sale of new (civil) engines is extremely competitive. Winning a (new
engine) contract is critical because there is very little after-market business
for the loser(s).
Improving reliability is something of a two-edged sword. With all three major
engine manufacturers having similar capabilities of producing engines with
essentially the same thrust, power-to-weight ratio and specific fuel consumption,
reliability will become increasingly more important. The manufacturer who can
produce a demonstrably more reliable engine is likely to have a competitive
edge. The problem is that, having won the contract, the after-market income
may not be sufficient to sustain that company for the life of the engine or,
more particularly until the next generation of aircraft is ordered.
We have already seen a trend towards competitive tendering for fixed price
support contracts. With reducing margins on both new business and after-market,
the possibility for punitive measures on failure to achieve unrealistic reliability
levels and ever-increasing lives of aircraft one wonders how aerospace companies
will survive.
For the interested reader
- Kumar U D, Knezevic J and Crocker J (1999) Maintenance free operating
period – an alternative measure to MTBF and failure rate for specifying reliability? Reliability
Engineering & System Safety 64 127-131
- Sabbagh K (1996) 21st Century Jet: The Making of
the Boeing 777, Macmillan, London
JOHN CROCKER has been a member of the OR Society since the
early 1970’s when he worked for the British Steel Corporation. For the past
24 years he has been involved in OR in general, and logistics in particular,
for Rolls-Royce plc. Two years ago, he completed an MSc in Logistics Engineering
under Dr Knezevic, Centre for MIRCE, University of Exeter and has now embarked
on a PhD at the same Centre. He is also a recognised lecturer and involved
in writing a number of monographs and books for use on the various courses
being offered by the Centre. John is currently regional representative on the
Central Council for the OR Society, Book Review Editor for JORS and
an assistant editor for Communications in Dependability and Quality Management.
First published to members of the Operational Research Society in OR
Insight April - June 1999